Predict Stock Prices with an AI - Machine Learning in PHP

Currently I am experimenting and playing with Artificial Intelligence (AI). Although PHP is not the ideal language to build an AI, it is possible and works quite well. In this example, at 4pm I predict the closing prices (5:30pm) of the German DAX30 companies. Of course you can use the NASDAQ as well. This gives me a recommendation for action (buy / sell) and shows my expected return. Then I trade CFDs with high leverage and get rich ;)

Stock prediction with PHP AI

My database contains all DAX30 prices at 5-minute intervals from recent years. I have data going back to 2015 (> 2,000,000 records), but training on all of it takes forever... For the first tests we therefore use significantly less data (~40,000 records).

How To Crawl Stock Prices in Real Time

Getting the AI up and running is quick and simple. It takes just 3 lines of code:

  1. Defining variables
  2. Training
  3. Prediction

That's it. It is much more difficult to obtain the data and to bring it into an appropriate format.

If you have the money, you can buy access to an API that provides the data. Since I don't want to (first I need to make the money with this AI ;)), I've written a tiny script that retrieves the prices from finanzen.net (a German stock portal) every 5 minutes. You can find the real-time stock prices of the DAX30 here: DAX30 Real Time Prices

This page lets you crawl the data quite easily, and you can fetch it like this:

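// Assumes $db is an open mysqli connection to the database containing the dax30 table
$html = file_get_contents('http://www.finanzen.net/aktien/DAX-Realtimekurse');
// Cut out the price table: everything between its opening tag, </thead> and </table>
$table = explode('<table class="table table-vertical-center">',$html);
$table = explode('</table>',$table[1]);
$table = explode('</thead>',$table[0]);
$rows = explode('<tr>',$table[1]);
unset($rows[0]); // markup before the first data row
$rows = array_slice($rows,0,31);

// Prepare the INSERT once; bind_param() binds the variables by reference,
// so every execute() uses their current values
$sql = "INSERT INTO dax30(name,lastday,bid,ask,percent,timestamp)VALUES(?,?,?,?,?,NOW())";
$stmt = $db->prepare($sql);
$stmt->bind_param('sdddd',$name,$lastday,$bid,$ask,$percent);

foreach($rows AS $row){
	// Split the row at every opening <td ...> tag, so $cols[1] is the first cell.
	// (The delimiter is missing in the original listing; this split is an assumption.)
	$cols = preg_split('/<td[^>]*>/',$row);

	$name = mb_convert_encoding(strip_tags($cols[1]),'UTF-8','ISO-8859-1'); // same as the deprecated utf8_encode()
	$lastday = toNumber(strip_tags($cols[3]));
	$bid = toNumber(strip_tags($cols[4]));
	$ask = toNumber(strip_tags($cols[5]));
	$percent = toNumber(strip_tags($cols[6]));
	$stmt->execute();

	echo $name.': Last Day: '.$lastday.', Bid: '.$bid.', Ask: '.$ask.', Percent: '.$percent.PHP_EOL;
}
$stmt->close();

// Turns German number formats like "1.234,56 %" into "1234.56"
function toNumber($n){
	return trim(str_replace('%','',str_replace(',','.',str_replace('.','',$n))));
}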

That's it. Although it is not the cleanest code, it works well enough. The toNumber() function turns the German number format into a plain decimal: it drops the thousands separators (dots), replaces the decimal comma with a dot and removes the percent sign. Now set up a cronjob that runs the script every 5 minutes, and you will collect loads of data very fast. Keep in mind that you shouldn't crawl on weekends or after the stock market has closed (in Germany it is open from 9am to 5:30pm).
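A small guard can keep the cronjob from crawling outside trading hours. The sketch below is a hypothetical helper (isMarketOpen() is not part of the original script); it assumes trading hours of 9:00 to 17:30 in the Europe/Berlin timezone and ignores exchange holidays.

```php
<?php
// Hypothetical helper (not part of the original script): true only on
// weekdays between 09:00 and 17:30 Frankfurt time.
// Exchange holidays are NOT handled in this sketch.
function isMarketOpen(DateTimeImmutable $now): bool
{
    $local = $now->setTimezone(new DateTimeZone('Europe/Berlin'));
    $weekday = (int) $local->format('N'); // 1 = Monday ... 7 = Sunday
    if ($weekday >= 6) {
        return false;
    }
    // Minutes since midnight, local time
    $minutes = (int) $local->format('G') * 60 + (int) $local->format('i');
    return $minutes >= 9 * 60 && $minutes <= 17 * 60 + 30;
}
```

At the top of the crawler you could then stop early with `if (!isMarketOpen(new DateTimeImmutable())) { exit; }`, while the crontab entry itself stays a plain every-5-minutes schedule.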

Install PHP-ML - The easy Way

As the AI we use the PHP-ML library. You can find the source code on GitHub and the documentation here. Install it quickly and cleanly via Composer (or download and unzip it from GitHub):

composer require php-ai/php-ml

Prepare your data

Now that we have enough data (it's been about 2 weeks), we fetch it from the database and pack it into a large multidimensional array. We are primarily interested in the "percent" column: the change in percent compared to the previous day's closing price (lastday). We do it like this:

$sql = "SELECT percent,DATE_FORMAT(timestamp,'%d.%m.%Y'),DATE(timestamp),name FROM dax30 WHERE TIME(timestamp) >= '09:00:00' AND TIME(timestamp) < '17:36:00' ORDER BY name ASC, timestamp ASC";
$stmt = $db->prepare($sql);
$stmt->execute();
$stmt->bind_result($percent,$date,$date2,$name);
$stmt->store_result();
$stocks = [];
$i = 0;
$prev = '';
$prevD = '';
while($stmt->fetch()){
	if($prev != $name || $date != $prevD){
		$i = 0;
		$prev = $name;
		$prevD = $date;
	}

	if($i < $maxInput) // $maxInput should be the number of today's entries
		$stocks[$name][$date]['price'][] = $percent;
	$stocks[$name][$date]['closing'] = $percent;
	$i++;
}
$stmt->free_result();
$stmt->close();

Since on the day of the forecast we only know the prices up to the current time, the AI should also train only on the previous days' data up to that same time of day.
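$maxInput is left undefined in the listing above. One way to derive it, sketched here as a hypothetical helper, is to compute how many 5-minute snapshots should exist between 9:00 and now, assuming the cronjob fires exactly on every 5-minute mark and never misses a run. If runs can be missed, counting the rows actually stored today for one stock is the more robust choice.

```php
<?php
// Hypothetical helper: number of 5-minute snapshots between the 09:00 open
// and $now (both inclusive), capped at the 17:30 close. Assumes the cron job
// fires exactly on every 5-minute mark and never misses a run.
function expectedReadings(DateTimeImmutable $now, int $intervalMinutes = 5): int
{
    $local = $now->setTimezone(new DateTimeZone('Europe/Berlin'));
    $open = $local->setTime(9, 0);
    $close = $local->setTime(17, 30);
    if ($local < $open) {
        return 0; // market not open yet
    }
    if ($local > $close) {
        $local = $close; // after the close the day is complete
    }
    $minutes = intdiv($local->getTimestamp() - $open->getTimestamp(), 60);
    return intdiv($minutes, $intervalMinutes) + 1; // +1 for the 09:00 snapshot
}
```

At 4pm this gives 85 readings, so `$maxInput = expectedReadings(new DateTimeImmutable());` would cut every earlier day off at the same time of day.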

Let's reformat the array so we can feed it to the AI. The AI needs two arrays: the TrainingSet and the ResultSet. The TrainingSet contains each day's prices up to 4pm, whereas the ResultSet contains the respective closing price at 5:30pm.

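$trainingsSet = [];
$resultSet = [];
$today = [];
foreach($stocks AS $name=>$stock){
	$today[$name] = array_pop($stock); // hold back the current (incomplete) day for the prediction
	foreach($stock AS $date=>$stockDay){
		$trainingsSet[] = $stockDay["price"]; // readings up to the cut-off time
		$resultSet[] = $stockDay["closing"]; // that day's closing value
	}
}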
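SVR expects every sample to have the same number of features, but a missed cron run leaves a day with fewer readings than $maxInput. The sketch below wraps the same loop in a function and skips such incomplete days; buildSets() and the skipping rule are assumptions added here, not part of the original code.

```php
<?php
/**
 * Sketch: split $stocks (name => date => ['price' => float[], 'closing' => float])
 * into samples and targets, holding back the most recent day of each stock.
 * Days with fewer than $maxInput readings are skipped so that every sample
 * has the same number of features.
 */
function buildSets(array $stocks, int $maxInput): array
{
    $samples = [];
    $targets = [];
    $today = [];
    foreach ($stocks as $name => $days) {
        $today[$name] = array_pop($days); // last inserted date = current day
        foreach ($days as $day) {
            if (count($day['price']) !== $maxInput) {
                continue; // incomplete day, e.g. after a missed cron run
            }
            $samples[] = $day['price'];
            $targets[] = $day['closing'];
        }
    }
    return [$samples, $targets, $today];
}
```

Calling `[$trainingsSet, $resultSet, $today] = buildSets($stocks, $maxInput);` fills the same three variables as the loop above.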

Which estimator is the right one?

If you look at the documentation of the AI, you will find many different estimators (= estimation functions of the AI): classification, regression, clustering, etc. Besides preparing the data, choosing the right estimator is fundamental. Either you try each one, or you are smart and narrow it down first:

Estimator Cheat Sheet - Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

Classification

Classification is used to classify specific inputs. Obviously. To be serious: classification assigns each input to one of several predefined groups (classes), which the AI learns from labeled examples. Example: you have different colors as input and want the AI to learn whether a color is light or dark.
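To make the idea concrete, here is a toy classifier in plain PHP for the light/dark color example: it copies the label of the nearest labeled color (1-nearest-neighbour). The training colors and the classify() function are made up for illustration; PHP-ML ships a ready-made version of this idea as Phpml\Classification\KNearestNeighbors.

```php
<?php
// Toy classification sketch (plain PHP, not PHP-ML): label a color as
// "light" or "dark" by copying the label of the nearest labeled example
// (1-nearest-neighbour, squared Euclidean distance in RGB space).
function classify(array $rgb, array $examples): string
{
    $bestLabel = '';
    $bestDist = INF;
    foreach ($examples as [$sample, $label]) {
        $dist = 0;
        foreach ($sample as $i => $v) {
            $dist += ($v - $rgb[$i]) ** 2;
        }
        if ($dist < $bestDist) {
            $bestDist = $dist;
            $bestLabel = $label;
        }
    }
    return $bestLabel;
}

// Hypothetical labeled training data: [[R, G, B], label]
$examples = [
    [[255, 255, 255], 'light'],
    [[240, 230, 140], 'light'],
    [[0, 0, 0], 'dark'],
    [[25, 25, 112], 'dark'],
];
```

`classify([250, 250, 250], $examples)` returns 'light', because that color lies closest to the white example.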

Clustering

Clustering also sorts inputs into groups, but without predefined labels: the algorithm finds groups of similar inputs on its own. For example, you could feed in characteristics of people (age, gender, income, ...) and let it discover groups of similar people.
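For comparison, a toy clustering sketch in plain PHP: one-dimensional k-means with two groups, applied for example to ages. It receives no labels at all and only moves two centroids until each sits in the middle of its group. kMeans2() is an illustration, not PHP-ML's Phpml\Clustering\KMeans.

```php
<?php
// Toy 1-D k-means with k = 2 (plain PHP, illustration only).
// $values must be non-empty. No labels are given: the two centroids start
// at the smallest and largest value and are repeatedly moved to the mean
// of the values closest to them.
function kMeans2(array $values, int $iterations = 10): array
{
    $centroids = [min($values), max($values)];
    for ($n = 0; $n < $iterations; $n++) {
        $groups = [[], []];
        foreach ($values as $v) {
            // assign each value to the nearer centroid (ties go to group 0)
            $k = abs($v - $centroids[0]) <= abs($v - $centroids[1]) ? 0 : 1;
            $groups[$k][] = $v;
        }
        foreach ($groups as $k => $members) {
            if ($members !== []) {
                $centroids[$k] = array_sum($members) / count($members);
            }
        }
    }
    return $centroids;
}
```

With the ages 22, 25, 27, 61, 64 and 70 it ends up with one centroid near 24.7 and one at 65: a "younger" and an "older" group it found on its own.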

Regression

A regression attempts to describe or predict relationships on a quantitative basis: the inputs and outputs are numeric values. For example: stock prices! Yeah =)
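The simplest regression is an ordinary least-squares line through one input. This plain-PHP sketch only illustrates what fitting a quantitative relationship means; fitLine() is hypothetical and is not PHP-ML's LeastSquares or SVR implementation.

```php
<?php
// Toy regression sketch (plain PHP): ordinary least squares for one input,
// y ≈ slope * x + intercept. Requires at least two distinct x values.
function fitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $cov = 0.0;  // sum of (x - meanX) * (y - meanY)
    $varX = 0.0; // sum of (x - meanX)^2
    for ($i = 0; $i < $n; $i++) {
        $cov += ($x[$i] - $meanX) * ($y[$i] - $meanY);
        $varX += ($x[$i] - $meanX) ** 2;
    }
    $slope = $cov / $varX;
    return [$slope, $meanY - $slope * $meanX];
}
```

For the points (0, 1), (1, 3), (2, 5), (3, 7) it returns a slope of 2 and an intercept of 1, so it would predict 9 for x = 4.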

That doesn't help me much

Yes, I know and understand. Luckily, scikit-learn.org has a flowchart for determining the right estimator. This makes it relatively easy to find the most suitable estimator for your project.

In PHP-ML there are only 2 types of regression: LeastSquares and SVR. Looking at the cheat sheet above, SVR seems to fit quite well.

PHP-ML: Train the AI

Now that we have prepared the data, we can start training the AI. As mentioned above, PHP is not necessarily the fastest or best language for machine learning, so try not to feed it too much data. With 2,000,000 records, training took almost 4 hours.

Initialize PHP-ML

We start by defining the regression with the POLYNOMIAL kernel (LINEAR, RBF and SIGMOID did not deliver good values):

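require_once '/var/www/vendor/autoload.php'; // Composer autoloader (adjust the path to your setup)
use Phpml\Regression\SVR;
use Phpml\SupportVectorMachine\Kernel;
use Phpml\ModelManager;

// Arguments: kernel, degree, epsilon, cost, gamma, coef0, tolerance, cacheSize, shrinking
$regression = new SVR(Kernel::POLYNOMIAL, 3, 0.1, 10, 0.3, 0.1, 3, 200, true);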

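These arguments to new SVR() gave me the best results. In order, they are the kernel, the polynomial degree (3), epsilon (0.1), the cost (10), gamma (0.3), coef0 (0.1), the tolerance (3), the cache size (200) and shrinking (true). Of course you can change them and test other values.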

Train with PHP-AI / PHP-ML

Oh yeah, here we go! Let's train! Considering what happens now, the code is remarkably unspectacular:

$regression->train($trainingsSet, $resultSet);

That's all. Since we prepared the data so nicely above, there is nothing more to do here. Somehow you would expect more... You will notice how much magic happens in this one line when you look at the script's runtime: training is demanding and takes a while, depending on the amount of training data.

Save the training for faster forecasts

This step is optional and only worthwhile with very large data sets. The nice thing about PHP-ML is that you can save a trained model, so we don't have to retrain for every prediction. We simply restore the saved model later and make the prediction with it. This saves a lot of time and lets you compare different settings quickly and easily:

// VENDOR is assumed to be a constant pointing to Composer's vendor/ directory
$filepath = VENDOR.'php-ai/php-ml/var/model.data';
$modelManager = new ModelManager();
$modelManager->saveToFile($regression, $filepath);

To reuse the trained model later, we just restore it from the model.data file:

$filepath = VENDOR.'php-ai/php-ml/var/model.data';
$modelManager = new ModelManager();
$regression = $modelManager->restoreFromFile($filepath);

Predicting with PHP-ML

To predict today's closing price, we need the current day's prices (which we saved in $today). Now let's hope the predicted closing price is as close as possible to the real one:

foreach($today AS $name=>$predict){
	// Latest known percentage change of the current day
	$price = $predict["price"][count($predict["price"])-1];
	// Predicted percentage change at the close (5:30pm)
	$predicted = round($regression->predict($predict["price"]),2);
	// Expected move from now until the close, in percentage points
	$predictedPerformance = $predicted - $price;
	echo $name.": Price: ".$price.", Predicted: ".$predicted.", Performance: ".$predictedPerformance.PHP_EOL;
}

The output should look something like this:

Stock prediction with PHP-AI

In other words, for all stocks with a predicted performance < 0 you should sell when trading CFDs, and buy for values > 0. To make it clearer and nicer to look at, I save the predictions in the database and built an overview that updates all values every 5 minutes:

Overview: stock prediction with PHP-AI

This overview shows me exactly which trades are worthwhile and how I should trade. If the predicted performance is below 0.6%, I don't get a recommendation. If it is greater than 5%, "Check" appears instead, since such a move is very unlikely and the AI has probably messed up. For real trading you should set this limit to around 2.5% instead.
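The decision rule described above fits in a few lines. This is a sketch, not the code behind the overview: the thresholds (0.6 and 5 percentage points) come from the text, while the function name recommend() and treating negative values symmetrically via abs() are assumptions.

```php
<?php
// Sketch of the recommendation rule (values in percentage points).
// $maxPlausible is the limit above which the prediction is flagged.
function recommend(float $predictedPerformance, float $maxPlausible = 5.0): string
{
    $size = abs($predictedPerformance);
    if ($size < 0.6) {
        return '';        // move too small: no recommendation
    }
    if ($size > $maxPlausible) {
        return 'Check';   // implausibly large: the model has probably misfired
    }
    return $predictedPerformance > 0 ? 'Buy' : 'Sell';
}
```

Passing 2.5 as the second argument gives the stricter limit suggested for real trading, so a predicted +3.0 would then be flagged with "Check" instead of "Buy".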

Conclusion

Unfortunately, this will hardly make any money. I built it primarily to get to know AI and machine learning. The PHP-AI / PHP-ML library is pretty neat and works really well. However, the regression estimator is not entirely suitable, although it predicts the correct trend in more than 60% of cases. The more records you use, the more the predictions lean in one direction and the larger the extremes get: instead of -0.46 you get something like -7.8. That is not very useful for our application...
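One way to measure the "correct trend" rate is directional accuracy: the share of cases where the predicted change from now until the close has the same sign as the actual change. The text does not say how the 60% figure was computed, so this hypothetical helper is only one possible definition.

```php
<?php
// Hypothetical helper: fraction of predictions whose direction matches reality.
// Both arrays hold changes (e.g. predicted close minus 4pm value, and actual
// close minus 4pm value), aligned by index.
function directionalAccuracy(array $predicted, array $actual): float
{
    if (count($predicted) === 0) {
        return 0.0;
    }
    $hits = 0;
    foreach ($predicted as $i => $p) {
        // <=> yields -1, 0 or 1, so this compares the signs
        if (($p <=> 0) === ($actual[$i] <=> 0)) {
            $hits++;
        }
    }
    return $hits / count($predicted);
}
```

A result above 0.6 would correspond to "correct trends in more than 60% of the cases".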

The whole thing gets more interesting and much better with a Recurrent Neural Network, such as an LSTM network (Long Short-Term Memory network). It allows the AI, like a human, to make meaningful decisions in light of past events. As far as I know, PHP has no ready-to-use library for this and probably never will. With TensorFlow (Google's machine learning library) it is possible... using Python instead of PHP ;)

Cheers, da Hansi

www.godlike.de
