Time Series Forecasting: Predicting Bitcoin Prices with Machine Learning

Bitcoin price over a ten year period

Time series data is just what it sounds like: a stream of data points across time. A simple example might be rainfall per time period (year, week, etc.); another would be a business's sales per quarter; later we'll be looking at Bitcoin prices over time. The benefits of predicting these values ahead of time hardly need explaining, but how would we go about it?

For something simple like rainfall, we could probably just take the average of previous data: rainfall doesn't change much year to year in a single location, so the average of previous years is probably a pretty good guess. That approach won't work for a value that changes over time. A company's sales per quarter should be growing every quarter (if they're not, the company is in trouble), so in that case we can fit a trend line to forecast likely future sales. A simple linear regression can give us pretty good results here, because if something like sales is growing, the rate of growth often stays relatively constant. Many companies' sales depend on the time of year too; they might sell more right before Christmas every year, for instance. That's called seasonality: variance in the data due to a repetitive or seasonal change. A linear regression can't capture seasonality very well, but other techniques, like Holt-Winters, can model it and make pretty good predictions. Still, a company's sales over time are relatively simple: a generally predictable trend plus some degree of seasonality. Let's look at something more complicated, in this case Bitcoin.

The price of Bitcoin is much harder to predict. There's certainly no clear trend line. There may be seasonality, but it's definitely not obvious, and if it exists, neither is the interval it occurs on. The chart is full of seemingly random spikes and dips. Any of the previous methods would yield, quite frankly, terrible results. That's where recurrent neural networks (RNNs) come in: one of the best methods for forecasting time series, especially complex ones.

Unfortunately, before we can get there we have to figure out what data we can use and get it into the form we want. For the source I used Bitstamp data covering roughly ten years, recorded at one-minute intervals. The set contains 8 data features for each point in time. To get that dataset into a usable format, we need a preprocessing step.

Sample of the Bitstamp data (there’s another couple million lines)

Each minute was recorded, but there wasn't always a transaction during each minute, so many of the entries are empty. Neural networks aren't well suited to dealing with empty or missing data, so that's the first thing we should fix. We need to fill in the missing values somehow, a.k.a. data interpolation. There are many ways to do that; the simplest is to forward fill the data, so that each empty entry is filled with the most recent value we have. That works pretty well and is often preferable for prices like Bitcoin's, because the price doesn't move until a transaction occurs, so in that sense it more accurately captures the data. However, forward filling doesn't smooth the data at all: there are still jumps from one price to another. That's not a problem in itself (jumps in price are what's really happening, after all), but when using an RNN we can sometimes get better training results with a smooth dataset. So instead of forward filling I used linear interpolation, which fills in each gap with a simple linear function of the two closest data points. Another advantage of linear interpolation arises when you start looking at derivatives, but we won't need that today, because we've already accomplished our goal of getting rid of the empty values.
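The two fill strategies are one line each in pandas. A minimal sketch, using a tiny made-up price series rather than the real Bitstamp columns:

```python
import numpy as np
import pandas as pd

# A toy price series with two missing minutes.
prices = pd.Series([4.0, np.nan, np.nan, 7.0])

forward_filled = prices.ffill()                    # repeat the last known price
linear = prices.interpolate(method="linear")       # straight line between neighbors

print(forward_filled.tolist())  # [4.0, 4.0, 4.0, 7.0]
print(linear.tolist())          # [4.0, 5.0, 6.0, 7.0]
```

Note how forward filling keeps the price flat and then jumps, while linear interpolation spreads the move smoothly across the gap.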

Now that we have a full dataset, we can start to look at its individual features. As you can see from the snippet above, there are 8 fields: timestamp, open, high, low, close, volume (in BTC), volume (in USD), and weighted price. We need to determine whether each of these fields is actually useful for our prediction. A straightforward approach is to look at the correlation of each field with the others via a correlation matrix. Since our data is in a pandas DataFrame, we can use its corr() function to get the matrix.
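A quick sketch of that step, with a small made-up frame (the values are placeholders, and only three of the Bitstamp fields are shown to keep it short):

```python
import pandas as pd

df = pd.DataFrame({
    "Open":  [4.0, 5.0, 6.0, 8.0],
    "Close": [5.0, 6.0, 7.0, 9.0],          # moves in lockstep with Open
    "Volume_(BTC)": [10.0, 3.0, 7.0, 2.0],  # unrelated to price here
})

# Pairwise Pearson correlations between every column.
corr = df.corr()
print(corr.round(2))
```

In this toy frame Open and Close correlate at exactly 1.0 while volume does not, which is the same pattern the real matrix shows across the price fields.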

The numbers are the correlations between the fields along the top and left. Each field obviously has a perfect correlation of 1 with itself. But several fields also correlate almost perfectly with each other: open, high, low, close, and weighted price are essentially the same. Since they're so highly correlated, we won't gain any extra information by including them all, so we can drop those redundant columns from the data, keeping just weighted price. Additionally, we're using an RNN, and RNNs take in data one time step at a time; because of that we don't need the timestamp field, which is just a list of minutes that increases linearly and won't aid our prediction. We can drop the timestamp column as well.
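Dropping the columns is a single pandas call. The column names below follow the Bitstamp CSV header; the two rows of values are placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "Timestamp": [1, 2], "Open": [4.0, 5.0], "High": [4.1, 5.1],
    "Low": [3.9, 4.9], "Close": [4.0, 5.0], "Volume_(BTC)": [1.0, 2.0],
    "Volume_(Currency)": [4.0, 10.0], "Weighted_Price": [4.0, 5.0],
})

# Drop the redundant price fields and the linearly increasing timestamp,
# keeping the three informative columns.
df = df.drop(columns=["Timestamp", "Open", "High", "Low", "Close"])
print(list(df.columns))  # ['Volume_(BTC)', 'Volume_(Currency)', 'Weighted_Price']
```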

At this point we have the three remaining columns, volume (BTC), volume (USD), and weighted price, and none of them is missing data. Each field is distinct and will help our prediction, which means we're almost done preprocessing, with one exception. The data varies a lot in absolute value and grows a lot over time; for instance, weighted price starts around 4 dollars but ends up thousands of times higher. For input to a neural network model, it helps to normalize the data. To normalize, we compute the mean and standard deviation of each field, then subtract the mean from each value and divide by the standard deviation. (This step greatly aids our accuracy, and can be performed here or when we split the data into training and validation sets.)
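The normalization step, sketched on a toy weighted-price column (the four values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Weighted_Price": [4.0, 6.0, 8.0, 10.0]})

# z-score normalization: subtract each column's mean, divide by its
# standard deviation, so every field is on a comparable scale.
mean = df.mean()
std = df.std()
normalized = (df - mean) / std

# The result has mean ~0 and standard deviation ~1 per column.
print(normalized["Weighted_Price"].mean(), normalized["Weighted_Price"].std())
```

Whichever split you use, the mean and standard deviation should come from the training portion only, so the validation set doesn't leak into the statistics.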

At this point we have the data we want in the form we want, but it's not yet in a shape we can feed to a network model. In fact, we haven't discussed our model at all yet. We'll be building a TensorFlow Keras recurrent neural network model. Recurrent networks can use different types of cells; there are tradeoffs, but generally the best is the LSTM (long short-term memory) cell. If you're unfamiliar with LSTMs or the math behind these concepts, that's OK; we'll only be looking at how to use them here. In Keras we can make a Sequential model, use LSTM() layers for the recurrent steps, and then one Dense layer at the end to make our prediction. That means our model will take a sequence of data as input and produce a single data point as output (the prediction is essentially the label for that sequence).
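One way to sketch that model in Keras; the layer width of 32 is an illustrative choice, not the exact configuration from the repository, and the input shape assumes the 1440-minute windows and three features described below:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1440, 3)),   # 24 hours of minutes, 3 features each
    tf.keras.layers.LSTM(32),          # recurrent layer over the sequence
    tf.keras.layers.Dense(1),          # single value: the predicted price
])
model.compile(optimizer="adam", loss="mse", metrics=["mean_absolute_error"])

# A forward pass on a dummy batch of 2 sequences yields one value per sequence.
out = model(tf.zeros((2, 1440, 3)))
print(out.shape)  # (2, 1)
```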

For Bitcoin we want to take in 24 hours of data and then predict the price an hour later. Since our dataset is in minutes, our input sequence will be a list of 1440 consecutive minutes (the minutes in a day) and our output will be the weighted price at the last minute of the hour following those 1440 minutes. As an example, the first sequence will be minutes [1, 2, 3, … 1439, 1440] and the output minute will be [1500]; then our next sequence slides forward by an hour, [61, 62, 63, … 1499, 1500], and the next label will be [1560].
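That indexing can be sketched in plain NumPy, using the minute numbers themselves as the data so the windows are easy to check. A minimal sketch, not the article's actual pipeline:

```python
import numpy as np

minutes = np.arange(1, 3001)       # minutes 1, 2, ..., 3000
window, stride, horizon = 1440, 60, 60

inputs, labels = [], []
# Slide a 1440-minute window forward an hour at a time; the label is the
# minute 60 steps past the end of each window.
for start in range(0, len(minutes) - window - horizon + 1, stride):
    inputs.append(minutes[start:start + window])
    labels.append(minutes[start + window + horizon - 1])

print(inputs[0][0], inputs[0][-1], labels[0])  # 1 1440 1500
print(inputs[1][0], inputs[1][-1], labels[1])  # 61 1500 1560
```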

Since we're using TensorFlow, we want to turn our dataset from earlier into a tensorflow.data.Dataset shaped as those input/output sequences. We need a custom function that takes our dataset and converts it into the right format. Luckily this function isn't too hard, because we can use TensorFlow's keras.preprocessing.timeseries_dataset_from_array(). This will slice our dataset into many smaller sequences; you can picture it as sliding a window across the dataset and copying what's inside that window every hour. With those sequences, we just need to split the inputs apart from the outputs, and we have a TensorFlow dataset that can be fed to our model for training.
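A sketch of that step (in recent TensorFlow the helper lives under tf.keras.utils rather than keras.preprocessing). Random numbers stand in for the three normalized feature columns, and treating the third column as the weighted price is an assumption for illustration:

```python
import numpy as np
import tensorflow as tf

window, stride, horizon = 1440, 60, 60
n = 5000
data = np.random.rand(n, 3).astype("float32")  # stand-in for the real features
price = data[:, 2]                             # pretend column 3 is weighted price

# targets[i] is the label for the window starting at row i: the price
# `horizon` minutes after that window ends.
targets = price[window + horizon - 1:]

ds = tf.keras.utils.timeseries_dataset_from_array(
    data[:-horizon],            # trim the tail so every window has a label
    targets,
    sequence_length=window,     # 24 hours of minutes per input sequence
    sequence_stride=stride,     # slide the window forward one hour at a time
    batch_size=32,
)

batch_x, batch_y = next(iter(ds))
print(batch_x.shape, batch_y.shape)  # (32, 1440, 3) (32,)
```

Passing the targets array directly like this does the input/output split for us, so each batch comes out as a (sequences, labels) pair ready for model.fit().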

Results:

So we can train a model! How does it perform? We use a mean squared error loss, and within about 3 epochs the training loss drops below 0.01 on the normalized data, pretty good for a very simple model.

Epoch 1/20
909/909 [==============================] - 823s 904ms/step - loss: 0.0153 - mean_absolute_error: 0.0492 - val_loss: 0.3321 - val_mean_absolute_error: 0.4762
Epoch 2/20
909/909 [==============================] - 825s 908ms/step - loss: 0.0093 - mean_absolute_error: 0.0326 - val_loss: 0.1164 - val_mean_absolute_error: 0.2810
Epoch 3/20
909/909 [==============================] - 808s 889ms/step - loss: 0.0015 - mean_absolute_error: 0.0163 - val_loss: 0.0852 - val_mean_absolute_error: 0.2467
Epoch 4/20
909/909 [==============================] - 801s 881ms/step - loss: 8.6400e-04 - mean_absolute_error: 0.0094 - val_loss: 0.1261 - val_mean_absolute_error: 0.3135
Epoch 5/20
909/909 [==============================] - 821s 903ms/step - loss: 0.0018 - mean_absolute_error: 0.0142 - val_loss: 0.1043 - val_mean_absolute_error: 0.2449

The training loss is much lower than the validation loss, so we have plenty of room for improvement. At this point many options are open to us: changing the size of the LSTM layer, or adding a new layer entirely. We could change the loss function or the optimizer to keep the model structure the same but change the training. If we want more data, we could make use of the timestamp feature by transforming it into something more useful, such as time-of-day or day-of-week signals.

We could even change the model itself to use another architecture such as transformers or convolution instead of an RNN with LSTMs.

Conclusion:

We can make a quick, simple recurrent neural network with LSTM cells and use it to make pretty good predictions about the future price of Bitcoin. Ultimately this is more of a learning example: I make no claims about buying, selling, or the risks involved, and I wouldn't try to make money trading Bitcoin based on this. But as something to build on for that purpose, it's absolutely the start of a useful tool!

Ultimately, there are many factors behind the Bitcoin price, like news events (or, more recently, famous tweets about crypto), that have big impacts but can't easily be tracked by a recurrent neural network such as this without major changes. This example may be simplistic, but the goal of using neural networks to forecast the Bitcoin price is definitely one worth pursuing.

To view the code or additional sources, please see the GitHub repository:

Additional Sources:
