Time-series forecasting quickstart: Bitcoin price movement with Conv1D

Aug 3, 2019

Introduction

Imagine you have two blank sheets of A4 paper, placed directly on top of one another, then folded in half horizontally. If you wanted to describe how to undo this transformation, it would simply be “unfold over height/2”. Imagine you then folded the stack in half again, this time vertically. The resulting un-transformation would be

unfold over width/2
unfold over height/2

If you gave these instructions to a computer it could separate any two pieces of paper that were folded in that manner.

Now imagine you crumple up the two pieces of paper into a ball.

What are the instructions to separate these two sheets? This is the question that machine learning attempts to solve. In a mathematical sense, machine learning is all about finding the mathematical transformations that represent your obfuscated inputs as meaningful, ordered outputs. It does this by decomposing complex transformations (e.g. crumpling a paper ball) into chains of simple ones (e.g. folding paper over a line).

This exposition is an extension of a paper ball analogy in Deep Learning with Python by François Chollet, a highly recommended, intuitive introduction to machine learning and deep learning.

Bitcoin

Financial forecasting is considered one of the most complex problems to solve in the whole of data science, financial math, and machine learning alike. The unfathomable number of factors makes it extremely difficult to develop a strategy that beats the market in the long run. Even by traditional market standards, Bitcoin is considered to be highly volatile and unpredictable.

Previous research

Financial forecasting with neural networks is still a fresh idea, with the majority of the research being performed in the last couple of years. After reading dozens of blogs and papers, these are the key takeaways:

- Most implementations fall into the f(t + 1) = f(t) trap, where the model learns that “tomorrow’s price will be pretty much today’s price”. Such a model has no predictive value, yet it still scores well on error metrics because consecutive prices are so close together. It is frustrating to see this again and again, because the authors put so much work into pre-processing and building complex models, only to be defeated by such a basic pitfall.
- Convolutional networks, although designed for image processing, can be used to forecast time series and might even outperform RNN/LSTM architectures.
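
To make the first takeaway concrete, here is a minimal sketch that scores a "model" which simply repeats the last price. It uses a synthetic random walk (made-up data, not the Bitcoin dataset), and `r_squared` is a hand-written helper introduced only for this illustration.

```python
import random

def r_squared(actual, predicted):
    """Coefficient of determination: 1 is perfect, <= 0 is no better than the mean."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Synthetic random-walk "prices" (an assumption for illustration only)
random.seed(0)
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] + random.gauss(0, 1))

# The trap: forecast tomorrow's price as today's price
actual = prices[1:]
naive = prices[:-1]
price_r2 = r_squared(actual, naive)

# The same forecast expressed as daily changes: "no change, every day"
actual_delta = [b - a for a, b in zip(prices[:-1], prices[1:])]
naive_delta = [0.0] * len(actual_delta)
delta_r2 = r_squared(actual_delta, naive_delta)

print("R^2 on prices:", round(price_r2, 3))
print("R^2 on deltas:", round(delta_r2, 3))
```

On raw prices the do-nothing forecast scores close to 1, while on day-to-day changes the same forecast scores at or below zero, so the shortcut stops paying off once the data is differenced.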

Plan

If we can beat the f(t + 1) = f(t) bug we will be doing better than most models. I think the best way to do this is to difference the price data, because we aren’t actually trading the price of Bitcoin but its price movements. You don’t decide to buy based on the current price, but on where you think it will move in the future.

Features:
- Daily open, high, low, close data
- Volume as a percentage of market cap
- Bitcoin vs altcoin dominance
- r/bitcoin subreddit subscribers

These features will all be differenced to calculate the delta between days.

Real data:

Price differenced data:


import pandas as pd

def delta_time_series(data):
    # day-over-day change by row position; the first row has no previous day, so drop it
    # (data[1:] - data[:-1] on a DataFrame aligns on the index and would not do this)
    return data.astype(float).diff().iloc[1:]

columns = ["close", "open", "high", "low", "vol_market", "dom_btc", "dom_eth", "dom_alt", "r_btc_subs"]
new_df = pd.DataFrame(columns=columns)
for index, row in data.iterrows():
    # close price, our label
    close = row["close"]

    # difference between close and OHL
    open_ = row["open"] - row["close"]
    high = row["high"] - row["close"]
    low = row["low"] - row["close"]

    # volume as a percentage of market cap
    size_volume = row["volume"] / row["marketcap"]

    # dominance of bitcoin compared to eth and alts
    dominance_row = dom_data.loc[dom_data["Date"] == row["date"], :].squeeze()
    dom_btc = dominance_row["Bitcoin"] / 100
    dom_eth = dominance_row["Ethereum"] / 100
    dom_alt = dominance_row["Others"] / 100

    # subscribers to the bitcoin subreddit
    r_btc_subs = r_bitcoin.loc[r_bitcoin["Date"] == row["date"], :].squeeze()["Subs"]
    new_df.loc[index] = [close, open_, high, low, size_volume, dom_btc, dom_eth, dom_alt, r_btc_subs]

new_df = delta_time_series(new_df)
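
Because the label is now a daily change rather than a price, a prediction has to be added back to the previous close before it can be read as a price. A minimal sketch with made-up closing prices follows; `deltas` and `one_step_prices` are hypothetical helpers, not part of the code above.

```python
def deltas(prices):
    """Day-over-day changes; one element shorter than the input."""
    return [b - a for a, b in zip(prices[:-1], prices[1:])]

def one_step_prices(actual_prices, predicted_deltas):
    """Turn one-day-ahead delta forecasts back into price forecasts.

    predicted_deltas[i] is the forecast change from actual_prices[i]
    to actual_prices[i + 1].
    """
    return [p + d for p, d in zip(actual_prices[:-1], predicted_deltas)]

closes = [100.0, 103.0, 101.0, 106.0]  # made-up closing prices
true_deltas = deltas(closes)

# Feeding the true deltas back in must reproduce the original closes
rebuilt = one_step_prices(closes, true_deltas)
print(true_deltas)
print(rebuilt)
```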

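The data was then scaled to values between -1 and 1 using MinMaxScaler, and the fitted scaler was kept so that predictions can be mapped back to the original units. Scaling matters because it puts every feature in the same small range, which helps keep the model's weights small and training stable.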
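
The arithmetic behind that scaling is short enough to write out. The sketch below is a hand-rolled, single-column version on made-up numbers, with hypothetical helper names; scikit-learn's `MinMaxScaler(feature_range=(-1, 1))` and its `inverse_transform` do the same job per column.

```python
def fit_min_max(column):
    """Remember the column's range so the scaling can be undone later."""
    return min(column), max(column)

def scale(column, lo, hi):
    # Map [lo, hi] linearly onto [-1, 1]
    return [2 * (x - lo) / (hi - lo) - 1 for x in column]

def unscale(scaled, lo, hi):
    # Inverse of scale(): map [-1, 1] back onto [lo, hi]
    return [(s + 1) / 2 * (hi - lo) + lo for s in scaled]

daily_change = [-120.0, 40.0, 0.0, 200.0]  # made-up deltas
lo, hi = fit_min_max(daily_change)
scaled = scale(daily_change, lo, hi)
restored = unscale(scaled, lo, hi)

print(scaled)
print(restored)
```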

To frame time-series forecasting as a supervised learning problem, each window of consecutive observations (the lookback) is paired with the value that follows it, which becomes its label. If we have the time series

[10, 20, 30, 40, 50, 60]

and we want to determine the next item from the last 3, we would train the neural network on the data:

[10, 20, 30] -> [40]
[20, 30, 40] -> [50]
[30, 40, 50] -> [60]
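
Written by hand, the windowing looks roughly like this. `make_windows` is a hypothetical helper, shown only to make the pairing explicit.

```python
def make_windows(series, lookback):
    """Pair every run of `lookback` values with the value that follows it."""
    samples = []
    for start in range(len(series) - lookback):
        window = series[start:start + lookback]
        label = series[start + lookback]
        samples.append((window, label))
    return samples

pairs = make_windows([10, 20, 30, 40, 50, 60], lookback=3)
for window, label in pairs:
    print(window, "->", [label])
```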

The easiest way to transform a table of data into this format is to use Keras’ TimeseriesGenerator, which does all the boilerplate work for you! We will split the data into 70% train and 30% test.

from keras.preprocessing.sequence import TimeseriesGenerator

# chronological split: first 70% for training, last 30% for testing
train_size = int(len(dataset_y) * 0.7)
train_x, train_y = dataset_x[:train_size], dataset_y[:train_size]
test_x, test_y = dataset_x[train_size:], dataset_y[train_size:]

# look at a week's worth of data and predict the next day
generator = TimeseriesGenerator(train_x, train_y, length=7, shuffle=False, batch_size=1)
validation_gen = TimeseriesGenerator(test_x, test_y, length=7, shuffle=False, batch_size=1)

Deep-learning models

I loved the idea of using a 1D CNN for time-series forecasting; it is a very elegant and intuitive technique. Nils Ackermann wrote an amazing introduction to 1D CNNs, which I highly recommend. Instead of the usual 2D CNN with 2 spatial dimensions (width and height) and 3 depth channels (for RGB), we will use a 1D CNN with 1 dimension (time) and 9 depth channels, one for each feature.
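
To see what a 1D convolution does to a 7-day, 9-feature window, here is a plain-Python sketch of a "same"-padded convolution with ReLU. The input values and kernel weights are arbitrary placeholders rather than trained values, and `conv1d_same` is a hypothetical helper written for this illustration, not the Keras implementation.

```python
def conv1d_same(window, kernels, biases):
    """window: [time][channel]; kernels: [filter][tap][channel]."""
    steps, channels = len(window), len(window[0])
    taps = len(kernels[0])
    pad = taps // 2  # zero-pad both ends so the output keeps `steps` rows
    zero = [0.0] * channels
    padded = [zero] * pad + window + [zero] * pad
    output = []
    for t in range(steps):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for k in range(taps):
                for c in range(channels):
                    total += kernel[k][c] * padded[t + k][c]
            row.append(max(total, 0.0))  # ReLU
        output.append(row)
    return output

days, features, filters, kernel_size = 7, 9, 8, 3
window = [[0.1 * (d + f) for f in range(features)] for d in range(days)]
kernels = [[[0.01 * (i + 1)] * features for _ in range(kernel_size)]
           for i in range(filters)]
biases = [0.0] * filters

feature_map = conv1d_same(window, kernels, biases)     # 7 time steps x 8 filters
flattened = [v for row in feature_map for v in row]    # what a Flatten layer would pass on
weights_in_layer = filters * (kernel_size * features + 1)

print(len(feature_map), len(feature_map[0]), len(flattened), weights_in_layer)
```

Each filter looks at 3 consecutive days across all 9 features at once, so one filter has 3 × 9 weights plus a bias.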

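from keras import optimizers
from keras.layers import Conv1D, Dense, Flatten
from keras.models import Sequential

model = Sequential()
# 8 filters slide over the 7-day window; each filter sees all 9 features at once
model.add(Conv1D(filters=8, kernel_size=3, activation="relu", padding="same", input_shape=(7, 9)))
model.add(Flatten())
model.add(Dense(8, activation="relu"))
# tanh keeps the output in [-1, 1], the same range as the scaled labels
model.add(Dense(1, activation="tanh"))
model.compile(loss="mse", optimizer=optimizers.Adam())
# newer Keras versions accept generators in model.fit directly
model.fit_generator(generator, epochs=25, validation_data=validation_gen, verbose=1)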

Results

Prediction in red, actual in green

This is a pretty good result! It predicted large price movements during the bull run in late 2017, using price and market data and r/bitcoin subscriber count alone. Although it doesn’t appear to suffer from the deadly f(t+1) = f(t) bug, it is probably overfitting and would not turn a profit in real-life trading (yet).
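
One simple way to put a number on "predicted large price movements" is directional accuracy: the share of days on which the predicted change has the same sign as the actual change. The sketch below uses made-up deltas, not the model's real output, and `directional_accuracy` is a hypothetical helper.

```python
def directional_accuracy(actual_deltas, predicted_deltas):
    """Fraction of days where the forecast got the direction (up vs. not up) right."""
    hits = sum(
        1
        for a, p in zip(actual_deltas, predicted_deltas)
        if (a > 0) == (p > 0)  # a change of exactly zero counts as "not up"
    )
    return hits / len(actual_deltas)

actual_deltas = [5.0, -3.0, 2.0, -1.0]      # made-up actual daily changes
predicted_deltas = [4.0, 1.0, 0.5, -2.0]    # made-up forecasts

score = directional_accuracy(actual_deltas, predicted_deltas)
print(score)
```

A score near 0.5 on held-out data would suggest the model is not picking up direction at all, however good the plot looks.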

Conclusions

Convolutional neural networks are not limited to image classification, and have been shown to be effective at multivariate time series forecasting.
I believe improvements to this model will come from more sentiment signals, such as tweet volume, Google Trends, and news article processing (maybe with NLP to determine bullish/bearish attitudes).
Hype aside, deep learning is an incredibly useful **tool** that can assist in finding order in otherwise overwhelming chaos.