[D] I created a forecasting model to forecast cryptocurrency prices using sentiment data, and this is the result.

Written by torontoai on August 17, 2019. Posted in Reddit MachineLearning.

How we gather the data, provided by Bitcurate, bitcurate.com

I don’t have sentiment data related to the stock market, so I will use cryptocurrency data instead: BTC/USDT from Binance.

The close data came from CCXT, https://github.com/ccxt/ccxt, an open source cryptocurrency aggregator.
We gather text by streaming Twitter and by crawling Reddit and a hardcoded list of cryptocurrency Telegram groups, and we store everything in Elasticsearch as a single index. We trained a BERT Multilingual model released by Google, cut down to 1/4 of its layers (200MB-ish, originally 700MB-ish), on as much sentiment data as we could find on the internet, leveraging sentiment across multiple languages, e.g. English, Korean, Japanese. Actually, it is very hard to find negative sentiment related to bitcoin / btc in large volume.
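
For the price side, here is a minimal sketch of pulling close prices through CCXT's unified fetch_ohlcv call. The function name fetch_close_prices and the default timeframe and limit are my own assumptions for illustration, not the settings used for the results in this post.

```python
def fetch_close_prices(exchange, symbol="BTC/USDT", timeframe="5m", limit=500):
    """Return (timestamp_ms, close) pairs from an exchange's OHLCV candles.

    `exchange` is any object with a CCXT-style fetch_ohlcv method,
    for example ccxt.binance(). The defaults here are illustrative only.
    """
    # CCXT's unified fetch_ohlcv returns rows shaped like
    # [timestamp_ms, open, high, low, close, volume].
    candles = exchange.fetch_ohlcv(symbol, timeframe=timeframe, limit=limit)
    return [(row[0], row[4]) for row in candles]


# Usage, assuming ccxt is installed and the network is reachable:
#   import ccxt
#   prices = fetch_close_prices(ccxt.binance())
```

Passing the exchange in as an argument keeps the function easy to try against a stub object before hitting the real API.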

Then we use elasticsearch-dsl, https://elasticsearch-dsl.readthedocs.io/, to query:

s = s.filter('query_string', default_field='text', query='bitcoin OR btc')

We only query text that contains bitcoin or btc.

We have two questions here when talking about consensus. What happens

to the future price if we assume future sentiment is really positive, near 1.0? E.g., China suddenly wants to adopt cryptocurrency, which could cause huge demand.
to the future price if we assume future sentiment is really negative, with the negative score near 1.0? E.g., hackers suddenly break into Binance or another exchange, or any news wrecks the market with negative sentiment.

So, we use deep learning to simulate this for us! I use a CNN-Seq2Seq architecture this time: unlike an RNN, it does not need to carry memory over from the previous step, and it is fast to train.
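
To show why a convolutional model needs no carried-over memory, here is a tiny pure-Python sketch of a causal 1D convolution, the kind of building block convolutional sequence models are typically made of. This is only an illustration under my own assumptions; it is not the CNN-Seq2Seq model itself, and causal_conv1d is a made-up name.

```python
def causal_conv1d(series, kernel):
    """Causal 1D convolution: output[t] depends only on series[t-k+1 .. t]."""
    k = len(kernel)
    # Left-pad with zeros so the output has the same length as the input
    # and no output position can see future values.
    padded = [0.0] * (k - 1) + list(series)
    # Every position reads a fixed window, so there is no state passed
    # from one step to the next and all positions could run in parallel.
    return [
        sum(w * x for w, x in zip(kernel, padded[t:t + k]))
        for t in range(len(series))
    ]


# A 2-tap averaging kernel applied to a short series.
smoothed = causal_conv1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

An RNN would have to walk the series one step at a time, which is where the speed difference in training comes from.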

We pulled the last 100 hours of data and aggregated it every 20 minutes, then split the dataset into train and test. The test set is the last 10 hours (30 datapoints, 3 * 10), and the earlier remainder is used for training.
We initiate the model and train it for 200 epochs. The model is very sensitive to learning_rate; I found 1e-3 works well. I never tried hyperparameter search here.
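
The aggregation and split described above can be sketched roughly like this. Averaging inside each bucket is my assumption (the aggregation function is not stated here), and the names aggregate and train_test_split are made up for the example.

```python
from statistics import mean

BUCKET_SECONDS = 20 * 60  # 20-minute buckets, as described in the post


def aggregate(records, bucket_seconds=BUCKET_SECONDS):
    """Average (timestamp_s, value) records into fixed-width time buckets."""
    buckets = {}
    for ts, value in records:
        # Floor each timestamp to the start of its bucket.
        buckets.setdefault(ts - ts % bucket_seconds, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def train_test_split(points, test_size=30):
    """Keep the last `test_size` points for testing, the rest for training."""
    return points[:-test_size], points[-test_size:]


# Synthetic example: one record every 5 minutes for 100 hours.
records = [(i * 300, float(i)) for i in range(1200)]
points = aggregate(records)
train, test = train_test_split(points)
```

With 100 hours at 3 datapoints per hour this gives 300 datapoints in total, so 270 go to training and the last 30 to testing. The split is by time order, never shuffled, so the test set is always the most recent window.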

The model learned that if positive and negative sentiments increase, both will increase the price. That is why using either a positive consensus or a negative consensus caused the price to go up.
Volatility of the price is higher if negative sentiment is higher, but it is still positive volatility.
Momentum of the price is higher if negative sentiment is higher, but it is still positive momentum.
Even though the predicted trends are far from the actual test trend, for me it is quite fascinating, because I can run the model N times to get different variances, and from there I can calculate VaR, potential volatilities and momentums, trading ratios, etc. Well, if forecasted trends follow the actual test trend really closely, do not believe it too much; no model is able to simulate a stochastic trend that depends on a lot of real-world parameters.
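
As a rough sketch of the "simulate N times, then calculate VaR" idea: collect the final return of each simulated path and take a simple empirical quantile. The seeded random walk toy_forecast is a hypothetical stand-in for the trained model, and the 95% confidence level, step size and run count are assumptions for illustration.

```python
import random


def value_at_risk(returns, confidence=0.95):
    """Empirical VaR: the loss exceeded in roughly (1 - confidence) of runs."""
    ordered = sorted(returns)
    # round() guards against floating-point error in (1 - confidence) * n.
    index = min(int(round((1.0 - confidence) * len(ordered))), len(ordered) - 1)
    return -ordered[index]


def simulate_final_returns(simulate_path, last_price, n_runs=1000):
    """Run a stochastic forecaster n_runs times; collect each run's total return."""
    return [simulate_path()[-1] / last_price - 1.0 for _ in range(n_runs)]


# Hypothetical stand-in for the trained model: a seeded random walk
# over 30 steps (the size of the test window in the post).
rng = random.Random(0)


def toy_forecast(last_price=10000.0, steps=30):
    path, price = [], last_price
    for _ in range(steps):
        price *= 1.0 + rng.gauss(0.0, 0.005)
        path.append(price)
    return path


returns = simulate_final_returns(toy_forecast, 10000.0)
print("95% VaR as a fraction of position:", value_at_risk(returns))
```

Swapping toy_forecast for a function that samples a path from the real model would leave the rest unchanged.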

Any comments or feedback?