[D] How to feed variable length text data with a temporal structure?

I am working on a project that aims to predict stock returns from tweet data. I have been playing with an online dataset from here: https://github.com/yumoxu/stocknet-dataset. My aim is to feed in, for example, one day of tweets for 30 stocks (the number of tweets varies from day to day) and output a vector of return predictions for those 30 stocks. Since each tweet has a different length, I was thinking of using an RNN to feed in the words sequentially. It seems to me that the model would then capture the “temporal structure” within the text of each tweet, but I am not sure how to capture the time-series aspect of the data, i.e. how the tweets evolve from one day to the next.
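To make the two levels of variable length concrete (words within a tweet, tweets within a day), here is a minimal sketch in plain Python. It is not the stocknet loader: the toy tweets, the tickers, the whitespace tokenisation, and the helpers `build_vocab` and `pad_tweets` are all hypothetical and only illustrate the shape of the data.

```python
from typing import Dict, List, Tuple

PAD = 0  # hypothetical token id reserved for padding


def build_vocab(tweets: List[str]) -> Dict[str, int]:
    """Assign an integer id to every whitespace-separated word."""
    vocab = {"<pad>": PAD}
    for tweet in tweets:
        for word in tweet.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def pad_tweets(tweets: List[str], vocab: Dict[str, int]) -> Tuple[List[List[int]], List[int]]:
    """Turn tweets into equal-length id rows and keep the true lengths."""
    ids = [[vocab[w] for w in t.lower().split()] for t in tweets]
    lengths = [len(row) for row in ids]
    max_len = max(lengths)
    padded = [row + [PAD] * (max_len - len(row)) for row in ids]
    return padded, lengths


# Toy data, made up for illustration: one dict per day, stock -> tweets.
days = [
    {"AAPL": ["apple beats earnings", "buy apple"], "XOM": ["oil falls"]},
    {"AAPL": ["apple dips"], "XOM": ["oil rebounds sharply today", "xom upgrade"]},
]

all_tweets = [t for day in days for tweets in day.values() for t in tweets]
vocab = build_vocab(all_tweets)

# Outer list = time axis (days); inner padded rows = word axis (per tweet).
batches = [
    {stock: pad_tweets(tweets, vocab) for stock, tweets in day.items()}
    for day in days
]
```

The padded rows plus their true lengths are the kind of input a word-level RNN usually takes, while the outer list over days is the axis along which the time series runs.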

My questions can be summarised as follows:

(1) How can I incorporate both the time-series structure and the textual temporal structure of my data?

(2) Or am I modelling the problem wrongly?

Edit: I have heard of encoder-decoder structures, of sophisticated models like BERT, and of using <EOS> tags to tell the model where each sentence (tweet) ends. I think that might be something I should look into, but it seemed a little complicated when I was reading the BERT paper. I am rather an amateur in this area, so I would prefer something a little more beginner-friendly to start with. Thanks!
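For reference, the end-of-tweet marker idea can be pictured with a small sketch like the one below. It assumes whitespace tokenisation and a made-up marker string `<eos>`; the helper `join_with_eos` and the example tweets are hypothetical, and this is not how BERT itself tokenises text.

```python
from typing import List

EOS = "<eos>"  # hypothetical end-of-tweet marker


def join_with_eos(tweets: List[str]) -> List[str]:
    """Concatenate one day's tweets into a single token list,
    appending the marker after each tweet so boundaries stay visible."""
    tokens: List[str] = []
    for tweet in tweets:
        tokens.extend(tweet.lower().split())
        tokens.append(EOS)
    return tokens


# Made-up tweets for a single stock on a single day.
day_tweets = ["apple beats earnings", "buy apple"]
sequence = join_with_eos(day_tweets)
```

With this layout a day becomes one sequence, and the marker is the only signal of where one tweet stops and the next begins.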

Any ideas or references will be greatly appreciated. Cheers!

submitted by /u/blueclover