[D] Methods to handle streaming/real-time data storage, wrangling and prediction?

Say there is data being streamed into Python (from Kafka, Kinesis, etc.) every 10 seconds that I would like to wrangle and predict on. What is the best way to store this streaming data in order to do this? In the past, I have used online learning methods for this. I am curious how to do it with a batch learning method.

I was thinking we could iteratively populate a DataFrame with this data until the stream stops, preprocess the entire DataFrame, predict, and then clear/delete the DataFrame. One caveat I can think of is that preprocessing and predicting might take longer than 10 seconds, in which case new data would arrive before the previous batch is finished.
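To make the idea concrete, here is a minimal sketch of the buffer, process, clear loop, not a definitive implementation. Everything in it is an assumption for illustration: the helper names (`drain`, `preprocess`, `predict`, `process_batch`), the record shape `{"value": ...}`, and the toy wrangling and prediction steps are all hypothetical, and the Kafka/Kinesis consumer is replaced by manual `put` calls. Rows are kept in a plain list so the sketch needs only the standard library; `pandas.DataFrame(rows)` would turn a drained batch into a DataFrame.

```python
import queue
import time
from statistics import mean


def drain(buffer: queue.Queue) -> list:
    """Remove and return every record currently waiting in the buffer."""
    rows = []
    while True:
        try:
            rows.append(buffer.get_nowait())
        except queue.Empty:
            return rows


def preprocess(rows: list) -> list:
    # Hypothetical wrangling step: center each value on the batch mean.
    values = [row["value"] for row in rows]
    center = mean(values)
    return [v - center for v in values]


def predict(features: list) -> list:
    # Hypothetical stand-in for a fitted batch model's predict().
    return [1 if f > 0 else 0 for f in features]


def process_batch(buffer: queue.Queue) -> tuple:
    """Drain the buffer, then preprocess and predict on that batch.

    Returns (predictions, elapsed processing time in seconds).
    """
    start = time.perf_counter()
    rows = drain(buffer)  # draining also "clears" the buffer
    preds = predict(preprocess(rows)) if rows else []
    return preds, time.perf_counter() - start


# Simulate three records arriving from the stream (assumed record shape).
buffer = queue.Queue()
for value in (1.0, 2.0, 6.0):
    buffer.put({"value": value})

predictions, elapsed = process_batch(buffer)
print(predictions, f"{elapsed:.6f}s")
```

Because `queue.Queue` is thread-safe, a separate consumer thread could keep calling `put` while a batch is being processed. Records that arrive during a slow batch would then wait in the buffer and be picked up by the next `drain`, and the returned elapsed time can be compared against the 10-second interval to see whether processing is falling behind.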

What are some ways to handle this?

submitted by /u/Straighteight424