

[P] Refit existing Spark ML PipelineModel with new data

Hi!

I'd like to refit an already existing PipelineModel in my project from micro-batch to micro-batch.

I currently use a DecisionTreeRegressor. I load the previously used PipelineModel, set its stages on a Pipeline and use that pipeline to refit with the new data, but as far as I can tell my solution only saves the latest model.
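As far as I know, Spark ML's `DecisionTreeRegressor` cannot continue training an existing tree, so a common workaround is to keep the recent micro-batches and refit from scratch on all of them after each new batch. The sketch below shows that windowed-refit pattern in plain Python; `WindowedRefitter`, `fit_fn` and `max_batches` are hypothetical names, not Spark APIs. In Spark, `fit_fn` would roughly correspond to calling `Pipeline.fit` on the accumulated DataFrame with the original unfitted stages (for example a fresh `DecisionTreeRegressor`). Note that `Pipeline.fit` passes stages that are already Transformers, such as the fitted models inside a loaded `PipelineModel`, through unchanged rather than retraining them.

```python
from collections import deque
from typing import Callable, Deque, Generic, List, Optional, Sequence, TypeVar

Row = TypeVar("Row")      # one training example (hypothetical, e.g. a (features, label) pair)
Model = TypeVar("Model")  # whatever fit_fn returns


class WindowedRefitter(Generic[Row, Model]):
    """Keeps the last `max_batches` micro-batches and refits from scratch on all of them.

    Hypothetical helper for learners without incremental training (such as a
    decision tree): every call to `on_batch` builds a brand-new model.
    """

    def __init__(self, fit_fn: Callable[[List[Row]], Model], max_batches: int) -> None:
        if max_batches < 1:
            raise ValueError("max_batches must be at least 1")
        self.fit_fn = fit_fn
        # deque(maxlen=...) silently drops the oldest batch once the window is full
        self.batches: Deque[Sequence[Row]] = deque(maxlen=max_batches)
        self.model: Optional[Model] = None

    def on_batch(self, batch: Sequence[Row]) -> Model:
        """Add one micro-batch to the window and refit on every row still in it."""
        self.batches.append(batch)
        training_rows = [row for b in self.batches for row in b]
        self.model = self.fit_fn(training_rows)  # fresh fit, no state from older models
        return self.model


# Usage with a toy "model" that just predicts the mean label.
def fit_mean(rows: List[float]) -> float:
    return sum(rows) / len(rows)


refitter = WindowedRefitter(fit_mean, max_batches=2)
for batch in ([1.0, 1.0], [3.0, 3.0], [5.0, 5.0]):
    print(refitter.on_batch(batch))
# prints 1.0, then 2.0, then 4.0 (the first batch has left the window)
```

A larger `max_batches` lets the model remember more history, at the cost of storing more data and making every refit slower.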

I've included my GitHub repo/model for easier understanding.

Is Spark Structured Streaming capable of streaming learning? Is it possible to refit my already fitted model with new data?

Do I have to use the RDD-based Spark Streaming with StreamingLinearRegressionWithSGD?
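For comparison, the RDD-based `StreamingLinearRegressionWithSGD` updates a linear model on every batch of the stream, starting from the weights learned on the previous batches. The sketch below illustrates that warm-start idea in plain Python with least-squares gradient descent and a step size that decays within each batch. It is a simplification, not Spark's implementation; `WarmStartLinearRegressor`, `update`, `step_size` and `num_iterations` are hypothetical names, and the bias is modelled as a constant feature of 1.0.

```python
import math
from typing import List, Sequence, Tuple

# One micro-batch: (features, label) pairs. Hypothetical format for this sketch.
Batch = Sequence[Tuple[Sequence[float], float]]


class WarmStartLinearRegressor:
    """Least-squares linear model whose weights carry over between micro-batches.

    Simplified illustration of streaming linear regression; not Spark's code.
    """

    def __init__(self, num_features: int, step_size: float = 0.1, num_iterations: int = 50) -> None:
        self.weights: List[float] = [0.0] * num_features
        self.step_size = step_size
        self.num_iterations = num_iterations

    def predict(self, x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, batch: Batch) -> None:
        """Run gradient descent on one micro-batch, starting from the current weights."""
        if not batch:
            return  # empty micro-batch: keep the current model unchanged
        n = len(batch)
        for i in range(1, self.num_iterations + 1):
            # Mean gradient of the squared error (w.x - y)^2 / 2 over the batch.
            grad = [0.0] * len(self.weights)
            for x, y in batch:
                err = self.predict(x) - y
                for j, xj in enumerate(x):
                    grad[j] += err * xj / n
            step = self.step_size / math.sqrt(i)  # step size decays within each batch
            self.weights = [w - step * g for w, g in zip(self.weights, grad)]


# Usage: simulate 40 micro-batches from y = 2x + 1 (second feature is the constant bias).
model = WarmStartLinearRegressor(num_features=2)
batch = [([x, 1.0], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
for _ in range(40):
    model.update(batch)
print([round(w, 3) for w in model.weights])  # approaches [2.0, 1.0]
```

Because the weights persist between calls to `update`, each micro-batch refines the existing model instead of replacing it, which is exactly what refitting a decision tree from scratch does not do.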

submitted by /u/Hakuhun