[P] Refit existing Spark ML PipelineModel with new data
Hi!
I’d like to refit an already existing PipelineModel in my project from micro-batch to micro-batch.
I currently use a DecisionTreeRegressor. I load the previously used PipelineModel, set its stages on a Pipeline, and use that pipeline to refit with the new data, but as far as I understand, my solution only saves the latest model.
I’ve linked my GitHub repo/model to make this easier to understand.
Is Spark Structured Streaming capable of streaming learning? Is it possible to refit my already fitted model with new data?
Do I have to use the RDD-based Spark Streaming with StreamingLinearRegressionWithSGD?
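To make "streaming learning" concrete, here is a minimal sketch in plain Python (not Spark code) of the behaviour the question is after: the model state is carried from one micro-batch to the next and only nudged by the new data, rather than being rebuilt from scratch. It assumes a one-feature linear model trained with SGD, which is the general idea behind an SGD-based streaming regressor. The names `sgd_update` and `batches` and the toy data are hypothetical, and none of this says anything about how a DecisionTreeRegressor is fitted.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]  # (x, y) pairs in one micro-batch


def sgd_update(weight: float, bias: float, batch: Batch,
               step_size: float = 0.05, epochs: int = 50) -> Tuple[float, float]:
    """Continue training y ~ weight * x + bias on a single micro-batch."""
    for _ in range(epochs):
        for x, y in batch:
            error = (weight * x + bias) - y
            # Gradient step on the squared error of this one sample.
            weight -= step_size * error * x
            bias -= step_size * error
    return weight, bias


# Hypothetical micro-batches, both drawn from y = 2x + 1.
batches: List[Batch] = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]

weight, bias = 0.0, 0.0  # state that survives from micro-batch to micro-batch
for batch in batches:
    # Start from the previous parameters instead of from scratch.
    weight, bias = sgd_update(weight, bias, batch)

print(f"weight={weight:.3f} bias={bias:.3f}")
```

The important part is the loop at the bottom: each call starts from the parameters left by the previous micro-batch, so earlier data still influences the result. Fitting a fresh estimator on each micro-batch alone would discard that history.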
submitted by /u/Hakuhun