Learn About Our Meetup

5000+ Members



Join our meetup, learn, connect, share, and get to know your Toronto AI community. 



Browse the latest deep learning, AI, and machine learning job postings from Indeed for the GTA.



Looking to sponsor space, be a speaker, or volunteer? Feel free to give us a shout.

[P] Spark ML – Saving PySpark custom transformers in a pipeline model

I created a Spark pipeline where the first stage is a custom transformer that simply filters the dataset on a particular value of a column:

```python
from pyspark import keyword_only
from pyspark.ml import Transformer

class getPOST(Transformer):
    @keyword_only
    def __init__(self, inputCol=None, outputCol=None):
        super(getPOST, self).__init__()
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, inputCol=None, outputCol=None):
        kwargs = self._input_kwargs
        return self._set(**kwargs)

    def _transform(self, dataset):
        # Keep only rows whose 'method' column equals 'POST'
        return dataset.filter(dataset.method == 'POST')
```

The model works great and performance is good, but when I try to save the model, I'm met with:

ValueError: ('Pipeline write will fail on this pipeline because stage %s of type %s is not MLWritable', 'getPOST_23cb579f79db', <class '__main__.getPOST'>) 

I've been reading up, and I don't think a transformer is the most applicable fit in this case, since I'm not appending any columns to the dataset and not touching any values or parameters that need to be declared, such as in the example I found in this link. I can't find other examples that let you filter out data in Spark ML pipelines.

This is the last stage of a project I’m working on and I’d greatly appreciate any push in the right direction. Thank you for taking the time to read this!

submitted by /u/Octosaurus

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.