[Discussion] Building scalable / reproducible ML pipelines
To the more experienced ML professionals in the community – I want to hear about what you use to build scalable ML pipelines at your work. I’ve been building models for research purposes for a while now. However, I’m totally in the dark about the other side of things, namely how to engineer and deploy data/ML pipelines that are scalable and produce reproducible results (whatever that means in this context).
I’ve looked at scikit-learn pipelines, but they seem a bit clunky when handling pandas DataFrames (although workarounds do seem to exist). Another sentiment I hear is that they don’t scale well to large datasets.
Care to part with your wisdom? Thanks!