[Discussion] Building scalable / reproducible ML pipelines

Written by torontoai on April 20, 2019. Posted in Reddit MachineLearning.

To the more experienced ML professionals in the community: I want to hear what you use to build scalable ML pipelines at work. I’ve been building models for research purposes for a while now. However, I’m in the dark about the engineering side, namely how to build and deploy data/ML pipelines that scale and give reproducible results (whatever that may mean in this context).

I’ve looked at scikit-learn pipelines, but they seem a bit clunky when handling pandas DataFrames (although workarounds do seem to exist). I also hear that they don’t scale well to large datasets.
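
For context, one commonly cited workaround for the DataFrame issue is scikit-learn's `ColumnTransformer`, which selects columns by name inside a `Pipeline`. The sketch below is only an illustration of that idea, not a recommendation from the thread: the toy data and the column names (`age`, `city`, `churned`) are made up, and the choice of `LogisticRegression` is arbitrary.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the column names are invented for illustration.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63],
    "city": ["Toronto", "Ottawa", "Toronto", "Montreal", "Ottawa", "Toronto"],
    "churned": [0, 0, 1, 1, 0, 1],
})

numeric_cols = ["age"]
categorical_cols = ["city"]

# ColumnTransformer picks DataFrame columns by name, so the raw
# DataFrame can be passed straight into the pipeline.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Preprocessing and model live in one object, so the same steps are
# applied at fit time and at predict time.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(random_state=0)),
])

X = df[numeric_cols + categorical_cols]
y = df["churned"]

model.fit(X, y)
predictions = model.predict(X)
```

Keeping the preprocessing inside the pipeline object helps with reproducibility on a single machine, since the fitted transformers travel with the model. It does not address the scaling concern, because everything here still runs in memory.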

Care to part with your wisdom? Thanks!

submitted by /u/G_Balena