Learn About Our Meetup

5000+ Members



Join our meetup, learn, connect, share, and get to know your Toronto AI community. 



Browse the latest deep learning, AI, and machine learning postings from Indeed for the GTA.



If you're looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[D] Help! How much does your data change in serious ML projects?

Too bad I can’t create polls on Reddit…

I was talking with a data scientist friend about versioning data in ML projects (I know there are a lot of great solutions and this post is not meant to focus on any of them).

What he said really challenged my assumption that data is an integral part of a data science project's source code.

He claimed that in most data science projects the data and artifacts (intermediate stages of data processing, not including models) don't change that much. The source data might change, but it's usually just one file, so you can get away with not versioning it. Intermediate stages should always be deterministically produced by code, so you only need to version the code that creates each stage, not the result itself. The one exception is a painful or resource-intensive processing step whose output you wouldn't want to recompute.
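That exception (expensive stages you'd rather not recompute) is often handled by caching the artifact under a key derived from both the source data and the processing code, so the stage is only re-run when either one actually changes. A minimal sketch of that idea, with hypothetical file names and a caller-supplied `build` function (this is an illustration of the pattern, not any particular tool's API):

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def cache_key(source: Path, code: Path) -> str:
    """Key an intermediate artifact by the source data AND the code that produced it."""
    h = hashlib.sha256()
    h.update(file_hash(source).encode())
    h.update(file_hash(code).encode())
    return h.hexdigest()[:16]

def get_or_build(source: Path, code: Path, build, cache_dir: Path = Path("cache")) -> Path:
    """Return the cached artifact, rebuilding only if data or code changed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    out = cache_dir / f"artifact-{cache_key(source, code)}.csv"
    if not out.exists():
        build(source, out)  # expensive step runs only on a cache miss
    return out
```

Tools like DVC generalize this by tracking content hashes of data and pipeline stages in small metafiles that live alongside the code in git.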

I was wondering, from people here with experience in real-world projects: how volatile is your data in practice? Do you find it hard to manage the data and artifacts?

I’m confused and your input would be greatly appreciated.

submitted by /u/Train_Smart

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.