[R] Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis – TowardsDataScience

Keeping data under version control with Git-LFS is a big improvement, but unversioned data files are not the entire problem.

The determining factors for the results of training a model or other activities include the following:

  • Training data — the image database or whatever data source is used in training the model
  • The scripts used in training the model
  • The libraries used by the training scripts
  • The scripts used in processing data
  • The libraries or other tools used in processing data
  • The operating system and CPU/GPU hardware
  • Production system code
  • Libraries used by production system code

The result of training a model therefore depends on many conditions at once. With so many variables it is hard to be precise, but the general problem is a lack of what is now called Configuration Management: nothing records which versions of the data, scripts, libraries, and platform produced a given result.
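To make the list above concrete, here is a minimal sketch of recording several of those factors in a single manifest. It is not how Git-LFS or DVC work internally; the function names (`build_manifest`, `manifest_fingerprint`) and the manifest layout are hypothetical, chosen only for illustration. It covers data files, scripts, library versions, and basic platform details. GPU hardware and production system code are left out and would need their own entries.

```python
import hashlib
import json
import platform
from importlib import metadata
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def library_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def build_manifest(data_files, script_files, libraries):
    """Collect content hashes, library versions, and platform details.

    Hypothetical layout: one section per determining factor.
    """
    return {
        "data": {str(p): sha256_of_file(p) for p in data_files},
        "scripts": {str(p): sha256_of_file(p) for p in script_files},
        "libraries": library_versions(libraries),
        "platform": {
            "os": platform.platform(),
            "machine": platform.machine(),
            "python": platform.python_version(),
        },
    }


def manifest_fingerprint(manifest):
    """Hash the manifest itself so two runs can be compared with one string."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Two training runs with the same fingerprint started from the same recorded inputs. If the fingerprints differ, comparing the two manifests section by section shows which factor changed.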

DVC takes on and solves a larger slice of the machine learning reproducibility problem than Git-LFS or several other potential solutions do:

DVC workflow – code & data

Full article: Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis

submitted by /u/thumbsdrivesmecrazy