[R] Why Git and Git-LFS is not enough to solve the Machine Learning Reproducibility crisis – TowardsDataScience
Keeping the data under version control with Git-LFS is a big improvement, but unversioned data files are not the entire problem.
The determining factors for the results of training a model or other activities include the following:
The result of training a model clearly depends on a variety of conditions. With so many variables in play it is hard to be precise, but the general problem is a lack of what’s now called Configuration Management: recording every input that shaped a result, so that the same result can be produced again.
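To make the configuration-management point concrete, here is a minimal sketch of treating a training run's inputs as one versioned unit. It is an illustration only, not how DVC or Git-LFS work internally. The function name `run_fingerprint`, the file name `train.csv`, the config keys, and the revision string are all hypothetical. The idea is that if the data, the configuration, or the code changes, the fingerprint changes, so two runs with the same fingerprint started from the same recorded inputs.

```python
import hashlib
import json


def run_fingerprint(data_blobs, config, code_revision):
    """Return a SHA-256 hex digest summarising the inputs of one training run.

    data_blobs:    dict mapping a (hypothetical) file name to its raw bytes
    config:        dict of settings; must be JSON-serialisable
    code_revision: string identifying the code, e.g. a Git commit hash
    """
    digest = hashlib.sha256()
    # Visit data files in sorted name order so dict ordering is irrelevant.
    for name in sorted(data_blobs):
        digest.update(name.encode("utf-8"))
        digest.update(hashlib.sha256(data_blobs[name]).digest())
    # Canonical JSON (sorted keys) so key order does not alter the result.
    digest.update(json.dumps(config, sort_keys=True).encode("utf-8"))
    digest.update(code_revision.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical inputs: same data and code, but a different learning rate.
data = {"train.csv": b"a,b\n1,2\n"}
base = run_fingerprint(data, {"lr": 0.01, "epochs": 5}, "abc123")
changed = run_fingerprint(data, {"lr": 0.02, "epochs": 5}, "abc123")
print(base == changed)  # False: the configuration differs
```

A fingerprint like this only detects that something changed. Reproducing the run still requires storing the data, configuration, and code themselves, which is the part that tools in this space take on.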
DVC takes on and solves a larger slice of the machine learning reproducibility problem than Git-LFS or several other potential solutions do: