[R] Machine Learning Reproducibility Challenges and DVC
When ML models need to be regularly updated in production, a host of challenges emerges. Paramount among ML reproducibility concerns are the following:
- Effectively versioning your models
- Capturing the exact steps in your data munging and feature engineering pipelines
- Managing dependencies (including your data and infrastructure)
- Configuration tracking
No single tool covers all of these. Organizations typically use a mix of Git, Makefiles, ad hoc scripts, and reference files for reproducibility. The following overview explains how DVC fits into this mix and offers a cleaner solution aimed specifically at data science challenges: First Impressions of Data Science Version Control (DVC) (full tutorial)
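To make the versioning and configuration-tracking concerns a bit more concrete, here is a minimal sketch of one common idea: identify a run by the content hashes of its input data plus the parameters it used. This is not DVC's implementation or API. The function names (`file_digest`, `build_manifest`, `manifest_id`) and the manifest layout are hypothetical, chosen only for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_files, params: dict) -> dict:
    """Record a content hash per input file plus the run's parameters.

    The layout here is a hypothetical one, not a DVC file format.
    """
    paths = sorted(str(p) for p in data_files)
    return {
        "data": {p: file_digest(Path(p)) for p in paths},
        "params": params,
    }


def manifest_id(manifest: dict) -> str:
    """Derive one stable identifier from the manifest.

    Serializing with sort_keys=True keeps the ID independent of key order.
    """
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Under these assumptions, two runs get the same `manifest_id` only if their input files are byte-identical and their parameters match, so a small manifest like this could be committed to Git while the large data files live elsewhere.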
submitted by /u/thumbsdrivesmecrazy