[R] Principled Machine Learning for Efficient Collaboration
Machine learning projects are often harder than they should be. We’re just running software, and the result is a trained ML model. But three months later, will you remember how to rerun the software? The datasets may have changed, and you might be unable to replicate the results. A lack of software tools for managing machine learning datasets is the culprit, and it impedes efforts to share data efficiently with colleagues.
In our search for tools to efficiently manage machine learning projects, these principles are important:
- Transparency: Inspecting every part of the ML project
- Auditability: Inspecting all intermediate results, and the final result
- Reproducibility: Ability to robustly rerun the software with its associated datasets from any stage in the project
- Scalability: Ability to support ML projects involving any number of people, and to work on multiple projects at a time
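
To make the reproducibility point concrete, here is a minimal sketch of one way to detect that a dataset has changed since the last run: record a checksum for every data file, then compare manifests later. This is an illustration only, not how MLflow or DVC are used in the article; the function names (`file_digest`, `build_manifest`, `changed_files`) are hypothetical and only the Python standard library is assumed.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (as a relative path) to its digest."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files that were added, removed, or modified between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Saving the manifest next to the trained model (for example as JSON in the same commit) gives a later run something to compare against before retraining. Dedicated tools handle this more thoroughly, including storing the data itself.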
The article explains how to apply these principles in ML projects, using open source tools such as MLflow and DVC: Principled Machine Learning – DEV Community