[P]: Best Practice for Cache/Processed Data Management
In ML research, we often have tons of cache files to read from disk. As a project progresses, I often end up with lots of cache files and can't remember how they were created. My current way of keeping track of cache and preprocessed files is to write down when each one was created and why.
I am wondering if anyone has a way to automate this process. I've heard of tools like DVC (https://github.com/iterative/dvc), but it seems too complicated for this.
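For illustration, here is a minimal, hypothetical sketch of automating the manual log described above. Every cache write also writes a JSON sidecar file that records when the cache was created, why (a free-text reason), the parameters used, the command line, and the git commit if one is available. The helper names `save_cache` and `list_caches` and the `.meta.json` sidecar convention are assumptions made up for this example. They are not part of DVC or any other tool, and `params` is assumed to be JSON-serializable.

```python
import json
import pickle
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def _git_commit():
    """Return the current git commit hash, or None if git/repo is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None


def save_cache(obj, path, reason, params=None):
    """Pickle `obj` to `path` and write a JSON sidecar describing how it was made.

    Hypothetical helper: the sidecar lives at `<path>.meta.json`.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f)
    meta = {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "reason": reason,            # why this cache exists
        "params": params or {},      # must be JSON-serializable
        "command": sys.argv,         # the command that produced it
        "git_commit": _git_commit(), # code version, if in a git repo
    }
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(meta, indent=2))
    return sidecar


def list_caches(root):
    """Yield (cache_path, metadata) for every cache with a sidecar under `root`."""
    for sidecar in sorted(Path(root).rglob("*.meta.json")):
        cache = sidecar.with_name(sidecar.name[: -len(".meta.json")])
        yield cache, json.loads(sidecar.read_text())
```

Usage might look like `save_cache(features, "cache/features.pkl", reason="tokenized train split", params={"max_len": 128})`, followed later by iterating over `list_caches("cache")` to see what each file is and where it came from. Unlike DVC, this does not version or deduplicate the cached data; it only keeps a provenance record next to each file.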