[D] Testing different linking strategies (hard links vs symlinks vs reflinks) for managing ML projects
Linking data or model files lets you rearrange any number of files quickly without copying them, which saves disk space. Data science teams often use symlinks for this reason when working with large datasets.
The tutorial starts with a strategy of copying files into place, moves on to hard links and symbolic links, and ends with a newer type of link, the reflink, which uses the copy-on-write capabilities available in modern file systems on Linux and macOS: Reflinks vs symlinks vs hard links, and how they can help machine learning projects
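To make the four strategies concrete, here is a minimal Python sketch. It is not code from the tutorial. The helper name `place_file` and the file names are made up for illustration. The reflink branch assumes Linux on a common architecture such as x86-64 or arm64, where the `FICLONE` ioctl number is `0x40049409`, and a file system that supports cloning (for example Btrfs or XFS with reflink enabled). Anywhere else that call should fail and the sketch falls back to a plain copy.

```python
import fcntl  # POSIX only; this sketch is not expected to run on Windows
import os
import shutil
import tempfile

# Assumption: value of FICLONE from linux/fs.h on x86-64 / arm64.
FICLONE = 0x40049409


def place_file(src, dst, strategy):
    """Hypothetical helper: make dst expose the content of src."""
    if strategy == "copy":
        # Independent file with its own data blocks.
        shutil.copyfile(src, dst)
    elif strategy == "hardlink":
        # Second directory entry for the same inode (same file system only).
        os.link(src, dst)
    elif strategy == "symlink":
        # Small file that stores the path of the original.
        os.symlink(os.path.abspath(src), dst)
    elif strategy == "reflink":
        # New inode that shares data blocks until one side is modified.
        with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
            try:
                fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            except OSError:
                # Cloning is unsupported here, so fall back to a normal copy.
                shutil.copyfileobj(fsrc, fdst)
    else:
        raise ValueError(f"unknown strategy: {strategy}")


def demo():
    """Place one small file four ways and report what each result is."""
    report = {}
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "model.bin")
        with open(src, "wb") as f:
            f.write(b"weights")
        for strategy in ("copy", "hardlink", "symlink", "reflink"):
            dst = os.path.join(workdir, f"model.{strategy}")
            place_file(src, dst, strategy)
            report[strategy] = {
                "is_symlink": os.path.islink(dst),
                "same_inode": os.path.samefile(src, dst),
            }
    return report


if __name__ == "__main__":
    for name, info in demo().items():
        print(name, info)
```

One thing to keep in mind when choosing between them: writing through a hard link or a symlink changes the original file, while a copy or a reflink can be edited without touching it.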
submitted by /u/cmstrump