[P] Git hook for large files: because who wants to have their 100TB data file committed to Git?
The usual disclaimer – this is not my project, but it is simple and awesome so I wanted to share.
Check out this Git pre-commit hook for large files.
What it does:
Most people working on serious ML projects have probably run into this: you accidentally run git add . and,
after committing (or worse, pushing to the remote), realize that you added your ginormous model or data files to the repository.
If you’re a Git expert, you can definitely fix it. But why fix something you can avoid?
It’s super easy to install (only Linux/Mac are currently supported):
By default, it limits files to a maximum size of 5MB, but this can be configured with:
GIT_FILE_SIZE_LIMIT=42000000 git commit -m "This commit is allowed file sizes up to 42MB"
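For anyone curious how a check like this can work, here is a minimal sketch of a size-limiting pre-commit hook. To be clear, this is not the linked project's code: the function names (file_size, is_too_large, check_staged_files) are made up for illustration, the 5000000-byte default is my assumption for what "5MB" means, and it measures the working-tree copy of each file rather than the staged blob, which is a simplification. The only thing taken from the post is the GIT_FILE_SIZE_LIMIT variable.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a size-limiting pre-commit hook (not the linked project's code).

# Limit in bytes. The 5000000 default is an assumption; override it with GIT_FILE_SIZE_LIMIT.
limit="${GIT_FILE_SIZE_LIMIT:-5000000}"

# Print the size of a file in bytes (wc -c works on both Linux and macOS).
file_size() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Succeed (exit status 0) when the file is strictly larger than the given limit.
is_too_large() {
    [ "$(file_size "$1")" -gt "$2" ]
}

# Check every staged file; report offenders on stderr and return non-zero if any exist.
check_staged_files() {
    local status=0 path
    while IFS= read -r -d '' path; do
        # Skip paths that are not regular files in the working tree.
        [ -f "$path" ] || continue
        if is_too_large "$path" "$limit"; then
            echo "error: $path exceeds the limit of $limit bytes" >&2
            status=1
        fi
    done < <(git diff --cached --name-only --diff-filter=ACM -z)
    return "$status"
}

# Run the check only when executed directly (as a hook), not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
    check_staged_files
fi
```

Git runs an executable file at .git/hooks/pre-commit before each commit and aborts the commit if it exits with a non-zero status, which is what makes a check like this effective. Setting the variable inline, as in the command above, raises the limit for that one commit only.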
The hook itself is based on this Gist – which deserves credit as well.
Thanks to both developers! What other hooks do you use as part of your ML work?