[Project] Training models and running Jupyter Notebooks on AWS Spot Instances (cheaper and simpler than SageMaker)
Hi everyone,
I’ve developed a tool that simplifies training deep learning models on AWS: https://github.com/apls777/spotty. My goal was to make training on AWS GPU instances as simple as training on a local machine. Spotty automatically manages all the necessary AWS resources (AMIs, volumes, snapshots, SSH keys), runs Spot Instances to save up to 70% of the cost, and uses tmux to easily detach remote processes from their SSH sessions.
To train a model (and make it trainable by anyone with a couple of commands), you just need to create a single configuration file that describes the Docker container and the AWS instance parameters (a rough sketch is shown below).
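For a rough idea of what that file contains, here is a minimal sketch of a spotty.yaml. The field names and values are illustrative assumptions based on the description in this post, not the exact schema — check the GitHub README above for the real format:

```yaml
# spotty.yaml — illustrative sketch only; see the Spotty README for the actual schema
project:
  name: my-model                            # project name, used to tag AWS resources

container:
  image: tensorflow/tensorflow:latest-gpu   # any Docker image with your framework
  ports: [8888]                             # expose Jupyter Notebook

instance:
  region: us-east-1
  instanceType: p2.xlarge                   # GPU Spot Instance type
  volumes:
    - name: workspace
      size: 50                              # GB, persisted as an EBS volume/snapshot

scripts:
  jupyter: |
    jupyter notebook --allow-root --ip 0.0.0.0
```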
Then the workflow is super simple (the full command sequence is sketched after this list):
- Use the “spotty start” command to start your container on a cheap AWS Spot Instance. Your local project will be uploaded to the instance and available inside the container.
- Once the instance is up and running, use the “spotty ssh” command to connect to the container, or start Jupyter Notebook with the “spotty run jupyter” command (a custom script defined in the configuration file).
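Put together, and assuming the config above defines a “jupyter” script, a session looks like this:

```sh
spotty start        # launch the Spot Instance and sync the local project into the container
spotty ssh          # attach to the container over SSH (inside a tmux session)
spotty run jupyter  # run the custom "jupyter" script from the config file
```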
Here is an article on how to train a model using Spotty with a real-life example: https://towardsdatascience.com/how-to-train-deep-learning-models-on-aws-spot-instances-using-spotty-8d9e0543d365.
I really hope you find this tool useful, and I’d be happy to hear any feedback.