[D] Training on the cloud: GCP GPU pricing seems dramatically cheaper, why would you train on AWS or Azure?
As the title says, here is the cost of entry-level GPU instances on the major clouds:
AWS: p2.xlarge, 1 Tesla K80, 4 vCPUs, 61 GB RAM, $0.900/hr
Azure: NC6, 1 Tesla K80, 6 vCPUs, 56 GB RAM, $0.900/hr
GCP: 1 Tesla K80, 6 vCPUs, 52 GB RAM, $0.663/hr
Further, when training CNNs on the K80 I never exceed 4-5 GB of memory usage, and I never need more than a reasonable utilization of 4 vCPUs. Since GCP is the only cloud that gives me the ability to finely tune instance specs, I can decrease the cost for ML applications even further. For example:
GCP: 1 Tesla K80, 4 vCPUs, 5 GB RAM, $0.424/hr
When benchmarking ResNet-50, this cheaper configuration shows no performance decrease compared to the more expensive instance.
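To put the on-demand numbers above side by side, here is a rough sketch of the arithmetic. The prices are the ones quoted in this post; the dictionary keys and the `savings_vs` helper are just names I made up for illustration.

```python
# Hourly on-demand prices quoted in the post (USD per hour).
on_demand = {
    "AWS p2.xlarge": 0.900,
    "Azure NC6": 0.900,
    "GCP default (6 vCPU, 52 GB)": 0.663,
    "GCP custom (4 vCPU, 5 GB)": 0.424,
}


def savings_vs(baseline, price):
    """Percentage saved by paying `price` instead of `baseline`."""
    return (baseline - price) / baseline * 100.0


# Use the AWS/Azure price as the baseline, since both are $0.900/hr.
baseline = on_demand["AWS p2.xlarge"]
savings = {
    name: round(savings_vs(baseline, price), 1)
    for name, price in on_demand.items()
}

for name, pct in savings.items():
    print(f"{name}: ${on_demand[name]:.3f}/hr, {pct}% cheaper than baseline")
```

By this math the default GCP instance comes out about 26% cheaper than AWS or Azure, and the trimmed-down custom instance about 53% cheaper.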
Perhaps spot instance pricing (low-priority for Azure, preemptible for GCP) comes into play, where GCP is in the middle of the pack:
AWS: p2.xlarge, 1 Tesla K80, 4 vCPUs, 61 GB RAM, $0.270/hr
Azure: NC6, 1 Tesla K80, 6 vCPUs, 56 GB RAM, $0.180/hr
GCP: 1 Tesla K80, 6 vCPUs, 52 GB RAM, $0.236/hr
This kind of instance, however, does not work for every use case, so the regular on-demand pricing difference is still significant.
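For the spot comparison, a quick sketch of how deep each provider's discount is relative to its own on-demand price. Again, the prices are the ones listed in this post, and the variable names are only illustrative.

```python
# (on-demand, spot/low-priority/preemptible) prices from the post, USD per hour.
prices = {
    "AWS": (0.900, 0.270),
    "Azure": (0.900, 0.180),
    "GCP": (0.663, 0.236),
}

# Discount of the interruptible price relative to the same provider's
# on-demand price, as a percentage.
discounts = {
    name: round((on_demand - spot) / on_demand * 100.0, 1)
    for name, (on_demand, spot) in prices.items()
}

# Providers ordered by absolute interruptible price, cheapest first.
cheapest_first = sorted(prices, key=lambda name: prices[name][1])

for name in cheapest_first:
    print(f"{name}: ${prices[name][1]:.3f}/hr ({discounts[name]}% off on-demand)")
```

So AWS and Azure discount much more steeply (about 70% and 80%) than GCP (about 64%), which is why GCP ends up in the middle on spot pricing despite being cheapest on demand.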
This all leaves me wondering:
If you train your models on the cloud, which provider do you use?
Can you think of any reasons or use cases that might warrant picking a provider other than GCP?
What is GCP's business model? How can they make money selling for so much less? Is this a loss leader to gain market share?
submitted by /u/Obventio