[R] Benchmarking OpenAI’s GPT-2 on GPUs vs. CPUs (realtime inference)
I ran an experiment comparing the latency and cost of serving realtime GPT-2 inference on AWS with GPUs versus CPUs. I also added some context on how latency maps to real-world product requirements:
https://towardsdatascience.com/how-much-difference-do-gpus-make-in-model-serving-c40b885ac096?
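For anyone wanting to reproduce the latency side of a comparison like this, here is a minimal timing-harness sketch. It is not the article's actual benchmark: the `fake_gpt2_infer` function is a hypothetical stand-in you would replace with a real GPT-2 forward pass or a request to your deployed endpoint.

```python
import statistics
import time


def benchmark_latency(infer, n_requests=100):
    """Time `infer` over repeated single requests; report p50/p99 latency in ms."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        infer()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[int(0.99 * (len(latencies) - 1))],
    }


# Hypothetical stand-in for a real inference call (e.g. model(input_ids) or an
# HTTP request to a serving endpoint); swap in your own model here.
def fake_gpt2_infer():
    time.sleep(0.001)


if __name__ == "__main__":
    print(benchmark_latency(fake_gpt2_infer, n_requests=50))
```

Percentiles matter more than the mean for realtime serving, since tail latency (p99) is usually what breaks a product's responsiveness budget.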
submitted by /u/calebkaiser