[D] Why is PyTorch as fast as (and sometimes faster than) TensorFlow?
Since both libraries use cuDNN under the hood, I would expect the individual operations to be similar in speed. However, TensorFlow (in graph mode) compiles a graph, so when you run the actual training loop there is no Python overhead outside of the session.run call. In PyTorch you are in Python a lot because of the dynamic graph, so I would expect that to add some overhead. Not to mention that having a static graph means you can apply graph optimizations like node pruning and operation reordering. Yet in many benchmarks I see online, PyTorch has no problem keeping up with TensorFlow on GPUs.
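To make the comparison concrete, here is a toy sketch of the two execution models I mean (made-up model and data; TF 1.x Session API vs. PyTorch eager):

```python
import numpy as np
import tensorflow as tf   # assumes TF 1.x (graph mode / Session API)
import torch

some_batch = np.random.randn(32, 10).astype(np.float32)

# --- TensorFlow graph mode: build the graph once, then just call session.run ---
x = tf.placeholder(tf.float32, shape=[None, 10])
w = tf.Variable(tf.zeros([10, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        # the whole step executes inside the C++ runtime; Python only dispatches it
        sess.run(train_op, feed_dict={x: some_batch})

# --- PyTorch eager mode: every op in every iteration goes back through Python ---
batch = torch.from_numpy(some_batch)
w_t = torch.zeros(10, 1, requires_grad=True)
opt = torch.optim.SGD([w_t], lr=0.1)
for _ in range(1000):
    loss_t = (batch @ w_t).pow(2).mean()   # the graph is rebuilt on every iteration
    opt.zero_grad()
    loss_t.backward()
    opt.step()
```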
A specific example is the Adam implementations in both libraries:
https://github.com/pytorch/pytorch/blob/master/torch/optim/adam.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/adam.py
PyTorch implements all the ops in Python, as you would expect. For TensorFlow, the _apply_dense / _resource_apply_dense case (which is the common case, AFAIK) has a dedicated C++ kernel. So there, TensorFlow spends no extra time in Python AND it has an optimized implementation in C++. In that case, why isn’t the TensorFlow version straight-up faster?
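For context, the per-parameter work in the PyTorch version boils down to roughly the following plain tensor ops (a paraphrase from memory, not the exact adam.py source; the function name and defaults are mine), where each in-place op is a separate kernel launched from Python:

```python
import torch

def adam_step(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """Rough sketch of one Adam update for a single parameter tensor,
    written as the kind of Python-level tensor ops PyTorch uses."""
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)               # first moment m_t
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment v_t
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = (exp_avg_sq.sqrt() / (bias_correction2 ** 0.5)).add_(eps)
    step_size = lr / bias_correction1
    param.addcdiv_(exp_avg, denom, value=-step_size)              # p -= step_size * m_t / denom

# toy usage
p = torch.randn(10)
g = torch.randn(10)
m, v = torch.zeros(10), torch.zeros(10)
adam_step(p, g, m, v, step=1)
```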
I’ve heard that PyTorch is better optimized at the cuDNN level. Can anyone provide more details about this? What’s preventing TensorFlow from doing the same thing? The only optimization I know of is that PyTorch uses the NCHW layout (which cuDNN is better optimized for), whereas TensorFlow defaults to NHWC.
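To illustrate the layout point (toy shapes, tf.keras / TF 2.x style): PyTorch's conv layers take NCHW by default, while Keras defaults to channels_last (NHWC) and you have to opt into channels_first explicitly:

```python
import torch
import torch.nn as nn
import tensorflow as tf   # assumes tf.keras is available

# PyTorch: Conv2d expects NCHW (batch, channels, height, width)
x_pt = torch.randn(8, 3, 224, 224)
conv_pt = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
y_pt = conv_pt(x_pt)              # shape: (8, 64, 222, 222)

# TensorFlow/Keras: default data_format is 'channels_last' (NHWC)
x_tf = tf.random.normal([8, 224, 224, 3])
conv_tf = tf.keras.layers.Conv2D(filters=64, kernel_size=3)
y_tf = conv_tf(x_tf)              # shape: (8, 222, 222, 64)

# NCHW (what cuDNN prefers) has to be requested explicitly:
conv_tf_nchw = tf.keras.layers.Conv2D(filters=64, kernel_size=3,
                                      data_format='channels_first')
```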
I saw these two discussions but did not see a satisfactory answer:
https://www.reddit.com/r/MachineLearning/comments/8iguaw/d_why_is_tensorflow_so_slow/