
[D] Why is PyTorch as fast as (and sometimes faster than) TensorFlow?

Since both libraries use cuDNN under the hood, I would expect the individual operations to be similar in speed. However, TensorFlow (in graph mode) compiles a graph, so when you run the actual training loop there is no Python overhead outside of that single call. In PyTorch, execution keeps returning to Python because of the dynamic graph, so I would expect that to add some overhead. On top of that, a static graph enables graph optimizations such as node pruning and operation reordering. Yet in many benchmarks I see online, PyTorch has no trouble keeping up with TensorFlow on GPUs.
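The overhead expectation above can be made concrete with a back-of-the-envelope model. This is only a sketch: the function name `python_overhead_fraction` and every number in it are hypothetical, not measurements, and the model pessimistically assumes that Python dispatch and GPU work never overlap.

```python
def python_overhead_fraction(num_ops: int, dispatch_us: float, device_us: float) -> float:
    """Share of one training step spent on Python-side dispatch.

    Pessimistic model (an assumption): host-side dispatch and GPU work run
    back to back with no overlap, so step time = host time + device time.
    """
    host_us = num_ops * dispatch_us
    return host_us / (host_us + device_us)


if __name__ == "__main__":
    # Hypothetical inputs, not measurements: 200 ops per step, 10 us of
    # Python/dispatch cost per op, and a range of total GPU time per step.
    for device_us in (1_000.0, 10_000.0, 100_000.0):
        frac = python_overhead_fraction(200, 10.0, device_us)
        print(f"GPU time {device_us:>9.0f} us/step -> Python share {frac:.1%}")
```

Even in this pessimistic model, the share of time lost to Python shrinks as the GPU work per step grows, so the expected penalty depends heavily on how large each individual op is.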

A specific example is the Adam implementation in each library:

PyTorch's Adam is written in Python as a sequence of individual tensor ops, as you would expect. For TensorFlow, in the {_resource}_apply_dense case (which is the common case, AFAIK), there is a dedicated C++ implementation. So here, TensorFlow does not spend extra time in Python AND it has an optimized implementation in C++. In this case, why isn't the TensorFlow version straight up faster?
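For reference, here is the textbook Adam update (Kingma & Ba) that both implementations compute, sketched on plain Python lists. This is not either library's actual code: the name `adam_step` is hypothetical, and the real implementations fold the bias correction into the step size and may differ slightly in where epsilon enters. The point is the op count. Written as separate tensor ops, each line of the loop body becomes roughly one elementwise pass over the whole tensor, whereas a dedicated C++ kernel can do all of it in a single op.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update on lists; returns new (param, m, v). t starts at 1."""
    bias_corr1 = 1 - beta1 ** t
    bias_corr2 = 1 - beta2 ** t
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(param, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / bias_corr1                   # bias-corrected moments
        v_hat = v_i / bias_corr2
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_p.append(p)
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


params, m, v = adam_step([1.0, 1.0], [1.0, -2.0], [0.0, 0.0], [0.0, 0.0], t=1)
print(params)
```

On the first step (t=1) the bias-corrected ratio is just the sign of the gradient, so each parameter moves by about `lr` against its gradient regardless of the gradient's magnitude.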

I've heard that PyTorch is better optimized at the cuDNN level. Can anyone provide more details about this? What's preventing TensorFlow from doing the same thing? The only optimization I know of is that PyTorch uses the NCHW format (which is better optimized for cuDNN), whereas TensorFlow uses NHWC by default.
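To make the NCHW vs NHWC distinction concrete: both store the same 4-D tensor in one flat buffer and differ only in which dimension is innermost. In NCHW, neighbouring elements along the width of a single channel are adjacent in memory; in NHWC, all channels of a single pixel are adjacent. The helper names below are hypothetical, and the sketch shows only the indexing and the transpose a framework has to perform when a kernel expects the other layout, not how cuDNN handles it internally.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in an NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element in an NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def nhwc_to_nchw(buf, N, C, H, W):
    """Copy an NHWC buffer into a new NCHW buffer (a full transpose)."""
    out = [None] * (N * C * H * W)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    out[offset_nchw(n, c, h, w, C, H, W)] = buf[offset_nhwc(n, c, h, w, C, H, W)]
    return out


# One image, 2 channels, 1x2 pixels: channel 0 holds 0, 1 and channel 1 holds 10, 11.
nhwc = [0, 10, 1, 11]
print(nhwc_to_nchw(nhwc, N=1, C=2, H=1, W=2))
```

Because converting between the layouts touches every element, a mismatch between the framework's default layout and the layout a kernel prefers costs an extra pass over the data.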

I saw these two discussions but did not see a satisfactory answer:

submitted by /u/student_at_uw