[D] TensorFlow GPU C API performance in C++
I recently wrote a wrapper around the TensorFlow GPU C API for a C++ project I’m working on. Since the library is in C, it can’t throw, and the only STL function I call is std::vector’s push_back. Based on Herb Sutter’s recent talk, I thought, “hey, I might as well make this function noexcept”. Much to my surprise, the function, which previously took 40 ms to run my CNN, now runs in 19 ms. Can anyone help me speculate why the performance difference is that big? (Using Visual Studio 2019, C++17, default optimization options.)
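For anyone who wants to picture the change, here is a minimal sketch of the pattern described above, not the actual wrapper. `fake_inference`, `run_batch`, and `run_batch_noexcept` are hypothetical names, and `fake_inference` is only a stand-in for a non-throwing C call (it does not use the TensorFlow C API). The two wrappers are identical except for the `noexcept` specifier.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C library call. A real C function cannot
// throw C++ exceptions; here that is modeled with noexcept.
float fake_inference(const float* data, std::size_t n) noexcept {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

using Batch = std::vector<std::vector<float>>;

// Version 1: no exception specification. The compiler has to assume
// that push_back may throw (std::bad_alloc) and keep the code needed
// to unwind through this function.
std::vector<float> run_batch(const Batch& inputs) {
    std::vector<float> results;
    for (const auto& input : inputs) {
        results.push_back(fake_inference(input.data(), input.size()));
    }
    return results;
}

// Version 2: same body, marked noexcept. If push_back does throw,
// std::terminate is called instead of the exception propagating.
std::vector<float> run_batch_noexcept(const Batch& inputs) noexcept {
    std::vector<float> results;
    for (const auto& input : inputs) {
        results.push_back(fake_inference(input.data(), input.size()));
    }
    return results;
}

// Compile-time check that the two signatures really differ.
static_assert(!noexcept(run_batch(std::declval<const Batch&>())),
              "run_batch is potentially throwing");
static_assert(noexcept(run_batch_noexcept(std::declval<const Batch&>())),
              "run_batch_noexcept is declared non-throwing");
```

One caveat with this pattern: push_back can still throw std::bad_alloc when it has to reallocate, so the noexcept version trades a catchable exception for program termination in that case. Whether the specifier alone changes timing will depend on the compiler, build configuration, and exception-handling flags, so timing both versions under the same settings is the only way to tell.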
submitted by /u/WalkingAFI