[D] Keras – Optimizing GPU usage
So I’m running a parameter search on a binary classification problem I’m working on, using the Talos library under Keras/TensorFlow. (Really useful library by the way, though there are a few bugs and the documentation isn’t as thorough as I’d like.)
The training set is about 55k examples with around 600 features each. The PC I’m running it on has an i9-9900K at 5 GHz, 32 GB RAM, an M.2 drive, and an 8 GB RTX 2080.
Running nvidia-smi (or any other monitoring tool) shows that I am using the GPU for the processing, but utilisation floats between 5% and 20%.
Clearly I have a bottleneck somewhere.
My CPU is showing 10–12% usage on the Python process, which I initially assumed meant 100% usage on one of the eight cores; however, CPUID/HWiNFO show that all the cores are being utilised, and none above ~50%. All temperatures are in the 40s °C thanks to water cooling.
The M.2 drive is showing essentially zero usage, and the Python process is only using 2 GB of RAM, with the system having 13 GB remaining unused. The GPU RAM usage is ~7.3 GB out of 8 GB, but I’ve read that TensorFlow just pre-allocates it “in case” – so I suspect the training process is not actually using all of the available GPU RAM (however, I’m not sure how to double-check this).
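For what it’s worth, one way to check this is to turn off TensorFlow’s up-front reservation so nvidia-smi reports what the model actually allocates. A minimal sketch, assuming TF 2.x (and, for the in-process query, TF 2.5+ with a GPU visible):

```python
import tensorflow as tf

# By default TensorFlow reserves nearly all GPU memory at startup, so the
# ~7.3 GB nvidia-smi reports is the reservation, not what training uses.
# Memory growth makes allocation on-demand instead; must be set before
# any GPU op runs.
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# With growth enabled, nvidia-smi now tracks real usage; you can also
# query the allocator from inside the process (TF 2.5+, GPU required):
if gpus:
    info = tf.config.experimental.get_memory_info("GPU:0")
    print(info["current"], info["peak"])  # bytes currently / peak allocated
```

On a CPU-only machine `gpus` is simply an empty list, so the snippet is safe to run anywhere.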
If the bottleneck is not on the GPU (cores or RAM), CPU, RAM, or disk, where is it? Any help would be appreciated!
submitted by /u/Zman420