
[D] Tensorflow GPU memory management (TF_FORCE_GPU_ALLOW_GROWTH)

So this is more of an exploratory question. I am deploying models with a TF Serving Docker image that has the TF_FORCE_GPU_ALLOW_GROWTH flag set. The models are a small Fashion-MNIST model, ResNet (99 MB), and Inception v3 (92 MB). Because of the flag, the TF model server initially occupies only ~300 MB; then, on sequential requests to the models, usage increases as follows (according to nvidia-smi):

~300 MB | after Inception request: ~4306 MB | after ResNet request: ~8402 MB
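For context, my understanding is that TF_FORCE_GPU_ALLOW_GROWTH=true is the environment-variable counterpart of enabling memory growth in the Python API: the BFC allocator reserves GPU memory in chunks as requests come in instead of grabbing nearly all of it at startup. A minimal sketch of that setting in plain TensorFlow (not TF Serving; the model path is just illustrative):

    import tensorflow as tf

    # Ask the allocator to start small and grow on demand -- the in-process
    # equivalent of setting TF_FORCE_GPU_ALLOW_GROWTH=true for TF Serving.
    for gpu in tf.config.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)

    # Illustrative model path: GPU memory is only claimed once the first
    # inference actually runs, not when the process starts.
    model = tf.keras.models.load_model('/models/resnet')
    print(model.predict(tf.random.uniform((1, 224, 224, 3))).shape)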

If I send a request to ResNet first, the GPU usage does not increase any further (even when I add more models):

~300 MB | after ResNet request: ~7888 MB | after Inception request: ~7888 MB

Why does the GPU usage not increase after adding more models? Are models flushed from memory when new models are loaded for inference? How can I accurately estimate how many similarly sized models can be loaded on one GPU-enabled machine without trial and error? Is there a pattern to what fraction of GPU memory is progressively allocated?
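One approach I have seen for bounding this without pure trial and error is to give the process a hard memory cap and check whether all the models still serve under it; in the Python API the equivalent knob is a virtual-device memory limit (the 4096 MB figure below is just an illustrative number, not something from my setup):

    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        # Carve out a logical GPU with a hard 4096 MB ceiling; the BFC
        # allocator will error out instead of silently growing past it,
        # which makes "how many models fit" testable deterministically.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

If I understand correctly, TF Serving exposes a similar cap through its --per_process_gpu_memory_fraction flag, but I have not verified how that interacts with TF_FORCE_GPU_ALLOW_GROWTH.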

Note: This is run on an EC2 instance with 11441 MiB of available GPU memory [Tesla K80]. When I try to run the same on a machine with lower capacity [Quadro P2000, 5059 MB], I face a similar situation where there is no increase in memory usage. However, I also get the following in the logs:

2019-12-11 05:10:54.727985: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.25GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-12-11 05:10:54.736610: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.26GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
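Part of what makes this hard to estimate is that nvidia-smi only shows what the allocator has reserved from the driver, not what the loaded models are actually using. Newer TensorFlow releases (2.5+, as far as I know; not the version I am serving with) can report the allocator's own numbers, which is roughly what I would like to see from the model server:

    import tensorflow as tf

    # Bytes the BFC allocator has actually handed out, versus the larger
    # reservation that nvidia-smi attributes to the whole process.
    info = tf.config.experimental.get_memory_info('GPU:0')
    print('current bytes in use:', info['current'])
    print('peak bytes in use:   ', info['peak'])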

submitted by /u/annoyed_panda