[D] Has anyone here looked into evaluating multiple models on a single GPU in parallel?
I’m interested in running neuroevolution algorithms on a single GPU. The idea is to combine the forward passes of many models into a single matrix multiplication per layer. To be clear, I’m not talking about weight sharing between models. Every model would have its own exclusive set of weights and, in many cases, different inputs as well.
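One way to get "one matmul per layer" without weight sharing is to give every layer's weights a leading population axis. This is a minimal sketch under assumptions, not a claim about how Uber's code does it. Each layer then stores a `(P, d_in, d_out)` stack, one matrix per model, and each model gets its own `(B, d_in)` slice of the input. A batched matmul evaluates all `P` models' layers at once, and no model ever sees another's weights. It's written in NumPy so it runs anywhere; in PyTorch the same shapes work with `torch.bmm` or `torch.matmul` on GPU tensors. The population size, layer widths, and tanh activation are arbitrary choices, and `population_forward` / `single_forward` are hypothetical helper names.

```python
import numpy as np

def population_forward(x, weights, biases):
    """Evaluate P independent MLPs with one batched matmul per layer.

    x:       (P, B, d_in)   each model gets its own batch of B inputs
    weights: list, layer l has shape (P, d_in_l, d_out_l), one matrix per model
    biases:  list, layer l has shape (P, 1, d_out_l), one bias row per model
    """
    h = x
    last = len(weights) - 1
    for l, (w, b) in enumerate(zip(weights, biases)):
        # matmul treats the leading axis as a batch of independent products:
        # (P, B, d_in) @ (P, d_in, d_out) -> (P, B, d_out); models never mix
        h = np.matmul(h, w) + b
        if l < last:
            h = np.tanh(h)  # hidden activation (arbitrary choice for the sketch)
    return h

def single_forward(x_p, weights_p, biases_p):
    """Reference: run one model on its own, to check the batched version."""
    h = x_p
    last = len(weights_p) - 1
    for l, (w, b) in enumerate(zip(weights_p, biases_p)):
        h = h @ w + b
        if l < last:
            h = np.tanh(h)
    return h

# Hypothetical sizes: population of 4, batch of 3, a 5 -> 8 -> 2 network
rng = np.random.default_rng(0)
P, B, sizes = 4, 3, [5, 8, 2]
weights = [rng.standard_normal((P, n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((P, 1, n_out)) for n_out in sizes[1:]]
x = rng.standard_normal((P, B, sizes[0]))

out = population_forward(x, weights, biases)
print(out.shape)  # (4, 3, 2)
```

For neuroevolution, mutating model `p` just means perturbing `weights[l][p]` in place, so the whole population stays in a few contiguous tensors. If all models share the same observation, passing a plain `(B, d_in)` array also works, because `np.matmul` (and `torch.matmul`, but not `torch.bmm`) broadcasts it across the population axis.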
Uber actually published a blog post about doing exactly this here. But they don’t explain much about how they implemented it, and the source code they provide is quite difficult to follow (probably in large part because I’m much more experienced with PyTorch than TensorFlow).
Has anyone here tried to implement something like this, or does anyone know of other relevant projects? Or could someone who understands how Uber achieved this step through, conceptually, the matrix math involved in composing these conjoined networks?
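In matrix terms, one way to write a single layer of the conjoined network is the following. This is a general sketch of the idea, not a description of Uber's implementation. The notation is assumed: $P$ models, model $p$ has layer weights $W_p \in \mathbb{R}^{d_{in} \times d_{out}}$ and bias $b_p \in \mathbb{R}^{d_{out}}$, its own inputs $X_p \in \mathbb{R}^{B \times d_{in}}$, and $\sigma$ is applied elementwise.

```latex
% Per-model layer (what the batched matmul computes, one product per model)
Y_p = \sigma\left(X_p W_p + \mathbf{1}_B\, b_p^{\top}\right), \qquad p = 1, \dots, P

% The same P products written as ONE ordinary matrix product:
% (B x P d_in) times (P d_in x P d_out) gives (B x P d_out)
\begin{bmatrix} X_1 & X_2 & \cdots & X_P \end{bmatrix}
\begin{bmatrix}
W_1    & 0      & \cdots & 0      \\
0      & W_2    & \cdots & 0      \\
\vdots &        & \ddots & \vdots \\
0      & 0      & \cdots & W_P
\end{bmatrix}
=
\begin{bmatrix} X_1 W_1 & X_2 W_2 & \cdots & X_P W_P \end{bmatrix}

% Special case: every model sees the same input (X_p = X for all p),
% so the block-diagonal matrix collapses to a dense horizontal concatenation
X \begin{bmatrix} W_1 & W_2 & \cdots & W_P \end{bmatrix}
= \begin{bmatrix} X W_1 & X W_2 & \cdots & X W_P \end{bmatrix}
```

Because the output comes out in the same side-by-side layout as the input, the next layer can apply the same construction. That means the "conjoined" population is literally one wide network whose weight matrices are block-diagonal. Materializing the block-diagonal matrix wastes memory and FLOPs on the $P^2 - P$ zero blocks, so in practice a batched matmul over a population axis computes only the $P$ diagonal blocks. The shared-input form is a genuinely dense single matmul and is handy for the first layer when every model gets the same observation.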