
[D] Need advice with PyTorch distributed setup: worse than single GPU

The model is Tacotron 2, based on this repo.

I got it working with PyTorch DDP, but the gap between single-GPU and distributed training seems too large to me.
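For context, roughly what my setup looks like (a simplified sketch, not the repo's actual code; `setup_ddp` is just an illustrative helper):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def setup_ddp(model, dataset, batch_size):
    # Assumes launch via `torchrun --nproc_per_node=8 train.py`,
    # which sets LOCAL_RANK and the rendezvous env vars for us.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = model.cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler shards the dataset so each rank trains on a
    # distinct subset; batch_size here is per GPU, so the effective
    # global batch is batch_size * world_size.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return model, loader
```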

[Plot: loss curves, single GPU vs. 8 GPUs]

The single-GPU loss is much better and more stable, and 8 GPUs give only about a 2x speedup at 8x the cost.

Am I missing something obvious?

Maybe it's because of batch norm? I tried sync batch norm, but it doesn't really make a difference.
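For the sync batch norm attempt, I used the standard conversion (sketch; applied before wrapping the model in DDP):

```python
import torch

# Replace every BatchNorm layer in the model with SyncBatchNorm so
# batch statistics are computed across all ranks rather than per GPU.
# This must happen before the model is wrapped in DDP.
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
```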

submitted by /u/hadaev