Written by torontoai on December 27, 2019. Posted in Reddit MachineLearning.

The model is Tacotron 2, based on this repo.

I got it running with PyTorch DDP (DistributedDataParallel), and it works, but the gap between single-GPU and distributed training seems too large to me.

The single-GPU loss is much better and more stable, and 8 GPUs give only a 2x speedup in training time at 8x the cost.
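To put numbers on that, here is a rough sketch of the scaling arithmetic. It assumes the 2x figure is wall-clock speedup and that cost is billed per GPU per unit of time; the function names `scaling_efficiency` and `relative_cost` are made up for illustration and are not part of PyTorch.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup / num_gpus


def relative_cost(speedup: float, num_gpus: int) -> float:
    """Total GPU-time for the same job, relative to a single-GPU run.

    Assumes cost is proportional to (number of GPUs) x (wall-clock time).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return num_gpus / speedup


# Numbers from the post: 8 GPUs, roughly 2x faster than one GPU.
efficiency = scaling_efficiency(speedup=2.0, num_gpus=8)
cost = relative_cost(speedup=2.0, num_gpus=8)

print(f"scaling efficiency: {efficiency:.0%}")
print(f"GPU-time relative to single GPU: {cost:.1f}x")
```

Under those assumptions the hourly bill is 8x, but since the job finishes in half the time, the total GPU-time for the same amount of training is about 4x, and each GPU delivers about a quarter of its single-GPU throughput.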

Am I missing something obvious?

Could it be because of batch norm? I tried synchronized batch norm (SyncBatchNorm), but it does not really make a difference.