A few questions about this paper:

1. If you train an ensemble of SOTA architectures on ImageNet and average their predictions, do you beat STNS?
2. Why not fine-tune the teacher? Why involve the student at all? Why not have the teacher fine-tune on the noisy labels and drop the student completely?
3. Adding noise to the student seems odd to me. Why would this work, other than that the noise effectively anneals the solution? Why not add noise to the gradient instead, or do what I suggest in 2?
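To make the baseline in question 1 concrete, here is a minimal sketch of what "average their predictions" could mean. It assumes the ensemble averages softmax probabilities per example. That is one common choice; averaging logits or majority voting are alternatives. The function names and the logits below are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits: Sequence[Sequence[float]]) -> List[float]:
    """Average the class probabilities of several models for a single example."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def argmax(xs: Sequence[float]) -> int:
    # Index of the largest value (the predicted class).
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical logits from three models on one 3-class example.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2], [1.8, 1.7, 0.3]]
avg = ensemble_predict(logits)
print(argmax(avg), [round(p, 3) for p in avg])
```

Two of the three models individually favour class 0. The averaged distribution can still pick a different class when one model is very confident. This is why probability averaging and majority voting can disagree.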

I see Quoc Le has already investigated noisy gradients: https://arxiv.org/pdf/1511.06807.pdf
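For comparison with question 3, here is a minimal sketch of annealed gradient noise in the style of the linked paper. At step t, Gaussian noise with variance eta / (1 + t)^gamma is added to the gradient. The defaults eta = 0.3 and gamma = 0.55 are my recollection of the values used there, so treat them as assumptions. The toy quadratic objective and the helper names are hypothetical and serve only to illustrate the technique.

```python
import math
import random


def noise_std(step: int, eta: float = 0.3, gamma: float = 0.55) -> float:
    # Annealed schedule (assumed defaults): variance eta / (1 + t)^gamma,
    # so the standard deviation is its square root and shrinks over time.
    return math.sqrt(eta / (1 + step) ** gamma)


def sgd_with_gradient_noise(grad_fn, w0: float, lr: float = 0.1, steps: int = 200,
                            eta: float = 0.3, gamma: float = 0.55, seed: int = 0) -> float:
    """Plain SGD on a scalar parameter, with annealed Gaussian noise added to each gradient."""
    rng = random.Random(seed)
    w = w0
    for t in range(steps):
        g = grad_fn(w)
        # Perturb the gradient itself rather than the inputs or the model.
        g_noisy = g + rng.gauss(0.0, noise_std(t, eta, gamma))
        w -= lr * g_noisy
    return w


# Hypothetical toy objective f(w) = (w - 3)^2, whose gradient is 2 (w - 3).
w_final = sgd_with_gradient_noise(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(round(w_final, 3))
```

The noise is large early on and decays toward zero, which is the annealing effect question 3 alludes to. Here it is applied to the optimizer rather than to the student's inputs and layers.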