[D] Gradient norm tracking

Are there any best practices for tracking gradient norms during training? Surprisingly, I haven't been able to find much reliable information on it, apart from the classic Glorot paper.

My current approach is to track the 2-norm of the raw weight gradients. However, I don't have any practical intuition about which values should worry me. Tracking the actual weight updates (e.g., as adjusted by Adam) would make much more sense, but I haven't seen anyone doing so.
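For what it's worth, here is a minimal sketch of both diagnostics mentioned above: the global 2-norm of the raw gradients, and the norm of the actual update relative to the weight norm. All names and values are hypothetical, NumPy stands in for a real framework, and a plain SGD step is used for illustration (with Adam you would log the optimizer's actual step instead):

```python
import numpy as np

def global_norm(arrays):
    # L2 norm over a list of arrays, treated as one flat vector
    return float(np.sqrt(sum(np.sum(a.astype(np.float64) ** 2) for a in arrays)))

# Hypothetical example: two parameter tensors and their raw gradients.
params = [np.ones((4, 3)), np.ones(3)]
grads = [np.full((4, 3), 0.1), np.full(3, 0.1)]

grad_norm = global_norm(grads)      # the raw-gradient 2-norm described above
weight_norm = global_norm(params)

# Arguably more interpretable per step: the size of the actual update
# relative to the size of the weights it is applied to.
lr = 1e-3
updates = [lr * g for g in grads]   # plain SGD update, for illustration only
update_ratio = global_norm(updates) / weight_norm
```

A common rule of thumb (an assumption on my part, not something from the post) is that `update_ratio` in the rough vicinity of 1e-3 per step is healthy, while values orders of magnitude away suggest the learning rate or gradients are off.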

A few words on why I'm concerned: I'm working on an exotic NN architecture for 3D, where different architectural choices drastically change gradient behavior, up to the point of blow-up.

submitted by /u/pubertat
