[R]: Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
The authors use a classic Armijo backtracking line-search within SGD to automatically set the step size when training neural networks. They also prove convergence rates for both convex and non-convex objectives under interpolation and certain growth conditions. An aside, but as an optimization-head myself, it's nice to see some of the traditional optimization ideas making their way into an ML context.
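To make the idea concrete, here's a minimal sketch of one SGD step with a stochastic Armijo backtracking line-search: the step size is shrunk until the Armijo sufficient-decrease condition holds on the sampled mini-batch's loss. Function names, default hyperparameters, and the backtracking cap are illustrative, not taken from the paper's code.

    import numpy as np

    def sgd_armijo_step(w, batch_loss, batch_grad, eta_max=1.0, c=0.5, beta=0.9):
        """One SGD step with a backtracking Armijo line-search evaluated
        on the current mini-batch (illustrative sketch, not the authors'
        implementation)."""
        g = batch_grad(w)
        f0 = batch_loss(w)
        gg = np.dot(g, g)
        eta = eta_max
        # Shrink eta until the stochastic Armijo condition holds:
        #   f_i(w - eta*g) <= f_i(w) - c * eta * ||g||^2
        for _ in range(50):  # cap backtracking to avoid an infinite loop
            if batch_loss(w - eta * g) <= f0 - c * eta * gg:
                break
            eta *= beta
        return w - eta * g

    # Toy usage on a least-squares mini-batch (hypothetical data):
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
    loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
    grad = lambda w: X.T @ (X @ w - y) / len(y)
    w = sgd_armijo_step(np.zeros(5), loss, grad)

The appeal is that the step size adapts automatically each iteration, so there's no learning-rate schedule to hand-tune.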
submitted by /u/sinsecticide