What does the global minimum of a non-convex loss function look like?
For LeNet trained on MNIST to the lowest possible loss (a global minimum),
- What would the test error rate be? Is there a benchmark for the best achievable performance?
- Can we reach a global minimum of a non-convex loss function for a classification task with a minimal number of parameters? Conversely, how does adding more parameters to a NN help with reaching one?
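For concreteness, here is the kind of model and parameter count I have in mind: a minimal PyTorch sketch of a LeNet-5-style network adapted to 28x28 MNIST inputs. The exact layer sizes are my assumptions (variants of LeNet differ), but it makes the "number of parameters" part of the question measurable.

```python
import torch
import torch.nn as nn

# A LeNet-5-style network for 28x28 MNIST inputs (layer sizes are
# assumptions; published LeNet variants differ in details).
class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.Tanh(),  # 28x28 -> 28x28
            nn.AvgPool2d(2),                                       # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),            # 14x14 -> 10x10
            nn.AvgPool2d(2),                                       # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, 10),  # 10 digit classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = LeNet()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # → 61706, so roughly 62k trainable parameters
logits = model(torch.randn(1, 1, 28, 28))
print(logits.shape)  # → torch.Size([1, 10])
```

So the question is whether a network of roughly this size can, in principle, drive the (non-convex) training loss all the way to its global minimum, or whether that only becomes practical as the parameter count grows.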