[D] Please help me find this paper (Foundations of DL)!
I have been searching for some time now for a paper I read a while ago but misplaced.
The paper was very interesting and showed that optimisation with deep neural networks is, in some sense, easier than with shallow neural networks.
The authors built a data set by generating random input data and then using the predictions of a shallow neural network (A) as the ground-truth labels. They then tried to train another shallow network (B) with the same architecture as (A), but with a different initialization. They showed that it was very difficult to find the optimal solution for this data set. They then tried the same task with a deeper network (C), which did find the optimal solution.
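In case it helps jog anyone's memory, here is a rough sketch of the setup as I remember it. It only covers the data generation and the two shallow networks, not the training. The layer sizes, the tanh activation, the Gaussian inputs, and all function names are my own assumptions, not details from the paper.

```python
import math
import random

def init_net(n_in, n_hidden, rng):
    """Randomly initialise a one-hidden-layer network (hypothetical helper)."""
    w1 = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [rng.gauss(0, 1) for _ in range(n_hidden)]
    return w1, w2

def forward(net, x):
    """Scalar output of the network for one input vector x."""
    w1, w2 = net
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return sum(v * h for v, h in zip(w2, hidden))

def mse(net, xs, ys):
    """Mean squared error of the network on the data set."""
    return sum((forward(net, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
n_in, n_hidden, n_samples = 5, 3, 200  # assumed sizes, not from the paper

# Network (A): its predictions define the ground-truth labels.
teacher = init_net(n_in, n_hidden, rng)
xs = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(n_samples)]
ys = [forward(teacher, x) for x in xs]

# Network (B): same architecture as (A), different random initialization.
student = init_net(n_in, n_hidden, rng)

teacher_loss = mse(teacher, xs, ys)  # zero by construction
student_loss = mse(student, xs, ys)  # starting loss of (B) before any training
```

Since (A) produced the labels, a zero-loss solution is guaranteed to exist for (B). As I recall, the point of the paper was that training (B) still struggled to reach it, while the deeper network (C) did not.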
If anyone knows the name of this paper then please let me know where I can find it. I would be eternally grateful!