Visualizing Effect of Deep Double Descent on Model “Lottery Ticket” Architecture? [D]
Has anyone done any work on visualizing how the internal “lottery ticket” structure of a neural network changes as it goes through deep double descent?
One popular explanation for deep double descent is that the second descent occurs as the model truly learns to generalize by converging on the “Occam’s Razor” model: the idea that the simplest model that fits the data is the one that generalizes best. This is closely related to the Lottery Ticket Hypothesis and model compression, where you can prune a model’s under-used weights to arrive at a much smaller model with almost identical accuracy. The Lottery Ticket Hypothesis says (roughly paraphrased) that there is a “model within the model” doing most of the work in a deep neural network, and once you find that “winning ticket”, the other weights in the network aren’t that important.
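To make the pruning idea concrete, here’s a minimal sketch of the standard magnitude-pruning proxy for extracting a “winning ticket” (the function name and NumPy implementation are mine, not from the original papers):

```python
import numpy as np

def winning_ticket_mask(weights, keep_frac=0.2):
    # Boolean mask that keeps the top keep_frac of weights by absolute
    # magnitude -- the usual cheap stand-in for "most significant" weights.
    flat = np.abs(weights).ravel()
    k = max(1, int(round(flat.size * keep_frac)))
    threshold = np.partition(flat, -k)[-k]
    return np.abs(weights) >= threshold
```

Applying this mask layer-by-layer at any training checkpoint gives you a candidate “ticket” whose structure you could then try to visualize.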
What I’m wondering is: has there been any work on visualizing the subnetwork of most-significant weights as a model goes through the stages of deep double descent, from the first trough, through the plateau, to the second descent?
I’m curious how much the core “internal architecture” changes in each of those stages, and whether we can actually visualize the network narrowing in on that “Occam’s Lottery Ticket”…?
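For what it’s worth, one cheap way I could imagine quantifying this (a sketch of my own, not an established method): magnitude-prune each checkpoint to the same sparsity and measure the Jaccard overlap between the surviving-weight masks, e.g. trough vs. plateau vs. second descent. A low overlap would suggest the “internal architecture” is reorganizing between stages:

```python
import numpy as np

def ticket_mask(weights, keep_frac=0.2):
    # Boolean mask over the top keep_frac of weights by magnitude.
    flat = np.abs(weights).ravel()
    k = max(1, int(round(flat.size * keep_frac)))
    threshold = np.partition(flat, -k)[-k]
    return np.abs(weights) >= threshold

def mask_jaccard(mask_a, mask_b):
    # Jaccard similarity |A intersect B| / |A union B| between two tickets;
    # 1.0 means the identical subnetwork survived pruning at both checkpoints.
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0
```

Plotting that overlap against training time (alongside the test-loss curve) might already show whether the ticket is stable through the plateau or only crystallizes during the second descent.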