[D] Why is L2 preferred over L1 Regularization?
I understand that L1 regularization induces sparsity and is thus good for cases where sparsity is required.
But in normal use cases, what are the benefits of using L2 over L1? If the goal is just to keep the weights small, why can't we use the L4 norm, for example?
I've seen mentions of L2 capturing energy, corresponding to Euclidean distance, and being rotation invariant. Could someone explain more explicitly how these properties arise?
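To make the rotation-invariance part of the question concrete, here is a small numerical sketch (my own illustration, not from any particular source): the L2 norm of a weight vector is unchanged by an orthogonal transform (rotation), while the L1 and L4 norms generally are not. The random seed and vector size are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)  # a random "weight vector"

# Build a random rotation (orthogonal) matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
w_rot = Q @ w  # the same vector, expressed in a rotated basis

for p in (1, 2, 4):
    norm = np.sum(np.abs(w) ** p) ** (1 / p)
    norm_rot = np.sum(np.abs(w_rot) ** p) ** (1 / p)
    print(f"L{p}: original = {norm:.4f}, rotated = {norm_rot:.4f}")

# Only the p = 2 values coincide: ||Qw||_2 = ||w||_2 because Q^T Q = I,
# so the L2 penalty doesn't depend on the orientation of the coordinate axes.
```

This is one common argument for L2 as the "default" penalty: it treats all directions in weight space symmetrically, whereas L1 (and L4) single out the coordinate axes.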
submitted by /u/tshrjn