[D] Can dense network perform as good as any other architecture?
In a project I am working on currently, team got into a discussion over shall we go for Dense MLP or CNN? That discussion sort of made me wonder the question, “Can Dense MLP work as good as any other architecture (CNN, GCN) for every task?” A proper way of putting it will be, given we are able to properly train a huge dense network with enough expressive power for the task, and we have enough data for proper training, can a dense network perform as good as other network architecture, in theory? From what I understand, different architectures are just different ways of pooling/sharing information and feature extraction. The functions that can be realised by any network should also be realisable by some configuration of Dense network.