[D] Reasons for small RNN size in Neural Architecture Search paper
The Neural Architecture Search paper states that the controller RNN (used to generate architectures) has only 35 units in each of its 2 layers. This very small size seems strange to me. My initial explanation was that the authors had too few samples, but they actually used 15,000, which should be enough to train a bigger network. So what, in your opinion, could be the reason for such a small network? Or, put differently, why wouldn’t making the controller bigger influence the results?
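
For a rough sense of scale, here is a back-of-the-envelope parameter count for a controller of that size. This is only a sketch under my own assumptions, not the paper's code: I assume LSTM cells, an input size equal to the hidden size, one bias vector per gate, and I ignore the output softmax layers. The function name `lstm_stack_params` and the 256-unit comparison are made up for illustration.

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Weights and biases of one LSTM layer with a single bias vector per gate.

    Each of the 4 gates has an input matrix (hidden x input),
    a recurrent matrix (hidden x hidden), and a bias (hidden).
    """
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 4 * per_gate


def lstm_stack_params(input_size: int, hidden_size: int, num_layers: int) -> int:
    """Total parameters of a stacked LSTM; layers after the first take
    the previous layer's hidden state as input."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += lstm_layer_params(layer_input, hidden_size)
        layer_input = hidden_size
    return total


if __name__ == "__main__":
    # Assumption: the input embedding has the same width as the hidden state.
    small = lstm_stack_params(input_size=35, hidden_size=35, num_layers=2)
    # Hypothetical "bigger" controller, purely for comparison.
    large = lstm_stack_params(input_size=256, hidden_size=256, num_layers=2)
    print(f"2 x 35 units:  {small} parameters")
    print(f"2 x 256 units: {large} parameters")
```

Under these assumptions the 35-unit controller has on the order of 20 thousand recurrent parameters, while a 256-unit one has on the order of a million, so the question is really whether roughly 15,000 reward signals could train the latter.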