[D] How confident are you in your own analysis?
Say you want to add something to your model that you think might improve overall performance, e.g. some feature engineering, or some newly acquired data added to your training set. How do you make sure that the change is actually improving performance, and that the gain is not just due to the randomness of the process?
I kinda see how it goes for traditional ML under the usual IID assumption: fix a seed for anything that is random and just compare. But what about deep learning models?
For instance, say you have a neural network tuned for some past state. Wouldn’t comparing the past configuration with the new one (with the added features or training data) on the same network be biased? Maybe the feature engineering was relevant, but the network isn’t large enough to make use of the additional features. Or maybe adding more data changed the loss surface, and the learning rate/batch size tuned for the previous configuration no longer fits the new one. Or maybe more data meant more updates per epoch (assuming the batch size stays the same), so we missed the optimal training state because we only look at the validation loss once per epoch. So surely fixing a seed for the randomness of network training (weight initialization, shuffling after each epoch, …) is not enough.
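To make the updates-per-epoch point concrete, here is a rough sketch of the arithmetic. The dataset sizes, batch size, and the `eval_every` parameter are all made up for illustration; the helper names are hypothetical and not from any library.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of optimizer steps in one pass over the data (last partial batch kept)."""
    return math.ceil(n_samples / batch_size)


def validation_steps(n_samples, batch_size, n_epochs, eval_every=None):
    """Return the optimizer-step indices at which validation would run.

    eval_every=None means "validate once per epoch"; otherwise validate
    every `eval_every` optimizer steps, regardless of dataset size.
    """
    steps_per_epoch = updates_per_epoch(n_samples, batch_size)
    total_steps = steps_per_epoch * n_epochs
    interval = steps_per_epoch if eval_every is None else eval_every
    return list(range(interval, total_steps + 1, interval))


# Hypothetical sizes: the same batch size before and after adding data.
old_steps = updates_per_epoch(10_000, 100)
new_steps = updates_per_epoch(25_000, 100)
print(old_steps, new_steps)

# Per-epoch validation gets coarser (in optimizer steps) as the data grows,
# while a fixed step interval keeps the checkpoints comparable.
print(validation_steps(25_000, 100, n_epochs=2))
print(validation_steps(25_000, 100, n_epochs=2, eval_every=100))
```

With per-epoch validation, the larger dataset is only checked every `new_steps` updates, so validating on a fixed step interval is one way to keep the two configurations comparable on that axis.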
I’ve thought of doing some sort of AutoML/grid search over the learning rate/batch size, with several seeds for the weight initialization, and then testing the results for statistical significance, but this would take way too much time considering how many things I need to check. I also feel like a statistical study on a single network (with hyperparameters fixed) across different weight initializations might not be relevant.
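For reference, the kind of significance check I have in mind would look roughly like this: train both configurations on the same set of seeds, then run a paired sign-flip permutation test on the per-seed score differences. This is only a sketch. The function name is my own, the scores below are placeholder numbers rather than real results, and it assumes higher scores are better and that each seed is used for both configurations.

```python
import itertools

import numpy as np


def paired_sign_flip_test(baseline_scores, candidate_scores):
    """Exact two-sided sign-flip permutation test on paired per-seed scores.

    Under the null hypothesis that the change has no effect, each per-seed
    difference is equally likely to be positive or negative, so we enumerate
    every sign assignment and count how often the mean difference is at
    least as extreme as the observed one.
    """
    diffs = np.asarray(candidate_scores, dtype=float) - np.asarray(
        baseline_scores, dtype=float
    )
    observed = abs(diffs.mean())
    n_extreme = 0
    n_total = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(diffs)):
        n_total += 1
        flipped_mean = abs((np.array(signs) * diffs).mean())
        # Small tolerance so floating-point ties count as "at least as extreme".
        if flipped_mean >= observed - 1e-12:
            n_extreme += 1
    return diffs.mean(), n_extreme / n_total


# Placeholder validation scores for 5 seeds (NOT real results).
baseline = [0.81, 0.80, 0.82, 0.79, 0.81]
candidate = [0.82, 0.82, 0.835, 0.80, 0.83]

mean_diff, p_value = paired_sign_flip_test(baseline, candidate)
print(f"mean difference: {mean_diff:.4f}, p-value: {p_value:.4f}")
```

The enumeration is exact but grows as 2^n in the number of seeds, so it only makes sense for a handful of runs, and with few seeds the smallest achievable p-value is 2 / 2^n. It also says nothing about whether the fixed hyperparameters were fair to both configurations, which is exactly the part I’m unsure about.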
I’m asking because whenever I change something on the preprocessing side (new feature, new data, different scaling, …), or even the weight initialization of the network, the “optimal” learning rate I find by hand-tuning is never the same (and can differ a lot).
Any ideas are welcome!