[D] Do you benchmark or track snapshots of model runs?
I’m doing research on deploying ML to production and am wondering how many of you benchmark your existing models before putting them into production. How extensive is your testing, and do you run A/B tests in production to validate that your new model is better than the existing one?
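To make the A/B question concrete, here's roughly the kind of setup I'm imagining: hash each user id into a bucket so a fixed fraction of traffic deterministically hits the candidate model, then compare metrics per arm. All the names (`pick_model`, `CANDIDATE_TRAFFIC`, the `models` dict) are placeholders, just a sketch of the idea:

```python
import hashlib

# Fraction of traffic routed to the candidate model (placeholder value)
CANDIDATE_TRAFFIC = 0.10

def pick_model(user_id: str) -> str:
    """Deterministically assign a user to the control or candidate arm.

    Hashing the user id keeps assignments stable across requests,
    so the same user always sees the same model.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "candidate" if bucket < CANDIDATE_TRAFFIC * 10_000 else "control"

def predict(models: dict, user_id: str, features):
    """Route one request; `models` maps arm name -> predict callable."""
    arm = pick_model(user_id)
    return arm, models[arm](features)
```

Is that roughly how people do it, or do most teams lean on a feature-flag/experimentation platform instead of rolling their own split?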
Another related question – do you take snapshots of everything that goes in and out of your models so you can eventually use it to troubleshoot them? How often do models have performance issues anyway?
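By "snapshots" I mean something like the sketch below: a thin logging helper that appends every input/output pair, tagged with a model version, to append-only storage so requests can be replayed against a newer model later. Again, made-up names (`log_prediction`, the JSON Lines file), purely illustrative:

```python
import json
import time
import uuid

def log_prediction(log_path: str, model_version: str, features, prediction):
    """Append one input/output record as a JSON line.

    JSON Lines keeps the log append-only and easy to replay
    or diff against a newer model version later.
    """
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "model_version": model_version,
        "input": features,
        "output": prediction,
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Curious whether people actually keep full request/response logs like this, or just sampled/aggregated metrics because of storage and privacy constraints.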