[N] Live observability for TensorFlow, Apache Spark
Unexpected behavior could stem from an error in the structure of the model, from bias in the data, or from a classic bug in the enveloping code. Each of these calls for a very different fix. If a model needs more training, that could take weeks of computing time. If the model itself needs expanding, data scientists may have to do complicated design work. A logical error, on the other hand, can be fixed in seconds once it is found.
This looks like an interesting solution for production ML models: it could take the strain off data engineers when data scientists want to observe their models.
submitted by /u/ariehkovler