[D] Separate app server and deep learning server?
I am in charge of integrations and deployments for a small analytics firm. We have many apps that require various machine learning models (mostly deep learning) for image-based classification, context detection, and sequence predictions.
I want there to be a separate app server for each app, and a single heavy-duty deep learning server that hosts and runs all these models. We currently use ONNX models so that all of them share a uniform runtime, since our various DS teams prefer different frameworks.
I'd like to know the best practices in such a scenario. Would it be better to have the inference module in each app server, or on a separate central server specialized for ML? If the central server is the better choice, would it be better to preprocess the data on the app server and send the resulting features, or to pass raw data to the ML server and perform preprocessing there before inference?
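To make the two preprocessing options concrete, here is a rough Python sketch of what I have in mind. Everything in it is hypothetical and only for illustration: `InferenceServer`, `scale_bytes`, `threshold_model`, and the model name `toy-classifier` are made-up stand-ins, the "model" is a plain function rather than an ONNX model, and the calls are in-process rather than over the network.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases; in a real setup each Model would wrap an ONNX model.
Preprocess = Callable[[bytes], List[float]]
Model = Callable[[List[float]], int]


def scale_bytes(raw: bytes) -> List[float]:
    """Toy preprocessing: map raw bytes to floats in [0, 1]."""
    return [b / 255.0 for b in raw]


def threshold_model(features: List[float]) -> int:
    """Toy classifier: 1 if the mean feature value exceeds 0.5, else 0."""
    return int(sum(features) / len(features) > 0.5)


class InferenceServer:
    """Stand-in for the central ML server that hosts every model."""

    def __init__(self) -> None:
        self.models: Dict[str, Model] = {}
        self.preprocessors: Dict[str, Preprocess] = {}

    def register(self, name: str, model: Model, preprocess: Preprocess) -> None:
        self.models[name] = model
        self.preprocessors[name] = preprocess

    def infer_preprocessed(self, name: str, features: List[float]) -> int:
        # Option A: the app server already preprocessed the data.
        return self.models[name](features)

    def infer_raw(self, name: str, raw: bytes) -> int:
        # Option B: the ML server owns preprocessing as well as inference.
        return self.models[name](self.preprocessors[name](raw))


server = InferenceServer()
server.register("toy-classifier", threshold_model, scale_bytes)

raw = bytes([200, 220, 240])

# Option A: app server preprocesses, then sends features.
label_a = server.infer_preprocessed("toy-classifier", scale_bytes(raw))

# Option B: app server sends raw bytes and lets the ML server do the rest.
label_b = server.infer_raw("toy-classifier", raw)
```

In option A every app server has to carry a copy of the preprocessing code and keep it in sync with the model version. In option B the preprocessing lives next to the model, but the ML server receives larger raw payloads and does more work per request.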