Idea: MLOps Composer. Interested in the community’s opinion! [Project]
I have an idea that solves a frustration I ran into while working on various machine learning projects, and which I think is quite common. Before I start building the solution, I’d like to hear your suggestions and feedback, as I want to create a tool that benefits as many people in ML as possible. First let me describe the problem, then my proposed solution.
During several machine learning projects, we quickly ran into scaling and operational issues. Keeping (preprocessing) pipelines working correctly for different datasets, optimizing hyperparameters (cost-)effectively, and managing tests and deployments frequently seemed to take much more time than developing the individual pieces. Lots of glue code and extremely precise documentation were necessary to keep experiments easily reproducible. Although the individual processes were often relatively simple to grasp, managing them so that they work in concert (in various ways) was extremely time-consuming and tedious.
As a solution to this problem I came up with an idea for an application: what if you could manage the different components of the pipelines (let’s call them modules) and hook them up in a GUI similar to Apple’s Quartz Composer? Several commonly used modules would be available out of the box (so usable even without any coding experience!), but users could also write their own Python scripts and use them from the application. A simple companion Python library would enforce some consistency in the inputs and outputs of those scripts, so that they can be used in the GUI. The application would also let users easily manage deployments of training sessions, tests and inference endpoints on various cloud providers, local compute, or other computers over SSH. Hyperparameter tuning could be done through modules as well. Basically, the entire process from raw data to models usable for inference could be streamlined in visual pipelines.
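To make the module idea a bit more concrete, here is a rough sketch of what the companion Python library could look like. Nothing here exists yet: `Module`, `run_pipeline`, and the two example steps are hypothetical names I made up for illustration. The only point is that each script declares its named inputs and outputs, so a GUI could draw the connections between them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Module:
    """One pipeline step with declared inputs and outputs (hypothetical interface)."""
    name: str
    func: Callable[..., Dict[str, Any]]
    inputs: List[str]
    outputs: List[str]

    def run(self, store: Dict[str, Any]) -> None:
        # Pass only the declared inputs to the wrapped function.
        result = self.func(**{key: store[key] for key in self.inputs})
        missing = set(self.outputs) - set(result)
        if missing:
            raise ValueError(f"{self.name} did not produce: {sorted(missing)}")
        for key in self.outputs:
            store[key] = result[key]


def run_pipeline(modules: List[Module], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run modules in whatever order their declared inputs become available."""
    store = dict(initial)
    pending = list(modules)
    while pending:
        ready = [m for m in pending if all(k in store for k in m.inputs)]
        if not ready:
            names = [m.name for m in pending]
            raise ValueError(f"unsatisfied inputs for modules: {names}")
        for module in ready:
            module.run(store)
            pending.remove(module)
    return store


# Two toy user scripts, written as plain functions that return named outputs.
def normalize(values):
    largest = max(values)
    return {"scaled": [v / largest for v in values]}


def mean_of(scaled):
    return {"mean": sum(scaled) / len(scaled)}


# Modules are listed out of order on purpose; the declared names decide the order.
pipeline = [
    Module("mean", mean_of, inputs=["scaled"], outputs=["mean"]),
    Module("normalize", normalize, inputs=["values"], outputs=["scaled"]),
]
result = run_pipeline(pipeline, {"values": [2.0, 4.0, 8.0]})
print(result["mean"])
```

In this sketch the wiring is derived from matching input and output names; in the actual application the user would draw those connections in the GUI instead, and deployment targets would be a separate concern.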
I have seen similar tools, but they are often less extensive than what I have in mind, and some suffer from vendor lock-in (Google AutoML, for example). If I build this application, I might even make it open source, which would also speed up development.
As said, I’m extremely curious what y’all think, and I’d love to hear suggestions, comments or similar ideas. Thanks a lot in advance!