[D] When asked what not-yet-existing tool they wish they had, the IBM AI team said they wanted tools that automatically generate and clean training data. Is there a subfield for ML models that clean and format training data?
JS – A tool to automatically generate all the training data that we need…the problem is, however, development of this tool will likely need training data as well.
FT – Agree with all of the above. But in particular for Conversational AI (where I spend most of my time with Watson Assistant) – any automation tool that could take a client’s data and automatically build out intent/entity recognition AND the dialog.
RP – Once you are in the trenches, you realize, it all starts from data. I wish we had a tool that takes noisy data and makes it clear for AI – all automatically. Enterprises soon realize, they spend most of the time in getting data ready for AI, from different formats, in different places with different permission, with tons of noise. An automation tool to make that “look ma – no hands” will be great!
This makes me wonder: what sort of model would you use for this? What training data would you feed it, and how would you generate that data?
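To make the last question concrete, here is one possible approach, purely as an illustration and not something the IBM team described: start from records you already trust, corrupt them synthetically, and train a model to map the noisy version back to the clean one. The sketch below only covers the data-generation half. The function names `corrupt` and `make_pairs` and the three noise types (drop, duplicate, case flip) are my own assumptions; real enterprise noise would look quite different.

```python
import random


def corrupt(text, rng, rate=0.1):
    """Return a noisy copy of `text` (hypothetical noise model).

    Each character is independently dropped, duplicated, or case-flipped,
    with a total corruption probability of `rate`.
    """
    out = []
    for ch in text:
        r = rng.random()
        if r < rate / 3:
            continue                   # drop the character
        elif r < 2 * rate / 3:
            out.append(ch + ch)        # duplicate the character
        elif r < rate:
            out.append(ch.swapcase())  # flip its case
        else:
            out.append(ch)             # keep it unchanged
    return "".join(out)


def make_pairs(clean_records, n_per_record=3, seed=0):
    """Build (noisy_input, clean_target) pairs from trusted records."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [
        (corrupt(rec, rng), rec)
        for rec in clean_records
        for _ in range(n_per_record)
    ]


if __name__ == "__main__":
    for noisy, clean in make_pairs(["Reset my password", "Cancel order 1234"]):
        print(repr(noisy), "->", repr(clean))
```

This only sidesteps the chicken-and-egg problem JS mentions if you already have some clean data and your synthetic noise resembles the real noise, which is a big assumption.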