[P] How can I build this simple text-based ML tool?
Hello everyone!
I work with spreadsheets a lot, manually doing tasks that are a bit too complex for simple rules but that I believe ML can handle. In a nutshell, I spend 2+ hours a day going through company names, removing legal terms like “LLC” or “Limited”, and humanizing them.
For instance, I have a spreadsheet with company names and emails.
Company Name | Email Address |
---|---|
Concur Recruitment Limited – 02476 668 204 | sconvery@concurengineering.co.uk |
Confluent Technology Group | mark.anderson@confluentgroup.com |
Construction Maintenance and Allied Workers | donmelanson@cmaw.ca |
These would become (currently by hand):
Company Name | Email Address |
---|---|
Concur Engineering | sconvery@concurengineering.co.uk |
Confluent | mark.anderson@confluentgroup.com |
CMAW | donmelanson@cmaw.ca |
What we’re doing here is:
- Shortening names to their essence
- Removing legal terms and words
- Looking at domain names (in email addresses) as a clue for the “most human name”
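
To make the three steps above concrete, here is a rough rule-based baseline in Python. It is only a sketch under my own assumptions: the `LEGAL_TERMS` list is a hypothetical, incomplete starting point, and the function names (`strip_legal_terms`, `domain_label`, `acronym`, `humanize`) are made up for illustration. It handles the Confluent and CMAW rows, but for the Concur row it returns "Concur" rather than "Concur Engineering", which is the kind of case where plain rules fall short.

```python
import re

# Hypothetical, incomplete list of legal terms; extend it for your own data.
LEGAL_TERMS = {"llc", "ltd", "limited", "inc", "incorporated",
               "corp", "corporation", "plc", "gmbh", "co"}


def strip_legal_terms(name: str) -> str:
    """Drop trailing extras after a dash separator and remove legal terms."""
    # Keep only the part before " – " or " - " (e.g. cut a phone number).
    name = re.split(r"\s+[–-]\s+", name, maxsplit=1)[0]
    words = [w for w in name.split()
             if w.strip(".,").lower() not in LEGAL_TERMS]
    return " ".join(words)


def domain_label(email: str) -> str:
    """Return the first label of the email domain, lowercased."""
    domain = email.rsplit("@", 1)[-1].lower()
    return domain.split(".")[0]


def acronym(name: str) -> str:
    """Initials of the capitalized words in the name."""
    return "".join(w[0] for w in name.split() if w[0].isupper())


def humanize(name: str, email: str) -> str:
    """Combine the cleaned name with the email domain as a hint."""
    cleaned = strip_legal_terms(name)
    label = domain_label(email)
    # If the domain is the acronym of the name, prefer the acronym.
    if acronym(cleaned).lower() == label:
        return acronym(cleaned)
    # Otherwise keep the leading words that also appear in the domain.
    kept = []
    for word in cleaned.split():
        if word.lower() in label:
            kept.append(word)
        else:
            break
    return " ".join(kept) or cleaned


rows = [
    ("Concur Recruitment Limited – 02476 668 204", "sconvery@concurengineering.co.uk"),
    ("Confluent Technology Group", "mark.anderson@confluentgroup.com"),
    ("Construction Maintenance and Allied Workers", "donmelanson@cmaw.ca"),
]
for company, email in rows:
    print(humanize(company, email), "|", email)
```

Rows that a baseline like this gets wrong, paired with the hand-cleaned answer, would also be the natural training examples for any ML approach.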
Now, I strongly suspect this is something Google Cloud has capabilities for. Given how little programming Google Cloud ML involves (and its potential integration with Google Sheets), I’d imagine it’s the best vehicle for this tool.
Some questions before I embark upon this journey:
- Would you recommend I use Google Cloud ML or another tool?
- How much data do you think would be needed to train this tool? (pairs of uncleaned and cleaned spreadsheets)
- Am I critically misunderstanding something here? This is pretty much my first time practically applying ML.
Thank you very much for all your help!
submitted by /u/ventura__highway