[D] I started writing a book on practical considerations of ML, keen for feedback on its direction
I was finding I was constantly having the same conversations with people about implementing ML in practice, so I tried to find a resource I could provide to people that might help. However, I found there wasn’t much on the practical side of implementing ML – so about a year ago I drafted out a table of contents and started writing. I ended up shelving it for a bit, and have just picked it up again now – but am torn between just blogging what I’ve already got, or trucking on to create a unified resource (i.e. the book).
I’m keen to hear what the reddit ML community thinks – whether I should continue (maybe it’s been superseded?), and, if I do continue, whether there’s anything you’d like to see covered in the book.
My intention – should I continue and complete it – is to self-publish online through something like LeanPub. I have no burning desire to see the book in print or make money off it; I’d really just like to raise awareness of what we all need to think about when we create ML solutions in the real world.
This is me: https://twitter.com/drkatnz
Here’s how the table of contents looks (about 25% of the content is written already, and subsections aren’t shown. Feedback so far has been to include a section on biases, which has been added):
1 Introduction
1.1 Terminology
1.2 How do I get started using machine learning?
2 Do you really need machine learning?
2.1 Data availability
2.2 Liability
2.3 Capability
2.4 Other solutions
2.5 Pre-requisite checklist
3 Team
3.1 Skills
3.2 Common team structures
3.3 Forming a team and getting started
4 Building your first machine learning solution
5 Data collection
5.1 Collecting the data
5.2 Data set size – how much is enough?
5.3 Labeled versus unlabeled data
6 Pre-processing
6.1 Automatically cleaning the data
6.2 Dealing with missing values
6.3 Applying domain knowledge
6.4 Feature cleanup
6.5 Dealing with the minority class
7 Algorithm considerations
7.1 Unsupervised versus supervised
7.2 ‘Good enough’ accuracy
7.3 Storage
7.4 Speed
8 Measuring accuracy
8.1 Metrics
8.2 Minimum required accuracy
8.3 Test set
8.4 Investigating prediction errors
8.5 A/B testing
9 Identifying and mitigating biases
9.1 Biases from data
9.2 Biases from trained models
9.3 Inventor’s bias
9.4 Biases caused by perception of machine learning
10 Getting an algorithm to production
10.1 Infrastructure
10.2 Documentation
10.3 User interface
10.4 Abstaining classifiers
10.5 Runtime environment
11 Managing live algorithms
11.1 Monitoring
11.2 Effect on the real world
11.3 Auditing results
11.4 Updating models
11.5 Technical debt
What do y’all think?
submitted by /u/katnz