[D] I started writing a book on practical considerations of ML, keen for feedback on its direction

I found I was constantly having the same conversations with people about implementing ML in practice, so I looked for a resource I could point them to. However, there wasn’t much on the practical side of implementing ML, so about a year ago I drafted a table of contents and started writing. I ended up shelving it for a bit and have just picked it up again, but I’m torn between just blogging what I’ve already got and trucking on to create a unified resource (i.e. the book).

I’m keen to hear what the reddit ML community thinks: should I continue (maybe it’s been superseded?), and if I do, is there anything you’d like to see covered in the book?

My intention – should I continue and complete it – is to self-publish online through something like LeanPub. I have no burning desire to see the book in print or make money off it, I’d really just like to raise awareness of what we all need to think about when we create ML solutions in the real world.

This is me:

Here’s how the table of contents looks. About 25% of the content is written already, and subsections aren’t shown. Feedback so far has been to include a section on biases, which has been added:

  1. Introduction
    1.1 Terminology
    1.2 How do I get started using machine learning?

  2. Do you really need machine learning?
    2.1 Data availability
    2.2 Liability
    2.3 Capability
    2.4 Other solutions
    2.5 Pre-requisite checklist

  3. Team
    3.1 Skills
    3.2 Common team structures
    3.3 Forming a team and getting started

  4. Building your first machine learning solution

  5. Data collection
    5.1 Collecting the data
    5.2 Data set size – how much is enough?
    5.3 Labeled versus unlabeled data

  6. Pre-processing
    6.1 Automatically cleaning the data
    6.2 Dealing with missing values
    6.3 Applying domain knowledge
    6.4 Feature cleanup
    6.5 Dealing with the minority class

  7. Algorithm considerations
    7.1 Unsupervised versus supervised
    7.2 ‘Good enough’ accuracy
    7.3 Storage
    7.4 Speed

  8. Measuring accuracy
    8.1 Metrics
    8.2 Minimum required accuracy
    8.3 Test set
    8.4 Investigating prediction errors
    8.5 A/B Testing

  9. Identifying and Mitigating Biases
    9.1 Biases from data
    9.2 Biases from trained models
    9.3 Inventor’s bias
    9.4 Biases caused by perception of machine learning

  10. Getting an algorithm to production
    10.1 Infrastructure
    10.2 Documentation
    10.3 User interface
    10.4 Abstaining classifiers
    10.5 Runtime environment

  11. Managing live algorithms
    11.1 Monitoring
    11.2 Effect on the real world
    11.3 Auditing results
    11.4 Updating models
    11.5 Technical debt

What do y’all think?

submitted by /u/katnz