[D] An overlooked pitfall in data science is the inclusion of numerical identifiers as features.

As a professional data scientist and algorithm developer, I often encounter serious errors in the use and design of machine learning software. One of the most overlooked is the inclusion of numerical identifiers in data sets used for machine learning. Many data sets ship with an identifier column among the features, but identifiers should never be used as features when validating or estimating models!

Take, for example, the Forensic Science Glass Identification data, which comes with an id column. Using the data as is, I get an error rate of less than 1%, which is too good to be true, and it is. After removing the id column, the error rate is approximately 35%! Details can be found in my code snippet here: and, in short form, in my blog post:
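The effect can be reproduced in a minimal, self-contained sketch. This is not the code snippet referred to above, and it does not use the glass data: it builds a synthetic stand-in data set (the helper names `make_rows` and `loo_error_1nn` are hypothetical) under the assumption that rows are stored sorted by class, which is one plausible way an id column ends up encoding the label. A plain 1-nearest-neighbour classifier is then scored with leave-one-out validation, once with the id as a feature and once without.

```python
import random


def make_rows(n_per_class=100, n_classes=3, seed=0):
    """Synthetic stand-in data (an assumption, not the glass data).

    Rows are generated class by class, so the running row id
    implicitly encodes the label.
    """
    rng = random.Random(seed)
    rows = []
    for label in range(n_classes):
        for _ in range(n_per_class):
            row_id = len(rows) + 1
            # Two weakly informative features: class means are 0.0, 0.5, 1.0.
            features = [rng.gauss(0.5 * label, 1.0) for _ in range(2)]
            rows.append((row_id, features, label))
    return rows


def loo_error_1nn(rows, use_id):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier."""
    points = [
        ([float(row_id)] if use_id else []) + features
        for row_id, features, _ in rows
    ]
    labels = [label for _, _, label in rows]
    errors = 0
    for i, p in enumerate(points):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(points):
            if j == i:
                continue  # leave the query row out of its own neighbour search
            d = sum((a - b) ** 2 for a, b in zip(p, q))
            if d < best_d:
                best_j, best_d = j, d
        errors += labels[best_j] != labels[i]
    return errors / len(rows)


rows = make_rows()
err_with_id = loo_error_1nn(rows, use_id=True)
err_without_id = loo_error_1nn(rows, use_id=False)
print(f"error with id column:    {err_with_id:.1%}")
print(f"error without id column: {err_without_id:.1%}")
```

With the id included, the nearest neighbour of a row is almost always an adjacent row of the same class, so the classifier looks nearly perfect without having learned anything about the features. Dropping the id before fitting and validating gives the honest, much higher error rate.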

submitted by /u/at-roasted-space

Toronto AI is a social and collaborative hub that unites AI innovators from Toronto and the surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.