# Blog

## 5000+ Members

### MEETUPS

LEARN, CONNECT, SHARE

### JOB POSTINGS

INDEED POSTINGS

Browse the latest deep learning, AI, and machine learning job postings from Indeed for the GTA.

### CONTACT

CONNECT WITH US

If you are looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

# [D] Decision Tree Splitting strategy

I have a dataset with 4 categorical features (cholesterol, systolic blood pressure, diastolic blood pressure, and smoking rate). I use a decision tree classifier to find the probability of stroke, and I am trying to verify my understanding of the splitting procedure used by scikit-learn. Since it is a binary tree, there are three possible ways to split the first feature: group categories {0 and 1 to one leaf, 2 to another}, {0 and 2, 1}, or {1 and 2, 0}. What I know (please correct me here) is that the chosen split is the one with the highest information gain. I have calculated the information gain for each of the three grouping scenarios:

- {0 + 1, 2} –> 0.17
- {0 + 2, 1} –> 0.18
- {1 + 2, 0} –> 0.004

However, scikit-learn's decision tree chose the first scenario instead of the second (please check the picture). Can anyone help clarify the reason for selecting the first scenario? Is there a priority for splits that result in pure nodes, so that such a scenario is selected even though it has less information gain?

https://preview.redd.it/mkve4teopk641.jpg?width=1319&format=pjpg&auto=webp&s=fe487bedf67bc812d720ae2fe595fc41d9589dda

submitted by /u/elmsha
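The hand calculation described above can be reproduced with a minimal entropy/information-gain sketch. The feature values and stroke labels below are hypothetical toy data (the original dataset is not shown), and the helper functions are illustrative, not scikit-learn's internal implementation:

```python
# Sketch: entropy-based information gain for each binary grouping of a
# 3-category feature, mirroring the hand calculation in the post.
# The data here is hypothetical; substitute the real feature and labels.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels, left_categories):
    """Gain from sending left_categories to one child and the rest to the other."""
    left = [y for x, y in zip(feature, labels) if x in left_categories]
    right = [y for x, y in zip(feature, labels) if x not in left_categories]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy data: feature with categories {0, 1, 2}, binary label.
x = [0, 0, 1, 1, 2, 2, 2, 0]
y = [0, 0, 1, 0, 1, 1, 1, 0]

# Evaluate all three binary groupings, as in the question.
for group in [{0, 1}, {0, 2}, {1, 2}]:
    print(sorted(group), round(information_gain(x, y, group), 3))
```

Note that scikit-learn uses impurity (Gini by default, entropy with `criterion="entropy"`) on numeric thresholds rather than explicit category groupings, so comparing against a calculation like this requires the feature encoding to match what the tree actually sees.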

# Plug yourself into AI and don't miss a beat

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics, and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.