[D] Multi-level data, what is the best approach?

Hi guys,

I’m working on a dataset and having some problems. I hope you can give me your insight.

My objective is to predict customer churn based on incidents. Each incident is related to a contract, which in turn belongs to a client. I need to predict the termination of the contract. The features can be grouped into three categories:

Client: client’s ID and some basic information about them

Contract: the contract’s ID with its specific information and the target ‘In service/Terminated’

Incidents: every entry is an incident related to a contract with information like number of calls, date of creation, last change, incident category

Some clients have up to 10 contracts, some contracts have up to 20 incidents.

What I did is create a fresh table with the contracts only (and client’s information) and I now have to add relevant information for every contract.
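A minimal sketch of that first step, in plain Python so it runs without dependencies (with pandas this would be a single merge on the client ID). The toy records and every field name here (`client_id`, `segment`, `status`, and so on) are assumptions made up for illustration; the real tables are not shown in the post.

```python
# Hypothetical toy data; all field names are assumptions for illustration.
clients = {
    "c1": {"segment": "retail"},
    "c2": {"segment": "business"},
}
contracts = [
    {"contract_id": "k1", "client_id": "c1", "status": "In service"},
    {"contract_id": "k2", "client_id": "c1", "status": "Terminated"},
    {"contract_id": "k3", "client_id": "c2", "status": "In service"},
]


def build_contract_table(contracts, clients):
    """Return one row per contract with the owning client's fields attached."""
    rows = []
    for contract in contracts:
        # A contract whose client is missing simply gets no client fields.
        client_info = clients.get(contract["client_id"], {})
        row = dict(contract)
        # Prefix client fields so they cannot collide with contract fields.
        row.update({f"client_{key}": value for key, value in client_info.items()})
        rows.append(row)
    return rows


contract_table = build_contract_table(contracts, clients)
for row in contract_table:
    print(row)
```

The result keeps the contract as the unit of prediction: one row per contract, with the target (`status` in this sketch) on the same row as the client attributes.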

I found myself cherry-picking ‘relevant’ information for each contract: total incidents, total calls, and the full details of the last incident. I also added higher-level features: the number of contracts the client has, how many of them are terminated, and the total incidents for the client.
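The aggregates listed above could be computed roughly as follows. This is only a sketch on made-up data: the field names (`calls`, `created`, `category`, `status`) are assumptions, and only the last incident's category is kept rather than its full details. One choice here is mine and not from the post: the count of terminated contracts excludes the contract being scored, since including its own status would leak the target into the features.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy data; all field names are assumptions for illustration.
contracts = [
    {"contract_id": "k1", "client_id": "c1", "status": "In service"},
    {"contract_id": "k2", "client_id": "c1", "status": "Terminated"},
    {"contract_id": "k3", "client_id": "c2", "status": "In service"},
]
incidents = [
    {"contract_id": "k1", "calls": 2, "created": date(2019, 1, 5), "category": "billing"},
    {"contract_id": "k1", "calls": 1, "created": date(2019, 3, 9), "category": "outage"},
    {"contract_id": "k2", "calls": 4, "created": date(2019, 2, 1), "category": "billing"},
]


def contract_features(contracts, incidents):
    """Build one feature row per contract from incident- and client-level data."""
    # Group incidents by contract and contracts by client.
    by_contract = defaultdict(list)
    for incident in incidents:
        by_contract[incident["contract_id"]].append(incident)
    by_client = defaultdict(list)
    for contract in contracts:
        by_client[contract["client_id"]].append(contract)

    rows = []
    for contract in contracts:
        own = by_contract[contract["contract_id"]]
        siblings = by_client[contract["client_id"]]
        # Most recent incident, or None when the contract has no incidents.
        last = max(own, key=lambda inc: inc["created"], default=None)
        rows.append({
            "contract_id": contract["contract_id"],
            "n_incidents": len(own),
            "total_calls": sum(inc["calls"] for inc in own),
            "last_category": last["category"] if last else None,
            "client_n_contracts": len(siblings),
            # Assumption: skip the contract itself to avoid leaking its target.
            "client_n_terminated_other": sum(
                s["status"] == "Terminated" for s in siblings if s is not contract
            ),
            "client_n_incidents": sum(
                len(by_contract[s["contract_id"]]) for s in siblings
            ),
        })
    return rows


features = contract_features(contracts, incidents)
for row in features:
    print(row)
```

With pandas the same features would come from a groupby on the contract ID, a second groupby on the client ID, and merges back onto the contract table.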

I feel it’s getting very messy, and I’m still losing A LOT of information by doing this. Is this the only approach I have?

This was supposed to be a machine learning problem, but seriously, there’s nothing about machine learning in it at all. It’s pure data science.

submitted by /u/throwwawwayz123