[D] Multi-level data, what is the best approach?

Hi guys,

I’m working on a dataset and having some problems. I hope you can give me your insight.

My objective is to predict customer churn based on incidents. Each incident is related to a contract, which in turn is related to a client. I need to predict the termination of the contract. The features can be grouped into three categories:

Client: client’s ID and some basic information about them

Contract: contract’s ID with their specific information and the target ‘In service/Terminated’

Incidents: every entry is an incident related to a contract with information like number of calls, date of creation, last change, incident category

Some clients have up to 10 contracts, some contracts have up to 20 incidents.

What I did was create a new table with one row per contract (plus the client’s information), and I now have to add relevant information to every contract.

I couldn’t help but find myself cherry-picking some ‘relevant’ information, like total incidents for the contract, total calls, and the last incident’s full information, plus higher-level features like the number of contracts the client has, how many of them are terminated, and total incidents for the client.
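For illustration, the kind of roll-up described above could be sketched in pandas roughly as follows. This is only a minimal sketch: the toy data, the column names (`contract_id`, `client_id`, `terminated`, `n_calls`, `created`, `category`) and the helper `build_contract_features` are all assumptions, not the actual schema.

```python
import pandas as pd

# Hypothetical toy tables; every column name here is an assumption.
contracts = pd.DataFrame({
    "contract_id": [1, 2, 3],
    "client_id": [10, 10, 20],
    "terminated": [1, 0, 0],  # target: 1 = Terminated, 0 = In service
})
incidents = pd.DataFrame({
    "contract_id": [1, 1, 2, 3],
    "n_calls": [2, 3, 1, 4],
    "created": pd.to_datetime(
        ["2020-01-05", "2020-02-10", "2020-03-01", "2020-01-20"]
    ),
    "category": ["billing", "outage", "billing", "outage"],
})


def build_contract_features(contracts, incidents):
    """Return one row per contract with incident and client roll-ups."""
    # Contract level: number of incidents and total calls.
    per_contract = (
        incidents.groupby("contract_id")
        .agg(total_incidents=("n_calls", "size"),
             total_calls=("n_calls", "sum"))
        .reset_index()
    )
    # Contract level: details of the most recent incident.
    last = (
        incidents.sort_values("created")
        .groupby("contract_id")
        .tail(1)[["contract_id", "category", "created"]]
        .rename(columns={"category": "last_category",
                         "created": "last_created"})
    )
    feats = (
        contracts.merge(per_contract, on="contract_id", how="left")
        .merge(last, on="contract_id", how="left")
    )
    # Contracts without any incident get zero counts.
    counts = ["total_incidents", "total_calls"]
    feats[counts] = feats[counts].fillna(0).astype(int)

    # Client level: broadcast per-client totals back onto each contract row.
    by_client = feats.groupby("client_id")
    feats["client_n_contracts"] = by_client["contract_id"].transform("count")
    feats["client_n_terminated"] = by_client["terminated"].transform("sum")
    feats["client_total_incidents"] = (
        by_client["total_incidents"].transform("sum")
    )
    return feats


features = build_contract_features(contracts, incidents)
print(features)
```

Note that `client_n_terminated` as computed here includes each contract’s own target value, so for actual training it would probably need to exclude the row’s own contract to avoid leaking the label.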

I feel it’s getting very messy, and I’m still losing A LOT of information by doing this. Is this the only approach I have?

This was supposed to be a machine learning problem, but seriously, there’s nothing about machine learning in it at all; it’s pure data science.

submitted by /u/throwwawwayz123

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.