[D] Multi-level data, what is the best approach?

Written by torontoai on August 18, 2019. Posted in Reddit MachineLearning.

Hi guys,

I’m working on a dataset and having some problems. I hope you can give me your insight.

My objective is to predict customer churn based on incidents. Each incident is related to a contract, which in turn is related to a client, and I need to predict the termination of the contract. The features can be grouped into three categories:

Client: client’s ID and some basic information about them

Contract: contract’s ID with its specific information and the target ‘In service/Terminated’

Incidents: every entry is an incident related to a contract with information like number of calls, date of creation, last change, incident category

Some clients have up to 10 contracts, some contracts have up to 20 incidents.

What I did is create a fresh table with the contracts only (and client’s information) and I now have to add relevant information for every contract.

I couldn’t help but find myself cherry-picking some ‘relevant’ information, like total incidents for the contract, total calls, and the last incident’s full information, as well as higher-level features like the number of contracts the user has, how many of them are terminated, and total incidents for the user.
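For illustration, here is a minimal sketch of the kind of roll-up described above, assuming pandas. The toy tables and every column name (client_id, contract_id, n_calls, created, category, terminated) are hypothetical and not taken from the real dataset. One thing worth noting: counting a client's terminated contracts including the contract being predicted would leak the target, so the sketch counts only the client's other contracts.

```python
import pandas as pd

# Hypothetical toy tables; all column names are assumptions for illustration.
clients = pd.DataFrame({"client_id": [1, 2], "segment": ["A", "B"]})
contracts = pd.DataFrame({
    "contract_id": [10, 11, 20],
    "client_id": [1, 1, 2],
    "terminated": [1, 0, 0],  # target: 1 = Terminated, 0 = In service
})
incidents = pd.DataFrame({
    "incident_id": [100, 101, 102, 103],
    "contract_id": [10, 10, 11, 20],
    "n_calls": [2, 1, 4, 3],
    "created": pd.to_datetime(
        ["2019-01-05", "2019-03-01", "2019-02-10", "2019-04-20"]
    ),
    "category": ["billing", "outage", "billing", "outage"],
})

# Contract-level aggregates over incidents.
per_contract = (
    incidents.groupby("contract_id")
    .agg(
        n_incidents=("incident_id", "count"),
        total_calls=("n_calls", "sum"),
        last_incident=("created", "max"),
    )
    .reset_index()
)

# Category of the most recent incident per contract.
last_incident = (
    incidents.sort_values("created")
    .groupby("contract_id")
    .tail(1)[["contract_id", "category"]]
    .rename(columns={"category": "last_category"})
)

# Client-level aggregates over contracts.
per_client = (
    contracts.groupby("client_id")
    .agg(n_contracts=("contract_id", "count"), n_terminated=("terminated", "sum"))
    .reset_index()
)

# One row per contract, with client info and both levels of aggregates.
features = (
    contracts.merge(clients, on="client_id", how="left")
    .merge(per_contract, on="contract_id", how="left")
    .merge(last_incident, on="contract_id", how="left")
    .merge(per_client, on="client_id", how="left")
)

# Contracts without incidents would get NaN counts after the left merge.
count_cols = ["n_incidents", "total_calls"]
features[count_cols] = features[count_cols].fillna(0)

# Exclude the contract's own label from the client-level count (avoids leakage).
features["n_other_terminated"] = features["n_terminated"] - features["terminated"]
features = features.drop(columns="n_terminated")

# Total incidents across all of the client's contracts.
features["client_incidents"] = features.groupby("client_id")["n_incidents"].transform("sum")
```

The same pattern extends to other summaries (for example per-category incident counts), but it is still hand-picked aggregation, so it does not by itself address the information loss mentioned in this post.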

I feel it’s getting very messy and I’m still losing A LOT of information by doing this. Is it the only approach I have?

This was supposed to be a machine learning problem, but seriously, there’s nothing about machine learning in it at all; it’s pure data science.

submitted by /u/throwwawwayz123