[D] Transforming the target variable: a bad idea?

Hi Reddit,

I am currently working on a project and I would like to hear your ideas about it. Basically, the problem consists of predicting the duration of an event (the target variable is T = d2 - d1). The vast majority of durations are short (less than 10 days), but some are much longer.

An error on a short event is more costly than an error on a long one (e.g. predicting 10 days instead of 8 is more problematic than predicting 30 instead of 45). I wanted to transform the target variable, for instance with a logarithmic function.

The problem with predicting log(T) is that the model then estimates E[log(T) | X]. Applying exp to get back to the original scale gives exp(E[log(T) | X]) as my estimate of T. Since exp is convex, Jensen's inequality gives exp(E[log(T) | X]) <= E[T | X], so I would systematically underestimate the conditional mean of T, which in my case is something I want to avoid.
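To make the gap concrete, here is a minimal simulation sketch. It assumes T is lognormal (log(T) is normal with mean `mu` and standard deviation `sigma`), which is purely an illustrative assumption and not something I know about my actual data; the function name `jensen_gap` and the parameter values are made up for the example. Under that assumption, exp(E[log(T)]) is the median of T, while the mean is exp(mu + sigma^2 / 2).

```python
import numpy as np


def jensen_gap(mu=1.5, sigma=1.0, n=200_000, seed=0):
    """Compare exp(E[log T]) with E[T] on simulated lognormal durations.

    mu, sigma, n and seed are illustrative values, not estimates from real data.
    """
    rng = np.random.default_rng(seed)
    log_t = rng.normal(mu, sigma, size=n)  # simulated log-durations
    t = np.exp(log_t)                      # durations on the original scale

    naive = np.exp(log_t.mean())           # back-transformed mean of log(T)
    mean_t = t.mean()                      # empirical mean of T
    theory = np.exp(mu + sigma**2 / 2)     # lognormal mean (assumption-specific)
    return naive, mean_t, theory


naive, mean_t, theory = jensen_gap()
print(f"exp(mean(log T)) = {naive:.2f}")
print(f"mean(T)          = {mean_t:.2f}")
print(f"lognormal mean   = {theory:.2f}")
```

The first number should come out clearly below the second, and the size of the gap grows with `sigma`: the more spread out the log-durations are, the more the naive back-transform underestimates the mean.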

I see many people transforming their target variable, but I don't know whether most of them account for the bias this introduces. Are there any common techniques that you are aware of to handle this issue? Or could you suggest another approach that would fit my needs?

Thank you very much and have a nice day!

(PS: At least the good thing is that I don't have any censored data.)

submitted by /u/lazywiing