[D] Transforming the target variable: a bad idea?
Hi Reddit,
I am currently working on a project and would like to hear your ideas about it. Basically, the problem consists of predicting the duration of an event (the target variable is T = d2 - d1). The vast majority of durations are short (less than 10 days), but some are much longer.
A mistake on a short event is a bigger problem than a mistake on a long one (e.g. predicting 10 days instead of 8 is more problematic than predicting 30 instead of 45). For that reason, I wanted to transform the target variable, for instance with a logarithm.
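As a quick sanity check on that example, here is a minimal sketch comparing the two errors on the raw scale and on the log scale. The only inputs are the four numbers from the example above; the helper names (`abs_error`, `log_error`) are made up for illustration.

```python
import math

# (predicted, actual) pairs taken from the example above
cases = {"short": (10, 8), "long": (30, 45)}


def abs_error(pred, actual):
    """Absolute error on the original scale, in days."""
    return abs(pred - actual)


def log_error(pred, actual):
    """Absolute error after taking the log of both values."""
    return abs(math.log(pred) - math.log(actual))


# Map each case to (raw error, log-scale error)
errors = {
    name: (abs_error(p, a), log_error(p, a)) for name, (p, a) in cases.items()
}

for name, (raw, logged) in errors.items():
    print(f"{name}: raw error = {raw}, log-scale error = {logged:.3f}")
```

On the raw scale the long-event error is 7.5 times the short-event one (15 days vs 2). On the log scale the ratio drops to about 1.8 (roughly 0.405 vs 0.223), so the log compresses the gap but does not by itself make the short-event error the larger of the two.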
The problem with predicting log(T) is that the model estimates E[log(T) | X]. When I apply exp to get back to the original scale, my estimate for T becomes exp(E[log(T) | X]). Since exp is convex, Jensen’s inequality gives exp(E[log(T) | X]) ≤ E[exp(log(T)) | X] = E[T | X], so I will in fact underestimate the conditional mean of T, which is something I want to avoid in my case.
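The size of this underestimation can be illustrated with a minimal simulation. This is only a sketch under assumptions: the durations are drawn from a lognormal distribution with arbitrary parameters (`loc=1.5`, `scale=1.0`), there are no features X, and none of this is the real data from the project.

```python
import numpy as np

# Assumption: simulated durations, log(T) ~ Normal(1.5, 1.0). Not real data.
rng = np.random.default_rng(0)
log_t = rng.normal(loc=1.5, scale=1.0, size=100_000)
t = np.exp(log_t)

# Back-transforming the mean of log(T) gives the geometric mean of T
naive_estimate = np.exp(log_t.mean())

# Arithmetic mean of T, i.e. the quantity E[T] one usually wants
mean_t = t.mean()

print(f"exp(mean(log T)) = {naive_estimate:.2f}")
print(f"mean(T)          = {mean_t:.2f}")
print(f"ratio            = {naive_estimate / mean_t:.2f}")
```

For a lognormal variable the theoretical values are exp(mu) for the back-transformed mean and exp(mu + sigma^2 / 2) for E[T], so the gap grows with the variance of log(T). In the sample, the back-transformed value is the geometric mean, which never exceeds the arithmetic mean.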
I see many people transforming their target variable, but I don’t know whether most of them account for the bias this introduces when predictions are transformed back. Are there any common techniques you are aware of to handle this issue? Or could you suggest another approach that would fit my needs?
Thank you very much and have a nice day!
(PS: At least I don’t have any censored data.)
submitted by /u/lazywiing