[D] End-to-end normalization for deep learning of time series?
Has anyone had any experience with this? I have a rather wide feature set in my time series data, many features with vastly different scales. I’m attempting to batch-train an LSTM autoencoder. So far, I have come to the following conclusions:
- I’m reluctant to use first differences as I don’t want to destroy level information.
- I don’t want to z-score normalize the data before training, as many components are non-stationary.
- I don’t want to use a sliding min-max or sliding z-score either, as that destroys volatility information between subsequent minibatches.
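To make the last point concrete, here's a tiny demo (assumption: NumPy, synthetic data) of how per-window z-scoring maps a calm window and a volatile window onto the same unit-variance scale, erasing the volatility difference between them:

```python
import numpy as np

rng = np.random.default_rng(0)
calm = rng.normal(0.0, 0.1, size=200)      # low-volatility window
volatile = rng.normal(0.0, 5.0, size=200)  # high-volatility window

def zscore(window):
    # Sliding z-score applied independently per window
    return (window - window.mean()) / window.std()

# After per-window normalization, both windows have std ~1.0,
# so the network can no longer tell which regime was volatile.
print(zscore(calm).std(), zscore(volatile).std())
```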
So far, the following thoughts have come to mind:
- Using layer normalization, which yields equal normalization statistics for all features across my minibatch. However, it destroys information about the relative scales of individual features within a given sample.
- Manual z-score operations inside my network: take the resulting statistics, pass them through a linear layer to adapt the dimensionality, and use them to initialize the LSTM hidden state. So far this doesn’t really work, but a variant of it (perhaps concatenating the stats to the LSTM output?) is my current focus.
- This approach seems promising:
- … but somehow, it doesn’t yield good results either. It seems very learning-rate dependent, which is a bit of a bummer.
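For what it's worth, here is a minimal sketch of the second idea above (z-scoring inside the network and routing the removed stats into the LSTM's initial hidden state), assuming PyTorch; the class and layer names are mine, not from any library, and this is just one way to wire it up:

```python
import torch
import torch.nn as nn

class StatsInitLSTMAE(nn.Module):
    """LSTM autoencoder that z-scores each window per feature inside
    forward(), then maps the removed (mean, std) through a linear layer
    to initialize the encoder hidden state, so level and scale
    information isn't simply discarded."""
    def __init__(self, n_features, hidden_size):
        super().__init__()
        # mean and std per feature -> initial hidden state h0
        self.stats_to_h0 = nn.Linear(2 * n_features, hidden_size)
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_features)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True).clamp_min(1e-6)
        z = (x - mean) / std                                # per-window z-score
        stats = torch.cat([mean, std], dim=-1).squeeze(1)   # (batch, 2F)
        h0 = torch.tanh(self.stats_to_h0(stats)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        _, (h_n, _) = self.encoder(z, (h0, c0))
        # Repeat the code vector across time and decode
        rep = h_n.transpose(0, 1).repeat(1, x.size(1), 1)   # (B, T, H)
        dec, _ = self.decoder(rep)
        recon_z = self.out(dec)
        # De-normalize so the reconstruction loss lives in original units
        return recon_z * std + mean

# Usage: reconstruct a batch of 4 windows, 20 steps, 8 features
model = StatsInitLSTMAE(n_features=8, hidden_size=32)
recon = model(torch.randn(4, 20, 8))  # shape (4, 20, 8)
```

De-normalizing at the output (rather than training on z-scored targets) keeps the loss sensitive to each feature's original scale; whether that's desirable probably depends on how imbalanced your feature scales are.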
Any thoughts or successful ideas?