[D] End-to-end normalization for deep learning of time series?

Has anyone had any experience with this? I have a rather wide feature set in my time series data, with many features on vastly different scales. I’m attempting to batch-train an LSTM autoencoder. So far, I have come to the following conclusions:

I’m reluctant to use first differences as I don’t want to destroy level information.
I don’t want to z-score normalize the data before training, as many components are non-stationary.
I don’t want to use sliding min-max or sliding z-score normalization, as that destroys any volatility information between subsequent minibatches.
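
To make the last point concrete, here is a tiny sketch with made-up numbers (the two windows and the `zscore` helper are purely illustrative, not from my data). A calm window and a volatile window become indistinguishable once each is z-scored with its own statistics, so the level and volatility only survive if the removed mean and standard deviation are carried along separately:

```python
from statistics import mean, pstdev

def zscore(window):
    """Z-score a window using its own mean and population std."""
    mu, sigma = mean(window), pstdev(window)
    return [(x - mu) / sigma for x in window], (mu, sigma)

# Two hypothetical windows: same shape, very different level and volatility.
calm = [100.0, 101.0, 99.0, 100.0]
wild = [500.0, 550.0, 450.0, 500.0]

calm_z, calm_stats = zscore(calm)
wild_z, wild_stats = zscore(wild)

# After per-window normalization the two windows look the same to the model.
identical = all(abs(a - b) < 1e-9 for a, b in zip(calm_z, wild_z))
print(identical)               # the normalized windows match
print(calm_stats, wild_stats)  # the information that was thrown away
```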

So far, the following thoughts have come to mind:

Using layer normalization, which normalizes each sample with a single mean and variance computed over all of its features. However, this destroys the relative scale information between individual features in a given sample.
Manual z-score operations inside my network: take the resulting statistics, pass them through a linear layer to adapt the dimensionality, and use them to initialize the LSTM hidden state. So far, this doesn’t really work, but a variant of it (perhaps concatenating the statistics to the LSTM output…?) is my current focus.
This approach seems promising:
- https://arxiv.org/pdf/1902.07892.pdf
… but somehow, it doesn’t yield good results either. It seems very learning-rate dependent, which is a bit of a bummer.
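
For reference, a minimal PyTorch sketch of the second idea above (in-network z-scoring with the statistics used to initialize the LSTM hidden state). The class name `StatsConditionedEncoder`, the `tanh` squashing, the zero cell state, and all sizes are my own assumptions for illustration; this only shows the wiring and is not a claim that the approach works well:

```python
import torch
import torch.nn as nn

class StatsConditionedEncoder(nn.Module):
    """Hypothetical sketch: z-score each window inside the network and
    feed the removed statistics back in via the initial hidden state."""

    def __init__(self, n_features: int, hidden_size: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # Maps [mean, std] per feature (2 * n_features values) to hidden_size.
        self.stats_to_h0 = nn.Linear(2 * n_features, hidden_size)

    def forward(self, x):
        # x: (batch, time, features); statistics are taken over the time axis.
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True, unbiased=False)
        x_norm = (x - mean) / (std + self.eps)

        stats = torch.cat([mean, std], dim=-1).squeeze(1)      # (batch, 2 * features)
        h0 = torch.tanh(self.stats_to_h0(stats)).unsqueeze(0)  # (1, batch, hidden)
        c0 = torch.zeros_like(h0)                              # assumed zero cell state

        out, (h_n, _) = self.lstm(x_norm, (h0, c0))
        return out, h_n[-1], (mean, std)

# Usage with random data on three very different scales.
torch.manual_seed(0)
model = StatsConditionedEncoder(n_features=3, hidden_size=8)
x = torch.randn(4, 20, 3) * torch.tensor([1.0, 100.0, 0.01]) + torch.tensor([0.0, 500.0, 1.0])
out, h_last, (mean, std) = model(x)
print(out.shape, h_last.shape)
```

One caveat with this sketch: the raw means and standard deviations going into the linear layer still live on very different scales, so they would probably need some rescaling of their own (for example a log transform, which is again just an assumption on my part).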

Any thoughts or successful ideas?

submitted by /u/brokenAlgorithm