[D] LSTM with walk-forward validation and data normalization/standardization

Written by torontoai on July 28, 2019. Posted in Reddit MachineLearning.

I’m currently trying to build a multivariate model to predict stock market movements using LSTM. The model is not seq-to-seq, but rather seq-to-one, if that matters.
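To make the seq-to-one setup concrete, here is a minimal NumPy sketch (not from the original post). It slices a multivariate series into fixed-length lookback windows and pairs each window with the single next-step target. The function name `make_seq_to_one`, the `lookback` parameter, and the toy data are illustrative assumptions.

```python
import numpy as np

def make_seq_to_one(features, target, lookback):
    """Hypothetical helper: turn a (T, n_features) array into
    (samples, lookback, n_features) inputs plus one target per sample."""
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])  # the past `lookback` rows
        y.append(target[end])                   # the single next-step target
    return np.array(X), np.array(y)

# Toy data: 10 time steps, 3 features; the target is simply feature 0.
feats = np.arange(30, dtype=float).reshape(10, 3)
tgt = feats[:, 0]
X, y = make_seq_to_one(feats, tgt, lookback=4)
print(X.shape, y.shape)  # (6, 4, 3) (6,)
```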

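I’ve read that walk-forward validation is the ‘gold standard’ for validating time-series forecasts and that standard (shuffled k-fold) cross-validation doesn’t work because of the temporal dependence in the data: shuffled folds let the model train on observations that come after the ones it is tested on.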
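A moving-window walk-forward scheme can be sketched as a generator of index ranges in which every test block comes strictly after its training window. `walk_forward_splits` is a hypothetical helper, not a library function; scikit-learn’s `TimeSeriesSplit` provides a similar expanding-window split, and its `max_train_size` parameter caps the window length.

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) for a fixed-length moving window.

    Each test block immediately follows its training window, and the
    window slides forward by `step` rows (default: one test block)."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = range(start, start + train_size)
        test_idx = range(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += step

for train_idx, test_idx in walk_forward_splits(10, train_size=4, test_size=2):
    print(list(train_idx), list(test_idx))
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```

Setting `step` smaller than `test_size` would make consecutive test blocks overlap.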

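This has some awkward implications for data normalization, because the training window keeps moving and so the normalization statistics change from one fold to the next…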

I’ve long believed that information leakage can spoil a model by producing unrealistically optimistic accuracy/loss estimates. Consequently, I’m careful to do the train-test split first and then use custom transforming pipelines to standardize the data (i.e. fit_transform() on the training set vs. transform() on the test set). How do you overcome this issue? Is it really that big of a deal to split before standardizing?
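The fit_transform()/transform() discipline amounts to computing the mean and standard deviation on the training rows only and reusing them, unchanged, on the test rows. The `Standardizer` class below is a hypothetical, dependency-light stand-in that mirrors the fit/transform split of scikit-learn’s `StandardScaler` (which also uses the population standard deviation). It is a sketch, not that library’s implementation.

```python
import numpy as np

class Standardizer:
    """Hypothetical minimal z-score scaler with separate fit and transform steps."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)                         # population std (ddof=0)
        self.scale_ = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
        return self

    def transform(self, X):
        return (X - self.mean_) / self.scale_

    def fit_transform(self, X):
        return self.fit(X).transform(X)

# Toy data: chronologically ordered, with a large value in the test period.
data = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
train, test = data[:4], data[4:]   # split first, in time order

scaler = Standardizer()
train_z = scaler.fit_transform(train)  # statistics come from training rows only
test_z = scaler.transform(test)        # test rows reuse the training mean/std
print(scaler.mean_)  # [2.5]
```

Fitting the scaler on `data` as a whole would instead pull the mean up to 22.0 because of the test-period value, so the scaled training inputs would already carry information about the future.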

Main question: If you’re using moving-window walk-forward validation, how would you handle the train/test splits and data normalization?
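One common answer, sketched here under the assumption of a fixed-length moving window and plain z-score standardization, is to treat normalization as part of each fold: refit the statistics on that fold’s training window, then apply them unchanged to the test block that follows. The generator name `walk_forward_scaled` and its parameters are hypothetical.

```python
import numpy as np

def walk_forward_scaled(features, train_size, test_size):
    """Yield (train_z, test_z) for each fold of a moving-window walk-forward scheme.

    Mean and std are refit on every fold's own training window only."""
    n = len(features)
    start = 0
    while start + train_size + test_size <= n:
        train = features[start:start + train_size]
        test = features[start + train_size:start + train_size + test_size]
        mean = train.mean(axis=0)
        std = train.std(axis=0)
        std = np.where(std == 0, 1.0, std)  # guard against constant columns
        yield (train - mean) / std, (test - mean) / std
        start += test_size  # slide the window forward by one test block

features = np.arange(20, dtype=float).reshape(10, 2)  # toy data: 10 time steps, 2 features
folds = list(walk_forward_scaled(features, train_size=4, test_size=2))
print(len(folds))  # 3
```

If the LSTM consumes lookback windows, one option is to scale each fold’s raw feature rows this way first and build the windows afterwards. The first test windows then draw their context from the end of the already-scaled training window, which is past data and so does not leak.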

submitted by /u/punknothing