[D] LSTM with walk-forward validation and data normalization/standardization

I’m currently trying to build a multivariate model to predict stock market movements using an LSTM. The model is not seq-to-seq, but rather seq-to-one, if that matters.
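For readers unsure what seq-to-one means in practice, here is a minimal sketch of how a multivariate series might be cut into (window, next value) pairs. The function name `make_windows`, the `lookback` parameter, and the toy data are assumptions for illustration only; the post does not say how the inputs are actually built.

```python
import numpy as np

def make_windows(features, target, lookback):
    """Build seq-to-one samples: `lookback` past rows -> one future target value."""
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])  # rows end-lookback .. end-1
        y.append(target[end])                   # the value right after the window
    return np.asarray(X), np.asarray(y)

# Toy data (assumed): 10 time steps, 2 features; predict the first feature.
features = np.arange(20, dtype=float).reshape(10, 2)
target = features[:, 0]

X, y = make_windows(features, target, lookback=3)
# X has shape (samples, lookback, n_features), the usual LSTM input layout.
```

Each sample only looks backwards in time, so the target never appears inside its own input window.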

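I’ve read that walk-forward validation is the ‘gold standard’ for validation in time-series forecasting and that ordinary (shuffled) cross-validation doesn’t work because of the temporal dependence in the data: random folds let the model train on observations that come after the ones it is tested on.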

This creates some weird implications for data normalization…

I’ve firmly held the belief that information leakage can spoil a model by producing unrealistically good accuracy/loss during evaluation. Consequently, I’m pretty careful to train-test split first and then use custom transformer pipelines to standardize the data (i.e. fit_transform() on the training set vs. transform() on the test set). How do you overcome this issue? Is it really that big of a deal to split before standardization?
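To make the leakage concern concrete, here is a small sketch in plain NumPy that mimics the fit_transform() / transform() pattern by hand. The trending toy series and the 80/20 split point are assumptions chosen for illustration, not anything from a real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy feature with an upward trend, so later values differ from earlier ones.
series = np.linspace(0.0, 10.0, 100) + rng.normal(0.0, 0.1, 100)

split = 80  # chronological split: first 80 points train, last 20 test
train, test = series[:split], series[split:]

# Leak-free: statistics come from the training portion only (the "fit" step).
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma   # plays the role of fit_transform() on train
test_scaled = (test - mu) / sigma     # plays the role of transform() on test

# Leaky: statistics computed on the full series, including the test period.
leaky_mu, leaky_sigma = series.mean(), series.std()
leaky_test_scaled = (test - leaky_mu) / leaky_sigma

print("test mean, train-only stats:", test_scaled.mean())
print("test mean, full-series stats:", leaky_test_scaled.mean())
```

With a trending series the two versions of the scaled test set differ noticeably, because the full-series mean and standard deviation already encode where the test period ends up. How much this matters in practice depends on how non-stationary the features are.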

Main question: If you’re using a moving-window walk-forward validation, how would you handle train/test data splits and data normalization?
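One possible way to approach this, sketched under assumptions: generate the moving-window splits first, then recompute the normalization statistics inside every fold from that fold's training window only. The helper names `walk_forward_splits` and `scale_fold`, the window sizes, and the random-walk data are all hypothetical; this is an illustration rather than a recommended configuration.

```python
import numpy as np

def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) for a moving, fixed-length training window."""
    step = step or test_size
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += step

def scale_fold(X, train_idx, test_idx):
    """Standardize one fold using statistics from its training window only."""
    mu = X[train_idx].mean(axis=0)
    sigma = X[train_idx].std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard against constant columns
    return (X[train_idx] - mu) / sigma, (X[test_idx] - mu) / sigma

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3)).cumsum(axis=0)  # assumed toy random-walk features

folds = []
for train_idx, test_idx in walk_forward_splits(len(X), train_size=200, test_size=50):
    X_train, X_test = scale_fold(X, train_idx, test_idx)
    folds.append((train_idx, test_idx, X_train, X_test))
    # Fit the model on X_train and evaluate on X_test here.
```

Because the scaler is refit per fold, the test window is always transformed with parameters the model could have known at that point in time. The cost is that scaled values are not directly comparable across folds, which is usually acceptable when the model is also retrained per fold.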

submitted by /u/punknothing



Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.