[D] LSTM with walk-forward validation and data normalization/standardization

I’m currently trying to build a multivariate model to predict stock market movements using an LSTM. The model is seq-to-one rather than seq-to-seq, if that matters.
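For reference, the shape I have in mind is roughly this (a minimal Keras sketch; the layer sizes, window length, and feature count are placeholders, not my actual setup):

    import tensorflow as tf

    timesteps, n_features = 30, 6  # placeholder window length / feature count

    # Seq-to-one: the LSTM reads a whole window of timesteps and emits a
    # single prediction, rather than one output per time step.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(timesteps, n_features)),
        tf.keras.layers.LSTM(32),  # return_sequences=False by default, so one vector out
        tf.keras.layers.Dense(1),  # one scalar prediction per input window
    ])
    model.compile(optimizer="adam", loss="mse")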

I’ve read that walk-forward validation is the ‘gold standard’ for validating time-series forecasts, and that ordinary k-fold cross-validation doesn’t work because shuffling observations into folds breaks the temporal ordering of the data.
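For concreteness, this is the kind of splitting I mean (a sketch using sklearn’s TimeSeriesSplit on placeholder data; passing max_train_size would turn the expanding window into a moving one):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.random.randn(500, 6)  # placeholder: 500 time steps, 6 features

    # Each fold trains on a contiguous prefix and validates on the block
    # that immediately follows it, so the model never sees the future.
    tscv = TimeSeriesSplit(n_splits=5)
    for train_idx, test_idx in tscv.split(X):
        print(f"train [0, {train_idx[-1]}]  ->  test [{test_idx[0]}, {test_idx[-1]}]")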

This has some awkward implications for data normalization…

I’ve firmly held the belief that information leakage can spoil a model by inflating its in-sample accuracy/loss. Consequently, I’m careful to do the train-test split first and only then standardize the data with custom transformer pipelines (i.e. fit_transform() on train vs. transform() on test). How do you handle this issue? Is it really that big of a deal to split before standardizing?
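For clarity, the single-split pattern I’m describing looks like this (a sketch; StandardScaler stands in for my custom pipeline):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.random.randn(500, 6)  # placeholder multivariate series
    split = 400                  # chronological split point, no shuffling
    X_train, X_test = X[:split], X[split:]

    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)  # statistics come from train only
    X_test_s = scaler.transform(X_test)        # test is scaled with train statistics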

Main question: If you’re using moving-window walk-forward validation, how would you handle the train/test splits and the data normalization?
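One pattern I can imagine is re-fitting the scaler inside every window, something like this (a sketch; max_train_size gives TimeSeriesSplit a fixed-size moving window, and the LSTM training step is elided):

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit
    from sklearn.preprocessing import StandardScaler

    X = np.random.randn(500, 6)  # placeholder multivariate series
    y = np.random.randn(500)     # placeholder targets

    # max_train_size caps the training window: moving rather than expanding.
    tscv = TimeSeriesSplit(n_splits=5, max_train_size=250)
    for train_idx, test_idx in tscv.split(X):
        scaler = StandardScaler()                  # fresh scaler for every window
        X_tr = scaler.fit_transform(X[train_idx])  # fit on this window's train data only
        X_te = scaler.transform(X[test_idx])       # never borrow statistics from the future
        # ... train the LSTM on (X_tr, y[train_idx]), evaluate on (X_te, y[test_idx]) ...

Is re-fitting the scaler per window like this the standard approach, or is there a better way?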

submitted by /u/punknothing