I’m currently trying to build a multivariate LSTM model to predict stock market movements. The model is seq-to-one rather than seq-to-seq, if that matters.
I’ve read that walk-forward validation is the ‘gold standard’ for validating time-series forecasts, and that ordinary k-fold cross-validation doesn’t work because it ignores the temporal ordering of the data.
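For clarity, this is the sort of split I mean (a toy sklearn TimeSeriesSplit example with made-up data; my real setup is an LSTM, which isn’t shown here): every test block comes strictly after its training block, instead of the folds being shuffled the way vanilla k-fold would do.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy series: 20 time steps of 3 features (placeholder for the real data).
X = np.arange(60).reshape(20, 3)

# TimeSeriesSplit produces expanding-window walk-forward folds:
# training indices always precede the test indices, so temporal order is preserved.
tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f"fold {fold}: train {train_idx[0]}-{train_idx[-1]}, "
          f"test {test_idx[0]}-{test_idx[-1]}")
```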
This creates some weird implications for data normalization…
I’ve firmly held the belief that information leakage can spoil a model by producing unrealistically good in-sample accuracy/loss. Consequently, I’m careful to do the train-test split first and then standardize with a custom transformer pipeline (i.e. fit_transform() on the training set, transform() only on the test set). How do you overcome this issue? Is it really that big of a deal to split before standardizing?
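For reference, this is roughly the pattern I’m using now (sklearn’s StandardScaler; the array shapes and split point are made up): the scaler only ever sees the training rows, and the test rows are transformed with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical multivariate series: rows are time steps, columns are features.
X = np.random.randn(500, 6)
split = 400                      # chronological split point, no shuffling

X_train, X_test = X[:split], X[split:]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # statistics come from train only
X_test_scaled = scaler.transform(X_test)         # test reuses the train mean/std
```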
Main question: If you’re using moving-window walk-forward validation, how would you handle the train/test splits and the data normalization?
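To make the question concrete, this is the kind of loop I have in mind (sklearn’s TimeSeriesSplit with max_train_size as the moving window; the data, shapes, and model step are placeholders): the scaler is refit from scratch on each training window, and the LSTM would be retrained or updated per fold. Is there a better way to structure this?

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TimeSeriesSplit

# Made-up multivariate series and a stand-in target (not the real data).
X = np.random.randn(1000, 6)
y = np.random.randn(1000)

# max_train_size turns the expanding window into a fixed-length moving window.
tscv = TimeSeriesSplit(n_splits=5, max_train_size=300)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    # Refit the scaler inside every fold so test statistics never leak in.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])
    y_train, y_test = y[train_idx], y[test_idx]

    # Model training/evaluation would go here; for an LSTM the scaled arrays
    # still need reshaping into (samples, window_length, features).
    print(f"fold {fold}: train {train_idx[0]}-{train_idx[-1]}, "
          f"test {test_idx[0]}-{test_idx[-1]}")
```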
submitted by /u/punknothing