[D] Temporal coherence in transformers? Why fixed-length inputs in Al-Rfou et al. (2018)?

Written by torontoai on October 16, 2019. Posted in Reddit MachineLearning.

Why use fixed-length sequences in a transformer? In what way, and why, does this affect the transformer's performance and training? Why did they not use sequences of length <= some number?

Any paper regarding this?

Also, in the Transformer-XL paper (Dai et al., 2019), the authors say:

“We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence”

Why can’t a normal transformer (Vaswani et al.) learn dependencies beyond a fixed length without disrupting temporal coherence?
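To make the question concrete, here is a minimal sketch of the setup being asked about. It assumes training text is cut into fixed-length segments with causal attention and no information flow between segments, and compares that with a Transformer-XL-style cache of the previous segment. The function name `visible_context` and its parameters are hypothetical, not taken from either paper, and the count covers only direct attention in a single layer.

```python
def visible_context(num_tokens, segment_len, mem_len=0):
    """Count, for each token, how many earlier tokens it can attend to directly.

    Assumption: causal attention inside a fixed-length segment, plus an
    optional cache of up to `mem_len` tokens that precede the segment.
    """
    counts = []
    for pos in range(num_tokens):
        offset = pos % segment_len        # position inside its own segment
        seg_start = pos - offset          # index where this segment begins
        memory = min(mem_len, seg_start)  # cached tokens available before the segment
        counts.append(offset + memory)
    return counts


# Fixed-length segments only: context resets at every segment boundary.
vanilla = visible_context(num_tokens=8, segment_len=4)
# With a cache of the previous segment: context carries across the boundary.
with_memory = visible_context(num_tokens=8, segment_len=4, mem_len=4)

print(vanilla)      # [0, 1, 2, 3, 0, 1, 2, 3]
print(with_memory)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

In the first case the token at position 4 starts with no context at all, even though it continues the same text. That reset at the boundary is one way to read the "fixed length" limitation in the quote.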

I think temporal coherence gets disturbed when the input length becomes comparable to the size of the embedding used for a single word/character, because the embedding then doesn’t contain enough information to link that word to all the preceding tokens of the input sequence. Am I right?

submitted by /u/Jeevesh88
