
[D] Temporal coherence in transformers? Why fixed-length inputs in Al-Rfou (2018)?

Why use fixed-length sequences in a transformer? In what way, and why, does this affect the transformer's performance and training? Why didn't they instead use sequences of length <= some number?

Any paper regarding this?
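To make the question concrete, here is a minimal sketch of the fixed-length setup I mean, roughly in the spirit of Al-Rfou et al. (2018): the corpus is cut into disjoint, equally sized segments and each segment is modeled on its own. All names and values here (SEG_LEN, corpus_ids, the commented-out char_lm call) are my own placeholders, not anything from the paper.

```python
import torch

SEG_LEN = 512                                       # fixed segment length (placeholder value)
corpus_ids = torch.randint(0, 256, (1_000_000,))    # stand-in for a long character stream

def fixed_length_segments(ids, seg_len):
    """Split one long token stream into disjoint, equal-length segments.

    Each segment is later fed to the transformer on its own, so no
    information flows across segment boundaries.
    """
    n_segs = ids.numel() // seg_len
    return ids[: n_segs * seg_len].view(n_segs, seg_len)

segments = fixed_length_segments(corpus_ids, SEG_LEN)

for seg in segments:
    inputs, targets = seg[:-1], seg[1:]             # next-token prediction inside the segment
    # logits = char_lm(inputs)                      # hypothetical model call
    # loss = cross_entropy(logits, targets)
    # The model never sees tokens from earlier segments, so any dependency
    # longer than SEG_LEN cannot be learned from this training signal.
    pass
```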

Also, while reading the Transformer-XL paper (Dai et al., 2019), I noticed the authors say,

“We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence”

Why can't we learn dependencies beyond a fixed length with a normal transformer (Vaswani et al.) without disrupting temporal coherence?

I think temporal coherence gets disrupted when the input length becomes comparable to the dimension of the embedding used for a single word/character, because the embedding then doesn't carry enough information to relate that token to everything earlier in the input sequence. Am I right?
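For reference, here is how I currently read the segment-level recurrence behind the quoted claim, as a simplified sketch of my own (not the authors' code): hidden states from the previous segment are cached, detached from the gradient, and concatenated with the current segment's states, so attention can look past the segment boundary without reordering anything.

```python
import torch

def extend_context(prev_mem, curr_hidden):
    """Concatenate cached states from the previous segment with the current
    segment's hidden states along the time axis.

    prev_mem    : (mem_len, d_model) cached, gradient-detached states (or None)
    curr_hidden : (seg_len, d_model) states of the current segment
    """
    if prev_mem is None:
        return curr_hidden
    # stop gradients through the cache, as in segment-level recurrence
    return torch.cat([prev_mem.detach(), curr_hidden], dim=0)

# Toy usage: keys/values would be computed over the extended sequence,
# queries only over the current segment.
d_model, seg_len, mem_len = 16, 4, 4
mem = None
for step in range(3):
    h = torch.randn(seg_len, d_model)       # stand-in for one layer's output
    kv_context = extend_context(mem, h)     # context grows past the segment boundary
    mem = kv_context[-mem_len:]             # keep only the most recent states as cache
    print(step, kv_context.shape)           # (4, 16), then (8, 16), then (8, 16)
```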

submitted by /u/Jeevesh88