[R] Generative Modeling with Sparse Transformers
Blog post: https://openai.com/blog/sparse-transformer/
Direct link to PDF: https://d4mucfpksywv.cloudfront.net/Sparse_Transformer/sparse_transformers.pdf
I wonder how this compares to learned "hard" attention models, like the glimpse-based Recurrent Models of Visual Attention ( https://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf )
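For context on the comparison: the Sparse Transformer uses *fixed* sparsity patterns (e.g. the "strided" pattern, where each position attends to a local window plus every stride-th earlier position), whereas the RAM paper *learns* where to attend via reinforcement. A minimal sketch of one such fixed strided mask, assuming a causal setting and a hand-picked stride (this is an illustration, not the authors' implementation):

```python
import numpy as np

def strided_sparse_mask(n, stride):
    """Boolean attention mask for a strided sparse pattern (sketch).

    Each query position i may attend to earlier positions j that are
    either (a) within `stride` of i (local window) or (b) a multiple
    of `stride` away. The paper typically picks stride ~ sqrt(n), so
    each row keeps O(sqrt(n)) entries instead of O(n).
    """
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # causal: only j <= i
            local = (i - j) < stride
            strided = (i - j) % stride == 0
            if local or strided:
                mask[i, j] = True
    return mask
```

By contrast, a hard-attention model would emit a single glimpse location per step and be trained with REINFORCE, since the discrete choice is non-differentiable; the sparse-transformer mask above is fixed, so training stays fully differentiable.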
submitted by /u/rtk25