[R] FacebookAI releases Adaptive attention span and All-attention layer to reduce computation time / memory footprint
https://video.twimg.com/tweet_video/ECqxU2AU0AAkBwc.mp4
To enable wider use of this powerful deep learning architecture, we propose two new methods. The first, adaptive attention span, is a way to make Transformer networks more efficient for longer sentences. With this method, we were able to increase the attention span of a Transformer to over 8,000 tokens without significantly increasing computation time or memory footprint. The second, all-attention layer, is a way to simplify the model architecture of Transformer networks. Even with a much simpler architecture, our all-attention network matched the state-of-the-art performance of Transformer networks.
https://ai.facebook.com/blog/making-transformer-networks-simpler-and-more-efficient/
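For anyone wondering what "adaptive attention span" might look like in practice, here is a rough sketch. The quoted summary does not give the formula, so this is based on my understanding of the soft masking idea: each head gets a span value, attention weights are multiplied by a mask that is 1 within the span and ramps linearly down to 0 just past it, and the result is renormalized. The function names (`soft_span_mask`, `masked_attention`) and the parameters `span` and `ramp` are made up for illustration, and the span is a fixed number here rather than a learned one. Treat it as a toy under those assumptions, not the released implementation.

```python
import math


def soft_span_mask(distance, span, ramp):
    """Hypothetical soft mask: 1.0 within `span`, linear decay to 0.0 over `ramp` positions.

    distance: how many positions back the attended token is (0 = most recent).
    span:     the attention span for this head (assumed >= 0; fixed in this sketch).
    ramp:     width of the linear falloff (assumed > 0).
    """
    return min(max((ramp + span - distance) / ramp, 0.0), 1.0)


def masked_attention(scores, span, ramp):
    """Softmax over raw scores, reweighted by the soft span mask and renormalized.

    scores[d] is the raw attention score for the token d positions back.
    With span >= 0 the mask at distance 0 is 1.0, so the normalizer is never zero.
    """
    top = max(scores)  # subtract the max for numerical stability
    weighted = [
        soft_span_mask(d, span, ramp) * math.exp(s - top)
        for d, s in enumerate(scores)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Eight past tokens with equal scores; a span of 4 with a ramp of 2.
    for d, w in enumerate(masked_attention([0.0] * 8, span=4, ramp=2)):
        print(d, round(w, 3))
```

Positions whose mask is exactly 0 contribute nothing, which is why a head with a short span would not need to compute or store attention over the full context. That is presumably where the savings in computation time and memory come from.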
submitted by /u/BatmantoshReturns