[R] FacebookAI releases Adaptive attention span and All-attention layer to reduce computation time / memory footprint

Written by torontoai on August 23, 2019. Posted in Reddit MachineLearning.

https://video.twimg.com/tweet_video/ECqxU2AU0AAkBwc.mp4

To enable wider use of the Transformer, a powerful deep learning architecture, we propose two new methods. The first, adaptive attention span, makes Transformer networks more efficient on longer sequences. With this method, we were able to increase the attention span of a Transformer to over 8,000 tokens without significantly increasing computation time or memory footprint. The second, the all-attention layer, simplifies the architecture of Transformer networks. Even with a much simpler architecture, our all-attention network matched the state-of-the-art performance of Transformer networks.
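The post only names the two techniques. The Python sketch below is a minimal, single-head illustration that assumes the formulations from the accompanying papers. Adaptive span multiplies each attention weight by a soft mask that equals 1 up to a span `z` and ramps linearly to 0 over `ramp` further positions, so keys beyond `z + ramp` contribute nothing and need not be computed. The all-attention layer appends persistent key/value vectors, which are learned and identical at every position, to the context, so the attention sublayer also takes over the role of the feed-forward sublayer. The function names, the pure-Python lists, and the fixed numbers are illustrative assumptions, not Facebook's released code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# --- Adaptive attention span (sketch) ----------------------------------
def span_mask(distance, z, ramp):
    """Soft mask: 1 for distances up to z, linear ramp down to 0 over the
    next `ramp` positions, 0 beyond z + ramp. In a real model z would be a
    learned parameter per attention head; here it is just a number."""
    return min(max((ramp + z - distance) / ramp, 0.0), 1.0)

def adaptive_span_weights(scores, z, ramp):
    """Attention weights for one query. scores[d] is the raw score of the
    key d positions in the past (d = 0 is the current token). Each
    exp(score) is scaled by the soft mask before normalisation."""
    top = max(scores)
    masked = [span_mask(d, z, ramp) * math.exp(s - top)
              for d, s in enumerate(scores)]
    total = sum(masked)  # > 0 when z >= 0, because span_mask(0, z, ramp) == 1
    return [m / total for m in masked]

# --- All-attention layer (sketch) --------------------------------------
def all_attention(query, ctx_keys, ctx_values, mem_keys, mem_values):
    """Scaled dot-product attention over the context plus persistent
    memory. mem_keys / mem_values stand in for learned vectors shared by
    all positions (hypothetical placeholders, not trained here)."""
    keys = ctx_keys + mem_keys
    values = ctx_values + mem_values
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

if __name__ == "__main__":
    w = adaptive_span_weights([0.0] * 10, z=3, ramp=2)
    print(w[5:])  # distances 5 and beyond are masked out: [0.0, 0.0, 0.0, 0.0, 0.0]
    # A zero query attends uniformly to one context vector and one memory vector.
    print(all_attention([0.0, 0.0], [[1.0, 0.0]], [[1.0, 0.0]],
                        [[0.0, 1.0]], [[0.0, 1.0]]))  # [0.5, 0.5]
```

With `z=3` and `ramp=2`, distances 0 to 3 keep full weight, distance 4 is half-weighted, and everything from distance 5 onward is dropped. This is where the savings come from: keys that can never receive weight do not have to be stored or scored. In the papers' setup the span is trained with a penalty that keeps it short unless longer context helps.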

https://ai.facebook.com/blog/making-transformer-networks-simpler-and-more-efficient/

submitted by /u/BatmantoshReturns
