[R] Evolution of representations in the Transformer
A paper that investigates what representations Transformer models actually learn under three training objectives: machine translation, language modelling, and BERT-style masked language modelling.
arXiv: https://arxiv.org/abs/1909.01380 (EMNLP 2019, E. Voita et al.)
Blog post: https://lena-voita.github.io/posts/emnlp19_evolution.html
submitted by /u/justheuristic