The Illustrated GPT-2 (Visualizing Transformer Language Models)
This is a new post in which I try to visualize much of what happens inside a trained GPT-2 model. We follow the journey of an input word from its embedding all the way up to the output of the model. I've also included a crude analogy for the query/key/value vectors of self-attention that I hope makes them easier to understand for people starting out with transformer architectures. By the end of the post, we'll have looked at the major weight matrices of a single block, as well as those of the entire model. All feedback and corrections are welcome!
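To make that journey concrete before we dive in, here is a minimal sketch (not part of the original post) that runs a prompt through a trained GPT-2 using the Hugging Face `transformers` library. Token ids go in, pass through the embedding layer and the stacked decoder blocks, and come out as logits: one score per vocabulary word at each position.

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load a trained GPT-2 (small) and its tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Turn the input words into token ids
input_ids = tokenizer("The robot", return_tensors="pt").input_ids

# Forward pass: embedding -> decoder blocks -> output logits
with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# The model's best guess for the next token, read off the last position
next_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_id]))
```

Everything the post visualizes happens inside that single `model(input_ids)` call.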