
Self-attention

The self-attention mechanism is a defining characteristic of Transformer models. It can be viewed as a graph-like inductive bias: every token in a sequence is connected to every other token, and information is aggregated with a relevance-based pooling operation, as sketched below.
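A minimal sketch of this idea, assuming standard scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, where queries, keys, and values are linear projections of the same token sequence (the function and variable names here are illustrative, not from any particular library):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a token sequence X of shape (n, d_model)."""
    Q = X @ W_q  # queries, shape (n, d_k)
    K = X @ W_k  # keys,    shape (n, d_k)
    V = X @ W_v  # values,  shape (n, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise relevance between all pairs of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # relevance-weighted pooling of values

# toy usage: 4 tokens, model dimension 8
rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 8
X = rng.normal(size=(n, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Because the relevance weights are computed between every pair of tokens, the operation behaves like message passing on a fully connected graph over the sequence.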
