Circuit explained: How does a transformer perform compositional generalization
Fig 5
Circuit diagram of the key attention heads.
Green circles mark the attention heads that contribute most to downstream nodes. Green arrows show the flow of contributions from upstream nodes into each attention head. The two main sub-circuits highlighted are the K-circuit and the Q-circuit, both of which lead to the Output Head.
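One way to read the arrows in such a diagram is as additive contributions to an attention score. The sketch below is illustrative only and is not taken from the paper. It assumes that the Q-circuit supplies the query-side input and the K-circuit the key-side input of a single head, and it ignores layer normalization and biases. All names (`q_head_a`, `k_head_a`, `W_Q`, `W_K`) and the random values are hypothetical. Under those assumptions the pre-softmax score is bilinear in the two inputs, so it splits exactly into one term per (Q-side, K-side) pair of upstream nodes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 8, 4

# Hypothetical upstream head outputs written into the residual stream:
# Q-side at the query position, K-side at the key position.
q_upstream = {"q_head_a": rng.normal(size=d_model),
              "q_head_b": rng.normal(size=d_model)}
k_upstream = {"k_head_a": rng.normal(size=d_model),
              "k_head_b": rng.normal(size=d_model)}

# Hypothetical query/key projections of the downstream head.
W_Q = rng.normal(size=(d_model, d_head))
W_K = rng.normal(size=(d_model, d_head))

def attention_logit(x_q, x_k):
    """Pre-softmax attention score for one query/key pair (no LayerNorm, no bias)."""
    return (x_q @ W_Q) @ (x_k @ W_K) / np.sqrt(d_head)

# Full inputs are the sums of the upstream contributions.
x_q = sum(q_upstream.values())
x_k = sum(k_upstream.values())
total = attention_logit(x_q, x_k)

# Because the score is bilinear, it decomposes over pairs of upstream nodes.
contributions = {
    (q_name, k_name): attention_logit(q_vec, k_vec)
    for q_name, q_vec in q_upstream.items()
    for k_name, k_vec in k_upstream.items()
}

for pair, value in contributions.items():
    print(pair, round(float(value), 3))
print("sum of parts:", round(float(sum(contributions.values())), 3),
      "total:", round(float(total), 3))
```

In a real model, layer normalization breaks exact additivity, so a decomposition like this would only be approximate.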