Self Attention (Shared KV)
Attention series: 6 of 11
This is a single attention head that receives its Key and Value from another head instead of computing them itself. It is the building block of Multi-Query Attention. Only the Query projection (Wq) is unique to this head. K and V are computed once elsewhere and reused here, so the KV cache holds one set of K and V rather than one set per head.
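To make the data flow concrete, here is a minimal NumPy sketch of such a head. It is an illustration under stated assumptions, not the code behind the diagram: the function name `shared_kv_head`, the toy dimensions, the random weights, and the absence of a causal mask are all choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_kv_head(x, Wq, K, V):
    """One attention head that owns only Wq; K and V are passed in (hypothetical helper)."""
    Q = x @ Wq                                  # (seq_len, d_head), unique to this head
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (seq_len, seq_len), scaled dot product
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return weights @ V, weights                 # output is (seq_len, d_head)

# Toy sizes, assumed for illustration only.
rng = np.random.default_rng(0)
seq_len, d_model, d_head, n_heads = 4, 8, 4, 3
x = rng.normal(size=(seq_len, d_model))

# K and V are computed once, "elsewhere", and then shared.
Wk = rng.normal(size=(d_model, d_head))
Wv = rng.normal(size=(d_model, d_head))
K = x @ Wk
V = x @ Wv

# Each head brings its own query projection and reuses the same K and V.
Wqs = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
outputs = [shared_kv_head(x, Wq, K, V)[0] for Wq in Wqs]

# Cache comparison: one shared K/V set versus one set per head.
shared_cache_floats = K.size + V.size
per_head_cache_floats = n_heads * (K.size + V.size)
```

With these toy sizes the shared cache holds `n_heads` times fewer values than a per-head cache would, while each head still produces its own output because its queries differ.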


