Grouped-Query Attention
Attention series: 11 of 11
Grouped-Query Attention (GQA; Ainslie et al., 2023) is the middle ground between Multi-Head Attention (MHA) and Multi-Query Attention (MQA). Instead of giving every head its own K and V (MHA) or sharing one K and V across all heads (MQA), GQA splits the query heads into groups. Each group shares one set of K and V projections, so the number of K and V heads equals the number of groups.
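The grouping described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `grouped_query_attention`, the tensor layout `(heads, seq_len, head_dim)`, and the choice to assign consecutive query heads to the same group are all assumptions made for the example, and masking, batching, and the learned projection matrices are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, num_groups):
    """Hypothetical helper for illustration.

    q:    (num_heads,  seq_len, head_dim)  one query head per attention head
    k, v: (num_groups, seq_len, head_dim)  one K/V head per group
    """
    num_heads, seq_len, head_dim = q.shape
    assert k.shape[0] == num_groups and v.shape[0] == num_groups
    assert num_heads % num_groups == 0, "heads must divide evenly into groups"
    heads_per_group = num_heads // num_groups

    # Assumption: consecutive query heads form a group, so each K/V head
    # is repeated heads_per_group times along the head axis.
    k_full = np.repeat(k, heads_per_group, axis=0)
    v_full = np.repeat(v, heads_per_group, axis=0)

    # Standard scaled dot-product attention, computed per head.
    scores = q @ k_full.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    return weights @ v_full

rng = np.random.default_rng(0)
num_heads, seq_len, head_dim = 8, 4, 16
q = rng.standard_normal((num_heads, seq_len, head_dim))

# num_groups == num_heads behaves like MHA, num_groups == 1 like MQA.
for num_groups in (8, 2, 1):
    k = rng.standard_normal((num_groups, seq_len, head_dim))
    v = rng.standard_normal((num_groups, seq_len, head_dim))
    out = grouped_query_attention(q, k, v, num_groups)
    print(num_groups, out.shape, "K/V values stored:", k.size + v.size)
```

The output shape is the same in all three settings; only the amount of K and V data changes, shrinking in proportion to the number of groups. Repeating K and V with `np.repeat` keeps the sketch short, though an efficient implementation would typically avoid materializing the copies.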


