AI by Hand ✍️

2. MHA, MQA, GQA, MoE-A: More Attention!

Frontier AI Drawings by Hand ✍️

Prof. Tom Yeh
Aug 11, 2025

More attention for… better attention ✍️

This week we zoom out to the bigger picture: how attention itself has evolved.

📌 Why this matters:
GQA showing up in GPT-OSS is yet another sign of its wide adoption—mirroring trends we’ve seen in PaLM, LLaMA, and other large-scale models. These are no longer niche optimizations; they’re becoming standard in high-performance architectures.

Drawings

I’ve created four new drawings breaking down the math behind the following (a quick code sketch follows the list):

  1. Multi-Head Attention (MHA) – the original transformer workhorse.

  2. Multi-Query Attention (MQA) – shares keys and values across heads for speed.

  3. Grouped-Query Attention (GQA) – the middle ground between MHA and MQA, used in GPT-OSS and increasingly in other frontier models.

  4. Mixture-of-Experts Attention (MoE-A) – routes attention to specialized experts for scalability.
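The drawings themselves are behind the paywall, but the relationship among the first three variants fits in a few lines of code. The sketch below is mine, not taken from the drawings, and uses hypothetical tensor shapes: a single attention function in which the number of key/value heads selects the variant (as many K/V heads as query heads gives MHA, a single shared K/V head gives MQA, and anything in between gives GQA). MoE-A adds a routing step on top and is not shown here.

```python
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    group_size = n_heads // n_kv_heads           # query heads sharing each K/V head
    # Repeat each K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)   # -> (batch, n_heads, seq, head_dim)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v


# 8 query heads, head dim 64, sequence length 16 (shapes chosen for illustration):
q = torch.randn(1, 8, 16, 64)
kv = lambda n: (torch.randn(1, n, 16, 64), torch.randn(1, n, 16, 64))
out_mha = grouped_query_attention(q, *kv(8))  # MHA: 8 K/V heads, one per query head
out_mqa = grouped_query_attention(q, *kv(1))  # MQA: a single K/V head shared by all query heads
out_gqa = grouped_query_attention(q, *kv(2))  # GQA: 2 K/V heads, each shared by 4 query heads
```

The practical payoff is the KV cache: MQA and GQA store only n_kv_heads key/value heads per token instead of n_heads, which is a big part of why they keep showing up in large-scale models.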

Download

The Frontier AI drawings are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.
