AI by Hand ✍️


Qwen 3

Frontier AI Drawings: 13 of 13

Prof. Tom Yeh
Oct 14, 2025

Frontier AI Drawings: the series

  1. "Expert Choice" Mixture of Experts (MoE)

  2. MHA, MQA, GQA, MoE-A: More Attention!

  3. New GPT-OSS Trick to Ignore Tokens

  4. MXFP4, FP4, FP8

  5. LoRA, Fine-Tune, Pre-Train

  6. QLoRA, DoRA, BitFit, NF4 vs INT4

  7. KV Cache, Prefill, Decode

  8. EmbeddingGemma, MRL, InfoNCE, Embed vs. Decode

  9. Inference Batching, Request-vs-Token Level

  10. MLP Parallelism: Data, Context, Row, Column, Pipeline

  11. RoPE vs PE in QKV Self-Attention

  12. RMS, Group, Layer, Batch Norm, Tensor Parallelism

  13. Qwen 3

I know many of you have been waiting for this: the full, step-by-step breakdown of Qwen 3, presented in the AI by Hand ✍️ way so you can see every tensor and how they all click together.

Why Qwen 3?

I chose to break down Qwen 3 because it brings together nearly every frontier concept we’ve studied so far, from RoPE and RMSNorm to sparse MoE. It also sets the stage for Qwen 3-Next, which adds a state-space model layer, blending two powerful paradigms.

This is where the series comes together. Concepts from the twelve articles before this one show up throughout Qwen 3: RoPE for positional encoding, Grouped-Query Attention (GQA) for efficient multi-head attention, RMSNorm for normalization, and sparse MoE to activate only a fraction of parameters per token.
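To make "activate only a fraction of parameters per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration under stated assumptions, not Qwen 3's actual code: the function names (`top_k_route`, `moe_forward`), the toy experts, and the choice to renormalize the selected weights so they sum to 1 are all hypothetical, and the router logits are passed in directly rather than computed by a learned linear layer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Renormalizing over the selected experts is an assumption of this sketch.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the k selected experts for one token."""
    weights = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for i, w in weights.items():
        y = experts[i](x)  # only the selected experts are ever called
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, weights

# Toy experts (hypothetical): each one just scales the token vector.
toy_experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
token_out, token_weights = moe_forward([1.0, 1.0], toy_experts, [0.1, 2.0, -1.0, 1.5], k=2)
print(sorted(token_weights))  # indices of the experts that ran for this token
```

With four experts and `k=2`, only half of the expert parameters are touched for this token, which is the point of sparse routing: total parameter count grows with the number of experts while per-token compute grows only with `k`.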

I know some of you work in infrastructure, some in training systems, others on model internals. I’ve heard from you that these drawings have become your shortcut to clarity. You’ve told me they help you skip the paper grind, parse the math faster, and get straight to math-accurate implementation. One of you mentioned using the worksheets to explain to colleagues the importance of parameter-efficient fine-tuning, a perfect example of how these materials are finding life beyond just individual study. I’m genuinely glad to hear that, because that’s exactly why I created this series.

Drawings

In this week’s full Qwen 3 breakdown, I created the following new drawings:

  • Two-page spread of the full Qwen 3 architecture for quick orientation

  • Mini-map that previews each section and how it flows

  • Zoomed-in panels with worksheets that walk through each section of the architecture

Section-by-section detail:

  • Input → Embedding

  • Attention: Grouped-Query Attention + RoPE

  • Mixture of Experts (MoE)

  • Output stack: RMSNorm → Linear → Softmax → Sample

Each panel shows the exact tensors, shapes, and operations.
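As a rough companion to the output stack listed above (RMSNorm → Linear → Softmax → Sample), the sketch below walks one hidden vector through those four steps in plain Python. The shapes, the `eps` value, the weight values, and all function names are assumptions chosen for illustration; they are not taken from the drawings or from Qwen 3's configuration.

```python
import math
import random

def rms_norm(x, gain, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

def linear(x, weight):
    """Project to vocabulary logits; weight has one row of length len(x) per token id."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    """Draw one token id according to the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def output_stack(hidden, gain, lm_head, rng):
    """RMSNorm -> Linear -> Softmax -> Sample for a single position."""
    normed = rms_norm(hidden, gain)
    logits = linear(normed, lm_head)
    probs = softmax(logits)
    return sample(probs, rng), probs

# Toy sizes (hypothetical): hidden size 4, vocabulary of 3 tokens.
hidden = [1.0, -2.0, 3.0, 0.5]
gain = [1.0, 1.0, 1.0, 1.0]
lm_head = [
    [0.2, 0.0, 0.1, -0.3],
    [-0.1, 0.4, 0.0, 0.2],
    [0.3, -0.2, 0.5, 0.1],
]
token_id, probs = output_stack(hidden, gain, lm_head, random.Random(0))
print(token_id, [round(p, 3) for p in probs])
```

Passing a seeded `random.Random` keeps the sampling step reproducible, which helps when checking a hand-worked worksheet against code.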

