AI by Hand ✍️

Qwen 3

Frontier Model Math by hand ✍️

Prof. Tom Yeh
Oct 14, 2025

(P.S. This issue is written for advanced AI engineers and researchers. It is part of the premium Frontier subscription. If you are a beginner, please check out my free lectures, walkthroughs, and Excel exercises. I also share regular announcements of free learning opportunities.)

I know many of you have been waiting for this: the full, step-by-step breakdown of Qwen3—presented in the AI by Hand ✍️ way so you can see every tensor and how they click together.

Why Qwen 3?

I chose to break down Qwen 3 because it brings together nearly every frontier concept we’ve studied so far, from RoPE and RMSNorm to sparse MoE. It also sets the stage for Qwen 3-Next, which adds a state-space model layer alongside attention, blending two powerful paradigms.

This issue is the culmination of the past three months — a full-circle moment where everything we’ve covered so far comes together into one complete picture.

Here’s a review of the first 12 issues:

  1. “Expert Choice” Mixture of Experts (MoE)

  2. New GPT-OSS Trick to Ignore Tokens

  3. MHA, MQA, GQA, MoE-A: More Attention!

  4. MXFP4, FP4, FP8

  5. LoRA, Fine-Tune, Pre-Train

  6. QLoRA, DoRA, BitFit, NF4 vs INT4

  7. KV Cache, Prefill, Decode

  8. EmbeddingGemma, MRL, InfoNCE, Embed vs. Decode

  9. Inference Batching, Request-vs-Token Level

  10. MLP Parallelism: Data, Context, Row, Column, Pipeline

  11. RoPE vs PE in QKV Self-Attention

  12. RMS, Group, Layer, Batch Norm, Tensor Parallelism

First 12 issues of the AI by Hand ✍️ Frontier Subscription (100+ new worksheets)

I know some of you work in infrastructure, some in training systems, others on model internals — and I’ve heard from you that these worksheets have become your shortcut to clarity. You’ve told me they help you skip the paper grind, parse the math faster, and get straight to math-accurate implementation. One of you mentioned using the worksheets to explain to colleagues the importance of parameter-efficient fine-tuning — a perfect example of how these materials are finding life beyond just individual study. I’m genuinely glad to hear that, because that’s exactly why Frontier exists.

Worksheets

In this week’s full Qwen3 breakdown, I created the following new worksheets:

  • Two-page spread of the full Qwen3 architecture for quick orientation

  • Mini-map that previews each section and how it flows

  • Zoomed-in panels with worksheets that walk through each section of the architecture

Section-by-section detail:

  • Input → Embedding

  • Attention: Grouped-Query Attention (GQA) + RoPE

  • Mixture of Experts (MoE):

    • Router

    • Token Routing → Expand → Gate

    • Value → Gated Value → Contract → Mixture → Add

  • Output stack: RMSNorm → Linear → Softmax → Sample

Each panel shows the exact tensors, shapes, and operations.
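If you want to check the shapes in code as well, here is a minimal NumPy sketch that follows the section-by-section flow above for one toy decoder block plus the output stack. It is not Qwen3’s implementation: the sizes are tiny, the weights are random, and the helper names (`rms_norm`, `rope`, `gqa`, `moe`, `forward`) are hypothetical. It assumes the rotate-half RoPE convention, SwiGLU-style experts, and top-k routing whose selected weights are renormalized to sum to 1. The way I map Expand, Gate, Value, Gated Value, Contract, and Mixture onto the expert computation is also my reading, not something taken from the worksheets. Real Qwen3 models stack many such blocks and include details left out here, such as RMSNorm applied to queries and keys (QK-Norm).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (illustrative only; real Qwen3 configs are far larger)
VOCAB, D, N_Q, N_KV, HD = 50, 16, 4, 2, 4   # D == N_Q * HD
N_EXP, TOP_K, D_FF = 8, 2, 32


def rms_norm(x, w, eps=1e-6):
    # Divide each row by its root-mean-square, then apply a learned scale
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * w


def rope(x, base=10000.0):
    # x: (heads, seq, HD). Assumption: rotate-half RoPE convention, toy base
    seq, hd = x.shape[-2], x.shape[-1]
    inv_freq = base ** (-np.arange(0, hd, 2) / hd)                # (HD/2,)
    ang = np.arange(seq)[:, None] * inv_freq[None, :]              # (seq, HD/2)
    cos = np.concatenate([np.cos(ang)] * 2, axis=-1)               # (seq, HD)
    sin = np.concatenate([np.sin(ang)] * 2, axis=-1)
    x1, x2 = x[..., : hd // 2], x[..., hd // 2 :]
    return x * cos + np.concatenate([-x2, x1], axis=-1) * sin


def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def silu(z):
    return z / (1.0 + np.exp(-z))


def init(*shape):
    # Random weights stand in for trained parameters
    return rng.normal(0.0, 0.1, shape)


P = {
    "emb": init(VOCAB, D), "n1": np.ones(D), "n2": np.ones(D), "nf": np.ones(D),
    "wq": init(D, N_Q * HD), "wk": init(D, N_KV * HD), "wv": init(D, N_KV * HD),
    "wo": init(N_Q * HD, D), "router": init(D, N_EXP),
    "w_gate": init(N_EXP, D, D_FF), "w_up": init(N_EXP, D, D_FF),
    "w_down": init(N_EXP, D_FF, D), "lm_head": init(D, VOCAB),
}


def gqa(x):
    # Grouped-query attention: each KV head is shared by N_Q // N_KV query heads
    s = x.shape[0]
    q = (x @ P["wq"]).reshape(s, N_Q, HD).transpose(1, 0, 2)      # (N_Q, s, HD)
    k = (x @ P["wk"]).reshape(s, N_KV, HD).transpose(1, 0, 2)     # (N_KV, s, HD)
    v = (x @ P["wv"]).reshape(s, N_KV, HD).transpose(1, 0, 2)
    q, k = rope(q), rope(k)
    k = np.repeat(k, N_Q // N_KV, axis=0)                          # (N_Q, s, HD)
    v = np.repeat(v, N_Q // N_KV, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HD)                # (N_Q, s, s)
    causal = np.tril(np.ones((s, s), dtype=bool))
    attn = softmax(np.where(causal, scores, -np.inf))
    return (attn @ v).transpose(1, 0, 2).reshape(s, N_Q * HD) @ P["wo"]


def moe(x):
    # Router: per-token probabilities over experts; keep the TOP_K largest
    probs = softmax(x @ P["router"])                               # (s, N_EXP)
    top = np.argsort(-probs, axis=-1)[:, :TOP_K]                   # (s, TOP_K)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                                    # token routing
        w = probs[t, top[t]] / probs[t, top[t]].sum()              # assumption: renormalize
        for e, g in zip(top[t], w):
            gate = silu(x[t] @ P["w_gate"][e])                     # expand + gate (SiLU)
            value = x[t] @ P["w_up"][e]                            # value
            out[t] += g * ((gate * value) @ P["w_down"][e])        # gated value -> contract -> mixture
    return out, top


def forward(tokens, temperature=1.0):
    h = P["emb"][tokens]                                           # input -> embedding (s, D)
    h = h + gqa(rms_norm(h, P["n1"]))                              # pre-norm attention + residual
    m, top = moe(rms_norm(h, P["n2"]))
    h = h + m                                                      # add (residual)
    logits = rms_norm(h, P["nf"]) @ P["lm_head"]                   # RMSNorm -> linear
    probs = softmax(logits[-1] / temperature)                      # softmax at the last position
    return probs, rng.choice(VOCAB, p=probs), top                  # sample next token


probs, next_token, top = forward(np.array([3, 14, 15, 9, 2]))
print(probs.shape, next_token, top.shape)
```

The `print` line shows the next-token distribution’s shape, the sampled token id, and the (tokens × `TOP_K`) routing table. Because only `TOP_K` of the `N_EXP` experts run for each token, the MoE layer can hold many parameters while each token touches only a small fraction of them, which is the point of sparse MoE.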

⬇️ Download the worksheets below (for Frontier Subscribers only)

© 2025 Tom Yeh