AI by Hand ✍️

4. MXFP4, FP4, FP8

Frontier AI Drawings by Hand ✍️

Prof. Tom Yeh
Aug 14, 2025
∙ Paid
Source: gpt-oss-120b & gpt-oss-20b Model Card (OpenAI)

When OpenAI released gpt-oss, I noticed something small but important buried in their model card:

“… quantization of the MoE weights to MXFP4 format …”

Almost right away, a leading AI company working on local inference reached out:

“Our AI engineers need to understand MXFP4 … they need to understand how it fits 120 billion parameters into 80GB GPU memory.”

Lesser-known players have experimented with MXFP4 in various parts of their pipelines. But seeing OpenAI adopt it in gpt-oss tells us this isn’t just a niche trick anymore.
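To see why 4-bit weights make the 80GB figure plausible, here is a back-of-the-envelope calculation. It assumes the standard MX layout from the OCP Microscaling spec: blocks of 32 FP4 elements, with each block sharing one 8-bit scale. The variable names are mine, and in practice only the MoE weights are in MXFP4 (per the quote above) while other tensors stay in higher precision, so treat this as a lower bound rather than OpenAI's exact accounting.

```python
# Back-of-the-envelope: why MXFP4 can fit 120B parameters in 80 GB.
# Layout assumed from the OCP Microscaling (MX) spec: blocks of 32
# four-bit (E2M1) elements sharing one 8-bit (E8M0) scale factor.

BLOCK_SIZE = 32      # elements per MX block
ELEMENT_BITS = 4     # one FP4-E2M1 code per weight
SCALE_BITS = 8       # one shared E8M0 scale per block

# Amortize the shared scale across the block: 4 + 8/32 = 4.25 bits/param.
bits_per_param = ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE

params = 120e9
mxfp4_gb = params * bits_per_param / 8 / 1e9   # bytes -> GB
bf16_gb = params * 16 / 8 / 1e9                # BF16 baseline for comparison

print(f"MXFP4: {mxfp4_gb:.2f} GB, BF16: {bf16_gb:.0f} GB")
# -> MXFP4: 63.75 GB, BF16: 240 GB
```

At 4.25 bits per parameter the MoE weights land well under 80 GB, leaving headroom for the unquantized tensors, activations, and KV cache.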

The idea behind MXFP4 is simple, but it is not explained well in the materials you can find online: they are either papers with hard-to-understand equations or articles listing CUDA kernel code. Explaining it in a way that lets you actually calculate it by hand ✍️ is the hard part.
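To show just how simple the core idea is: each block of 32 weights stores one 4-bit FP4 code per weight plus a single shared power-of-two scale. A minimal dequantization sketch follows; the block size of 32 and the E2M1 value set come from the OCP Microscaling spec, while the function itself is my own illustration, not OpenAI's kernel.

```python
# Minimal MXFP4 dequantization sketch: 32 FP4 codes plus one shared
# power-of-two scale reconstruct 32 approximate weights.

FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes

def dequantize_block(codes, scale_exp):
    """codes: 32 ints in [0, 15]; scale_exp: the block's shared E8M0 exponent."""
    scale = 2.0 ** scale_exp
    out = []
    for c in codes:
        sign = -1.0 if c & 0b1000 else 1.0        # top bit is the sign
        out.append(sign * FP4_VALUES[c & 0b0111] * scale)
    return out

# Every weight in a block is (sign) x (one of 8 magnitudes) x 2^scale_exp.
block = dequantize_block([0b0001] * 32, scale_exp=-3)
print(block[0])   # 0.5 * 2^-3 = 0.0625
```

That is the whole trick: the expensive per-weight exponent range is replaced by one exponent shared across 32 neighbors.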

Drawings

For this Issue, I created four new drawings:

  1. FP8-E4M3

  2. FP8-E5M2

  3. FP4-E2M1

  4. MXFP4
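All four formats follow the same sign/exponent/mantissa recipe, just with different bit budgets and biases. A generic decoder makes this concrete. This is a sketch of my own: it handles normals and subnormals but deliberately skips E4M3's NaN encoding and E5M2's inf/NaN encodings.

```python
def decode_minifloat(code, exp_bits, man_bits, bias):
    """Decode an unsigned bit pattern as a small float.
    Normals and subnormals only; special encodings (E4M3 NaN,
    E5M2 inf/NaN) are not handled in this sketch."""
    sign = -1.0 if (code >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (code >> man_bits) & ((1 << exp_bits) - 1)
    man = code & ((1 << man_bits) - 1)
    if exp == 0:   # subnormal: no implicit leading 1
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# FP4-E2M1 (bias 1): all 8 positive codes
fp4 = [decode_minifloat(c, 2, 1, 1) for c in range(8)]
print(fp4)   # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# FP8-E4M3 (bias 7): largest finite magnitude, 1.75 * 2^8
print(decode_minifloat(0b0_1111_110, 4, 3, 7))    # 448.0

# FP8-E5M2 (bias 15): largest finite magnitude, 1.75 * 2^15
print(decode_minifloat(0b0_11110_11, 5, 2, 15))   # 57344.0
```

Note that FP4-E2M1 has no inf or NaN at all: every one of its 16 codes is a finite number, which is part of what makes MXFP4 blocks so compact.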

Download

The Frontier AI drawings are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.
