When OpenAI released gpt-oss, I noticed something small but important buried in their model card:
“… quantization of the MoE weights to MXFP4 format …”
Almost right away, a leading AI company working on local inference reached out:
“Our AI engineers need to understand MXFP4 …. they need to understand how it fits 120 billion parameters into 80GB GPU memory.”
Lesser-known players have experimented with MXFP4 in various parts of their pipelines. But seeing OpenAI adopt it in gpt-oss tells us this isn’t just a niche trick anymore.
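The memory math itself is easy to check with a back-of-the-envelope estimate. Here is a minimal sketch (it assumes, for simplicity, that all 120B parameters are stored in MXFP4; in gpt-oss only the MoE weights are, but those dominate the total):

```python
# MXFP4 stores 32 four-bit elements plus one shared 8-bit scale per block,
# i.e. (32 * 4 + 8) / 32 = 4.25 bits per parameter.
bits_per_param = (32 * 4 + 8) / 32        # 4.25
params = 120e9                            # rough parameter count of gpt-oss-120b
weight_bytes = params * bits_per_param / 8
print(f"{weight_bytes / 1e9:.1f} GB")     # ~63.8 GB, comfortably under 80 GB
```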
The idea behind MXFP4 is simple — but it is not explained well in the materials you can find online. Those materials are either papers with hard-to-follow equations or articles that dump CUDA kernel code. And explaining it in a way that lets you actually calculate it by hand ✍️ — that's the hard part.
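To give a flavor of what the worksheets walk through, here is a minimal numpy sketch of MXFP4 quantization of a single block: 32 FP4 (E2M1) elements sharing one power-of-two scale. It is only a sketch — the exact rounding and saturation rules are spelled out in the OCP Microscaling spec and may differ in edge cases:

```python
import numpy as np

# The eight non-negative FP4 E2M1 values; the sign bit supplies the negatives.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])

def quantize_mxfp4_block(block):
    """Quantize one 32-element block to (shared scale, FP4 element values)."""
    assert block.size == 32, "MXFP4 blocks hold 32 elements"
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return 1.0, np.zeros_like(block)
    # Shared scale is a power of two (E8M0). Following the OCP MX spec's rule:
    # scale exponent = floor(log2(max_abs)) - emax_elem, with emax_elem = 2 for E2M1.
    scale = 2.0 ** (np.floor(np.log2(max_abs)) - 2)
    # Round each scaled element to the nearest representable FP4 value
    # (values beyond +/-6 simply saturate to the largest magnitude).
    scaled = block / scale
    idx = np.abs(scaled[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return scale, FP4_GRID[idx]

def dequantize_block(scale, fp4_values):
    """Reconstruct approximate weights: shared scale times each FP4 element."""
    return scale * fp4_values

# Tiny demo: quantize a random block and check the reconstruction error.
rng = np.random.default_rng(0)
block = rng.normal(size=32)
scale, fp4_values = quantize_mxfp4_block(block)
approx = dequantize_block(scale, fp4_values)
print("shared scale:", scale)
print("max abs error:", np.max(np.abs(block - approx)))
```

The worksheets go through the same steps by hand, one bit pattern at a time.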
To Individual AI Engineers:
I want to thank you for becoming Frontier subscribers — you motivated me to devote my time and energy to this challenging task: making worksheets that explain the math and algorithms behind frontier models. Otherwise, I would probably be spending my time writing NSF grant proposals instead (I hope my department chair is not reading this 😅).
To AI Startups:
I am glad that, with the Frontier subscription through Substack, small AI startups like yours can simply buy a group subscription rather than negotiating a licensing deal with me (or my university). That means your engineers can start understanding frontier models immediately, keeping up with the latest trends to gain a competitive edge.
Worksheets
For this Bonus Frontier Issue, I created four brand-new sets of worksheets:
FP8-E4M3
FP8-E5M2
FP4-E2M1
MXFP4
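If the names look cryptic: ExMy means x exponent bits and y mantissa bits after the sign bit. A quick reference sketch of the four formats (the maximum finite magnitudes come from the OCP FP8 and Microscaling specs, not from the worksheets themselves):

```python
# Sign / exponent / mantissa bit counts and largest finite magnitude per format.
FORMATS = {
    "FP8-E4M3": {"bits": (1, 4, 3), "max_finite": 448.0},
    "FP8-E5M2": {"bits": (1, 5, 2), "max_finite": 57344.0},
    "FP4-E2M1": {"bits": (1, 2, 1), "max_finite": 6.0},
    # MXFP4: blocks of 32 FP4-E2M1 elements sharing one 8-bit (E8M0) power-of-two scale.
}
```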
⬇️ Download the worksheets below (for Frontier Subscribers only)