When OpenAI released gpt-oss, I noticed something small but important buried in their model card:
“… quantization of the MoE weights to MXFP4 format …”
Almost right away, a leading AI company working on local inference reached out:
“Our AI engineers need to understand MXFP4 …. they need to understand how it fits 120 billion parameters into 80GB GPU memory.”
Lesser-known players have experimented with MXFP4 in various parts of their pipelines. But seeing OpenAI adopt it in gpt-oss tells us this isn’t just a niche trick anymore.
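The memory math itself is easy to check with a back-of-the-envelope estimate. Here is a minimal sketch (it assumes, for simplicity, that all 120B parameters are stored in MXFP4; in gpt-oss only the MoE weights are, but those dominate the total):

```python
# MXFP4 stores 32 four-bit elements plus one shared 8-bit scale per block,
# i.e. (32 * 4 + 8) / 32 = 4.25 bits per parameter.
bits_per_param = (32 * 4 + 8) / 32        # 4.25
params = 120e9                            # rough parameter count of gpt-oss-120b
weight_bytes = params * bits_per_param / 8
print(f"{weight_bytes / 1e9:.1f} GB")     # ~63.8 GB, comfortably under 80 GB
```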
The idea behind MXFP4 is simple — but it is not explained well in the materials you can find online. Those materials are either papers with hard-to-follow equations or articles that dump CUDA kernel code. And explaining it in a way that lets you actually calculate it by hand ✍️ — that's the hard part.
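To give a flavor of what the worksheets walk through, here is a minimal numpy sketch of MXFP4 quantization of a single block: 32 FP4 (E2M1) elements sharing one power-of-two scale. It is only a sketch — the exact rounding and saturation rules are spelled out in the OCP Microscaling spec and may differ in edge cases:

```python
import numpy as np

# The eight non-negative FP4 E2M1 values; the sign bit supplies the negatives.
FP4_E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_E2M1[::-1], FP4_E2M1])

def quantize_mxfp4_block(block):
    """Quantize one 32-element block to (shared scale, FP4 element values)."""
    assert block.size == 32, "MXFP4 blocks hold 32 elements"
    max_abs = np.max(np.abs(block))
    if max_abs == 0.0:
        return 1.0, np.zeros_like(block)
    # Shared scale is a power of two (E8M0). Following the OCP MX spec's rule:
    # scale exponent = floor(log2(max_abs)) - emax_elem, with emax_elem = 2 for E2M1.
    scale = 2.0 ** (np.floor(np.log2(max_abs)) - 2)
    # Round each scaled element to the nearest representable FP4 value
    # (values beyond +/-6 simply saturate to the largest magnitude).
    scaled = block / scale
    idx = np.abs(scaled[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    return scale, FP4_GRID[idx]

def dequantize_block(scale, fp4_values):
    """Reconstruct approximate weights: shared scale times each FP4 element."""
    return scale * fp4_values

# Tiny demo: quantize a random block and check the reconstruction error.
rng = np.random.default_rng(0)
block = rng.normal(size=32)
scale, fp4_values = quantize_mxfp4_block(block)
approx = dequantize_block(scale, fp4_values)
print("shared scale:", scale)
print("max abs error:", np.max(np.abs(block - approx)))
```

The worksheets go through the same steps by hand, one bit pattern at a time.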
To Individual AI Engineers:
I want to thank you for becoming Frontier subscribers — you motivated me to devote my time and energy to this challenging task: making worksheets that explain the math and algorithms behind frontier models. Otherwise, I would probably be spending my time writing NSF grant proposals instead (I hope my department chair is not reading this 😅).
To AI Startups:
I am glad that, with the Frontier subscription through Substack, small AI startups like yours can simply buy a group subscription rather than negotiating a licensing deal with me (or my university). That means your engineers can start understanding frontier models immediately, keeping up with the latest trends to gain a competitive edge.
Worksheets
For this Bonus Frontier Issue, I created four brand-new sets of worksheets:
FP8-E4M3
FP8-E5M2
FP4-E2M1
MXFP4
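If the names look cryptic: ExMy means x exponent bits and y mantissa bits after the sign bit. A quick reference sketch of the four formats (the maximum finite magnitudes come from the OCP FP8 and Microscaling specs, not from the worksheets themselves):

```python
# Sign / exponent / mantissa bit counts and largest finite magnitude per format.
FORMATS = {
    "FP8-E4M3": {"bits": (1, 4, 3), "max_finite": 448.0},
    "FP8-E5M2": {"bits": (1, 5, 2), "max_finite": 57344.0},
    "FP4-E2M1": {"bits": (1, 2, 1), "max_finite": 6.0},
    # MXFP4: blocks of 32 FP4-E2M1 elements sharing one 8-bit (E8M0) power-of-two scale.
}
```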
⬇️ Download the worksheets below (for Frontier Subscribers only)