AI by Hand ✍️

"Expert Choice" Mixture of Experts (MoE)

Frontier Model Math by hand ✍️

Prof. Tom Yeh
Aug 05, 2025 ∙ Paid

Thanks for joining my early access subscription. This is where you’ll get my newest AI by Hand ✍️ worksheets before anyone else. This first release takes you into Expert Choice Mixture of Experts (MoE) and shows how this method contrasts with traditional MoEs that use Token Choice routing.

Q: Why Expert Choice routing?
A: Because traditional MoE routing (Token Choice), where each token picks its own top experts, suffers from load imbalance: some experts get overloaded with tokens while others sit idle, wasting capacity.
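
To make the imbalance concrete, here is a minimal Python sketch of Token Choice top-k routing on a small, made-up matrix of router logits. The `scores` values, the helper names `token_choice_load` and `overflow`, and the capacity convention (capacity = ceil(capacity_factor · tokens · k / experts)) are illustrative assumptions, not taken from the worksheet. The sketch only counts how many tokens land on each expert.

```python
import math

def token_choice_load(scores, k):
    """Token Choice routing: every token (row of `scores`) picks its own top-k
    experts (columns). Returns the number of tokens sent to each expert."""
    num_experts = len(scores[0])
    load = [0] * num_experts
    for row in scores:
        # Softmax over experts preserves the ordering, so ranking raw logits is enough.
        top_k = sorted(range(num_experts), key=lambda e: row[e], reverse=True)[:k]
        for e in top_k:
            load[e] += 1
    return load

def overflow(load, num_tokens, k, capacity_factor=1.0):
    """Count tokens that exceed a fixed per-expert capacity.
    Assumed convention: capacity = ceil(capacity_factor * num_tokens * k / num_experts)."""
    capacity = math.ceil(capacity_factor * num_tokens * k / len(load))
    return sum(max(0, n - capacity) for n in load)

# Hypothetical router logits: 6 tokens x 3 experts, with expert 0 "popular".
scores = [
    [3.0, 1.0, 0.5],
    [2.5, 2.0, 0.1],
    [2.8, 0.3, 1.9],
    [1.9, 1.5, 0.2],
    [2.2, 0.4, 0.9],
    [0.1, 2.4, 1.0],
]

load = token_choice_load(scores, k=1)
print(load)                                         # [5, 1, 0]
print(overflow(load, num_tokens=len(scores), k=1))  # 3
```

With top-1 routing, five of the six tokens crowd onto expert 0 while expert 2 receives nothing. Under a capacity of two tokens per expert, three tokens overflow; in common Token Choice implementations such tokens are skipped by the expert layer.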

Q: How does Expert Choice fix this?
A: It flips the selection. Each expert picks its top-k tokens, with k set by a fixed capacity, so no expert can be overloaded and every expert does the same amount of computation.
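
The flipped selection can be sketched the same way. This minimal Python sketch assumes a common formulation of Expert Choice routing: a softmax over experts for each token gives token-to-expert affinities, then each expert takes its top-k tokens with k = n · c / e (n tokens, capacity factor c, e experts). The logits and the function name `expert_choice_route` are hypothetical, chosen only for illustration.

```python
import math

def softmax(row):
    """Numerically stable softmax over one token's expert logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [v / total for v in exps]

def expert_choice_route(logits, capacity_factor=1.0):
    """Expert Choice routing: each expert (column) picks its top-k tokens (rows),
    with k = num_tokens * capacity_factor / num_experts, so every expert gets exactly k."""
    num_tokens, num_experts = len(logits), len(logits[0])
    k = int(num_tokens * capacity_factor / num_experts)
    probs = [softmax(row) for row in logits]  # token-to-expert affinities
    chosen = {}
    for e in range(num_experts):
        # Rank all tokens by their affinity to expert e and keep the top k.
        ranked = sorted(range(num_tokens), key=lambda t: probs[t][e], reverse=True)
        chosen[e] = sorted(ranked[:k])
    return chosen

# Hypothetical router logits: 6 tokens x 3 experts.
scores = [
    [3.0, 1.0, 0.5],
    [2.5, 2.0, 0.1],
    [2.8, 0.3, 1.9],
    [1.9, 1.5, 0.2],
    [2.2, 0.4, 0.9],
    [0.1, 2.4, 1.0],
]

routes = expert_choice_route(scores)
print(routes)                                   # {0: [0, 4], 1: [3, 5], 2: [2, 4]}
print([len(routes[e]) for e in range(3)])       # [2, 2, 2]
```

Each expert processes exactly two tokens, so load is balanced by construction. One side effect is visible here: token 4 is picked by two experts while token 1 is picked by none, so the number of experts per token varies.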

Q: Who invented Expert Choice routing?
A: Researchers at Google, in the paper “Mixture-of-Experts with Expert Choice Routing” (Zhou et al., NeurIPS 2022).

Worksheets

⬇️ Download the worksheets below (for Frontier Subscribers only)

© 2025 Tom Yeh