AI by Hand ✍️

GELU

Activation series: 8 of 12

Prof. Tom Yeh
May 11, 2026

Activation Series:

  1. Softmax

  2. Sigmoid

  3. Tanh

  4. ReLU

  5. Leaky ReLU

  6. ELU

  7. SiLU

  8. GELU

  9. Log-Sum-Exp

  10. Softplus

  11. GLU

  12. SwiGLU

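GELU (Gaussian Error Linear Unit) is SiLU's more decisive sibling: it keeps the same x · gate structure, but the gate is now the Gaussian CDF Φ(x) instead of the sigmoid σ(x). Φ(x) moves from 0 to 1 more sharply than σ(x) (roughly Φ(x) ≈ σ(1.702x)), so GELU commits to "pass" or "block" sooner. With that swap, GELU became the default activation in BERT, GPT-2/3, and ViT, and appears in gated form (GEGLU) in T5 v1.1.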
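To make the x · gate comparison concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not code from this post: the function names (`sigmoid`, `gaussian_cdf`, `silu`, `gelu`, `gelu_tanh`) are chosen here for readability, and `gelu_tanh` is the widely used tanh approximation from the original GELU paper, included only for comparison.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, the gate used by SiLU."""
    return 1.0 / (1.0 + math.exp(-x))


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), the gate used by GELU."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU: x times a sigmoid gate."""
    return x * sigmoid(x)


def gelu(x: float) -> float:
    """Exact GELU: x times a Gaussian CDF gate."""
    return x * gaussian_cdf(x)


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed here only for comparison)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print both gates and both activations side by side.
    print(f"{'x':>5} {'sigma(x)':>9} {'Phi(x)':>9} {'SiLU':>9} {'GELU':>9}")
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"{x:5.1f} {sigmoid(x):9.4f} {gaussian_cdf(x):9.4f} "
            f"{silu(x):9.4f} {gelu(x):9.4f}"
        )
```

Running the script prints the two gates next to each other. For positive x the Φ(x) column sits closer to 1 than σ(x), and for negative x it sits closer to 0, which is the "more decisive" behavior described above.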
