GELU
Activation series: 8 of 12
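GELU (Gaussian Error Linear Unit) is SiLU's more decisive sibling: it keeps the same x · gate structure, but the gate is the standard Gaussian CDF Φ(x) instead of the sigmoid σ(x). Φ(x) approaches 0 and 1 faster than σ(x) does, so GELU settles to 0 or x sooner as |x| grows. With that swap, GELU became the default activation in BERT, GPT-2/3, and ViT, and T5 v1.1 uses its gated variant (GEGLU).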
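To make the x · gate comparison concrete, here is a minimal sketch in plain Python using only the standard library. The function names (`sigmoid`, `phi`, `silu`, `gelu`, `gelu_tanh`) are illustrative choices rather than any library's API, and `gelu_tanh` is the commonly used tanh approximation, included here as an assumption about what many implementations ship instead of the exact erf form.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: the gate used by SiLU."""
    return 1.0 / (1.0 + math.exp(-x))


def phi(x: float) -> float:
    """Standard normal CDF: the gate used by GELU."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def silu(x: float) -> float:
    """SiLU: x times the sigmoid gate."""
    return x * sigmoid(x)


def gelu(x: float) -> float:
    """Exact GELU: x times the Gaussian CDF gate."""
    return x * phi(x)


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (assumed here as the common fast variant)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print the two activations side by side for a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(
            f"x={x:+.1f}  silu={silu(x):+.4f}  "
            f"gelu={gelu(x):+.4f}  gelu_tanh={gelu_tanh(x):+.4f}"
        )
```

Running the script prints both activations for the same inputs. The Gaussian gate sits closer to 0 for negative inputs and closer to 1 for positive ones, which is the "more decisive" behavior described above.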



