SiLU
Activation series · 7 of 12
SiLU (Sigmoid Linear Unit, also known as Swish with β = 1) is the activation inside the feed-forward layers of Llama, Mistral, Mixtral, and many other modern open-weight LLMs. (Gemma is a notable exception: its feed-forward layers use a GELU-based gate instead.) The whole idea is one move on top of sigmoid: use the sigmoid value as the fraction of the input that passes through.
The Fate of Five Boba Shops (4 of 5)
A new judge takes the bench, more nuanced than the previous ones. Leaky ReLU gave every shop in the red the same flat 10% deal; ELU added a credit limit but still used a fixed shape on the way down. SiLU adjusts the rate per shop: the pass-through slides continuously with profit, not stuck at any fixed value.
The mechanism is the sigmoid σ(x), an S-shaped curve that always lies between 0 and 1 and serves as the pass-through fraction. A healthy shop earning 3K profit keeps about 95%, nearly the full 3K. A distressed one losing 1K carries only 27% of the debt (about 0.27K), with the rest restructured away. A deeply troubled shop losing 4K sees only 2% stick; nearly the whole loss is forgiven. Multiply each shop's profit by its own pass-through fraction and you have SiLU.
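To check the three shops' figures by hand, here is a minimal sketch in plain Python. The shop labels and the `shops` dictionary are illustrative names chosen for this example, not part of any library; profits are in thousands, matching the story.

```python
import math

def sigmoid(x: float) -> float:
    """Pass-through fraction: the logistic sigmoid of x."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical shop labels; profits in thousands (K), as in the story.
shops = {"healthy": 3.0, "distressed": -1.0, "deeply troubled": -4.0}

for name, profit in shops.items():
    gate = sigmoid(profit)      # fraction of the profit that passes through
    carried = profit * gate     # SiLU(profit)
    print(f"{name}: profit {profit:+.1f}K, pass-through {gate:.0%}, carried {carried:+.2f}K")
```

The printed pass-through fractions round to the 95%, 27%, and 2% quoted above.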
Walking through the Math
1. Profit: each shop's profit x.
2. Pass-through: sigmoid σ(x), the fraction of x that survives, strictly between 0 and 1.
3. Output: profit times pass-through, x · σ(x) = SiLU(x).
Healthy shops keep nearly everything; distressed shops still carry a meaningful chunk of their debt; deeply troubled shops have almost the whole loss forgiven, though never quite all of it (sigmoid always leaks a sliver). Like its predecessors, SiLU is applied element-wise: each shop is judged on its own profit alone.
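The three steps can be sketched as a small function applied element-wise to a list of profits. This is an illustrative plain-Python version, not how a deep-learning framework implements it; the branch on the sign of `x` is an added assumption to keep `math.exp` from overflowing on very negative inputs, and `silu_elementwise` is a hypothetical helper name.

```python
import math

def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), following the three steps above."""
    # Step 2: pass-through fraction. Two algebraically equal forms are
    # used so that math.exp never receives a large positive argument.
    if x >= 0:
        gate = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)
        gate = e / (1.0 + e)
    # Step 3: profit times pass-through.
    return x * gate

def silu_elementwise(profits: list[float]) -> list[float]:
    """Apply SiLU independently to every entry (hypothetical helper)."""
    return [silu(p) for p in profits]

print(silu_elementwise([3.0, -1.0, -4.0]))
```

Because each output depends only on its own input, the order of the shops does not matter and no shop's result affects another's.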
Reading the Numbers
How does the hesitating sigmoid judge decide?
Even at -10, the gate never fully closes: σ(-10) ≈ 0.000045, so about 0.00045 of the loss still passes through. Sigmoid always leaks a sliver, and SiLU is reluctant to write anything off completely.
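For readers who want the numbers in one place, the sketch below tabulates the gate and the output over a few sample inputs. The particular values in `samples` are chosen for illustration only.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def silu(x: float) -> float:
    return x * sigmoid(x)

# Illustrative sample inputs, from deeply negative to strongly positive.
samples = [-10.0, -4.0, -1.0, 0.0, 1.0, 3.0, 10.0]

print(f"{'x':>6}  {'sigmoid(x)':>12}  {'SiLU(x)':>12}")
for x in samples:
    print(f"{x:>6.1f}  {sigmoid(x):>12.6f}  {silu(x):>12.6f}")
```

The row for x = -10 shows a gate that is tiny but still positive, which is the sliver the text refers to.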