SiLU
Activation series: 7 of 12
SiLU (Sigmoid Linear Unit, equivalent to Swish with β = 1) is the activation inside the gated feed-forward layers of Llama, Mistral, Mixtral, and many other modern open-weight LLMs. The whole idea is one move on top of sigmoid: use the sigmoid value as the fraction of the input that passes through, so SiLU(x) = x · σ(x).
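To make the "fraction that passes through" reading concrete, here is a minimal sketch in plain Python. The function names `sigmoid` and `silu` are chosen for illustration and are not taken from any particular library. The branch inside `sigmoid` is just a common trick to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU: the input scaled by its own sigmoid value (the 'gate')."""
    return x * sigmoid(x)


if __name__ == "__main__":
    # Show the gate (fraction let through) next to the resulting output.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  gate={sigmoid(x):.3f}  silu={silu(x):+.3f}")
```

Reading the output: at x = 0 the gate is exactly one half but the output is 0, large positive inputs pass through almost unchanged, and large negative inputs are squashed toward 0.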