Softplus
Activation series · 10 of 12
Softplus is the simplest case of log-sum-exp: a single input weighed against a fixed baseline of zero. It's two-input LSE with the second input pinned to 0, so that input always contributes e⁰ = 1 to the sum. Outside hidden layers, softplus also shows up as the binary cross-entropy loss on a logit x with a positive label: −log(σ(x)) = softplus(−x).
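The cross-entropy identity can be checked numerically. The sketch below is only an illustration: the helper names `sigmoid` and `softplus` are chosen here and are not from any particular library.

```python
import math

def sigmoid(x: float) -> float:
    # Logistic function: 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x: float) -> float:
    # Textbook definition: ln(1 + e^x). Fine for moderate x.
    return math.log(1.0 + math.exp(x))

# Compare -log(sigmoid(x)) with softplus(-x) at a few sample logits.
for x in (-3.0, 0.0, 2.5):
    nll = -math.log(sigmoid(x))
    assert math.isclose(nll, softplus(-x), rel_tol=1e-9)
```

Both sides reduce to ln(1 + e⁻ˣ), which is why the check passes for any logit in a moderate range.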
Typhoon Aftermath (2 of 4)
The coast quiets to a single storm. Where LSE combined multiple typhoons into one equivalent rating, softplus is the special case: take a single storm and weigh it against a fixed Cat 0 baseline (atmospheric calm).
The baseline is always there: a quiet "no storm" reference that the storm has to overcome. Because air movement is never negative (no anti-storms), storm categories live in [0, ∞), and the output is always positive: it starts at ln(2) ≈ 0.69 for a Cat 0 storm and grows from there. The smoothing at zero is what distinguishes softplus from ReLU: the corner becomes a curve passing through ln(2) instead of meeting at a sharp point.
Walking through the Math
1. Inputs: the storm's category x (editable) and a fixed baseline of 0 (atmospheric calm).
2. Damages: exponentiate both, eˣ and e⁰ = 1.
3. Total: sum the two damages, Z = eˣ + 1.
4. Back to category: take the log, ln(Z) = softplus(x).
This is LSE on two inputs with the second pinned to 0; the storm is always weighed against the constant calm. Same exp → sum → log recipe as LSE, just shrunk to two inputs.
Reading the Numbers
How does softplus respond at each storm category, and how does it compare to ReLU, the sharp-cornered cousin from the boba arc?
Softplus is essentially ReLU with the corner at zero smoothed into a curve. Far from zero, the two are nearly indistinguishable: at Cat 5, the gap is only about 0.007. The maximum difference is exactly ln 2 ≈ 0.69, right at x = 0, where softplus sits at the calm-vs-calm tie and ReLU is the sharp transition between zero output and pass-through. That's why softplus is sometimes called the "smooth ReLU." The name softplus says the same thing: "plus" is the old mathematical term for the positive-part function x₊ = max(x, 0), and "soft" softens the corner.
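The comparison can be reproduced with a few lines of code. The helper names `softplus`, `relu`, and `gap` below are illustrative choices, and the loop simply prints one row per storm category.

```python
import math

def softplus(x: float) -> float:
    # ln(1 + e^x), using log1p for accuracy when e^x is small.
    return math.log1p(math.exp(x))

def relu(x: float) -> float:
    # Positive part: max(x, 0).
    return max(x, 0.0)

def gap(x: float) -> float:
    # How far softplus sits above ReLU at x.
    return softplus(x) - relu(x)

# One row per category: input, softplus, ReLU, and the gap between them.
for cat in range(0, 6):
    x = float(cat)
    print(cat, round(softplus(x), 3), relu(x), round(gap(x), 3))
```

The gap equals ln(1 + e^(−|x|)), so it peaks at ln 2 when x = 0 and shrinks as x moves away from zero in either direction.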
Diving into Equations
Softplus is LSE on two inputs with the second pinned to zero:
softplus(x) = LSE(x, 0) = ln(eˣ + e⁰) = ln(1 + eˣ)
The smooth-maximum behavior carries over: when x is large, eˣ dominates the sum and softplus tracks x; when x is well below zero, the baseline e⁰ = 1 dominates and the output flattens toward zero. At x = 0 the two terms tie and the output is ln(2).
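The two regimes also suggest a way to compute softplus without overflow. The sketch below uses a common rearrangement, softplus(x) = max(x, 0) + ln(1 + e^(−|x|)); it is not taken from this article, and the name `softplus_stable` is hypothetical.

```python
import math

def softplus_stable(x: float) -> float:
    # max(x, 0) is the "hard" maximum of x and the baseline 0.
    # log1p(exp(-|x|)) is the smooth correction, always in (0, ln 2].
    # exp is only ever called on a non-positive number, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Large x: the correction vanishes and the output tracks x.
assert softplus_stable(1000.0) == 1000.0
# Very negative x: the baseline dominates and the output flattens to zero.
assert softplus_stable(-1000.0) == 0.0
# At x = 0 the two terms tie and the output is ln(2).
assert math.isclose(softplus_stable(0.0), math.log(2.0))
```

The split mirrors the smooth-maximum reading directly: a hard maximum plus a small positive bump that matters only near the tie.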
The most beautiful fact about softplus is its slope: at every point it's exactly sigmoid.
d/dx softplus(x) = eˣ / (1 + eˣ) = 1 / (1 + e⁻ˣ) = σ(x)
Sigmoid is the speedometer; softplus is the odometer. At a strong storm, the speedometer reads near 1, so softplus climbs nearly parallel to y = x. At Cat 0, the speedometer reads 0.5, so softplus rises at half-speed through ln(2).
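The speedometer and odometer relationship can be checked with a finite difference. This is a rough numerical sketch, assuming a central-difference step of 1e-5; the helper `numeric_slope` is introduced here purely for illustration.

```python
import math

def softplus(x: float) -> float:
    # ln(1 + e^x), fine for the moderate inputs used here.
    return math.log1p(math.exp(x))

def sigmoid(x: float) -> float:
    # Logistic function: 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + math.exp(-x))

def numeric_slope(f, x: float, h: float = 1e-5) -> float:
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The measured slope of softplus should match sigmoid at each point.
for x in (-4.0, 0.0, 5.0):
    assert abs(numeric_slope(softplus, x) - sigmoid(x)) < 1e-6
```

At x = 0 the measured slope comes out at 0.5, the half-speed reading described above, and at x = 5 it is already close to 1.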