Freezing Layers
Fine-Tuning series · 4 of 8
In the previous lesson, full fine-tuning meant reviewing every prerequisite (Linear Algebra, Probability, Advanced ML) to bring each subject up to date with the latest topics. Effective, but exhausting.
Then you realize something. The prerequisites haven't actually changed that much. Linear Algebra is still Linear Algebra; the matrix decompositions you learned still hold. Probability is still Probability; the distributions and Bayes' rule haven't moved. Almost all the new material — the new ideas, the recent discoveries — lives in the advanced layer at the top.
That's freezing layers: keep the prerequisite layers fixed at their pretrained state, and only update the advanced one. In the diagram below, W1 and W2 — the foundational prerequisites — stay frozen. Only W3 — the layer closest to your task-specific output — gets a ΔW.
Its update follows the familiar rule:
W'3 = W3 + ΔW3
The right column shows the network after the update. W'1 and W'2 are identical to W1 and W2 — nothing changed there. Only W'3 is new.
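The mechanics can be sketched in a few lines of NumPy. This is a minimal illustration, not the lesson's actual training code: the layer names come from the diagram, and the random array standing in for a gradient is a placeholder for a real backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer shapes from the lesson's diagram: W1 is 40x30, W2 and W3 are 40x40.
W1 = rng.standard_normal((40, 30))
W2 = rng.standard_normal((40, 40))
W3 = rng.standard_normal((40, 40))

params = {"W1": W1, "W2": W2, "W3": W3}
frozen = {"W1", "W2"}  # the prerequisites stay fixed

lr = 0.01
for name, W in params.items():
    if name in frozen:
        continue  # no gradient, no ΔW, no update
    grad = rng.standard_normal(W.shape)  # stand-in for a real backward pass
    params[name] = W - lr * grad         # W'3 = W3 + ΔW3, with ΔW3 = -lr * grad

# After the step: W'1 and W'2 are identical to W1 and W2; only W'3 moved.
```

In a deep-learning framework the same effect is usually achieved by disabling gradient tracking on the frozen layers, so their updates are skipped automatically rather than by an explicit `continue`.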
How much did we save?
Full fine-tuning would train all three layers:
40 × 30 + 40 × 40 + 40 × 40 = 4400
parameters.
Freezing layers 1 and 2 leaves only:
40 × 40 = 1600
parameters trainable. The other 2800 parameters are skipped — no ΔW, no gradient, no storage.
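The bookkeeping above is easy to verify in code. A quick sketch, using the layer shapes from the lesson:

```python
# (rows, cols) per weight matrix, as in the lesson's diagram.
shapes = {"W1": (40, 30), "W2": (40, 40), "W3": (40, 40)}
frozen = {"W1", "W2"}

total = sum(r * c for r, c in shapes.values())
trainable = sum(r * c for name, (r, c) in shapes.items() if name not in frozen)

print(total, trainable, total - trainable)  # 4400 1600 2800
```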
For a real-world model with billions of parameters, freezing the first 80% of the network leaves only a fifth of the parameters trainable, with matching savings in gradient computation and optimizer memory.
There's a second, subtler benefit. The frozen prerequisites can't drift, so the model can't forget its foundations — Linear Algebra stays Linear Algebra. This problem has a name: catastrophic forgetting — the new task overwrites old knowledge. Frozen layers prevent it simply by not letting those weights move.
The next lesson takes the idea even further: freeze all the prerequisites and bolt a brand-new advanced course on at the end.
← Previous: Full Fine-Tuning | Linear Probe →