AI by Hand ✍️

Freezing Layers

Fine-Tuning series · 4 of 8

Prof. Tom Yeh
Apr 24, 2026


In the previous lesson, full fine-tuning reviewed every prerequisite — Linear Algebra, Probability, Advanced ML — to refresh each subject with the latest topics. Effective, but exhausting.

Then you realize something. The prerequisites haven't actually changed that much. Linear Algebra is still Linear Algebra; the matrix decompositions you learned still hold. Probability is still Probability; the distributions and Bayes' rule haven't moved. Almost all the new material — the new ideas, the recent discoveries — lives in the advanced layer at the top.

That's freezing layers: keep the prerequisite layers fixed at their pretrained state, and only update the advanced one. In the diagram below, W1 and W2 — the foundational prerequisites — stay frozen. Only W3 — the layer closest to your task-specific output — gets a ΔW.
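In code, freezing usually means nothing more than turning off gradients for the early layers. Here is a minimal PyTorch sketch of the three-layer setup above — the layer sizes come from the diagram, and `bias=False` is an assumption made so the parameter counts match the weight-matrix arithmetic:

```python
import torch
import torch.nn as nn

# Three linear layers matching the diagram: W1 (40x30), W2 (40x40), W3 (40x40).
# bias=False keeps each layer's parameter count equal to its weight-matrix size.
model = nn.Sequential(
    nn.Linear(30, 40, bias=False),  # W1 - will be frozen
    nn.ReLU(),
    nn.Linear(40, 40, bias=False),  # W2 - will be frozen
    nn.ReLU(),
    nn.Linear(40, 40, bias=False),  # W3 - stays trainable
)

# Freeze W1 and W2: no gradients, no optimizer state, no updates.
for layer in (model[0], model[2]):
    for p in layer.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # 1600 4400
```

Only the parameters with `requires_grad=True` are handed to the optimizer, so the frozen layers still run in the forward pass but never change.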

Its update flows through the equation:

W'3 = W3 + ΔW3

The right column shows the network after the update. W'1 and W'2 are identical to W1 and W2 — nothing changed there. Only W'3 is new.

How much did we save?

Full fine-tuning would train all three layers:

40 × 30 (W1) + 40 × 40 (W2) + 40 × 40 (W3) = 4400

parameters.

Freezing layers 1 and 2 leaves only:

40 × 40 = 1600

parameters trainable. The other 2800 parameters are skipped — no ΔW, no gradient, no storage.
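The savings are just weight-matrix arithmetic, which a few lines of Python make explicit:

```python
# Parameter count of each weight matrix, rows x columns, from the diagram.
w1 = 40 * 30   # frozen
w2 = 40 * 40   # frozen
w3 = 40 * 40   # trainable

full_ft = w1 + w2 + w3    # full fine-tuning trains all three layers
frozen_ft = w3            # freezing W1 and W2 leaves only W3
skipped = full_ft - frozen_ft
print(full_ft, frozen_ft, skipped)  # 4400 1600 2800
```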

For a real-world model with billions of parameters, freezing the first 80% of the network cuts trainable parameters fivefold, and because frozen layers also need no gradients or optimizer state, the total fine-tuning cost can shrink by an order of magnitude.

There's a second, subtler benefit. The frozen prerequisites can't drift, so the model can't forget its foundations — Linear Algebra stays Linear Algebra. This problem has a name: catastrophic forgetting — the new task overwrites old knowledge. Frozen layers prevent it simply by not letting those weights move.
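One way to see this concretely is to take a gradient step and check that the frozen weights are bit-for-bit identical afterward. A sketch under assumed conditions (the batch and the squared-output loss are hypothetical, chosen only to produce gradients; the layer shapes follow the diagram):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(30, 40, bias=False),  # W1 - frozen
    nn.Linear(40, 40, bias=False),  # W2 - frozen
    nn.Linear(40, 40, bias=False),  # W3 - trainable
)
for layer in (model[0], model[1]):
    for p in layer.parameters():
        p.requires_grad = False

# Only the trainable parameters go into the optimizer.
opt = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

w1_before = model[0].weight.clone()
w3_before = model[2].weight.clone()

x = torch.randn(8, 30)         # hypothetical batch of inputs
loss = model(x).pow(2).mean()  # toy loss, just to generate gradients
loss.backward()
opt.step()

print(torch.equal(model[0].weight, w1_before))  # True  - frozen, cannot drift
print(torch.equal(model[2].weight, w3_before))  # False - W3 received its ΔW3
```

Because W1 and W2 are outside the optimizer and accumulate no gradients, the foundations literally cannot be overwritten, no matter how long you train on the new task.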

The next lesson takes the idea even further: freeze all the prerequisites and bolt a brand-new advanced course on at the end.


← Previous: Full Fine-Tuning | Linear Probe →

