Full Fine-Tuning
Fine-Tuning series · 3 of 8
In the previous lesson, fine-tuning meant updating one weight matrix. A real network has many — three layers in this example, billions of parameters in a production model. What does fine-tuning look like when you update all of them?
That's full fine-tuning: continue training every weight in the pretrained network on your new task. Every layer's W gets its own ΔW. Nothing is frozen — every parameter is in play.
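As a concrete sketch of "nothing is frozen," here is what that setup looks like in a PyTorch-style training configuration. The framework choice and layer sizes are assumptions for illustration (they mirror the small three-layer example used later in this lesson):

```python
import torch
import torch.nn as nn

# A small MLP standing in for the pretrained network.
# Layer sizes are assumptions matching the lesson's running example.
model = nn.Sequential(
    nn.Linear(30, 40), nn.ReLU(),
    nn.Linear(40, 40), nn.ReLU(),
    nn.Linear(40, 40),
)

# Full fine-tuning: no layer is frozen. Every parameter stays trainable,
# so the optimizer is handed all of them.
for p in model.parameters():
    p.requires_grad_(True)  # True by default; made explicit for contrast

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```

The contrast to keep in mind: later lessons will set `requires_grad_(False)` on some layers; here, every weight matrix receives gradient updates.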
Think of an MLP as a chain of prerequisites leading to an advanced course. Layer 1 might be Linear Algebra, layer 2 Probability, layer 3 Advanced Machine Learning — each one building on what came before.
Fine-tuning is what happens during graduate study: the foundations are already there from undergrad, so you're not re-learning. Full fine-tuning is reviewing every prerequisite to see what new topics have appeared and what discoveries the field has made since the last time you sat through them. Effective — but exhausting.
This diagram shows the same three-layer MLP twice, side by side.
On the left, the pretrained network runs on input X: three weight matrices W₁, W₂, W₃, each followed by a ReLU activation.
In the middle, one update equation per layer:
W′ᵢ = Wᵢ + ΔWᵢ
On the right, the same network after the update — same shape, same flow, but every weight is now its updated version W′ᵢ. The arrows trace each Wᵢ flowing down into its update, and each W′ᵢ flowing back up into its layer on the right.
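The same picture in code: a minimal NumPy sketch of the forward pass and the per-layer update. The shapes come from the lesson's example; the ΔWᵢ here are random placeholders standing in for real gradient updates on the new task:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained weights for the three-layer MLP in the diagram.
# Shapes follow the running example: W1 is 40x30, W2 and W3 are 40x40.
W = [rng.standard_normal((40, 30)) * 0.1,
     rng.standard_normal((40, 40)) * 0.1,
     rng.standard_normal((40, 40)) * 0.1]

def forward(x, weights):
    """Run the MLP: each layer is a matmul followed by a ReLU."""
    for Wi in weights:
        x = np.maximum(Wi @ x, 0.0)  # ReLU activation
    return x

# Full fine-tuning: every layer gets its own delta. These deltas are
# random placeholders; in practice each one is accumulated from
# gradient steps on the new task's loss.
delta = [rng.standard_normal(Wi.shape) * 0.01 for Wi in W]

# W'_i = W_i + ΔW_i, applied to every layer with nothing frozen.
W_prime = [Wi + dWi for Wi, dWi in zip(W, delta)]

x = rng.standard_normal(30)
y_after = forward(x, W_prime)  # same architecture, updated weights
```

Note that the network's shape is untouched: only the values inside each weight matrix move.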
Full fine-tuning gives the model the most freedom to specialize. Every parameter can move — and every parameter that can move must be stored.
In this small example, the trainable count is 40 × 30 + 40 × 40 + 40 × 40 = 4400 values. For a production model with billions of parameters, fine-tuning for one task means saving a billion-parameter ΔW set. Ten downstream tasks means ten complete model copies.
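The count above can be checked directly, and the same one-liner scales to any list of layer shapes:

```python
# Shapes of the three weight matrices from the lesson's example:
# a 30-dim input, two 40-unit hidden layers, a 40-dim output.
shapes = [(40, 30), (40, 40), (40, 40)]

# Full fine-tuning trains (and must store a ΔW for) every entry.
trainable = sum(rows * cols for rows, cols in shapes)
print(trainable)  # 4400
```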
But not every prerequisite needs revisiting. The further back you go in the chain, the less the material has changed since pretraining — the linear-algebra basics under your computer-vision course are largely the same as they ever were. The next lesson builds on exactly that observation: freeze the prerequisites that haven't moved, and only refresh the advanced layers closest to your specialization.
← Previous: Pretrain vs Fine-Tune | Next: Freezing Layers →


