Linear Probe
Fine-Tuning series · 5 of 8
In Freezing Layers, we kept the foundational prerequisites fixed and refreshed only the advanced course on top — W3. But even refreshing one course is still a whole course. W3 is a full 40 × 40 matrix — 1600 weights to update — weeks of lectures, assignments, and exams to work through. What if we don't re-take any existing course at all, and instead pick up a single new one-credit certificate — the kind you can finish in a month?
That's a linear probe. The master's degree — your pretrained network — stays exactly as it was. On top of it, you add one small linear layer: no homework for the old material, just one new linear combination of the subjects you already know, tuned for the new task.
That new layer is Wn — a thin 10 × 40 matrix that maps the 40 features the frozen network already produces to the new task's 10 outputs. That's just 400 weights, 4× fewer trainable parameters than the 1600 it took to refresh W3 alone. Everything in W1, W2, and W3 stays permanently frozen.
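To make the setup concrete, here is a minimal PyTorch sketch. It assumes the toy dimensions this series uses (40-dimensional features, 10 outputs for the new task) and assumes W1 and W2 are 40 × 40 like W3; the names `backbone` and `probe` are illustrative, not code from the series.

```python
import torch
import torch.nn as nn

# The "master's degree": the pretrained network W1, W2, W3.
# W1 and W2 are assumed to be 40 x 40 here, matching W3 from the series.
backbone = nn.Sequential(
    nn.Linear(40, 40), nn.ReLU(),   # W1
    nn.Linear(40, 40), nn.ReLU(),   # W2
    nn.Linear(40, 40), nn.ReLU(),   # W3
)
for p in backbone.parameters():
    p.requires_grad = False         # permanently frozen: no gradients, ever

# The "one-credit certificate": Wn, a 10 x 40 linear map, exactly 400 weights.
probe = nn.Linear(40, 10, bias=False)
model = nn.Sequential(backbone, probe)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable weights: {trainable}")  # 400, vs. 1600 for refreshing W3
```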
Why bother with something so tiny? Because it's also a probe: if one thin linear layer can already do the downstream task well, that tells you the pretrained features are rich enough on their own. If it can't, you know you need something bigger — and the pretrained features still got you most of the way.
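Running the probe is equally small. Continuing the sketch above (so `backbone`, `probe`, and `model` are already defined), this example trains only the 400 probe weights on stand-in random data and then reads off accuracy as the diagnostic; on a real task you would use your downstream dataset and measure on a held-out split.

```python
# Continues the sketch above: backbone, probe, and model are already defined.
# Random tensors stand in for a real downstream dataset (an assumption).
xs = torch.randn(256, 40)
ys = torch.randint(0, 10, (256,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(xs, ys), batch_size=32, shuffle=True
)

optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)  # only the probe's weights
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for x, y in loader:
        logits = model(x)           # frozen features -> one linear map -> logits
        loss = loss_fn(logits, y)
        optimizer.zero_grad()
        loss.backward()             # gradients reach the probe and nothing else
        optimizer.step()

# The diagnostic: on real data, measure this on a held-out split.
with torch.no_grad():
    acc = (model(xs).argmax(dim=1) == ys).float().mean().item()
print(f"probe accuracy: {acc:.1%}")  # high: features suffice; low: need a bigger head
```

Because the backbone never changes, a common shortcut is to run it once over the whole dataset, cache the 40-dimensional features, and train the probe directly on the cache, which makes each epoch a fraction of the cost.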
A one-credit certificate is quick and cheap, but it can only form linear combinations of what you already know. The next lesson adds a whole PhD on top — a multi-layer head that can form nonlinear connections too.