Fine-Tuning: the series
8 interactive lessons · each entry below links to the full post
Fine-tuning is how you turn a general-purpose pretrained model into something that actually does your task — and getting it right means knowing which weights to update.
Fittingly, this series is itself a fine-tune: you bring what you already know about basic MLP neural networks, and each lesson specializes that foundation into one fine-tuning technique.
I teach this the way I teach my master's students — through higher-education metaphors. The pretrained network is someone who's finished their bachelor's.
Each of the eight lessons below shows one way to specialize further — retake every subject, refresh just the advanced course, add a certificate, pursue a PhD, invite a private tutor.
Pick any lesson. They form a sequence, but each stands on its own.
1. Weight Update
When a neural network learns, what actually changes? Not the architecture — the shape of the network stays fixed. Not the inputs — those come from outside. What moves is the weights.
Read → https://www.byhand.ai/p/library-models-fine-tuning-weight-update
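A toy NumPy sketch of the idea (the numbers are illustrative, not the lesson's): the architecture is the matrix's shape, and learning only changes the values inside it.

```python
import numpy as np

# Toy numbers: learning changes the values inside the weight
# matrix, never its shape
W = np.array([[0.5, -0.2],
              [0.1,  0.8]])           # weights before the update
dW = np.array([[0.05,  0.00],
               [-0.10, 0.02]])        # update from some training step

W1 = W + dW                           # the weight update
assert W1.shape == W.shape            # architecture unchanged
```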
2. Pretrain vs Fine-Tune
The previous lesson showed W + ΔW = W1 as a single abstract step. That same step shows up in two very different settings — and the setting is what separates pretraining from fine-tuning.
Read → https://www.byhand.ai/p/library-models-fine-tuning-weight-fine-tune
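One way to see the "same step, two settings" point in code — a hedged sketch with a hypothetical `sgd_step` helper and random stand-in gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(W, grad, lr=0.1):
    # The same abstract step, W + ΔW, used in both settings
    return W - lr * grad

# Setting 1, pretraining: start from random weights, train on general data
W0 = rng.normal(size=(2, 2))
W_pretrained = sgd_step(W0, grad=rng.normal(size=(2, 2)))

# Setting 2, fine-tuning: start from the *pretrained* weights and
# train on your task's data; the update step itself is identical
W_finetuned = sgd_step(W_pretrained, grad=rng.normal(size=(2, 2)))
```

The only thing that changed between the two calls is the starting point and where the gradient comes from.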
3. Full Fine-Tuning
In the previous lesson, fine-tuning meant updating one weight matrix. A real network has many — three layers in this example, billions of parameters in a production model. What does fine-tuning look like when you update all of them?
Read → https://www.byhand.ai/p/library-models-fine-tuning-full-fine-tuning
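In code, "update all of them" is just the single-matrix step applied to every layer — a toy sketch with three small matrices standing in for billions of parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three pretrained layers (toy 4x4 matrices)
Ws = [rng.normal(size=(4, 4)) for _ in range(3)]

# Stand-in gradients from one backward pass on the new task
grads = [rng.normal(size=(4, 4)) for _ in range(3)]

lr = 0.01
# Full fine-tuning: every matrix in the network gets its own update
Ws = [W - lr * g for W, g in zip(Ws, grads)]
```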
4. Freezing Layers
In the previous lesson, full fine-tuning reviewed every prerequisite — Linear Algebra, Probability, Advanced ML — to refresh each subject with the latest topics. Effective, but exhausting.
Read → https://www.byhand.ai/p/library-models-fine-tuning-frozen-layers
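Freezing, in a toy NumPy sketch (sizes and learning rate are illustrative): the update step runs only on the top layer, and the frozen layers never see a gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

# W1, W2: foundational prerequisites (frozen); W3: the advanced course
W1, W2, W3 = (rng.normal(size=(4, 4)) for _ in range(3))

g3 = rng.normal(size=(4, 4))  # gradient flows only to the unfrozen layer
lr = 0.01

W3_old = W3.copy()
W3 = W3 - lr * g3             # refresh only the top layer
# W1 and W2 are never touched; freezing also skips their gradient math
assert not np.allclose(W3, W3_old)
```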
5. Linear Probe
In Freezing Layers, we kept the foundational prerequisites fixed and refreshed only the advanced course on top — W3. But even refreshing one course is still a whole course. W3 is a full 40 × 40 matrix — 1600 weights to update — weeks of lectures, assignments, and exams to work through. What if we don't retake any existing course at all, and instead pick up a single new one-credit certificate — the kind you can finish in a month?
Read → https://www.byhand.ai/p/library-models-fine-tuning-linear-probe
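The certificate metaphor in numbers — a hedged sketch (the ReLU activation and 3-class output are my assumptions, not the lesson's): freeze everything, bolt a single new linear layer onto the backbone's features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen backbone layer: 40x40, the 1600 weights we will NOT retrain
W3 = rng.normal(size=(40, 40)) * 0.1
features = np.maximum(0.0, W3 @ rng.normal(size=40))  # illustrative ReLU

# Linear probe: one brand-new linear layer for, say, 3 classes.
# Only 3 * 40 = 120 trainable weights, vs 1600 for retaking W3.
W_probe = np.zeros((3, 40))
logits = W_probe @ features

assert W_probe.size < W3.size  # the "one-credit certificate" is tiny
```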
6. Feature Extraction + Head
A feature head is a small trainable MLP bolted onto a frozen pretrained backbone. Think of it as pursuing a PhD on top of a master's degree. The master's — your pretrained backbone — stays exactly as it was, with no review. You aren't re-taking Linear Algebra or Probability; you're building something specialized on top of it: the PhD adds its own coursework, its own nonlinearity, and its own thesis layer.
Read → https://www.byhand.ai/p/library-models-fine-tuning-feature-head
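A toy sketch of the PhD-on-top-of-a-master's structure (all sizes are my assumptions): the backbone's features stay fixed, and the head brings its own hidden layer, nonlinearity, and output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Features from the frozen backbone: the finished master's degree
x = rng.normal(size=40)

# The PhD head: its own coursework (hidden layer), its own
# nonlinearity, and its own thesis layer (output)
W_hidden = rng.normal(size=(16, 40)) * 0.1
W_thesis = rng.normal(size=(3, 16)) * 0.1

h = np.maximum(0.0, W_hidden @ x)  # the head's own ReLU nonlinearity
y = W_thesis @ h                   # task prediction
```

The nonlinearity is what separates this from a linear probe: the head can learn features of its own, not just reweight the backbone's.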
7. Adapter Layers
Houlsby et al. (2019) proposed a different strategy: instead of choosing which layers to freeze, insert small trainable modules, called adapters, between the frozen layers.
Read → https://www.byhand.ai/p/library-models-fine-tuning-adapter-layers
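The adapter shape, as a hedged NumPy sketch (widths and init are illustrative): down-project, nonlinearity, up-project, then a residual add — so an adapter initialized near zero leaves the frozen layer's output untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 40, 4   # layer width vs adapter bottleneck width

def adapter(h, W_down, W_up):
    # Bottleneck with a residual connection around it
    return h + W_up @ np.maximum(0.0, W_down @ h)

W_down = rng.normal(size=(r, d)) * 0.01
W_up = np.zeros((d, r))   # zero init: the adapter starts as an identity

h = rng.normal(size=d)
assert np.allclose(adapter(h, W_down, W_up), h)
```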
8. LoRA
Adapters showed that a low-rank bottleneck can specialize a frozen layer with far fewer parameters than a full ΔW.
Read → https://www.byhand.ai/p/library-models-fine-tuning-lora-intro
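LoRA's core move, sketched with toy sizes (the rank and init scheme shown are the standard choices, but the numbers are mine): write the update as ΔW = BA with a small rank r, so the trainable parameter count shrinks from d² to 2dr.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 40, 4                         # full width vs low rank
W = rng.normal(size=(d, d))          # frozen pretrained weights

A = rng.normal(size=(r, d)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init

delta_W = B @ A                      # rank-r update: 2*d*r = 320 params
                                     # vs d*d = 1600 for a full ΔW
x = rng.normal(size=d)
# At init B = 0, so the adapted layer matches the frozen one exactly
assert np.allclose((W + delta_W) @ x, W @ x)
```

Because ΔW has the same shape as W, it can be added straight into the frozen matrix after training, unlike an adapter module sitting between layers.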