Adapter Layers
Fine-Tuning series · 7 of 8
Houlsby et al. (2019) proposed a different strategy: instead of choosing which layers to freeze, insert small trainable modules, called adapters, after each frozen layer.
Each adapter has a down-projection (d → r), a nonlinearity, and an up-projection (r → d). The bottleneck dimension r is much smaller than d, so each adapter adds very few parameters. The pretrained weights stay completely frozen.
With rank r = 8 and hidden size d = 64, each adapter has only 2 × r × d = 1024 trainable parameters.
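One adapter block can be sketched in a few lines of NumPy. This is a minimal illustration of the bottleneck structure described above, not the paper's exact implementation; the zero initialization of the up-projection and the choice of ReLU are common conventions, assumed here:

```python
import numpy as np

d, r = 64, 8  # hidden size and bottleneck rank from the example above

rng = np.random.default_rng(0)
W_down = rng.normal(0.0, 0.02, size=(d, r))  # down-projection: d -> r
W_up = np.zeros((r, d))                      # up-projection: r -> d, zero-init so the
                                             # adapter starts out as the identity

def adapter(h):
    """Bottleneck adapter with a residual connection around it."""
    z = np.maximum(h @ W_down, 0.0)  # nonlinearity in the bottleneck (ReLU here)
    return h + z @ W_up              # residual: frozen activation + small correction

print(W_down.size + W_up.size)  # 2 * r * d = 1024 trainable parameters
```

Because the up-projection starts at zero, inserting the adapter initially leaves the frozen network's behavior unchanged; training then learns a small correction on top of it.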
Across the two layers of this network:
The first layer projects the 32-dimensional input to the hidden size, so W₁ has d × 32 = 2048 weights.
The second layer has d × d = 4096 weights.
Full fine-tuning would update all 6144 of them.
Adapter fine-tuning trains only the two adapters, one per layer: 2 × (2 × r × d) = 2048 new trainable weights, about a 3× reduction in trainable parameters.


