QKV Projection
Attention series: 1 of 11
The first step of self-attention is projecting the input X into three separate spaces using learned weight matrices: Wq produces the Queries, Wk produces the Keys, and Wv produces the Values. Each projection is a single matrix multiplication: Q = Wq × X, K = Wk × X, V = Wv × X.
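As a minimal sketch of the three projections, the plain-Python example below multiplies small hand-picked matrices. It assumes each column of X is one token, which the Q = Wq × X form implies; the `matmul` helper and all numeric values are made up for illustration, since real weights are learned during training.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Input X: 3 rows (model size) by 2 columns (assumed: one column per token).
X = [[1, 0],
     [0, 1],
     [2, 1]]

# Illustrative weights; in a real model these are learned.
Wq = [[1, 0, 0],
      [0, 1, 0]]   # 2 rows (key dimension) by 3 columns (model size)
Wk = [[0, 1, 0],
      [0, 0, 1]]   # same shape as Wq
Wv = [[1, 1, 1]]   # 1 row (value dimension) by 3 columns (model size)

Q = matmul(Wq, X)  # [[1, 0], [0, 1]]
K = matmul(Wk, X)  # [[0, 1], [2, 1]]
V = matmul(Wv, X)  # [[3, 2]]
```

Each result keeps one column per input column, while its row count comes from the weight matrix that produced it.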
Q and K must share the same row count (the Key Dimension) so that the dot product between a query and a key is defined in the attention computation. V can have a different row count (the Value Dimension), which determines the size of the attention output.
The model size is the number of rows in the input X. The key and value dimensions are usually smaller than the model size, so these projections compress the input into a smaller representation.
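The size relationships can be checked with simple shape bookkeeping. The sketch below is a hypothetical helper, not part of any library. It assumes the Q = Wq × X convention with one column of X per token, and the sizes 512, 64, and 10 are illustrative values only.

```python
def projection_shapes(model_size, num_tokens, key_dim, value_dim):
    """Return (rows, columns) for each matrix under Q = Wq × X."""
    return {
        "X": (model_size, num_tokens),
        "Wq": (key_dim, model_size),
        "Wk": (key_dim, model_size),
        "Wv": (value_dim, model_size),
        "Q": (key_dim, num_tokens),
        "K": (key_dim, num_tokens),
        "V": (value_dim, num_tokens),
    }

# Illustrative sizes: key and value dimensions smaller than the model size.
shapes = projection_shapes(model_size=512, num_tokens=10,
                           key_dim=64, value_dim=64)

# Q and K share a row count, so query-key dot products are defined.
assert shapes["Q"][0] == shapes["K"][0]

# Each projected column is shorter than the input column it came from.
assert shapes["Q"][0] < shapes["X"][0]
assert shapes["V"][0] < shapes["X"][0]
```

The number of columns never changes: every token in X gets exactly one query, one key, and one value.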


