(P.S. This issue is written for advanced AI engineers and researchers. It is part of the premium Frontier subscription. If you are a beginner, please check out my free lectures, walkthroughs, and Excel exercises. I also share regular announcements of free learning opportunities.)
In last week’s Frontier issue, I opened with Gemma 3 to motivate the KV Cache. That same week, Google announced EmbeddingGemma.
Coincidence? Perhaps. 😉
One key feature highlighted is customizable output dimension. This means you can pick the embedding size that best fits your application.
For example, you might choose smaller vectors to speed up product search in e-commerce or FAQ retrieval in customer support, and larger vectors to maximize accuracy in legal document ranking, scientific literature search, or medical record clustering.
How is this flexibility achieved? Through Matryoshka Representation Learning (MRL). Matryoshka is the Russian word for the nesting dolls: just as each doll contains a smaller one inside, the embedding is learned at nested scales, so a large embedding contains progressively smaller, usable embeddings inside it.
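To make the nesting concrete, here is a minimal sketch (plain NumPy, with a made-up 8-dimensional vector chosen only for illustration) of how an MRL-trained embedding is typically consumed: keep the first k dimensions and re-normalize, since the leading dimensions were trained to stand on their own.

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions of an MRL-trained embedding and re-normalize.

    This only works as intended when the model was trained with MRL,
    so the leading dimensions carry a usable coarse representation.
    """
    small = vec[:k]
    return small / np.linalg.norm(small)

# Toy full-size embedding (8 dims, matching the worksheet scale).
full = np.array([0.40, -0.10, 0.25, 0.05, -0.30, 0.20, 0.10, -0.05])
full = full / np.linalg.norm(full)

for k in (8, 4, 2):
    print(k, truncate_embedding(full, k))
```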
Both Qwen3 Embedding (released back in June) and EmbeddingGemma now come with MRL baked in, which means this approach is no longer experimental. It is becoming mainstream for Transformer embeddings.
For this issue, I have created five new sets of worksheets.
Decode
Embed
Information Noise Contrastive Estimation (InfoNCE)
Matryoshka Representation Learning (MRL)
Fine-tune Embedding Model by MRL
You can see the contrasts between:
Decode vs. Embed
Both share the same Transformer backbone.
They only differ in the last layer: Decode projects embeddings into the (large) vocabulary for generation, while Embed produces (small) dense vectors for retrieval and similarity.
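A minimal PyTorch sketch of that contrast, assuming a generic backbone that returns per-token hidden states (the module names and sizes here are illustrative, not any specific model’s API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, vocab_size, embed_dim = 64, 1000, 8  # illustrative sizes

class DecodeHead(nn.Module):
    """Decode: project each hidden state onto the (large) vocabulary."""
    def __init__(self):
        super().__init__()
        self.lm_head = nn.Linear(hidden_dim, vocab_size, bias=False)

    def forward(self, hidden_states):             # (batch, seq, hidden_dim)
        return self.lm_head(hidden_states)        # (batch, seq, vocab_size) logits

class EmbedHead(nn.Module):
    """Embed: pool over tokens into one (small) dense vector for retrieval."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, embed_dim, bias=False)

    def forward(self, hidden_states):             # (batch, seq, hidden_dim)
        pooled = hidden_states.mean(dim=1)         # mean pooling over the sequence
        return F.normalize(self.proj(pooled), dim=-1)  # (batch, embed_dim), unit norm

hidden_states = torch.randn(2, 5, hidden_dim)      # stand-in for backbone output
print(DecodeHead()(hidden_states).shape)           # torch.Size([2, 5, 1000])
print(EmbedHead()(hidden_states).shape)            # torch.Size([2, 8])
```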
InfoNCE vs. MRL
Both share the contrastive learning framework of pulling positives together and pushing negatives apart.
They only differ in the levels: InfoNCE operates at a single embedding level, while MRL enforces consistency across multiple nested levels simultaneously.
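Here is the same difference written out as losses, in a plain PyTorch sketch with in-batch negatives and nesting dimensions chosen to match the worksheets: InfoNCE scores anchors against positives at one embedding size, while the MRL loss simply sums that same InfoNCE term over nested prefixes of the embedding.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.05):
    """InfoNCE with in-batch negatives: each anchor's positive is the matching
    row; every other row in the batch acts as a negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature              # (batch, batch) similarity matrix
    labels = torch.arange(a.size(0))            # correct positive sits on the diagonal
    return F.cross_entropy(logits, labels)

def mrl_loss(anchors, positives, dims=(8, 4, 2), temperature=0.05):
    """MRL: apply the same InfoNCE objective to nested prefixes of the embedding,
    so the first 2, first 4, and all 8 dimensions each learn to work on their own."""
    return sum(info_nce(anchors[:, :d], positives[:, :d], temperature) for d in dims)

anchors = torch.randn(4, 8)     # toy batch of 4 anchor embeddings, 8 dims
positives = torch.randn(4, 8)   # matching positives
print(info_nce(anchors, positives))   # single-level loss
print(mrl_loss(anchors, positives))   # nested, multi-level loss
```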
Finally, the last set of worksheets brings everything together. You fine-tune an embedding model from pairs of text anchors and their respective positive examples, but this time you train with MRL, so the model learns embeddings at multiple dimensions (e.g., 8, 4, and 2) in one shot.
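Outside the worksheets, this is the recipe that libraries wrap for you. A hedged sketch using sentence-transformers (assuming a version with MatryoshkaLoss, 2.4 or later; the model name, example texts, and dimensions below are placeholders, and realistic MRL dims follow the model’s embedding size rather than the worksheet’s toy 8/4/2):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample
from sentence_transformers.losses import MultipleNegativesRankingLoss, MatryoshkaLoss

# Placeholder anchor/positive pairs; real data would be (query, relevant passage) pairs.
train_examples = [
    InputExample(texts=["how do I reset my password", "Steps to reset your account password"]),
    InputExample(texts=["return policy for shoes", "Footwear can be returned within 30 days"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder model

# Inner loss: InfoNCE-style in-batch negatives; outer loss: the same objective
# applied at several nested dimensions, which is the MRL part.
base_loss = MultipleNegativesRankingLoss(model)
train_loss = MatryoshkaLoss(model, base_loss, matryoshka_dims=[384, 128, 64])

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```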
⬇️ Download the worksheets below (for Frontier Subscribers only)