AI by Hand ✍️

EmbeddingGemma, MRL, InfoNCE, Embed vs. Decode

Frontier Model Math by hand ✍️

Prof. Tom Yeh
Sep 09, 2025
Source: Introducing EmbeddingGemma (Google 2025)

(P.S. This issue is written for advanced AI engineers and researchers. It is part of the premium Frontier subscription. If you are a beginner, please check out my free lectures, walkthroughs, and Excel exercises. I also share regular announcements of free learning opportunities.)

In last week’s Frontier issue, I opened with Gemma 3 to motivate KV Cache. The same week, Google announced EmbeddingGemma.

Coincidence? Perhaps. 😉

One key feature highlighted is customizable output dimension. This means you can pick the embedding size that best fits your application.

For example, you might choose smaller vectors to speed up product search in e-commerce or FAQ retrieval in customer support, and larger vectors to maximize accuracy in legal document ranking, scientific literature search, or medical record clustering.

How is this flexibility achieved? Through Matryoshka Representation Learning (MRL). Matryoshka is the Russian name for nesting dolls. Just as the dolls nest inside one another, embeddings are learned at nested scales, so a large embedding contains progressively smaller, usable embeddings inside it.
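
To make the nesting concrete, here is a minimal sketch (not EmbeddingGemma’s actual API) of how an MRL-trained embedding can be cut down after the fact: keep only the first k dimensions and re-normalize. The 768/256/128 sizes below are illustrative.

```python
import numpy as np

# Hypothetical full-width embedding from an MRL-trained model (768 dims here).
full = np.random.randn(768)
full /= np.linalg.norm(full)

# With MRL, the first k dimensions already form a usable embedding:
# truncate to the size your application needs and re-normalize.
for k in (768, 256, 128):
    nested = full[:k] / np.linalg.norm(full[:k])
    print(f"dim={k}, norm={np.linalg.norm(nested):.2f}")
```

Smaller prefixes trade a little accuracy for faster search and smaller indexes, which is exactly the e-commerce vs. legal-ranking trade-off above.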

Both Qwen3 Embedding (released back in June) and EmbeddingGemma now come with MRL baked in, which means this approach is no longer experimental. It is becoming the mainstream choice for frontier Transformer embeddings.

For this issue, I have created five new sets of worksheets.

  1. Decode

  2. Embed

  3. Information Noise-Contrastive Estimation (InfoNCE)

  4. Matryoshka Representation Learning (MRL)

  5. Fine-tune Embedding Model by MRL

You can see the contrasts between:

  • Decode vs. Embed

    • Both share the same Transformer backbone.

    • They differ only in the last layer: Decode projects the final hidden states into the (large) vocabulary for generation, while Embed pools them into a (small) dense vector for retrieval and similarity (see the sketch after this list).

  • InfoNCE vs. MRL

    • Both share the contrastive learning framework of pulling positives together and pushing negatives apart.

    • They differ only in the levels: InfoNCE operates at a single embedding size, while MRL enforces the same objective across multiple nested sizes simultaneously (a combined sketch follows the fine-tuning paragraph below).
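
Here is the Decode vs. Embed contrast as a minimal PyTorch sketch. It assumes a shared backbone that outputs hidden states of size d_model; mean pooling, the toy sizes, and the extra projection layer are my assumptions for illustration, not EmbeddingGemma’s actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, vocab_size, embed_dim = 64, 1000, 8   # toy sizes, not the real model's

# hidden: [batch, seq_len, d_model] from the shared Transformer backbone
hidden = torch.randn(2, 5, d_model)

# Decode head: project every position into the (large) vocabulary for generation.
lm_head = nn.Linear(d_model, vocab_size, bias=False)
logits = lm_head(hidden)                        # [2, 5, 1000]

# Embed head: pool the sequence into one (small) dense vector for retrieval.
pooled = hidden.mean(dim=1)                     # [2, d_model]; mean pooling assumed
proj = nn.Linear(d_model, embed_dim, bias=False)
embedding = F.normalize(proj(pooled), dim=-1)   # [2, 8], unit norm for cosine similarity
```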

Finally, the last set of worksheets brings everything together. You fine-tune an embedding model from pairs of text anchors and their respective positive examples, but this time you train with MRL so the model learns embeddings at multiple dimensions (i.e., 8, 4, 2) in one shot.
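
As a rough sketch of what that fine-tuning loop computes, the snippet below pairs an in-batch-negatives InfoNCE loss with MRL by applying the same loss to each nested prefix (8, 4, and 2 dims) and summing. The temperature, the equal weighting of levels, and the use of in-batch negatives are my assumptions, not the worksheet’s exact recipe.

```python
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.05):
    # In-batch negatives: each anchor's positive is the matching row;
    # every other row in the batch acts as a negative.
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature            # [batch, batch] cosine similarities
    labels = torch.arange(a.size(0))          # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

def mrl_loss(anchors, positives, dims=(8, 4, 2)):
    # MRL: apply the same contrastive loss to each nested prefix and sum,
    # so the first 2, 4, and 8 dims are all trained to be usable embeddings.
    return sum(info_nce(anchors[:, :d], positives[:, :d]) for d in dims)

# Toy batch of 4 anchor/positive pairs, full width 8 dims. In a real fine-tune,
# these would be the embedding model's outputs for the anchor/positive texts.
anchors = torch.randn(4, 8, requires_grad=True)
positives = torch.randn(4, 8, requires_grad=True)
loss = mrl_loss(anchors, positives)
loss.backward()   # gradients flow back to whatever model produced the embeddings
print(loss.item())
```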

⬇️ Download the worksheets below (for Frontier Subscribers only)
