AI by Hand ✍️

3. New GPT-OSS Trick to Ignore Tokens

Frontier AI Drawings by Hand ✍️

Prof. Tom Yeh · Aug 09, 2025 · Paid

Big news this week: OpenAI released GPT-OSS, its latest open-weight model, along with a tech report, likely a prelude to the long-awaited GPT-5.

Instead of the usual one early-access issue per week, here's a bonus issue so I can respond quickly to what may become a reference point in future architectures.

Over the next few issues, I’ll prioritize architectures and techniques cited in the OpenAI tech reports.

We’re starting with 👉 How transformers can ignore tokens.

Drawings

I created four new drawings to explore different approaches to handling irrelevant tokens in attention:

  1. Baseline Attention – the standard Softmax formulation, which always assigns some attention to every token.

  2. 🔥 Learned Bias in the Denominator – the new method used by OpenAI to allow true “ignoring.”

  3. Off-by-One Softmax – adds a fixed +1 to the softmax denominator, so attention weights can sum to less than one and weak matches are suppressed.

  4. Sink Tokens – introduces a special token to absorb low-similarity attention.

The last three address the same limitation of the baseline, each in a very different way. You'll learn how they work, why they matter, and what tradeoffs they introduce; a minimal code sketch of all four follows below.
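To make the differences concrete, here is that sketch: a NumPy illustration of how each variant computes attention weights for a single query's score vector. It is my own approximation of the techniques named above, not code from the drawings or the OpenAI report, and the function names and toy values are hypothetical.

```python
import numpy as np

def baseline_softmax(scores: np.ndarray) -> np.ndarray:
    """1) Baseline: w_i = exp(s_i) / sum_j exp(s_j).
    Weights are strictly positive and sum to 1, so every token
    receives some attention and none can be fully ignored."""
    e = np.exp(scores - scores.max())            # shift by max for stability
    return e / e.sum()

def learned_bias_softmax(scores: np.ndarray, bias: float) -> np.ndarray:
    """2) Learned bias in the denominator:
    w_i = exp(s_i) / (exp(b) + sum_j exp(s_j)), with b learned per head.
    When b dominates the scores, all weights shrink toward 0."""
    m = max(scores.max(), bias)                  # stabilize against both terms
    e = np.exp(scores - m)
    return e / (np.exp(bias - m) + e.sum())

def off_by_one_softmax(scores: np.ndarray) -> np.ndarray:
    """3) Off-by-one softmax: w_i = exp(s_i) / (1 + sum_j exp(s_j)).
    The same idea with the bias fixed at 0 (exp(0) = 1) instead of learned."""
    m = max(scores.max(), 0.0)
    e = np.exp(scores - m)
    return e / (np.exp(-m) + e.sum())

def sink_token_softmax(scores: np.ndarray, sink_score: float) -> np.ndarray:
    """4) Sink token: prepend a special token that can absorb attention;
    the real tokens' weights can then sum to less than 1."""
    w = baseline_softmax(np.concatenate(([sink_score], scores)))
    return w[1:]                                 # drop the sink's own weight

scores = np.array([2.0, 1.0, -3.0])              # toy query-key scores
print(baseline_softmax(scores).sum())            # 1.0: must attend somewhere
print(learned_bias_softmax(scores, 6.0).sum())   # ~0.02: mostly ignored
print(off_by_one_softmax(scores).sum())          # ~0.91: mildly damped
print(sink_token_softmax(scores, 6.0).sum())     # ~0.02: sink soaks it up
```

Notice that the last three all let the weights sum to less than 1, which is what "ignoring" means mathematically: the head can decline to attend. A sink token whose value vector is discarded is numerically equivalent to adding the same bias to the denominator, which is why these approaches are close cousins.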

Download

The Frontier AI drawings are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.
