New GPT-OSS Trick to Ignore Tokens

Aug 09, 2025

Big news this week: OpenAI released its latest open-weight model, GPT-OSS, along with a tech report, likely a prelude to the long-awaited GPT-5.

On top of the usual weekly early-access issue, here’s a bonus issue, so I can respond quickly to what may become a reference point for future architectures.

Over the next few issues, I’ll prioritize the architectures and techniques cited in OpenAI’s tech report.

We’re starting with 👉 How transformers can ignore tokens.

I created four new worksheets to explore different approaches to handling irrelevant tokens in attention:

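Baseline Attention – the standard Softmax formulation, whose weights are always positive and sum to 1, so every token receives some attention.
🔥 Learned Bias in the Denominator – the new method used by OpenAI: a learned term added to the Softmax denominator lets the weights sum to less than 1, allowing true “ignoring.”
Off-by-One Softmax – adds a fixed 1 to the denominator, so weak matches can all receive near-zero weight.
Sink Tokens – introduces a special token to absorb the attention that would otherwise be spread over low-similarity tokens.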

Each one addresses the same limitation, but in a very different way. You’ll learn how they work, why they matter, and what tradeoffs they introduce.
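To make the comparison concrete, here is a minimal sketch of the four variants in plain Python, applied to one query whose scores are all weak matches. It is an illustration under stated assumptions, not OpenAI's implementation: the function names, the example scores, and the `bias` value of 2.0 are all hypothetical, and in a real model the bias would be a learned parameter (typically one per attention head).

```python
import math

def softmax(scores):
    """Baseline: weights are strictly positive and always sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_bias(scores, bias):
    """Add exp(bias) to the denominator only, so weights can sum to < 1.

    `bias` is a hypothetical stand-in for a learned parameter.
    """
    m = max(max(scores), bias)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps) + math.exp(bias - m)
    return [e / total for e in exps]

def off_by_one_softmax(scores):
    """Fixed special case: bias = 0, i.e. exp(0) = 1 added to the denominator."""
    return softmax_with_bias(scores, 0.0)

def sink_token_softmax(scores, sink_score):
    """Prepend a sink token's score, apply the ordinary softmax, then split.

    Returns (weight absorbed by the sink, weights on the real tokens).
    """
    weights = softmax([sink_score] + list(scores))
    return weights[0], weights[1:]

# Assumed example: three tokens that are all poor matches for the query.
scores = [-4.0, -5.0, -4.5]

print("baseline total    :", sum(softmax(scores)))
print("off-by-one total  :", sum(off_by_one_softmax(scores)))
print("learned-bias total:", sum(softmax_with_bias(scores, 2.0)))
sink_weight, token_weights = sink_token_softmax(scores, 2.0)
print("sink absorbs      :", sink_weight)
```

In this sketch, a sink token with score `b` gives the real tokens the same weights as a denominator bias of `b`. The practical difference is that a real sink token also carries a value vector that is mixed into the output, while a bias in the denominator contributes nothing.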

⬇️ Download the worksheets below (for Frontier Subscribers only)

AI by Hand ✍️