AI by Hand ✍️
New GPT-OSS Trick to Ignore Tokens

Frontier Model Math by hand ✍️

Prof. Tom Yeh
Aug 09, 2025

Big news this week: OpenAI released GPT-OSS, its latest open-weight model, along with a tech report, likely a prelude to the long-awaited GPT-5.

Instead of the usual one early-access issue per week, here's a bonus issue, so I can respond quickly to what may become a reference point for future architectures.

Over the next few issues, I’ll prioritize architectures and techniques cited in the OpenAI tech reports.

We’re starting with 👉 How transformers can ignore tokens.

I created four new worksheets to explore different approaches to handling irrelevant tokens in attention:

  1. Baseline Attention – the standard Softmax formulation, which always assigns some attention to every token.

  2. 🔥 Learned Bias in the Denominator – the new method used by OpenAI to allow true “ignoring.”

  3. Off-by-One Softmax – adds a fixed +1 to the softmax denominator, so weak matches can fade toward zero.

  4. Sink Tokens – introduces a special token that absorbs the attention mass real tokens don't earn.

Each one solves the same limitation, but in very different ways. You'll learn how they work, why they matter, and what tradeoffs they introduce. Minimal code sketches of each approach follow below.
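
To make the comparisons concrete, here are small NumPy sketches of each approach, starting with the baseline. The function names and toy scores are mine, not from the worksheets or the tech report.

```python
import numpy as np

def baseline_attention_weights(scores: np.ndarray) -> np.ndarray:
    """Standard softmax: w_i = exp(s_i) / sum_j exp(s_j).
    The weights always sum to exactly 1, so every token gets
    some attention, however irrelevant it is."""
    e = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 0.1, -1.5])        # one query's similarity scores
print(baseline_attention_weights(scores))  # all entries > 0, sum == 1.0
```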
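
Next, the learned bias in the denominator, as I read it from the tech report: each head learns a scalar that competes with the real scores inside the softmax denominator. The name `sink_bias` and this exact parameterization are my assumptions; OpenAI's implementation may differ in detail.

```python
import numpy as np

def biased_softmax_weights(scores: np.ndarray, sink_bias: float) -> np.ndarray:
    """Softmax with a learned bias in the denominator:
    w_i = exp(s_i) / (sum_j exp(s_j) + exp(b)).
    If b dominates every score, the weights sum to nearly 0,
    so the head can truly ignore all tokens."""
    m = max(scores.max(), sink_bias)  # stabilize against the largest logit
    e = np.exp(scores - m)
    return e / (e.sum() + np.exp(sink_bias - m))

scores = np.array([0.2, -0.3, 0.1])                     # weak matches everywhere
print(biased_softmax_weights(scores, sink_bias=4.0))    # weights sum to ~0.05: near-total ignoring
print(biased_softmax_weights(scores, sink_bias=-10.0))  # bias negligible: ~standard softmax
```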
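
Off-by-one softmax is the same idea with the bias frozen at zero, which contributes exp(0) = 1 to the denominator, hence the name. A sketch under that reading:

```python
import numpy as np

def softmax1(scores: np.ndarray) -> np.ndarray:
    """Off-by-one softmax: w_i = exp(s_i) / (1 + sum_j exp(s_j)).
    The fixed +1 lets the weights sum to less than 1 when
    no key matches the query strongly."""
    m = max(scores.max(), 0.0)  # include the implicit 0 logit in the max
    e = np.exp(scores - m)
    return e / (np.exp(-m) + e.sum())

print(softmax1(np.array([-3.0, -4.0, -2.5])))  # weak scores: weights sum to ~0.13
print(softmax1(np.array([8.0, 1.0, 0.0])))     # one strong match: close to standard softmax
```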
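
Finally, sink tokens get a similar effect without modifying softmax at all: a special token joins the sequence, and the ordinary softmax lets it soak up whatever attention the real tokens don't earn. In a real model, `sink_score` would come from the query's dot product with the sink token's key; here it's just a number I picked.

```python
import numpy as np

def sink_token_weights(scores: np.ndarray, sink_score: float) -> np.ndarray:
    """Prepend a score for a special sink token, run a perfectly
    standard softmax over [sink, real tokens], then keep only the
    real-token weights. The sink absorbs the leftover mass."""
    all_scores = np.concatenate([[sink_score], scores])
    e = np.exp(all_scores - all_scores.max())
    w = e / e.sum()  # ordinary softmax; sums to 1 including the sink
    return w[1:]     # real-token weights now sum to less than 1

print(sink_token_weights(np.array([0.2, -0.3, 0.1]), sink_score=3.0))  # sums to ~0.13
```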

⬇️ Download the worksheets below (for Frontier Subscribers only)
