Big news this week: OpenAI released its new open-weight GPT-OSS models, along with a tech report, likely a prelude to the long-awaited GPT-5.
Instead of the usual one early-access issue per week, here’s a bonus issue, so I can respond quickly to what may become a reference point for future architectures.
Over the next few issues, I’ll prioritize architectures and techniques cited in the OpenAI tech reports.
We’re starting with 👉 How transformers can ignore tokens.
I created four new worksheets to explore different approaches to handling irrelevant tokens in attention:
Baseline Attention – the standard Softmax formulation, which always assigns some attention to every token.
🔥 Learned Bias in the Denominator – the method used in GPT-OSS, where a learned bias in the softmax denominator lets an attention head pay (almost) no attention to any token.
Off-by-One Softmax – adds a fixed +1 to the softmax denominator so attention weights no longer have to sum to one, suppressing weak matches.
Sink Tokens – introduces a special token that absorbs attention when no real token is a good match.
Each one addresses the same limitation, but in very different ways. You’ll learn how they work, why they matter, and what tradeoffs they introduce. (Minimal code sketches of all four approaches follow below.)
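To make the differences concrete, here is a toy NumPy sketch of the three denominator-based variants: standard softmax, off-by-one softmax (a fixed +1 in the denominator), and softmax with a learned scalar bias in the denominator. This is my own illustration, not OpenAI’s code; the score values, function names, and bias are assumptions chosen for clarity, and GPT-OSS’s exact per-head parameterization may differ.

```python
import numpy as np

def softmax(scores):
    """Standard softmax: weights always sum to 1, so every token gets some attention."""
    e = np.exp(scores - scores.max())                  # shift by max for numerical stability
    return e / e.sum()

def softmax_plus_one(scores):
    """Off-by-one softmax: a fixed +1 in the denominator (an implicit zero logit),
    so the weights can collectively shrink toward 0 when nothing matches."""
    m = scores.max()
    e = np.exp(scores - m)
    return e / (e.sum() + np.exp(-m))                  # the "+1", shifted by the same max

def softmax_learned_bias(scores, bias):
    """Softmax with a learned scalar bias in the denominator: the model can tune,
    per head, how easy it is to attend to 'nothing'."""
    m = max(scores.max(), bias)                        # include the bias in the stability shift
    e = np.exp(scores - m)
    return e / (e.sum() + np.exp(bias - m))

scores = np.array([-4.0, -3.5, -5.0])                  # a query that matches no key well
print(softmax(scores).sum())                           # 1.0   -> attention is forced somewhere
print(softmax_plus_one(scores).sum())                  # ~0.05 -> most of the mass is dropped
print(softmax_learned_bias(scores, bias=2.0).sum())    # ~0.007 -> a larger bias drops even more
```

Note that the off-by-one variant is just the learned-bias variant with the bias frozen at zero (since exp(0) = 1), which is one way to see why a learnable bias is the more general formulation.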
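Sink tokens reach the same goal without changing the softmax itself: a special key/value pair is prepended to the sequence, and whatever attention lands on it is effectively thrown away. Below is a single-head PyTorch toy sketch under my own assumptions (one sink position, a zero sink value vector); real implementations vary in how the sink is parameterized and trained.

```python
import torch
import torch.nn.functional as F

def attention_with_sink(q, k, v, sink_k, sink_v):
    """Single-head attention with one prepended sink position.
    q, k, v: (T, d); sink_k, sink_v: (1, d), learned parameters in a real model."""
    k_ext = torch.cat([sink_k, k], dim=0)              # (T+1, d)
    v_ext = torch.cat([sink_v, v], dim=0)              # (T+1, d)
    scores = q @ k_ext.T / k_ext.shape[-1] ** 0.5      # (T, T+1) scaled dot-product scores
    weights = F.softmax(scores, dim=-1)                # still sums to 1, but over T+1 positions
    out = weights @ v_ext                              # (T, d)
    # Mass assigned to position 0 (the sink) is effectively discarded:
    # with sink_v = 0 it contributes nothing to the output.
    return out, weights[:, 0]

T, d = 4, 8
q, k, v = torch.randn(T, d), torch.randn(T, d), torch.randn(T, d)
sink_k = torch.zeros(1, d)   # learnable in practice (an nn.Parameter)
sink_v = torch.zeros(1, d)   # often fixed at zero so the sink adds nothing to the output
out, sink_mass = attention_with_sink(q, k, v, sink_k, sink_v)
print(sink_mass)             # per-query fraction of attention that was "dropped"
```

The tradeoff: the sink occupies a real sequence position, whereas the bias-in-denominator approaches change only the normalization.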
⬇️ Download the worksheets below (for Frontier Subscribers only)