Thank you to the 900+ people who registered for my special lecture yesterday, and especially to the 300+ who showed up live! I was genuinely surprised and encouraged by the turnout, especially for a session this math-heavy. It’s not every day that so many people get excited to go deep into neural networks, attention, and architecture tweaks. Your curiosity and energy made it feel more like a live classroom.
You can download the Excel file I created during the live lecture at the bottom of this post.
Below is a quick recap.
I started with a blank Excel document.
I introduced basic neural networks and weights using a simple multiplication example to make the core ideas intuitive. Then I walked through how neural networks are structured—multiple layers, activation functions to add nonlinearity, and how tokens get processed independently in a feedforward pass. From there, I introduced attention as a way for tokens to share information with each other. I showed how attention lets us combine two tokens to create a new one—capturing relationships that feedforward networks can’t.
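If you'd like to see those same ideas outside the spreadsheet, here is a minimal NumPy sketch (my own illustration, not the Excel formulas from the lecture): a two-layer feedforward block that transforms each token independently, and a single attention step that lets tokens mix information.

```python
import numpy as np

def feedforward(x, W1, b1, W2, b2):
    """Two-layer feedforward block: each token (row of x) is transformed on its own."""
    h = np.maximum(0, x @ W1 + b1)   # ReLU supplies the nonlinearity between layers
    return h @ W2 + b2

def attention(X, Wq, Wk, Wv):
    """Basic scaled dot-product attention: tokens share information with each other."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # pairwise token similarities
    e = np.exp(scores - scores.max(-1, keepdims=True))  # numerically stable softmax
    weights = e / e.sum(-1, keepdims=True)
    return weights @ V                                  # each output blends all tokens

# Two tokens with 4 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 4))
W1, b1, W2, b2 = rng.normal(size=(4, 8)), np.zeros(8), rng.normal(size=(8, 4)), np.zeros(4)
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
print(feedforward(X, W1, b1, W2, b2).shape)  # (2, 4): tokens processed independently
print(attention(X, Wq, Wk, Wv).shape)        # (2, 4): new tokens that combine both inputs
```

Notice the contrast: feedforward never looks across rows of X, while each row of the attention output is a weighted blend of every token's value vector.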
With the release of the open-source GPT-OSS model, I was especially excited to dive into two of its core architectural tweaks.
First, GQA (Grouped Query Attention), in which a group of query heads shares a single set of key/value heads. That shrinks the KV cache and speeds up attention with little to no loss in quality.
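Here's a rough NumPy sketch of the idea (the head counts and weight shapes are made up for illustration; this is not GPT-OSS's actual code):

```python
import numpy as np

def grouped_query_attention(X, Wq, Wk, Wv, n_q_heads=8, n_kv_heads=2):
    """GQA sketch: groups of query heads share a smaller set of key/value heads."""
    T, d = X.shape
    dh = d // n_q_heads                           # per-head feature size
    Q = (X @ Wq).reshape(T, n_q_heads, dh)        # full set of query heads
    K = (X @ Wk).reshape(T, n_kv_heads, dh)       # fewer key heads...
    V = (X @ Wv).reshape(T, n_kv_heads, dh)       # ...and fewer value heads
    group = n_q_heads // n_kv_heads               # query heads per shared KV head
    out = np.empty((T, n_q_heads, dh))
    for h in range(n_q_heads):
        kv = h // group                           # the KV head this query head shares
        scores = Q[:, h] @ K[:, kv].T / np.sqrt(dh)
        e = np.exp(scores - scores.max(-1, keepdims=True))
        out[:, h] = (e / e.sum(-1, keepdims=True)) @ V[:, kv]
    return out.reshape(T, d)

# 5 tokens, model width 32: the K/V projections are 4x smaller than Q's.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 32))
Wq = rng.normal(size=(32, 32))
Wk = rng.normal(size=(32, 8))   # 2 KV heads x 4 dims, vs. 8 query heads x 4 dims
Wv = rng.normal(size=(32, 8))
print(grouped_query_attention(X, Wq, Wk, Wv).shape)  # (5, 32)
```

With 8 query heads sharing 2 key/value heads, the model stores a quarter of the keys and values it would need for standard multi-head attention, which is where most of the savings come from.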
And second, a clever twist on the softmax: adding a learnable bias to the denominator. A standard softmax forces the attention weights to sum to 1, so every token must attend to something; with the extra term, the weights can sum to less than 1, which gives the model the power to effectively ignore unimportant tokens. These ideas aren't just clever; they're practical, and they're helping push open models closer to the state of the art.
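In formula form, the tweak changes each attention weight from w_i = exp(s_i) / Σ_j exp(s_j) to w_i = exp(s_i) / (Σ_j exp(s_j) + exp(b)), where b is a learnable bias. Here is a tiny NumPy illustration of the effect (my own sketch, not the model's code):

```python
import numpy as np

def softmax_with_sink(scores, sink):
    """Softmax whose denominator gets an extra learned term exp(sink)."""
    m = scores.max(-1, keepdims=True)
    e = np.exp(scores - m)
    # Shift the sink logit by the same max so the math stays consistent.
    return e / (e.sum(-1, keepdims=True) + np.exp(sink - m))

scores = np.array([0.2, -1.0, 0.5])
print(softmax_with_sink(scores, sink=-np.inf).sum())  # ≈ 1.0: plain softmax
print(softmax_with_sink(scores, sink=5.0).sum())      # ≈ 0.02: the head mostly opts out
```

With b = -inf you recover the ordinary softmax; as b grows, all the weights shrink toward zero and the head contributes almost nothing to the output.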
Thanks again for the incredible feedback! Your enthusiasm motivates me—next lecture is coming soon, and I think you’ll love what I’m planning.
Download
Feel free to grab the Excel file below—it has all my hand-drawn notes and formulas from the live lecture.
Hi Prof Tom,
Thank you for this session.
I missed the live session. Is there any way I can watch it offline? Please kindly let me know.
Is there a link to the recorded special lecture?