AI by Hand ✍️

Attention

💡Foundation AI Seminar Series

Prof. Tom Yeh
Jan 15, 2026

This week’s Foundation seminar focused on rebuilding intuition for attention, starting from the famous paper “Attention Is All You Need,” which introduced the Transformer and reshaped how modern language models are built.

I began by framing tokens as vectors—units of information—and explained attention as a way to produce new tokens by combining existing ones with different weights. Each output token is simply a weighted mixture of earlier tokens, not a mysterious operation.
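This "weighted mixture" view can be sketched in a few lines of NumPy. The token vectors and weights below are made-up illustrative values, not anything from the seminar workbook:

```python
import numpy as np

# Three toy token vectors (one per row), 4 dimensions each -- illustrative values only.
tokens = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# Attention weights for one output token; they sum to 1, like a softmax output.
weights = np.array([0.6, 0.3, 0.1])

# The output token is just the weighted mixture of the input tokens.
output = weights @ tokens
print(output)  # -> [0.6 0.3 0.1 0.1]
```

Nothing mysterious happens: each component of the output is a plain weighted sum of the corresponding components of the inputs.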

We then walked through scaled dot-product attention step by step. Tokens are projected into queries, keys, and values; dot products compare what a token is looking for with what others can provide; scaling keeps the numbers stable; and softmax turns scores into probabilities. I emphasized that matrix multiplication is just many dot products happening in parallel.
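The steps above can be written out directly. This is a minimal NumPy sketch of scaled dot-product attention; the dimensions and random projection matrices are my own illustrative choices, not values from the seminar:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    # Project tokens into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Q @ K.T computes every query-key dot product at once;
    # dividing by sqrt(d_k) keeps the scores numerically stable.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into a probability distribution.
    weights = softmax(scores, axis=-1)
    # Each output token is a weighted mixture of the value vectors.
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                      # 5 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # -> (5, 8)
```

Note that the single expression `Q @ K.T` is exactly the "many dot products in parallel" point: one matrix multiplication compares every query against every key.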

From there, we extended the idea to multi-head attention. Instead of one comparison space, multiple heads run in parallel, each learning a different way to relate tokens. Their outputs are concatenated and projected back to the original dimension so the next layer can proceed cleanly.
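The multi-head extension is a short loop over the single-head version. Again a hedged sketch: the head count, dimensions, and random matrices here are illustrative assumptions, not the seminar's actual numbers:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, heads, Wo):
    # heads: a list of (Wq, Wk, Wv) triples, one comparison space per head.
    outputs = []
    for Wq, Wk, Wv in heads:
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        outputs.append(softmax(scores) @ V)
    # Concatenate the head outputs, then project back to the model dimension
    # so the next layer sees the same shape it received.
    return np.concatenate(outputs, axis=-1) @ Wo

rng = np.random.default_rng(1)
d_model, n_heads = 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(5, d_model))                # 5 tokens
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
Wo = rng.normal(size=(n_heads * d_head, d_model))
out = multi_head_attention(X, heads, Wo)
print(out.shape)  # -> (5, 8)
```

Each head works in its own lower-dimensional space (`d_head = d_model / n_heads`), which is what lets the heads learn different ways of relating tokens without increasing the total cost.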

The main takeaway was simple: attention looks intimidating on paper, but once you unpack it, it’s a clear and systematic way to mix information across tokens—the core idea that makes Transformers work.

Recording & Excel Workbook

The full recording and the associated Excel workbook are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.

© 2026 Tom Yeh