AI by Hand ✍️

LLaMA 1 → 2 → 3 → 4

Frontier AI Seminar

Prof. Tom Yeh
May 14, 2025

LLaMA 4 was just released, and despite its lackluster reception, it provides a timely endpoint for this special seminar. I use LLaMA 1 → 2 → 3 → 4 as a backdrop to walk through the major architectural advances in Transformer models over the past few years. My goal is to use a popular open-source model family as a concrete reference point for when and why certain techniques became necessary.

Working inside a live Excel spreadsheet, we start from a minimal Transformer and evolve it step by step. As models scale, we introduce the corresponding innovations that emerged across the research community: changes in attention structure, dimension scaling, head organization, vocabulary growth, and routing mechanisms. Each idea is implemented visually and mechanically, so you can see exactly what changes in the computation and parameter layout.
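To give a sense of the starting point, here is a rough sketch of the scaled dot-product self-attention at the heart of that minimal baseline Transformer, written in plain NumPy rather than in the spreadsheet; the function name, shapes, and toy values are illustrative assumptions, not the seminar's workbook.

```python
import numpy as np

def single_head_attention(X, W_q, W_k, W_v):
    """Causal scaled dot-product self-attention for a single head.

    X            : (seq_len, d_model) input token embeddings
    W_q, W_k, W_v: (d_model, d_head) projection matrices
    """
    Q = X @ W_q                                 # queries (seq_len, d_head)
    K = X @ W_k                                 # keys    (seq_len, d_head)
    V = X @ W_v                                 # values  (seq_len, d_head)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (seq_len, seq_len)
    # causal mask: each position attends only to itself and earlier tokens
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # (seq_len, d_head)

# toy example: 4 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
out = single_head_attention(X,
                            rng.normal(size=(8, 4)),
                            rng.normal(size=(8, 4)),
                            rng.normal(size=(8, 4)))
print(out.shape)  # (4, 4)
```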

Concepts like Grouped Query Attention, KV reuse, deeper stacks, and Mixture of Experts are presented as responses to real constraints—memory, latency, and training cost—rather than abstract design choices. By tying these shifts to a recognizable LLaMA timeline, the seminar helps you build an intuitive mental map of how modern Transformer architectures evolved, and how today’s frontier models are assembled from a series of practical, cumulative decisions.
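As one illustration of how those constraints surface in code, here is a hedged sketch of Grouped Query Attention, where several query heads share a single key/value head so keys and values are projected (and cached) once per group rather than once per head. The head counts, dimensions, and function name are assumptions for the example, not LLaMA's actual configuration.

```python
import numpy as np

def grouped_query_attention(X, Wq_heads, Wk_groups, Wv_groups):
    """Grouped Query Attention: several query heads share one K/V head.

    Wq_heads  : list of n_q_heads query projections (d_model, d_head)
    Wk_groups : list of n_kv_heads key projections  (d_model, d_head)
    Wv_groups : list of n_kv_heads value projections(d_model, d_head)
    n_q_heads must be a multiple of n_kv_heads.
    """
    n_q, n_kv = len(Wq_heads), len(Wk_groups)
    group_size = n_q // n_kv
    d_head = Wq_heads[0].shape[1]

    # keys/values are projected once per K/V group; this is what shrinks the KV cache
    Ks = [X @ Wk for Wk in Wk_groups]
    Vs = [X @ Wv for Wv in Wv_groups]

    outputs = []
    for h, W_q in enumerate(Wq_heads):
        g = h // group_size              # which shared K/V group this query head uses
        Q = X @ W_q
        K, V = Ks[g], Vs[g]
        scores = Q @ K.T / np.sqrt(d_head)
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        outputs.append(w @ V)
    return np.concatenate(outputs, axis=-1)  # (seq_len, n_q_heads * d_head)

# toy example: 8 query heads sharing 2 K/V heads (4:1 grouping)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq = [rng.normal(size=(16, 4)) for _ in range(8)]
Wk = [rng.normal(size=(16, 4)) for _ in range(2)]
Wv = [rng.normal(size=(16, 4)) for _ in range(2)]
print(grouped_query_attention(X, Wq, Wk, Wv).shape)  # (5, 32)
```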

The emphasis is on understanding the logic of progress, not memorizing architectures.

Full Recording

The full recording and the associated Excel workbook are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.
