(P.S. This issue is written for advanced AI engineers and researchers. It is part of the premium Frontier subscription. If you are a beginner, please check out my free lectures, walkthroughs, and Excel exercises. I also share regular announcements of free learning opportunities.)
Yesterday, DeepSeek released version 3.2, formally introducing DSA (DeepSeek Sparse Attention) along with a Python reference implementation and open pre-trained weights. I’ve been busy building an Excel reference implementation to help a partner company grasp the math behind DSA and begin constructing its own version.
For this week’s Frontier issue, I intend to focus on RoPE (Rotary Positional Embedding). The timing of DeepSeek’s 3.2 release is perfect: it reinforces why understanding RoPE is no longer optional. RoPE has quickly become a de facto standard in modern transformer architectures since it was published in April 2021, and its influence is only growing.
I created four new sets of worksheets to illustrate the following (a small code sketch of the PE-vs-RoPE contrast follows the list):
Sine & Cosine Positional Encoding Patterns
Positional Encoding (PE) vs. Rotary Positional Embedding (RoPE)
QKV Self-Attention + Positional Encoding (PE)
QKV Self-Attention + Rotary Positional Embedding (RoPE)
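To make the PE-vs-RoPE contrast in the worksheets concrete, here is a minimal NumPy sketch. This is not the worksheet or DeepSeek code; the function names, dimensions, and base of 10000 are my own illustrative choices. The key difference it shows: sinusoidal PE is added to the token embeddings once before attention, while RoPE rotates the query and key vectors inside attention so their dot product depends only on the relative offset between positions.

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    """Classic additive sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]                  # (1, d_model/2)
    angles = pos / (10000 ** (i / d_model))                # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                           # even dims get sine
    pe[:, 1::2] = np.cos(angles)                           # odd dims get cosine
    return pe

def apply_rope(x, base=10000.0):
    """Rotate each (even, odd) feature pair of x by a position-dependent angle.

    x: (seq_len, d_model) query or key matrix; d_model must be even.
    """
    seq_len, d_model = x.shape
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    theta = base ** (-np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    angles = pos * theta[None, :]                          # (seq_len, d_model/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin              # 2D rotation per pair
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

seq_len, d_model = 8, 16
rng = np.random.default_rng(0)

# PE: added to the embeddings before attention ever sees them.
emb = rng.normal(size=(seq_len, d_model))
x_pe = emb + sinusoidal_pe(seq_len, d_model)

# RoPE: Q and K are rotated inside attention, so each score q_m . k_n
# depends only on the relative offset m - n.
q = rng.normal(size=(seq_len, d_model))
k = rng.normal(size=(seq_len, d_model))
scores = apply_rope(q) @ apply_rope(k).T / np.sqrt(d_model)
```

The Excel worksheets walk through the same arithmetic cell by cell, which is exactly where the difference between "add once to the embeddings" and "rotate Q and K at every attention layer" becomes visible.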
Once I finish explaining RoPE, only a few more building blocks remain — like RMSNorm and SwiGLU — before I’ll have unpacked all the key pieces needed to do a full architecture breakdown. My plan is to work toward a complete Qwen3 breakdown in the coming weeks.
⬇️ Download the worksheets below (for Frontier Subscribers only)