AI by Hand ✍️

11. RoPE vs PE in QKV Self-Attention

Frontier AI Drawing by Hand ✍️

Prof. Tom Yeh
Sep 30, 2025

Source: DeepSeek V3.2 Technical Report (Sep. 2025)

Yesterday, DeepSeek released version 3.2, formally introducing DSA (DeepSeek Sparse Attention) along with a Python reference implementation and open pre-trained weights. I’ve been busy building an Excel reference implementation to help a partner company grasp the math behind DSA and begin constructing its own version.

For this week’s issue, I intend to focus on RoPE (Rotary Positional Embedding). The timing of DeepSeek V3.2 is perfect: it reinforces why understanding RoPE is no longer optional. Since it was published in April 2021, RoPE has quickly become a de facto standard in modern transformer architectures, and its influence is only growing.

I created four new sets of drawings to illustrate the following (a small code sketch after the list walks through the same ideas):

  1. Sine & Cosine Positional Encoding Patterns

  2. Positional Encoding (PE) vs. Rotary Positional Embedding (RoPE)

  3. QKV Self-Attention + Positional Encoding (PE)

  4. QKV Self-Attention + Rotary Positional Embedding (RoPE)
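To make the contrast concrete, here is a minimal NumPy sketch of the same ideas, written by hand rather than taken from DeepSeek’s code: the classic sin/cos table is added to the token embeddings before the Q/K/V projections, while RoPE rotates Q and K after the projections and leaves V alone. The sizes, random weights, and interleaved-pair rotation convention are illustrative assumptions, not the exact setup used in the drawings.

```python
# Minimal sketch (NumPy): additive sinusoidal PE vs. RoPE in toy QKV self-attention.
# All dimensions, weights, and positions below are toy values, not DeepSeek's.
import numpy as np

d, n_pos = 8, 6          # embedding width (even) and number of token positions
rng = np.random.default_rng(0)

def sinusoidal_pe(n_pos, d):
    """Classic table: PE[p, 2i] = sin(p / 10000^(2i/d)), PE[p, 2i+1] = cos(...)."""
    pos = np.arange(n_pos)[:, None]                        # (n_pos, 1)
    inv_freq = 1.0 / (10000 ** (np.arange(0, d, 2) / d))   # (d/2,)
    angles = pos * inv_freq                                # (n_pos, d/2)
    pe = np.zeros((n_pos, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def rope(x, positions):
    """Rotate consecutive dimension pairs of x by position-dependent angles."""
    inv_freq = 1.0 / (10000 ** (np.arange(0, d, 2) / d))
    angles = positions[:, None] * inv_freq                 # (n_pos, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                        # even / odd dims as pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

def attention(q, k, v):
    """Plain scaled dot-product attention with a stable softmax."""
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy token embeddings and Q/K/V projection matrices.
X = rng.normal(size=(n_pos, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

# (a) Additive PE: inject position into X *before* the Q/K/V projections.
X_pe = X + sinusoidal_pe(n_pos, d)
out_pe = attention(X_pe @ Wq, X_pe @ Wk, X_pe @ Wv)

# (b) RoPE: project first, then rotate Q and K (V stays untouched).
pos = np.arange(n_pos)
Q, K, V = X @ Wq, X @ Wk, X @ Wv
out_rope = attention(rope(Q, pos), rope(K, pos), V)

# RoPE's key property: the query-key score depends only on the relative offset,
# so shifting every position by the same amount leaves the scores unchanged.
s1 = rope(Q, pos) @ rope(K, pos).T
s2 = rope(Q, pos + 10) @ rope(K, pos + 10).T
print(np.allclose(s1, s2))   # True
```

The final check prints True: once Q and K are rotated, the attention scores depend only on the relative distance between positions, so a uniform shift of all positions changes nothing. The additive scheme gives no such guarantee, which is a large part of why RoPE has become the default.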

Once I finish explaining RoPE, only a few more building blocks remain — like RMSNorm and SwiGLU — before I’ll have unpacked all the key pieces needed to do a full architecture breakdown. My plan is to work toward a complete Qwen3 breakdown in the coming weeks.

Download

The Frontier AI drawings are available to AI by Hand Academy members. You can become a member via a paid Substack subscription.
