PPO → DPO → GRPO → Rubrics
AI by Hand ✍️ Seminar Series
In last week’s AI by Hand ✍️ seminar, I talked about reinforcement learning from first principles. I started all the way back at pre-training and inference, then climbed forward through the major evolution stages of the modern RL stack: PPO → DPO → GRPO. I finally arrived at Rubrics, widely considered one of today’s frontier topics in RL research.
I was joined by Cameron R. Wolfe, author of the Deep (Learning) Focus newsletter (60K subscribers) and a Senior Research Scientist at Netflix. He gave a special guest lecture based on his popular recent article Rubric-based Rewards for RL.
After eight seminars in 2026, I’ve settled into a format that has clearly resonated:
AI by Hand Lecture → Industry Guest Interview → Industry Guest Lecture.
I’m the professor in the first half—and the student in the second. 🙌
Feedback
Industry Expert: Cameron Wolfe
As a professor in academia, how can I possibly know what kind of research is really going on in industry? I turn to people like Cameron. I had been reading his long-form articles long before I even started sharing my own content through this AI by Hand newsletter. Cameron was one of the early supporters who encouraged me to share more publicly.
I still remember two years ago when he told me he had just gotten married and was moving to Netflix to become a Senior Research Scientist working on reinforcement learning. He mentioned he might need to pause the Deep Focus newsletter for a while. Fortunately for all of us, the break was short. Cameron kept writing. I kept learning.
In the seminar, I finally got a chance to interview him so his story can be heard by all of you:
Q: Can you use Jiu-Jitsu as a metaphor to explain reinforcement learning?
Q: Reinforcement learning used to feel like robotics demos. Why has RL become so important for frontier language models?
Q: What are the main challenges of doing RL research at large scale?
Q: What motivated you to write the Deep Focus newsletter?
Watch the recording to hear Cameron explain it in his own words.
Recording
00:00 Welcome & Introduction
03:53 Pretraining
07:32 Inference
10:56 Proximal Policy Optimization (PPO)
23:58 Direct Preference Optimization (DPO)
28:12 Group Relative Policy Optimization (GRPO)
31:55 Guest Interview: Cameron Wolfe
39:42 Guest Lecture: Rubrics for RL
Excel Workbook
Previous Seminars
OpenClaw - 12 Stages of Evolution from the Transformer (2/19/2026)
Transformer - Six Levels of Understanding (2/12/2026)
Meta Superintelligence Labs vs Facebook AI Research (2/5/2026)
9 AI Eval Formulas (1/29/2026)
Google Ironwood TPU: From Bits to HBM (1/22/2026)
How AWS Uses Small Models to Learn Tool Use (1/15/2026)
Attention (1/15/2026)
DeepSeek’s Manifold-Constrained Hyper Connection (mHC) (1/8/2026)
Introduction to Generative AI (1/8/2026)
Gated Attention (NeurIPS 2025 Best Paper) (12/16/2025)