AI by Hand ✍️

DeepSeek OCR Excel Blueprint

Frontier Model Math by hand ✍️

Prof. Tom Yeh
Oct 28, 2025

(P.S. This issue is written for advanced AI engineers and researchers. It is part of the premium Frontier subscription. If you are a beginner, please check out my free Foundation series, lectures, walkthroughs, and Excel exercises.)

DeepSeek OCR was released last week. What’s the big deal?

Don’t we already have a lot of OCR tools that work well enough?

If you’re thinking this way, I hope this Blueprint and my explanation below can help you see its true significance, and prepare you for what’s next.

DeepSeek OCR stacks three Transformers into one tower.

They all share a common backbone: pre-LayerNorm, multi-head attention, residual add, LayerNorm, and an FFN with GELU.

Below I use CLIP ViT to illustrate this common backbone.
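For readers who prefer code to spreadsheets, here is a minimal NumPy sketch of that shared backbone as a single pre-LayerNorm block. It is not taken from DeepSeek OCR or CLIP: the function names, the parameter dictionary keys, the toy sizes, the omission of bias terms, and the tanh approximation of GELU are all assumptions made for illustration. The second residual add after the FFN follows the usual pre-LN convention and is likewise assumed.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token vector over its feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def gelu(x):
    # tanh approximation of GELU (an assumption; real models may use another variant).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_attention(x, w_qkv, w_out, n_heads, mask=None):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (seq_len, d_model)

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    if mask is not None:
        # mask is boolean, True where attention is allowed.
        scores = np.where(mask, scores, -1e9)
    out = softmax(scores) @ v  # (n_heads, seq_len, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_out


def transformer_block(x, p, n_heads, mask=None):
    # Pre-LayerNorm -> multi-head attention -> residual add.
    h = x + multi_head_attention(
        layer_norm(x, p["ln1_g"], p["ln1_b"]), p["w_qkv"], p["w_out"], n_heads, mask
    )
    # LayerNorm -> FFN with GELU -> residual add.
    return h + gelu(layer_norm(h, p["ln2_g"], p["ln2_b"]) @ p["w_fc1"]) @ p["w_fc2"]


def init_params(d_model, d_ff, rng):
    # Hypothetical toy initialization, only so the sketch can run.
    scale = 0.02
    return {
        "ln1_g": np.ones(d_model), "ln1_b": np.zeros(d_model),
        "ln2_g": np.ones(d_model), "ln2_b": np.zeros(d_model),
        "w_qkv": rng.standard_normal((d_model, 3 * d_model)) * scale,
        "w_out": rng.standard_normal((d_model, d_model)) * scale,
        "w_fc1": rng.standard_normal((d_model, d_ff)) * scale,
        "w_fc2": rng.standard_normal((d_ff, d_model)) * scale,
    }


rng = np.random.default_rng(0)
params = init_params(d_model=8, d_ff=32, rng=rng)
tokens = rng.standard_normal((4, 8))  # 4 tokens, 8 features each
print(transformer_block(tokens, params, n_heads=2).shape)  # (4, 8)
```

Because both sublayers sit on a residual path, the block reduces to the identity when the output projections are zero, which is a handy sanity check when you rebuild it by hand.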

But they differ in other ways, such as context size, attention mask and position embedding methods.

Why?

By the end of this long article, I hope you’ll have an intuition for these design choices—so you can make similar ones yourself in the future.
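To make one of those differences concrete, the attention mask, here is a small NumPy sketch contrasting two common choices: a full (bidirectional) mask and a causal mask. This excerpt does not say which mask each of the three Transformers uses, so treat these as generic examples rather than DeepSeek OCR's actual configuration; the function names are hypothetical.

```python
import numpy as np


def full_mask(seq_len):
    # Every token may attend to every other token.
    return np.ones((seq_len, seq_len), dtype=bool)


def causal_mask(seq_len):
    # Token i may attend only to tokens 0..i (lower triangle).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


def masked_softmax(scores, mask):
    # Disallowed positions get -inf, so they receive exactly zero weight.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)


scores = np.zeros((3, 3))  # uniform scores make the mask's effect easy to see
print(masked_softmax(scores, full_mask(3)))    # every row spreads weight over all 3 tokens
print(masked_softmax(scores, causal_mask(3)))  # row i spreads weight over tokens 0..i only
```

With uniform scores, the full mask gives each token a weight of 1/3 on every position, while the causal mask gives the first token a weight of 1 on itself, the second 1/2 on each of the first two positions, and so on.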

Another reason I like this topic is that it lets me compare real models. Whenever I say “this is from DeepSeek,” it grabs my students’ attention much more than “this came from an academic paper.”

👇 Scroll to the bottom to download the Excel Blueprint.

© 2025 Tom Yeh