ViT to Llama 1, 2, 3, 4

May 14, 2025

Live Version:

Baseline Version

I will use the Vision Transformer (ViT) as the baseline and extend it to Llama 1, 2, 3, and 4, live.

Baseline: ViT

+ RMSNorm

+ model dimensions

+ layers

+ RoPE

+ Grouped-Query Attention

+ Sparse Attention

+ Flash Attention

+ context length
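Three of the steps listed above (RMSNorm, RoPE, and Grouped-Query Attention) are small enough to sketch in plain Python. This is a minimal illustration only, not the spreadsheet's formulas or any library's implementation. The function names, the `eps` value, `base=10000.0`, and the choice to rotate consecutive pairs of dimensions in RoPE are all assumptions made for the example.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of x, then scale.
    Unlike LayerNorm there is no mean subtraction and no bias.
    (eps is an assumed value for numerical stability.)"""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def rope(x, pos, base=10000.0):
    """RoPE: rotate each pair (x[i], x[i+1]) by an angle that grows
    with the token position and shrinks with the dimension index.
    (Pairing consecutive dimensions is an assumed layout.)"""
    d = len(x)  # must be even
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

def kv_head_for(query_head, n_heads, n_kv_heads):
    """Grouped-Query Attention: consecutive query heads are grouped,
    and every head in a group reads the same key/value head."""
    group_size = n_heads // n_kv_heads
    return query_head // group_size

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Tiny demo with made-up numbers.
normed = rms_norm([3.0, 4.0, 0.0, 0.0], [1.0] * 4)
q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
score_near_start = dot(rope(q, 3), rope(k, 1))
score_shifted = dot(rope(q, 10), rope(k, 8))  # same offset of 2
gqa_map = [kv_head_for(h, 8, 2) for h in range(8)]
print(normed, score_near_start, score_shifted, gqa_map)
```

In the demo, the two attention scores use different absolute positions but the same offset between query and key, which is the property RoPE is designed to give: the score depends on relative position only. The head map shows 8 query heads sharing 2 key/value heads, four query heads per group.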

Or download: ViT Excel Preview (XLSX, 207KB)

AI by Hand ✍️