Google Ironwood TPU: From Bits to HBM
Foundation AI Seminar Series
My next Foundation seminar will be about TPUs. I’ll take you from individual 0/1 bits all the way up to high-bandwidth memory (HBM), beginner-friendly and drawn by hand ✍️.
I’m excited to be joined by a special guest expert, Shanya Chaubey from Micron Technology, who will answer your HBM questions live in the chat. Micron is a major HBM supplier for Google’s new Ironwood TPUs.
Here’s an early sketch of the memory hierarchy drawing I’ve created for this seminar:
Bit – The most basic unit of information, the on–off decision from which every number, tensor, and model state is ultimately constructed.
FP8 (1×8 → 8 bits) – Eight bits are grouped to form a floating-point value, typically used for inference, where reduced precision is a deliberate trade-off to maximize throughput and efficiency.
BF16 (×2 → 16 bits) – Doubling the width gives bfloat16, which keeps FP32’s 8-bit exponent: far more dynamic range and training stability than FP8, while still staying friendly to high-throughput hardware (compared concretely in the first code sketch after this list).
Tensor tile (×1024 → 1K) – Data moves through the chip in blocks of 1024 values at a time, defining the granularity at which tensors are fetched and manipulated.
Matrix Multiplication Unit (MXU) (×64 → 64K) – A systolic array where matrix multiplication is not abstract but physical: tensor tiles flow through a fixed grid of multiply-accumulate cells, which is what buys the enormous matmul throughput (a toy tiled-matmul sketch follows the list).
Vector Memory (VMEM) (×2048 → 128M) – On-chip working memory that holds activations, partial results, and intermediates, sized specifically to keep the systolic array busy without stalling.
Common Memory (CMEM) (×8 → 1 GB) – A small but critical shared memory sitting between VMEM and HBM, used for staging, accumulation, synchronization, and cross-lane coordination.
HBM (×192 → 192 GB) – Off-chip high-bandwidth memory where model weights and other large state live, implemented as stacks of HBM3e totaling 192 GB per Ironwood chip (the full multiplier chain is tallied in the last sketch below).
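To make the FP8-vs-BF16 trade-off concrete, here is a minimal sketch using the ml_dtypes package (the library JAX itself uses for these formats). One assumption to flag: float8_e4m3fn is just a common FP8 variant I picked for illustration, not a claim about the exact format Ironwood uses.

```python
# Range vs. precision: one common FP8 variant next to bfloat16.
# Assumes `pip install ml_dtypes`; float8_e4m3fn is an illustrative
# choice, not a claim about Ironwood's FP8 format.
import ml_dtypes

for dtype in (ml_dtypes.float8_e4m3fn, ml_dtypes.bfloat16):
    info = ml_dtypes.finfo(dtype)
    print(f"{dtype.__name__}: {info.bits} bits, "
          f"max ~{float(info.max):.3g}, "
          f"smallest normal ~{float(info.tiny):.3g}")
```

Running it shows why BF16 is the stable training choice: its 8-bit exponent gives roughly the same dynamic range as FP32, while FP8 tops out around 448 and is reserved for throughput-hungry inference.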
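Next, a toy version of what the MXU does: break a big matmul into tile-sized partial products and accumulate them, which is the schedule a systolic array bakes into silicon. Everything here is illustrative: the 128×128 tile and 512-wide matrices are round numbers I picked, not Ironwood’s actual MXU geometry.

```python
# Toy tiled matmul: C = A @ B built from tile-sized partial products,
# mirroring in software the dataflow a systolic array implements in
# hardware. TILE=128 is an illustrative size, not the Ironwood MXU's.
import jax
import jax.numpy as jnp

TILE = 128
M = N = K = 512  # kept as multiples of TILE for simplicity

ka, kb = jax.random.split(jax.random.PRNGKey(0))
A = jax.random.normal(ka, (M, K), dtype=jnp.bfloat16)
B = jax.random.normal(kb, (K, N), dtype=jnp.bfloat16)

C = jnp.zeros((M, N), dtype=jnp.float32)
for i in range(0, M, TILE):
    for j in range(0, N, TILE):
        for k in range(0, K, TILE):
            # BF16 inputs, FP32 accumulation: the usual MXU contract.
            C = C.at[i:i+TILE, j:j+TILE].add(
                jnp.dot(A[i:i+TILE, k:k+TILE], B[k:k+TILE, j:j+TILE],
                        preferred_element_type=jnp.float32))

ref = jnp.dot(A, B, preferred_element_type=jnp.float32)
print("max |C - ref|:", float(jnp.max(jnp.abs(C - ref))))
```

The triple loop is really a memory story: each tile is fetched once into fast on-chip memory and reused across many multiply-accumulates, which is exactly the reuse VMEM is sized to feed.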
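Finally, the multiplier chain itself. One assumption makes the numbers line up: count 1-byte FP8 values from the tensor tile upward, so element counts and byte sizes coincide. This is a back-of-the-envelope tally of the drawing above, not an official spec sheet.

```python
# Tally the memory-hierarchy multipliers from the sketch, assuming
# 1-byte FP8 elements so counts and bytes coincide. The factors are
# the sketch's own numbers, not an official Ironwood specification.
levels = [
    ("Tensor tile", 1024, "1 KB"),
    ("MXU",           64, "64 KB"),
    ("VMEM",        2048, "128 MB"),
    ("CMEM",           8, "1 GB"),
    ("HBM",          192, "192 GB"),
]

size = 1  # bytes: one FP8 value
for name, factor, label in levels:
    size *= factor
    print(f"{name:11} x{factor:<5,} = {size:>15,} bytes (~{label})")
```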
See you live at the seminar!



