The Split Matters
Can two calls with the same total tokens finish at very different times? Yes. Input is read in one parallel pass; output is written one token at a time. A task that reads a lot and writes a little will almost always finish faster than one that reads a little and writes a lot. Agents designed to summarize or classify can be significantly faster than agents designed to generate long replies, even at the same token budget.
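A minimal sketch of this effect, assuming a simple two-rate model: time is input tokens divided by a read rate, plus output tokens divided by a write rate. The rates (500 and 50 tokens per second) and the function name `time_for_split` are illustrative assumptions, not measurements of any particular model. The sketch holds the total at 2,500 tokens and changes only the share that is output.

```python
# Illustrative rates (assumptions for this sketch, not measured values).
READ_RATE = 500   # input tokens read per second
WRITE_RATE = 50   # output tokens written per second


def time_for_split(total_tokens: int, output_share: float) -> float:
    """Seconds to finish a call when `output_share` of the tokens are output."""
    output_tokens = total_tokens * output_share
    input_tokens = total_tokens - output_tokens
    return input_tokens / READ_RATE + output_tokens / WRITE_RATE


if __name__ == "__main__":
    # Same total every time; only the input/output split changes.
    for share in (0.0, 0.2, 0.5, 0.8, 1.0):
        seconds = time_for_split(2500, share)
        print(f"{share:.0%} output: {seconds:.1f} s")
```

Under these assumed rates, each token moved from input to output adds 1/50 - 1/500 = 0.018 seconds, so the time grows in a straight line with the output share.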
Problem
A model reads input at 500 tokens per second and writes output at 50 tokens per second. Task A summarizes a long document: 2,000 input tokens, 500 output tokens. Task B expands a short outline into a full article: 500 input tokens, 2,000 output tokens. Both tasks process 2,500 tokens in total. Which finishes first?
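Time for task A
Reading: 2,000 / 500 = 4 seconds. Writing: 500 / 50 = 10 seconds. Total: 4 + 10 = 14 seconds.
Time for task B
Reading: 500 / 500 = 1 second. Writing: 2,000 / 50 = 40 seconds. Total: 1 + 40 = 41 seconds.
Compare
Task A finishes first: 14 seconds against 41 seconds. Both tasks handle 2,500 tokens, but task B writes four times as many of them, and writing is ten times slower than reading.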
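The comparison can also be checked in a few lines of Python. This is a sketch under the problem's stated model, where all input is read and then all output is written, each at a fixed rate. `Task` and `completion_time` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    input_tokens: int
    output_tokens: int


def completion_time(task: Task, read_rate: float = 500, write_rate: float = 50) -> float:
    """Seconds to read all input, then write all output, at fixed rates."""
    return task.input_tokens / read_rate + task.output_tokens / write_rate


# Token counts taken from the problem statement.
task_a = Task("A (summarize)", input_tokens=2000, output_tokens=500)
task_b = Task("B (expand)", input_tokens=500, output_tokens=2000)

if __name__ == "__main__":
    for task in (task_a, task_b):
        print(f"Task {task.name}: {completion_time(task):.0f} s")
    # The task with the smaller completion time finishes first.
    faster = min((task_a, task_b), key=completion_time)
    print(f"Task {faster.name} finishes first")
```

Swapping in the token counts from a practice problem gives a quick way to check a hand calculation.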
Practice 1
A model reads input at 500 tokens per second and writes output at 50 tokens per second. Task A answers a question about a long report: 2,500 input tokens, 500 output tokens. Task B drafts a detailed report from a short brief: 500 input tokens, 2,500 output tokens. Both tasks process 3,000 tokens in total. Which finishes first?
Time for task A
Time for task B
Compare
Practice 2
A model reads input at 500 tokens per second and writes output at 50 tokens per second. Task A classifies a batch of messages: 1,500 input tokens, 500 output tokens. Task B writes replies from short descriptions: 500 input tokens, 1,500 output tokens. Both tasks process 2,000 tokens in total. Which finishes first?










