How Long Will It Take?
Token problems: the workbook
Why does a short reply sometimes take longer than reading a large document? A call's time has two parts: reading the input and writing the output. Input is processed in parallel, every token in one pass, so even a large input clears quickly. Output is generated one token at a time, each waiting on the one before. For agents, this means latency usually depends more on how much the agent writes than on how much it reads.
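The two-part idea above can be sketched as a small function. This is a minimal sketch, assuming the simple model used in this workbook (a constant reading speed and a constant writing speed, with nothing else adding delay). The function name `call_time_seconds` and the numbers in the example are illustrative and do not come from the problems.

```python
def call_time_seconds(input_tokens, output_tokens, read_rate, write_rate):
    """Estimate call time as read time plus write time.

    Rates are in tokens per second. Assumes constant rates and no
    other delays, which is a simplification.
    """
    read_time = input_tokens / read_rate      # seconds spent reading the input
    write_time = output_tokens / write_rate   # seconds spent writing the output
    return read_time + write_time


# Illustrative numbers only: 1,000 tokens in, 100 tokens out.
total = call_time_seconds(1_000, 100, read_rate=1_000, write_rate=25)
print(total)  # 5.0 (1 second reading + 4 seconds writing)
```

In this example the output is a tenth the size of the input, yet writing it takes four times as long as reading.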
Problem
A call sends 2,000 input tokens and gets back 200 output tokens. The model reads input at 1,000 tokens per second and writes output at 40 tokens per second. How long does the call take?
Time to read the input
Time to write the output
Add the two times
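One way to check the three steps for this problem is to write them out as arithmetic. This is a sketch under the same assumption as the problem statement (constant reading and writing speeds). The variable names are illustrative.

```python
# Numbers from the problem statement.
input_tokens, output_tokens = 2_000, 200
read_rate, write_rate = 1_000, 40  # tokens per second

read_time = input_tokens / read_rate      # 2000 / 1000 = 2.0 seconds
write_time = output_tokens / write_rate   # 200 / 40 = 5.0 seconds
total_time = read_time + write_time       # 2.0 + 5.0 = 7.0 seconds

print(f"read {read_time} s, write {write_time} s, total {total_time} s")
```

Writing 200 tokens takes longer than reading 2,000, which is the pattern described at the top of the page.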
Practice 1
A call sends 3,000 input tokens and gets back 300 output tokens. The model reads input at 1,000 tokens per second and writes output at 50 tokens per second. How long does the call take?
Time to read the input
Time to write the output
Add the two times
Practice 2
A call sends 4,000 input tokens and gets back 120 output tokens. The model reads input at 2,000 tokens per second and writes output at 30 tokens per second. How long does the call take?










