Foundations
Token Economics
Large language model (LLM) pricing runs on tokens, the chunks of text a model processes as input and output. Output tokens typically cost 3 to 5 times more than input tokens, and costs accumulate fast inside agent loops that call the model repeatedly: an unoptimized agent loop can burn through hundreds of dollars per day in production. Understanding tokenization, context window costs, and price differences between models is therefore a prerequisite for building economically viable systems at scale. Prompt caching, routing simpler tasks to cheaper models, and context window discipline can cut costs by 10x or more without sacrificing output quality.
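The cost dynamics above can be sketched with a small estimator. The per-million-token prices below are illustrative placeholders (real rates vary by vendor and model and change over time), and the growing-context loop models an agent that re-sends its full history on every turn, which is why loop costs compound.

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m=3.00, output_price_per_m=15.00):
    """Dollar cost of one model call, given per-million-token prices.

    Default prices are hypothetical examples, not current vendor rates;
    note the 5x input/output asymmetry typical of current pricing.
    """
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m


def agent_loop_cost(iterations, base_prompt_tokens, output_per_turn, **prices):
    """Cost of an agent loop that appends each turn's output to the context.

    Because the full (growing) context is re-sent every iteration, input
    cost grows quadratically with the number of turns.
    """
    total = 0.0
    prompt = base_prompt_tokens
    for _ in range(iterations):
        total += request_cost(prompt, output_per_turn, **prices)
        prompt += output_per_turn  # context grows each turn
    return total
```

For example, a 50-turn loop starting from a 4,000-token prompt and emitting 500 tokens per turn re-bills the earlier turns' output dozens of times, which is exactly the repeated-context cost that prompt caching and context trimming target.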
Resources
- OpenAI API Pricing (openai.com): Current pricing for all OpenAI models with input/output token breakdowns
- Anthropic API Pricing (anthropic.com): Claude model pricing including prompt caching discounts
- OpenAI Tokenizer Tool (platform.openai.com): Interactive tool to see how text gets split into tokens
- Google Gemini Pricing (ai.google.dev): Gemini model pricing with generous free tier details
- Anthropic: Prompt Caching (docs.anthropic.com): How prompt caching can reduce repeated context costs by up to 90%