Context Engineering

Context Caching

Context caching lets you reuse previously processed prompt prefixes across multiple API calls, cutting both cost and latency for repeated content like system prompts, documentation, or few-shot examples. Anthropic's prompt caching can reduce input token costs by up to 90% and latency by 85% for cached content, and OpenAI and Google offer comparable automatic caching mechanisms. This technique becomes especially valuable in agent systems that make many calls with the same base context, which is the default pattern in any multi-step agent loop where the system prompt and tool definitions stay constant across iterations.
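As a concrete illustration, here is a minimal sketch of how a cacheable prefix is marked in an Anthropic-style Messages API request. The `cache_control` field follows Anthropic's prompt caching API; the model name, system prompt, tool definition, and the `build_request` helper are all illustrative, not taken from any particular codebase. The key idea is that the constant prefix (tools plus system prompt) is flagged once, and only the user turn varies between calls.

```python
# Stable context, identical across every call in the agent loop.
SYSTEM_PROMPT = "You are a support agent. Answer using the product docs."
TOOL_DEFS = [
    {
        "name": "search_docs",  # hypothetical tool for illustration
        "description": "Search the product documentation.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

def build_request(user_message: str) -> dict:
    """Build a request whose constant prefix (tools + system prompt) is
    marked as cacheable; only the trailing user message changes per call."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "tools": TOOL_DEFS,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Everything up to and including this block becomes a
                # cacheable prefix on the provider side.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# Two agent-loop iterations share the identical cached prefix;
# only the messages array differs.
req_a = build_request("How do I reset my password?")
req_b = build_request("What plans do you offer?")
assert req_a["system"] == req_b["system"]
assert req_a["messages"] != req_b["messages"]
```

Because caching is keyed on an exact prefix match, anything above the cache marker must be byte-identical between calls: reordering tools or injecting a timestamp into the system prompt silently defeats the cache.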