Context Caching
definition
Context caching lets you reuse previously processed prompt prefixes across multiple API calls, reducing both cost and latency for repeated context such as system prompts, documentation, or few-shot examples. Anthropic's prompt caching can reduce input token costs by up to 90% and latency by up to 85% for cached content, while OpenAI and Google offer similar automatic caching mechanisms. Caching is especially valuable for agent systems that make many calls with the same base context, which is the default pattern in any multi-step agent loop where the system prompt and tool definitions remain constant across iterations. Knowing when and how to structure prompts to maximize cache hit rates is a critical production optimization that separates prototype-cost agents from economically viable ones. This concept connects to token economics for quantifying the cost savings, to context window budget for structuring prompts so they can be cached, and to memory types for understanding how caching relates to short-term and long-term agent memory.
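The key structural rule is that cacheable content must form a stable prefix: the system prompt and tool definitions go first and never change, and variable content (the user turn) comes after the cache breakpoint. The sketch below shows this using the shape of Anthropic's Messages API, where a `cache_control` marker on a system content block designates the end of the cached prefix. The helper name, prompt text, and model string are illustrative assumptions, not part of the original text.

```python
# Sketch: structuring a request payload so the stable prefix is cacheable.
# The "cache_control" breakpoint marks where the reusable prefix ends;
# everything before it (system prompt, tool definitions) must stay
# byte-identical across calls to get cache hits.
# STABLE_SYSTEM_PROMPT and build_request are illustrative names.

STABLE_SYSTEM_PROMPT = "You are a research agent with access to these tools: ..."

def build_request(user_message: str) -> dict:
    """Build a Messages-API-style payload with the prefix marked for caching."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STABLE_SYSTEM_PROMPT,
                # Cache breakpoint: content up to and including this block
                # is eligible for reuse on subsequent calls.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Variable content goes AFTER the cached prefix; placing it earlier
        # would change the prefix bytes and force a cache miss.
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_request("Summarize the latest experiment results.")
```

Each iteration of an agent loop rebuilds only the `messages` list while the `system` block stays fixed, so every call after the first can reuse the cached prefix instead of reprocessing it.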