Context Engineering

Context Caching

Context caching lets you reuse previously processed prompt prefixes across multiple API calls, cutting both cost and latency for repeated content like system prompts, documentation, or few-shot examples. Anthropic's prompt caching can reduce input token costs by up to 90% and latency by 85% for cached content, and OpenAI and Google offer comparable automatic caching mechanisms. This technique becomes especially valuable in agent systems that make many calls with the same base context, which is the default pattern in any multi-step agent loop where the system prompt and tool definitions stay constant across iterations.
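As a concrete illustration, here is a minimal sketch of how a cacheable prefix is marked in an Anthropic-style Messages API request. The `cache_control` field follows Anthropic's prompt caching API; the model name, system prompt, tool definition, and the `build_request` helper are all illustrative, not taken from any particular codebase. The key idea is that the constant prefix (tools plus system prompt) is flagged once, and only the user turn varies between calls.

```python
# Stable context, identical across every call in the agent loop.
SYSTEM_PROMPT = "You are a support agent. Answer using the product docs."
TOOL_DEFS = [
    {
        "name": "search_docs",  # hypothetical tool for illustration
        "description": "Search the product documentation.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

def build_request(user_message: str) -> dict:
    """Build a request whose constant prefix (tools + system prompt) is
    marked as cacheable; only the trailing user message changes per call."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "tools": TOOL_DEFS,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Everything up to and including this block becomes a
                # cacheable prefix on the provider side.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# Two agent-loop iterations share the identical cached prefix;
# only the messages array differs.
req_a = build_request("How do I reset my password?")
req_b = build_request("What plans do you offer?")
assert req_a["system"] == req_b["system"]
assert req_a["messages"] != req_b["messages"]
```

Because caching is keyed on an exact prefix match, anything above the cache marker must be byte-identical between calls: reordering tools or injecting a timestamp into the system prompt silently defeats the cache.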