Memory and Knowledge

RAG Patterns

Retrieval-Augmented Generation (RAG) patterns describe how an agent retrieves relevant information from external knowledge sources at run time and injects it into the model's context before generating a response. The core pattern has an indexing phase and a query phase: documents are split into chunks, each chunk is embedded as a vector, and the vectors are stored in a vector database; at query time, the query is embedded the same way and the most semantically similar chunks are retrieved and included in the prompt. Advanced RAG patterns include multi-step retrieval (using an initial retrieval to refine the query), hybrid search (combining semantic and keyword matching), re-ranking (using a second model to score relevance), and agentic RAG (letting the agent decide when and what to retrieve). Understanding these variations matters because the right pattern depends heavily on your knowledge structure, query patterns, and latency budget.
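
The chunk, embed, store, and retrieve steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a reference implementation. Every name in it (`chunk`, `embed`, `VectorStore`, `build_prompt`, the `alpha` parameter) is hypothetical and chosen for this example. In particular, `embed` is a hashed bag-of-words stand-in, so it captures word overlap rather than true semantic similarity; a real system would presumably call an embedding model and a real vector database instead. The optional `alpha` blend shows one simple way hybrid search might combine a vector score with a keyword score.

```python
import math
import re
import zlib
from dataclasses import dataclass, field

DIM = 256  # toy embedding size; an arbitrary illustrative choice


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> list[float]:
    """Stand-in embedding (assumption): hashed bag-of-words, L2-normalized.

    A real system would call an embedding model here instead.
    """
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class VectorStore:
    """In-memory stand-in for a vector database."""
    chunks: list[str] = field(default_factory=list)
    vectors: list[list[float]] = field(default_factory=list)

    def add_document(self, text: str) -> None:
        """Indexing phase: chunk the document, embed each chunk, store both."""
        for piece in chunk(text):
            self.chunks.append(piece)
            self.vectors.append(embed(piece))

    def search(self, query: str, k: int = 2, alpha: float = 1.0) -> list[tuple[float, str]]:
        """Query phase: return the top-k (score, chunk) pairs.

        alpha=1.0 uses vector similarity only; lower values blend in a
        keyword-overlap score, a simple form of hybrid search.
        """
        q_vec = embed(query)
        q_tokens = set(tokenize(query))
        scored = []
        for vec, text in zip(self.vectors, self.chunks):
            # Dot product equals cosine similarity because both vectors are unit length.
            similarity = sum(a * b for a, b in zip(q_vec, vec))
            shared = len(q_tokens & set(tokenize(text)))
            keyword = shared / len(q_tokens) if q_tokens else 0.0
            scored.append((alpha * similarity + (1 - alpha) * keyword, text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question: str, retrieved: list[tuple[float, str]]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n".join(f"- {text}" for _, text in retrieved)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = VectorStore()
    store.add_document("To reset your password, open Settings and choose Reset Password.")
    store.add_document("Invoices are emailed on the first day of each month.")
    question = "How do I reset my password?"
    print(build_prompt(question, store.search(question, k=1)))
```

The chunk size, overlap, `k`, and `alpha` values here are placeholders. In practice they would be tuned against the knowledge structure, query patterns, and latency budget mentioned above, and a re-ranking step could be added by rescoring the returned chunks with a second model before building the prompt.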