Context Engineering

Retrieval Augmented Generation

definition

Retrieval-Augmented Generation (RAG) is the foundational pattern for giving language models access to external knowledge: relevant documents are retrieved and included in the prompt before the model generates its response. Rather than relying solely on what the model learned during training, which stops at a knowledge cutoff and can produce hallucinated answers, RAG grounds the model's output in specific, verifiable source material, which typically improves accuracy on domain-specific questions.

RAG has become so fundamental to production AI systems that it is often the first architectural decision after choosing a model: does the model need to answer questions about your data? If so, RAG is usually the place to start. Understanding RAG at the conceptual level matters because every variation, from simple vector retrieval to agentic multi-hop RAG, builds on the same core pattern of "retrieve, then generate." This concept connects to vector databases and embedding models for the infrastructure layer, RAG patterns for advanced retrieval strategies, and context window budget for the token constraints that drive retrieval decisions.
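The "retrieve, then generate" loop can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a production design: word-count cosine similarity stands in for an embedding model and vector database, a word count stands in for a real token budget, and the names `retrieve`, `build_prompt`, and `DOCS` are hypothetical. The final call to a language model is left out, since it depends on the provider.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Retrieve step: return the k documents most similar to the query.

    Assumption: word overlap replaces embedding similarity here.
    """
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str], max_words: int = 200) -> str:
    """Place retrieved passages in the prompt, within a crude word budget.

    Assumption: counting words approximates a context window budget.
    """
    kept, used = [], 0
    for passage in passages:
        size = len(passage.split())
        if used + size > max_words:
            break  # stop once the next passage would exceed the budget
        kept.append(passage)
        used += size
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(kept))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical knowledge base, for illustration only.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Return requests must include the original order number.",
]

if __name__ == "__main__":
    question = "How long do refunds take?"
    # Generate step (not shown): send this prompt to a language model.
    print(build_prompt(question, retrieve(question, DOCS, k=2)))
```

Swapping the word-count similarity for embeddings, or the list for a vector database, changes only `retrieve`. The prompt assembly and the budget check stay the same, which is why the more advanced variations can be read as refinements of this one loop.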