What is retrieval-augmented generation (RAG)?

Last updated May 7, 2026

Definition

Retrieval-augmented generation is a pattern in which a system retrieves relevant documents from a knowledge base at inference time and passes them to an LLM as context, so the response is grounded in source material.

RAG combines a retrieval step (typically vector similarity search over an embedded document corpus) with a generation step (an LLM that conditions on the retrieved chunks). It mitigates two LLM weaknesses at once: stale training data and hallucination on facts that are not encoded in the model's parameters. Production RAG systems typically add chunking strategies, re-ranking, and citation tracking so the user can verify which source passages influenced the answer.
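The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the corpus, document IDs, and prompt wording are hypothetical. The generation step is represented only by the prompt that would be sent to an LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Retrieval step: return the k (doc_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """Generation step input: the retrieved chunks become context for the LLM."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in chunks)
    return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {query}"

# Hypothetical three-document knowledge base.
corpus = {
    "doc-1": "The refund window is 30 days from delivery.",
    "doc-2": "Support hours are 9am to 5pm on weekdays.",
    "doc-3": "Refund requests need the original order number.",
}

query = "how long is the refund window"
top = retrieve(query, corpus, k=2)
prompt = build_prompt(query, top)
```

Here `top` holds the two refund-related documents, and `prompt` is the text an LLM would condition on. Swapping `embed` for a real embedding model and sending `prompt` to a model would turn the sketch into a working pipeline.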

Where RAG fits

RAG is the right pattern when answers must be grounded in a corpus the model wasn’t trained on, such as internal docs, recent news, or customer-specific data. It’s the wrong pattern when the task is primarily reasoning and there are no obvious source documents to retrieve.

Production patterns

Chunking — split documents into 200–800 token chunks with overlap; retrieval quality depends heavily on chunk size matching query granularity.
Re-ranking — initial vector retrieval pulls candidates; a smaller LLM or cross-encoder re-ranks the top N for final selection.
Citation tracking — return source IDs alongside the answer so users can verify.
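The three patterns above can be sketched together in Python. This is an illustrative sketch under stated assumptions: whitespace-separated words stand in for model tokens, `overlap_score` is a toy stand-in for a cross-encoder or smaller LLM, and all function names, chunk IDs, and passages are hypothetical.

```python
from typing import Callable

def chunk_tokens(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Chunking: split tokens into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(query: str, candidates: list[tuple[str, str]],
           score: Callable[[str, str], float], top_n: int) -> list[tuple[str, str]]:
    """Re-ranking: rescore the retrieved candidates and keep the best top_n."""
    return sorted(candidates, key=lambda c: score(query, c[1]), reverse=True)[:top_n]

def with_citations(answer: str, used: list[tuple[str, str]]) -> dict:
    """Citation tracking: return the source IDs alongside the answer text."""
    return {"answer": answer, "sources": [chunk_id for chunk_id, _ in used]}

# Chunking a 10-token document into windows of 4 with 1 token of overlap.
chunks = chunk_tokens([str(i) for i in range(10)], size=4, overlap=1)

# Hypothetical candidates, as if returned by an initial vector search.
candidates = [
    ("kb-40#2", "billing cycles start on the first of the month"),
    ("kb-12#1", "password reset links expire after one hour"),
    ("kb-12#0", "to reset my password open account settings"),
]
selected = rerank("how do I reset my password", candidates, overlap_score, top_n=2)
result = with_citations("Open account settings to reset it.", selected)
```

In practice the chunk size would be in the 200–800 token range mentioned above and measured with the model's own tokenizer. The small numbers here only make the overlap easy to see.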

Sources

Anthropic — Building effective agents