Glossary

What is retrieval-augmented generation (RAG)?

Retrieval-augmented generation (RAG) is a pattern that fetches relevant documents at query time and feeds them to a language model, so the model's answers are grounded in your own data rather than only in what it learned during training.


Retrieval-augmented generation, almost always shortened to RAG, is a pattern for grounding a language model in a specific body of knowledge. Instead of relying only on what the model learned during training, a RAG system retrieves relevant documents at query time — usually from a vector database or search index — and inserts them into the prompt as context. The model then answers using that retrieved material, so its output reflects your data, not just its pretraining.

A typical RAG pipeline has two phases. Offline, you split source documents into chunks, turn each chunk into an embedding, and store those embeddings in an index. Online, when a user asks a question, you embed the query, find the most similar chunks, and pass the top results to the model alongside the question. The model's job becomes reading and synthesising the supplied passages rather than recalling facts from memory.
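The two phases can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the index is a plain list rather than a vector database, and every name here (`chunk`, `build_index`, `retrieve`, `build_prompt`) is hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Naive fixed-size chunking: at most max_words words per chunk."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Offline phase: chunk every document and store (chunk, embedding) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[tuple[float, str]]:
    """Online phase: embed the query and return the k most similar chunks with scores."""
    q = embed(query)
    scored = [(cosine(q, vec), text) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages in the prompt alongside the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the passages below.\n\n{context}\n\nQuestion: {question}"


docs = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available on weekdays from 9am to 5pm.",
]
question = "How long do refunds take?"

index = build_index(docs)                 # offline
hits = retrieve(index, question, k=1)     # online
prompt = build_prompt(question, [text for _, text in hits])
```

The resulting `prompt` string is what would be sent to the language model. In a real system the toy `embed` would be replaced by an embedding model and the list by a vector database or search index, but the shape of the pipeline stays the same.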

RAG is the default architecture for question-answering over private or fast-changing data: internal knowledge bases, product documentation, contracts, policy libraries, support histories. It is attractive because it sidesteps the cost and staleness of retraining — you can update the knowledge by re-indexing documents, and you can cite the exact sources behind an answer, which matters in regulated settings. The retrieval step is also where most quality lives or dies: if the right chunk isn't retrieved, no amount of model quality will save the answer.
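As a rough illustration of updating by re-indexing and of citing sources, the sketch below keeps chunks keyed by their source document. It assumes an in-memory store and a keyword match in place of similarity search; `ChunkStore`, `upsert`, `search`, and the file path are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # hypothetical document identifier, e.g. a file path
    text: str


class ChunkStore:
    """In-memory stand-in for an index, keyed by source document."""

    def __init__(self) -> None:
        self._by_source: dict[str, list[Chunk]] = {}

    def upsert(self, source: str, text: str) -> None:
        # Re-indexing replaces every chunk of the document, so stale text disappears.
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        self._by_source[source] = [Chunk(source, s) for s in sentences]

    def search(self, term: str) -> list[Chunk]:
        # Keyword match as a placeholder for similarity search.
        term = term.lower()
        return [
            c
            for chunks in self._by_source.values()
            for c in chunks
            if term in c.text.lower()
        ]


store = ChunkStore()
store.upsert("policies/refunds.md", "Refunds take 30 days. Returns need a receipt.")
# The policy changes: re-index the same document, with no retraining involved.
store.upsert("policies/refunds.md", "Refunds take 14 days. Returns need a receipt.")

hits = store.search("refunds")
citations = [f"{c.text} (source: {c.source})" for c in hits]
```

Because each chunk carries its source, the passages shown to the model can be surfaced to the user as citations next to the answer.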

RAG matters because it is one of the most reliable ways to make a general model accurate and trustworthy on your specific domain, and because it makes answers auditable: you can show the passages an answer was drawn from. But it is not free. Chunking strategy, embedding choice, retrieval tuning, and handling the cases where nothing relevant is found all require real engineering. Done well, RAG substantially reduces hallucination; done carelessly, it retrieves the wrong context and produces confident, wrong answers that look grounded.
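One common way to handle the case where nothing relevant is found is to drop low-scoring chunks and abstain when none remain, rather than pass weak context to the model. The sketch below assumes retrieval has already produced `(score, text)` pairs. The 0.3 threshold is an arbitrary placeholder that would need tuning against real queries, and the function names are hypothetical.

```python
from typing import Optional


def select_context(
    scored_chunks: list[tuple[float, str]], min_score: float = 0.3, k: int = 3
) -> list[str]:
    """Keep at most k chunks whose similarity score clears min_score."""
    kept = [(score, text) for score, text in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:k]]


def build_prompt_or_abstain(
    question: str, scored_chunks: list[tuple[float, str]], min_score: float = 0.3
) -> Optional[str]:
    """Return a grounded prompt, or None when no chunk is relevant enough."""
    passages = select_context(scored_chunks, min_score)
    if not passages:
        # Caller should tell the user nothing was found instead of calling the model.
        return None
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


weak = [
    (0.05, "Support is available on weekdays."),
    (0.11, "Offices are closed on public holidays."),
]
strong = [
    (0.82, "Refunds are issued within 14 days."),
    (0.12, "Support is available on weekdays."),
]

no_context = build_prompt_or_abstain("How long do refunds take?", weak)
grounded = build_prompt_or_abstain("How long do refunds take?", strong)
```

Here `no_context` is `None`, signalling that the application should decline to answer, while `grounded` contains only the passage that cleared the threshold.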

Related: What are embeddings?

Related: What is a vector database?

Related: What is hallucination?
