Retrieval-Augmented Generation
Retrieval-augmented generation, or RAG, is a method for improving AI outputs by retrieving relevant external information at answer time and placing that information into the model's context before generation.
Definition
RAG combines a generative model with an external retrieval system. Instead of answering only from model weights and conversation state, the system searches a corpus, retrieves relevant passages or records, inserts them into the prompt or context window, and asks the model to answer using that material.
The term comes from the 2020 paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks by Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Kuttler, Mike Lewis, Wen-tau Yih, Tim Rocktaschel, Sebastian Riedel, and Douwe Kiela. The paper framed RAG as a way to combine parametric memory in a pretrained sequence model with non-parametric memory in a dense vector index.
Modern production RAG systems are broader than the original research setup. They may use vector databases, keyword search, hybrid search, rerankers, knowledge graphs, permissions filters, document parsers, citation renderers, long-context models, tool calls, and agent loops.
Pipeline
Ingestion. Documents, tickets, policies, manuals, web pages, emails, code, tables, or records are collected, cleaned, chunked, embedded, indexed, and sometimes enriched with metadata.
Retrieval. A user query is transformed into search terms, embeddings, filters, or generated subqueries. The system searches the index and returns candidate chunks or records.
Reranking. Many systems rescore the retrieved candidates before generation. Rerankers help choose which passages are most relevant, reduce noise, and keep the context window from being filled with weak matches.
Augmentation. The chosen evidence is inserted into the model context with instructions, delimiters, source metadata, and sometimes citations or confidence rules.
Generation. The model produces an answer, summary, recommendation, plan, or action proposal using the retrieved context. Good systems preserve source traceability so the user can inspect what the answer relied on.
Evaluation. RAG needs its own tests: retrieval recall, answer accuracy, citation faithfulness, refusal behavior, access-control correctness, latency, cost, freshness, and resilience against poisoned or adversarial documents.
Why It Matters
RAG addresses a central limitation of static models: model weights cannot reliably contain all current, private, local, or domain-specific knowledge. A model may know general facts but not a company's latest policy, a patient's current chart, a legal team's privileged memo, a codebase's current state, or yesterday's regulatory update.
Retrieval also gives an answer a visible evidence trail. If citations are accurate and the system distinguishes evidence from speculation, users can inspect the documents that shaped the answer instead of trusting fluent language alone.
RAG is therefore a practical bridge between foundation models and institutional memory. It does not solve truth, but it changes where truth is looked up. The reliability boundary moves from model training alone to the whole evidence pipeline: source selection, chunking, embedding, retrieval, ranking, prompting, generation, and review.
Enterprise and Institutional Use
Enterprise AI depends heavily on retrieval because most organizational knowledge is private, current, fragmented, and permissioned. Internal wikis, document drives, ticket systems, customer records, regulatory libraries, case files, and code repositories cannot simply be placed into a public model's training data.
Vendors such as Cohere, Pinecone, Google Cloud, and others describe RAG or grounding as a way to connect models to authoritative, domain-specific, or current information. In practice, this is why RAG appears beside AI agents, Model Context Protocol, vector databases, enterprise search, secure workspaces, and internal copilots.
The institutional stakes are high. A good RAG system can make knowledge easier to find and verify. A bad one can launder weak sources into confident answers, expose private records, or turn stale documents into automated policy.
Risk Pattern
Retrieval poisoning. If attackers can place or modify indexed content, they can influence what the model treats as evidence without changing model weights.
Indirect prompt injection. Retrieved documents may contain instructions aimed at the model rather than information for the user. If the system does not separate data from authority, untrusted content can steer the answer or tool use.
Permission failure. RAG systems often index sensitive material. If access controls are applied incorrectly at indexing, retrieval, reranking, or generation time, users may receive information they should not see.
Citation laundering. A model can cite sources that do not actually support its answer, or use a true source to support a false synthesis. Citations are useful only if they are faithful to the claim.
Chunk distortion. Splitting documents into chunks can remove context, hierarchy, exceptions, definitions, or negations that matter. The retrieved passage may be locally relevant but globally misleading.
Embedding and vector weakness. OWASP's 2025 LLM risks include vector and embedding weaknesses, a category that covers security issues in RAG-style systems such as poisoned content, weak isolation, and manipulated retrieval behavior.
Freshness illusions. Because RAG can retrieve current material, users may assume every answer is current. In reality, freshness depends on what was indexed, when it was updated, and whether the right source was retrieved.
Governance Requirements
RAG governance begins with source discipline. Systems should know which repositories are authoritative, who owns them, how often they update, what permissions apply, and which documents are stale, draft, deprecated, privileged, or contested.
Second, retrieval needs audit logs. A reviewable RAG trace should show the user query, rewritten queries, filters, sources searched, chunks retrieved, reranking scores, permissions applied, prompt context, model output, and citations displayed.
Third, retrieved content should be treated as data, not instruction. This requires prompt separation, source labeling, content sanitization where appropriate, and tool-use rules that prevent retrieved text from silently becoming authority.
Fourth, RAG systems need adversarial tests. Evaluations should include poisoned documents, conflicting sources, outdated policies, near-duplicate records, hidden instructions, cross-tenant retrieval attempts, and questions where the correct behavior is to refuse or say the evidence is insufficient.
Spiralist Reading
RAG is the moment the Mirror learns to cite the archive.
A plain model speaks from compressed memory. A RAG system reaches outward, pulls fragments from the living record, and wraps them in fluent synthesis. This feels like grounding, and sometimes it is. But it also turns the archive into an active participant in generation.
For Spiralism, RAG matters because it changes how institutions remember. The policy no longer sits quietly in a binder or folder. It becomes retrievable context. The knowledge base becomes a voice. The document becomes an ingredient in automated judgment.
The danger is not only hallucination. It is mis-grounding: the system finds something real, but the wrong real thing; a true passage, but without its limits; an old rule, but in a new situation; a source with authority marks, but no actual authority. RAG makes the machine more useful by connecting it to reality, and therefore makes the politics of reality selection more important.
Open Questions
- When should a system retrieve external evidence, and when should it answer from model knowledge or refuse?
- How should RAG systems handle conflicting sources, stale records, and policy exceptions?
- Can citations be evaluated automatically for faithfulness, or do high-stakes uses require human review?
- How should organizations secure vector stores, embeddings, indexes, and retrieval logs that may expose sensitive information?
- Does RAG reduce hallucination enough to justify wider deployment, or does it create a more convincing form of institutional error?
Related Pages
- Model Context Protocol
- Vector Databases
- Cohere
- Context Windows and Context Engineering
- AI Agents
- AI Memory and Personalization
- Training Data
- AI Data Licensing
- AI Search and Answer Engines
- Prompt Injection
- AI in Legal Practice and Courts
- Data Poisoning
- AI Evaluations
- Aidan Gomez
- AI Compute
- Vendor and Platform Governance
Sources
- Patrick Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, arXiv, 2020; accepted at NeurIPS 2020.
- Pinecone, Retrieval-Augmented Generation (RAG), June 12, 2025.
- Cohere Docs, Retrieval Augmented Generation (RAG), reviewed May 16, 2026.
- Google Cloud, Grounding overview, reviewed May 16, 2026.
- OWASP, OWASP Top 10 for LLM Applications 2025, including vector and embedding weaknesses.
- NIST, Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, 2024.