RAG / Context Engineering Complete Guide
Combine LLMs with retrieval to produce grounded answers and leverage your own knowledge base.
Long context, MCP, Agentic Search, and code exploration now sit beside RAG, so the design question is what to retrieve, what to place in context, and what to expose as tools.
What this hub covers
Before diving into papers, OSS, or implementation posts, this hub treats RAG, long context, MCP, and Agentic Search as different ways to provide evidence to models. The focus is not only what each technology is, but when to choose it.
What Is RAG?
RAG, or Retrieval-Augmented Generation, is a design pattern in which the system retrieves relevant documents or data at answer time and passes them to the model as context. Choosing a vector database is only one part of the design. The rest is retrieval scope, permissions, freshness, evidence display, and evaluation.
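As a rough illustration of the pattern, the sketch below retrieves documents at answer time, filters them by the caller's role, and places the surviving evidence in the prompt. Everything in it is an assumption made for illustration: the `Doc` type, the role-based permission model, the word-overlap scoring (a stand-in for whatever retriever you actually use), and the prompt wording. It does not call a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical permission model: roles that may read this doc


def tokenize(text):
    """Lowercase whitespace tokenization; a toy stand-in for real text processing."""
    return set(text.lower().split())


def retrieve(query, docs, role, k=2):
    """Return up to k permitted docs, ranked by word overlap with the query."""
    q = tokenize(query)
    # Apply permissions before scoring so hidden docs can never reach the context.
    visible = [d for d in docs if role in d.allowed_roles]
    scored = [(len(q & tokenize(d.text)), d) for d in visible]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, hits):
    """Place retrieved evidence in the context, tagged with source ids for display."""
    evidence = "\n".join(f"[{d.doc_id}] {d.text}" for d in hits)
    return f"Answer using only the evidence below.\n\n{evidence}\n\nQuestion: {query}"


# Made-up sample documents.
DOCS = [
    Doc("hr-1", "vacation policy allows 20 days per year", frozenset({"employee", "hr"})),
    Doc("hr-2", "salary bands for each level", frozenset({"hr"})),
    Doc("it-1", "vpn setup guide for laptops", frozenset({"employee", "hr"})),
]

if __name__ == "__main__":
    hits = retrieve("vacation policy days", DOCS, role="employee")
    print(build_prompt("vacation policy days", hits))
```

Filtering by role before ranking is one possible design choice; it keeps documents the user cannot read out of both the candidate set and the final context.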
Start Here
Choosing between RAG, MCP, Agent Skills, and long context.
A beginner-friendly guide to RAG ranking issues and the reranker idea of rereading retrieved candidates.
Treating RAG as persistent wiki maintenance instead of repeated query-time search.
The limits of vector-DB-dependent RAG and the role of Agentic Search and code exploration.
Workflow, permission, and operating design for exposing internal knowledge to AI.
Design Questions
| Lens | Question |
|---|---|
| Retrieval scope | Which documents, databases, or code should the system reference? |
| Context | At what granularity should retrieved evidence be passed to the model? |
| Permissions | How should access differ by user or role? |
| Freshness | How often should documents or indexes be updated? |
| Evaluation | How should retrieval quality and answer quality be measured separately? |
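To make the Evaluation row concrete, here is a minimal sketch that scores retrieval and answers separately. The case format (`retrieved`, `relevant`, `answer`, `expected`), the use of recall@k as the retrieval metric, and the substring check for answer quality are all assumptions chosen for brevity; real evaluations typically use richer answer grading.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant docs that appear in the top-k retrieved ids."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def answer_contains(answer, expected_phrase):
    """Crude answer check: does the answer mention the expected phrase?"""
    return expected_phrase.lower() in answer.lower()


def evaluate(cases, k=3):
    """Average the retrieval metric and the answer metric independently."""
    n = len(cases)
    retrieval = sum(recall_at_k(c["retrieved"], c["relevant"], k) for c in cases) / n
    answer = sum(answer_contains(c["answer"], c["expected"]) for c in cases) / n
    return {"recall_at_k": retrieval, "answer_accuracy": answer}


# Made-up cases: the second answer is right even though retrieval missed the
# relevant document, which a single combined score would hide.
CASES = [
    {"retrieved": ["a", "b", "c"], "relevant": ["a"],
     "answer": "You get 20 days.", "expected": "20 days"},
    {"retrieved": ["x", "y", "z"], "relevant": ["q"],
     "answer": "The limit is 5 GB.", "expected": "5 GB"},
]

if __name__ == "__main__":
    print(evaluate(CASES))
```

Reporting the two numbers side by side shows whether a failure comes from retrieval or from generation, and flags correct answers that were not grounded in retrieved evidence.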
Implementation Archive
These are older, implementation-oriented articles that assume SageMaker. Read the design articles above before using them.