Understanding RAG (Retrieval Augmented Generation) and MCP (Model Context Protocol)
Product Owners | January 02, 2026
Augmenting LLM Knowledge with RAG and MCP
Scalable, local, and secure language models introduce some new concepts and terminology. Many local models are simple chatbots limited to the information in their initial training data sets. RAG, or Retrieval Augmented Generation, provides a method for adding data to a model without retraining it - a time-consuming and processing-intensive task. For example, RAG can allow the model to reference a file or folder you supply. Another key solution is MCP, the Model Context Protocol. This standard allows models to access other systems, tools, and data sources - for example, querying a database or website for more context when generating results. Together these provide file-level to service-level access to a wide range of data sources to customize and enhance LLM results.
RAG (Retrieval Augmented Generation)
RAG is essentially a methodology that bridges the gap between the vast, pre-trained knowledge stored within an LLM (the Generation part) and real-time, domain-specific information stored in an external knowledge base (the Retrieval part).
Large Language Models (LLMs) are trained on massive data sets, mostly scraped from publicly available sources like websites and Wikipedia. This leads to the three most common LLM problems:
- Knowledge Cutoff: An LLM cannot inherently know about events or data that occurred after its training was completed.
- Hallucination: When an LLM lacks the specific information needed to answer a query, it may confidently invent facts or responses that do not hold up under validation.
- Non-Specific Knowledge: LLMs are generalists. They don't have access to your organization's private documents, internal manuals, or specific technical specs.
This can be remedied by providing the LLM access to relevant data before it generates a response. For example, we can upload a PDF or Doc to an LLM to serve as reference material for answering a question, either about that document specifically or to supplement the data included in the original model.
How the RAG Process Works (The 3 Key Steps)
The RAG process can be broken down into three main operational phases: Indexing, Retrieval, and Generation (sketched in code after the list below).
- Indexing breaks the document(s) into “chunks” small enough for the LLM to process, since most LLM context windows are smaller than a full document. Each chunk is then vectorized - converted into an embedding, a list of numbers that captures its meaning - and stored in a database for retrieval.
- Retrieval vectorizes the user’s request in the same way, searches the stored chunks for the most relevant matches, and returns those chunks from the database to the LLM.
- Generation proceeds as normal, except that the model now has the retrieved chunks in its context. Depending on how the prompt is constructed, the LLM can answer exclusively from those chunks or blend them with its general knowledge.
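To make the three phases concrete, here is a minimal sketch in Python. A toy bag-of-words similarity stands in for a real embedding model, and a plain list stands in for a vector database; the document text, chunk size, and query are all hypothetical.

```python
# Minimal RAG sketch: Indexing, Retrieval, Generation.
import math
from collections import Counter

def chunk(text: str, size: int = 25) -> list[str]:
    """Indexing, step 1: split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Similarity score used to rank chunks against the query."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing: chunk a (hypothetical) document and store (vector, chunk) pairs.
document = ("The Model X warranty period is two years. Returns require a "
            "support ticket filed through the internal portal. Batteries "
            "are covered separately for one year from purchase.")
index = [(embed(c), c) for c in chunk(document)]

# Retrieval: vectorize the query and pull the most relevant chunks.
query = "How long is the warranty?"
q = embed(query)
top = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)[:2]

# Generation: prepend the retrieved chunks to the prompt for the LLM.
prompt = ("Answer using only the context below.\n\nContext:\n"
          + "\n---\n".join(c for _, c in top)
          + f"\n\nQuestion: {query}")
print(prompt)  # this prompt would be sent to the LLM
```

In production the embeddings would come from a dedicated embedding model and the index would live in a vector database, but the three phases stay the same.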
MCP (Model Context Protocol)
The Model Context Protocol (MCP) is an open-source standard that provides a unified, structured "language" for Large Language Models (LLMs) to securely and predictably interact with external systems, tools, and data sources.
Before MCP, every AI app had to learn a custom way to talk to every tool it wanted to use. If you had many AI apps and many tools, this quickly became messy because each app needed a separate connection for each tool.
This created an M × N problem:
- M = number of AI apps
- N = number of tools
- Total connections = M × N
MCP fixes this by using a shared standard.
Tool developers build one MCP server per tool, and AI app developers build one MCP client per app. Once both speak MCP, they can work together automatically.
This turns the problem into M + N, which is much simpler and easier to scale. For example, 10 apps and 20 tools would require 10 × 20 = 200 custom integrations, but only 10 + 20 = 30 MCP components (10 clients plus 20 servers).
Analogy: MCP is often compared to the Language Server Protocol (LSP) in software development, which standardized how IDEs communicate with programming language analysis tools. It's the "USB-C" of AI systems, providing a single connection for multiple functions.
MCP Components
MCP consists of multiple components, which can run on the same computer, on a local network, or spread all over the world via the Internet (a minimal server is sketched after this list):
- MCP Host - this is the AI application running on the local computer, such as Microsoft Foundry
- MCP Client - this is the manager for connection, discovery, and communications with an MCP Server
- MCP Server - this is a separate application acting as a wrapper for a service - for example a SQL database, web search engine, or specific website like GitHub
- Transport Layer - this is how the Host and Server communicate - for example stdio on the same system, or HTTP with Server-Sent Events (SSE) for remote connections
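To ground these roles, here is a minimal sketch of an MCP Server, assuming the official Python SDK's FastMCP helper (the `mcp` package); the lookup_order tool and its response are hypothetical. A Host would launch this script, and its Client would communicate with it over the stdio transport.

```python
# Minimal MCP Server sketch (assumes the official `mcp` Python SDK;
# the tool it exposes is hypothetical).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("order-lookup")  # server name advertised to the Host

@mcp.tool()
def lookup_order(order_id: str) -> str:
    """Fetch the status of an order so the model can cite it."""
    # A real server would wrap the underlying service here,
    # e.g. query a SQL database or call a web API.
    return f"Order {order_id}: shipped, arriving Friday"

if __name__ == "__main__":
    mcp.run(transport="stdio")  # Transport Layer: stdio on the same system
```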
MCP Considerations
Because MCP grants LLMs the ability to read private data and execute code (via Tools), the protocol places high importance on security:
- User Consent: Implementations must ensure users explicitly consent to data access and tool execution.
- Structured Context: By forcing data into a structured format (defined by schemas), MCP helps mitigate risks like Prompt Injection by providing validation and clear boundaries for context.
- Trust and Isolation: The Server is responsible for validating that the LLM's requested action is safe and authorized before executing it against the underlying system.
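As an illustration of that last point, here is a hypothetical sketch of server-side validation. The table allow-list and the numeric-ID rule are invented for this example; the pattern of checking the model's requested action before touching the underlying system is what matters.

```python
# Hypothetical server-side validation before executing a tool call.
import sqlite3

ALLOWED_TABLES = {"orders", "products"}  # only these are exposed via MCP

def run_query(db: sqlite3.Connection, table: str, record_id: str) -> list:
    # Reject anything outside the boundaries the server defines.
    if table not in ALLOWED_TABLES:
        raise PermissionError(f"table {table!r} is not exposed to the model")
    if not record_id.isdigit():
        raise ValueError("record_id must be numeric")
    # The id is passed as a bound parameter and the table name is
    # restricted to the allow-list, so no untrusted text reaches the SQL.
    return db.execute(
        f"SELECT * FROM {table} WHERE id = ?", (record_id,)
    ).fetchall()
```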