The Big Picture
Hello and welcome! Let’s jump right into things.
Agentic AI
AI that can decide and act, not just chat. Think of AI that can book flights, schedule appointments, post on TikTok, send emails, or update databases (shudders) without you clicking anything first. You know, the stuff that usually brings about the end of the world in science fiction.
RAG (Retrieval-Augmented Generation)
An AI workflow that searches curated (pre-indexed) knowledge sources, pulls the most relevant snippets before responding, and then lets the model generate an answer grounded in that material. Behind the scenes, each document is converted into a vector (a list of numbers that captures its meaning) and those vectors are stored in a vector database, a search engine built for this numeric format. Think of it as giving the AI a research assistant. Invented by Meta.
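If the “vector” talk feels abstract, here’s a toy sketch of the core trick. The embed() function below is a fake bag-of-words stand-in for a real embedding model (which produces dense lists of floats), but the retrieval math is the same: find the stored vector closest to the question’s vector.

```python
# Toy sketch of what a vector database does: turn text into vectors,
# then retrieve by similarity. embed() here is a fake stand-in for a
# real embedding model.
import math

def embed(text: str) -> dict[str, float]:
    words = text.lower().split()
    return {w: float(words.count(w)) for w in set(words)}

def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = ["Employees update mailing addresses in the HR portal.",
        "The cafeteria serves lunch from 11 to 2."]
index = [(doc, embed(doc)) for doc in docs]  # our tiny "vector database"

question = embed("How do I update my mailing address?")
best_doc, _ = max(index, key=lambda pair: cosine_similarity(question, pair[1]))
print(best_doc)  # the snippet handed to the model before it answers
```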
CAG (Cache-Augmented Generation)
A close cousin to RAG. Instead of searching a vector database at question time, CAG preloads relevant documents into the model’s extended context window (its short-term memory: how much text it can keep in mind at once) or cache. When the dataset is small and stable enough to fit in that window, CAG can serve as a direct substitute for RAG, and its relationship to MCP is identical.
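To see the contrast, here’s an equally toy sketch of the CAG idea, assuming a corpus small enough to preload wholesale (the handbook snippets are made up). Notice there’s no search step at all:

```python
# Toy CAG sketch: skip retrieval entirely and preload the whole (small,
# stable) corpus into the prompt, i.e. into the model's context window.
HANDBOOK = [
    "PTO policy: 20 days per year, accrued monthly.",
    "Mailing address changes: submit form HR-102 in the portal.",
]

def build_cag_prompt(question: str) -> str:
    context = "\n".join(HANDBOOK)  # everything goes in, relevant or not
    return f"Answer using only this material:\n{context}\n\nQuestion: {question}"

print(build_cag_prompt("How do I update my mailing address?"))
```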
For a deeper look at the differences between RAG and CAG, plenty of good comparisons are a quick search away.
CAG won’t be mentioned further, as it doesn’t change the relationship to MCP. Whether you compare MCP to RAG or to CAG, everything discussed here still applies.
MCP (Model Context Protocol)
A standardized way for AI to connect to tools and data sources during conversations. Like ChatGPT being able to look at your investment account in real time when you ask it “how am I doing financially?” This is the foundational tech behind Agentic AI systems. Invented by Anthropic.
Their Relationship
They’re not competing technologies. RAG is about retrieving information from custom knowledge bases, MCP is about standardizing connections to any system. You can even implement RAG as an MCP tool.
Use RAG when the AI needs knowledge that isn’t in its training data. Use MCP when the AI must call external tools, whether to retrieve live data, perform an action, or even run a RAG search service. Combine them when you need an AI that can gather domain knowledge and then act on it.
Hopefully that answers the question that brought you here. You could leave now, but if you keep reading, there are approachable explanations, memes, and fun ahead. I shall now enter “this is a personal blog in some backwater corner of the internet, so I’m going to yap as much as I want, lol what are you going to do about it?” mode.
My AI Life
To set the stage, I’m a software engineer at a company you’ve probably heard of, where AI-this and AI-that get falcon punched into my face every day.
Whether in Teams chats, emails, meetings, tech blogs, tech forums, at church, at the barbershop, talking to friends and family, on dates (not even kidding), I can’t escape AI. It’s everything, everywhere, all at once. Which, to be fair, I’m not particularly mad about. I’m fascinated by AI and have grown to love working with it. It’s just that the sheer volume of its reach can be overwhelming.
At work and in my personal projects, my particular brand of AI exposure is more on the builder side of things. We’re a steadily rising group of people taking large language models and extending them, building things that use AI to enhance the domains we work in. This is different from what Machine Learning Engineers do. They’re the people creating and enhancing the models themselves, the brains behind the scenes.
For example, OpenAI alone has a whole zoo of models, including:
- GPT-4 (Mar 2023 flagship, yet soon eclipsed by multiple follow ups)
- GPT-4 Turbo (Nov 2023 release; “Turbo” label instead of a point number, so users wonder how it ranks among 4.x models)
- GPT-4o (May 2024 “omni” model; the letter o looks like zero and skips the decimal scheme)
- GPT-4o mini (Jul 2024 smaller tier; “mini” tag later collides with 4.1 mini and o-series minis)
- GPT-4.5 (Feb 2025 interim upgrade that is later retired in favor of 4.1, making the numbering feel backward)
- GPT-4.1 (Apr 2025 replacement for 4.5 even though the version number is lower)
- GPT-4.1 mini (Apr 2025 smaller tier; “mini” label overlaps with 4o-mini)
- o1 (Dec 2024 GA; first in the new “o” series meant to flag reasoning models, drops the “GPT-” prefix, and restarts numbering with a letter)
- o3 (Apr 2025 launch; skips “o2”, adding to the sequence puzzle)
- o4-mini (Apr 2025 release on the same day as o3 but given a higher number, blurring the hierarchy)
- o4-mini-high (Apr 2025 tier bump inside the “mini” family, leaving “high” undefined in other series)
An Aside: OpenAI’s naming is a mess. I blame their close relationship with Microsoft, who infamously struggle to name products clearly, as I’ve remarked on in the past.
And that’s just OpenAI. There are also plenty of models from companies like Anthropic, Google, DeepSeek, Mistral, Meta, and xAI.
Think of it this way: we’re both athletes, but in different sports. Maybe Machine Learning Engineers are playing baseball while we’re playing basketball. Both benefit from being athletic (technical skills), but they’re different sports requiring different tactics. Or maybe it’s more like chess vs. checkers. I don’t know, no analogy is perfect. The point is, they’re building the models, and we’re taking those models and building with them.
Some titles used to describe people extending AI models are AI Engineer (the emerging standard) and AI Builder (more ambiguous because Microsoft went and named a product this exact thing). Or maybe we’ll end up calling them Agent Bosses if the marketing teams get their way 🙄. This group typically includes people with job titles like software engineer, product manager, data analyst, or anything in between. Really, anyone building custom AI features for specific use cases is welcome.
Under the Hood
This is the part of the article where I assume you’re as inundated with AI stuff as I am. Particularly from the perspective of an AI Engineer/Builder, as that’s usually the kind of person looking for answers about MCP vs RAG. Someone asked me that exact question not more than a fortnight ago, to which my initial reaction was:

Which nearly led me to say “why don’t you ask ChatGPT?”, an impulse I didn’t act on, but am ashamed to admit I felt. It’s the modern version of Let Me Google That For You. I don’t think it’s wrong to use AI to help you understand AI, I do it all the time. But this person came to me, a human, hoping for an approachable explanation that would set them in the right direction. It’s kind of messed up to respond by going “ask AI”, so I set out to better understand how MCP and RAG fit together.
What Is RAG?
Retrieval-Augmented Generation (RAG) is an AI architecture pattern that adds a retrieval step before language model generation.
Here’s how RAG works:
- You ask a question.
- The system searches your pre-indexed knowledge bases (not the live internet, that’s different) for relevant documents.
- The most relevant documents are selected and combined with your question.
- The AI generates an answer using both its training and the retrieved information.
Example: You ask your company’s chatbot “How do I update my mailing address in our HR system?” RAG searches your HR docs, policy guides, and employee handbook, then gives you the exact steps for your company’s process instead of generic advice.
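To make those four steps concrete, here’s a sketch in Python. Everything in it is a stand-in: vector_search() fakes the knowledge-base lookup and call_llm() fakes the model API, but the shape of the pipeline (retrieve, combine, generate) is the real pattern.

```python
# The four RAG steps as a toy pipeline. vector_search() and call_llm()
# are hypothetical stand-ins for a vector database query and a
# chat-completion API call.
def vector_search(question: str, k: int = 3) -> list[str]:
    # Step 2: search the pre-indexed knowledge base (faked here).
    hr_docs = ["Address changes: submit form HR-102 in the employee portal.",
               "PTO policy: 20 days per year, accrued monthly."]
    return hr_docs[:k]  # a real system would rank these by vector similarity

def call_llm(prompt: str) -> str:
    # Step 4: a real system would send this prompt to a model API.
    return f"(model answers grounded in the prompt below)\n{prompt}"

def rag_answer(question: str) -> str:
    snippets = vector_search(question)                      # step 2: retrieve
    context = "\n".join(snippets)                           # step 3: combine
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)                                 # step 4: generate

print(rag_answer("How do I update my mailing address in our HR system?"))
```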
Key Point: RAG happens before the AI generates its response. Everything gets loaded into the context window (the model’s short-term memory: how much text it can keep in mind at once) upfront.
Important Distinction: When ChatGPT searches the internet, that’s not RAG. It’s using a search tool during the conversation, which is actually more like MCP. RAG specifically refers to searching through pre-indexed, curated knowledge sources.
What Is MCP?
If you read the many dozens of MCP articles online, they’ll say Model Context Protocol (MCP) is like USB-C for AI systems. Which is true in a sense, but there are no perfect analogies, so here’s another one.
It’s like having a personal assistant who can call anyone, order DoorDash, book an Uber, schedule your Real ID appointment, make dinner reservations, update your Instagram bio, or post on TikTok. All the things, they can just go and do them. Think of how life is for high-profile celebrities. Do you think Tom Cruise (the greatest action movie star of all time, don’t @ me about the Scientology stuff, I don’t care) books his own dinner reservations? Handles his own dry cleaning? Nah. Somebody does that for him.
MCP lets you take one AI system (like ChatGPT, Claude, Gemini, Copilot, or your company’s custom chatbot) and connect it to just about everything. Not just to get information like RAG (and here’s the key: MCP can use RAG as one of its tools), but also to do various other stuff and thangs.
Here’s how MCP works:
- The AI starts answering your question.
- It realizes it needs extra info or needs to do something (like look something up or take an action).
- It asks for help from a connected tool or service (like a database, API, or even a RAG system).
- The AI gets what it needs, then continues the conversation or does what you asked.
Example: You ask your company’s chatbot “How do I update my mailing address in our HR system?” The AI realizes it needs current policy info, makes an MCP request to the HR knowledge base, gets the data, and responds with your company’s exact process. That knowledge base could be using RAG (pre-indexed documents), or it could be a direct database query, or an API call. MCP doesn’t care, it just uses whatever tool is available. But here’s where MCP goes further: it can also ask “Would you like me to update your address to 123 Main Street right now?” and actually do it.
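For flavor, here’s roughly what the tool side of that example could look like as an MCP server, using the official Python SDK (pip install mcp). The server name, tool names, and the toy employee “database” are all made up for illustration; the decorator-plus-run pattern follows the SDK’s quickstart style.

```python
# A toy MCP server with one "know something" tool and one "do something"
# tool, mirroring the HR example above. All names here are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hr-assistant")
EMPLOYEES = {"alice": {"mailing_address": "456 Oak Ave"}}  # stand-in database

@mcp.tool()
def get_address_policy() -> str:
    """Describe the company's process for changing a mailing address."""
    return "Submit form HR-102 in the portal, or ask me to update it directly."

@mcp.tool()
def update_mailing_address(employee: str, new_address: str) -> str:
    """Actually change an employee's mailing address (the action part)."""
    EMPLOYEES.setdefault(employee, {})["mailing_address"] = new_address
    return f"Updated {employee}'s mailing address to {new_address}."

if __name__ == "__main__":
    mcp.run()  # any MCP-capable client can now discover and call these tools
```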
Key Point: MCP happens during the conversation. It’s about standardized connections to tools and data sources, not just information retrieval.
What Makes It Powerful: While RAG searches and retrieves from pre-indexed knowledge bases, MCP can do that and much more. Need to know something? MCP can connect to any data source, whether that’s a RAG system for document search, a database for live data, or an API for real-time information. Need something done? MCP can execute it. Update databases, send emails, create tickets, post content, run calculations. Whatever tools you connect, MCP can use them. It’s a universal protocol that lets AI interact with systems, not just read from them.
MCP vs RAG Cheat Sheet
| Aspect | RAG | MCP |
| --- | --- | --- |
| When it acts | Before generation | During generation |
| What it does | Retrieves from pre-indexed knowledge | Connects to any system/tool |
| Best for | Searching your custom documents | Taking actions + getting live data (APIs, databases, web) |
| Implementation | Specific to each system | Universal protocol |
Most importantly, they’re not competing. RAG can be implemented as an MCP tool (see the sketch after this list). When comparing the two, there are some practical considerations.
- Cost: RAG front-loads everything into the context window (expensive). MCP pulls only what’s needed (efficient).
- Simplicity: Need to search documents? Use RAG. Need to take actions? Use MCP. Need both? MCP can orchestrate RAG tools alongside other capabilities.
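And because “RAG as an MCP tool” is the punchline, here’s the smallest sketch of that idea, again using the Python SDK with a faked retrieval step. The point is that document search becomes just another tool behind the same standardized interface:

```python
# RAG wrapped as an MCP tool: retrieval becomes one more tool the AI can
# call mid-conversation. The canned snippet stands in for a real
# embed-and-search against a vector database.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag-tool")

@mcp.tool()
def search_documents(query: str) -> str:
    """Search the pre-indexed knowledge base for snippets relevant to query."""
    # A real implementation would embed `query` and rank stored vectors.
    return "Address changes: submit form HR-102 in the employee portal."

if __name__ == "__main__":
    mcp.run()
```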
To summarize:
- RAG = Pre-indexed knowledge retrieval
- MCP = Universal protocol for AI to use any tool (including RAG)
- Together = The foundation of Agentic AI
The pace of change in this space is wild, but now you understand the fundamentals. Time to talk about why we might be cooked as a society.
Isn’t This All Kind Of Dangerous?

Now, if you’re like me, you might have realized something about MCP. We just gave AI the ability to not just talk about things, but actually do them. With MCP powering Agentic AI, you might be thinking “wait… isn’t that how AI destroys the world and ends humanity in like every science fiction property ever?”
Yes, yes it is.
The latest Mission Impossible film is literally about this. An AI goes rogue and the fact that it can take action (from editing bank accounts to launching nuclear weapons) is the premise of the entire film. So yes, this is potentially dangerous, and yes, tech-savvy people are raising the alarm, but no, there’s not much you and I can do to stop it. The powers that be have put gargantuan amounts of money behind this stuff and it’s not slowing down any time soon. For example, Google just announced an Agent2Agent Protocol that amounts to letting AIs talk to each other with almost zero human babysitting, trading data at a pace no one could ever fully sift through.
So if you’re feeling like we’re moving too fast, you’re not alone. The domain you’ll want to spend time reading up on is called Responsible AI. To get started, read these articles. They’ll help relieve any “you’re overreacting, we’re fine” vibes you might’ve been getting from friends, family, or overly zealous tech bros. You’re not crazy or alarmist. What we’re doing with the leap to Agentic AI is both fascinating and very concerning.
- Anthropic - Agentic Misalignment: How LLMs Could Be Insider Threats
- MIT Technology Review - Why Handing Total Control to AI Agents Would Be a Huge Mistake
- Harvard Business Review - Organizations Aren’t Ready for the Risks of Agentic AI
Have experience with RAG or MCP? Thoughts on where Agentic AI is headed? Feel free to message me on LinkedIn to continue the conversation!
Tools & Resources
Some stuff to get you started down the AI Engineer/Builder path of making cool things using RAG and MCP.
RAG
- LangChain: The go-to Python framework for LLM apps. It glues models, data, and tools together, giving you a full agent or RAG pipeline in one place.
- LlamaIndex: Purpose-built for retrieval. It chunks, indexes, and serves your data so the model pulls only what it needs. Smaller scope than LangChain because it focuses on RAG, not every AI task under the sun. (A quickstart sketch follows this list.)
- Azure AI Search: Microsoft’s enterprise RAG tool on Azure. Point it at SharePoint, Blob Storage, Cosmos DB, or SQL, let it index everything, and feed the hits straight into your model. Built for companies swimming in data.
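If you want to feel the RAG side in a few lines, LlamaIndex’s quickstart is about as short as it gets. This sketch assumes pip install llama-index, an OpenAI API key in your environment (its defaults use OpenAI models), and a data folder containing your files:

```python
# LlamaIndex quickstart-style sketch: load files, build a vector index,
# then ask questions grounded in them.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # read your files
index = VectorStoreIndex.from_documents(documents)     # chunk, embed, index
query_engine = index.as_query_engine()                 # retrieval + generation

print(query_engine.query("How do I update my mailing address?"))
```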
MCP
- Anthropic MCP Docs: The official MCP docs straight from the creators. Clear examples, quick-start scripts, and everything you need to stand up a server.
- Microsoft MCP for Beginners: Step-by-step labs and sample code that walk you from “What is MCP?” to “My model just hit an external API.” I highly recommend this, Microsoft did really well here.
Both
- Microsoft AI Agents for Beginners: Full projects, step-by-step tutorials, and clear guidance on everything Agentic AI, including RAG, MCP, multi-agent systems, and more.
- LangGraph: A Python framework for building stateful, branching agents on top of LangChain. This is what you use when you want to build truly complex, multi-step agent systems, not just add a chatbot to your site.
- AutoGen: Microsoft’s open-source toolkit for building groups of AIs that can plan, chat, and take action together. Similar to LangGraph, but not tied to LangChain. Use this when you want advanced, multi-agent systems that can plug in RAG, MCP, or other tools wherever you need them.