A hands-on journey through MCP and AI tool calling — from zero to building your own integrations.
Why LLMs need tools — the isolation problem and the breakthrough of tool calling
The universal 5-step pattern, JSON Schema definitions, and the agentic loop
Tool calling across OpenAI, Anthropic Claude, and Google Gemini — key differences
The protocol that changes everything — origin, governance, and N+M architecture
Resources, Tools, Prompts, Sampling, Roots, and Elicitation — the six building blocks
Stdio vs. Streamable HTTP, OAuth 2.1 with PKCE, and the Nov 2025 spec updates
500+ servers, enterprise adoption, and how MCP compares to function calling and REST
From learner to builder — write a Python MCP server and connect it to Claude Desktop
Before we can understand tool calling and MCP, we need to understand the fundamental limitation that made them necessary — and why solving it changed everything.
Large Language Models are trained on enormous datasets of text — books, code, articles, conversations. Through this training they develop remarkable capabilities: reasoning, writing, analysis, translation, and more. But there's a catch that defines everything: training ends.
When training concludes, a model's knowledge is frozen. It knows about the world as it existed up to its knowledge cutoff date. Ask it about last week's stock prices? It can't help. Ask it to check if a website is currently down? It has no way to know. Ask it to send an email on your behalf? It cannot reach outside its own context window.
This is the isolation problem. A base LLM is like a brilliant analyst locked in a room with only historical documents. They can reason with extraordinary sophistication about everything they've read — but they can't pick up a phone, check today's news, or modify a spreadsheet in real time.
Early workarounds — prompt engineering, retrieval-augmented generation — helped at the margins, but they were clever patches over a deeper architectural limitation.
Think of a base LLM like a skilled analyst who can reason brilliantly but can't pick up a phone. They have deep knowledge from their training, but they're isolated from live systems. Tool calling gives them the phone — and eventually, an entire office of connected services.
Developers tried various approaches to work around the isolation problem. Prompt engineering involved carefully crafting instructions to coax better outputs from static knowledge. Users would paste in current data, ask for analysis, and work around the boundaries manually.
Retrieval-Augmented Generation (RAG) was a bigger step forward. It built a retrieval pipeline that could search a vector database or document store and inject relevant chunks into the model's context before asking a question. This solved the "fresh knowledge" problem partially — but only for reading, never for writing or acting. A RAG system could tell you current documentation but couldn't file a ticket, update a database, or query a live API.
These solutions had a fundamental ceiling: they were still just feeding more text to a model that could only output text.
The insight that changed everything was subtle but profound. Instead of trying to give models all the information they might need, what if models could request the actions themselves?
Rather than knowing current weather, a model could say: "I need to call the weather API with location 'Seoul' to answer this." Rather than guessing at a database value, it could say: "I need to query the customers table for ID 42." The model doesn't execute these actions — it can't. But it can articulate them precisely enough that your application code can do the executing.
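Concretely, that articulation is structured data rather than prose. A hedged sketch of the handoff (field names vary by provider, and `get_weather` here is a stand-in for a real API client):

```python
import json

# Illustrative shape of a model-emitted tool call (exact field names vary by provider).
tool_call = {
    "name": "get_weather",
    "arguments": {"location": "Seoul"},
}

# Your application code — not the model — performs the actual execution:
def get_weather(location):
    # Stand-in for a real weather API call.
    return {"location": location, "temp_c": 21}

result = get_weather(**tool_call["arguments"])
print(json.dumps(result))  # {"location": "Seoul", "temp_c": 21}
```

The model emits the structured request; the application owns the side effects.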
This is tool calling. And it parallels a deep principle in software engineering: do one thing well, then compose. Just as Unix philosophy builds powerful systems from small, focused utilities chained together, tool calling builds capable AI systems from focused functions that the model can compose on demand.
The Unix philosophy says: do one thing well, compose with pipes. Tool calling applies this to AI: each tool does one thing (get weather, send email, query database), and the model composes them into complex behaviors. The result is far more powerful and maintainable than trying to encode all knowledge directly.
**Old world:** the model can only reason over its frozen training data. **New world:** the model composes live tools on demand, requesting actions that your code executes.
The universal pattern behind tool calling is consistent across providers. Master these five steps and you understand the fundamentals of every AI tool integration.
Every tool calling implementation — regardless of provider — follows the same logical sequence. Understanding this flow deeply is the foundation of everything that follows.
A tool definition is a contract between you and the model. It tells the model: here's a function, here's what it does, here's what arguments it takes, and here's what's required. The quality of your descriptions directly impacts the model's ability to use tools correctly.
Every tool definition has three critical parts: the name (how the model refers to the tool), the description (the most important field — it tells the model when and why to use this tool), and the parameters (a JSON Schema object specifying inputs, their types, and which are required).
```python
# Tool Definition (JSON Schema)
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name, e.g. 'Seoul'",
            },
        },
        "required": ["location"],
    },
}]
```
Models don't always call tools — you can configure when they're allowed to. The `tool_choice` parameter controls this behavior: `auto` (the default — the model decides), `required` (the model must call at least one tool), `none` (tool calling is disabled), or a forced call to one specific named function.
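A hedged sketch of how these values appear in an OpenAI-style request body (`make_request` is a hypothetical helper, not part of any SDK):

```python
# Build request payloads illustrating the different tool_choice settings.
def make_request(user_text, tools, tool_choice="auto"):
    return {
        "model": "gpt-4.1",
        "input": [{"role": "user", "content": user_text}],
        "tools": tools,
        "tool_choice": tool_choice,
    }

weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

auto = make_request("Weather in Seoul?", [weather_tool])                  # model decides
required = make_request("Weather in Seoul?", [weather_tool], "required")  # must call a tool
disabled = make_request("Summarize this text.", [weather_tool], "none")   # tools disabled
forced = make_request("Weather in Seoul?", [weather_tool],
                      {"type": "function", "name": "get_weather"})        # force one function
```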
When a model calls multiple tools in sequence, you get the agentic loop. While the model's stop reason indicates another tool call is needed, your application keeps executing tools and feeding results back. This loop is how complex multi-step AI agents are built.
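The loop itself is only a few lines of control flow. A minimal runnable sketch with a mocked model (a real loop would send `messages` to a provider API and read its stop reason; the field names here are illustrative):

```python
import json

# Minimal agentic-loop sketch. `call_model` mocks a chat API: it requests one
# tool call, then finishes once a tool result appears in the conversation.
def call_model(messages):
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_call",
                "tool_call": {"name": "get_weather",
                              "arguments": {"location": "Seoul"}}}
    return {"stop_reason": "end_turn", "content": "It's 21°C in Seoul."}

TOOLS = {"get_weather": lambda location: {"location": location, "temp_c": 21}}

def run_agent(user_text):
    messages = [{"role": "user", "content": user_text}]
    while True:
        response = call_model(messages)
        if response["stop_reason"] != "tool_call":
            return response["content"]  # loop ends when no more tools are needed
        call = response["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # your code executes the tool
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})   # feed the result back

print(run_agent("Weather in Seoul?"))  # It's 21°C in Seoul.
```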
OpenAI, Anthropic, and Google all implement tool calling — but with meaningful differences in architecture, naming conventions, and unique capabilities. Here's what you need to know about each.
OpenAI's function calling (now called tool calling) is the most mature and widely referenced implementation. It introduced the pattern that others would follow and refine.
Key features: parallel function calls (the model can request multiple tools simultaneously), strict mode (strict: true enforces exact schema conformance), and flexible tool_choice options (auto, required, none, or a forced specific function). The latest models including GPT-5.4 also support tool_search for dynamic tool discovery at runtime.
Claude's tool implementation is architecturally richer because it distinguishes between who executes the tool. Claude has three distinct categories:
- **Client tools (custom)** — tools you define yourself with an `input_schema` and execute in your own code, following the universal pattern.
- **Client tools (Anthropic-defined)** — `bash` and `text_editor` have schemas that are trained into the model itself for higher reliability. You still execute them locally — but you use the pre-defined schemas.
- **Server tools** — `web_search`, `code_execution`, `web_fetch`, and `tool_search` run on Anthropic's infrastructure. You don't execute them; Anthropic does. They can even run their own internal loops (e.g., multiple web searches before returning a result).

Claude signals tool calls via `stop_reason: "tool_use"` and completion via `stop_reason: "end_turn"`. The `strict: true` option is also supported for schema conformance.
Gemini uses function_declarations inside a tools array — the flow mirrors the universal pattern with provider-specific naming. Multi-tool use allows combining built-in capabilities (like Google Search grounding) with custom function calling in the same request. Gemini 3 Flash Preview is among the latest available models.
| Feature | OpenAI | Claude | Gemini |
|---|---|---|---|
| Tool Format | functions in `tools[]` | `tools[]` with `input_schema` | `functionDeclarations` |
| Parallel Calls | Yes | Yes | Yes |
| Strict Schema | `strict: true` | `strict: true` | — |
| Server-side Tools | — | web_search, code_execution, web_fetch, tool_search | Google Search grounding |
| Dynamic Discovery | `tool_search` (GPT-5.4+) | `tool_search` (server) | — |
| Stop Signal | `finish_reason: "tool_calls"` | `stop_reason: "tool_use"` | `finishReason: "STOP"` + `functionCall` |
```python
# OpenAI tool calling
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    }],
    input=[{"role": "user", "content": "Weather in Seoul?"}],
)
# Check response.output for tool_call items
```
```python
# Claude tool calling (client tool)
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=[{
        "name": "get_weather",
        "description": "Get current weather",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
            },
            "required": ["location"],
        },
    }],
    messages=[{"role": "user", "content": "Weather in Seoul?"}],
)
# response.stop_reason == "tool_use" signals a tool call
# response.stop_reason == "end_turn" signals completion
```
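For comparison, a Gemini-style declaration uses the same JSON Schema shape under different field names. This sketch only builds the request structure; the client call is shown in comments and hedged, since SDK details change between versions:

```python
# Gemini-style tool declaration: same JSON Schema idea, different field names.
get_weather_declaration = {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

# Declarations are wrapped in a tools array under "function_declarations".
gemini_tools = [{"function_declarations": [get_weather_declaration]}]

# Illustrative client usage (exact SDK calls may differ between versions):
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash", tools=gemini_tools)
# response = model.generate_content("Weather in Seoul?")
# # Look for a function_call part in the response candidates.
```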
Tool calling solved the isolation problem for a single AI app. But what about building a world of interconnected AI applications and tools? That requires a protocol — the Model Context Protocol.
Imagine you're building an AI-powered development environment. You want it to access GitHub, query Postgres, search documentation, run terminal commands, and check Slack. That's 5 tools. Now imagine 20 different AI applications all needing those same 5 tools — plus each needing 5 other unique ones. You're looking at potentially hundreds of custom integrations, each built differently, each maintained separately, each breaking in its own unique way.
This is the N×M problem: N AI applications times M tools equals N×M custom integrations. At any real scale, this becomes unmanageable. Teams spend more time writing glue code than building actual features.
- **Without MCP:** 5 AI apps × 10 tools = **50** custom integrations
- **With MCP:** 5 AI apps + 10 tools = **15** integrations (N+M)
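The arithmetic behind those numbers is simple but compounds quickly:

```python
# Integration counts with and without a shared protocol.
def integrations_without_mcp(n_apps, m_tools):
    return n_apps * m_tools   # every app wires up every tool itself

def integrations_with_mcp(n_apps, m_tools):
    return n_apps + m_tools   # each app builds one client; each tool, one server

print(integrations_without_mcp(5, 10), integrations_with_mcp(5, 10))    # 50 15
print(integrations_without_mcp(20, 25), integrations_with_mcp(20, 25))  # 500 45
```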
MCP was created by Anthropic and open-sourced in November 2024. From the beginning it was designed as an open standard — not a proprietary Anthropic technology. The spec was developed to be provider-agnostic and community-driven.
In December 2025, at its one-year anniversary, MCP was donated to the Linux Foundation under the Agentic AI Foundation (AAIF), co-founded by Anthropic, Block, and OpenAI. This move cemented MCP's position as a true industry standard, not controlled by any single vendor.
By early 2026, over 500 public MCP servers exist in the wild, with support from all three major AI providers: Anthropic, OpenAI, and Google DeepMind. Every major AI IDE has embedded MCP client support.
MCP is to AI tools what USB is to peripherals — a universal connector. Before USB, every peripheral needed its own port, driver, and protocol. USB created a standard that worked everywhere. MCP does the same for AI integrations: one protocol, every tool, every application.
MCP defines three roles in its architecture. Understanding these roles is essential to understanding how MCP works:
All communication uses JSON-RPC 2.0 as the wire format, which is simple, well-understood, and language-agnostic.
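For example, a `tools/call` request and its response are plain JSON-RPC 2.0 messages correlated by `id` (the method and parameter names follow the MCP spec; the weather result is illustrative):

```python
import json

# What a tools/call exchange looks like on the wire.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"location": "Seoul"}},
}
wire_bytes = json.dumps(request)

# A success response reuses the same id:
response = {
    "jsonrpc": "2.0",
    "id": 7,
    "result": {"content": [{"type": "text", "text": "21°C, clear"}]},
}
assert json.loads(wire_bytes)["id"] == response["id"]  # responses correlate by id
```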
MCP was directly inspired by the Language Server Protocol (LSP) — the standard that transformed how IDEs provide language intelligence. What LSP did for programming languages and text editors, MCP does for AI applications and tools.
MCP defines six core primitives — three offered by servers and three offered by clients. Understanding each one and when to use it is the key to designing effective MCP integrations.
Resources are READ. Tools are DO. Prompts are GUIDE. And on the client side: Sampling lets servers think, Roots tell servers where to operate, and Elicitation lets servers ask for input.
These three primitives are offered by MCP servers and consumed by hosts/clients.
Resources expose data to the AI model, similar to how an HTTP GET endpoint works. They're identified by URIs and can be either static (a fixed file) or dynamic (a database query that runs each time). Resources can be consumed by either the user or the AI model, depending on the application design.
Think of resources as the "read" side of your data: files on disk, database records, API responses cached as documents, configuration data, or documentation. A resources request never has side effects — it's always safe to call.
Examples: file:///project/README.md, postgres://db/customers/42, github://repo/anthropics/mcp/issues
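Reading a resource is a URI-keyed JSON-RPC call. A sketch of a `resources/read` exchange (the method name and `uri`/`contents` fields follow the MCP spec; the file contents here are illustrative):

```python
# Shape of a resources/read request and its response.
read_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "resources/read",
    "params": {"uri": "file:///project/README.md"},
}

read_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "contents": [{
            "uri": "file:///project/README.md",
            "mimeType": "text/markdown",
            "text": "# My Project\n",
        }],
    },
}

assert read_response["id"] == read_request["id"]  # no side effects, safe to repeat
```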
Tools are functions the AI model can call to take action in the world. Unlike resources, tools can modify state — they can send emails, create files, commit code, trigger deployments, or query live APIs that consume rate limit budget. The model decides when to call tools (model-controlled).
Tools are the action layer. Every tool has a name, a description (critical for model selection), and an input schema. The server executes the tool and returns a result.
Examples: create_github_issue, run_sql_query, send_slack_message, deploy_to_kubernetes
Prompts are reusable workflow templates that users can invoke. Unlike tools (which the model triggers autonomously), prompts are user-controlled — a user selects a prompt from a menu. They accept dynamic arguments, can include resource context (pulling in relevant data), and can chain multiple actions into a workflow.
Think of prompts as "smart shortcuts" that encode expert workflows. A debug_error prompt might automatically gather relevant logs, context, and file snippets before presenting them to the model for analysis.
| Dimension | Resources | Tools |
|---|---|---|
| Access Pattern | Read-only (like HTTP GET) | Read/Write (side effects allowed) |
| Control | User or model controlled | Model controlled |
| Identification | URI-based | Function name + schema |
| Safety | Always safe to call | May have irreversible effects |
| Use Case | Files, records, documents, configs | Send email, deploy code, modify DB |
These three primitives are offered by MCP clients and allow servers to reach back into the host application's capabilities.
Sampling is one of MCP's most powerful and unique features. It allows an MCP server to request that the host perform an LLM inference on its behalf. This enables recursive, agentic behaviors entirely on the server side.
Imagine a code analysis server that, upon receiving a tool call, wants to reason about several possible approaches before choosing one. With sampling, it can ask the host's LLM: "Given this code, what refactoring approach is most appropriate?" The host runs the inference and returns the result to the server — which then continues executing. Users must approve sampling requests for security.
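On the wire, that server-to-host request is itself JSON-RPC. A hedged sketch of a `sampling/createMessage` request (field names follow the spec's general shape; consult the spec for exact details):

```python
# Server-initiated sampling: the server asks the host to run an LLM inference.
sampling_request = {
    "jsonrpc": "2.0",
    "id": 11,
    "method": "sampling/createMessage",
    "params": {
        "messages": [{
            "role": "user",
            "content": {
                "type": "text",
                "text": "Given this code, which refactoring approach fits best?",
            },
        }],
        "maxTokens": 256,
    },
}
# The host runs the inference (after user approval) and returns the completion
# to the server in the JSON-RPC response.
```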
Roots allow servers to ask clients: "What directories or URIs should I be operating within?" The client reports the root directories it has access to or that are relevant to the current session. This lets servers properly scope their file system operations and URI access, preventing them from wandering outside their intended domain.
Elicitation allows a server to request additional information from the user mid-workflow. If a server reaches a decision point that requires human input — a confirmation, a clarification, a preference — it can ask through the client without aborting the entire operation. This enables interactive, human-in-the-loop workflows within MCP.
How do hosts and servers actually communicate? What security guarantees does MCP provide? And what changed in the massive November 2025 spec update that made MCP enterprise-ready?
MCP defines two official transport mechanisms. Both use JSON-RPC 2.0 as the wire format — what changes is how bytes flow between host and server.
The simplest transport: the host launches the server as a subprocess and communicates via stdin/stdout. The host writes JSON-RPC messages to the server's stdin; the server writes responses to its stdout.
Stdio is ideal for: local development, CLI tools, shell scripts, and any tool that runs as a local process. It requires no network, no ports, no authentication setup — the process isolation provides the security boundary. Most MCP servers for IDEs use Stdio.
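The framing is simple enough to demonstrate end to end: one JSON-RPC message per line over stdin/stdout. The tiny inline "server" below just echoes a canned result — a real MCP server implements the full protocol — but the plumbing is the same:

```python
# Illustrative stdio framing: the host writes newline-delimited JSON-RPC to the
# server's stdin and reads responses from its stdout.
import json
import subprocess
import sys

SERVER_CODE = """
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"jsonrpc": "2.0", "id": req["id"], "result": {"echo": req["method"]}}
    print(json.dumps(resp), flush=True)
"""

proc = subprocess.Popen(
    [sys.executable, "-c", SERVER_CODE],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
proc.stdin.write(json.dumps(request) + "\n")
proc.stdin.flush()
response = json.loads(proc.stdout.readline())
print(response["result"])  # {'echo': 'tools/list'}
proc.terminate()
```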
For remote servers, MCP uses HTTP POST for client-to-server messages, with optional Server-Sent Events (SSE) for streaming responses back to the client. This enables real-time streaming of long operations while keeping the standard HTTP infrastructure.
Streamable HTTP is ideal for: remote services, cloud deployments, shared team servers, and enterprise infrastructure. It requires proper authentication (see below) but works through standard firewalls and proxies.
MCP's security model is built on four core principles that govern how servers, clients, and hosts interact:
For remote connections, MCP uses OAuth 2.1 with PKCE (Proof Key for Code Exchange), added in June 2025. MCP servers are classified as OAuth Resource Servers and implement Resource Indicators (RFC 8707) to prevent token theft and ensure tokens are only usable with their intended server.
The November 2025 specification update — released on MCP's one-year anniversary — transforms MCP from a developer tool into an enterprise-ready platform. It adds async execution, machine-to-machine auth, enterprise SSO integration, and more. This is the update that unlocked production deployments at scale.
Six major extensions landed in the one-year anniversary update, each addressing a real-world production need:
Previously, all MCP requests were synchronous — the client had to wait for a response. The Async Tasks extension allows any request to immediately return a task handle with states: working, input_required, completed, failed, and cancelled. The client can poll or be notified when the task completes. "Call now, fetch later" — essential for long-running operations like building a codebase or running a test suite.
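The client-side pattern is a poll (or notification wait) until a terminal state. A hypothetical sketch — `get_task` here is a fake in-memory task store, not the actual SDK API:

```python
import itertools

# Fake task store: reports "working" twice, then "completed" forever.
_states = itertools.chain(["working", "working"], itertools.repeat("completed"))

def get_task(task_id):
    return {"taskId": task_id, "status": next(_states)}

def wait_for_task(task_id):
    # A real client would poll with backoff or subscribe to notifications.
    while True:
        task = get_task(task_id)
        if task["status"] in ("completed", "failed", "cancelled"):
            return task

final = wait_for_task("task-123")
print(final["status"])  # completed
```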
OAuth traditionally requires clients to register with each resource server before use — a major friction point for MCP at scale. CIMD replaces this with URL-based client identity. A client publishes a metadata document at a well-known URL. Servers can fetch this document to learn about the client without requiring manual per-server registration. This makes deploying new MCP clients dramatically simpler.
A formal system for adding optional capabilities to the protocol. The framework defines: a lightweight registry/namespace for extensions, a capability negotiation mechanism (clients and servers agree on supported extensions), and extension settings. This is how future MCP features will be added without breaking backward compatibility.
Two new authorization schemes address enterprise use cases. The first is the client_credentials grant for machine-to-machine auth: designed for cron jobs, headless automation, CI/CD pipelines, and any agent running without a human user present — no interactive login required.

For sensitive flows — collecting API credentials, third-party OAuth, payment processing — directing users through an MCP client is inappropriate. URL-mode Elicitation allows servers to redirect users to a browser URL for the sensitive step. The browser handles it securely (with HTTPS, proper redirect flows, etc.) and the result is returned to the MCP flow. Credentials never pass through the MCP client.
Extends the Sampling primitive so that server-initiated LLM requests can include tool definitions. This means servers can run their own complete agent loops — sampling to reason, calling tools to act, sampling again to analyze results — without needing to surface each step to the host. Enables sophisticated server-side autonomous behaviors.
MCP has grown from a protocol spec into a thriving ecosystem. Understanding the landscape — what exists, who's building it, and when to use it — is essential for any production implementation.
As of early 2026, over 500 public MCP servers exist across the ecosystem. The GitHub repository best-of-mcp-servers tracks over 370 ranked servers with a combined 380,000+ GitHub stars.
Enterprise adoption has moved beyond early adopters. Microsoft, AWS, and HashiCorp are actively developing and maintaining MCP servers for their platforms. The pattern has become standard in every major AI IDE: Cursor, VS Code (with GitHub Copilot), Claude Code, Zed, and Windsurf all embed MCP clients natively — meaning any MCP server you build works automatically in all of them.
A common source of confusion is how MCP relates to function calling and REST APIs. The key insight: they operate at different layers and are designed to be complementary, not competitive.
| Aspect | REST APIs | Function Calling | MCP |
|---|---|---|---|
| Layer | Transport (HTTP) | Model capability | Application protocol (JSON-RPC) |
| Discovery | Manual (read docs) | Static (per-request schemas) | Dynamic (runtime) |
| State | Stateless | Stateless | Stateful (sessions) |
| Lock-in | None | Provider-specific | None (open standard) |
| Scale | N custom integrations | Context tax (all tools every request) | On-demand, server-side |
| Auth | Per-API | Your code handles | Credential isolation at server |
MCP servers often call REST APIs internally. Function calling can invoke MCP tools. Most production systems use all three: REST for direct API access, function calling for immediate tool integration, and MCP for scalable, persistent tool ecosystems. Choosing one doesn't mean abandoning the others.
You've learned the theory. Now it's time to build. In this module, you'll write a complete MCP server in Python, configure it for Claude Desktop, and learn the practices that separate production-quality servers from toy examples.
Anthropic maintains the official Python SDK for MCP at modelcontextprotocol/python-sdk. It handles all the protocol plumbing — JSON-RPC encoding, capability negotiation, transport management — so you can focus on your tools' business logic.
Install it with: pip install mcp
The SDK's decorator-based API makes defining tools feel natural. The @app.tool() decorator automatically reads your function's docstring as the tool description and infers the schema from type annotations.
```python
from mcp.server.fastmcp import FastMCP

app = FastMCP("my-first-server")

@app.tool()
def get_stock_price(symbol: str) -> str:
    """Get the current stock price for a given ticker symbol."""
    # In production, call a real API
    prices = {"AAPL": 198.50, "GOOGL": 178.25, "MSFT": 425.00}
    price = prices.get(symbol.upper())
    if price is not None:
        return f"{symbol.upper()}: ${price:.2f}"
    return f"Unknown symbol: {symbol}"

if __name__ == "__main__":
    app.run()  # defaults to the stdio transport
```
Claude Desktop reads a JSON configuration file to discover MCP servers. On macOS it lives at ~/Library/Application Support/Claude/claude_desktop_config.json. On Windows it's at %APPDATA%\Claude\claude_desktop_config.json.
Add your server to the mcpServers object with a name, the command to run it, and any arguments. After saving and restarting Claude Desktop, your server's tools appear automatically in every conversation.
```json
{
  "mcpServers": {
    "stock-prices": {
      "command": "python",
      "args": ["path/to/server.py"]
    }
  }
}
```
Start simple. Build one server with one tool. Get it working in Claude Desktop. Test it thoroughly. Then expand. The biggest mistake new MCP developers make is building 10 tools before validating that the first one works correctly end-to-end.
The MCP ecosystem is evolving rapidly. Discovery registries will make it easy to find and install public servers. Enterprise governance tools will provide audit trails and access controls for organizational deployments. The expanding ecosystem means that the tools you connect today will work across every MCP-compatible host as the ecosystem grows.
You're not just learning a protocol — you're positioning yourself at the frontier of how AI systems will interact with the world. Every MCP server you build is a piece of that infrastructure.