Skip to main content

Technical Blog

MCP in Production: The Practical Guide to Model Context Protocol for AI Agents That Actually Ship

mcpagentsarchitecturellmenterprisesecuritypython

MCP is the protocol every AI agent now speaks. But shipping it in production — with auth, sandboxing, and tool governance — is where most teams stall. A hands-on architecture guide with Python, security patterns, and the cost math that makes it work for B2B.

Every agent now speaks MCP. Most teams still cannot ship it.

The Model Context Protocol has had the fastest adoption curve of any AI infrastructure standard in history. Anthropic released the spec in late 2024. By early 2025, OpenAI, Google, and Microsoft had announced support. In March 2026, the protocol moved to the Linux Foundation, making it vendor-neutral and community-governed. Today, in May 2026, MCP is the de facto standard for connecting AI agents to external tools, databases, and APIs.

And yet, if you look at production deployments — not demos, not blog posts, not hackathon projects — the adoption story is much thinner. Most enterprise teams I talk to are stuck in one of three places:

  1. They have a working MCP server in staging that connects to one tool, with no auth, no rate limiting, and no governance. It works. It will never ship.
  2. They have multiple MCP servers from different vendors, with no consistent security model, no unified observability, and no way to reason about what tools an agent can access across contexts.
  3. They skipped MCP entirely and are still wiring tools via bespoke function-calling schemas, one integration at a time, accumulating N×M technical debt.

This post is for teams in all three positions. It covers what MCP actually solves, how to build a production-grade MCP layer, the security model that enterprises require, and the cost structure that makes the investment defensible.

What MCP actually solves — and what it does not

MCP is a JSON-RPC 2.0 protocol that standardizes how an AI client (the agent, the IDE, the copilot) discovers and invokes tools exposed by an MCP server. The server declares its capabilities — tools, resources, prompts — and the client calls them through a typed, schema-validated interface.

The protocol solves the N×M integration problem. Without MCP, connecting 5 AI clients to 10 tools requires 50 bespoke integrations. With MCP, each tool is exposed once as an MCP server, each client speaks MCP, and any client can invoke any server. This is the USB-C analogy that every MCP explainer uses, and it is accurate.

What MCP does not solve:

  • Authentication and authorization. The spec defines transport (stdio, HTTP+SSE, and the newer Streamable HTTP) but delegates auth to the implementation. In practice, this means every MCP server you deploy needs its own auth story, and most open-source MCP servers ship with none.
  • Tool governance. Which agent, in which context, is allowed to call which tool with which parameters? MCP gives you the wire protocol. The policy layer is your problem.
  • Observability. MCP does not define tracing, logging, or metrics. You get tool calls in and results out. What happens in between — latency, error rates, cost per call, data lineage — is infrastructure you must build or buy.
  • Security sandboxing. An MCP server that wraps a shell command, a database query, or a file-system operation is an attack surface. The spec does not enforce sandboxing. Your deployment must.

The gap between "MCP works in my terminal" and "MCP is in production serving 10,000 users" is entirely in these four areas.

The architecture: MCP gateway pattern

The pattern that works in production — and that I have deployed across multiple B2B engagements — is an MCP gateway. Instead of letting each AI client connect directly to each MCP server, you interpose a gateway that handles auth, policy, observability, and sandboxing in one place.

┌─────────────┐     ┌─────────────────────────────────────────┐     ┌─────────────┐
│  AI Client  │     │            MCP Gateway                  │     │ MCP Server: │
│  (Agent,    │────▶│  ┌─────────┐ ┌──────────┐ ┌──────────┐ │────▶│  Database    │
│   Copilot,  │     │  │  Auth   │ │  Policy  │ │  Trace   │ │     └─────────────┘
│   IDE)      │     │  │  Layer  │ │  Engine  │ │  Collector│ │     ┌─────────────┐
└─────────────┘     │  └─────────┘ └──────────┘ └──────────┘ │────▶│ MCP Server: │
                    │                                         │     │  File System │
┌─────────────┐     │  ┌─────────┐ ┌──────────┐              │     └─────────────┘
│  AI Client  │────▶│  │  Rate   │ │  Sandbox │              │     ┌─────────────┐
│  (Agent 2)  │     │  │  Limiter│ │  Manager │              │────▶│ MCP Server: │
└─────────────┘     │  └─────────┘ └──────────┘              │     │  API Wrapper │
                    └─────────────────────────────────────────┘     └─────────────┘

The gateway is not a proxy in the HTTP sense. It is an MCP server itself that dynamically composes the tool manifests of its backend servers and exposes them to clients as a single, governed surface. The client sees one MCP endpoint. The gateway fans out to N servers behind the scenes.

Implementation: a production MCP gateway in Python

The following is a working gateway skeleton. It is not a library — it is the shape of the thing you should have before you wire a single agent to a production tool.

The core gateway

import asyncio
import json
import time
import logging
from dataclasses import dataclass, field
from typing import Any, Callable
 
logger = logging.getLogger("mcp_gateway")
 
 
@dataclass
class ToolManifest:
    """A tool exposed through the gateway, with policy metadata."""
    name: str
    description: str
    input_schema: dict
    server_id: str
    risk_level: str = "low"        # low | medium | high | critical
    requires_approval: bool = False
    max_calls_per_minute: int = 60
    allowed_roles: list[str] = field(default_factory=lambda: ["*"])
 
 
@dataclass
class MCPRequest:
    """Incoming tool call from an AI client."""
    tool_name: str
    arguments: dict[str, Any]
    caller_id: str
    caller_role: str
    session_id: str
    timestamp: float = field(default_factory=time.time)
 
 
@dataclass
class MCPResponse:
    """Outgoing result to the AI client."""
    tool_name: str
    result: Any
    latency_ms: float
    server_id: str
    approved: bool = True
    error: str | None = None
 
 
class PolicyEngine:
    """Evaluates whether a tool call is allowed under the current policy."""
 
    def __init__(self, approval_callback: Callable | None = None):
        self._approval_callback = approval_callback
 
    def evaluate(self, request: MCPRequest, manifest: ToolManifest) -> tuple[bool, str]:
        # 1. Role-based access control
        if "*" not in manifest.allowed_roles and request.caller_role not in manifest.allowed_roles:
            return False, f"Role '{request.caller_role}' not authorized for tool '{manifest.name}'"
 
        # 2. Risk-level gating: critical tools always need human approval
        if manifest.risk_level == "critical":
            if self._approval_callback is None:
                return False, "Critical tool requires human approval but no approval callback is configured"
            approved = self._approval_callback(request, manifest)
            if not approved:
                return False, "Human approval denied"
 
        # 3. High-risk tools with destructive arguments
        if manifest.risk_level == "high":
            destructive_keywords = {"delete", "drop", "truncate", "remove", "destroy"}
            arg_str = json.dumps(request.arguments).lower()
            if any(kw in arg_str for kw in destructive_keywords):
                if self._approval_callback:
                    approved = self._approval_callback(request, manifest)
                    if not approved:
                        return False, "Destructive operation denied by human reviewer"
 
        return True, "OK"
 
 
class RateLimiter:
    """Token-bucket rate limiter per tool per caller."""
 
    def __init__(self):
        self._buckets: dict[str, list[float]] = {}
 
    def check(self, caller_id: str, tool_name: str, max_rpm: int) -> bool:
        key = f"{caller_id}:{tool_name}"
        now = time.time()
        window = self._buckets.setdefault(key, [])
        # Evict old entries outside the 60-second window
        self._buckets[key] = [t for t in window if now - t < 60]
        if len(self._buckets[key]) >= max_rpm:
            return False
        self._buckets[key].append(now)
        return True
 
 
class TraceCollector:
    """Collects per-call traces for observability and audit."""
 
    def __init__(self):
        self._traces: list[dict] = []
 
    def record(self, request: MCPRequest, response: MCPResponse, policy_result: str):
        self._traces.append({
            "timestamp": request.timestamp,
            "session_id": request.session_id,
            "caller_id": request.caller_id,
            "caller_role": request.caller_role,
            "tool": request.tool_name,
            "arguments": request.arguments,
            "server_id": response.server_id,
            "result_preview": str(response.result)[:200],
            "latency_ms": response.latency_ms,
            "approved": response.approved,
            "policy_result": policy_result,
            "error": response.error,
        })
 
    def export(self) -> list[dict]:
        return list(self._traces)

Composing the gateway

class MCPGateway:
    """Central gateway that mediates between AI clients and MCP servers."""
 
    def __init__(self):
        self.tools: dict[str, ToolManifest] = {}
        self.servers: dict[str, Callable] = {}  # server_id -> async call function
        self.policy = PolicyEngine()
        self.rate_limiter = RateLimiter()
        self.tracer = TraceCollector()
 
    def register_server(
        self,
        server_id: str,
        tools: list[ToolManifest],
        call_fn: Callable,
    ) -> None:
        """Register an MCP server and its tools with the gateway."""
        for tool in tools:
            tool.server_id = server_id
            self.tools[tool.name] = tool
        self.servers[server_id] = call_fn
        logger.info(f"Registered server '{server_id}' with {len(tools)} tools")
 
    def list_tools(self, caller_role: str = "*") -> list[dict]:
        """Return the composed tool manifest, filtered by caller role."""
        visible = []
        for tool in self.tools.values():
            if "*" in tool.allowed_roles or caller_role in tool.allowed_roles:
                visible.append({
                    "name": tool.name,
                    "description": tool.description,
                    "inputSchema": tool.input_schema,
                    "riskLevel": tool.risk_level,
                    "requiresApproval": tool.requires_approval,
                })
        return visible
 
    async def call_tool(self, request: MCPRequest) -> MCPResponse:
        """Route a tool call through policy, rate limiting, and the backend server."""
        t0 = time.perf_counter()
 
        # 1. Resolve the tool
        manifest = self.tools.get(request.tool_name)
        if manifest is None:
            return MCPResponse(
                tool_name=request.tool_name,
                result=None,
                latency_ms=0,
                server_id="unknown",
                approved=False,
                error=f"Tool '{request.tool_name}' not found",
            )
 
        # 2. Policy check
        allowed, reason = self.policy.evaluate(request, manifest)
        if not allowed:
            resp = MCPResponse(
                tool_name=request.tool_name,
                result=None,
                latency_ms=(time.perf_counter() - t0) * 1000,
                server_id=manifest.server_id,
                approved=False,
                error=reason,
            )
            self.tracer.record(request, resp, reason)
            return resp
 
        # 3. Rate limiting
        if not self.rate_limiter.check(
            request.caller_id, request.tool_name, manifest.max_calls_per_minute
        ):
            resp = MCPResponse(
                tool_name=request.tool_name,
                result=None,
                latency_ms=(time.perf_counter() - t0) * 1000,
                server_id=manifest.server_id,
                approved=False,
                error="Rate limit exceeded",
            )
            self.tracer.record(request, resp, "rate_limited")
            return resp
 
        # 4. Dispatch to the backend MCP server
        call_fn = self.servers[manifest.server_id]
        try:
            result = await call_fn(request.tool_name, request.arguments)
            error = None
        except Exception as e:
            result = None
            error = str(e)
            logger.error(f"Tool '{request.tool_name}' failed: {e}")
 
        resp = MCPResponse(
            tool_name=request.tool_name,
            result=result,
            latency_ms=(time.perf_counter() - t0) * 1000,
            server_id=manifest.server_id,
            approved=True,
            error=error,
        )
        self.tracer.record(request, resp, reason)
        return resp

Wiring a real MCP server

Here is how you register a database query tool and a file-system tool behind the gateway, with different risk levels and RBAC:

import subprocess
 
# --- Database tool (read-only, low risk) ---
 
async def db_query_handler(tool_name: str, arguments: dict) -> dict:
    """Execute a read-only SQL query against the analytics warehouse."""
    import asyncpg
    conn = await asyncpg.connect("postgresql://readonly:***@db:5432/analytics")
    try:
        rows = await conn.fetch(arguments["query"])
        return {"rows": [dict(r) for r in rows], "count": len(rows)}
    finally:
        await conn.close()
 
 
db_tool = ToolManifest(
    name="query_analytics_db",
    description="Run a read-only SQL query against the analytics warehouse.",
    input_schema={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SQL SELECT query"}
        },
        "required": ["query"],
    },
    server_id="analytics_db",
    risk_level="medium",
    max_calls_per_minute=30,
    allowed_roles=["analyst", "engineer", "admin"],
)
 
 
# --- File system tool (write access, high risk) ---
 
async def file_write_handler(tool_name: str, arguments: dict) -> dict:
    """Write content to a sandboxed directory."""
    import os
 
    base = os.path.abspath("/data/agent-workspace")
    target = os.path.abspath(os.path.join(base, arguments["path"]))
 
    if not target.startswith(base):
        raise PermissionError(f"Path traversal blocked: {arguments['path']}")
 
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "w") as f:
        f.write(arguments["content"])
 
    return {"status": "written", "path": arguments["path"]}
 
 
file_tool = ToolManifest(
    name="write_workspace_file",
    description="Write a file to the agent's sandboxed workspace directory.",
    input_schema={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Relative path within workspace"},
            "content": {"type": "string", "description": "File content to write"},
        },
        "required": ["path", "content"],
    },
    server_id="workspace_fs",
    risk_level="high",
    requires_approval=False,
    max_calls_per_minute=10,
    allowed_roles=["engineer", "admin"],
)
 
 
# --- Compose the gateway ---
 
gateway = MCPGateway()
gateway.register_server("analytics_db", [db_tool], db_query_handler)
gateway.register_server("workspace_fs", [file_tool], file_write_handler)
 
# An agent sees only the tools its role permits:
print(gateway.list_tools(caller_role="analyst"))
# [{"name": "query_analytics_db", ...}]  — no file_write_handler visible

Three things about this architecture are non-negotiable.

Policy is evaluated before dispatch, not after. The gateway never forwards a call to a backend server unless RBAC, rate limiting, and risk-level checks pass. This is the difference between a governed system and a demo.

Every call is traced with full lineage. When a compliance officer asks "which agent, calling which tool, with which arguments, produced this output, at what time," the trace collector has the answer. This is the audit layer that turns an agent from a liability into an asset.

File and database operations are sandboxed in the handler, not in the protocol. MCP does not sandbox anything. Your handler code must enforce path containment, read-only constraints, and parameter validation. Trusting the agent not to send malicious arguments is not a security model.

The security model: what enterprises actually need

The MCP ecosystem has a security problem that the community is actively working on, but that you must solve yourself today. The three attack vectors that matter most in production:

1. Tool poisoning via unverified registries

In early 2026, security researchers demonstrated that public MCP server registries are vulnerable to typosquatting — registering a server named postgres-mcp that looks like the popular postgresql-mcp but exfiltrates connection strings. The mitigation is manifest-only execution: never auto-install MCP servers from a registry at runtime. Pin server versions, hash manifests, and review tool schemas in your CI pipeline before deployment.

import hashlib
 
def verify_manifest(manifest: dict, expected_hash: str) -> bool:
    """Verify that a tool manifest has not been tampered with."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    actual_hash = hashlib.sha256(canonical.encode()).hexdigest()
    return actual_hash == expected_hash

2. Prompt injection through tool results

An MCP server returns data to the agent. If that data contains instructions ("ignore all previous instructions and..."), a naive agent may follow them. This is the indirect prompt injection vector, and it is amplified by MCP because the tool result is injected directly into the agent's context.

The defense is result sanitization at the gateway level:

import re
 
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?previous\s+instructions",
    r"you\s+are\s+now\s+a",
    r"system:\s*",
    r"<\|im_start\|>",
    r"<\|endoftext\|>",
]
 
def sanitize_tool_result(result: str) -> str:
    """Strip potential prompt injection patterns from tool results."""
    for pattern in INJECTION_PATTERNS:
        result = re.sub(pattern, "[REDACTED]", result, flags=re.IGNORECASE)
    return result

This is not a complete solution — prompt injection is an open research problem. But it catches the low-hanging attacks and raises the cost for an adversary.

3. Privilege escalation through tool chaining

An agent that can call query_database and write_file independently may be safe. An agent that can call query_database to exfiltrate credentials, then write_file to store them externally, has escalated privileges through composition. The gateway's policy engine must reason about tool chains, not just individual calls.

class ChainPolicy:
    """Detect and block risky tool call sequences within a session."""
 
    BLOCKED_CHAINS = [
        ("query_analytics_db", "write_workspace_file"),
        ("read_secrets", "send_email"),
    ]
 
    def __init__(self):
        self._session_history: dict[str, list[str]] = {}
 
    def check(self, session_id: str, tool_name: str) -> tuple[bool, str]:
        history = self._session_history.setdefault(session_id, [])
 
        for chain in self.BLOCKED_CHAINS:
            if len(history) > 0 and history[-1] == chain[0] and tool_name == chain[1]:
                return False, f"Blocked chain: {chain[0]}{chain[1]}"
 
        history.append(tool_name)
        return True, "OK"

The cost model: why the gateway pays for itself

MCP gateway infrastructure is not free. You are adding a hop, a policy evaluation, and a trace write to every tool call. Let me make the cost case explicit.

Consider a B2B agent handling 100,000 tool calls per day across 5 MCP servers.

Without a gateway

ProblemCost
No rate limiting: agent loops hammer the database at 500 QPSDatabase scaling: $2,000/month
No RBAC: analyst-tier agent accesses admin toolsIncident response: $50,000+ per security event
No observability: debugging agent failures requires log archaeologyEngineering time: 20+ hours/week at 150/hr=150/hr = 12,000/month
No tool governance: redundant calls to frontier APIsWasted inference: $3,000/month

With a gateway

ComponentMonthly cost
Gateway compute (2 vCPU, 4GB RAM)$50
Trace storage (100K calls/day × 1KB each ≈ 3GB/month)$10
Engineering: initial build (amortized over 12 months)$2,500
Monthly total$2,560
Annual total$30,720

The rate limiter alone saves the database scaling cost. The RBAC layer prevents one security incident that would cost more than three years of gateway operation. The trace collector replaces 20 hours/week of log archaeology with a dashboard query. The policy engine catches the redundant frontier API calls that context engineering would also prevent.

Net: the gateway is not overhead. It is the cheapest insurance policy in your AI stack.

MCP and the compound AI stack

MCP does not exist in isolation. It is one layer in the compound AI architecture that production agent systems require. Here is where it fits:

LayerResponsibilityKey technology
RoutingClassify query complexity, pick inference tierSemantic router
ContextAssemble minimal, correct context for the modelContext engine
InferenceGenerate the response or planFrontier API, local SLM, or ternary model
Tool executionInvoke external capabilities via governed protocolMCP gateway (this post)
MemoryPersist session and entity stateStructured DB, LLM Wiki
EvaluationMeasure trajectory quality, detect driftOffline evals + production traces

The MCP gateway is the tool execution layer. It sits between the agent's plan ("I need to query the database") and the actual execution ("here is the SQL, here is the result"). Without governance at this layer, every other layer's quality guarantees are void — a context engine that carefully curates 3,500 tokens of relevant context is useless if the agent then dumps 40KB of unfiltered tool output into the next call.

What I would build this week

If you are starting from zero with MCP in a B2B context, here is the four-week roadmap I would follow:

Week 1: Inventory and manifest. Catalog every tool your agents currently use. For each tool, define its MCP schema, risk level, and RBAC policy. Pin manifests in version control.

Week 2: Gateway MVP. Deploy the gateway pattern above with auth (API keys or OAuth2 client credentials), rate limiting, and basic tracing. Wire one agent to one tool through the gateway. Validate that the policy engine blocks unauthorized calls.

Week 3: Multi-server composition. Register all your MCP servers behind the gateway. Test cross-server tool chains. Implement chain-level policy checks. Deploy the trace dashboard (Grafana + the exported traces, or Langfuse if you prefer a managed solution).

Week 4: Production hardening. Add result sanitization for prompt injection defense. Implement manifest hash verification in CI. Set up alerting on anomalous tool call patterns (spike in rate-limited rejections, new tool names appearing, calls from unknown caller IDs). Run a tabletop security exercise with your team.

At the end of four weeks, you have a governed, observable, auditable MCP layer that every agent in your stack can use — and that your security team can approve.

The next six months

Three predictions for the MCP ecosystem through the end of 2026.

First, MCP gateway will become a standard infrastructure component, like an API gateway is for REST services today. Managed offerings will emerge from the cloud providers, but the pattern is simple enough that most teams will build their own initially — and should, because the policy engine is deeply specific to your business.

Second, A2A (Agent-to-Agent Protocol) will compose with MCP to enable multi-agent systems where agents from different vendors discover each other's capabilities and delegate tasks. MCP handles agent-to-tool. A2A handles agent-to-agent. Together they form the protocol stack for enterprise agentic systems.

Third, MCP security will become a compliance requirement in regulated industries. Banks, healthcare providers, and government contractors will need to demonstrate that their agent tooling has RBAC, audit trails, and sandboxing — exactly the capabilities that the gateway pattern provides. Teams that build this infrastructure now will have a structural advantage when the compliance mandates arrive.

The protocol itself is simple. The production infrastructure around it is where the engineering — and the value — lives.


I work as an AI Engineer and Data Scientist, available for contract engagements on RAG architecture, agent systems, and LLM integration. If this post was useful, feel free to reach out.