Building an MCP Server for Your API Documentation

Introduction

As AI coding assistants become more sophisticated, they need better access to your API documentation to generate accurate code and provide intelligent responses. The Model Context Protocol (MCP) provides a standardized way for AI tools to interact with external services and data sources. By creating an MCP server that serves your API documentation, you enable AI assistants to understand your API’s structure, endpoints, authentication requirements, and response formats without manual context injection.

This guide walks through building an MCP server in Python that exposes your API documentation to AI tools. You’ll learn how to structure your server, parse documentation formats, and expose tools that AI assistants can query dynamically.

Prerequisites

Before building your MCP documentation server, ensure you have:

- Python 3.10 or later (required by FastMCP)
- The uv package manager, or pip, for installing dependencies
- An OpenAPI specification for your API, or the sample spec created below
- Basic familiarity with async Python

Setting Up Your Project

Start by creating a new Python project and installing the necessary dependencies:

mkdir api-docs-mcp-server
cd api-docs-mcp-server
uv venv
uv pip install fastmcp pydantic pyyaml httpx

Initialize your project structure:

api-docs-mcp-server/
├── main.py
├── docs/
│   └── openapi.yaml
└── requirements.txt

Creating the MCP Server

The core of your documentation server involves parsing your API documentation and exposing it through MCP tools. Here’s a complete implementation using FastMCP:

from fastmcp import FastMCP
from pydantic import BaseModel
from typing import Optional, List
import yaml
import httpx

mcp = FastMCP("API Documentation Server")

# Store parsed documentation in memory
api_docs = {}

class EndpointInfo(BaseModel):
    path: str
    method: str
    summary: str
    description: Optional[str] = None
    parameters: List[dict] = []
    request_body: Optional[dict] = None
    responses: dict = {}

def load_openapi_spec(path: str) -> dict:
    """Load and parse OpenAPI specification file."""
    with open(path, 'r') as f:
        return yaml.safe_load(f)

def parse_endpoints(spec: dict) -> List[EndpointInfo]:
    """Extract endpoint information from OpenAPI spec."""
    endpoints = []
    paths = spec.get('paths', {})

    for path, methods in paths.items():
        for method, details in methods.items():
            if method in ['get', 'post', 'put', 'delete', 'patch']:
                endpoint = EndpointInfo(
                    path=path,
                    method=method.upper(),
                    summary=details.get('summary', ''),
                    description=details.get('description'),
                    parameters=details.get('parameters', []),
                    request_body=details.get('requestBody'),
                    responses=details.get('responses', {})
                )
                endpoints.append(endpoint)

    return endpoints

@mcp.tool()
async def load_documentation(spec_path: str = "docs/openapi.yaml") -> str:
    """Load API documentation from OpenAPI specification file."""
    global api_docs
    spec = load_openapi_spec(spec_path)
    api_docs['spec'] = spec
    api_docs['endpoints'] = parse_endpoints(spec)
    api_docs['title'] = spec.get('info', {}).get('title', 'API')
    api_docs['version'] = spec.get('info', {}).get('version', '1.0.0')

    return f"Loaded documentation for {api_docs['title']} v{api_docs['version']} with {len(api_docs['endpoints'])} endpoints"

@mcp.tool()
async def get_endpoint(path: str, method: str) -> dict:
    """Get detailed information about a specific endpoint."""
    if not api_docs.get('endpoints'):
        return {"error": "No documentation loaded. Call load_documentation first."}

    for endpoint in api_docs['endpoints']:
        if endpoint.path == path and endpoint.method.upper() == method.upper():
            return {
                "path": endpoint.path,
                "method": endpoint.method,
                "summary": endpoint.summary,
                "description": endpoint.description,
                "parameters": endpoint.parameters,
                "request_body": endpoint.request_body,
                "responses": endpoint.responses
            }

    return {"error": f"Endpoint {method} {path} not found"}

@mcp.tool()
async def search_endpoints(query: str) -> List[dict]:
    """Search for endpoints matching a query string."""
    if not api_docs.get('endpoints'):
        return [{"error": "No documentation loaded. Call load_documentation first."}]

    query_lower = query.lower()
    results = []

    for endpoint in api_docs['endpoints']:
        if (query_lower in endpoint.path.lower() or
            query_lower in endpoint.summary.lower() or
            (endpoint.description and query_lower in endpoint.description.lower())):
            results.append({
                "path": endpoint.path,
                "method": endpoint.method,
                "summary": endpoint.summary
            })

    return results

@mcp.tool()
async def list_endpoints() -> List[dict]:
    """List all available endpoints."""
    if not api_docs.get('endpoints'):
        return [{"error": "No documentation loaded. Call load_documentation first."}]

    return [
        {"path": e.path, "method": e.method, "summary": e.summary}
        for e in api_docs['endpoints']
    ]

if __name__ == "__main__":
    mcp.run()

Creating Sample Documentation

Create a sample OpenAPI specification to test your server:

# docs/openapi.yaml
openapi: 3.0.0
info:
  title: Task Management API
  version: 1.0.0
  description: API for managing tasks and projects

paths:
  /tasks:
    get:
      summary: List all tasks
      description: Retrieve a paginated list of all tasks
      parameters:
        - name: page
          in: query
          schema:
            type: integer
            default: 1
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
      responses:
        '200':
          description: Successful response
          content:
            application/json:
              schema:
                type: object
                properties:
                  tasks:
                    type: array
                    items:
                      type: object
                      properties:
                        id:
                          type: string
                        title:
                          type: string
                        status:
                          type: string
                          enum: [pending, in_progress, completed]
    post:
      summary: Create a new task
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - title
              properties:
                title:
                  type: string
                description:
                  type: string
                priority:
                  type: string
                  enum: [low, medium, high]
      responses:
        '201':
          description: Task created successfully

  /tasks/{task_id}:
    get:
      summary: Get a specific task
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Successful response
        '404':
          description: Task not found
    delete:
      summary: Delete a task
      parameters:
        - name: task_id
          in: path
          required: true
          schema:
            type: string
      responses:
        '204':
          description: Task deleted successfully

Running and Testing the Server

Start your MCP server:

python main.py

The server starts and listens for connections from AI tools over stdio. To test it without a full MCP client, use FastMCP's in-memory Client. Note that importing the decorated functions directly no longer works in recent FastMCP releases: @mcp.tool() replaces each function with a tool object, so they are not plain callables. Recent releases expose a tool's deserialized return value on the result's .data attribute:

import asyncio
from fastmcp import Client
from main import mcp

async def test_server():
    # The in-memory transport talks directly to the server object,
    # so no subprocess or network connection is needed
    async with Client(mcp) as client:
        # Load documentation
        result = await client.call_tool("load_documentation", {"spec_path": "docs/openapi.yaml"})
        print(result.data)

        # List all endpoints
        endpoints = (await client.call_tool("list_endpoints", {})).data
        print(f"Found {len(endpoints)} endpoints:")
        for ep in endpoints:
            print(f"  {ep['method']} {ep['path']} - {ep['summary']}")

        # Search for endpoints
        results = await client.call_tool("search_endpoints", {"query": "task"})
        print(f"\nSearch results: {results.data}")

        # Get specific endpoint details
        details = await client.call_tool("get_endpoint", {"path": "/tasks", "method": "GET"})
        print(f"\nGET /tasks details: {details.data}")

asyncio.run(test_server())

Integrating with AI Tools

Once your MCP server is running, configure your AI assistant to connect to it. Most AI coding tools support MCP through their configuration files:

{
  "mcpServers": {
    "api-docs": {
      "command": "python",
      "args": ["/path/to/api-docs-mcp-server/main.py"]
    }
  }
}

After configuration, your AI assistant can query your API documentation directly. You can ask questions like "What parameters does the task list endpoint accept?" or "Generate a client function that creates a task," and the assistant will call search_endpoints and get_endpoint behind the scenes to ground its answer in your actual spec.

Advanced Features

To enhance your documentation server further, consider adding these capabilities:

Response Examples: Extract and expose example responses from your OpenAPI spec so AI tools can understand data structures.

Authentication Documentation: Parse security schemes and expose authentication requirements clearly.

Rate Limiting Info: Include rate limit headers and retry-after values in endpoint documentation.

Version Comparison: Support multiple API versions and allow querying differences between versions.
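The first of these can be sketched as a helper that walks the spec and collects inline response examples. The function name extract_response_examples and the "METHOD path" key format are illustrative choices, not part of the server above:

```python
def extract_response_examples(spec: dict) -> dict:
    """Collect inline response examples, keyed by 'METHOD path' and status code."""
    examples: dict = {}
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            if method not in ("get", "post", "put", "delete", "patch"):
                continue
            for status, response in details.get("responses", {}).items():
                # OpenAPI 3.x nests examples under content -> media type
                for media_type in response.get("content", {}).values():
                    if "example" in media_type:
                        key = f"{method.upper()} {path}"
                        examples.setdefault(key, {})[status] = media_type["example"]
    return examples

spec = {
    "paths": {
        "/tasks": {
            "get": {
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "example": {"tasks": [{"id": "1", "title": "Demo"}]}
                            }
                        }
                    }
                }
            }
        }
    }
}
print(extract_response_examples(spec))
# {'GET /tasks': {'200': {'tasks': [{'id': '1', 'title': 'Demo'}]}}
```

A production version would also handle the plural `examples` keyword and `$ref` pointers into components, which this sketch skips for brevity.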

Caching Documentation in Memory vs. Reloading on Change

Loading and parsing the OpenAPI spec on every tool call adds latency, but serving stale docs after the spec changes during development produces confusing AI suggestions. A good middle ground is to cache the parsed spec and reload only when the file's content changes:

import asyncio
import hashlib
from pathlib import Path

class DocumentationCache:
    def __init__(self, spec_path: str):
        self.spec_path = Path(spec_path)
        self._cache: dict = {}
        self._last_hash: str = ""

    def _file_hash(self) -> str:
        return hashlib.md5(self.spec_path.read_bytes()).hexdigest()

    def get_or_reload(self) -> dict:
        current_hash = self._file_hash()
        if current_hash != self._last_hash:
            spec = load_openapi_spec(str(self.spec_path))
            self._cache = {
                "spec": spec,
                "endpoints": parse_endpoints(spec),
                "title": spec.get("info", {}).get("title", "API"),
                "version": spec.get("info", {}).get("version", "1.0.0"),
            }
            self._last_hash = current_hash
        return self._cache

Each call recomputes the file hash and re-parses the YAML only when the content has actually changed. Hashing still reads the whole file, so for very large specs you could compare the file's modification time instead, at the cost of missing edits that preserve the timestamp. Development workflows get hot reload; production deployments get cache-hit performance.
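To see the change-detection pattern in isolation, here is a minimal, self-contained version with a pluggable parser and a reload counter. HashReloadCache is illustrative and not part of the server above:

```python
import hashlib
import tempfile
from pathlib import Path

class HashReloadCache:
    """Re-run an expensive parse only when the file's content hash changes."""

    def __init__(self, path, parse):
        self.path = Path(path)
        self.parse = parse
        self._hash = ""
        self._value = None
        self.reloads = 0  # counts actual re-parses, for demonstration

    def get(self):
        current = hashlib.md5(self.path.read_bytes()).hexdigest()
        if current != self._hash:
            self._value = self.parse(self.path.read_text())
            self._hash = current
            self.reloads += 1
        return self._value

with tempfile.TemporaryDirectory() as tmp:
    spec_file = Path(tmp) / "spec.txt"
    spec_file.write_text("version: 1")
    cache = HashReloadCache(spec_file, parse=str.strip)

    cache.get()
    cache.get()           # unchanged file: cache hit, no re-parse
    spec_file.write_text("version: 2")
    value = cache.get()   # changed file: triggers a re-parse

    print(cache.reloads, value)  # 2 version: 2
```

The two get() calls against the unchanged file cost one parse, not two; only the edit triggers the second reload.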

Exposing Authentication Schemes

AI assistants generate better API client code when they understand the authentication model. Parse the security schemes from your OpenAPI spec and expose them as a dedicated tool:

@mcp.tool()
async def get_auth_schemes() -> dict:
    """Get authentication requirements for the API."""
    if not api_docs.get("spec"):
        return {"error": "No documentation loaded."}

    spec = api_docs["spec"]
    security_schemes = spec.get("components", {}).get("securitySchemes", {})
    global_security = spec.get("security", [])

    result = {"global_requirements": global_security, "schemes": {}}

    for name, scheme in security_schemes.items():
        result["schemes"][name] = {
            "type": scheme.get("type"),
            "description": scheme.get("description", ""),
        }
        if scheme.get("type") == "apiKey":
            result["schemes"][name]["in"] = scheme.get("in")
            result["schemes"][name]["name"] = scheme.get("name")
        elif scheme.get("type") == "http":
            result["schemes"][name]["scheme"] = scheme.get("scheme")  # bearer, basic
        elif scheme.get("type") == "oauth2":
            result["schemes"][name]["flows"] = list(scheme.get("flows", {}).keys())

    return result

When an AI assistant calls get_auth_schemes() before generating code for a protected endpoint, it automatically includes correct Authorization headers or API key parameters rather than requiring you to specify authentication in every prompt.
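Run standalone against a hypothetical securitySchemes fragment, the same parsing logic produces output shaped like this (the scheme names bearerAuth and apiKeyAuth are made up for illustration):

```python
# Hypothetical fragment, as it would appear under components.securitySchemes
security_schemes = {
    "bearerAuth": {"type": "http", "scheme": "bearer", "description": "JWT bearer token"},
    "apiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"},
}

# Same per-type branching as get_auth_schemes above
schemes = {}
for name, scheme in security_schemes.items():
    schemes[name] = {"type": scheme.get("type"), "description": scheme.get("description", "")}
    if scheme.get("type") == "apiKey":
        schemes[name]["in"] = scheme.get("in")
        schemes[name]["name"] = scheme.get("name")
    elif scheme.get("type") == "http":
        schemes[name]["scheme"] = scheme.get("scheme")

print(schemes["bearerAuth"]["scheme"])  # bearer
print(schemes["apiKeyAuth"]["name"])    # X-API-Key
```

Given this output, an assistant knows to send `Authorization: Bearer <token>` for one scheme and an `X-API-Key` header for the other.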

Supporting Multiple API Versions

APIs evolve. An MCP documentation server that only knows the current version can’t help when a client is pinned to an older version. Add multi-version support with a version registry:

# Store multiple spec versions
api_versions: dict[str, dict] = {}

@mcp.tool()
async def load_version(spec_path: str, version_label: str = "current") -> str:
    """Load a specific API version's documentation."""
    spec = load_openapi_spec(spec_path)
    api_versions[version_label] = {
        "spec": spec,
        "endpoints": parse_endpoints(spec),
        "title": spec.get("info", {}).get("title", "API"),
        "version": spec.get("info", {}).get("version"),
    }
    return f"Loaded {version_label}: {api_versions[version_label]['title']} v{api_versions[version_label]['version']}"

@mcp.tool()
async def list_versions() -> list[dict]:
    """List all loaded API versions."""
    return [
        {"label": label, "api_version": data["version"], "title": data["title"]}
        for label, data in api_versions.items()
    ]

@mcp.tool()
async def search_endpoints_by_version(query: str, version_label: str = "current") -> list[dict]:
    """Search endpoints in a specific API version."""
    if version_label not in api_versions:
        return [{"error": f"Version '{version_label}' not loaded."}]
    endpoints = api_versions[version_label]["endpoints"]
    query_lower = query.lower()
    return [
        {"path": e.path, "method": e.method, "summary": e.summary}
        for e in endpoints
        if query_lower in e.path.lower() or query_lower in e.summary.lower()
    ]

Load each version at startup, and AI assistants can generate code targeting any of them, or compare endpoints across versions to identify breaking changes.
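The comparison step can be sketched as a set difference over "METHOD path" keys. Here endpoints are represented as plain dicts rather than EndpointInfo models, and the function name diff_endpoints is illustrative:

```python
def diff_endpoints(old: list[dict], new: list[dict]) -> dict:
    """Report endpoints added in `new` and removed since `old`.
    Removed endpoints are potential breaking changes for pinned clients."""
    def key(e: dict) -> str:
        return f"{e['method']} {e['path']}"

    old_keys = {key(e) for e in old}
    new_keys = {key(e) for e in new}
    return {
        "added": sorted(new_keys - old_keys),
        "removed": sorted(old_keys - new_keys),
    }

v1 = [
    {"method": "GET", "path": "/tasks"},
    {"method": "DELETE", "path": "/tasks/{task_id}"},
]
v2 = [
    {"method": "GET", "path": "/tasks"},
    {"method": "POST", "path": "/tasks/bulk"},
]
print(diff_endpoints(v1, v2))
# {'added': ['POST /tasks/bulk'], 'removed': ['DELETE /tasks/{task_id}']}
```

Exposed as an MCP tool over two entries in api_versions, this lets an assistant answer "what breaks if I upgrade?" directly from the loaded specs. A fuller diff would also flag changed parameters and response schemas, not just whole endpoints.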

Built by theluckystrike — More at zovo.one