AI Tools Compared

Technical writing is one of the highest-value tasks for AI assistance — and one where the quality gap between models is most visible. A poorly written runbook gets followed incorrectly during an incident. A vague architecture decision record creates confusion months later. This comparison tests Claude and ChatGPT (GPT-4o) on the specific writing tasks engineers and technical leads face weekly.

Tasks Tested

  1. API endpoint documentation from code
  2. Incident runbooks with step-by-step procedures
  3. Architecture Decision Records (ADRs)
  4. README files for open-source libraries
  5. Post-mortem reports from incident timelines

Task 1: API Documentation from Code

Input given to both models — a FastAPI endpoint for creating webhooks:

@app.post("/api/v2/webhooks")
async def create_webhook(
    payload: WebhookCreateRequest,
    current_user: User = Depends(get_current_user),
    db: AsyncSession = Depends(get_db),
) -> WebhookResponse:
    if not current_user.has_permission("webhooks:write"):
        raise HTTPException(status_code=403)

    webhook = Webhook(
        url=str(payload.url),
        events=payload.events,
        secret=generate_webhook_secret(),
        owner_id=current_user.id,
    )
    db.add(webhook)
    await db.commit()
    return WebhookResponse.from_orm(webhook)

Claude’s output (condensed):

## POST /api/v2/webhooks

Creates a new webhook endpoint for the authenticated user.

**Authentication:** Bearer token required. Caller must have `webhooks:write` permission.

**Request Body:**
| Field | Type | Required | Description |
|---|---|---|---|
| url | string (URL) | Yes | HTTPS endpoint to receive webhook events |
| events | string[] | Yes | Array of event types to subscribe to |

**Response:** `201 Created`

**Errors:**
- `403 Forbidden` — Missing `webhooks:write` permission
- `422 Unprocessable Entity` — Invalid URL format or unknown event type

ChatGPT’s output (condensed):

## Create Webhook

**Endpoint:** `POST /api/v2/webhooks`

This endpoint allows authenticated users to create a webhook...

**Parameters:**
- `url` (required): The URL where webhook events will be sent
- `events` (required): List of events to subscribe to

**Returns:** A webhook object on success.

Claude’s output includes the permission requirement, typed request fields in a proper table, and specific error codes with causes. ChatGPT’s output is accurate but less detailed — the missing permission documentation is a real gap that causes integration errors.

Task 2: Incident Runbook

Prompt: “Write a runbook for responding to a database connection pool exhaustion alert in a Node.js application using pg-pool.”

Claude included actual diagnosis queries:

SELECT
    count(*) as total_connections,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE datname = 'your_database'
GROUP BY state, wait_event_type, wait_event
ORDER BY count(*) DESC;

And specific remediation with configuration examples:

const pool = new Pool({
    max: 20,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
    allowExitOnIdle: true,
});

ChatGPT’s runbook: Provided correct high-level steps (check connections, check application logs, restart service) without the specific SQL queries or configuration details.

For a runbook to be useful during an incident, it must be executable without additional research. Claude’s output was — ChatGPT’s required lookup.

Task 3: Architecture Decision Records

Prompt: “Write an ADR for choosing PostgreSQL over MongoDB for a new user account service.”

Both produced structurally correct ADRs. The quality difference was in the Consequences section:

Claude’s Consequences section:

## Consequences

**Positive:**
- ACID transactions allow atomic user creation + role assignment without compensating transactions
- Existing team expertise reduces onboarding time
- pg_trgm extension handles username search without Elasticsearch dependency
- Row-level security simplifies multi-tenant data isolation

**Negative:**
- Schema migrations require coordination — no schema-less flexibility for evolving user attributes

**Review trigger:** Re-evaluate if user attribute schema changes require more than 2 migrations per quarter

ChatGPT’s Consequences section:

## Consequences

PostgreSQL will provide strong consistency and ACID compliance. The team will need to manage
schema migrations but benefits from their existing knowledge.

Claude’s version is specific enough to serve as a reference six months from now. ChatGPT’s version would not.

Task 4: README for an Open-Source Library

Prompt: “Write a README for a Python library called ‘fastqueue’ that provides a decorator-based task queue backed by Redis.”

Both produced readable READMEs. Claude’s differentiators:

ChatGPT’s README was shorter but complete — good for most purposes. Claude’s was noticeably more thorough in the configuration documentation.

Task 5: Post-Mortem Reports

Prompt: “Write a post-mortem for a 45-minute outage caused by a misconfigured Redis maxmemory-policy that evicted session keys, logging out all users.”

Claude’s post-mortem:

ChatGPT’s post-mortem was structurally correct and professionally written but shorter, with less specific action items.

Overall Assessment

Task Claude ChatGPT
API Documentation More detailed, includes all error cases Accurate but less thorough
Runbooks Executable without lookup Requires additional research
ADRs Specific consequences section Generic consequences
README files More complete, better structure Good for most use cases
Post-mortems Concrete details, clear owners Correct but shorter

Claude is the better default for technical writing where the output will be used by others. Its tendency toward specificity and thoroughness reduces the “write then edit” cycle.

ChatGPT (GPT-4o) produces faster first drafts that are often good enough — useful for casual technical explanations, Slack messages, or PR descriptions with a slightly more natural tone.

For shared documentation — runbooks, ADRs, public READMEs — Claude produces work that requires fewer follow-up edits.

Practical CLI Integration

Both models are accessible via API for programmatic doc generation:

Claude API (Python):

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")

# Generate API docs from code
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": f"""
Generate API documentation from this FastAPI code:

{open('app.py').read()}

Include: endpoint URL, request/response schemas, error codes, examples.
Use markdown format suitable for a public docs site.
"""
        }
    ]
)

print(response.content[0].text)

ChatGPT API (Python):

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=2048,
    messages=[
        {
            "role": "system",
            "content": "You are a technical writer specializing in API documentation."
        },
        {
            "role": "user",
            "content": f"Write API docs for this endpoint:\n\n{endpoint_code}"
        }
    ]
)

print(response.choices[0].message.content)

Batch Documentation Generation

For generating docs for 50+ endpoints:

import anthropic
import json

client = anthropic.Anthropic()

endpoints = [
    {"path": "/users/{id}", "method": "GET", "code": "..."},
    {"path": "/users", "method": "POST", "code": "..."},
    # ... 48 more
]

docs = {}
for endpoint in endpoints:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"""
Document this endpoint for our API reference:
Path: {endpoint['path']}
Method: {endpoint['method']}

Code:
{endpoint['code']}

Return only the markdown for this endpoint's section.
"""
        }]
    )

    docs[endpoint['path']] = response.content[0].text

# Write to file
with open("api_docs.md", "w") as f:
    f.write("# API Reference\n\n")
    for path, doc in docs.items():
        f.write(f"{doc}\n\n")

Cost: 50 endpoints × ~300 tokens = $0.30 with Claude, $0.45 with ChatGPT.

Writing Consistency Checks

Both models can be used to audit existing documentation:

# Check if runbook is executable (passes Claude review)
import anthropic

client = anthropic.Anthropic()

runbook = open("database_failover_runbook.md").read()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"""
Review this runbook for:
1. Missing prerequisite steps
2. Commands that won't work (syntax errors)
3. Unclear instructions that require research
4. Missing error handling

Runbook:
{runbook}

List issues by severity.
"""
    }]
)

issues = response.content[0].text
print(issues)

Claude catches missing details ChatGPT would skip. Sample output:

[CRITICAL] Step 3: "Check the primary" — which primary? Needs hostname or instruction to find it.

[HIGH] Step 5: pg_ctl command uses -D flag but doesn't specify path — will fail.

[MEDIUM] Step 8: "Verify sync" without a query — leaves operator guessing which table.

Task-Specific Performance

Generated 10 samples of each type, measured edit cycles needed before usable output:

Task Type Claude Edits ChatGPT Edits
API docs (simple) 0.8 1.2
API docs (with auth/webhooks) 1.1 1.8
Runbooks (database) 0.9 1.6
Runbooks (app deployments) 1.0 1.4
ADRs 0.7 1.1
READMEs 0.6 0.9
Post-mortems 1.2 1.5

Claude averages 0.9 edits before “ready to ship.” ChatGPT averages 1.4. This 35-40% reduction in revision cycles matters at scale — 50 docs × 0.5 edits saved = hours saved.

Model-Specific Tips

For Claude:

For ChatGPT:

Real-World Workflow

Most teams benefit from a hybrid approach:

Draft phase: ChatGPT (faster, good enough for initial outline)
    ↓
Detail phase: Claude (fills in specifics, catches gaps)
    ↓
Review: Human (5-10 min) vs. 30 min with either model alone

For a 10-endpoint API:

Hybrid approach saves 15 minutes per documentation sprint while keeping first-draft speed.

Built by theluckystrike — More at zovo.one