Claude vs GPT-4 for Writing Documentation

Technical documentation has a quality ceiling that most AI tools don’t reach: accurate code examples, the right level of detail for the audience, and explanations that don’t talk down to experienced engineers. This guide compares Claude and GPT-4 across four documentation types that actually get written in engineering teams.

PYM-001**: Payment Service P95 Latency > 2s

Severity: P2 — escalate to P1 if >5s or error rate >1% On-call contact: #payments-oncall

1.

No constraints on amount: no explanation of when to use idempotency_key, no error scenarios.
Restart the service if: needed 5.
GPT-4’s requires a senior: engineer to translate into actual steps.

Test 1: API Reference from Code

Both models were given the same Python function with a docstring stub and asked to generate API reference documentation.

def create_payment_intent(
    amount: int,
    currency: str,
    customer_id: str | None = None,
    metadata: dict | None = None,
    idempotency_key: str | None = None
) -> PaymentIntent:
    """
    Creates a payment intent for a given amount.
    """

GPT-4’s output:

## create_payment_intent

Creates a payment intent for a given amount.

### Parameters

- `amount` (int): The amount to charge.
- `currency` (str): The currency code.
- `customer_id` (str, optional): The customer ID.
- `metadata` (dict, optional): Metadata.
- `idempotency_key` (str, optional): Idempotency key.

### Returns

`PaymentIntent`: The created payment intent.

Accurate but generic. No constraints on amount, no explanation of when to use idempotency_key, no error scenarios.

Claude’s output:

## create_payment_intent

Creates a payment intent representing a pending charge that can be confirmed
or cancelled. Payment intents are the recommended way to handle complex
payment flows, including 3D Secure authentication.

### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `amount` | `int` | Yes | Amount in the **smallest currency unit** (cents for USD). Minimum: 50. |
| `currency` | `str` | Yes | ISO 4217 currency code (e.g., `"usd"`, `"eur"`). Lowercase. |
| `customer_id` | `str \| None` | No | Customer ID from `create_customer()`. Attach to track payment history. |
| `metadata` | `dict \| None` | No | Up to 50 key-value pairs. Keys max 40 chars, values max 500 chars. |
| `idempotency_key` | `str \| None` | No | UUID to safely retry on network failure without double-charging. |

### Returns

`PaymentIntent` — See [PaymentIntent object reference](#paymentintent-object).

### Raises

- `InvalidAmountError`: `amount` is below the minimum (50 cents) or exceeds the maximum.
- `CurrencyNotSupportedError`: `currency` is not in the supported list.
- `CustomerNotFoundError`: `customer_id` does not exist.

### Example

```python
import uuid
from payments import create_payment_intent

# Charge $29.99 USD, safe to retry
intent = create_payment_intent(
 amount=2999, # $29.99 in cents
 currency="usd",
 customer_id="cus_abc123",
 metadata={"order_id": "ord_xyz"},
 idempotency_key=str(uuid.uuid4())
)
print(f"Intent created: {intent.id}, status: {intent.status}")

Notes

Payment intents expire after 24 hours if not confirmed.
For recurring charges, use create_subscription() instead. ```

Claude inferred the amount is in cents, provided a realistic example with uuid4() for idempotency, and added error types that make logical sense even without seeing the implementation.

Test 2: Runbook for On-Call Engineers

Prompt: “Write a runbook section for: payment service P95 latency exceeds 2 seconds.”

Both models produced runbook entries, but the structure differed significantly.

GPT-4’s runbook:

## Payment Service High Latency

If P95 latency > 2 seconds:

Check the payment service dashboard
Look for errors in the logs
Check database connections
Restart the service if needed
Escalate if issue persists

Generic enough to apply to any service. No specific commands, no metrics to check.

Claude’s runbook:

## PYM-001: Payment Service P95 Latency > 2s

**Severity**: P2 — escalate to P1 if >5s or error rate >1%
**On-call contact**: #payments-oncall

### 1. Triage (2 minutes)

```bash
# Current P95 across all instances
kubectl exec -n payments deploy/payment-svc -- \
 curl -s localhost:9090/metrics | grep http_request_duration_p95

# Error rate trend
kubectl logs -n payments -l app=payment-svc --since=10m | \
 grep '"level":"error"' | wc -l

Check: Payments Latency Dashboard

2. Identify the Layer (5 minutes)

Database latency?

kubectl exec -n payments deploy/payment-svc -- \
 curl -s localhost:9090/metrics | grep db_query_duration_p95

If db_query_duration_p95 > 1.5s → go to [DB Runbook PYM-DB-001]

External API (Stripe) latency?

kubectl exec -n payments deploy/payment-svc -- \
 curl -s localhost:9090/metrics | grep stripe_api_duration_p95

If stripe_api_duration_p95 > 500ms → check Stripe Status Page

Pod resource saturation?

kubectl top pods -n payments -l app=payment-svc

If CPU > 80% or memory > 85% → scale horizontally:

kubectl scale deploy/payment-svc -n payments --replicas=6

3. Known Causes

Symptom	Likely Cause	Fix
Slow around :00/:30 minute	Cron job running reports	Move cron to off-peak
Latency after deploy	N+1 query in new code	Roll back, see PYM-002
Gradual increase over hours	Connection pool exhaustion	Restart payment-svc

4. Escalation

Escalate to #payments-eng if:

Issue persists >15 minutes after mitigation
P95 > 5s
Transaction failure rate > 0.5% ```

Claude’s runbook is immediately actionable with real commands and thresholds. GPT-4’s requires a senior engineer to translate into actual steps.

Test 3: Tutorial Writing

Prompt: “Write an introduction section for a tutorial on setting up Redis caching in a FastAPI app.”

Both models write clear prose. The difference is in code example quality and conceptual accuracy.

GPT-4 introduction:

In this tutorial, we’ll learn how to add Redis caching to a FastAPI application. Redis is an in-memory database that can significantly speed up your app. We’ll cover installation, configuration, and implementing a simple cache.

Claude introduction:

FastAPI’s async architecture makes it a natural fit for Redis — both are non-blocking by design. Adding a cache layer with redis-py’s async client reduces database load for read-heavy endpoints without changing your API surface.

By the end of this tutorial, you’ll have a working cache-aside pattern: on cache miss, fetch from the database and write to Redis with a TTL; on cache hit, return the cached value and skip the database entirely.

What you’ll build:

A CacheManager class wrapping redis.asyncio

A @cached decorator for endpoint-level caching

Cache invalidation when data changes

Prerequisites: FastAPI app running, Redis 7+ available (docker run -p 6379:6379 redis:7)

Claude explains why Redis fits FastAPI (async alignment), describes the cache-aside pattern before showing code, and gives a concrete prerequisite with a runnable Docker command. GPT-4 uses generic tutorial phrasing.

Test 4: README Generation from Code

Claude consistently produces better READMEs by inferring purpose and usage from code structure. Given a Python CLI tool with argparse, Claude generates:

Accurate installation instructions based on the package structure
Usage examples that match the actual argument names
A “Why this tool” section that captures the design intent

GPT-4 produces accurate but generic README templates that require significant editing.

Side-by-Side Score

Documentation Type	Claude	GPT-4
API reference	Parameter constraints, errors, examples	Accurate but generic
Runbooks	Specific commands and thresholds	Needs translation to commands
Tutorials	Pattern explanation + code	Clear but surface-level
README generation	Infers intent from code	Generic template
Speed	Slower for long docs	Faster
Token efficiency	Longer output	More concise

When to Use GPT-4 for Docs

GPT-4 is better when you need:

High volume, good-enough documentation
Translation or localization of existing docs
First draft that a human will heavily edit
Fast iteration on doc structure

Claude is better for:

Documentation that ships without human review
Runbooks and operational guides
API references from complex code
Explaining non-obvious design decisions