AI Tools Compared

Technical documentation has a quality ceiling that most AI tools don’t reach: accurate code examples, the right level of detail for the audience, and explanations that don’t talk down to experienced engineers. This guide compares Claude and GPT-4 across four documentation types that actually get written in engineering teams.

Test 1: API Reference from Code

Both models were given the same Python function with a docstring stub and asked to generate API reference documentation.

```python
def create_payment_intent(
    amount: int,
    currency: str,
    customer_id: str | None = None,
    metadata: dict | None = None,
    idempotency_key: str | None = None
) -> PaymentIntent:
    """
    Creates a payment intent for a given amount.
    """
```

GPT-4’s output:

## create_payment_intent

Creates a payment intent for a given amount.

### Parameters

- `amount` (int): The amount to charge.
- `currency` (str): The currency code.
- `customer_id` (str, optional): The customer ID.
- `metadata` (dict, optional): Metadata.
- `idempotency_key` (str, optional): Idempotency key.

### Returns

`PaymentIntent`: The created payment intent.

Accurate but generic. No constraints on amount, no explanation of when to use idempotency_key, no error scenarios.

Claude’s output:

## create_payment_intent

Creates a payment intent representing a pending charge that can be confirmed
or cancelled. Payment intents are the recommended way to handle complex
payment flows, including 3D Secure authentication.

### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `amount` | `int` | Yes | Amount in the **smallest currency unit** (cents for USD). Minimum: 50. |
| `currency` | `str` | Yes | ISO 4217 currency code (e.g., `"usd"`, `"eur"`). Lowercase. |
| `customer_id` | `str \| None` | No | Customer ID from `create_customer()`. Attach to track payment history. |
| `metadata` | `dict \| None` | No | Up to 50 key-value pairs. Keys max 40 chars, values max 500 chars. |
| `idempotency_key` | `str \| None` | No | UUID to safely retry on network failure without double-charging. |

### Returns

`PaymentIntent` — See [PaymentIntent object reference](#paymentintent-object).

### Raises

- `InvalidAmountError`: `amount` is below the minimum (50 cents) or exceeds the maximum.
- `CurrencyNotSupportedError`: `currency` is not in the supported list.
- `CustomerNotFoundError`: `customer_id` does not exist.

### Example

```python
import uuid
from payments import create_payment_intent

# Charge $29.99 USD, safe to retry
intent = create_payment_intent(
    amount=2999,  # $29.99 in cents
    currency="usd",
    customer_id="cus_abc123",
    metadata={"order_id": "ord_xyz"},
    idempotency_key=str(uuid.uuid4()),
)
print(f"Intent created: {intent.id}, status: {intent.status}")
```

Notes

Claude inferred the amount is in cents, provided a realistic example with uuid4() for idempotency, and added error types that make logical sense even without seeing the implementation.
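
The two details Claude inferred, amounts expressed in the smallest currency unit and an idempotency key that makes retries safe, can be illustrated with a minimal sketch. `FakePaymentBackend` and `to_minor_units` are hypothetical names introduced here for illustration only; they are not part of the function under test or of any real payments library, and a real service would persist idempotency keys rather than hold them in memory.

```python
import uuid
from decimal import Decimal


class FakePaymentBackend:
    """Hypothetical in-memory stand-in for a payment service (illustration only)."""

    def __init__(self):
        self._intents_by_key = {}
        self.charges_created = 0

    def create_payment_intent(self, amount, currency, idempotency_key=None):
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key is not None and idempotency_key in self._intents_by_key:
            return self._intents_by_key[idempotency_key]
        self.charges_created += 1
        intent = {
            "id": f"pi_{self.charges_created}",
            "amount": amount,
            "currency": currency,
        }
        if idempotency_key is not None:
            self._intents_by_key[idempotency_key] = intent
        return intent


def to_minor_units(amount: Decimal, exponent: int = 2) -> int:
    """Convert a decimal amount to integer minor units (dollars to cents by default)."""
    return int((amount * (10 ** exponent)).to_integral_value())


backend = FakePaymentBackend()
key = str(uuid.uuid4())
amount = to_minor_units(Decimal("29.99"))

# Simulate a client retrying the same request after a network failure.
first = backend.create_payment_intent(amount, "usd", idempotency_key=key)
retry = backend.create_payment_intent(amount, "usd", idempotency_key=key)
```

Because both calls share one key, the retry returns the original intent and only one charge is recorded. The default exponent of 2 is an assumption that fits USD and EUR; currencies with a different number of minor-unit digits would need a different value.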

Test 2: Runbook for On-Call Engineers

Prompt: “Write a runbook section for: payment service P95 latency exceeds 2 seconds.”

Both models produced runbook entries, but the structure differed significantly.

GPT-4’s runbook:

## Payment Service High Latency

If P95 latency > 2 seconds:

1. Check the payment service dashboard
2. Look for errors in the logs
3. Check database connections
4. Restart the service if needed
5. Escalate if issue persists

Generic enough to apply to any service. No specific commands, no metrics to check.

Claude’s runbook:

## PYM-001: Payment Service P95 Latency > 2s

**Severity**: P2 — escalate to P1 if >5s or error rate >1%
**On-call contact**: #payments-oncall

### 1. Triage (2 minutes)

```bash
# Current P95 across all instances
kubectl exec -n payments deploy/payment-svc -- \
  curl -s localhost:9090/metrics | grep http_request_duration_p95

# Error rate trend
kubectl logs -n payments -l app=payment-svc --since=10m | \
  grep '"level":"error"' | wc -l
```

Check: Payments Latency Dashboard

### 2. Identify the Layer (5 minutes)

**Database latency?**

```bash
kubectl exec -n payments deploy/payment-svc -- \
  curl -s localhost:9090/metrics | grep db_query_duration_p95
```

If `db_query_duration_p95` > 1.5s → go to [DB Runbook PYM-DB-001]

**External API (Stripe) latency?**

```bash
kubectl exec -n payments deploy/payment-svc -- \
  curl -s localhost:9090/metrics | grep stripe_api_duration_p95
```

If `stripe_api_duration_p95` > 500ms → check Stripe Status Page

**Pod resource saturation?**

```bash
kubectl top pods -n payments -l app=payment-svc
```

If CPU > 80% or memory > 85% → scale horizontally:

```bash
kubectl scale deploy/payment-svc -n payments --replicas=6
```

### 3. Known Causes

| Symptom | Likely Cause | Fix |
|---|---|---|
| Slow around :00/:30 minute | Cron job running reports | Move cron to off-peak |
| Latency after deploy | N+1 query in new code | Roll back, see PYM-002 |
| Gradual increase over hours | Connection pool exhaustion | Restart payment-svc |

### 4. Escalation

Escalate to #payments-eng if:

Claude’s runbook is immediately actionable with real commands and thresholds. GPT-4’s requires a senior engineer to translate into actual steps.
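
What makes the thresholds actionable is that they can be written down as decision logic. The sketch below encodes the numbers from Claude's runbook (2s and 5s latency, 1% error rate, 1.5s database P95, 500ms Stripe P95, 80% CPU, 85% memory). The `Metrics` class and both function names are hypothetical, and the values are passed in by hand rather than scraped from a real metrics endpoint.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical snapshot of the values the runbook asks the engineer to check."""
    p95_latency_s: float
    error_rate: float        # fraction of requests, so 0.01 means 1%
    db_query_p95_s: float
    stripe_api_p95_s: float
    cpu_pct: float
    memory_pct: float


def severity(m: Metrics) -> str:
    """Classify this alert only: below 2s P95 latency the runbook does not apply."""
    if m.p95_latency_s <= 2.0:
        return "no alert"
    if m.p95_latency_s > 5.0 or m.error_rate > 0.01:
        return "P1"
    return "P2"


def next_steps(m: Metrics) -> list[str]:
    """Return the runbook branches whose thresholds are exceeded."""
    steps = []
    if m.db_query_p95_s > 1.5:
        steps.append("Follow DB runbook PYM-DB-001")
    if m.stripe_api_p95_s > 0.5:
        steps.append("Check Stripe status page")
    if m.cpu_pct > 80 or m.memory_pct > 85:
        steps.append("Scale payment-svc horizontally")
    return steps


sample = Metrics(
    p95_latency_s=2.6,
    error_rate=0.002,
    db_query_p95_s=1.8,
    stripe_api_p95_s=0.2,
    cpu_pct=55,
    memory_pct=60,
)
print(severity(sample), next_steps(sample))
```

GPT-4's five steps cannot be encoded this way because they name no metric and no threshold to compare against.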

Test 3: Tutorial Writing

Prompt: “Write an introduction section for a tutorial on setting up Redis caching in a FastAPI app.”

Both models write clear prose. The difference is in code example quality and conceptual accuracy.

GPT-4 introduction:

In this tutorial, we’ll learn how to add Redis caching to a FastAPI application. Redis is an in-memory database that can significantly speed up your app. We’ll cover installation, configuration, and implementing a simple cache.

Claude introduction:

FastAPI’s async architecture makes it a natural fit for Redis — both are non-blocking by design. Adding a cache layer with redis-py’s async client reduces database load for read-heavy endpoints without changing your API surface.

By the end of this tutorial, you’ll have a working cache-aside pattern: on cache miss, fetch from the database and write to Redis with a TTL; on cache hit, return the cached value and skip the database entirely.

What you’ll build:

Prerequisites: FastAPI app running, Redis 7+ available (docker run -p 6379:6379 redis:7)

Claude explains why Redis fits FastAPI (async alignment), describes the cache-aside pattern before showing code, and gives a concrete prerequisite with a runnable Docker command. GPT-4 uses generic tutorial phrasing.
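
The cache-aside pattern Claude describes can be sketched in a few lines. To keep the example runnable without a Redis server, it uses a hypothetical `InMemoryCache` that mimics the `get` and `set(..., ex=ttl)` calls of redis-py's async client; swapping in `redis.asyncio.Redis` is the assumed next step and is not shown. The `fetch_user_from_db` function and the `user:` key format are also placeholders.

```python
import asyncio
import json
import time


class InMemoryCache:
    """Hypothetical stand-in with the same get/set(ex=...) shape as an async Redis client."""

    def __init__(self):
        self._data = {}

    async def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Entry outlived its TTL, so treat it as a miss.
            del self._data[key]
            return None
        return value

    async def set(self, key, value, ex):
        self._data[key] = (value, time.monotonic() + ex)


db_calls = 0


async def fetch_user_from_db(user_id):
    """Placeholder for a real database query; counts calls for demonstration."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": "Ada"}


async def get_user(cache, user_id, ttl_seconds=60):
    key = f"user:{user_id}"
    cached = await cache.get(key)
    if cached is not None:
        # Cache hit: return the cached value and skip the database.
        return json.loads(cached)
    # Cache miss: fetch from the database, then write to the cache with a TTL.
    user = await fetch_user_from_db(user_id)
    await cache.set(key, json.dumps(user), ex=ttl_seconds)
    return user


async def main():
    cache = InMemoryCache()
    first = await get_user(cache, 42)
    second = await get_user(cache, 42)
    return first, second


first, second = asyncio.run(main())
```

The second lookup is served from the cache, so the placeholder database function runs only once.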

Test 4: README Generation from Code

Claude consistently produces better READMEs by inferring purpose and usage from code structure. Given a Python CLI tool with argparse, Claude generates:

GPT-4 produces accurate but generic README templates that require significant editing.
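
To show how much usage information an argparse-based CLI already carries in its code, here is a small sketch that renders a parser's own help text as a README usage section. The `logtrim` tool, its arguments, and the `## Usage` heading are all invented for illustration; only `ArgumentParser` and its `format_help()` method come from the standard library.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical CLI tool named logtrim."""
    parser = argparse.ArgumentParser(
        prog="logtrim",
        description="Trim log files older than a cutoff.",
    )
    parser.add_argument("path", help="Directory containing log files")
    parser.add_argument("--days", type=int, default=30,
                        help="Delete files older than this many days")
    parser.add_argument("--dry-run", action="store_true",
                        help="List files without deleting them")
    return parser


def readme_usage_section(parser: argparse.ArgumentParser) -> str:
    """Wrap the parser's help text in a fenced block under a Usage heading."""
    fence = "`" * 3
    help_text = parser.format_help().rstrip()
    return f"## Usage\n\n{fence}\n{help_text}\n{fence}\n"


section = readme_usage_section(build_parser())
print(section)
```

Help text of this kind covers flags and defaults but not purpose or typical workflows, which is the part a model has to infer from the rest of the code.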

Side-by-Side Score

| Documentation Type | Claude | GPT-4 |
|---|---|---|
| API reference | Parameter constraints, errors, examples | Accurate but generic |
| Runbooks | Specific commands and thresholds | Needs translation to commands |
| Tutorials | Pattern explanation + code | Clear but surface-level |
| README generation | Infers intent from code | Generic template |
| Speed | Slower for long docs | Faster |
| Token efficiency | Longer output | More concise |

When to Use GPT-4 for Docs

GPT-4 is better when you need:

Claude is better for: