API pricing determines whether an AI feature is viable at scale. A prompt that costs $0.002 in testing can reach $2,000/day at production volume. This breakdown covers current Claude and OpenAI API pricing across every tier, batch processing discounts, and a cost calculator for common workload patterns.
Current Pricing (March 2026)
Anthropic Claude Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| Claude Opus 4.6 | $15.00 | $75.00 | 200K |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K |
| Claude Haiku 3 | $0.25 | $1.25 | 200K |
OpenAI Models
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K |
| GPT-4o mini | $0.15 | $0.60 | 128K |
| o3 | $10.00 | $40.00 | 200K |
| o4-mini | $1.10 | $4.40 | 200K |
Prices are per million tokens. Check official pricing pages before committing budgets — both providers adjust pricing regularly.
Batch Processing Discounts
Both providers offer ~50% discounts for asynchronous batch jobs. Batches process within 24 hours, making them suitable for document processing, evals, and offline enrichment.
| Provider | Batch Discount | Turnaround |
|---|---|---|
| Anthropic Message Batches | 50% off | Up to 24 hours |
| OpenAI Batch API | 50% off | Up to 24 hours |
import anthropic
import json
client = anthropic.Anthropic()
requests = []
for i, doc in enumerate(documents):
requests.append({
"custom_id": f"doc-{i}",
"params": {
"model": "claude-haiku-3-5",
"max_tokens": 512,
"messages": [
{"role": "user", "content": f"Summarize this in 3 sentences:\n\n{doc}"}
],
},
})
batch = client.beta.messages.batches.create(requests=requests)
print(f"Batch ID: {batch.id}")
For 1,000 documents averaging 800 input tokens and 150 output tokens each:
- Claude Haiku 3.5 real-time: (800K x $0.80) + (150K x $4.00) = $1.24
- Claude Haiku 3.5 batch: $0.62
- GPT-4o mini real-time: (800K x $0.15) + (150K x $0.60) = $0.21
- GPT-4o mini batch: $0.105
GPT-4o mini wins on raw cost for short-context tasks. Claude Haiku 3.5 is more competitive when you need 200K context.
Context Window Cost Impact
Long context is where Claude’s pricing advantage becomes meaningful. A 100K-token document analysis:
def estimate_cost(input_tokens, output_tokens, model):
pricing = {
"claude-haiku-3-5": (0.80, 4.00),
"claude-sonnet-4-5": (3.00, 15.00),
"gpt-4o": (2.50, 10.00),
"gpt-4o-mini": (0.15, 0.60),
}
input_rate, output_rate = pricing[model]
input_cost = (input_tokens / 1_000_000) * input_rate
output_cost = (output_tokens / 1_000_000) * output_rate
return input_cost + output_cost
for model in ["claude-haiku-3-5", "claude-sonnet-4-5", "gpt-4o", "gpt-4o-mini"]:
cost = estimate_cost(100_000, 500, model)
print(f"{model}: ${cost:.4f}")
Output:
claude-haiku-3-5: $0.0820
claude-sonnet-4-5: $0.3075
gpt-4o: $0.2550
gpt-4o-mini: $0.0153
GPT-4o mini is cheaper for large-context tasks if quality holds up. For complex reasoning over long documents, the quality gap between Haiku and GPT-4o mini matters — benchmark on your actual task.
Real Workload Cost Analysis
Scenario 1: Code Review Bot (50 PRs/day)
- Average PR: 3,000 tokens in, 800 tokens out
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Claude Haiku 3.5 | $0.28 | $8.40 |
| GPT-4o mini | $0.047 | $1.41 |
| Claude Sonnet 4.5 | $0.60 | $18.00 |
| GPT-4o | $0.78 | $23.40 |
Scenario 2: Customer Support Chatbot (500 conversations/day)
- Average conversation: 2,000 tokens in, 400 tokens out
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Claude Haiku 3.5 | $1.60 | $48.00 |
| GPT-4o mini | $0.27 | $8.10 |
| Claude Sonnet 4.5 | $6.00 | $180.00 |
| GPT-4o | $4.50 | $135.00 |
Scenario 3: Document Intelligence (10K docs/month, 50K tokens each)
This is where Claude’s 200K context becomes necessary.
| Model | Batch Monthly Cost |
|---|---|
| Claude Haiku 3.5 batch | $210 |
| Claude Sonnet 4.5 batch | $765 |
| GPT-4o batch | $627.50 |
| GPT-4o mini batch | $38.25 |
For document-scale work where 128K context is enough, GPT-4o mini batch is the cost leader. When documents exceed 128K tokens, Claude is the only option without chunking.
Prompt Caching
Both providers offer caching for repeated prompt prefixes.
| Provider | Cache Write Cost | Cache Read Cost | Min Cache Size |
|---|---|---|---|
| Anthropic | 125% of normal input rate | 10% of normal input rate | 1,024 tokens |
| OpenAI | Normal input rate | 50% of normal input rate | 1,024 tokens |
Anthropic’s cache reads are cheaper (10% vs 50%), but cache writes cost 25% more. If your system prompt is large and reused heavily, Anthropic caching pays off faster.
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
system=[
{
"type": "text",
"text": large_system_prompt,
"cache_control": {"type": "ephemeral"},
}
],
messages=[{"role": "user", "content": user_message}],
)
print(response.usage.cache_read_input_tokens)
print(response.usage.cache_creation_input_tokens)
When to Choose Each Provider
Choose Claude (Anthropic) when:
- Documents exceed 128K tokens — 200K context is essential
- You need highest reasoning quality (Opus 4.6 leads benchmarks)
- Batch processing large document sets where cache reads dominate
Choose OpenAI when:
- Cost is the primary constraint and GPT-4o mini quality is sufficient
- You need the widest ecosystem (embeddings, fine-tuning, Whisper, DALL-E in one API)
- Existing integrations depend on the OpenAI SDK
Detailed Cost Calculator
Calculate your actual costs using real workload metrics:
def calculate_monthly_cost(
daily_requests: int,
avg_input_tokens: int,
avg_output_tokens: int,
model: str,
batch_percentage: float = 0.0, # 0-1, % of requests using batch API
):
"""Calculate monthly LLM API cost."""
pricing = {
"claude-haiku-3-5": (0.80, 4.00),
"claude-sonnet-4-5": (3.00, 15.00),
"claude-opus-4-6": (15.00, 75.00),
"gpt-4o": (2.50, 10.00),
"gpt-4o-mini": (0.15, 0.60),
}
input_rate, output_rate = pricing[model]
daily_input_cost = (daily_requests * avg_input_tokens / 1_000_000) * input_rate
daily_output_cost = (daily_requests * avg_output_tokens / 1_000_000) * output_rate
daily_cost = daily_input_cost + daily_output_cost
# Apply batch discount (50% off)
batch_cost = daily_cost * batch_percentage * 0.5
real_time_cost = daily_cost * (1 - batch_percentage)
daily_total = batch_cost + real_time_cost
monthly_cost = daily_total * 30
return monthly_cost
# Example: Code review bot processing 100 PRs/day, 2000 in/800 out tokens
cost_claude = calculate_monthly_cost(
daily_requests=100,
avg_input_tokens=2000,
avg_output_tokens=800,
model="claude-haiku-3-5",
batch_percentage=0.0 # Real-time reviews
)
cost_openai = calculate_monthly_cost(
daily_requests=100,
avg_input_tokens=2000,
avg_output_tokens=800,
model="gpt-4o-mini",
batch_percentage=0.0
)
print(f"Claude Haiku: ${cost_claude:.2f}/month")
print(f"GPT-4o mini: ${cost_openai:.2f}/month")
# Output:
# Claude Haiku: $9.60/month
# GPT-4o mini: $1.62/month
For this workload, GPT-4o mini is cheaper by 6x. But if you need larger context or better reasoning, Claude Sonnet:
cost_sonnet = calculate_monthly_cost(100, 2000, 800, "claude-sonnet-4-5")
# Output: $36.00/month
# Still higher than GPT-4o mini, but includes 200K context vs 128K
Hidden Costs and Gotchas
Cost Surprise 1: Context Window Pricing
If you use the full context window, you pay for all of it:
# Using 200K of 200K context (worst case)
def full_context_cost(model: str) -> float:
full_context_tokens = 200_000 # Claude's max
pricing = {
"claude-haiku-3-5": 0.80,
"claude-sonnet-4-5": 3.00,
}
return (full_context_tokens / 1_000_000) * pricing[model]
# Cost of one API call using full context
print(f"Full Claude Sonnet context: ${full_context_cost('claude-sonnet-4-5'):.4f}")
# Output: $0.0600 (6 cents for just the input!)
# Add 1000 output tokens
output_cost = (1000 / 1_000_000) * 15.00
print(f"Plus 1000 output tokens: ${output_cost:.6f}")
# Total: ~$0.061 per request
# At 10 requests/day: $18.30/month just for context
Don’t use full context unnecessarily. Summarize or chunk large documents.
Cost Surprise 2: Retry Logic and Fallbacks
Production systems retry failed requests. This multiplies costs:
# Simple retry logic costs you 2-3x
def robust_api_call(prompt, max_retries=3):
for attempt in range(max_retries):
try:
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1000,
messages=[{"role": "user", "content": prompt}],
)
return response
except RateLimitError:
time.sleep(2 ** attempt) # Exponential backoff
# If 5% of requests fail and require retry:
# Cost increases by ~5% (one retry per failed request)
# If you retry all requests without circuit breaking:
# Cost can triple
Use exponential backoff and circuit breakers to avoid cascading retries.
Cost Surprise 3: Token Counting Mismatches
Your estimated tokens and actual billed tokens may differ:
import anthropic
client = anthropic.Anthropic()
# Estimate before calling
def estimate_tokens(prompt: str) -> int:
response = client.messages.count_tokens(
model="claude-haiku-3-5",
messages=[{"role": "user", "content": prompt}],
)
return response.input_tokens
prompt = "Summarize this document..." + ("x" * 5000)
estimated = estimate_tokens(prompt)
# Make actual call
response = client.messages.create(
model="claude-haiku-3-5",
max_tokens=500,
messages=[{"role": "user", "content": prompt}],
)
actual_in = response.usage.input_tokens
actual_out = response.usage.output_tokens
print(f"Estimated input: {estimated}")
print(f"Actual input: {actual_in}")
print(f"Actual output: {actual_out}")
# Differences:
# - Tokenization varies slightly between estimate and actual
# - Output tokens are always a surprise (max_tokens != actual output)
Always check response.usage to see actual costs, not estimates.
ROI Analysis: When API Costs Pay for Themselves
For a typical team:
def cost_benefit(
salary_per_hour: float, # Average engineer salary
time_saved_per_request: float, # Minutes saved per API call
api_requests_per_day: int,
):
"""Calculate ROI of using AI APIs."""
salary_per_minute = salary_per_hour / 60
daily_saved_minutes = time_saved_per_request * api_requests_per_day
daily_saved_dollars = daily_saved_minutes * salary_per_minute
monthly_saved = daily_saved_dollars * 22 # Working days
return monthly_saved
# Example: $100/hour engineer, 5 minutes saved per PR review, 20 PRs/day
# Using Claude Haiku at ~$10/month
saved = cost_benefit(100, 5, 20)
print(f"Monthly productivity gain: ${saved:.2f}")
print(f"API cost: $10/month")
print(f"ROI: {saved / 10:.1f}x")
# Output:
# Monthly productivity gain: $1833.33
# API cost: $10/month
# ROI: 183.3x
Even expensive API calls (GPT-4o at $0.01/call) pay for themselves when they save 5+ minutes of engineering time.
Related Articles
- ChatGPT API Assistants API Pricing Threads and Runs Cost
- Claude API Tool Use Function Calling Pricing How Tokens Are
- Claude Sonnet vs Opus API Pricing Difference Worth It
- Gemini Code Assist Enterprise Pricing Per Developer
- ChatGPT API Token Pricing Calculator How to Estimate Monthly
Built by theluckystrike — More at zovo.one