The Claude API bills extended thinking tokens at the standard output token rate. When extended thinking is enabled, the model generates internal reasoning tokens before the final answer, and those tokens count toward your output token total. Your output token cost therefore grows with the complexity of the reasoning task, even when the final answer itself is short. Below is a practical breakdown of the billing mechanism, cost implications, and strategies to optimize your spending.

What Is Extended Thinking in Claude API

Extended thinking is a feature that allows Claude models to engage in deeper reasoning before producing their final response. When enabled, the model breaks down complex problems, explores multiple approaches, and reasons through its solution before delivering the actual output. This results in more thoughtful, accurate responses for tasks that require complex reasoning, coding, or analysis.

The feature works by having the model generate reasoning tokens before its final answer. These are returned as separate "thinking" content blocks rather than as part of the answer itself (on Claude 4 models the thinking is summarized, though you are billed for the full reasoning token count), and they are processed and billed as output tokens. These reasoning tokens represent the model's "thought process" as it works through your request.

How Output Tokens Are Billed

When using the Claude API, you’re charged based on the number of tokens processed—both input tokens (what you send) and output tokens (what the model generates). Extended thinking specifically affects output token billing because the reasoning process generates additional tokens beyond the visible response.

Here's the key point about the cost structure: extended thinking increases output tokens because the model generates reasoning tokens before producing its final answer. These reasoning tokens count toward both your max_tokens limit and your billed output total, at the standard output token rate.

Current Pricing Structure

Claude API pricing varies by model. Here’s a general breakdown for the main models:

Model               Input (per 1M tokens)   Output (per 1M tokens)
Claude 3.5 Haiku    $0.80                   $4.00
Claude 3.5 Sonnet   $3.00                   $15.00
Claude 3 Opus       $15.00                  $75.00

When you enable extended thinking, the output tokens include both the visible response and the internal reasoning tokens. The exact number of reasoning tokens depends on the complexity of your request. Note that extended thinking is only available on newer models (Claude 3.7 Sonnet and the Claude 4 family); Claude Sonnet 4 and Claude Opus 4 are priced at the same rates as Claude 3.5 Sonnet and Claude 3 Opus, respectively.
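To make the rates concrete, here is a small sketch that prices a hypothetical request at Claude 3.5 Sonnet rates ($3 input / $15 output per million tokens, from the table above), with and without reasoning tokens folded into the output count. The token counts are made-up illustrations, not measured values:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Return the USD cost of one request at per-million-token rates."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# Hypothetical request: 500 input tokens, 800 visible answer tokens
base = request_cost(500, 800)                  # answer only
with_thinking = request_cost(500, 800 + 4000)  # plus 4,000 reasoning tokens

print(f"Without thinking: ${base:.4f}")        # -> $0.0135
print(f"With thinking:    ${with_thinking:.4f}")  # -> $0.0735
```

Even though the visible answer is identical in both cases, the reasoning tokens multiply the output cost several times over.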

Practical Code Examples

Here's how to enable extended thinking in your API calls. It is turned on with the thinking parameter, whose budget_tokens value caps how many reasoning tokens the model may spend (the budget must be at least 1,024 and less than max_tokens):

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 8000,  # cap on reasoning tokens; minimum 1024
    },
    messages=[
        {"role": "user", "content": "Explain how quicksort works and implement it in Python"}
    ]
)

print(message.content)  # a "thinking" block followed by the final "text" block
print(f"Usage: {message.usage}")

In this example, message.usage will show both input and output tokens, including any reasoning tokens generated by extended thinking.

For Node.js applications, the equivalent code looks like this:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const message = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 8000, // cap on reasoning tokens; minimum 1024
  },
  messages: [
    {
      role: 'user',
      content: 'Explain how quicksort works and implement it in Python'
    }
  ],
});

console.log(message.content);
console.log(message.usage);

Understanding Token Usage in Responses

To understand exactly how extended thinking affects your billing, examine the usage object in the API response:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Solve this complex problem..."}]
)

# Usage breakdown
print(f"Input tokens: {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
# Extended thinking tokens are included in output_tokens

The output_tokens field includes both the visible response tokens and the reasoning tokens. You can estimate the reasoning token count by comparing output counts between requests with and without extended thinking for similar queries.
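One way to put a number on that comparison is to diff the output counts of a thinking run and a plain run of the same prompt. This is a rough heuristic, not an exact measurement, because the visible answers themselves can vary in length between runs:

```python
def estimate_thinking_tokens(output_with_thinking: int,
                             output_without_thinking: int) -> int:
    """Rough estimate of reasoning tokens: the extra output tokens a thinking
    run used compared to a plain run of the same prompt. Clamped at zero since
    answer lengths naturally vary between runs."""
    return max(0, output_with_thinking - output_without_thinking)

# e.g. usage.output_tokens from two runs of the same prompt
print(estimate_thinking_tokens(5200, 450))  # -> 4750
```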

Cost Optimization Strategies

Managing costs with extended thinking requires careful consideration of when to use the feature:

Use extended thinking for:

- Multi-step math, logic, and planning problems
- Non-trivial coding work such as debugging, refactoring, and architecture decisions
- Analysis that requires weighing trade-offs or evidence

Skip extended thinking for:

- Simple lookups and short factual answers
- Formatting, rewriting, and translation tasks
- Clear-cut classification or extraction
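One simple way to operationalize these guidelines is a keyword-based router that decides whether a prompt warrants the extra output-token spend. This is a crude sketch, not a production classifier; the keyword list and length threshold are illustrative assumptions:

```python
# Keywords that suggest a multi-step reasoning task (illustrative, not exhaustive)
REASONING_HINTS = ("prove", "debug", "design", "optimize", "step by step",
                   "trade-off", "architecture", "why")

def should_enable_thinking(prompt: str) -> bool:
    """Enable extended thinking for prompts that look like reasoning tasks."""
    lowered = prompt.lower()
    return len(prompt) > 400 or any(hint in lowered for hint in REASONING_HINTS)

print(should_enable_thinking("What is the capital of France?"))         # -> False
print(should_enable_thinking("Debug this race condition step by step")) # -> True
```

In practice you would tune the heuristic against your own traffic, or replace it with a cheap classifier call, before relying on it to gate spend.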

You can also control costs by capping the thinking budget directly, rather than letting the model spend freely on reasoning:

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    # budget_tokens caps reasoning spend; it must be >= 1024 and < max_tokens
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Your prompt here"}]
)

Monitoring Your Spending

Implement logging to track your extended thinking usage over time:

import time
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

# Claude Sonnet 4 list prices (USD per 1M tokens)
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

def track_request(prompt, enable_extended_thinking=True):
    kwargs = {}
    if enable_extended_thinking:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 4096}

    start_time = time.time()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=8192,  # must exceed the thinking budget
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    duration = time.time() - start_time

    cost = (response.usage.input_tokens / 1_000_000 * INPUT_RATE) + \
           (response.usage.output_tokens / 1_000_000 * OUTPUT_RATE)

    print(f"Tokens: {response.usage.input_tokens} in / {response.usage.output_tokens} out")
    print(f"Cost: ${cost:.4f}")
    print(f"Duration: {duration:.2f}s")

    return response, cost

# Test with extended thinking
response, cost = track_request("Explain the time complexity of merge sort")

Conclusion

Extended thinking in Claude API provides deeper reasoning capabilities at the cost of additional output tokens. The feature is billed at standard output token rates, with reasoning tokens included in your total output count. For complex tasks requiring thorough reasoning, the additional cost often delivers significantly better results. For simpler tasks, you can disable the feature and save on token costs.

Monitor your usage patterns, set appropriate limits, and enable extended thinking selectively to balance cost and performance for your specific use case.
