Claude API tool use (function calling) is billed as standard API token usage with no separate pricing category – tool definitions count as input tokens, tool call blocks count as output tokens, and tool results count as input tokens on the next turn. For example, a typical multi-tool conversation on Sonnet 4 costs roughly $0.008, with costs accumulating across all token exchanges during the interaction. Understanding these mechanics helps you estimate costs accurately and design efficient tool integrations.

How Tool Use Works in Claude API

Tool use (also called function calling) allows Claude to request that your application execute specific functions and return the results back to the model. This creates a powerful loop where the AI can gather real-time information, perform calculations, or manipulate external data.

When you define tools in your API request, Claude can decide to call them based on the user’s query. The model doesn’t execute functions directly—it generates a structured response indicating which tool to call and with what parameters. Your application then executes the function and feeds the results back to continue the conversation.

Here’s a practical example showing how to define tools and handle the response:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "What's the weather like in Tokyo?"
    }]
)

# Check if the model wants to call a tool
for block in message.content:
    if block.type == "tool_use":
        tool_name = block.name
        tool_input = block.input
        tool_id = block.id
        print(f"Tool called: {tool_name}")
        print(f"Input: {tool_input}")

Token Counting for Tool Use

Token counting with tool use follows the same principles as standard API calls, but there are specific considerations for each phase of the interaction. Understanding these details helps you estimate costs accurately.

Input Tokens

Input tokens include your system prompt, the conversation history, tool definitions, and any context you provide. Tool definitions are particularly important—they consume tokens based on their complexity. A simple function with few parameters uses fewer tokens than a complex one with detailed descriptions and nested schemas.

Each tool definition contributes to your input token count. If you define multiple tools, weigh the cost benefit of including them versus the risk of the model selecting the wrong tool.

Output Tokens

Output tokens include the model’s responses, including tool call requests. When Claude generates a tool call, the entire tool_use block counts toward your output token limit. This includes the tool name, the input parameters, and any reasoning the model includes.

After you execute a tool and return results, those results become part of the input for the next turn. This means conversation history grows with each tool interaction, and your input costs accumulate over extended conversations.

Tool Result Tokens

When you return tool results to the model, those contents count as input tokens in the subsequent request. The size depends on how much data your function returns. A simple boolean or string uses minimal tokens, while returning large JSON objects or API responses can significantly increase your token usage.

For cost-effective implementations, return only the necessary data from your tools rather than dumping entire API responses:

# Inefficient - returns too much data
def get_weather(location):
    api_response = weather_api.get_full_forecast(location)
    return api_response  # May include metadata, timestamps, etc.

# Efficient - returns only what's needed
def get_weather(location):
    api_response = weather_api.get_full_forecast(location)
    return {
        "temperature": api_response.current.temp,
        "conditions": api_response.current.conditions,
        "humidity": api_response.current.humidity
    }

Pricing Structure

Anthropic’s API pricing follows a per-token model. As of 2026, the pricing tiers differ by model:

Model Input (per 1M tokens) Output (per 1M tokens)
Claude Opus 4 $15.00 $75.00
Claude Sonnet 4 $3.00 $15.00
Claude Haiku 4 $0.80 $4.00

Tool use doesn’t have a separate pricing category—it’s billed as regular API usage based on input and output tokens. A conversation with multiple tool calls accumulates costs across all the tokens exchanged during the entire interaction.

Estimating Tool Use Costs

To estimate costs for a typical tool-using conversation, consider this scenario:

  1. Initial request: 100 tokens (user message) + 200 tokens (tool definitions) = 300 input
  2. First tool call: 50 output tokens (tool_use block)
  3. Tool result: 150 input tokens (returned data)
  4. Model response: 80 output tokens (text response)
  5. Second tool call: 45 output tokens
  6. Second result: 100 input tokens
  7. Final response: 60 output tokens

Total: 700 input tokens + 235 output tokens at Sonnet pricing = approximately $0.0077 per conversation.

Best Practices for Cost Management

Implementing tool use efficiently requires thoughtful design. Here are practical strategies to optimize your implementation:

Define precise tool descriptions. The model uses your description to decide whether to call a tool. Vague or overly broad descriptions can lead to unnecessary tool calls, increasing costs without improving results.

Limit tool parameters. Only request the parameters you actually need. Extra fields that go unused still contribute to token counts in both definitions and outputs.

Cache tool definitions when possible. If you’re using the same tools across many requests, consider implementing caching to avoid redefining them each time (though this depends on your specific implementation architecture).

Truncate tool results strategically. After executing a tool, you can truncate or summarize the results before returning them to the model if you don’t need every detail. This reduces input tokens for subsequent requests.

Common Implementation Patterns

For developers building production systems, certain patterns emerge as most effective:

Sequential tool calls work well when each tool result feeds into the next decision. The model calls a tool, receives results, then decides whether to call another tool or provide a final response.

Parallel tool calls are possible when the model determines multiple tools can run independently. This reduces round trips but requires your application to handle multiple simultaneous tool executions.

Error handling is critical. When a tool fails, return a clear error message to the model so it can attempt an alternative approach rather than repeating the failed call.

def execute_tool(tool_name, tool_input):
    try:
        if tool_name == "get_weather":
            return get_weather(tool_input["location"], tool_input.get("unit"))
        elif tool_name == "search_database":
            return search_db(tool_input["query"])
        else:
            return {"error": f"Unknown tool: {tool_name}"}
    except Exception as e:
        return {"error": str(e)}

Understanding these mechanics enables you to build robust applications that leverage Claude’s tool use capabilities while managing costs effectively. The key is designing clean, focused tools that return only necessary data, and structuring your implementation to minimize unnecessary token accumulation across conversation turns.

Built by theluckystrike — More at zovo.one