Claude Skill Token Usage Profiling and Optimization

Every token your Claude skill sends to the model costs money and affects response latency. As you build more complex skills—whether you’re working with the pdf skill for document processing, the tdd skill for test-driven development, or custom skills for your workflow—understanding token usage becomes essential for keeping costs down and responses fast. For techniques on actually shrinking your skill bodies, see Claude skill prompt compression techniques.

This guide covers practical methods for profiling token consumption in your skills and implementing optimization strategies that actually work.

Understanding Token Sources in Claude Skills

When you invoke a skill, tokens come from three main sources:

Skill body tokens: The content of your .md skill file, including the system prompt and any examples
Context tokens: Files, conversation history, and additional context you load
Output tokens: The model’s response

The skill body is often the easiest to measure and optimize. Let’s start there.

Profiling Your Skill’s Token Usage

Measuring Skill Body Tokens

The simplest way to measure your skill file’s token count is by using a simple script. Create a helper skill or use the bash skill to count tokens:

import tiktoken

def count_tokens(text, model="claude-3-5-sonnet-20241022"):
    encoding = tiktoken.get_encoding("claude")
    return len(encoding.encode(text))

with open("skills/your-skill.md", "r") as f:
    content = f.read()
    tokens = count_tokens(content)
    print(f"Skill body: {tokens} tokens")

This gives you a baseline. Most production skills should stay under 2000 tokens for the skill body alone. If you’re seeing 5000+ tokens, you have optimization opportunities.

Tracking Real-World Usage

For skills you actively use, enable token tracking in your Claude configuration. Add this to your ~/.claude/settings.json:

{
  "verbose": true,
  "showTokenCounts": true
}

This displays token usage after each response, showing you exactly how many tokens were sent and received. Use this data to identify which skills consume the most tokens during typical workflows.

Optimization Strategies That Work

Strategy 1: Trim Your System Prompts

The most effective optimization is reducing skill body tokens. Review your skill prompts for these common issues:

Remove redundant instructions: Claude already knows basic coding practices. Don’t restate fundamentals:

❌ AVOID:
"Write clean code with proper variable names. Use functions to organize logic.
Avoid global variables. Comment your code."

✅ PREFER:
"Follow the existing code style in src/. Add JSDoc comments to public functions."

Use implicit instructions: Instead of explicit rules, show examples:

❌ AVOID:
"When returning JSON, always include proper indentation of 2 spaces.
Never omit the closing brace. Always use double quotes for strings."

✅ PREFER:
"Return responses in this JSON format:
{
  \"status\": \"success\",  // operation result
  \"data\": {...}         // response payload
}"

Strategy 2: Lazy-Load Context

Skills like frontend-design or canvas-design often need reference files but don’t need everything upfront. Use conditional context loading:

## Context Loading

Only load these files when specifically requested:
- design-system.css (when working on UI components)
- assets/ directory (when asked about visuals)

Do not load large asset directories unless explicitly requested.

This prevents the skill from consuming tokens reading files it won’t need.

Strategy 3: Template Responses

For skills that produce structured output—like the xlsx skill for spreadsheets or the docx skill for documents—provide output templates directly in the skill:

## Output Format

Generate files matching these templates:

config.yaml:
```yaml
database:
  host: localhost
  port: 5432
  name: app_db

This approach reduces tokens spent on formatting instructions while ensuring consistent output.

Strategy 4: Split Large Skills

If a skill exceeds 3000 tokens, split it into related skills with a shared base. Create a core skill other skills reference:

skills/
├── _core.md          # Shared patterns and rules (~800 tokens)
├── pdf-edit.md       # Uses _core for editing (~400 tokens)
├── pdf-extract.md    # Uses _core for extraction (~400 tokens)

Reference the core in each skill’s prompt:

You follow the conventions in _core.md. Apply those patterns to PDF tasks.

This modular approach keeps individual skills lean while maintaining consistency.

Practical Example: Optimizing a TDD Skill

Consider a tdd skill that started at 4200 tokens. Here’s the optimization process:

Before optimization:

Full explanation of TDD principles: 800 tokens
Multiple examples of test patterns: 1200 tokens
Detailed rules for each language: 1500 tokens
General coding guidelines: 700 tokens

After optimization:

Reference to “TDD fundamentals” (user already knows): 0 tokens
One concrete example per major language (600 tokens)
Language-specific rules only when detected (400 tokens)
Shortened coding guidelines, focusing on project-specific: 300 tokens

Final token count: 1300 tokens (69% reduction)

The optimized skill:

Costs less per invocation
Responds faster due to shorter context
Still produces the same quality output because the removed content was redundant

Measuring Your Optimizations

After implementing changes, measure the impact:

Token delta: Compare token counts before and after
Response quality: Ensure output quality hasn’t degraded
Latency: Check if response time improved
Cost: Calculate savings at your usage volume

Create a simple tracking table:

Skill	Before	After	Savings	Quality Check
tdd	4200	1300	69%	✓ Same
pdf-extract	2800	1900	32%	✓ Same
frontend	5100	3400	33%	✓ Same

When to Optimize

Not every skill needs aggressive optimization. Optimize when:

You use the skill frequently (daily or more)
The skill exceeds 2500 tokens
You’re noticing latency issues
Token costs are impacting your budget

For skills you use occasionally, the optimization effort likely isn’t worth the time savings.

Building Token-Aware Skills

As you create new skills, keep token efficiency in mind from the start:

Write tight prompts: Every word should earn its place
Use examples sparingly: One good example beats three mediocre ones
Reference over repeat: Point to external docs rather than copying content
Test during development: Measure tokens as you iterate

Skills like supermemory for knowledge management benefit from this approach from day one.

Conclusion

Token optimization isn’t about stripping valuable content—it’s about removing redundancy and letting Claude’s base capabilities handle what they already know. The skills that perform best are often the leanest, because less context means faster responses and clearer guidance.

Start by measuring your current skills, then apply these strategies systematically. You’ll likely find that 30-50% token reduction is achievable without any loss in output quality.

Claude Code Token Usage Optimization Best Practices Guide — If you’re a user rather than a skill author, this covers session-level strategies: prompt structuring, context window management, and output constraints.
Claude Skill Prompt Compression Techniques — Apply compression techniques directly to the skill bodies you profile for fastest results.
Claude Skills Token Optimization: Reduce API Costs Guide — Extend token profiling results with broader API cost reduction strategies.
Measuring Claude Code Skill Efficiency Metrics — Go beyond token counts to measure full skill efficiency including task completion rates and time to output.
Advanced Claude Skills — Discover more advanced performance patterns for building production-quality skills.

Built by theluckystrike — More at zovo.one