Claude Skills Guide

Claude Skill Token Usage Profiling and Optimization

Every token your Claude skill sends to the model costs money and affects response latency. As you build more complex skills—whether you’re working with the pdf skill for document processing, the tdd skill for test-driven development, or custom skills for your workflow—understanding token usage becomes essential for keeping costs down and responses fast. For techniques on actually shrinking your skill bodies, see Claude skill prompt compression techniques.

This guide covers practical methods for profiling token consumption in your skills and implementing optimization strategies that actually work.

Understanding Token Sources in Claude Skills

When you invoke a skill, tokens come from three main sources:

  1. Skill body tokens: The content of your .md skill file, including the system prompt and any examples
  2. Context tokens: Files, conversation history, and additional context you load
  3. Output tokens: The model’s response

The skill body is often the easiest to measure and optimize. Let’s start there.

Profiling Your Skill’s Token Usage

Measuring Skill Body Tokens

The simplest way to measure your skill file’s token count is by using a simple script. Create a helper skill or use the bash skill to count tokens:

import tiktoken

def count_tokens(text, model="claude-3-5-sonnet-20241022"):
    encoding = tiktoken.get_encoding("claude")
    return len(encoding.encode(text))

with open("skills/your-skill.md", "r") as f:
    content = f.read()
    tokens = count_tokens(content)
    print(f"Skill body: {tokens} tokens")

This gives you a baseline. Most production skills should stay under 2000 tokens for the skill body alone. If you’re seeing 5000+ tokens, you have optimization opportunities.

Tracking Real-World Usage

For skills you actively use, enable token tracking in your Claude configuration. Add this to your ~/.claude/settings.json:

{
  "verbose": true,
  "showTokenCounts": true
}

This displays token usage after each response, showing you exactly how many tokens were sent and received. Use this data to identify which skills consume the most tokens during typical workflows.

Optimization Strategies That Work

Strategy 1: Trim Your System Prompts

The most effective optimization is reducing skill body tokens. Review your skill prompts for these common issues:

Remove redundant instructions: Claude already knows basic coding practices. Don’t restate fundamentals:

❌ AVOID:
"Write clean code with proper variable names. Use functions to organize logic.
Avoid global variables. Comment your code."

✅ PREFER:
"Follow the existing code style in src/. Add JSDoc comments to public functions."

Use implicit instructions: Instead of explicit rules, show examples:

❌ AVOID:
"When returning JSON, always include proper indentation of 2 spaces.
Never omit the closing brace. Always use double quotes for strings."

✅ PREFER:
"Return responses in this JSON format:
{
  \"status\": \"success\",  // operation result
  \"data\": {...}         // response payload
}"

Strategy 2: Lazy-Load Context

Skills like frontend-design or canvas-design often need reference files but don’t need everything upfront. Use conditional context loading:

## Context Loading

Only load these files when specifically requested:
- design-system.css (when working on UI components)
- assets/ directory (when asked about visuals)

Do not load large asset directories unless explicitly requested.

This prevents the skill from consuming tokens reading files it won’t need.

Strategy 3: Template Responses

For skills that produce structured output—like the xlsx skill for spreadsheets or the docx skill for documents—provide output templates directly in the skill:

## Output Format

Generate files matching these templates:

config.yaml:
```yaml
database:
  host: localhost
  port: 5432
  name: app_db

This approach reduces tokens spent on formatting instructions while ensuring consistent output.

Strategy 4: Split Large Skills

If a skill exceeds 3000 tokens, split it into related skills with a shared base. Create a core skill other skills reference:

skills/
├── _core.md          # Shared patterns and rules (~800 tokens)
├── pdf-edit.md       # Uses _core for editing (~400 tokens)
├── pdf-extract.md    # Uses _core for extraction (~400 tokens)

Reference the core in each skill’s prompt:

You follow the conventions in _core.md. Apply those patterns to PDF tasks.

This modular approach keeps individual skills lean while maintaining consistency.

Practical Example: Optimizing a TDD Skill

Consider a tdd skill that started at 4200 tokens. Here’s the optimization process:

Before optimization:

After optimization:

Final token count: 1300 tokens (69% reduction)

The optimized skill:

Measuring Your Optimizations

After implementing changes, measure the impact:

  1. Token delta: Compare token counts before and after
  2. Response quality: Ensure output quality hasn’t degraded
  3. Latency: Check if response time improved
  4. Cost: Calculate savings at your usage volume

Create a simple tracking table:

Skill Before After Savings Quality Check
tdd 4200 1300 69% ✓ Same
pdf-extract 2800 1900 32% ✓ Same
frontend 5100 3400 33% ✓ Same

When to Optimize

Not every skill needs aggressive optimization. Optimize when:

For skills you use occasionally, the optimization effort likely isn’t worth the time savings.

Building Token-Aware Skills

As you create new skills, keep token efficiency in mind from the start:

  1. Write tight prompts: Every word should earn its place
  2. Use examples sparingly: One good example beats three mediocre ones
  3. Reference over repeat: Point to external docs rather than copying content
  4. Test during development: Measure tokens as you iterate

Skills like supermemory for knowledge management benefit from this approach from day one.

Conclusion

Token optimization isn’t about stripping valuable content—it’s about removing redundancy and letting Claude’s base capabilities handle what they already know. The skills that perform best are often the leanest, because less context means faster responses and clearer guidance.

Start by measuring your current skills, then apply these strategies systematically. You’ll likely find that 30-50% token reduction is achievable without any loss in output quality.

Built by theluckystrike — More at zovo.one