Claude Skill Token Usage Profiling and Optimization
Every token your Claude skill sends to the model costs money and affects response latency. As you build more complex skills—whether you’re working with the pdf skill for document processing, the tdd skill for test-driven development, or custom skills for your workflow—understanding token usage becomes essential for keeping costs down and responses fast. For techniques on actually shrinking your skill bodies, see Claude skill prompt compression techniques.
This guide covers practical methods for profiling token consumption in your skills and implementing optimization strategies that actually work.
Understanding Token Sources in Claude Skills
When you invoke a skill, tokens come from three main sources:
- Skill body tokens: The content of your
.mdskill file, including the system prompt and any examples - Context tokens: Files, conversation history, and additional context you load
- Output tokens: The model’s response
The skill body is often the easiest to measure and optimize. Let’s start there.
Profiling Your Skill’s Token Usage
Measuring Skill Body Tokens
The simplest way to measure your skill file’s token count is by using a simple script. Create a helper skill or use the bash skill to count tokens:
import tiktoken
def count_tokens(text, model="claude-3-5-sonnet-20241022"):
encoding = tiktoken.get_encoding("claude")
return len(encoding.encode(text))
with open("skills/your-skill.md", "r") as f:
content = f.read()
tokens = count_tokens(content)
print(f"Skill body: {tokens} tokens")
This gives you a baseline. Most production skills should stay under 2000 tokens for the skill body alone. If you’re seeing 5000+ tokens, you have optimization opportunities.
Tracking Real-World Usage
For skills you actively use, enable token tracking in your Claude configuration. Add this to your ~/.claude/settings.json:
{
"verbose": true,
"showTokenCounts": true
}
This displays token usage after each response, showing you exactly how many tokens were sent and received. Use this data to identify which skills consume the most tokens during typical workflows.
Optimization Strategies That Work
Strategy 1: Trim Your System Prompts
The most effective optimization is reducing skill body tokens. Review your skill prompts for these common issues:
Remove redundant instructions: Claude already knows basic coding practices. Don’t restate fundamentals:
❌ AVOID:
"Write clean code with proper variable names. Use functions to organize logic.
Avoid global variables. Comment your code."
✅ PREFER:
"Follow the existing code style in src/. Add JSDoc comments to public functions."
Use implicit instructions: Instead of explicit rules, show examples:
❌ AVOID:
"When returning JSON, always include proper indentation of 2 spaces.
Never omit the closing brace. Always use double quotes for strings."
✅ PREFER:
"Return responses in this JSON format:
{
\"status\": \"success\", // operation result
\"data\": {...} // response payload
}"
Strategy 2: Lazy-Load Context
Skills like frontend-design or canvas-design often need reference files but don’t need everything upfront. Use conditional context loading:
## Context Loading
Only load these files when specifically requested:
- design-system.css (when working on UI components)
- assets/ directory (when asked about visuals)
Do not load large asset directories unless explicitly requested.
This prevents the skill from consuming tokens reading files it won’t need.
Strategy 3: Template Responses
For skills that produce structured output—like the xlsx skill for spreadsheets or the docx skill for documents—provide output templates directly in the skill:
## Output Format
Generate files matching these templates:
config.yaml:
```yaml
database:
host: localhost
port: 5432
name: app_db
This approach reduces tokens spent on formatting instructions while ensuring consistent output.
Strategy 4: Split Large Skills
If a skill exceeds 3000 tokens, split it into related skills with a shared base. Create a core skill other skills reference:
skills/
├── _core.md # Shared patterns and rules (~800 tokens)
├── pdf-edit.md # Uses _core for editing (~400 tokens)
├── pdf-extract.md # Uses _core for extraction (~400 tokens)
Reference the core in each skill’s prompt:
You follow the conventions in _core.md. Apply those patterns to PDF tasks.
This modular approach keeps individual skills lean while maintaining consistency.
Practical Example: Optimizing a TDD Skill
Consider a tdd skill that started at 4200 tokens. Here’s the optimization process:
Before optimization:
- Full explanation of TDD principles: 800 tokens
- Multiple examples of test patterns: 1200 tokens
- Detailed rules for each language: 1500 tokens
- General coding guidelines: 700 tokens
After optimization:
- Reference to “TDD fundamentals” (user already knows): 0 tokens
- One concrete example per major language (600 tokens)
- Language-specific rules only when detected (400 tokens)
- Shortened coding guidelines, focusing on project-specific: 300 tokens
Final token count: 1300 tokens (69% reduction)
The optimized skill:
- Costs less per invocation
- Responds faster due to shorter context
- Still produces the same quality output because the removed content was redundant
Measuring Your Optimizations
After implementing changes, measure the impact:
- Token delta: Compare token counts before and after
- Response quality: Ensure output quality hasn’t degraded
- Latency: Check if response time improved
- Cost: Calculate savings at your usage volume
Create a simple tracking table:
| Skill | Before | After | Savings | Quality Check |
|---|---|---|---|---|
| tdd | 4200 | 1300 | 69% | ✓ Same |
| pdf-extract | 2800 | 1900 | 32% | ✓ Same |
| frontend | 5100 | 3400 | 33% | ✓ Same |
When to Optimize
Not every skill needs aggressive optimization. Optimize when:
- You use the skill frequently (daily or more)
- The skill exceeds 2500 tokens
- You’re noticing latency issues
- Token costs are impacting your budget
For skills you use occasionally, the optimization effort likely isn’t worth the time savings.
Building Token-Aware Skills
As you create new skills, keep token efficiency in mind from the start:
- Write tight prompts: Every word should earn its place
- Use examples sparingly: One good example beats three mediocre ones
- Reference over repeat: Point to external docs rather than copying content
- Test during development: Measure tokens as you iterate
Skills like supermemory for knowledge management benefit from this approach from day one.
Conclusion
Token optimization isn’t about stripping valuable content—it’s about removing redundancy and letting Claude’s base capabilities handle what they already know. The skills that perform best are often the leanest, because less context means faster responses and clearer guidance.
Start by measuring your current skills, then apply these strategies systematically. You’ll likely find that 30-50% token reduction is achievable without any loss in output quality.
Related Reading
- Claude Code Token Usage Optimization Best Practices Guide — If you’re a user rather than a skill author, this covers session-level strategies: prompt structuring, context window management, and output constraints.
- Claude Skill Prompt Compression Techniques — Apply compression techniques directly to the skill bodies you profile for fastest results.
- Claude Skills Token Optimization: Reduce API Costs Guide — Extend token profiling results with broader API cost reduction strategies.
- Measuring Claude Code Skill Efficiency Metrics — Go beyond token counts to measure full skill efficiency including task completion rates and time to output.
- Advanced Claude Skills — Discover more advanced performance patterns for building production-quality skills.
Built by theluckystrike — More at zovo.one