Claude Code for Capacity Planning Workflow Tutorial
Capacity planning is one of the most challenging aspects of infrastructure management. Teams often struggle with predicting future resource needs, balancing costs, and responding to demand spikes. Claude Code offers a powerful way to automate and enhance capacity planning workflows through intelligent automation and data-driven decision-making.
This tutorial walks you through building a complete capacity planning workflow using Claude Code skills, complete with practical examples and actionable advice you can apply immediately to your projects.
Understanding Capacity Planning Challenges
Before diving into the technical implementation, it’s worth understanding what makes capacity planning difficult. Traditional approaches rely on static thresholds and manual analysis, which leads to either over-provisioning (wasting money) or under-provisioning (causing outages).
Modern capacity planning requires:
- Historical data analysis to identify trends
- Demand forecasting to predict future needs
- Cost optimization to balance performance and budget
- Automated responses to scale resources appropriately
Claude Code can assist with all of these areas through carefully designed skills that handle data collection, analysis, and even execution of scaling actions.
Setting Up Your Capacity Planning Skill
The first step is creating a dedicated Claude Code skill for capacity planning. This skill will encapsulate all the prompts, tools, and logic needed for your workflow.
Create a new skill file at ~/.claude/skills/capacity-planning/skill.md:
---
name: capacity-planning
description: Analyzes resource utilization and generates capacity planning recommendations
tools: [Read, Write, Bash, todo]
---
You are a capacity planning expert. Analyze the provided metrics and generate actionable recommendations for infrastructure scaling.
This minimal skill definition gives you a foundation to build upon. The key here is restricting tools to only what’s necessary for the workflow—in this case, file operations, bash commands for running analysis scripts, and the todo tool for tracking action items.
Collecting and Analyzing Metrics
The core of any capacity planning workflow is data. Your Claude Code skill needs access to metrics from your infrastructure. Here’s how to structure the analysis phase:
#!/usr/bin/env python3
"""
capacity_metrics_collector.py
Collects key metrics for capacity planning analysis
"""
import json
from datetime import datetime, timedelta
def collect_cpu_metrics(hosts, time_range="24h"):
"""Collect CPU utilization metrics from monitored hosts"""
metrics = []
for host in hosts:
# Simulated metric collection
metrics.append({
"host": host,
"avg_cpu": 65.2,
"peak_cpu": 89.7,
"p95_cpu": 78.3,
"timestamp": datetime.now().isoformat()
})
return metrics
def collect_memory_metrics(hosts):
"""Collect memory utilization metrics"""
metrics = []
for host in hosts:
metrics.append({
"host": host,
"avg_memory": 72.4,
"peak_memory": 91.2,
"p95_memory": 84.1,
"timestamp": datetime.now().isoformat()
})
return metrics
if __name__ == "__main__":
hosts = ["app-server-1", "app-server-2", "db-primary", "db-replica"]
data = {
"cpu": collect_cpu_metrics(hosts),
"memory": collect_memory_metrics(hosts)
}
print(json.dumps(data, indent=2))
This script collects CPU and memory metrics from your infrastructure. Run it periodically and store the results for trend analysis. Your Claude Code skill can then read these JSON files and provide analysis.
Building the Analysis Prompt
The real power of Claude Code comes from its ability to understand context and provide intelligent recommendations. Here’s how to structure the analysis prompt within your skill:
## Analysis Task
Review the collected metrics and provide capacity planning recommendations:
1. Identify utilization patterns and trends
2. Flag any resources approaching critical thresholds (>80%)
3. Recommend scaling actions for the next 7 days
4. Estimate cost implications of recommended changes
Present your findings in a structured format with clear action items.
When you invoke this skill with your metrics data, Claude will analyze patterns and provide recommendations based on its understanding of capacity planning best practices. The model can identify correlations you might miss and suggest actions that balance performance with cost efficiency.
Creating Automated Scaling Recommendations
Beyond passive analysis, you can extend your capacity planning skill to generate concrete scaling recommendations. Here’s a practical approach:
#!/usr/bin/env python3
"""
scaling_recommender.py
Generates scaling recommendations based on utilization data
"""
import json
def analyze_scaling_needs(metrics, thresholds={"cpu": 80, "memory": 85}):
recommendations = []
for host_data in metrics.get("cpu", []):
if host_data["peak_cpu"] > thresholds["cpu"]:
recommendations.append({
"host": host_data["host"],
"action": "scale_up",
"reason": f"Peak CPU {host_data['peak_cpu']}% exceeds threshold",
"current": host_data["peak_cpu"],
"threshold": thresholds["cpu"]
})
for host_data in metrics.get("memory", []):
if host_data["peak_memory"] > thresholds["memory"]:
recommendations.append({
"host": host_data["host"],
"action": "scale_up",
"reason": f"Peak memory {host_data['peak_memory']}% exceeds threshold",
"current": host_data["peak_memory"],
"threshold": thresholds["memory"]
})
return recommendations
if __name__ == "__main__":
# Sample input
sample_metrics = {
"cpu": [{"host": "app-server-1", "peak_cpu": 92.5}],
"memory": [{"host": "db-primary", "peak_memory": 88.1}]
}
results = analyze_scaling_needs(sample_metrics)
print(json.dumps(results, indent=2))
This script identifies when resources exceed defined thresholds and generates actionable recommendations. Integrate it into your Claude Code workflow by having the skill call this script and then analyze the output.
Implementing a Complete Workflow
Now let’s put it all together into a complete capacity planning workflow:
#!/bin/bash
# capacity-planning-workflow.sh
# Complete capacity planning workflow orchestration
set -e
echo "=== Starting Capacity Planning Workflow ==="
echo "Collecting metrics..."
python3 ~/scripts/capacity_metrics_collector.py > /tmp/metrics.json
echo "Analyzing and generating recommendations..."
python3 ~/scripts/scaling_recommender.py < /tmp/metrics.json > /tmp/recommendations.json
echo "Review recommendations with Claude..."
claude --print "
Review the following scaling recommendations:
$(cat /tmp/recommendations.json)
For each recommendation, provide:
1. Priority (P1-P3)
2. Implementation steps
3. Expected outcome
"
This workflow collects metrics, generates initial recommendations, and then uses Claude to add intelligent context and prioritization. The human-in-the-loop approach ensures that scaling decisions are reviewed before execution.
Best Practices for Capacity Planning Skills
When building capacity planning workflows with Claude Code, keep these best practices in mind:
Start with clean data. Claude is only as good as the data you provide. Ensure your metric collection is reliable and consistent. Invest time in proper instrumentation before expecting useful recommendations.
Use appropriate thresholds. Not all resources should have the same thresholds. Database servers might need headroom at 70% CPU, while stateless application servers can safely run at 90%. Tailor thresholds to your workload characteristics.
Include cost context. Capacity planning is always a trade-off between performance and cost. Include pricing information in your data so Claude can recommend cost-effective solutions.
Maintain human oversight. Fully automated scaling can be risky. Design your workflow to generate recommendations that humans review before execution, especially for production environments.
Iterate and improve. Start simple and add sophistication over time. Monitor what recommendations Claude provides and refine your prompts based on the results.
Conclusion
Claude Code transforms capacity planning from a reactive, manual process into an intelligent, automated workflow. By combining structured data collection with Claude’s analysis capabilities, you can build systems that proactively identify scaling needs, predict future demand, and optimize resource allocation.
The key is starting simple: collect metrics, generate basic recommendations, and gradually add sophistication as you learn what works for your specific infrastructure. With this tutorial’s patterns as a foundation, you’ll be able to build capacity planning workflows that save both money and prevent outages.
Remember that the most effective capacity planning combines automation with human judgment. Use Claude Code to do the heavy lifting of data analysis and recommendation generation, but keep experienced engineers in the loop for critical decisions.