
# Claude Code for Load Test Results Analysis Workflow

Load testing is essential for understanding how your application performs under stress, but analyzing the results can be time-consuming and error-prone. A well-designed Claude Code skill can transform raw load test data into actionable insights, helping you identify bottlenecks, compare performance across builds, and generate reports without manual effort.

This guide shows you how to create a skill that automates load test results analysis using Claude Code, with practical examples for popular load testing tools like JMeter, k6, and Gatling.

## Why Use Claude Code for Load Test Analysis?

Traditional load test analysis requires opening multiple files, importing data into spreadsheets, and manually calculating percentiles. This process is slow, error-prone, and hard to repeat consistently across builds.

A Claude Code skill can instead:

  1. Parse multiple load test output formats automatically
  2. Calculate statistics (p50, p95, p99, throughput, error rates)
  3. Compare results against baselines and thresholds
  4. Generate human-readable summaries and recommendations
  5. Identify regressions or improvements across test runs
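Step 2 above, for example, reduces to a few lines of Python. Here is a minimal sketch of a nearest-rank percentile helper (the `percentile` name is illustrative, not part of any tool's API):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample >= pct% of all samples."""
    ordered = sorted(samples)
    # 1-indexed nearest rank: ceil(pct/100 * N), clamped to at least 1.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

For 100 response-time samples, `percentile(samples, 95)` returns the 95th smallest value, which is what most load testing tools report as p95.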

## Building Your Load Test Analysis Skill

### Skill Structure

Create a new skill file at `skills/load-test-analyzer.md` with this structure:

```markdown
---
name: load-test-analyzer
description: Analyze load test results from JMeter, k6, and Gatling. Calculate metrics, identify bottlenecks, and generate insights.
tools: [read_file, bash, write_file]
aliases: [analyze-load, load-insights]
patterns:
  - "analyze load test results"
  - "performance test analysis"
  - "load test bottlenecks"
---

# Load Test Results Analyzer

You analyze load test results to identify performance issues and provide actionable recommendations. You work with JMeter CSV/XML, k6 JSON, and Gatling reports.

## Available Analysis Modes

When the user asks to analyze load test results:

1. **Quick Summary**: Show key metrics at a glance
2. **Detailed Analysis**: Deep dive into response times, error rates, throughput
3. **Comparison**: Compare current vs. baseline results
4. **Trend Analysis**: Track metrics across multiple test runs
```

### Parsing Different Load Test Formats

Your skill needs to handle various output formats. Here’s how to implement parsers for common tools:

#### k6 JSON Output

k6 can write an end-of-test summary as a single JSON object (for example via the `--summary-export` flag or a `handleSummary()` callback) that includes aggregate values for every metric:

```python
import json

def parse_k6_results(json_file):
    """Extract avg/p95/p99 for every metric in a k6 summary JSON file."""
    with open(json_file) as f:
        data = json.load(f)

    metrics = {}
    for name, metric in data.get('metrics', {}).items():
        # Depending on the k6 version and export path, aggregate values may
        # be nested under a 'values' key or sit directly on the metric object.
        values = metric.get('values', metric)
        metrics[name] = {
            'avg': values.get('avg'),
            'p95': values.get('p(95)'),
            'p99': values.get('p(99)'),
        }
    return metrics
```

#### JMeter CSV Parsing

JMeter’s CSV format requires handling headers and timestamp columns:

```python
import csv

def parse_jmeter_csv(csv_file):
    """Load one dict per sample from a JMeter results CSV (JTL) file."""
    results = []
    with open(csv_file, newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            results.append({
                'elapsed': float(row.get('elapsed', 0)),        # response time in ms
                'responseCode': row.get('responseCode', ''),
                'success': row.get('success', 'true') == 'true',
                'timestamp': row.get('timeStamp', '')           # epoch milliseconds
            })
    return results
```
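Once rows are parsed, the headline metrics fall out directly. A sketch, assuming the dict shape produced by the parser above (`summarize_jmeter` is an illustrative name, not a JMeter API):

```python
import math

def summarize_jmeter(results):
    """Compute sample count, p95 latency, and error rate from parsed JMeter rows."""
    times = sorted(r['elapsed'] for r in results)
    errors = sum(1 for r in results if not r['success'])
    n = len(times)
    # Nearest-rank p95: the value at the ceil(0.95 * N)-th position (1-indexed).
    p95 = times[max(1, math.ceil(0.95 * n)) - 1]
    return {
        'samples': n,
        'p95_ms': p95,
        'error_rate_pct': round(100 * errors / n, 2),
    }
```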

## Practical Analysis Examples

### Example 1: Quick Bottleneck Identification

When analyzing results, focus on these key indicators:

  1. Response Time Degradation: Compare p95/p99 across test runs
  2. Error Rate Spikes: Look for HTTP 5xx errors or timeouts
  3. Throughput Limits: Identify when requests start queuing
  4. Resource Saturation: Correlate with CPU/memory metrics

A Claude prompt for quick analysis:

```
Analyze the load test results in results/k6-summary.json. Identify:
- Overall pass/fail status
- Top 3 slowest endpoints (p95 response time)
- Any error rate above 1%
- Recommendations for improvement
```

### Example 2: Automated Regression Detection

Compare current results against a baseline:

```python
def detect_regression(current, baseline, threshold=0.2):
    """Detect endpoints whose p95 regressed beyond `threshold` (20% by default)."""
    regressions = []

    for endpoint in baseline:
        baseline_p95 = baseline[endpoint]['p95']
        current_p95 = current.get(endpoint, {}).get('p95', 0)

        if baseline_p95 > 0:
            # Relative slowdown: 0.25 means the endpoint is 25% slower than baseline.
            degradation = (current_p95 - baseline_p95) / baseline_p95

            if degradation > threshold:
                regressions.append({
                    'endpoint': endpoint,
                    'baseline_p95': baseline_p95,
                    'current_p95': current_p95,
                    'degradation_pct': round(degradation * 100, 2)
                })

    return regressions
```
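To make the result easy to scan in a PR comment or CI log, a small formatter helps. A sketch (`format_regressions` is a hypothetical helper; its input matches the dicts `detect_regression` returns):

```python
def format_regressions(regressions):
    """Render detected regressions as a worst-first bullet list."""
    if not regressions:
        return "No regressions detected."
    lines = ["Performance regressions detected:"]
    for r in sorted(regressions, key=lambda r: r['degradation_pct'], reverse=True):
        lines.append(
            f"- {r['endpoint']}: p95 {r['baseline_p95']}ms -> "
            f"{r['current_p95']}ms (+{r['degradation_pct']}%)"
        )
    return "\n".join(lines)
```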

### Example 3: Generating Performance Reports

Create automated reports that teams can act on:

```markdown
## Load Test Report - Build #1247

**Test Date**: 2026-03-15
**Environment**: staging
**Duration**: 30 minutes
**Virtual Users**: 500

### Summary
- ✅ All critical endpoints below 2s p95
- ⚠️ /api/search showing 15% degradation vs baseline
- ✅ Error rate: 0.03% (below 1% threshold)

### Detailed Metrics

| Endpoint | p50 | p95 | p99 | Error Rate |
|----------|-----|-----|-----|------------|
| /api/home | 120ms | 450ms | 890ms | 0.01% |
| /api/search | 380ms | 2100ms | 3500ms | 0.08% |
| /api/checkout | 210ms | 780ms | 1200ms | 0.02% |

### Recommendations
1. **High Priority**: Investigate /api/search degradation
2. **Medium Priority**: Consider caching /api/home responses
3. **Low Priority**: Monitor /api/checkout under higher load
```
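A report like this can be assembled mechanically from the computed metrics. A minimal sketch of the table section (`render_metrics_table` and the dict layout are illustrative, not from any library):

```python
def render_metrics_table(metrics):
    """Render per-endpoint metrics as a Markdown table."""
    lines = [
        "| Endpoint | p50 | p95 | p99 | Error Rate |",
        "|----------|-----|-----|-----|------------|",
    ]
    for endpoint, m in metrics.items():
        lines.append(
            f"| {endpoint} | {m['p50']}ms | {m['p95']}ms "
            f"| {m['p99']}ms | {m['error_rate']}% |"
        )
    return "\n".join(lines)
```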

## Best Practices for Load Test Skills

### 1. Define Clear Thresholds

Don’t just report metrics; provide context:

```python
THRESHOLDS = {
    'p95_response_time_ms': 2000,
    'error_rate_percent': 1.0,
    'throughput_rps': 100,
    'p99_max_ms': 5000
}
```
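With thresholds defined, checking a run becomes a simple comparison. A sketch, assuming a metrics dict keyed the same way (`check_thresholds` is an illustrative name; the dict below mirrors a subset of the thresholds above):

```python
THRESHOLDS = {
    'p95_response_time_ms': 2000,
    'error_rate_percent': 1.0,
}

def check_thresholds(metrics, thresholds=THRESHOLDS):
    """Return a list of human-readable threshold violations (empty list = pass)."""
    violations = []
    if metrics.get('p95_response_time_ms', 0) > thresholds['p95_response_time_ms']:
        violations.append(f"p95 {metrics['p95_response_time_ms']}ms exceeds "
                          f"{thresholds['p95_response_time_ms']}ms")
    if metrics.get('error_rate_percent', 0) > thresholds['error_rate_percent']:
        violations.append(f"error rate {metrics['error_rate_percent']}% exceeds "
                          f"{thresholds['error_rate_percent']}%")
    return violations
```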

### 2. Handle Multiple Test Runs

Organize results by timestamp for trend analysis:

```python
import json
from pathlib import Path

def analyze_trends(results_dir):
    """Analyze performance trends across multiple test runs."""
    # Assumes one summary.json per run, stored in a per-date subdirectory.
    runs = sorted(Path(results_dir).glob('**/summary.json'))

    trends = []
    for run in runs:
        data = json.loads(run.read_text())
        trends.append({
            'date': run.parent.name,
            # Adjust these key names (e.g. 'p(95)') to match your export format.
            'p95': data['metrics']['http_req_duration']['p95'],
            'rps': data['metrics']['http_reqs']['rate']
        })

    return trends
```
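From the trend list, run-over-run movement is easy to compute. A sketch (`p95_deltas` is my own name; it consumes the dicts `analyze_trends` returns):

```python
def p95_deltas(trends):
    """Percentage change in p95 between consecutive runs (positive = got slower)."""
    return [
        round((b['p95'] - a['p95']) / a['p95'] * 100, 1)
        for a, b in zip(trends, trends[1:])
    ]
```

A run of `[20.0, -25.0]`, for instance, means the second run was 20% slower than the first and the third recovered by 25% relative to the second.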

### 3. Integrate with CI/CD

Use Claude Code skills in your pipeline:

```yaml
# GitHub Actions example
- name: Analyze Load Test
  run: |
    claude --skill load-test-analyzer \
      analyze results/k6-summary.json \
      --compare baseline/baseline.json \
      --output report.md
```

### 4. Maintain Historical Baselines

Store baseline results for regression detection:

```
baselines/
  2026-03-01-staging.json
  2026-03-08-staging.json
  2026-03-15-staging.json
```
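Selecting the newest baseline for an environment can then be automated, since the date-prefixed naming scheme above makes lexical order match chronological order. A sketch (`pick_latest` and `latest_baseline` are illustrative names):

```python
from pathlib import Path

def pick_latest(names, environment='staging'):
    """Lexically newest baseline filename for an environment (date-prefixed names)."""
    matching = sorted(n for n in names if n.endswith(f'-{environment}.json'))
    return matching[-1] if matching else None

def latest_baseline(baseline_dir, environment='staging'):
    """Newest baseline file in a directory, using the naming scheme above."""
    names = [p.name for p in Path(baseline_dir).glob('*.json')]
    chosen = pick_latest(names, environment)
    return Path(baseline_dir) / chosen if chosen else None
```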

## Conclusion

A Claude Code skill for load test analysis transforms raw performance data into actionable insights. By automating metric calculation, regression detection, and report generation, you save time while ensuring consistent, thorough analysis.

Start with the quick summary mode, then expand to comparison and trend analysis as your skill matures. The key is defining clear thresholds and maintaining historical baselines so Claude can identify regressions automatically.

With the right skill design, your team can make data-driven performance decisions faster and more reliably than ever before.