Claude Code for Criterion Benchmarking Workflow Guide

Benchmarking is essential for understanding your code’s performance characteristics and identifying optimization opportunities. When working with Rust projects, the Criterion framework provides a robust solution for statistical benchmarking, and Claude Code can significantly streamline your benchmarking workflow. This guide walks you through building an efficient benchmarking pipeline using both tools together.

Understanding Criterion Benchmarking

Criterion is a statistics-driven benchmarking framework for Rust that goes beyond simple timing measurements. It provides:

Statistical analysis - Determines whether performance differences are meaningful
Warmup phases - Runs the code before measuring so caches and CPU clocks reach a steady state
Plots and reports - Visualizes timing distributions and changes between runs
Regression tracking - Compares results against previous runs
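To make "statistically meaningful" concrete, here is a minimal sketch that compares two sets of timing samples using Welch's t statistic. It is an illustration only: Criterion itself relies on bootstrap resampling and its own noise threshold, and the function names and the cutoff value are assumptions chosen for this example.

```rust
/// Arithmetic mean of the samples.
fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
fn sample_variance(samples: &[f64]) -> f64 {
    let m = mean(samples);
    let sum_sq: f64 = samples.iter().map(|x| (x - m).powi(2)).sum();
    sum_sq / (samples.len() as f64 - 1.0)
}

/// Welch's t statistic for the difference between two sample means.
fn welch_t(baseline: &[f64], current: &[f64]) -> f64 {
    let standard_error = (sample_variance(baseline) / baseline.len() as f64
        + sample_variance(current) / current.len() as f64)
        .sqrt();
    (mean(current) - mean(baseline)) / standard_error
}

/// Hypothetical helper: treats the change as meaningful when |t| exceeds the cutoff.
fn looks_meaningful(baseline: &[f64], current: &[f64], cutoff: f64) -> bool {
    welch_t(baseline, current).abs() > cutoff
}
```

With a cutoff of about 2.0, two runs with the same mean are treated as noise, while a clear shift in the mean is flagged. The point is that a raw difference between two timings says little until it is weighed against the spread of the samples.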

Before integrating with Claude Code, ensure you have Criterion set up in your Rust project:

# Cargo.toml
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "my_benchmark"
harness = false

Setting Up Your Benchmarking Project

The first step is organizing your project for efficient benchmarking. Claude Code can help you set up the entire structure:

Create a benches/ directory - Cargo looks for benchmark targets here
Configure Criterion - Set sample size, measurement time, and noise thresholds
Establish baseline metrics - Record initial performance data

Here’s a minimal benchmark to start from:

// benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn my_function(input: u64) -> u64 {
    // Your code to benchmark
    (0..input).fold(0, |acc, x| acc + x)
}

fn benchmark_basic(c: &mut Criterion) {
    c.bench_function("sum_0_to_n", |b| {
        b.iter(|| my_function(black_box(1000000)))
    });
}

criterion_group!(benches, benchmark_basic);
criterion_main!(benches);

Claude Code Integration Patterns

Claude Code excels at automating repetitive benchmarking tasks. Here are the key integration patterns:

1. Automated Baseline Generation

Use Claude Code to generate and save baseline benchmarks:

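# Run Criterion and save a named baseline
# Arguments after "--" go to Criterion rather than Cargo; --bench limits the run to one Criterion target
cargo bench --bench my_benchmark -- --save-baseline baseline-2026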

Claude Code can help you create scripts that automate this process:

#!/bin/bash
# scripts/run_benchmark.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
cargo bench --bench my_benchmark -- --save-baseline "baseline-$TIMESTAMP"
echo "Baseline saved: baseline-$TIMESTAMP"

2. Regression Detection Workflow

Compare current results against baselines to detect regressions:

# Compare against a saved baseline without overwriting it
cargo bench --bench my_benchmark -- --baseline baseline-2026

3. Batch Benchmarking

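For comprehensive analysis, run every benchmark and export the results in machine-readable form. Plain cargo bench does not emit results as JSON, so this uses the separate cargo-criterion tool:

# Run all benchmarks and print one JSON message per finished benchmark
# Requires: cargo install cargo-criterion
cargo criterion --message-format=json | jq -c 'select(.reason == "benchmark-complete")'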

Practical Example: Optimizing a Function

Let’s walk through a real-world optimization scenario using Claude Code and Criterion.

Initial Benchmark

First, create a benchmark for the function you want to optimize:

// benches/string_processing.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn process_strings(items: &[String]) -> Vec<String> {
    items.iter()
        .map(|s| s.to_uppercase())
        .map(|s| s.trim().to_string())
        .collect()
}

fn benchmark_string_processing(c: &mut Criterion) {
    let data: Vec<String> = (0..1000)
        .map(|i| format!("  item {}  ", i))
        .collect();
    
    c.bench_function("process_1000_strings", |b| {
        b.iter(|| process_strings(black_box(&data)))
    });
}

criterion_group!(benches, benchmark_string_processing);
criterion_main!(benches);

Run the initial benchmark and save it as a baseline for later comparison:

cargo bench --bench string_processing -- --save-baseline baseline-2026

Analysis and Optimization

Claude Code can analyze the benchmark results and suggest improvements. Common optimization strategies include:

Avoid unnecessary allocations - Use &str instead of String when possible
Batch operations - Reduce intermediate collections
Use iterators efficiently - Chain operations to minimize passes

Here’s an optimized version. It assumes ASCII input, because make_ascii_uppercase leaves non-ASCII characters unchanged:

fn process_strings_optimized(items: &[String]) -> Vec<String> {
    // Pre-allocate with exact capacity
    let mut result = Vec::with_capacity(items.len());
    
    for item in items {
        // Trim first so only one String is allocated, then uppercase it in place
        let mut s = item.trim().to_string();
        s.make_ascii_uppercase();
        result.push(s);
    }
    
    result
}
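Before comparing timings, it is worth confirming that both versions return the same output for your data. The sketch below repeats the two functions from this section so it is self-contained; outputs_match is a hypothetical helper added for illustration.

```rust
/// Original version: uppercases with full Unicode rules, then trims.
fn process_strings(items: &[String]) -> Vec<String> {
    items
        .iter()
        .map(|s| s.to_uppercase())
        .map(|s| s.trim().to_string())
        .collect()
}

/// Optimized version: trims first, then uppercases ASCII letters in place.
fn process_strings_optimized(items: &[String]) -> Vec<String> {
    let mut result = Vec::with_capacity(items.len());
    for item in items {
        let mut s = item.trim().to_string();
        s.make_ascii_uppercase();
        result.push(s);
    }
    result
}

/// Hypothetical helper: true when both versions agree on the given input.
fn outputs_match(items: &[String]) -> bool {
    process_strings(items) == process_strings_optimized(items)
}
```

For ASCII input such as "  item 1  " the two versions agree. For input containing "ß" they differ, because to_uppercase expands it to "SS" while make_ascii_uppercase leaves it untouched. If your data can contain such characters, the faster version is not a drop-in replacement.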

Verify Improvements

Run the comparison against a baseline saved before the change to verify your optimizations:

cargo bench --bench string_processing -- --baseline baseline-2026

Best Practices for Benchmarking Workflows

1. Consistent Environment

Run benchmarks on a quiet system
Disable CPU frequency scaling
Use fixed seeds for random data
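As one way to get fixed-seed input without extra dependencies, the sketch below uses a small xorshift generator. The type and function names are hypothetical, and this generator is only meant for producing repeatable benchmark data, not for anything that needs statistical or cryptographic quality.

```rust
/// Minimal xorshift64 generator for reproducible benchmark input.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    /// The state must never be zero, so a zero seed is replaced with a fixed constant.
    fn new(seed: u64) -> Self {
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    /// Advances the generator and returns the next value.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Builds `len` pseudo-random values that are identical for identical seeds.
fn seeded_input(seed: u64, len: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..len).map(|_| rng.next_u64()).collect()
}
```

Generate the data once outside b.iter, for example with seeded_input(42, 1000), so every run measures the same workload and the generation cost stays out of the timing.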

2. Statistical Significance

Increase sample size for noisy benchmarks
Set appropriate measurement time
Use warmup phases effectively
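These settings can be applied through the config form of criterion_group!. The sketch below assumes Criterion 0.5 and a matching [[bench]] entry with harness = false in Cargo.toml; the file name, the sum_to function, and the specific values are placeholders to tune for your own benchmarks.

```rust
// benches/tuned_benchmark.rs (hypothetical file name)
use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Placeholder routine to measure
fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}

fn bench_sum(c: &mut Criterion) {
    c.bench_function("sum_to_1000", |b| b.iter(|| sum_to(black_box(1000))));
}

// Example values only: more samples and longer measurement reduce noise
// at the cost of a longer benchmark run.
fn tuned() -> Criterion {
    Criterion::default()
        .sample_size(200)
        .measurement_time(Duration::from_secs(10))
        .warm_up_time(Duration::from_secs(3))
}

criterion_group! {
    name = benches;
    config = tuned();
    targets = bench_sum
}
criterion_main!(benches);
```

Every benchmark listed under targets shares this configuration, which keeps the tuning in one place.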

3. Automation with Claude Code

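Claude Code can help automate the entire pipeline. One option is to record comparison targets and thresholds in a project file like the one below. This layout is a hypothetical convention that your own scripts would read, not a format that Claude Code or Criterion understands natively: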

# .claude/benchmark.yml
benchmark:
  baseline_dir: "benches/baselines"
  compare_targets:
    - "baseline-2026-01"
    - "baseline-2026-02"
  thresholds:
    regression_warning: 0.10  # 10% regression triggers warning
    regression_error: 0.25    # 25% regression triggers error
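A script that consumes these thresholds might apply them as follows. This is a minimal sketch: the Verdict type and function names are hypothetical, and the inputs are assumed to be mean time estimates in nanoseconds taken from a baseline run and a current run.

```rust
/// Outcome of comparing a current timing against a baseline.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Warning,
    Error,
}

/// Relative change, where 0.10 means the current run is 10% slower.
fn relative_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns
}

/// Classifies a change using warning and error thresholds such as 0.10 and 0.25.
fn classify(baseline_ns: f64, current_ns: f64, warning: f64, error: f64) -> Verdict {
    let change = relative_change(baseline_ns, current_ns);
    if change >= error {
        Verdict::Error
    } else if change >= warning {
        Verdict::Warning
    } else {
        Verdict::Pass
    }
}
```

Improvements produce a negative change and therefore pass. In practice you would also want to check that the change is larger than the run-to-run noise before failing a build on it.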

4. Continuous Integration

Integrate benchmarking into your CI pipeline:

# .github/workflows/benchmark.yml
name: Benchmark
on: [push, pull_request]
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
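      - name: Build benchmarks
        run: cargo bench --no-run
      - name: Compare with baseline
        # Assumes a baseline named baseline-main was restored into target/criterion, for example from a cache
        run: cargo bench --bench my_benchmark -- --baseline baseline-main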

Advanced Techniques

Memory Profiling

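Criterion measures time, not memory, so combine it with a separate memory profiling tool when allocations matter. The benchmark below only times an allocation-heavy routine: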

use criterion::Bencher;

fn benchmark_with_memory(b: &mut Bencher) {
    // Times a routine dominated by Vec growth; the allocations themselves are not counted
    b.iter(|| {
        let mut vec = Vec::new();
        for i in 0..1000 {
            vec.push(i);
        }
        vec
    });
}
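To actually count allocations, one option is a counting global allocator built on the standard library. The sketch below is an assumption about how you might wire this up, not a Criterion feature; CountingAllocator and count_allocations are hypothetical names, and the counter is process-wide, so allocations from other threads are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Number of allocation calls made since the process started.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Forwards to the system allocator and counts each allocation.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and returns its result with the number of allocations observed meanwhile.
fn count_allocations<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    let result = f();
    let after = ALLOCATIONS.load(Ordering::Relaxed);
    (result, after - before)
}
```

Because realloc is not overridden, its default implementation goes through alloc, so Vec growth is counted as well. Wrapping the Vec-building closure from the benchmark above in count_allocations gives an allocation count you can track next to Criterion's timings.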

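Parameterized Benchmarks

Benchmark groups let you run the same routine across several input sizes and compare the results:

use criterion::{black_box, BenchmarkId, Criterion};

// Placeholder for the routine under test; replace with your own function
fn process_size(size: u64) -> u64 {
    (0..size).sum()
}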

fn custom_benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("custom_metrics");
    
    for size in [100, 1000, 10000].iter() {
        group.bench_with_input(
            BenchmarkId::new("processing", size),
            size,
            |b, &size| {
                b.iter(|| process_size(black_box(size)))
            }
        );
    }
    
    group.finish();
}

Conclusion

Combining Claude Code with Criterion creates a powerful benchmarking workflow. Claude Code handles the automation, organization, and analysis, while Criterion provides accurate, statistical measurements. Start with simple baselines, automate your comparison workflows, and progressively add sophistication as your benchmarking needs grow.

Remember: good benchmarks require careful setup and consistent execution. Use Claude Code to automate the repetitive parts, but always verify results manually when making critical optimization decisions.
