Claude Code for Criterion Benchmarking Workflow Guide
Benchmarking is essential for understanding your code’s performance characteristics and identifying optimization opportunities. When working with Rust projects, the Criterion framework provides a robust solution for statistical benchmarking, and Claude Code can significantly streamline your benchmarking workflow. This guide walks you through building an efficient benchmarking pipeline using both tools together.
Understanding Criterion Benchmarking
Criterion is a statistics-driven benchmarking framework for Rust that goes beyond simple timing measurements. It provides:
- Statistical analysis - Determines whether performance differences are meaningful
- Warmup phases - Ensures accurate measurements by warming up the CPU cache
- Plots and reports - Visualizes performance over time
- Regression tracking - Compares results against previous runs
Before integrating with Claude Code, ensure you have Criterion set up in your Rust project:
# Cargo.toml
[dev-dependencies]
criterion = "0.5"
[[bench]]
name = "my_benchmark"
harness = false
Setting Up Your Benchmarking Project
The first step is organizing your project for efficient benchmarking. Claude Code can help you set up the entire structure:
- Create a benchmarks directory - Keep your benchmarks organized
- Configure Criterion - Set up custom measurements and thresholds
- Establish baseline metrics - Record initial performance data
Here’s how to configure Criterion with custom settings:
// benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn my_function(input: u64) -> u64 {
// Your code to benchmark
(0..input).fold(0, |acc, x| acc + x)
}
fn benchmark_basic(c: &mut Criterion) {
c.bench_function("sum_0_to_n", |b| {
b.iter(|| my_function(black_box(1000000)))
});
}
criterion_group!(benches, benchmark_basic);
criterion_main!(benches);
Claude Code Integration Patterns
Claude Code excels at automating repetitive benchmarking tasks. Here are the key integration patterns:
1. Automated Baseline Generation
Use Claude Code to generate and save baseline benchmarks:
# Run Criterion and save baseline
cargo bench --save-baseline baseline-2026
Claude Code can help you create scripts that automate this process:
#!/bin/bash
# scripts/run_benchmark.sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
cargo bench --save-baseline "baseline-$TIMESTAMP"
echo "Baseline saved: baseline-$TIMESTAMP"
2. Regression Detection Workflow
Compare current results against baselines to detect regressions:
# Compare against baseline
cargo bench --compare baseline-2026
3. Batch Benchmarking
For comprehensive analysis, run multiple benchmarks in sequence:
# Run all benchmarks with output
cargo bench --message-format=json | jq -r 'select(.type == "benchmark") | .data'
Practical Example: Optimizing a Function
Let’s walk through a real-world optimization scenario using Claude Code and Criterion.
Initial Benchmark
First, create a benchmark for the function you want to optimize:
// benches/string_processing.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn process_strings(items: &[String]) -> Vec<String> {
items.iter()
.map(|s| s.to_uppercase())
.map(|s| s.trim().to_string())
.collect()
}
fn benchmark_string_processing(c: &mut Criterion) {
let data: Vec<String> = (0..1000)
.map(|i| format!(" item {} ", i))
.collect();
c.bench_function("process_1000_strings", |b| {
b.iter(|| process_strings(black_box(&data)))
});
}
criterion_group!(benches, benchmark_string_processing);
criterion_main!(benches);
Run the initial benchmark:
cargo bench --bench string_processing
Analysis and Optimization
Claude Code can analyze the benchmark results and suggest improvements. Common optimization strategies include:
- Avoid unnecessary allocations - Use
&strinstead ofStringwhen possible - Batch operations - Reduce intermediate collections
- Use iterators efficiently - Chain operations to minimize passes
Here’s an optimized version:
fn process_strings_optimized(items: &[String]) -> Vec<String> {
// Pre-allocate with exact capacity
let mut result = Vec::with_capacity(items.len());
for item in items {
// Reuse the string buffer
let mut s = item.trim().to_string();
s.make_ascii_uppercase();
result.push(s);
}
result
}
Verify Improvements
Run the comparison to verify your optimizations:
cargo bench --bench string_processing --compare baseline-2026
Best Practices for Benchmarking Workflows
1. Consistent Environment
- Run benchmarks on a quiet system
- Disable CPU frequency scaling
- Use fixed seeds for random data
2. Statistical Significance
- Increase sample size for noisy benchmarks
- Set appropriate measurement time
- Use warmup phases effectively
3. Automation with Claude Code
Claude Code can help automate the entire pipeline:
# .claude/benchmark.yml
benchmark:
baseline_dir: "benches/baselines"
compare_targets:
- "baseline-2026-01"
- "baseline-2026-02"
thresholds:
regression_warning: 0.10 # 10% regression triggers warning
regression_error: 0.25 # 25% regression triggers error
4. Continuous Integration
Integrate benchmarking into your CI pipeline:
# .github/workflows/benchmark.yml
name: Benchmark
on: [push, pull_request]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run benchmarks
run: cargo bench --no-run
- name: Compare with baseline
run: cargo bench --compare baseline-main
Advanced Techniques
Memory Profiling
Combine Criterion with memory profiling tools:
use criterion::Bencher;
fn benchmark_with_memory(b: &mut Bencher) {
// Measure memory allocations
b.iter(|| {
let mut vec = Vec::new();
for i in 0..1000 {
vec.push(i);
}
vec
});
}
Custom Measurements
Extend Criterion with custom measurements:
use criterion::{Criterion, BenchmarkId};
fn custom_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("custom_metrics");
for size in [100, 1000, 10000].iter() {
group.bench_with_input(
BenchmarkId::new("processing", size),
size,
|b, &size| {
b.iter(|| process_size(black_box(size)))
}
);
}
group.finish();
}
Conclusion
Combining Claude Code with Criterion creates a powerful benchmarking workflow. Claude Code handles the automation, organization, and analysis, while Criterion provides accurate, statistical measurements. Start with simple baselines, automate your comparison workflows, and progressively add sophistication as your benchmarking needs grow.
Remember: good benchmarks require careful setup and consistent execution. Use Claude Code to automate the repetitive parts, but always verify results manually when making critical optimization decisions.
This guide helps developers build efficient benchmarking workflows using Claude Code and Criterion.