Coverage reports tell you what’s untested but not why or how to fix it. AI tools bridge that gap — they read your source code, analyze existing tests, and generate targeted tests for uncovered branches. This guide covers practical workflows using Claude, GitHub Copilot, and CodiumAI’s cover-agent.
## Table of Contents
- The Coverage Problem
- Approach 1: Claude for Targeted Gap Analysis
- Example Output
- Approach 2: GitHub Copilot Chat for Coverage
- Approach 3: CodiumAI cover-agent (OSS)
- Coverage Impact Comparison
- Integration in CI
- Validating AI-Generated Tests Before Committing
- Prioritizing Coverage by Risk
- Tracking Coverage Trends Over Time
- Related Reading
## The Coverage Problem
Low coverage is rarely random. It clusters in three places:
- Error handling branches (`except`, `catch`, edge cases)
- Complex conditionals with many combinations
- Recently added code with no accompanying tests
AI tools are particularly good at the first two. They can enumerate all branches in a function and generate a test for each one — something that takes a human 30 minutes per function.
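To see why the enumeration is mechanical, here is a minimal sketch (a hypothetical `count_branches` helper, not part of any tool in this guide) that counts the branch points a full branch-coverage suite would need to exercise:

```python
import ast

# Hypothetical helper: count branch points (if statements and except
# handlers) in a source string, a rough proxy for how many targeted
# tests full branch coverage requires.
def count_branches(source: str) -> int:
    tree = ast.parse(source)
    return sum(isinstance(n, (ast.If, ast.ExceptHandler)) for n in ast.walk(tree))

sample = """
def f(x):
    if x < 0:
        raise ValueError
    if x > 100:
        return "big"
    return "ok"
"""
print(count_branches(sample))  # → 2
```

Each counted branch is a candidate test the AI can generate one at a time.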
## Approach 1: Claude for Targeted Gap Analysis
Use coverage.py to generate a JSON report, then pipe uncovered lines to Claude:
```bash
# Run tests with coverage
pytest --cov=src --cov-report=json --cov-report=term-missing
# coverage.json is generated in the working directory
```
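The script below depends on the report's JSON layout: coverage.py nests per-file data under `files`, with uncovered line numbers in `missing_lines` and the overall figure under `totals.percent_covered`. A quick sketch with sample data (not real output):

```python
# Sample of the coverage.py JSON report shape this guide relies on
# (hand-written sample data, not a real report).
sample_report = {
    "files": {
        "src/payment_processor.py": {
            "missing_lines": [12, 13, 21, 22],
            "summary": {"percent_covered": 68.4},
        },
    },
    "totals": {"percent_covered": 74.2},
}

def uncovered_lines(report: dict, path: str) -> list[int]:
    """Pull the uncovered line numbers for one file from the report."""
    return report["files"][path]["missing_lines"]

print(uncovered_lines(sample_report, "src/payment_processor.py"))  # → [12, 13, 21, 22]
```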
````python
# scripts/ai_coverage_gap.py
"""
Reads coverage.json, finds uncovered functions, and uses Claude
to generate missing test cases.
"""
import ast
import json
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()


def load_coverage_report(path: str = "coverage.json") -> dict:
    with open(path) as f:
        return json.load(f)


def find_uncovered_functions(
    coverage_data: dict,
    threshold: float = 0.80
) -> list[dict]:
    """Return functions with coverage below threshold."""
    gaps = []
    for filepath, file_data in coverage_data["files"].items():
        if "test_" in filepath or "/tests/" in filepath:
            continue
        missing_lines = set(file_data.get("missing_lines", []))
        if not missing_lines:
            continue
        # Parse AST to find function boundaries
        try:
            source = Path(filepath).read_text()
            tree = ast.parse(source)
        except Exception:
            continue
        for node in ast.walk(tree):
            if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                continue
            func_lines = set(range(node.lineno, node.end_lineno + 1))
            uncovered = func_lines & missing_lines
            coverage_pct = 1 - len(uncovered) / len(func_lines) if func_lines else 1.0
            if coverage_pct < threshold:
                # Extract function source
                source_lines = source.split("\n")
                func_source = "\n".join(
                    source_lines[node.lineno - 1:node.end_lineno]
                )
                gaps.append({
                    "file": filepath,
                    "function": node.name,
                    "coverage": round(coverage_pct, 2),
                    "missing_lines": sorted(uncovered),
                    "source": func_source
                })
    return sorted(gaps, key=lambda x: x["coverage"])


def generate_tests_for_function(gap: dict) -> str:
    """Ask Claude to write tests for an uncovered function."""
    prompt = f"""Write pytest test cases for this Python function.
The function currently has {int(gap['coverage'] * 100)}% coverage.
Missing lines: {gap['missing_lines']}

Focus on:
1. The uncovered branches (error cases, edge inputs, boundary values)
2. Each test should test one specific scenario
3. Use descriptive test names that explain the scenario
4. Include any necessary fixtures or mocks as inline setup

Function from {gap['file']}:
```python
{gap['source']}
```

Return only the test code, no explanation. Start with imports."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text


def main():
    coverage_data = load_coverage_report()
    gaps = find_uncovered_functions(coverage_data, threshold=0.80)
    print(f"Found {len(gaps)} functions below 80% coverage\n")

    for gap in gaps[:5]:  # Process top 5 worst-covered functions
        print(f"Generating tests for {gap['function']} in {gap['file']}")
        print(f"  Current coverage: {int(gap['coverage'] * 100)}%")
        print(f"  Missing lines: {gap['missing_lines']}\n")

        test_code = generate_tests_for_function(gap)

        # Write to test file
        test_filename = f"tests/ai_generated/test_{gap['function']}.py"
        Path("tests/ai_generated").mkdir(parents=True, exist_ok=True)
        with open(test_filename, "w") as f:
            f.write(f"# AI-generated tests for {gap['function']}\n")
            f.write(f"# Source: {gap['file']}\n\n")
            f.write(test_code)
        print(f"  Written to {test_filename}\n")


if __name__ == "__main__":
    main()
````
## Example Output
Given this function with low coverage:
```python
# src/payment_processor.py
def process_refund(transaction_id: str, amount: float, reason: str) -> dict:
    if not transaction_id:
        raise ValueError("transaction_id is required")
    if amount <= 0:
        raise ValueError(f"Refund amount must be positive, got {amount}")
    if amount > MAX_REFUND_AMOUNT:
        raise RefundLimitExceeded(f"Amount {amount} exceeds limit {MAX_REFUND_AMOUNT}")

    transaction = db.get_transaction(transaction_id)
    if transaction is None:
        raise TransactionNotFound(transaction_id)
    if transaction.status == "refunded":
        return {"status": "already_refunded", "transaction_id": transaction_id}

    result = payment_gateway.refund(transaction_id, amount)
    db.update_transaction(transaction_id, status="refunded", refund_amount=amount)
    audit_log.write(event="refund", transaction_id=transaction_id, reason=reason)
    return {"status": "success", "refund_id": result.refund_id}
```
Claude generates:

```python
# AI-generated tests for process_refund
import pytest
from unittest.mock import patch, MagicMock

from src.payment_processor import process_refund, RefundLimitExceeded, TransactionNotFound


class TestProcessRefund:
    def test_raises_when_transaction_id_empty(self):
        with pytest.raises(ValueError, match="transaction_id is required"):
            process_refund("", 50.0, "customer request")

    def test_raises_when_amount_zero(self):
        with pytest.raises(ValueError, match="must be positive"):
            process_refund("txn_123", 0, "test")

    def test_raises_when_amount_negative(self):
        with pytest.raises(ValueError, match="must be positive"):
            process_refund("txn_123", -10.0, "test")

    def test_raises_when_amount_exceeds_limit(self):
        with pytest.raises(RefundLimitExceeded):
            process_refund("txn_123", 999999.0, "large refund")

    def test_raises_when_transaction_not_found(self):
        with patch("src.payment_processor.db") as mock_db:
            mock_db.get_transaction.return_value = None
            with pytest.raises(TransactionNotFound):
                process_refund("txn_missing", 50.0, "test")

    def test_returns_already_refunded_for_duplicate(self):
        with patch("src.payment_processor.db") as mock_db:
            mock_txn = MagicMock()
            mock_txn.status = "refunded"
            mock_db.get_transaction.return_value = mock_txn
            result = process_refund("txn_123", 50.0, "duplicate")
            assert result["status"] == "already_refunded"

    def test_successful_refund_returns_refund_id(self):
        with patch("src.payment_processor.db") as mock_db, \
             patch("src.payment_processor.payment_gateway") as mock_gw, \
             patch("src.payment_processor.audit_log"):
            mock_txn = MagicMock()
            mock_txn.status = "completed"
            mock_db.get_transaction.return_value = mock_txn
            mock_gw.refund.return_value = MagicMock(refund_id="ref_abc")
            result = process_refund("txn_123", 50.0, "customer request")
            assert result["status"] == "success"
            assert result["refund_id"] == "ref_abc"
```
Claude correctly identified all 7 branches and wrote a test for each one.
## Approach 2: GitHub Copilot Chat for Coverage
In VS Code with the coverage gutters extension showing red lines:
- Select the uncovered function
- Open Copilot Chat: `Cmd+I`
- Prompt: `/tests Write tests for the uncovered branches highlighted in red`
Copilot reads the active file, uses coverage annotations from the editor, and generates inline tests. No CLI setup required.
**Copilot strengths:** Faster for one-off coverage gaps. Integrates with the editor's coverage visualization.

**Copilot weaknesses:** Can't batch-process an entire codebase. No programmatic output, so you have to copy-paste.
## Approach 3: CodiumAI cover-agent (OSS)
```bash
pip install cover-agent

cover-agent \
  --source-file-path "src/payment_processor.py" \
  --test-file-path "tests/test_payment_processor.py" \
  --code-coverage-report-path "coverage.xml" \
  --test-command "pytest tests/test_payment_processor.py --cov=src --cov-report=xml" \
  --coverage-type cobertura \
  --desired-coverage 90 \
  --max-iterations 3 \
  --model "claude-opus-4-6"
```
cover-agent runs an iterative loop: generate tests → run them → check coverage → repeat until target is reached or max iterations hit. It handles import errors and fixture issues automatically in subsequent iterations.
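The control flow is roughly the following sketch (stand-in `generate_tests` and `run_and_measure` callables, not cover-agent's actual internals):

```python
from typing import Callable

def coverage_loop(
    generate_tests: Callable[[], str],
    run_and_measure: Callable[[str], float],
    desired_coverage: float,
    max_iterations: int,
) -> float:
    """Generate -> run -> measure -> repeat until target or iteration cap."""
    coverage = 0.0
    for _ in range(max_iterations):
        tests = generate_tests()
        coverage = run_and_measure(tests)
        if coverage >= desired_coverage:
            break
    return coverage

# Simulated run: each iteration's test run reports higher coverage
results = iter([35.0, 70.0, 92.0])
final = coverage_loop(lambda: "tests", lambda t: next(results), 90.0, 3)
print(final)  # → 92.0
```

The iteration cap matters: without it, a function the model can't cover (e.g. dead code) would loop forever.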
## Coverage Impact Comparison
| Tool | Setup Time | Batch Processing | Coverage Accuracy | Cost |
|---|---|---|---|---|
| Claude (script) | 20 min | Yes (unlimited) | High | ~$0.05/function |
| Copilot Chat | Zero | No (manual) | Medium | Included |
| cover-agent | 10 min | Yes (per file) | High (iterative) | API cost |
| CodiumAI PR-Agent | 30 min | Yes (per PR) | Medium | Free tier |
## Integration in CI
Add a coverage gate that triggers AI test generation when coverage drops:
```yaml
# .github/workflows/coverage.yml
- name: Check coverage threshold
  id: coverage
  run: |
    COVERAGE=$(python -c "import json; d=json.load(open('coverage.json')); print(int(d['totals']['percent_covered']))")
    echo "Coverage: ${COVERAGE}%"
    if [ "$COVERAGE" -lt "80" ]; then
      echo "coverage_gap=true" >> $GITHUB_OUTPUT
    fi

- name: Generate missing tests
  if: steps.coverage.outputs.coverage_gap == 'true'
  run: python scripts/ai_coverage_gap.py
```

Note the `id: coverage` on the first step; without it, the `steps.coverage.outputs` reference in the second step resolves to nothing.
## Validating AI-Generated Tests Before Committing
AI-generated tests can pass coverage metrics while testing the wrong thing: asserting `result is not None` instead of verifying the actual return value. Before committing generated tests, run a mutation testing check:
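As a concrete illustration (with a hypothetical `compute_fee` function under test), compare an assertion that only counts toward coverage against one that pins the actual value:

```python
def compute_fee(amount: float) -> float:
    # Hypothetical function under test: percentage fee plus fixed charge
    return round(amount * 0.029 + 0.30, 2)

def test_fee_weak():
    # Passes as long as nothing raises; survives almost any mutation
    assert compute_fee(100.0) is not None

def test_fee_strong():
    # Pins the exact expected value; a mutated rate or constant fails here
    assert compute_fee(100.0) == 3.20

test_fee_weak()
test_fee_strong()
```

Both tests produce identical line coverage; only the second one would kill a mutant that changes `0.029` to `0.29`.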
```bash
pip install mutmut

# Run mutation testing only on AI-generated test files
mutmut run \
  --paths-to-mutate src/payment_processor.py \
  --tests-dir tests/ai_generated \
  --runner "pytest tests/ai_generated -x -q"

mutmut results
```
Mutation testing modifies your source code in small ways (flipping `>` to `>=`, removing a `not`, etc.) and checks whether your tests catch those mutations. AI-generated tests often miss mutations in boundary conditions, and this surfaces them before you ship.
A mutation survival rate above 30% signals the generated tests need manual review and strengthening.
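Turning that threshold into a check is straightforward. A sketch, assuming you tally killed and surviving mutants from the `mutmut results` output (the helper names here are hypothetical):

```python
def mutation_survival_rate(killed: int, survived: int) -> float:
    """Fraction of generated mutants that no test caught."""
    total = killed + survived
    return survived / total if total else 0.0

def needs_review(killed: int, survived: int, threshold: float = 0.30) -> bool:
    """Flag a generated test suite whose survival rate exceeds the threshold."""
    return mutation_survival_rate(killed, survived) > threshold

print(needs_review(killed=14, survived=6))   # 6/20 = 30%, at the limit → False
print(needs_review(killed=5, survived=15))   # 15/20 = 75% → True
```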
## Prioritizing Coverage by Risk
Not all uncovered code is equally important. A 40%-covered error handler in your payment flow is far more dangerous than a 40%-covered utility formatter. Add a risk weighting step to the gap analysis:
```python
HIGH_RISK_PATTERNS = [
    "payment", "auth", "refund", "delete", "admin",
    "permission", "credential", "token", "charge"
]

def score_gap_risk(gap: dict) -> float:
    base_score = 1.0 - gap["coverage"]  # more uncovered = higher score
    risk_multiplier = 1.0

    # Boost risk score for high-value paths
    name_lower = (gap["function"] + gap["file"]).lower()
    if any(pattern in name_lower for pattern in HIGH_RISK_PATTERNS):
        risk_multiplier = 3.0

    # Boost for functions with many uncovered branches
    if len(gap["missing_lines"]) > 10:
        risk_multiplier *= 1.5

    return base_score * risk_multiplier

def prioritize_gaps(gaps: list[dict]) -> list[dict]:
    for gap in gaps:
        gap["risk_score"] = score_gap_risk(gap)
    return sorted(gaps, key=lambda x: x["risk_score"], reverse=True)
```
Running the AI test generator against the top 10 risk-weighted gaps rather than the bottom 10 coverage gaps produces tests with far more security value.
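A quick demonstration with two synthetic gap records (the scoring logic is repeated here so the snippet runs standalone):

```python
HIGH_RISK_PATTERNS = ["payment", "auth", "refund", "delete", "admin",
                      "permission", "credential", "token", "charge"]

def score_gap_risk(gap: dict) -> float:
    # Same weighting as above: uncovered fraction, boosted for risky names
    # and for functions with many uncovered lines.
    base_score = 1.0 - gap["coverage"]
    risk_multiplier = 1.0
    name_lower = (gap["function"] + gap["file"]).lower()
    if any(p in name_lower for p in HIGH_RISK_PATTERNS):
        risk_multiplier = 3.0
    if len(gap["missing_lines"]) > 10:
        risk_multiplier *= 1.5
    return base_score * risk_multiplier

gaps = [
    {"function": "format_date", "file": "src/utils.py",
     "coverage": 0.30, "missing_lines": list(range(5))},       # score 0.7
    {"function": "process_refund", "file": "src/payment_processor.py",
     "coverage": 0.60, "missing_lines": list(range(12))},      # score 1.8
]
ranked = sorted(gaps, key=score_gap_risk, reverse=True)
print([g["function"] for g in ranked])  # refund gap outranks the formatter
```

Even though the formatter has worse raw coverage, the refund path wins on risk.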
## Tracking Coverage Trends Over Time
Coverage percentage is a lagging indicator. What matters is whether coverage is trending up or down as new code gets merged. Store historical coverage data and alert on drops:
```yaml
# .github/workflows/coverage-trend.yml
- name: Store coverage snapshot
  run: |
    COVERAGE=$(python -c "
    import json
    d = json.load(open('coverage.json'))
    print(int(d['totals']['percent_covered']))
    ")
    DATE=$(date +%Y-%m-%d)
    echo "${DATE},${COVERAGE}" >> coverage-history.csv
    git add coverage-history.csv
    git commit -m "chore: coverage snapshot ${DATE} (${COVERAGE}%)" || true

- name: Alert on coverage drop
  run: |
    PREV=$(tail -2 coverage-history.csv | head -1 | cut -d',' -f2)
    CURR=$(tail -1 coverage-history.csv | cut -d',' -f2)
    if [ "$((CURR - PREV))" -lt "-3" ]; then
      echo "::error::Coverage dropped by more than 3% (${PREV}% → ${CURR}%)"
      exit 1
    fi
```
This creates an auditable record of coverage health alongside your commit history, making it easy to correlate coverage drops with specific feature branches.
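The same drop check is easy to express in Python, which is simpler to unit-test than the shell version. A sketch assuming the `date,percent` CSV format above:

```python
def coverage_dropped(history: list[str], max_drop: int = 3) -> bool:
    """history holds 'YYYY-MM-DD,percent' lines; flag a drop beyond max_drop."""
    if len(history) < 2:
        return False
    prev = int(history[-2].split(",")[1])
    curr = int(history[-1].split(",")[1])
    return curr - prev < -max_drop

print(coverage_dropped(["2026-01-01,84", "2026-01-08,79"]))  # 5-point drop → True
print(coverage_dropped(["2026-01-01,84", "2026-01-08,83"]))  # 1-point drop → False
```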
## Related Reading
- Claude Code Coverage Reporting Setup Guide
- AI Tools for Automated PR Description Generation
- Best AI Tools for Writing Playwright Tests
- AI-Powered Code Translation Tools Compared
Built by theluckystrike — More at zovo.one