Technical debt accumulates in every codebase. AI tools don’t eliminate debt, but they can accelerate the three stages that matter: identifying it systematically, prioritizing what to fix, and generating the refactored code. This guide covers practical workflows for each stage.
## Stage 1: Identification
AI-assisted debt identification goes beyond what linters catch. Linters find style violations; AI identifies architectural problems, outdated patterns, and code that works but was written before modern idioms existed.
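For instance, this hypothetical snippet passes a typical linter untouched, yet an AI audit would flag it as modernization debt: `%`-formatting and `os.path` chains that predate f-strings and `pathlib`.

```python
import os
from pathlib import Path

def report_path(base_dir, user_id, date_str):
    # Lints clean, but predates f-strings and pathlib:
    # an audit would flag this as MINOR modernization debt.
    filename = "%s_%s.csv" % (user_id, date_str)
    return os.path.join(base_dir, "reports", filename)

def report_path_modern(base_dir, user_id, date_str):
    # The equivalent a fix pass would produce.
    return Path(base_dir) / "reports" / f"{user_id}_{date_str}.csv"
```

No linter rule fires on the first version; it simply predates the idioms the second one uses.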
### Codebase Audit Prompt
Paste a file or module into Claude with this prompt:
```text
Analyze this code for technical debt. Categorize each issue as:

1. CRITICAL - likely to cause bugs, security issues, or major maintenance problems
2. IMPORTANT - makes the code harder to maintain or extend
3. MINOR - style or modernization issues that don't affect function

For each issue include:
- The specific line(s) or function
- Why it's a problem
- Effort to fix: Low (< 1 hour), Medium (1-8 hours), High (> 1 day)
- Risk to fix: Low (isolated change), Medium (affects callers), High (system-wide impact)
```
This produces a structured list you can add directly to your issue tracker.
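If the audit is run with a JSON output format (as in the scanning script's schema), the list can be turned into tracker entries mechanically. A sketch that builds, but does not run, GitHub CLI commands — `gh` and a `tech-debt` label are assumptions, and the JSON field names follow the audit schema used in this guide:

```python
import json
import shlex

def issues_to_gh_commands(audit_json: str, repo: str) -> list[str]:
    """Build (but don't execute) `gh issue create` commands from audit output.
    Assumes the GitHub CLI is installed and a `tech-debt` label exists."""
    cmds = []
    for issue in json.loads(audit_json)["issues"]:
        title = f"[{issue['category']}] {issue['description'][:60]}"
        body = f"Effort: {issue['fix_effort']}, Risk: {issue['fix_risk']}"
        cmds.append(
            f"gh issue create --repo {repo} --label tech-debt "
            f"--title {shlex.quote(title)} --body {shlex.quote(body)}"
        )
    return cmds
```

Review the generated commands before piping them to a shell; creating issues is cheap, deduplicating them later is not.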
### Automated Scanning Script
For a large codebase, run AI analysis across multiple files systematically:
```python
#!/usr/bin/env python3
# scripts/debt_audit.py
import json
from datetime import datetime
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

DEBT_PROMPT = """Analyze this {language} file for technical debt.
Output JSON only, no explanation:
{{
  "issues": [
    {{
      "line": <line_number_or_null>,
      "function": "<function_name_or_null>",
      "category": "CRITICAL|IMPORTANT|MINOR",
      "description": "<what the problem is>",
      "fix_effort": "LOW|MEDIUM|HIGH",
      "fix_risk": "LOW|MEDIUM|HIGH"
    }}
  ],
  "debt_score": <1-10, where 10 is worst>
}}"""


def audit_file(file_path: Path) -> dict | None:
    extension_map = {
        '.py': 'Python', '.ts': 'TypeScript', '.js': 'JavaScript',
        '.go': 'Go', '.java': 'Java', '.cs': 'C#',
    }
    language = extension_map.get(file_path.suffix, 'code')
    content = file_path.read_text()
    if len(content) < 50:  # Skip trivial files
        return None

    response = client.messages.create(
        model='claude-haiku-4-5',  # Fast and cheap for bulk scanning
        max_tokens=1024,
        messages=[{
            'role': 'user',
            'content': f"{DEBT_PROMPT.format(language=language)}\n\nFile: {file_path}\n\n{content}",
        }],
    )
    try:
        json_text = response.content[0].text
        if '```json' in json_text:  # Strip a markdown fence if the model added one
            json_text = json_text.split('```json')[1].split('```')[0]
        result = json.loads(json_text)
        result['file'] = str(file_path)
        return result
    except json.JSONDecodeError:
        return {'file': str(file_path), 'issues': [], 'debt_score': 0, 'parse_error': True}


def audit_directory(root: str, extensions: tuple[str, ...] = ('.py', '.ts')) -> list[dict]:
    files = [p for ext in extensions for p in Path(root).rglob(f'*{ext}')
             if 'node_modules' not in str(p) and '.git' not in str(p)]
    print(f"Auditing {len(files)} files...")
    results = []
    for i, f in enumerate(files):
        print(f"  [{i + 1}/{len(files)}] {f}")
        result = audit_file(f)
        if result:
            results.append(result)
    return results


def generate_report(results: list[dict]) -> str:
    critical = sum(1 for r in results for i in r.get('issues', []) if i['category'] == 'CRITICAL')
    important = sum(1 for r in results for i in r.get('issues', []) if i['category'] == 'IMPORTANT')
    avg_score = sum(r.get('debt_score', 0) for r in results) / max(len(results), 1)
    high_debt = sorted(results, key=lambda r: r.get('debt_score', 0), reverse=True)[:10]

    report = f"""# Technical Debt Audit — {datetime.now().strftime('%Y-%m-%d')}

## Summary
- Files scanned: {len(results)}
- Critical issues: {critical}
- Important issues: {important}
- Average debt score: {avg_score:.1f}/10

## Top 10 Most Indebted Files
"""
    for r in high_debt:
        report += f"\n### {r['file']} (Score: {r.get('debt_score', 0)}/10)\n"
        for issue in r.get('issues', []):
            report += f"- [{issue['category']}] {issue['description']} "
            report += f"(Effort: {issue['fix_effort']}, Risk: {issue['fix_risk']})\n"
    return report


if __name__ == '__main__':
    results = audit_directory('./src')
    Path('debt-audit.md').write_text(generate_report(results))
    print("\nReport written to debt-audit.md")
```
On a 200-file codebase, this runs in roughly 10 minutes and costs about $2-3 in API usage with Claude Haiku.
## Stage 2: Prioritization
Raw debt lists are not actionable. Prioritize by combining impact and effort:
```python
def prioritize_debt(audit_results: list[dict]) -> list[dict]:
    """Score each issue by bang-for-buck: high impact, low effort, low risk."""
    impact_score = {'CRITICAL': 3, 'IMPORTANT': 2, 'MINOR': 1}
    effort_cost = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 4}
    risk_penalty = {'LOW': 0, 'MEDIUM': 1, 'HIGH': 3}

    prioritized = []
    for result in audit_results:
        for issue in result.get('issues', []):
            score = (
                impact_score.get(issue['category'], 0) /
                (effort_cost.get(issue['fix_effort'], 2) + risk_penalty.get(issue['fix_risk'], 1))
            )
            prioritized.append({
                **issue,
                'file': result['file'],
                'priority_score': round(score, 2),
            })
    return sorted(prioritized, key=lambda x: x['priority_score'], reverse=True)
```
The formula rewards high-impact issues that are cheap and safe to fix. A CRITICAL issue that takes 30 minutes and affects only one file scores higher than an IMPORTANT issue that touches 20 files.
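The arithmetic behind that claim, as a standalone sketch of the same scoring formula:

```python
# Same weights as the prioritization formula above.
impact = {'CRITICAL': 3, 'IMPORTANT': 2, 'MINOR': 1}
effort = {'LOW': 1, 'MEDIUM': 2, 'HIGH': 4}
risk = {'LOW': 0, 'MEDIUM': 1, 'HIGH': 3}

def score(category: str, fix_effort: str, fix_risk: str) -> float:
    return round(impact[category] / (effort[fix_effort] + risk[fix_risk]), 2)

quick_critical = score('CRITICAL', 'LOW', 'LOW')        # 3 / (1 + 0) = 3.0
sprawling_important = score('IMPORTANT', 'HIGH', 'HIGH')  # 2 / (4 + 3) ≈ 0.29
```

The 30-minute critical fix scores roughly ten times higher than the 20-file refactor, which is the ordering you want at the top of a sprint backlog.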
## Stage 3: Fixing with AI Assistance
For identified debt items, AI generates the refactored code:
```python
def generate_fix(file_path: str, issue: dict) -> str:
    content = Path(file_path).read_text()
    response = client.messages.create(
        model='claude-sonnet-4-5',  # Use a stronger model for actual fixes
        max_tokens=3000,
        messages=[{
            'role': 'user',
            'content': f"""Fix this technical debt issue in the file below.

Issue: {issue['description']}
Location: {issue.get('function') or f"line {issue.get('line')}"}
Category: {issue['category']}

Requirements:
- Fix ONLY this specific issue, don't refactor anything else
- Maintain the same public API and function signatures
- Keep all existing tests passing
- Use the same style as the surrounding code

File: {file_path}

{content}

Return the complete updated file.""",
        }],
    )
    return response.content[0].text
```
The “fix ONLY this specific issue” instruction prevents AI from over-refactoring, which introduces risk.
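Generated fixes still need verification before they land. A minimal sketch of an apply-test-revert loop — the function name, and the assumption that a pytest suite (or any command that exits non-zero on failure) exists, are mine:

```python
import subprocess
from pathlib import Path

def apply_and_verify(file_path: str, fixed_content: str,
                     test_cmd: tuple[str, ...] = ('pytest', '-q')) -> bool:
    """Write an AI-generated fix to disk, run the test suite, and restore
    the original file if anything fails. Assumes test_cmd exits non-zero
    on failure (pytest does)."""
    target = Path(file_path)
    original = target.read_text()
    target.write_text(fixed_content)
    result = subprocess.run(test_cmd, capture_output=True)
    if result.returncode != 0:
        target.write_text(original)  # The fix regressed something: roll it back
        return False
    return True
```

Feeding each generated fix through a loop like this keeps a bad suggestion from ever reaching review.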
## Measuring Progress
Track debt reduction over time:
```shell
# Add to CI pipeline — fail the build when any critical debt issues are found
# (assumes debt_audit.py grows an `--output json` mode that emits the results list)
python scripts/debt_audit.py --output json | python -c "
import sys, json
data = json.load(sys.stdin)
critical = sum(1 for r in data for i in r.get('issues', []) if i['category'] == 'CRITICAL')
if critical > 0:
    print(f'FAIL: {critical} critical debt issues found')
    sys.exit(1)
print('OK: 0 critical issues')
"
```
Set a zero-critical-debt policy: new code can’t introduce critical technical debt. Existing debt is tracked and paid down sprint by sprint.
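One way to enforce that policy without blocking every build on legacy debt is a ratchet: store a baseline count, fail only when new critical issues push above it, and tighten the baseline as debt is paid down. A sketch (the function and baseline-file name are mine; results are assumed to be in the audit script's JSON shape):

```python
import json
from pathlib import Path

def check_ratchet(results: list[dict], baseline_file: str = 'debt-baseline.json') -> bool:
    """Fail when the critical-issue count rises above the stored baseline;
    ratchet the baseline down whenever the count drops."""
    critical = sum(1 for r in results for i in r.get('issues', [])
                   if i['category'] == 'CRITICAL')
    path = Path(baseline_file)
    baseline = json.loads(path.read_text())['critical'] if path.exists() else critical
    if critical > baseline:
        print(f"FAIL: {critical} critical issues (baseline is {baseline})")
        return False
    path.write_text(json.dumps({'critical': critical}))  # Ratchet down
    return True
```

Commit the baseline file so the ratchet travels with the branch; each paydown sprint tightens it automatically.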
## The 20% Time Model
The most successful teams allocate 20% of each sprint to debt reduction. AI makes this viable because:
- AI identifies debt faster than manual review
- AI generates fixes faster than manual refactoring
- The remaining human time goes to review and testing, not writing
A developer who would previously fix 2 debt items in a sprint can address 6-8 with AI assistance.
## Related Reading
- How to Use AI Coding Assistants for Technical Debt Reduction
- AI Code Review Automation Tools Comparison
- Free AI Tools for Code Refactoring That Actually Improve Quality
Built by theluckystrike — More at zovo.one