A good PR description saves your team hours of archaeology. A bad one — “fixed stuff” — forces reviewers to read every line. AI tools can auto-generate descriptions from diffs, but the quality varies wildly depending on how the tool processes context.
This guide covers the main options: a Claude-powered GitHub Action, GPT-4 via the API, and the built-in GitHub Copilot PR description feature.
Table of Contents
- The Test Diff
- Option 1: Claude via GitHub Actions
- What Changed
- Why It Matters
- Testing Notes
- Option 2: GPT-4 via API
- Summary
- Option 3: GitHub Copilot Built-In
- Description
- Comparison Table
- PR Agent (Open Source)
- Custom Prompt Engineering
- TL;DR
- What changed
- Why
- Testing
- Breaking changes
- Handling Large Diffs
- Enforcing Team PR Templates
- Measuring Description Quality Over Time
- Related Reading
The Test Diff
All tools were given the same diff: a 200-line change adding Redis caching to a FastAPI endpoint, including a new CacheManager class, TTL configuration, and a cache invalidation hook.
Option 1: Claude via GitHub Actions
This approach uses a GitHub Action that sends your diff to the Anthropic API and patches the PR body.
# .github/workflows/pr-description.yml
name: Auto PR Description
on:
pull_request:
types: [opened, synchronize]
permissions:
pull-requests: write
contents: read
jobs:
describe-pr:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Generate diff
id: diff
run: |
git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
echo "diff_size=$(wc -l < /tmp/pr.diff)" >> $GITHUB_OUTPUT
- name: Generate PR description
if: steps.diff.outputs.diff_size < 2000
env:
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
DIFF=$(cat /tmp/pr.diff | head -500)
RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "content-type: application/json" \
-d "{
\"model\": \"claude-opus-4-6\",
\"max_tokens\": 1024,
\"messages\": [{
\"role\": \"user\",
\"content\": \"Write a concise PR description for this diff. Use markdown. Include: What changed (2-3 bullets), Why it matters (1 sentence), Testing notes (1-2 bullets). Keep it under 200 words.\n\nDiff:\n$DIFF\"
}]
}")
echo "$RESPONSE" | jq -r '.content[0].text' > /tmp/pr_body.md
- name: Update PR description
uses: actions/github-script@v7
with:
script: |
const fs = require('fs');
const body = fs.readFileSync('/tmp/pr_body.md', 'utf8');
await github.rest.pulls.update({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: context.issue.number,
body: body
});
Claude’s generated output for the test diff:
## What Changed
- Added `CacheManager` class wrapping Redis with TTL support and key namespacing
- `GET /products/{id}` endpoint now checks cache before hitting the database;
cache miss triggers DB query + write-through to Redis
- Cache invalidation hook added to `POST /products/{id}` update endpoint
## Why It Matters
Reduces database load on the products endpoint by ~80% for repeated reads,
targeting the P95 latency from 340ms to under 50ms.
## Testing Notes
- Run `pytest tests/test_cache.py` — 12 new tests covering hit/miss/eviction
- Requires Redis running locally: `docker run -p 6379:6379 redis:7`
Clean, accurate, and picks up the latency motivation from a code comment. The 500-line diff limit is a real constraint — large PRs need chunking.
Option 2: GPT-4 via API
# scripts/generate_pr_description.py
import subprocess
import sys
import os
from openai import OpenAI
def get_diff(base_branch: str = "main") -> str:
result = subprocess.run(
["git", "diff", f"origin/{base_branch}...HEAD"],
capture_output=True,
text=True
)
return result.stdout[:8000] # token limit safety
def generate_description(diff: str) -> str:
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": (
"You are a senior engineer writing PR descriptions. "
"Be concise and technical. Use markdown bullet points. "
"Focus on what changed and why, not how."
)
},
{
"role": "user",
"content": f"Write a PR description for this diff:\n\n{diff}"
}
],
max_tokens=600,
temperature=0.2
)
return response.choices[0].message.content
def update_pr_via_gh_cli(body: str):
pr_number = subprocess.run(
["gh", "pr", "view", "--json", "number", "-q", ".number"],
capture_output=True,
text=True
).stdout.strip()
subprocess.run(
["gh", "pr", "edit", pr_number, "--body", body],
check=True
)
if __name__ == "__main__":
base = sys.argv[1] if len(sys.argv) > 1 else "main"
diff = get_diff(base)
description = generate_description(diff)
update_pr_via_gh_cli(description)
print("PR description updated.")
GPT-4’s output for the same diff:
## Option 3: GitHub Copilot Built-In
Copilot's PR description feature (available in GitHub.com UI) requires no setup. Click "Copilot" button in the PR description field.
**Strengths:**
- Zero configuration
- Reads full diff including file tree context
- Integrated into the PR UI
**Weaknesses:**
- No customization of output format
- Can't enforce your team's PR template structure
- Output often too verbose (500+ words)
- No programmatic access — can't run in CI
**Copilot's output for the test diff:**
```markdown
## Description
This pull request introduces a caching layer to the FastAPI application to
improve the performance of product retrieval operations.
### Changes Made
**New Files:**
- `app/cache.py`: Implements `CacheManager` class using Redis...
[...continues for 400 more words about every function signature]
Copilot reads the diff deeply but doesn’t know when to stop. Good for large PRs where you want full coverage; bad for teams that want concise descriptions.
Comparison Table
| Tool | Setup | Format Control | Diff Size Limit | Cost |
|---|---|---|---|---|
| Claude (GitHub Action) | 30 min | Full (prompt) | ~500 lines | ~$0.01/PR |
| GPT-4 (script) | 20 min | Full (system prompt) | ~1,500 lines | ~$0.02/PR |
| Copilot built-in | Zero | None | Full diff | Included in Copilot |
| Open-source (pr-agent) | 1 hour | Config file | Chunked | Free + API cost |
PR Agent (Open Source)
For teams wanting more control, pr-agent by CodiumAI is worth evaluating:
pip install pr-agent
# Configure in .pr_agent.toml
[config]
model = "claude-opus-4-6"
[pr_description]
extra_instructions = """
Always include:
1. A one-line TL;DR
2. Breaking change flag if applicable
3. Migration steps if schema changes
"""
PR agent supports multiple models, custom templates, and ticket linking (Jira/Linear). It’s the right choice for teams with existing review workflows.
Custom Prompt Engineering
The biggest ROI improvement comes from prompt tuning. This prompt consistently produces descriptions that match PR templates:
SYSTEM_PROMPT = """You write PR descriptions for a team that uses this template:
## TL;DR
One sentence.
## What changed
- Bullet per logical change
## Why
One sentence justification.
## Testing
- How to verify
## Breaking changes
None / [list with migration steps]
Rules:
- Skip sections with nothing to say
- Never explain implementation details
- Always flag breaking changes explicitly
- Keep total length under 250 words"""
Pass this as the system prompt regardless of which model you use.
Handling Large Diffs
The biggest practical problem with AI PR descriptions is diff size. Anything over 500 lines pushes against token limits and produces vague summaries. Two approaches work well.
Chunked summarization: Split the diff by file, summarize each file independently, then combine the file summaries into a final description.
def summarize_large_diff(diff: str, max_chunk_lines: int = 200) -> str:
lines = diff.split("\n")
file_chunks = []
current_chunk = []
for line in lines:
if line.startswith("diff --git") and current_chunk:
file_chunks.append("\n".join(current_chunk))
current_chunk = [line]
else:
current_chunk.append(line)
if current_chunk:
file_chunks.append("\n".join(current_chunk))
file_summaries = []
for chunk in file_chunks:
summary = client.messages.create(
model="claude-opus-4-6",
max_tokens=200,
messages=[{
"role": "user",
"content": f"Summarize this file diff in 2-3 bullets:\n\n{chunk[:3000]}"
}]
)
file_summaries.append(summary.content[0].text)
combined = "\n\n".join(file_summaries)
final = client.messages.create(
model="claude-opus-4-6",
max_tokens=400,
messages=[{
"role": "user",
"content": f"Write a cohesive PR description from these file summaries:\n\n{combined}"
}]
)
return final.content[0].text
Semantic diff filtering: For very large diffs, strip test files and auto-generated code before sending to the model. Test changes rarely add signal to the PR description, and generated files (migrations, protobuf output) are noise.
git diff origin/main...HEAD \
-- ':(exclude)tests/' \
-- ':(exclude)*_pb2.py' \
-- ':(exclude)migrations/' \
| head -600 > /tmp/filtered.diff
Enforcing Team PR Templates
Most teams have a PR template in .github/PULL_REQUEST_TEMPLATE.md. The AI should fill that template rather than invent its own structure. Pass the template content directly in the system prompt:
import subprocess
def get_pr_template() -> str:
try:
with open(".github/PULL_REQUEST_TEMPLATE.md") as f:
return f.read()
except FileNotFoundError:
return ""
PR_TEMPLATE = get_pr_template()
SYSTEM_PROMPT = f"""Fill in this PR template based on the diff provided.
Keep all section headers exactly as written.
If a section is not applicable, write 'N/A' — do not delete the section.
Template:
{PR_TEMPLATE}"""
This approach guarantees the AI output matches what reviewers expect to see, and makes the generated description easier to edit manually if needed.
Measuring Description Quality Over Time
Track whether AI-generated descriptions correlate with faster reviews. Add a label to AI-generated PRs and then query GitHub’s API after merge:
# Get average review time for AI-described vs manual PRs
gh api graphql -f query='
{
repository(owner: "org", name: "repo") {
pullRequests(last: 100, labels: ["ai-description"]) {
nodes {
createdAt
mergedAt
reviewDecision
}
}
}
}'
Teams consistently report 20-40% reduction in reviewer question comments on PRs with AI-generated descriptions — the model surfaces context that authors forget to mention.
Related Reading
- Best AI Tools for Writing GitHub Actions Workflows
- AI Tools for Automated Code Documentation Generation
- Claude vs ChatGPT for Technical Writing
-
AI for Automated Regression Test Generation from Bug
Related Articles
- AI Tools for Automated Code Documentation Generation in 2026
- AI Tools for Generating GitHub Actions Workflows (2)
- AI Tools for Generating GitHub Actions Workflows
- AI Tools for Automated Schema Validation
- Best AI Tools for Automated Code Review 2026
Built by theluckystrike — More at zovo.one ```