AI Tools for Automated PR Description Generation

A good PR description saves your team hours of archaeology. A bad one — “fixed stuff” — forces reviewers to read every line. AI tools can auto-generate descriptions from diffs, but the quality varies wildly depending on how the tool processes context.

This guide covers the main options: a Claude-powered GitHub Action, GPT-4 via the API, and the built-in GitHub Copilot PR description feature.

The Test Diff
Option 1: Claude via GitHub Actions
What Changed
Why It Matters
Testing Notes
Option 2: GPT-4 via API
Summary
Option 3: GitHub Copilot Built-In
Description
Comparison Table
PR Agent (Open Source)
Custom Prompt Engineering
TL;DR
What changed
Why
Testing
Breaking changes
Handling Large Diffs
Enforcing Team PR Templates
Measuring Description Quality Over Time
Related Reading

The Test Diff

All tools were given the same diff: a 200-line change adding Redis caching to a FastAPI endpoint, including a new CacheManager class, TTL configuration, and a cache invalidation hook.

Option 1: Claude via GitHub Actions

This approach uses a GitHub Action that sends your diff to the Anthropic API and patches the PR body.

# .github/workflows/pr-description.yml
name: Auto PR Description

on:
  pull_request:
    types: [opened, synchronize]

permissions:
  pull-requests: write
  contents: read

jobs:
  describe-pr:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Generate diff
        id: diff
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff
          echo "diff_size=$(wc -l < /tmp/pr.diff)" >> $GITHUB_OUTPUT

      - name: Generate PR description
        if: steps.diff.outputs.diff_size < 2000
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          DIFF=$(cat /tmp/pr.diff | head -500)
          RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
            -H "x-api-key: $ANTHROPIC_API_KEY" \
            -H "anthropic-version: 2023-06-01" \
            -H "content-type: application/json" \
            -d "{
              \"model\": \"claude-opus-4-6\",
              \"max_tokens\": 1024,
              \"messages\": [{
                \"role\": \"user\",
                \"content\": \"Write a concise PR description for this diff. Use markdown. Include: What changed (2-3 bullets), Why it matters (1 sentence), Testing notes (1-2 bullets). Keep it under 200 words.\n\nDiff:\n$DIFF\"
              }]
            }")
          echo "$RESPONSE" | jq -r '.content[0].text' > /tmp/pr_body.md

      - name: Update PR description
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('/tmp/pr_body.md', 'utf8');
            await github.rest.pulls.update({
              owner: context.repo.owner,
              repo: context.repo.repo,
              pull_number: context.issue.number,
              body: body
            });

Claude’s generated output for the test diff:

## What Changed

- Added `CacheManager` class wrapping Redis with TTL support and key namespacing
- `GET /products/{id}` endpoint now checks cache before hitting the database;
  cache miss triggers DB query + write-through to Redis
- Cache invalidation hook added to `POST /products/{id}` update endpoint

## Why It Matters

Reduces database load on the products endpoint by ~80% for repeated reads,
targeting the P95 latency from 340ms to under 50ms.

## Testing Notes

- Run `pytest tests/test_cache.py` — 12 new tests covering hit/miss/eviction
- Requires Redis running locally: `docker run -p 6379:6379 redis:7`

Clean, accurate, and picks up the latency motivation from a code comment. The 500-line diff limit is a real constraint — large PRs need chunking.

Option 2: GPT-4 via API

# scripts/generate_pr_description.py
import subprocess
import sys
import os
from openai import OpenAI

def get_diff(base_branch: str = "main") -> str:
    result = subprocess.run(
        ["git", "diff", f"origin/{base_branch}...HEAD"],
        capture_output=True,
        text=True
    )
    return result.stdout[:8000]  # token limit safety

def generate_description(diff: str) -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior engineer writing PR descriptions. "
                    "Be concise and technical. Use markdown bullet points. "
                    "Focus on what changed and why, not how."
                )
            },
            {
                "role": "user",
                "content": f"Write a PR description for this diff:\n\n{diff}"
            }
        ],
        max_tokens=600,
        temperature=0.2
    )
    return response.choices[0].message.content

def update_pr_via_gh_cli(body: str):
    pr_number = subprocess.run(
        ["gh", "pr", "view", "--json", "number", "-q", ".number"],
        capture_output=True,
        text=True
    ).stdout.strip()

    subprocess.run(
        ["gh", "pr", "edit", pr_number, "--body", body],
        check=True
    )

if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "main"
    diff = get_diff(base)
    description = generate_description(diff)
    update_pr_via_gh_cli(description)
    print("PR description updated.")

GPT-4’s output for the same diff:

## Option 3: GitHub Copilot Built-In

Copilot's PR description feature (available in GitHub.com UI) requires no setup. Click "Copilot" button in the PR description field.

**Strengths:**
- Zero configuration
- Reads full diff including file tree context
- Integrated into the PR UI

**Weaknesses:**
- No customization of output format
- Can't enforce your team's PR template structure
- Output often too verbose (500+ words)
- No programmatic access — can't run in CI

**Copilot's output for the test diff:**

```markdown
## Description

This pull request introduces a caching layer to the FastAPI application to
improve the performance of product retrieval operations.

### Changes Made

**New Files:**
- `app/cache.py`: Implements `CacheManager` class using Redis...

[...continues for 400 more words about every function signature]

Copilot reads the diff deeply but doesn’t know when to stop. Good for large PRs where you want full coverage; bad for teams that want concise descriptions.

Comparison Table

Tool	Setup	Format Control	Diff Size Limit	Cost
Claude (GitHub Action)	30 min	Full (prompt)	~500 lines	~$0.01/PR
GPT-4 (script)	20 min	Full (system prompt)	~1,500 lines	~$0.02/PR
Copilot built-in	Zero	None	Full diff	Included in Copilot
Open-source (pr-agent)	1 hour	Config file	Chunked	Free + API cost

PR Agent (Open Source)

For teams wanting more control, pr-agent by CodiumAI is worth evaluating:

pip install pr-agent

# Configure in .pr_agent.toml
[config]
model = "claude-opus-4-6"
[pr_description]
extra_instructions = """
Always include:
1. A one-line TL;DR
2. Breaking change flag if applicable
3. Migration steps if schema changes
"""

PR agent supports multiple models, custom templates, and ticket linking (Jira/Linear). It’s the right choice for teams with existing review workflows.

Custom Prompt Engineering

The biggest ROI improvement comes from prompt tuning. This prompt consistently produces descriptions that match PR templates:

SYSTEM_PROMPT = """You write PR descriptions for a team that uses this template:

## TL;DR
One sentence.

## What changed
- Bullet per logical change

## Why
One sentence justification.

## Testing
- How to verify

## Breaking changes
None / [list with migration steps]

Rules:
- Skip sections with nothing to say
- Never explain implementation details
- Always flag breaking changes explicitly
- Keep total length under 250 words"""

Pass this as the system prompt regardless of which model you use.

Handling Large Diffs

The biggest practical problem with AI PR descriptions is diff size. Anything over 500 lines pushes against token limits and produces vague summaries. Two approaches work well.

Chunked summarization: Split the diff by file, summarize each file independently, then combine the file summaries into a final description.

def summarize_large_diff(diff: str, max_chunk_lines: int = 200) -> str:
 lines = diff.split("\n")
 file_chunks = []
 current_chunk = []

 for line in lines:
 if line.startswith("diff --git") and current_chunk:
 file_chunks.append("\n".join(current_chunk))
 current_chunk = [line]
 else:
 current_chunk.append(line)

 if current_chunk:
 file_chunks.append("\n".join(current_chunk))

 file_summaries = []
 for chunk in file_chunks:
 summary = client.messages.create(
 model="claude-opus-4-6",
 max_tokens=200,
 messages=[{
 "role": "user",
 "content": f"Summarize this file diff in 2-3 bullets:\n\n{chunk[:3000]}"
 }]
 )
 file_summaries.append(summary.content[0].text)

 combined = "\n\n".join(file_summaries)
 final = client.messages.create(
 model="claude-opus-4-6",
 max_tokens=400,
 messages=[{
 "role": "user",
 "content": f"Write a cohesive PR description from these file summaries:\n\n{combined}"
 }]
 )
 return final.content[0].text

Semantic diff filtering: For very large diffs, strip test files and auto-generated code before sending to the model. Test changes rarely add signal to the PR description, and generated files (migrations, protobuf output) are noise.

git diff origin/main...HEAD \
 -- ':(exclude)tests/' \
 -- ':(exclude)*_pb2.py' \
 -- ':(exclude)migrations/' \
 | head -600 > /tmp/filtered.diff

Enforcing Team PR Templates

Most teams have a PR template in .github/PULL_REQUEST_TEMPLATE.md. The AI should fill that template rather than invent its own structure. Pass the template content directly in the system prompt:

import subprocess

def get_pr_template() -> str:
 try:
 with open(".github/PULL_REQUEST_TEMPLATE.md") as f:
 return f.read()
 except FileNotFoundError:
 return ""

PR_TEMPLATE = get_pr_template()

SYSTEM_PROMPT = f"""Fill in this PR template based on the diff provided.
Keep all section headers exactly as written.
If a section is not applicable, write 'N/A' — do not delete the section.

Template:
{PR_TEMPLATE}"""

This approach guarantees the AI output matches what reviewers expect to see, and makes the generated description easier to edit manually if needed.

Measuring Description Quality Over Time

Track whether AI-generated descriptions correlate with faster reviews. Add a label to AI-generated PRs and then query GitHub’s API after merge:

# Get average review time for AI-described vs manual PRs
gh api graphql -f query='
{
 repository(owner: "org", name: "repo") {
 pullRequests(last: 100, labels: ["ai-description"]) {
 nodes {
 createdAt
 mergedAt
 reviewDecision
 }
 }
 }
}'

Teams consistently report 20-40% reduction in reviewer question comments on PRs with AI-generated descriptions — the model surfaces context that authors forget to mention.

Built by theluckystrike — More at zovo.one ```

Table of Contents

The Test Diff

Option 1: Claude via GitHub Actions

Option 2: GPT-4 via API

Comparison Table

PR Agent (Open Source)

Custom Prompt Engineering

Handling Large Diffs

Enforcing Team PR Templates

Measuring Description Quality Over Time

AI for Automated Regression Test Generation from Bug

Table of Contents

The Test Diff

Option 1: Claude via GitHub Actions

Option 2: GPT-4 via API

Comparison Table

PR Agent (Open Source)

Custom Prompt Engineering

Handling Large Diffs

Enforcing Team PR Templates

Measuring Description Quality Over Time

Related Reading

AI for Automated Regression Test Generation from Bug

Related Articles