Database schema diffing is more than ALTER TABLE generation. A good schema diff tool identifies breaking changes, flags unsafe migrations (locking issues, large table rewrites), and generates rollback scripts. AI adds a layer on top: natural language explanation of what changed and why it might be dangerous. This guide covers Atlas, pgcmp, and building a Claude-powered schema diff pipeline.
What AI Adds to Schema Diffing
Traditional schema diff tools tell you what changed. AI tells you:
- Whether the migration is safe to run online (zero-downtime) or requires maintenance window
- Which application queries will break after the change
- Whether indexes need to be added before or after column changes
- What the rollback strategy should be
- Lock acquisition estimates for large table operations
Atlas (ariga.io)
Atlas is an open-source schema management tool with a built-in diff engine and an AI explanation layer (Atlas Cloud).
Basic schema diff:
# Compare current DB schema to desired state
atlas schema diff \
--from "postgres://user:pass@localhost/prod_db" \
--to "file://schema.hcl" \
--dev-url "docker://postgres/15/dev"
Atlas HCL schema definition:
# schema.hcl
table "users" {
schema = schema.public
column "id" {
type = bigserial
}
column "email" {
type = varchar(255)
null = false
}
column "email_verified_at" {
type = timestamptz
null = true # New: add email verification
}
primary_key {
columns = [column.id]
}
index "users_email_unique" {
columns = [column.email]
unique = true
}
}
Atlas diff output:
-- Planned Changes:
-- Add column "email_verified_at" to table: "users"
ALTER TABLE "users" ADD COLUMN "email_verified_at" timestamptz NULL;
Atlas marks this as safe — adding a nullable column is a non-blocking operation. It flags unsafe operations like adding a NOT NULL column without a default on a large table.
Atlas Cloud AI explanation (for paid tier):
This migration adds an email_verified_at column to the users table.
Risk: LOW — Adding a nullable column in PostgreSQL takes an exclusive lock
briefly but does not rewrite the table. Safe for online deployment.
If your users table has > 10M rows, consider monitoring lock wait time.
No application queries will break unless you add NOT NULL to this column
in a future migration without a default value.
pgcmp
pgcmp is a PostgreSQL-specific schema comparison tool that produces detailed diff reports. It compares two database schemas and outputs differences in tables, columns, indexes, constraints, and functions.
# Install and compare two Postgres schemas
npm install -g pgcmp
pgcmp \
"postgres://user:pass@localhost/db_v1" \
"postgres://user:pass@localhost/db_v2" \
--output diff-report.json
pgcmp output (JSON excerpt):
{
"tables": {
"added": ["audit_logs"],
"removed": [],
"modified": {
"orders": {
"columns": {
"added": [
{ "name": "cancelled_reason", "type": "text", "nullable": true }
],
"modified": [
{
"name": "status",
"from": "varchar(20)",
"to": "varchar(50)",
"breaking": false
}
]
},
"indexes": {
"added": [
"idx_orders_status_created_at"
]
}
}
}
}
}
pgcmp correctly flags varchar(20) → varchar(50) as non-breaking (increasing varchar length is safe in Postgres). It flags varchar(50) → varchar(20) as breaking.
Building a Claude-Powered Schema Diff Analyzer
For teams without dedicated schema tooling, use Claude to analyze schema diffs and provide migration safety assessments:
# schema_diff_analyzer.py
import anthropic
import subprocess
import json
from typing import NamedTuple
client = anthropic.Anthropic()
class SchemaDiffResult(NamedTuple):
raw_diff: str
safety_assessment: str
migration_sql: str
rollback_sql: str
def get_schema_dump(db_url: str) -> str:
"""Dump schema using pg_dump."""
result = subprocess.run(
["pg_dump", "--schema-only", "--no-owner", "--no-acl", db_url],
capture_output=True, text=True, check=True
)
return result.stdout
def analyze_schema_diff(
old_schema: str,
new_schema: str,
table_row_counts: dict[str, int] | None = None
) -> SchemaDiffResult:
"""Use Claude to analyze schema differences and generate migration plan."""
row_count_context = ""
if table_row_counts:
row_count_context = "Table row counts:\n" + "\n".join(
f" {table}: {count:,} rows"
for table, count in table_row_counts.items()
)
message = client.messages.create(
model="claude-opus-4-6",
max_tokens=4096,
messages=[{
"role": "user",
"content": f"""Analyze the differences between these two PostgreSQL schemas.
OLD SCHEMA:
{old_schema[:8000]}
NEW SCHEMA:
{new_schema[:8000]}
{row_count_context}
Provide:
1. SAFETY ASSESSMENT: List each change with risk level (LOW/MEDIUM/HIGH) and reason
2. MIGRATION SQL: The ALTER statements to migrate from old to new schema, in correct order
3. ROLLBACK SQL: The statements to undo the migration
4. ONLINE SAFE: Which changes can run without a maintenance window
5. WARNINGS: Any potential breaking changes for application code
Format your response as JSON with keys: safety_assessment, migration_sql, rollback_sql, warnings"""
}]
)
response_text = message.content[0].text
# Parse JSON response
try:
# Claude sometimes wraps in ```json
if "```json" in response_text:
json_str = response_text.split("```json")[1].split("```")[0]
else:
json_str = response_text
data = json.loads(json_str)
return SchemaDiffResult(
raw_diff=f"Diff between provided schemas",
safety_assessment=data.get("safety_assessment", ""),
migration_sql=data.get("migration_sql", ""),
rollback_sql=data.get("rollback_sql", ""),
)
except json.JSONDecodeError:
# Fall back to raw text
return SchemaDiffResult(
raw_diff="Parse error",
safety_assessment=response_text,
migration_sql="",
rollback_sql="",
)
def check_migration_safety(migration_file: str, db_url: str) -> dict:
"""Check a migration file for unsafe operations."""
migration_sql = open(migration_file).read()
# Get approximate row counts for affected tables
affected_tables = extract_affected_tables(migration_sql)
row_counts = get_table_row_counts(db_url, affected_tables)
message = client.messages.create(
model="claude-opus-4-6",
max_tokens=2048,
messages=[{
"role": "user",
"content": f"""Review this PostgreSQL migration for safety issues.
Migration SQL:
{migration_sql}
Table sizes:
{json.dumps(row_counts, indent=2)}
Identify:
1. Operations that require full table locks (AccessExclusiveLock)
2. Operations that rewrite the table (sequential scan + write)
3. Estimated lock duration for large tables
4. Safer alternatives (e.g., CREATE INDEX CONCURRENTLY, batched updates)
5. Overall risk: LOW / MEDIUM / HIGH
Be specific about PostgreSQL internals."""
}]
)
return {"analysis": message.content[0].text, "row_counts": row_counts}
def extract_affected_tables(sql: str) -> list[str]:
"""Extract table names from SQL statements."""
import re
tables = re.findall(
r'(?:ALTER TABLE|UPDATE|DROP TABLE|CREATE INDEX ON)\s+"?(\w+)"?',
sql, re.IGNORECASE
)
return list(set(tables))
def get_table_row_counts(db_url: str, tables: list[str]) -> dict[str, int]:
"""Get approximate row counts from pg_stat_user_tables."""
import psycopg2
counts = {}
if not tables:
return counts
try:
conn = psycopg2.connect(db_url)
cur = conn.cursor()
placeholders = ','.join(['%s'] * len(tables))
cur.execute(
f"SELECT relname, n_live_tup FROM pg_stat_user_tables "
f"WHERE relname IN ({placeholders})",
tables
)
counts = dict(cur.fetchall())
conn.close()
except Exception:
pass
return counts
Claude Analysis Example
Input migration:
-- Add NOT NULL column with no default to large table
ALTER TABLE orders ADD COLUMN priority_level integer NOT NULL;
-- Change column type
ALTER TABLE products ALTER COLUMN price TYPE numeric(12,4);
-- Add index concurrently (safe)
CREATE INDEX CONCURRENTLY idx_orders_priority ON orders(priority_level);
Claude’s safety assessment:
{
"safety_assessment": [
{
"operation": "ALTER TABLE orders ADD COLUMN priority_level integer NOT NULL",
"risk": "HIGH",
"reason": "Adding a NOT NULL column without DEFAULT rewrites the entire table in PostgreSQL < 11. In PG 11+, columns with constant defaults avoid table rewrites, but NOT NULL without DEFAULT still requires a full table scan to verify. With 8M rows, this could lock the table for 30-60 seconds.",
"recommendation": "Add column as nullable first, backfill in batches, then add NOT NULL constraint."
},
{
"operation": "ALTER TABLE products ALTER COLUMN price TYPE numeric(12,4)",
"risk": "MEDIUM",
"reason": "Type change from numeric to numeric with higher precision rewrites the table.",
"recommendation": "Use a background migration pattern if the products table is large."
},
{
"operation": "CREATE INDEX CONCURRENTLY idx_orders_priority ON orders(priority_level)",
"risk": "LOW",
"reason": "CONCURRENTLY builds the index without blocking reads or writes.",
"recommendation": "Safe to run online. Cannot run inside a transaction block."
}
]
}
Tool Comparison
| Feature | Atlas | pgcmp | Claude DIY |
|---|---|---|---|
| Schema diff generation | Excellent | Good | Via prompt |
| Safety analysis | Cloud tier | Manual | Excellent |
| Migration SQL generation | Excellent | No | Good |
| Rollback generation | Good | No | Good |
| Lock duration estimates | No | No | Yes |
| Custom rules | Via providers | No | Via prompt |
| Cost | Open source + paid Cloud | Free | API costs |
Related Reading
- AI Tools for Database Schema Migration Review 2026
- How to Use AI for PostgreSQL Query Optimization
- AI Tools for Generating Pytest Fixtures from Database Schema
- AI-Powered Database Migration Tools Comparison 2026
Built by theluckystrike — More at zovo.one