AI Tools Compared

Database performance tuning requires reading query plans, understanding statistics, and knowing which indexes help and which hurt. This is historically expert-only territory — but AI tools have gotten good enough to replace a lot of that expertise for the 80% case. This guide covers practical AI-assisted tuning for PostgreSQL and MySQL.

Table of Contents

The Core Workflow

Identify slow queries (pg_stat_statements / slow query log)
          ↓
Run EXPLAIN ANALYZE
          ↓
AI interpretation → Index recommendations → Query rewrite suggestions
          ↓
Measure improvement (before/after latency)

This loop works for both reactive tuning (fix slow queries in production) and proactive tuning (review new queries before they ship). The AI component fits at the interpretation and recommendation step — not at the measurement step, which requires real query execution.

Prerequisites: Enabling Query Tracking

Before you can identify slow queries, you need the data. For PostgreSQL:

-- Enable pg_stat_statements (requires superuser, restart not needed)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Add to postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all
-- pg_stat_statements.max = 10000

-- Verify it's collecting data
SELECT count(*) FROM pg_stat_statements;

For MySQL, enable the slow query log:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.1;  -- Log queries over 100ms
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- Or use Performance Schema (MySQL 5.7+)
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME = 'events_statements_history_long';

Extracting Slow Queries

# query_harvester.py
import psycopg2
import json
from anthropic import Anthropic

client = Anthropic()

def get_slow_queries(conn_string: str, min_mean_ms: float = 100) -> list[dict]:
    """Get the top slow queries from pg_stat_statements."""
    with psycopg2.connect(conn_string) as conn:
        with conn.cursor() as cur:
            cur.execute("""
                SELECT
                    query,
                    calls,
                    round(mean_exec_time::numeric, 2) AS mean_ms,
                    round(total_exec_time::numeric, 2) AS total_ms,
                    round(stddev_exec_time::numeric, 2) AS stddev_ms,
                    rows,
                    shared_blks_hit,
                    shared_blks_read,
                    round(100.0 * shared_blks_hit /
                        NULLIF(shared_blks_hit + shared_blks_read, 0), 1
                    ) AS cache_hit_pct
                FROM pg_stat_statements
                WHERE mean_exec_time > %s
                ORDER BY mean_exec_time DESC
                LIMIT 20
            """, (min_mean_ms,))

            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]

def get_explain_plan(conn_string: str, query: str) -> str:
    """Get the EXPLAIN ANALYZE output for a query."""
    # Replace parameter placeholders for analysis
    safe_query = query.replace("$1", "'placeholder'").replace("$2", "'placeholder2'")

    with psycopg2.connect(conn_string) as conn:
        conn.set_session(readonly=True)
        with conn.cursor() as cur:
            cur.execute(f"EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) {safe_query}")
            rows = cur.fetchall()
    return "\n".join(row[0] for row in rows)

AI Query Plan Interpretation

def interpret_query_plan(
    query: str,
    explain_output: str,
    query_stats: dict
) -> str:
    """Have Claude interpret an EXPLAIN ANALYZE output and suggest improvements."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": f"""You are a PostgreSQL performance expert. Analyze this slow query.

Query statistics:
- Mean execution: {query_stats.get('mean_ms')}ms
- Calls: {query_stats.get('calls')}
- Cache hit rate: {query_stats.get('cache_hit_pct')}%
- Rows returned: {query_stats.get('rows', 0) / max(query_stats.get('calls', 1), 1):.1f} avg

SQL:
{query}

EXPLAIN ANALYZE output:
{explain_output}

Provide:
1. ROOT_CAUSE: The primary reason this query is slow (1 sentence)
2. INDEX_SUGGESTIONS: Exact CREATE INDEX statements to add (if applicable)
3. QUERY_REWRITE: Improved SQL if the query structure is inefficient
4. QUICK_WINS: Changes that can be made without schema changes
5. EXPECTED_IMPROVEMENT: Estimated speedup if suggestions are applied"""
        }]
    )
    return response.content[0].text

def batch_analyze_slow_queries(conn_string: str) -> list[dict]:
    """Analyze all slow queries and generate a tuning report."""
    slow_queries = get_slow_queries(conn_string, min_mean_ms=100)
    results = []

    for query_stat in slow_queries[:10]:  # Top 10
        query = query_stat["query"]

        try:
            explain = get_explain_plan(conn_string, query)
        except Exception as e:
            explain = f"Could not EXPLAIN: {e}"

        analysis = interpret_query_plan(query, explain, query_stat)
        results.append({
            "query": query[:200],
            "mean_ms": query_stat["mean_ms"],
            "calls": query_stat["calls"],
            "analysis": analysis
        })

    return results

Index Recommendation Engine

def recommend_indexes(
    conn_string: str,
    table_name: str
) -> str:
    """Analyze a table and recommend missing indexes."""
    with psycopg2.connect(conn_string) as conn:
        with conn.cursor() as cur:
            # Get table structure
            cur.execute("""
                SELECT column_name, data_type, is_nullable
                FROM information_schema.columns
                WHERE table_name = %s
                ORDER BY ordinal_position
            """, (table_name,))
            columns = cur.fetchall()

            # Get existing indexes
            cur.execute("""
                SELECT indexname, indexdef
                FROM pg_indexes
                WHERE tablename = %s
            """, (table_name,))
            indexes = cur.fetchall()

            # Get table size
            cur.execute("""
                SELECT pg_size_pretty(pg_total_relation_size(%s))
            """, (table_name,))
            size = cur.fetchone()[0]

            # Get sequential scan stats (high seq_scan with low idx_scan = missing index)
            cur.execute("""
                SELECT seq_scan, idx_scan, n_live_tup
                FROM pg_stat_user_tables
                WHERE relname = %s
            """, (table_name,))
            stats = cur.fetchone()

    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1200,
        messages=[{
            "role": "user",
            "content": f"""Recommend indexes for this PostgreSQL table.

Table: {table_name}
Size: {size}
Columns: {json.dumps([{"name": c[0], "type": c[1], "nullable": c[2]} for c in columns], indent=2)}

Existing indexes:
{json.dumps([{"name": i[0], "def": i[1]} for i in indexes], indent=2)}

Scan statistics:
- Sequential scans: {stats[0] if stats else 'unknown'}
- Index scans: {stats[1] if stats else 'unknown'}
- Live rows: {stats[2] if stats else 'unknown'}

Based on column types and naming patterns (foreign keys, timestamps, status fields):
1. List indexes that are likely missing
2. Provide exact CREATE INDEX CONCURRENTLY statements
3. Flag any existing indexes that might be redundant
4. Note if the seq_scan:idx_scan ratio suggests a missing index"""
        }]
    )
    return response.content[0].text

Reading EXPLAIN Output: What the AI Sees

To understand why AI interpretation helps, here’s what a typical slow query plan looks like:

Gather  (cost=1000.00..45231.43 rows=1 width=208) (actual time=18241.321..18241.323 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on orders  (cost=0.00..44231.33 rows=1 width=208) (actual time=18238.471..18238.471 rows=0 loops=3)
        Filter: ((status = 'pending'::text) AND (created_at > '2026-01-01'::date))
        Rows Removed by Filter: 1483721

A developer without database experience sees numbers. Claude sees: “Sequential scan on 1.5 million rows removing almost all of them — this table needs a composite index on (status, created_at). The parallel workers are compensating but the fundamental problem is that there’s no index on status.”

The natural language output gets developers to the right fix without requiring them to memorize query plan node types.

Query Rewrite Examples

AI tools are particularly good at rewriting anti-patterns. Give Claude these and it consistently fixes them:

REWRITE_EXAMPLES = """
-- Anti-pattern 1: SELECT * with unnecessary columns
SELECT * FROM orders WHERE user_id = $1;

-- Anti-pattern 2: LIKE with leading wildcard (full table scan)
SELECT * FROM products WHERE name LIKE '%bluetooth%';

-- Anti-pattern 3: Non-sargable predicate
SELECT * FROM orders WHERE YEAR(created_at) = 2026;

-- Anti-pattern 4: Correlated subquery in SELECT
SELECT
    o.id,
    (SELECT COUNT(*) FROM items WHERE items.order_id = o.id) AS item_count
FROM orders o;

-- Anti-pattern 5: IN with large subquery
SELECT * FROM users WHERE id IN (
    SELECT user_id FROM orders WHERE status = 'pending'
);
"""

def rewrite_anti_patterns(sql_batch: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Rewrite these SQL queries to fix common performance anti-patterns.
For each query, show the original and the rewrite with a one-line explanation.

{sql_batch}

Rules:
- Prefer EXISTS over IN for subqueries
- Use specific columns instead of SELECT *
- Replace non-sargable predicates with index-friendly equivalents
- Convert correlated subqueries to JOINs where possible
- Use CTEs for readability when joins get complex"""
        }]
    )
    return response.content[0].text

For the leading wildcard case, Claude typically recommends a full-text search index (CREATE INDEX USING gin(to_tsvector('english', name))) along with the rewritten query using @@ instead of LIKE. This is a non-obvious fix that most developers need to look up — the AI surfaces it immediately.

Automated Slow Query Report

def generate_tuning_report(conn_string: str, output_path: str = "db_tuning_report.md"):
    """Generate a full database performance report."""
    analyses = batch_analyze_slow_queries(conn_string)

    with open(output_path, "w") as f:
        f.write("# Database Performance Tuning Report\n\n")
        f.write(f"Generated: {__import__('datetime').date.today()}\n\n")

        for i, item in enumerate(analyses, 1):
            f.write(f"## Query #{i}{item['mean_ms']}ms avg ({item['calls']} calls)\n\n")
            f.write(f"```sql\n{item['query']}\n```\n\n")
            f.write(item["analysis"])
            f.write("\n\n---\n\n")

    print(f"Report written to {output_path}")

if __name__ == "__main__":
    import os
    generate_tuning_report(os.environ["DATABASE_URL"])

Schedule this as a weekly cron job and route the report to a Slack channel. Teams that do this consistently find performance regressions during the week they’re introduced rather than after they’ve affected enough users to appear in support tickets.

When AI Recommendations Fall Short

AI tuning tools have two common failure modes worth knowing.

Write-heavy tables: Index recommendations from AI tools optimize for read performance. On a table with 10,000 writes per second, adding an index on every filtered column will slow writes enough to create a different kind of incident. Always check n_dead_tup and last_autovacuum alongside index recommendations for high-write tables.

Optimizer statistics mismatch: If ANALYZE hasn’t run recently, the query planner is working from stale statistics. AI interpretation of the plan will be correct but the recommendations may not match production behavior. Run ANALYZE table_name before capturing EXPLAIN output for analysis.

Tool Comparison

Tool Query Analysis Index Recs Plan Interpretation Cost
Claude (API) Excellent Yes, with context Full narrative ~$0.02/query
GPT-4 Good Yes Good ~$0.03/query
PgHero Visual only Basic No Free/paid
pganalyze Automated Good Limited $149+/mo
OtterTune ML-based Yes No narrative $400+/mo

Claude’s advantage is explaining why a query is slow in plain language — useful for developers who aren’t database experts. pganalyze and OtterTune are better for automated monitoring at scale; Claude is better for understanding and teaching.


Built by theluckystrike — More at zovo.one