AI Tools Compared

SQL query generation is one of the few AI tasks with objective evaluation criteria — the query either returns correct results or it doesn’t. This makes it unusually easy to benchmark. This guide covers dedicated text-to-SQL tools, general LLMs, and IDE-integrated options, with accuracy benchmarks on real-world query patterns.

Tools Compared

Why SQL Generation Is Hard

Simple SELECTs are trivial for any modern LLM. The failure modes appear at complexity:

Benchmark Query 1: Year-over-Year Revenue with Window Functions

Prompt: “Show monthly revenue for 2024 and 2025 side by side with percent change, for each product category”

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        category,
        SUM(revenue) AS total_revenue
    FROM orders
    JOIN products USING (product_id)
    WHERE order_date >= '2024-01-01'
    GROUP BY 1, 2
)
SELECT
    category,
    month,
    total_revenue AS current_revenue,
    LAG(total_revenue, 12) OVER (
        PARTITION BY category ORDER BY month
    ) AS prior_year_revenue,
    ROUND(
        100.0 * (total_revenue - LAG(total_revenue, 12) OVER (
            PARTITION BY category ORDER BY month
        )) / NULLIF(LAG(total_revenue, 12) OVER (
            PARTITION BY category ORDER BY month
        ), 0),
        2
    ) AS yoy_pct_change
FROM monthly_revenue
ORDER BY category, month;
Tool Generated Correct Query Notes
Claude (Sonnet 4.5) Yes Used LAG with correct NULLIF for division safety
GPT-4o Yes Slightly different CTE structure, same result
SQLAI.ai Partial Missed the NULLIF, division by zero on new categories
Copilot Chat Partial Generated window function but wrong PARTITION BY

Benchmark Query 2: Recursive CTE for Hierarchy

Prompt: “Find all managers in the org chart above employee ID 1042, including their level”

WITH RECURSIVE org_hierarchy AS (
    SELECT employee_id, manager_id, name, 0 AS depth
    FROM employees
    WHERE employee_id = 1042

    UNION ALL

    SELECT e.employee_id, e.manager_id, e.name, oh.depth + 1
    FROM employees e
    INNER JOIN org_hierarchy oh ON e.employee_id = oh.manager_id
)
SELECT employee_id, name, depth AS levels_above_1042
FROM org_hierarchy
WHERE depth > 0
ORDER BY depth;
Tool Generated Correct Query Notes
Claude (Sonnet 4.5) Yes Correct recursive structure
GPT-4o Yes Added cycle detection guard (good extra)
SQLAI.ai No Generated iterative approach, not recursive CTE
Copilot Chat Partial Recursive structure correct, wrong termination

Benchmark Query 3: BigQuery QUALIFY Clause

Prompt: “In BigQuery, filter rows where the most recent event per user is a ‘purchase’, using QUALIFY”

SELECT
    user_id,
    event_type,
    event_timestamp,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp DESC) AS rn
FROM events
QUALIFY rn = 1 AND event_type = 'purchase';
Tool Used QUALIFY Notes
Claude (Sonnet 4.5) Yes Correctly used when dialect specified
GPT-4o Yes Also correct
SQLAI.ai No Used subquery with WHERE, valid but verbose
Copilot Chat No Used subquery approach

Schema-Aware Tools

For production use, schema awareness matters more than raw generation quality.

Using General LLMs Effectively for SQL

Always include your schema when using Claude or ChatGPT:

I'm querying a PostgreSQL database. Here is the relevant schema:

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    status TEXT CHECK (status IN ('pending', 'paid', 'shipped', 'cancelled')),
    total_cents INTEGER
);

CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER,
    unit_price_cents INTEGER
);

Query: Find the top 10 products by revenue in the last 30 days,
showing product name, units sold, and total revenue in dollars.
Include only products with at least 5 orders.

With schema context, Claude and GPT-4o produce correct column names and join conditions on the first attempt. Without it, they hallucinate reasonable-sounding but wrong column names about 30% of the time.

Accuracy Summary

Tested on 40 queries covering: basic aggregations, window functions, CTEs, recursive queries, dialect-specific syntax, and NULL edge cases.

Tool Accuracy Schema Aware Dialect Support Cost
Claude Sonnet 4.5 87% Prompt-based All major API pricing
GPT-4o 85% Prompt-based All major API pricing
SQLAI.ai 78% Native connection Postgres, MySQL, BQ, Snowflake $29/mo
Outerbase AI 72% Native connection Most major Free + paid
Copilot Chat 68% File context only All major $10-19/mo

Practical Workflow

For analysts without IDE access, SQLAI.ai or Outerbase gives the best experience because of native schema connections. For engineers in an IDE, providing schema in the prompt to Claude or GPT-4o achieves higher accuracy at lower cost.

For production query generation, the schema-in-prompt approach with Claude is most reliable — especially for complex analytical queries where the accuracy gap between tools is widest.

Benchmark Query 4: Percentile Calculation Without Window Functions

Prompt: “Find the 95th percentile of order value in the last 30 days, for each product category”

-- Claude output (correct for PostgreSQL)
WITH ordered_values AS (
    SELECT category, revenue
    FROM orders
    JOIN products USING (product_id)
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT DISTINCT
    category,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY revenue)
        OVER (PARTITION BY category) AS p95_revenue
FROM ordered_values;
Tool Correct Notes
Claude Yes Used correct PERCENTILE_CONT aggregation
GPT-4o Yes Also correct, slightly different syntax
SQLAI.ai No Used NTILE instead, off by one tier

Integrating Generated Queries into Applications

Python with SQLAlchemy

from sqlalchemy import text

def get_revenue_by_category(session, days_back: int = 30):
    query = text("""
    WITH monthly_revenue AS (
        SELECT
            DATE_TRUNC('month', order_date) AS month,
            category,
            SUM(revenue) AS total_revenue
        FROM orders
        JOIN products USING (product_id)
        WHERE order_date >= NOW() - INTERVAL ':days days'
        GROUP BY 1, 2
    )
    SELECT
        category,
        month,
        total_revenue,
        LAG(total_revenue, 12) OVER (
            PARTITION BY category ORDER BY month
        ) AS prior_year_revenue
    FROM monthly_revenue
    ORDER BY category, month
    """)

    return session.execute(query, {"days": days_back}).fetchall()

When using Claude to generate text() queries, always include :param_name placeholders for parameterization. Claude respects this pattern when you show an example.

JavaScript with Knex.js

function getProductsByRevenue(knex, minRevenue) {
    return knex
        .with("ranked_products", (qb) => {
            qb.select("product_id", "category", "revenue")
                .from("orders")
                .join("products", "orders.product_id", "=", "products.product_id")
                .select(knex.raw(
                    "ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank"
                ))
                .where("revenue", ">", minRevenue);
        })
        .select("*")
        .from("ranked_products")
        .where("rank", "<=", 10);
}

For Knex, provide Claude with the builder syntax examples from your codebase. Claude learns the pattern and generates compatible queries.

Direct Database Connection (Node.js)

async function analyzeUserTrends(pool, userId) {
    const query = `
        WITH user_orders AS (
            SELECT
                order_id,
                DATE(order_date) AS order_day,
                COUNT(*) OVER (
                    ORDER BY DATE(order_date)
                    ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
                ) AS orders_in_window
            FROM orders
            WHERE user_id = $1
        )
        SELECT
            order_day,
            orders_in_window,
            ROUND(100.0 * orders_in_window / MAX(orders_in_window) OVER (), 1) AS pct_of_max
        FROM user_orders
        ORDER BY order_day DESC;
    `;

    return pool.query(query, [userId]);
}

Testing Generated Queries

Always validate generated queries before deploying:

def test_generated_query_correctness():
    # Test data
    session.add_all([
        Order(product_id=1, revenue=100, order_date=datetime(2024, 1, 1)),
        Order(product_id=1, revenue=150, order_date=datetime(2025, 1, 1)),
    ])
    session.commit()

    results = get_revenue_by_category(session)

    # Assert: should have row for month=2025-01 with revenue=150 and prior_year=100
    assert results[0].yoy_pct_change == 50.0

def test_edge_case_no_prior_data():
    # Test: if only current year data exists, prior_year should be NULL
    session.add(Order(product_id=1, revenue=100, order_date=datetime(2025, 1, 1)))
    session.commit()

    results = get_revenue_by_category(session)
    assert results[0].prior_year_revenue is None

Common Query Generation Mistakes to Watch For

  1. Window function frame misspecification — ORDER BY without frame clause causes incorrect rows
  2. Aggregation with GROUP BY missing columns — Some databases require all non-aggregated columns in GROUP BY
  3. NULL handling in comparisonscolumn = NULL should be IS NULL
  4. Subquery correlation errors — Outer query columns not visible in subqueries without explicit correlation names
  5. Date arithmetic across databases — MySQL uses DATE_ADD, Postgres uses INTERVAL syntax

Ask Claude explicitly: “Include NULL checks. Ensure all non-aggregated columns in GROUP BY. Use {database} specific syntax.”

When to Skip AI Query Generation

Some queries are too domain-specific for reliable generation:

For these, AI-generated boilerplate + manual refinement is faster than pure generation.

Built by theluckystrike — More at zovo.one