AI Tools Compared

When working with dbt (data build tool), defining models from existing database tables can feel repetitive. You examine each source table, determine the appropriate transformation logic, and write SQL files with columns, tests, and documentation. AI tools now automate this process, taking raw database table schemas and generating ready-to-use dbt model definitions.

The Problem with Manual dbt Model Creation

Building dbt models from scratch involves several steps: reading source table definitions, creating YAML configurations for sources and models, writing SQL transformation queries, adding tests for data quality, and documenting columns with descriptions. For a database with dozens or hundreds of tables, this becomes a significant undertaking.

Consider a typical e-commerce database with tables like orders, order_items, customers, products, and inventory. Manually creating dbt models for each table requires examining column names, data types, and relationships to write appropriate transformations. This boilerplate work distracts from the analytical logic that actually adds business value.

How AI Tools Generate dbt Models

AI tools analyze your database schema and generate dbt-compatible files automatically. The process typically works like this: you provide the table definitions (either through a SQL DESCRIBE query, a schema export, or a connection to your data warehouse), and the AI outputs source YAML files, model SQL files, and sometimes tests and documentation.

Input: Raw Table Schema

A typical table schema looks like this:

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    order_date TIMESTAMP NOT NULL,
    total_amount DECIMAL(10,2),
    status VARCHAR(20),
    shipping_address VARCHAR(500),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Output: Generated dbt Source Configuration

The AI transforms this into a dbt sources.yml file:

version: 2

sources:
  - name: ecommerce
    database: production
    schema: orders
    tables:
      - name: orders
        description: "Customer orders table containing transaction data"
        columns:
          - name: order_id
            description: "Unique order identifier"
            tests:
              - unique
              - not_null
          - name: customer_id
            description: "Foreign key to customers table"
            tests:
              - not_null
              - relationships:
                  to: ref('customers')
                  field: customer_id
          - name: order_date
            description: "Timestamp when order was placed"
            tests:
              - not_null
          - name: total_amount
            description: "Total order value in USD"
          - name: status
            description: "Order status (pending, shipped, delivered, cancelled)"
            tests:
              - accepted_values:
                  values: ['pending', 'shipped', 'delivered', 'cancelled']

Output: Generated dbt Model SQL

The AI also creates the corresponding model file:

{{ config(materialized='table') }}

SELECT
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    shipping_address,
    created_at,
    updated_at,
    -- Derived columns
    DATE_TRUNC('day', order_date) AS order_date_day,
    EXTRACT(YEAR FROM order_date) AS order_year,
    EXTRACT(MONTH FROM order_date) AS order_month
FROM {{ source('ecommerce', 'orders') }}
WHERE order_id IS NOT NULL

Practical AI Tools for This Task

Several AI tools handle schema-to-dbt conversion with varying capabilities.

GitHub Copilot

Copilot suggests dbt model code based on comments and existing patterns. You write a comment describing the model you want, and Copilot completes the SQL. For schema generation, you paste your table definition and ask for a dbt model. The quality depends on your prompts and the surrounding context in your project.

Claude and GPT-4

Large language models excel at this task. You provide the schema and specify your requirements, and the model generates complete YAML configurations and SQL files. The advantage is flexibility—you can ask for specific materializations, test types, or coding patterns.

# Example prompt to an AI assistant
"""
Generate dbt source configuration and model SQL for a table with this schema:
- order_id: BIGINT PRIMARY KEY
- customer_id: BIGINT NOT NULL
- order_date: TIMESTAMP NOT NULL
- total_amount: DECIMAL(10,2)
- status: VARCHAR(20)

Use table materialization, add unique and not_null tests on order_id,
and include a relationship test to a hypothetical customers table.
"""

Specialized dbt Generation Tools

Some tools target dbt specifically. These often integrate with your warehouse to read schemas directly, generating configurations without manual schema copying.

Best Practices When Using AI-Generated Models

AI-generated dbt models provide a solid starting point, but you should review and enhance them.

Add Business Logic

The AI captures technical transformations but misses business context. You know that total_amount should exclude returns, that status needs specific handling for your fulfillment process, or that certain customer segments require special treatment. Add these rules after generation.

Customize Tests

AI tools add standard tests, but your business requires domain-specific validations. A fraud detection team needs anomaly detection tests. A finance team needs reconciliation tests against source systems. Add these manually after AI generates the baseline.

Maintain Naming Conventions

Review column names for consistency with your project standards. The AI might generate order_date_day while your project uses order_dt. Fix these before committing to avoid technical debt.

Document Relationships

AI tools struggle to infer table relationships without foreign key constraints. If your warehouse lacks foreign key definitions, explicitly document relationships in your model configurations.

Workflow Integration

Integrate AI generation into your development process:

  1. Export schema: Run DESCRIBE TABLE or export schema from your data warehouse

  2. Generate models: Use your AI tool to create initial files

  3. Review output: Check SQL correctness and test coverage

  4. Enhance: Add business logic, custom tests, and documentation

  5. Commit: Version control the refined models

This workflow reduces model creation time significantly while maintaining quality through human review.

Limitations to Consider

AI tools have constraints. They generate generic transformations and may miss warehouse-specific optimizations. They assume correct data types and won’t catch logical errors in your business rules. They also struggle with complex joins, incremental models, and advanced dbt features without explicit guidance.

For simple staging models, AI generation works well. For complex business logic, use AI as a starting point and build manually from there.

Built by theluckystrike — More at zovo.one