AI Tools for Generating dbt Project Structure from Existing

Many data teams accumulate years of SQL queries across their analytics workflows. When it comes time to adopt dbt (data build tool) for better data transformation management, the prospect of manually converting hundreds or thousands of existing queries into a proper dbt project structure becomes overwhelming. AI tools offer a practical solution for automating this migration process while maintaining query functionality and establishing sensible project organization.

Understanding the Migration Challenge

Converting an existing analytics query library to dbt involves more than just copying SQL files into a new directory structure. A proper dbt project requires organized models, appropriate naming conventions, source definitions, tests, and documentation. Each legacy query typically represents a transformation that needs to be decomposed into discrete, maintainable models.

The manual approach to this migration consumes significant developer time and introduces opportunities for errors. AI-powered tools can accelerate this process by analyzing query patterns, identifying common table expressions, recognizing repeated logic, and generating the corresponding dbt project structure automatically.

How AI Tools Approach the Transformation

Modern AI coding assistants and specialized data tools can parse existing SQL queries and generate appropriate dbt components. The process typically involves several stages that the AI handles intelligently.

Analyzing Query Dependencies

AI tools examine your existing queries to understand table dependencies, identify source tables, and map how different queries relate to each other. This dependency analysis forms the foundation for organizing models into a logical folder structure.

Consider a typical analytics query that might exist in your current library:

SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    c.customer_email,
    SUM(oi.quantity * oi.unit_price) AS total_order_value
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
GROUP BY o.order_id, o.order_date, c.customer_name, c.customer_email

An AI tool recognizes the source tables (orders, customers, order_items), identifies the transformation logic, and can generate a corresponding dbt model structure.

Generating dbt Model Files

The AI produces YAML configuration and SQL model files that reflect the query structure. For the example above, the tool would generate something similar to:

-- models/marts/orders/recent_order_summary.sql
{{ config(materialized='table') }}

SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    c.customer_email,
    SUM(oi.quantity * oi.unit_price) AS total_order_value
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
JOIN {{ ref('stg_order_items') }} oi ON o.order_id = oi.order_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
GROUP BY o.order_id, o.order_date, c.customer_name, c.customer_email

The AI automatically converts raw table references to ref() macros, organizes models into appropriate directories (staging, intermediate, marts), and sets appropriate materialization strategies.

Creating Source and Staging Layers

AI tools also generate the staging layer that wraps raw source tables. This is crucial for maintaining a clean separation between raw data and business logic:

# models/staging/stg_orders.yml
version: 2

models:
  - name: stg_orders
    description: Staged order data with basic cleaning
    columns:
      - name: order_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: order_date
        description: Date the order was placed
      - name: customer_id
        description: Foreign key to customer

This automation ensures consistency across your staging models and provides a foundation for testing and documentation.

Practical Workflow for Migration

Using AI tools for this migration follows a structured approach that maximizes efficiency while maintaining quality.

First, gather all your existing queries into a centralized location. Organize them by functional area if possible, as this helps the AI understand the logical groupings. Many teams find success with simple directory structures like legacy_queries/reports/, legacy_queries/etl/, or similar groupings.

Next, provide context to your AI tool about your data warehouse environment. Specify your warehouse type (Snowflake, BigQuery, Redshift), your existing table naming conventions, and any established business logic rules. This context enables the AI to make better decisions about model organization and naming.

The AI then processes queries in batches, generating corresponding dbt models. Review the output carefully, particularly around complex joins and window functions, as these often require manual verification to ensure the generated logic matches the original intent.

Limitations and Manual Verification

While AI tools significantly accelerate the migration process, certain aspects require developer attention.

Complex business logic embedded in queries often relies on implicit assumptions about data quality or timing that may not transfer cleanly. Window functions with complex partitioning logic, recursive CTEs, and queries handling late-arriving data all benefit from manual review.

Data type handling varies between warehouses, and the AI may not always select the optimal data type for your target platform. A quick review of column types in generated staging models catches these issues before they cause problems downstream.

Finally, test generation requires oversight. While AI can create basic uniqueness and not-null tests, relationship tests between tables and custom business logic tests need explicit definition based on your team’s data quality standards.

Building a Maintainable dbt Project

The ultimate goal of migrating to dbt extends beyond simply converting queries. You want a project structure that supports ongoing development, enables collaboration, and provides clear data lineage.

AI-generated models serve as an excellent starting point. After migration, invest time in organizing models into appropriate layers: staging for source-aligned transformations, intermediate for reusable logic, and marts for business-ready datasets. Add documentation that explains not just what each model does, but why it exists and how it relates to business processes.

Establish conventions for naming, testing, and versioning early. These conventions become the foundation for your team’s dbt development practices and ensure consistency as the project grows.

Practical dbt Project Structure Generated by AI

Here’s what a well-organized dbt project typically looks like after AI-assisted generation:

my_analytics/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   ├── stg_customers.sql
│   │   ├── stg_orders.sql
│   │   ├── stg_payments.sql
│   │   └── staging.yml
│   ├── intermediate/
│   │   ├── int_order_items.sql
│   │   └── intermediate.yml
│   └── marts/
│       ├── core/
│       │   ├── fact_orders.sql
│       │   ├── dim_customers.sql
│       │   └── core.yml
│       └── finance/
│           ├── fct_revenue.sql
│           └── finance.yml
├── tests/
│   ├── generic/
│   └── specific/
├── macros/
│   └── cents_to_dollars.sql
├── seeds/
└── analyses/

CLI Commands for dbt Migrations

AI tools work best when integrated into your dbt workflow:

# Initialize dbt project
dbt init my_analytics

# Run all models
dbt run

# Test generated models for issues
dbt test

# Document all tables and columns
dbt docs generate

# Debug specific model
dbt debug

# Run specific model staging layer
dbt run --select tag:staging

# Validate all generated SQL before deployment
dbt parse

# Get test results in JSON
dbt test --select stg_customers --output json

Cost Comparison for Migration Tools

Tool	Price	Best For	Speed
Claude Code	$20/month	Complex logic analysis, custom SQL	Fast, thorough explanations
GitHub Copilot	$10/month	Inline suggestions, quick models	Very fast
dbt Cloud IDE	Free - $5000+/month	Native dbt support, lineage visualization	Integrated but vendor-locked
Custom Python script	Free	Batch migration of similar queries	Very fast but less intelligent

Real Migration Example: From Raw Queries to dbt

Original legacy query:

SELECT
  customer_id,
  COUNT(DISTINCT order_id) as total_orders,
  SUM(order_amount) as lifetime_value,
  MAX(order_date) as last_order_date
FROM raw_orders
WHERE deleted_at IS NULL
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 5

AI-generated dbt transformation:

-- models/marts/core/dim_customers.sql
{{ config(materialized='table') }}

WITH customer_summary AS (
  SELECT
    customer_id,
    COUNT(DISTINCT order_id) as total_orders,
    SUM(order_amount) as lifetime_value,
    MAX(order_date) as last_order_date
  FROM {{ ref('stg_orders') }}
  WHERE deleted_at IS NULL
  GROUP BY customer_id
  HAVING COUNT(DISTINCT order_id) >= 5
),

customers_with_details AS (
  SELECT
    c.customer_id,
    c.customer_name,
    s.total_orders,
    s.lifetime_value,
    s.last_order_date
  FROM {{ ref('stg_customers') }} c
  JOIN customer_summary s USING (customer_id)
)

SELECT * FROM customers_with_details

Plus accompanying YAML:

models:
  - name: dim_customers
    description: Customer dimension with order metrics
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: total_orders
        description: Count of distinct orders
      - name: lifetime_value
        description: Total monetary value of all orders

Validation Before Deployment

After AI generates your dbt project, validate thoroughly:

# Compile and check syntax without running
dbt compile

# Run tests on generated models
dbt test --select stg_*

# Compare record counts with original queries
dbt run-operation compare_row_counts --args '{"model": "fact_orders"}'

# Check for null values where not expected
dbt test --select tag:critical_columns

# Validate referential integrity between models
dbt test --select dbt_expectations

Performance Optimization After Generation

AI-generated models may not include optimal indexing or partitioning. Review and add after initial generation:

-- Add partitioning for fact tables (Snowflake example)
{{ config(
  materialized='table',
  partition_by = {
    "field": "order_date",
    "data_type": "date",
    "granularity": "month"
  }
) }}

-- Add clustering for query performance
{{ config(
  cluster_by = ["customer_id", "order_date"]
) }}

Common Issues After Migration

Issue: Models reference wrong source tables

Solution: Review sources.yml file before running, ensure table names match warehouse exactly

Issue: Tests fail on generated models

Solution: Run dbt test immediately after generation, fix failing tests before production use

Issue: dbt parse errors in generated YAML

Solution: Validate YAML syntax with dbt parse command

Issue: Generated models produce different results than legacy queries

Solution: Compare row counts and sample data between old query and new model until they match

Built by theluckystrike — More at zovo.one