Many data teams accumulate years of SQL queries across their analytics workflows. When it comes time to adopt dbt (data build tool) for better data transformation management, the prospect of manually converting hundreds or thousands of existing queries into a proper dbt project structure becomes overwhelming. AI tools offer a practical solution for automating this migration process while maintaining query functionality and establishing sensible project organization.
Understanding the Migration Challenge
Converting an existing analytics query library to dbt involves more than just copying SQL files into a new directory structure. A proper dbt project requires organized models, appropriate naming conventions, source definitions, tests, and documentation. Each legacy query typically represents a transformation that needs to be decomposed into discrete, maintainable models.
The manual approach to this migration consumes significant developer time and introduces opportunities for errors. AI-powered tools can accelerate this process by analyzing query patterns, identifying common table expressions, recognizing repeated logic, and generating the corresponding dbt project structure automatically.
How AI Tools Approach the Transformation
Modern AI coding assistants and specialized data tools can parse existing SQL queries and generate appropriate dbt components. The process typically involves several stages that the AI handles intelligently.
Analyzing Query Dependencies
AI tools examine your existing queries to understand table dependencies, identify source tables, and map how different queries relate to each other. This dependency analysis forms the foundation for organizing models into a logical folder structure.
Consider a typical analytics query that might exist in your current library:
SELECT
o.order_id,
o.order_date,
c.customer_name,
c.customer_email,
SUM(oi.quantity * oi.unit_price) AS total_order_value
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
GROUP BY o.order_id, o.order_date, c.customer_name, c.customer_email
An AI tool recognizes the source tables (orders, customers, order_items), identifies the transformation logic, and can generate a corresponding dbt model structure.
Generating dbt Model Files
The AI produces YAML configuration and SQL model files that reflect the query structure. For the example above, the tool would generate something similar to:
-- models/marts/orders/recent_order_summary.sql
{{ config(materialized='table') }}
SELECT
o.order_id,
o.order_date,
c.customer_name,
c.customer_email,
SUM(oi.quantity * oi.unit_price) AS total_order_value
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id
JOIN {{ ref('stg_order_items') }} oi ON o.order_id = oi.order_id
WHERE o.order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)
GROUP BY o.order_id, o.order_date, c.customer_name, c.customer_email
The AI automatically converts raw table references to ref() macros, organizes models into appropriate directories (staging, intermediate, marts), and sets appropriate materialization strategies.
Creating Source and Staging Layers
AI tools also generate the staging layer that wraps raw source tables. This is crucial for maintaining a clean separation between raw data and business logic:
# models/staging/stg_orders.yml
version: 2
models:
- name: stg_orders
description: Staged order data with basic cleaning
columns:
- name: order_id
description: Primary key
tests:
- unique
- not_null
- name: order_date
description: Date the order was placed
- name: customer_id
description: Foreign key to customer
This automation ensures consistency across your staging models and provides a foundation for testing and documentation.
Practical Workflow for Migration
Using AI tools for this migration follows a structured approach that maximizes efficiency while maintaining quality.
First, gather all your existing queries into a centralized location. Organize them by functional area if possible, as this helps the AI understand the logical groupings. Many teams find success with simple directory structures like legacy_queries/reports/, legacy_queries/etl/, or similar groupings.
Next, provide context to your AI tool about your data warehouse environment. Specify your warehouse type (Snowflake, BigQuery, Redshift), your existing table naming conventions, and any established business logic rules. This context enables the AI to make better decisions about model organization and naming.
The AI then processes queries in batches, generating corresponding dbt models. Review the output carefully, particularly around complex joins and window functions, as these often require manual verification to ensure the generated logic matches the original intent.
Limitations and Manual Verification
While AI tools significantly accelerate the migration process, certain aspects require developer attention.
Complex business logic embedded in queries often relies on implicit assumptions about data quality or timing that may not transfer cleanly. Window functions with complex partitioning logic, recursive CTEs, and queries handling late-arriving data all benefit from manual review.
Data type handling varies between warehouses, and the AI may not always select the optimal data type for your target platform. A quick review of column types in generated staging models catches these issues before they cause problems downstream.
Finally, test generation requires oversight. While AI can create basic uniqueness and not-null tests, relationship tests between tables and custom business logic tests need explicit definition based on your team’s data quality standards.
Building a Maintainable dbt Project
The ultimate goal of migrating to dbt extends beyond simply converting queries. You want a project structure that supports ongoing development, enables collaboration, and provides clear data lineage.
AI-generated models serve as an excellent starting point. After migration, invest time in organizing models into appropriate layers: staging for source-aligned transformations, intermediate for reusable logic, and marts for business-ready datasets. Add documentation that explains not just what each model does, but why it exists and how it relates to business processes.
Establish conventions for naming, testing, and versioning early. These conventions become the foundation for your team’s dbt development practices and ensure consistency as the project grows.
Practical dbt Project Structure Generated by AI
Here’s what a well-organized dbt project typically looks like after AI-assisted generation:
my_analytics/
├── dbt_project.yml
├── models/
│ ├── staging/
│ │ ├── stg_customers.sql
│ │ ├── stg_orders.sql
│ │ ├── stg_payments.sql
│ │ └── staging.yml
│ ├── intermediate/
│ │ ├── int_order_items.sql
│ │ └── intermediate.yml
│ └── marts/
│ ├── core/
│ │ ├── fact_orders.sql
│ │ ├── dim_customers.sql
│ │ └── core.yml
│ └── finance/
│ ├── fct_revenue.sql
│ └── finance.yml
├── tests/
│ ├── generic/
│ └── specific/
├── macros/
│ └── cents_to_dollars.sql
├── seeds/
└── analyses/
CLI Commands for dbt Migrations
AI tools work best when integrated into your dbt workflow:
# Initialize dbt project
dbt init my_analytics
# Run all models
dbt run
# Test generated models for issues
dbt test
# Document all tables and columns
dbt docs generate
# Debug specific model
dbt debug
# Run specific model staging layer
dbt run --select tag:staging
# Validate all generated SQL before deployment
dbt parse
# Get test results in JSON
dbt test --select stg_customers --output json
Cost Comparison for Migration Tools
| Tool | Price | Best For | Speed |
|---|---|---|---|
| Claude Code | $20/month | Complex logic analysis, custom SQL | Fast, thorough explanations |
| GitHub Copilot | $10/month | Inline suggestions, quick models | Very fast |
| dbt Cloud IDE | Free - $5000+/month | Native dbt support, lineage visualization | Integrated but vendor-locked |
| Custom Python script | Free | Batch migration of similar queries | Very fast but less intelligent |
Real Migration Example: From Raw Queries to dbt
Original legacy query:
SELECT
customer_id,
COUNT(DISTINCT order_id) as total_orders,
SUM(order_amount) as lifetime_value,
MAX(order_date) as last_order_date
FROM raw_orders
WHERE deleted_at IS NULL
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 5
AI-generated dbt transformation:
-- models/marts/core/dim_customers.sql
{{ config(materialized='table') }}
WITH customer_summary AS (
SELECT
customer_id,
COUNT(DISTINCT order_id) as total_orders,
SUM(order_amount) as lifetime_value,
MAX(order_date) as last_order_date
FROM {{ ref('stg_orders') }}
WHERE deleted_at IS NULL
GROUP BY customer_id
HAVING COUNT(DISTINCT order_id) >= 5
),
customers_with_details AS (
SELECT
c.customer_id,
c.customer_name,
s.total_orders,
s.lifetime_value,
s.last_order_date
FROM {{ ref('stg_customers') }} c
JOIN customer_summary s USING (customer_id)
)
SELECT * FROM customers_with_details
Plus accompanying YAML:
models:
- name: dim_customers
description: Customer dimension with order metrics
columns:
- name: customer_id
tests:
- unique
- not_null
- name: total_orders
description: Count of distinct orders
- name: lifetime_value
description: Total monetary value of all orders
Validation Before Deployment
After AI generates your dbt project, validate thoroughly:
# Compile and check syntax without running
dbt compile
# Run tests on generated models
dbt test --select stg_*
# Compare record counts with original queries
dbt run-operation compare_row_counts --args '{"model": "fact_orders"}'
# Check for null values where not expected
dbt test --select tag:critical_columns
# Validate referential integrity between models
dbt test --select dbt_expectations
Performance Optimization After Generation
AI-generated models may not include optimal indexing or partitioning. Review and add after initial generation:
-- Add partitioning for fact tables (Snowflake example)
{{ config(
materialized='table',
partition_by = {
"field": "order_date",
"data_type": "date",
"granularity": "month"
}
) }}
-- Add clustering for query performance
{{ config(
cluster_by = ["customer_id", "order_date"]
) }}
Common Issues After Migration
Issue: Models reference wrong source tables
- Solution: Review
sources.ymlfile before running, ensure table names match warehouse exactly
Issue: Tests fail on generated models
- Solution: Run
dbt testimmediately after generation, fix failing tests before production use
Issue: dbt parse errors in generated YAML
- Solution: Validate YAML syntax with
dbt parsecommand
Issue: Generated models produce different results than legacy queries
- Solution: Compare row counts and sample data between old query and new model until they match
Related Articles
- Best AI Tools for Go Project Structure and Module
- Best Way to Structure CursorRules for Microservices Project
- How to Structure Project Files So AI Coding Tools Understand
- Cheapest AI Tool for Generating an Entire Project
- Cheapest AI Tool for Generating Entire Project
Built by theluckystrike — More at zovo.one