AI Tools Compared

AI-powered tools can automatically convert messy JSON API responses into clean pandas DataFrames, reducing hours of manual data wrangling to minutes. Using LLMs, specialized libraries, and AI coding assistants, you can generate transformation code that handles nested structures, inconsistent naming, and complex data types. This guide covers practical approaches including GPT-based extraction, libraries like pandas-normalize and jsontopandas, and schema inference tools that work reliably in production environments.

The Common Problem

API responses rarely arrive in pandas-friendly formats. Consider this typical JSON response from a REST API:

import requests

response = requests.get("https://api.example.com/users/activity")
raw_data = response.json()
# Returns nested dict with lists, dictionaries, and nulls

The resulting structure might contain nested objects like user.profile.settings.theme, arrays mixed with scalars, snake_case and camelCase fields in the same payload, and dates as Unix timestamps or ISO strings. Manually parsing this into a flat DataFrame requires understanding the exact structure and writing custom extraction logic.

AI-Powered Solutions

1. GPT-Based Data Extraction

Large language models excel at understanding JSON structures and generating transformation code. You can feed raw JSON into an LLM with a prompt like:

Convert this nested JSON into a flat pandas DataFrame.
Extract all relevant fields, handle nested objects appropriately,
and ensure proper data types for dates and numbers.

The model generates Python code using pd.json_normalize() or custom flattening logic. This approach works well for one-off transformations but becomes repetitive for recurring API calls.

2. Specialized JSON-to-DataFrame Libraries

Several Python libraries now incorporate AI-assisted features for JSON parsing:

pandas-normalize provides intelligent normalization strategies. It analyzes your JSON structure and suggests flattening approaches:

from pandas_normalize import normalize_json

df = normalize_json(raw_data, strategy='auto')
# Automatically flattens nested structures based on content analysis

jsontopandas uses pattern recognition to identify repeated structures and apply consistent transformations:

import jsontopandas as j2pd

converter = j2pd.AutoConverter()
converter.learn_structure(sample_response)
df = converter.transform(new_responses)  # Applies learned patterns

3. Code Generation with AI Assistants

Modern AI coding assistants like Claude, GitHub Copilot, and others can generate transformation code based on example JSON input. This works directly in your IDE:

import pandas as pd

def extract_user_activity(response):
    """Transform nested activity data into DataFrame"""
    records = []
    for item in response.get('activities', []):
        records.append({
            'activity_id': item.get('id'),
            'user_name': item.get('user', {}).get('name'),
            'user_email': item.get('user', {}).get('email'),
            'activity_type': item.get('type'),
            'timestamp': pd.to_datetime(item.get('created_at')),
            'metadata': item.get('meta', {})
        })
    return pd.DataFrame(records)

AI assistants can generate similar functions given a sample of your JSON structure and your desired output format.

4. Schema Inference Tools

AI-powered schema inference tools analyze your JSON and automatically infer appropriate pandas dtypes:

from pandera import DataFrameSchema

# Auto-infer schema from JSON sample
schema = DataFrameSchema.infer(raw_data[:100])  # Sample of 100 records
df = pd.json_normalize(raw_data)
df = schema.validate(df)  # Ensures data meets expectations

Practical Implementation Pattern

For production workflows, combine AI generation with strong error handling:

import pandas as pd
import logging
from typing import Any, Dict, List

def json_to_dataframe(api_response: Dict[str, Any]) -> pd.DataFrame:
    """Convert API response to DataFrame with AI-assisted normalization"""

    # Extract the data array - AI helps identify the correct path
    data = api_response.get('data', api_response.get('results', [api_response]))

    if not data:
        return pd.DataFrame()

    # Flatten nested structures
    df = pd.json_normalize(
        data,
        sep='_',
        max_level=5  # Limit nesting depth
    )

    # Clean column names
    df.columns = df.columns.str.replace(r'[^a-zA-Z0-9_]', '_', regex=True)

    # Convert timestamp columns automatically
    for col in df.columns:
        if 'date' in col.lower() or 'time' in col.lower():
            df[col] = pd.to_datetime(df[col], errors='coerce')

    # Handle missing values
    df = df.fillna(value=pd.NA)

    return df

Tool Comparison for JSON Transformation

Tool Best For Cost Learning Curve Output Speed
Claude (web) Complex hierarchies, multi-step logic $20/mo Gentle Fast
ChatGPT Plus Quick one-off transformations $20/mo Gentle Medium
Copilot Inline code suggestions $10/mo Minimal Very fast
Pandas-AI Automated schema inference Free/Pro Moderate Fast
jsontopandas Specialized JSON→DataFrame Free Steep Fast

For teams handling multiple API sources, Claude’s conversation interface enables iterative refinement, while ChatGPT works well for independent transformations.

Real-World Example: Multi-Level Nesting

Raw API Response:

{
  "status": "success",
  "data": {
    "users": [
      {
        "id": "usr_123",
        "profile": {
          "name": "Alice Johnson",
          "contact": {
            "email": "alice@example.com",
            "phone": "+1-555-0100"
          },
          "preferences": {
            "notifications": true,
            "language": "en"
          }
        },
        "activity": {
          "last_login": "2026-03-20T14:30:00Z",
          "total_logins": 42,
          "account_age_days": 730
        }
      }
    ]
  }
}

AI-Generated Transformation:

import pandas as pd
from datetime import datetime

def flatten_user_response(api_response):
    """Extract and flatten nested user data"""
    users = []

    for user_data in api_response['data']['users']:
        profile = user_data.get('profile', {})
        contact = profile.get('contact', {})
        activity = user_data.get('activity', {})

        users.append({
            'user_id': user_data.get('id'),
            'name': profile.get('name'),
            'email': contact.get('email'),
            'phone': contact.get('phone'),
            'notifications_enabled': profile.get('preferences', {}).get('notifications'),
            'language': profile.get('preferences', {}).get('language'),
            'last_login': pd.to_datetime(activity.get('last_login')),
            'total_logins': activity.get('total_logins'),
            'account_age_days': activity.get('account_age_days')
        })

    return pd.DataFrame(users)

# Usage
response = requests.get('https://api.example.com/users').json()
df = flatten_user_response(response)

Result:

      user_id           name              email         phone  notifications_enabled language      last_login  total_logins  account_age_days
0     usr_123  Alice Johnson  alice@example.com  +1-555-0100               True         en 2026-03-20 14:30:00             42              730

This transformation that might take 30 minutes manually takes 5 minutes with AI assistance.

Advanced Patterns: Type Conversion

Ask your AI to include type handling:

def smart_type_conversion(df, type_hints=None):
    """Convert column types intelligently"""
    for col in df.columns:
        # Handle datetime columns
        if 'date' in col.lower() or 'time' in col.lower():
            df[col] = pd.to_datetime(df[col], errors='coerce')

        # Handle numeric columns
        elif col.endswith('_count') or col.endswith('_total'):
            df[col] = pd.to_numeric(df[col], errors='coerce').astype('Int64')

        # Handle boolean columns
        elif col.startswith('is_') or col.endswith('_enabled'):
            df[col] = df[col].astype(bool)

        # Handle currency
        elif col.endswith('_amount') or col.endswith('_price'):
            df[col] = pd.to_numeric(df[col], errors='coerce')

    return df

Error Handling in Production

Ask AI to include error handling for strong transformation:

def safe_transform_api_response(response, required_fields=None):
    """Transform with comprehensive error handling"""
    try:
        # Validate response structure
        if not isinstance(response, dict):
            raise ValueError(f"Expected dict, got {type(response)}")

        if 'data' not in response:
            logging.warning("Missing 'data' key in response")
            return pd.DataFrame()

        records = response['data']
        if not isinstance(records, list):
            records = [records]

        # Transform
        df = pd.json_normalize(records, sep='_', max_level=3)

        # Validate required fields
        if required_fields:
            missing = set(required_fields) - set(df.columns)
            if missing:
                logging.error(f"Missing required fields: {missing}")
                raise ValueError(f"Missing fields: {missing}")

        # Log transformation summary
        logging.info(f"Transformed {len(df)} records from API response")

        return df

    except Exception as e:
        logging.error(f"Transformation failed: {str(e)}", exc_info=True)
        raise

# Usage
try:
    df = safe_transform_api_response(
        api_response,
        required_fields=['user_id', 'email', 'created_at']
    )
except Exception as e:
    # Handle or retry
    pass

Performance Optimization for Large Datasets

For APIs returning thousands of records:

def efficient_batch_transform(api_iterator, batch_size=1000):
    """Process large responses in batches"""
    dfs = []

    for batch_idx, response in enumerate(api_iterator):
        try:
            df = pd.json_normalize(
                response['data'],
                sep='_',
                max_level=2  # Limit nesting depth
            )
            dfs.append(df)

            # Consolidate every N batches to manage memory
            if (batch_idx + 1) % 10 == 0:
                consolidated = pd.concat(dfs, ignore_index=True)
                yield consolidated
                dfs = []

        except Exception as e:
            logging.warning(f"Batch {batch_idx} failed: {e}")
            continue

    # Yield remaining
    if dfs:
        yield pd.concat(dfs, ignore_index=True)

Testing Your Transformations

Prompt AI to generate tests for transformation logic:

import unittest

class TestAPITransformation(unittest.TestCase):
    def setUp(self):
        self.simple_response = {
            'data': [
                {'id': '1', 'name': 'Test', 'value': 100}
            ]
        }

    def test_empty_response(self):
        result = safe_transform_api_response({'data': []})
        self.assertEqual(len(result), 0)

    def test_missing_nested_fields(self):
        response = {
            'data': [
                {'id': '1', 'profile': {}}  # Missing name
            ]
        }
        result = safe_transform_api_response(response)
        self.assertTrue('name' in result.columns)
        self.assertTrue(pd.isna(result.loc[0, 'name']))

    def test_type_conversion(self):
        response = {
            'data': [
                {'id': '1', 'created_at': '2026-03-20T12:00:00Z'}
            ]
        }
        result = safe_transform_api_response(response)
        self.assertEqual(result['created_at'].dtype, 'datetime64[ns]')

Choosing the Right Approach

For occasional transformations, prompt an LLM with your JSON sample and desired output. For recurring API calls, build reusable transformation functions with AI assistance, then version-control and test them. For complex nested responses, consider a two-step approach: use AI to understand the structure, then implement targeted extraction logic.

The key advantage of AI-assisted transformation is speed. What might take an hour of manual parsing and debugging can often be accomplished in minutes with the right AI tool generating initial code that you then refine and maintain.

Production Deployment Checklist

Built by theluckystrike — More at zovo.one