AI-powered tools can automatically convert messy JSON API responses into clean pandas DataFrames, reducing hours of manual data wrangling to minutes. Using LLMs, specialized libraries, and AI coding assistants, you can generate transformation code that handles nested structures, inconsistent naming, and complex data types. This guide covers practical approaches including GPT-based extraction, libraries like pandas-normalize and jsontopandas, and schema inference tools that work reliably in production environments.
The Common Problem
API responses rarely arrive in pandas-friendly formats. Consider this typical JSON response from a REST API:
import requests
response = requests.get("https://api.example.com/users/activity")
raw_data = response.json()
# Returns nested dict with lists, dictionaries, and nulls
The resulting structure might contain nested objects like user.profile.settings.theme, arrays mixed with scalars, snake_case and camelCase fields in the same payload, and dates as Unix timestamps or ISO strings. Manually parsing this into a flat DataFrame requires understanding the exact structure and writing custom extraction logic.
AI-Powered Solutions
1. GPT-Based Data Extraction
Large language models excel at understanding JSON structures and generating transformation code. You can feed raw JSON into an LLM with a prompt like:
Convert this nested JSON into a flat pandas DataFrame.
Extract all relevant fields, handle nested objects appropriately,
and ensure proper data types for dates and numbers.
The model generates Python code using pd.json_normalize() or custom flattening logic. This approach works well for one-off transformations but becomes repetitive for recurring API calls.
2. Specialized JSON-to-DataFrame Libraries
Several Python libraries now incorporate AI-assisted features for JSON parsing:
pandas-normalize provides intelligent normalization strategies. It analyzes your JSON structure and suggests flattening approaches:
from pandas_normalize import normalize_json
df = normalize_json(raw_data, strategy='auto')
# Automatically flattens nested structures based on content analysis
jsontopandas uses pattern recognition to identify repeated structures and apply consistent transformations:
import jsontopandas as j2pd
converter = j2pd.AutoConverter()
converter.learn_structure(sample_response)
df = converter.transform(new_responses) # Applies learned patterns
3. Code Generation with AI Assistants
Modern AI coding assistants like Claude, GitHub Copilot, and others can generate transformation code based on example JSON input. This works directly in your IDE:
import pandas as pd
def extract_user_activity(response):
"""Transform nested activity data into DataFrame"""
records = []
for item in response.get('activities', []):
records.append({
'activity_id': item.get('id'),
'user_name': item.get('user', {}).get('name'),
'user_email': item.get('user', {}).get('email'),
'activity_type': item.get('type'),
'timestamp': pd.to_datetime(item.get('created_at')),
'metadata': item.get('meta', {})
})
return pd.DataFrame(records)
AI assistants can generate similar functions given a sample of your JSON structure and your desired output format.
4. Schema Inference Tools
AI-powered schema inference tools analyze your JSON and automatically infer appropriate pandas dtypes:
from pandera import DataFrameSchema
# Auto-infer schema from JSON sample
schema = DataFrameSchema.infer(raw_data[:100]) # Sample of 100 records
df = pd.json_normalize(raw_data)
df = schema.validate(df) # Ensures data meets expectations
Practical Implementation Pattern
For production workflows, combine AI generation with strong error handling:
import pandas as pd
import logging
from typing import Any, Dict, List
def json_to_dataframe(api_response: Dict[str, Any]) -> pd.DataFrame:
"""Convert API response to DataFrame with AI-assisted normalization"""
# Extract the data array - AI helps identify the correct path
data = api_response.get('data', api_response.get('results', [api_response]))
if not data:
return pd.DataFrame()
# Flatten nested structures
df = pd.json_normalize(
data,
sep='_',
max_level=5 # Limit nesting depth
)
# Clean column names
df.columns = df.columns.str.replace(r'[^a-zA-Z0-9_]', '_', regex=True)
# Convert timestamp columns automatically
for col in df.columns:
if 'date' in col.lower() or 'time' in col.lower():
df[col] = pd.to_datetime(df[col], errors='coerce')
# Handle missing values
df = df.fillna(value=pd.NA)
return df
Tool Comparison for JSON Transformation
| Tool | Best For | Cost | Learning Curve | Output Speed |
|---|---|---|---|---|
| Claude (web) | Complex hierarchies, multi-step logic | $20/mo | Gentle | Fast |
| ChatGPT Plus | Quick one-off transformations | $20/mo | Gentle | Medium |
| Copilot | Inline code suggestions | $10/mo | Minimal | Very fast |
| Pandas-AI | Automated schema inference | Free/Pro | Moderate | Fast |
| jsontopandas | Specialized JSON→DataFrame | Free | Steep | Fast |
For teams handling multiple API sources, Claude’s conversation interface enables iterative refinement, while ChatGPT works well for independent transformations.
Real-World Example: Multi-Level Nesting
Raw API Response:
{
"status": "success",
"data": {
"users": [
{
"id": "usr_123",
"profile": {
"name": "Alice Johnson",
"contact": {
"email": "alice@example.com",
"phone": "+1-555-0100"
},
"preferences": {
"notifications": true,
"language": "en"
}
},
"activity": {
"last_login": "2026-03-20T14:30:00Z",
"total_logins": 42,
"account_age_days": 730
}
}
]
}
}
AI-Generated Transformation:
import pandas as pd
from datetime import datetime
def flatten_user_response(api_response):
"""Extract and flatten nested user data"""
users = []
for user_data in api_response['data']['users']:
profile = user_data.get('profile', {})
contact = profile.get('contact', {})
activity = user_data.get('activity', {})
users.append({
'user_id': user_data.get('id'),
'name': profile.get('name'),
'email': contact.get('email'),
'phone': contact.get('phone'),
'notifications_enabled': profile.get('preferences', {}).get('notifications'),
'language': profile.get('preferences', {}).get('language'),
'last_login': pd.to_datetime(activity.get('last_login')),
'total_logins': activity.get('total_logins'),
'account_age_days': activity.get('account_age_days')
})
return pd.DataFrame(users)
# Usage
response = requests.get('https://api.example.com/users').json()
df = flatten_user_response(response)
Result:
user_id name email phone notifications_enabled language last_login total_logins account_age_days
0 usr_123 Alice Johnson alice@example.com +1-555-0100 True en 2026-03-20 14:30:00 42 730
This transformation that might take 30 minutes manually takes 5 minutes with AI assistance.
Advanced Patterns: Type Conversion
Ask your AI to include type handling:
def smart_type_conversion(df, type_hints=None):
"""Convert column types intelligently"""
for col in df.columns:
# Handle datetime columns
if 'date' in col.lower() or 'time' in col.lower():
df[col] = pd.to_datetime(df[col], errors='coerce')
# Handle numeric columns
elif col.endswith('_count') or col.endswith('_total'):
df[col] = pd.to_numeric(df[col], errors='coerce').astype('Int64')
# Handle boolean columns
elif col.startswith('is_') or col.endswith('_enabled'):
df[col] = df[col].astype(bool)
# Handle currency
elif col.endswith('_amount') or col.endswith('_price'):
df[col] = pd.to_numeric(df[col], errors='coerce')
return df
Error Handling in Production
Ask AI to include error handling for strong transformation:
def safe_transform_api_response(response, required_fields=None):
"""Transform with comprehensive error handling"""
try:
# Validate response structure
if not isinstance(response, dict):
raise ValueError(f"Expected dict, got {type(response)}")
if 'data' not in response:
logging.warning("Missing 'data' key in response")
return pd.DataFrame()
records = response['data']
if not isinstance(records, list):
records = [records]
# Transform
df = pd.json_normalize(records, sep='_', max_level=3)
# Validate required fields
if required_fields:
missing = set(required_fields) - set(df.columns)
if missing:
logging.error(f"Missing required fields: {missing}")
raise ValueError(f"Missing fields: {missing}")
# Log transformation summary
logging.info(f"Transformed {len(df)} records from API response")
return df
except Exception as e:
logging.error(f"Transformation failed: {str(e)}", exc_info=True)
raise
# Usage
try:
df = safe_transform_api_response(
api_response,
required_fields=['user_id', 'email', 'created_at']
)
except Exception as e:
# Handle or retry
pass
Performance Optimization for Large Datasets
For APIs returning thousands of records:
def efficient_batch_transform(api_iterator, batch_size=1000):
"""Process large responses in batches"""
dfs = []
for batch_idx, response in enumerate(api_iterator):
try:
df = pd.json_normalize(
response['data'],
sep='_',
max_level=2 # Limit nesting depth
)
dfs.append(df)
# Consolidate every N batches to manage memory
if (batch_idx + 1) % 10 == 0:
consolidated = pd.concat(dfs, ignore_index=True)
yield consolidated
dfs = []
except Exception as e:
logging.warning(f"Batch {batch_idx} failed: {e}")
continue
# Yield remaining
if dfs:
yield pd.concat(dfs, ignore_index=True)
Testing Your Transformations
Prompt AI to generate tests for transformation logic:
import unittest
class TestAPITransformation(unittest.TestCase):
def setUp(self):
self.simple_response = {
'data': [
{'id': '1', 'name': 'Test', 'value': 100}
]
}
def test_empty_response(self):
result = safe_transform_api_response({'data': []})
self.assertEqual(len(result), 0)
def test_missing_nested_fields(self):
response = {
'data': [
{'id': '1', 'profile': {}} # Missing name
]
}
result = safe_transform_api_response(response)
self.assertTrue('name' in result.columns)
self.assertTrue(pd.isna(result.loc[0, 'name']))
def test_type_conversion(self):
response = {
'data': [
{'id': '1', 'created_at': '2026-03-20T12:00:00Z'}
]
}
result = safe_transform_api_response(response)
self.assertEqual(result['created_at'].dtype, 'datetime64[ns]')
Choosing the Right Approach
For occasional transformations, prompt an LLM with your JSON sample and desired output. For recurring API calls, build reusable transformation functions with AI assistance, then version-control and test them. For complex nested responses, consider a two-step approach: use AI to understand the structure, then implement targeted extraction logic.
The key advantage of AI-assisted transformation is speed. What might take an hour of manual parsing and debugging can often be accomplished in minutes with the right AI tool generating initial code that you then refine and maintain.
Production Deployment Checklist
- Test transformation with real API data (not samples)
- Validate column names match downstream expectations
- Implement error handling and logging
- Add type validation for critical columns
- Measure transformation latency on large datasets
- Set up alerts for transformation failures
- Document transformation logic and assumptions
- Version control all transformation code
- Create test suite covering edge cases
Related Articles
- Claude vs ChatGPT for Converting REST API Documentation
- ChatGPT Network Error on Long Responses: How to Fix in 2026
- Copilot vs Cursor for Writing Clean Prisma Schema with Relat
- AI Tools for Creating dbt Model Definitions from Raw Databas
- Steve AI vs Raw Shorts: AI Video Creation Comparison for
Built by theluckystrike — More at zovo.one