Python data science work spans multiple domains: data wrangling with pandas, statistical modeling with sklearn, visualization with matplotlib/seaborn/plotly, and exploratory analysis in Jupyter. Claude and GitHub Copilot take fundamentally different approaches — Claude as a conversational assistant for complex analysis, Copilot as an inline completion engine. This guide tests both on real data science tasks.
Task 1: Pandas Data Cleaning Pipeline
Scenario: A messy CSV with mixed types, nulls, inconsistent date formats, and duplicate rows. Build a cleaning pipeline.
Claude (conversational prompt):
I have a sales CSV with these issues:
- 'amount' column has values like "$1,234.56", "1234.56", and nulls
- 'date' column has mixed formats: "2025-03-15", "15/03/2025", "Mar 15, 2025"
- 'customer_id' has duplicates (keep latest by date)
- 'region' column has "US", "usa", "United States" (normalize to ISO)
Write a pandas pipeline using method chaining.
Claude’s output:
import pandas as pd
import numpy as np
from dateutil import parser as dateutil_parser
REGION_MAP = {
"us": "US", "usa": "US", "united states": "US", "u.s.": "US",
"uk": "GB", "united kingdom": "GB", "england": "GB",
"ca": "CA", "canada": "CA",
# extend as needed
}
def parse_amount(val):
if pd.isna(val):
return np.nan
return float(str(val).replace("$", "").replace(",", "").strip())
def parse_flexible_date(val):
if pd.isna(val):
return pd.NaT
try:
return dateutil_parser.parse(str(val), dayfirst=False)
except (ValueError, OverflowError):
return pd.NaT
def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
return (
df
.assign(
amount=lambda x: x["amount"].map(parse_amount),
date=lambda x: x["date"].map(parse_flexible_date),
region=lambda x: (
x["region"]
.str.strip()
.str.lower()
.map(REGION_MAP)
.fillna(x["region"].str.upper()) # keep unknown as-is
),
)
.dropna(subset=["amount", "date"])
.sort_values("date", ascending=False)
.drop_duplicates(subset=["customer_id"], keep="first")
.reset_index(drop=True)
.astype({"amount": "float64"})
.assign(date=lambda x: pd.to_datetime(x["date"]))
)
Claude produces a fully chained pipeline with custom parsers, handles edge cases in region normalization, and uses dateutil for flexible date parsing. It also explains why drop_duplicates(keep="first") after sort_values(date, ascending=False) keeps the latest row.
Copilot (inline in Jupyter):
When you type def clean_sales(df): and add a comment # normalize region names, parse currency, parse dates, Copilot generates a simpler version:
def clean_sales(df):
# normalize region names
df['region'] = df['region'].str.upper().str.strip()
# parse currency
df['amount'] = df['amount'].str.replace('$', '').str.replace(',', '').astype(float)
# parse dates
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True, errors='coerce')
# remove duplicates
df = df.sort_values('date').drop_duplicates('customer_id', keep='last')
return df
Copilot’s version is simpler but has issues: infer_datetime_format=True is deprecated in pandas 2.x, .str.replace('$', '') raises an error on NaN values, and it mutates the input DataFrame instead of returning a clean copy.
Winner: Claude — handles edge cases, idiomatic chaining, deprecation-aware.
Task 2: Sklearn Pipeline with Custom Transformer
Prompt: “Build a sklearn pipeline for a classification task. Include: StandardScaler for numeric, OneHotEncoder for categorical, a custom transformer that adds log1p features for skewed numerics, then LogisticRegression. Make it compatible with GridSearchCV.”
Claude:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np
import pandas as pd
class Log1pTransformer(BaseEstimator, TransformerMixin):
"""Applies log1p to columns with skewness above threshold."""
def __init__(self, skew_threshold: float = 0.5):
self.skew_threshold = skew_threshold
self.skewed_cols_: list[int] = []
def fit(self, X, y=None):
if isinstance(X, pd.DataFrame):
X = X.values
skewness = pd.DataFrame(X).skew()
self.skewed_cols_ = list(skewness[abs(skewness) > self.skew_threshold].index)
return self
def transform(self, X):
X = np.array(X, dtype=float).copy()
X[:, self.skewed_cols_] = np.log1p(X[:, self.skewed_cols_])
return X
def build_pipeline(
numeric_features: list[str],
categorical_features: list[str],
) -> Pipeline:
numeric_transformer = Pipeline([
("log1p", Log1pTransformer(skew_threshold=0.5)),
("scaler", StandardScaler()),
])
categorical_transformer = OneHotEncoder(
handle_unknown="ignore",
sparse_output=False,
)
preprocessor = ColumnTransformer([
("num", numeric_transformer, numeric_features),
("cat", categorical_transformer, categorical_features),
])
return Pipeline([
("preprocessor", preprocessor),
("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])
# GridSearchCV-compatible parameter grid
param_grid = {
"preprocessor__num__log1p__skew_threshold": [0.3, 0.5, 0.8],
"classifier__C": [0.01, 0.1, 1.0, 10.0],
"classifier__solver": ["lbfgs", "saga"],
}
pipeline = build_pipeline(
numeric_features=["age", "income", "tenure_days"],
categorical_features=["region", "plan_type"],
)
grid_search = GridSearchCV(
pipeline, param_grid, cv=5, scoring="roc_auc", n_jobs=-1, verbose=1
)
Claude correctly implements the sklearn estimator API (fit/transform, BaseEstimator, TransformerMixin) and uses the __ double-underscore notation for GridSearchCV parameter names.
Copilot struggles here without context — it generates a basic pipeline but doesn’t implement a proper custom transformer. It typically uses FunctionTransformer instead, which doesn’t expose hyperparameters for GridSearchCV.
Winner: Claude — proper sklearn API, GridSearchCV-compatible custom transformer.
Task 3: EDA and Visualization
For Jupyter-based EDA, Copilot has an advantage: it sees your DataFrame columns and generates relevant visualizations inline.
Copilot inline (after loading a DataFrame):
# Type: # plot distribution of amount by region
# Copilot generates:
import matplotlib.pyplot as plt
import seaborn as sns
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Distribution plot
sns.histplot(data=df, x='amount', hue='region', bins=30, ax=axes[0])
axes[0].set_title('Amount Distribution by Region')
axes[0].set_xlabel('Amount ($)')
# Box plot for outlier visibility
sns.boxplot(data=df, x='region', y='amount', ax=axes[1])
axes[1].set_title('Amount by Region (Box Plot)')
axes[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
Copilot’s visualization completions are fast and contextually relevant. Claude requires more prompt effort but generates more sophisticated plots with proper statistical annotations.
Task 4: Time Series Feature Engineering
Time series work is where the gap between the two tools becomes clear. Claude can reason about lag features, rolling statistics, and seasonality; Copilot generates boilerplate without understanding the domain.
Prompt to Claude: “I have a daily sales DataFrame with columns: date, sales, store_id. Create a feature engineering function that adds: 7-day rolling mean and std, lag features for 1, 7, and 30 days, day-of-week and month indicators, and a ‘is_weekend’ flag. Handle the NaN values from lagging correctly.”
Claude:
import pandas as pd
import numpy as np
def engineer_time_features(
df: pd.DataFrame,
date_col: str = "date",
target_col: str = "sales",
group_col: str = "store_id",
) -> pd.DataFrame:
"""
Add time-series features to a daily sales DataFrame.
Applies lag and rolling features within each store group
to avoid data leakage across stores.
"""
df = df.copy()
df[date_col] = pd.to_datetime(df[date_col])
df = df.sort_values([group_col, date_col]).reset_index(drop=True)
def add_features(group: pd.DataFrame) -> pd.DataFrame:
g = group.copy()
# Lag features — shift within the group
for lag in [1, 7, 30]:
g[f"{target_col}_lag_{lag}d"] = g[target_col].shift(lag)
# Rolling statistics (requires lag to avoid leakage)
g[f"{target_col}_roll7_mean"] = (
g[target_col].shift(1).rolling(7, min_periods=4).mean()
)
g[f"{target_col}_roll7_std"] = (
g[target_col].shift(1).rolling(7, min_periods=4).std()
)
return g
df = df.groupby(group_col, group_keys=False).apply(add_features)
# Calendar features (don't need grouping)
df["day_of_week"] = df[date_col].dt.dayofweek # 0=Monday
df["month"] = df[date_col].dt.month
df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
df["quarter"] = df[date_col].dt.quarter
return df
# Usage
featured_df = engineer_time_features(raw_df)
# NaN summary — expected for early rows
print(featured_df.isna().sum())
# Drop rows where any lag feature is NaN before training
train_df = featured_df.dropna(subset=[
"sales_lag_1d", "sales_lag_7d", "sales_lag_30d",
"sales_roll7_mean", "sales_roll7_std",
])
The critical detail: using .shift(1) before .rolling() prevents data leakage. Copilot sometimes generates rolling features on the raw column, which means the rolling mean at time T includes the value at time T — leaking the target into the feature.
Winner: Claude — understands data leakage risk, applies groupby correctly per store.
Task 5: Debugging pandas Errors
Claude’s conversational ability shines for debugging. Paste in a stack trace and Claude identifies root causes accurately.
Error to debug:
ValueError: Cannot convert non-finite values (NA or inf) to integer
Claude’s diagnosis: This error occurs when casting a float column containing NaN to int. Claude identifies the three root causes and gives the fix for each:
# Root cause 1: Column has NaN, astype(int) fails
df["col"].astype(int) # Fails if any NaN
# Fix 1: Fill NaN before casting
df["col"].fillna(0).astype(int)
# Fix 2: Use nullable integer type (pandas 1.0+)
df["col"].astype("Int64") # Capital I — supports pd.NA
# Fix 3: Drop NaN rows first
df.dropna(subset=["col"])["col"].astype(int)
# Root cause 2: Column has inf values from division
df["rate"] = df["a"] / df["b"] # inf when b==0
# Fix: Replace inf before casting
df["rate"] = df["rate"].replace([np.inf, -np.inf], np.nan)
Copilot doesn’t debug errors conversationally — you’d need to paste the error into a comment and hope it completes a fix, which works inconsistently.
Tool Selection Guide
| Task | Best Tool | Why |
|---|---|---|
| Complex data cleaning pipeline | Claude | Handles edge cases, no deprecation issues |
| Custom sklearn transformers | Claude | Proper API implementation |
| Quick EDA visualizations | Copilot | Inline, sees DataFrame context |
| Statistical analysis explanations | Claude | Interprets results, not just code |
| Boilerplate sklearn pipelines | Copilot | Fast inline completion |
| Debugging pandas errors | Claude | Full error + code context |
| Feature engineering ideas | Claude | Conversational exploration |
| Time series features | Claude | Understands leakage, groupby patterns |
| Simple plot generation | Copilot | Faster with existing column context |
Task 4: Debugging Statistical Analysis
Scenario: Your model’s validation AUC is 0.68 (bad), but training AUC is 0.95 (overfitting).
Claude prompt:
Training AUC: 0.95, Validation AUC: 0.68
This is a severe overfitting problem. My dataset has:
- 10,000 training samples
- 2,000 validation samples
- 150 features (numerical and categorical)
- Logistic Regression with default parameters
What are the top 5 most likely causes?
Claude generates:
- Too many features relative to samples (150:10000 ratio is bad)
- Categorical features not properly encoded (might have high cardinality)
- No regularization (add C parameter, use L1/L2)
- Data leakage (feature contains target information)
- Class imbalance (minority class overfit)
Claude then provides the diagnostic code:
# Check feature importance — might reveal leakage
from sklearn.inspection import permutation_importance
perm_importance = permutation_importance(model, X_val, y_val, n_repeats=10)
# Check for extreme feature values
print(X_train.describe())
# Verify no target leakage
print("Features correlated with target:")
print(X_train.corrwith(y_train).abs().sort_values(ascending=False).head(10))
Copilot in the same scenario would suggest:
# Add regularization
LogisticRegression(C=0.01, max_iter=1000)
Copilot gives the fix but not the diagnostic. Claude explains why each cause is likely.
Task 5: Hyperparameter Tuning Strategy
Prompt to Claude:
I'm tuning a RandomForest for a classification task:
- Dataset: 50,000 samples, 30 features
- Training takes 5 minutes per model
- Need results within 24 hours (288 minutes = ~57 models max)
Design a hyperparameter tuning strategy. I can evaluate:
1. n_estimators: [50, 100, 200, 500]
2. max_depth: [5, 10, 20, None]
3. min_samples_split: [2, 5, 10, 20]
4. min_samples_leaf: [1, 2, 4, 8]
5. max_features: ['sqrt', 'log2', None]
Which parameters matter most? What's the best search strategy?
Claude analyzes:
Priority ranking (impact on performance):
1. n_estimators: High impact, test [100, 300] first
2. max_depth: High impact, test [10, 20] first
3. min_samples_split: Medium impact, test [5, 10]
4. max_features: Medium impact, test ['sqrt', 'log2']
5. min_samples_leaf: Low impact, skip (use default)
Recommended approach:
- Phase 1 (10 models): GridSearch on n_estimators × max_depth
Time: 50 minutes. Keep top 3 parameter combinations.
- Phase 2 (15 models): RandomSearch on remaining 3 params around winning combo
Time: 75 minutes
- Phase 3 (20 models): Bayesian optimization to fine-tune top performers
Time: 100 minutes
- Total: 225 minutes (leaves 63 minutes buffer)
Implementation:
"""
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from skopt import BayesSearchCV
# Phase 1: GridSearch (most important parameters)
grid = GridSearchCV(
RandomForestClassifier(n_jobs=-1, random_state=42),
{
'n_estimators': [100, 200, 300],
'max_depth': [10, 15, 20, 25]
},
cv=5, scoring='roc_auc', n_jobs=-1
)
grid.fit(X_train, y_train)
# Keep best params
best_params = grid.best_params_
# Phase 2: RandomSearch around winners
search = RandomizedSearchCV(
RandomForestClassifier(**best_params, n_jobs=-1, random_state=42),
{
'min_samples_split': [2, 5, 10, 20],
'max_features': ['sqrt', 'log2'],
'min_samples_leaf': [1, 2, 4]
},
n_iter=15, cv=5, scoring='roc_auc', n_jobs=-1, random_state=42
)
search.fit(X_train, y_train)
"""
This strategy is impossible for Copilot — it can't reason about time constraints and feature importance ranking.
## Handling Imbalanced Data: Claude vs Copilot
**Prompt:** "My dataset is 95% class 0, 5% class 1. How do I fix this?"
**Copilot** suggests:
```python
from imblearn.over_sampling import SMOTE
smote = SMOTE()
X_train_balanced, y_train_balanced = smote.fit_resample(X_train, y_train)
model.fit(X_train_balanced, y_train_balanced)
This is correct but incomplete — SMOTE can cause leakage if applied before train-test split.
Claude explains:
SMOTE during training is common but risky. Best practice:
1. Split first: X_train/y_train, X_test/y_test
2. Apply SMOTE only to training data
3. Evaluate on original test distribution
4. Watch for inflated metrics due to synthetic minority samples
Better code:
"""
from sklearn.model_selection import train_test_split
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# SMOTE inside pipeline — applies only during fit
pipeline = Pipeline([
('smote', SMOTE(random_state=42)),
('classifier', LogisticRegression(max_iter=1000))
])
pipeline.fit(X_train, y_train)
score = pipeline.score(X_test, y_test) # Evaluated on original distribution
"""
Also mention: if SMOTE hurts performance, try class_weight='balanced' instead.
Tool Decision Framework
| Data Science Task | Claude | Copilot | Winner |
|---|---|---|---|
| Exploratory Data Analysis | Good (explains patterns) | Excellent (sees columns) | Copilot |
| Feature engineering ideation | Excellent | Good | Claude |
| Debugging model overfitting | Excellent | Weak | Claude |
| Sklearn pipeline building | Excellent | Good | Claude |
| Visualization code | Good | Excellent (in-notebook) | Copilot |
| Statistical test interpretation | Excellent | Weak | Claude |
| Quick boilerplate code | Moderate | Excellent | Copilot |
| Cross-validation strategy | Excellent | Weak | Claude |
| Hyperparameter tuning design | Excellent | Poor | Claude |
Real Workflow: ML Practitioners Use Both
Smart teams use Claude for reasoning and Copilot for speed:
# Step 1: Use Copilot to generate boilerplate
# Type: "import pandas" → Copilot fills in common imports
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# Step 2: Load data with Copilot
df = pd.read_csv('data.csv')
X = df.drop('target', axis=1)
y = df['target']
# Step 3: Use Claude for strategy
# "I have 50k samples, 30 features, binary classification.
# Training is slow. What features should I engineer?"
# → Claude generates domain-specific feature ideas
# Step 4: Back to Copilot for implementation
# Type comment: "# normalize numerical features"
# → Copilot suggests StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 5: Claude for debugging
# Paste error: "ValueError: y_true and y_pred have different shapes"
# → Claude explains and provides fix
# Step 6: Copilot for testing boilerplate
# Type: "# test model performance"
from sklearn.metrics import classification_report, confusion_matrix
print(classification_report(y_test, model.predict(X_test_scaled)))
This hybrid approach is faster than either tool alone.
Related Reading
- Best AI Code Completion for Python Data Science 2026
- Best AI Coding Tools for Python Data Science and Pandas
- AI Code Generation for Python FastAPI Endpoints with Pydantic
- Best AI Code Completion for Python Data Science 2026
Built by theluckystrike — More at zovo.one