Claude Skills Guide

Claude Code for Data Anonymization Workflow Guide

Data privacy has become a non-negotiable aspect of modern software development. Whether you’re preparing datasets for testing, sharing customer data with third-party analytics tools, or complying with GDPR, CCPA, or HIPAA regulations, proper data anonymization is essential. This guide shows you how to use Claude Code to build robust, automated data anonymization workflows that protect sensitive information while maintaining data utility for development and analytics purposes.

Understanding Data Anonymization Fundamentals

Before diving into implementation, let’s clarify what data anonymization means in practice. Anonymization goes beyond simple data masking—it transforms personal data so individuals can no longer be identified, even when combined with other datasets. The key distinction is between:

Claude Code excels at this task because it can understand your specific data schema, identify PII (Personally Identifiable Information), and generate appropriate transformation logic. The AI can reason about context—which fields are truly sensitive versus which are merely confidential.

Building Your First Anonymization Pipeline

Let’s create a practical anonymization workflow using Claude Code. We’ll build a skill that handles common anonymization scenarios.

Setting Up Your Anonymization Skill

First, create a new skill for data anonymization:

# Create the skill
touch ~/.claude/skills/data-anonymization.md

Configure your skill with the necessary capabilities in CLAUDE.md:

# Data Anonymization Skill

## Capabilities
- Detect PII fields in data schemas (JSON, CSV, SQL dumps)
- Apply appropriate anonymization transformations
- Preserve data types and formats where possible
- Handle nested and relational data structures

## Transformation Rules
- Names → Hash or pseudonymize
- Emails → Hash with consistent salt
- Phone numbers → Mask or generate fake
- Addresses → Generalize to city/region
- Dates → Shift by random offset (preserve intervals)
- IDs → Consistent hashing (referential integrity)
- Financial data → Round or range-based generalization

Core Anonymization Functions

Create the implementation with these essential functions:

// anonymizer.ts - Core anonymization utilities
import crypto from 'crypto';

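// Fail fast instead of silently falling back to a guessable default salt.
function requireSalt(): string {
  const salt = process.env.ANONYMIZATION_SALT;
  if (!salt) {
    throw new Error('ANONYMIZATION_SALT must be set');
  }
  return salt;
}

const SALT = requireSalt();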

export function hashValue(value: string): string {
  return crypto.createHmac('sha256', SALT)
    .update(value)
    .digest('hex')
    .substring(0, 16);
}

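export function anonymizeEmail(email: string): string {
  // Split on the last '@'. A value without one is hashed whole, so neither
  // raw text nor an "@undefined" suffix reaches the output.
  const at = email.lastIndexOf('@');
  if (at < 1) {
    return hashValue(email);
  }
  const hashedLocal = hashValue(email.slice(0, at)).substring(0, 8);
  return `${hashedLocal}@${email.slice(at + 1)}`;
}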

export function maskPhoneNumber(phone: string): string {
  // Mask every digit except the last 4. The lookahead skips separators,
  // so "555-123-4567" becomes "***-***-4567".
  return phone.replace(/\d(?=(?:\D*\d){4})/g, '*');
}

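export function shiftDate(date: Date, maxDays: number = 30): Date {
  // Uniform whole-day offset in [-maxDays, +maxDays]. Each call draws a new
  // offset, so two dates shifted separately will not keep their original spacing.
  const offset = Math.floor(Math.random() * (maxDays * 2 + 1)) - maxDays;
  return new Date(date.getTime() + offset * 24 * 60 * 60 * 1000);
}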

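export function generalizeLocation(address: string): string {
  // Assumes a "street, city, region" layout and returns the city.
  // With fewer than three parts the city cannot be told apart from the street,
  // so return 'Unknown' rather than risk leaking the street line.
  const parts = address.split(',').map(part => part.trim());
  return parts.length >= 3 ? parts[parts.length - 2] : 'Unknown';
}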
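The transformation rules ask for date shifts that preserve intervals, which a fresh random offset per call cannot guarantee. One possible approach, sketched here under assumptions, is to derive the offset from a stable subject identifier so that every date belonging to the same person moves by the same number of days. The names `offsetDaysFor`, `shiftDateForSubject`, and `DEMO_SALT` are hypothetical and not part of the original utilities.

```typescript
import { createHmac } from 'crypto';

// Hypothetical demo key; a real pipeline would load this from a secret store.
const DEMO_SALT = 'demo-salt-not-for-production';
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Derive a stable whole-day offset in [-maxDays, +maxDays] from the subject ID.
export function offsetDaysFor(subjectId: string, maxDays = 30, salt = DEMO_SALT): number {
  const digest = createHmac('sha256', salt).update(subjectId).digest();
  const span = 2 * maxDays + 1;
  return (digest.readUInt32BE(0) % span) - maxDays;
}

// Shift a date by the subject's offset; all dates for one subject move together.
export function shiftDateForSubject(
  date: Date,
  subjectId: string,
  maxDays = 30,
  salt = DEMO_SALT
): Date {
  return new Date(date.getTime() + offsetDaysFor(subjectId, maxDays, salt) * MS_PER_DAY);
}

// Usage: two events for the same subject stay exactly 7 days apart.
const admitted = shiftDateForSubject(new Date('2024-03-01T00:00:00Z'), 'patient-42');
const discharged = shiftDateForSubject(new Date('2024-03-08T00:00:00Z'), 'patient-42');
const gapDays = (discharged.getTime() - admitted.getTime()) / MS_PER_DAY;
```

The trade-off is that anyone holding the salt and a subject ID can recompute the offset, so the salt needs the same protection as the hashing salt.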

Handling Complex Data Structures

Real-world data rarely comes as flat JSON. Here’s how Claude Code can help with nested objects and arrays:

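// recursive-anonymizer.ts
import {
  anonymizeEmail,
  generalizeLocation,
  hashValue,
  maskPhoneNumber,
  shiftDate,
} from './anonymizer';

// Describes how one field is handled; fields are matched by key name at any depth.
export interface FieldSchema {
  field: string;
  type: string;
  isSensitive: boolean;
}

export interface AnonymizeOptions {
  preserveRelations: boolean;
  shiftDates: boolean;
  hashIds: boolean;
}

export function anonymizeObject(
  data: any,
  schema: FieldSchema[],
  options: AnonymizeOptions
): any {
  if (Array.isArray(data)) {
    return data.map(item => anonymizeObject(item, schema, options));
  }

  // Date instances are objects too; pass them through untouched.
  if (data instanceof Date) {
    return data;
  }

  if (typeof data === 'object' && data !== null) {
    const result: any = {};
    for (const [key, value] of Object.entries(data)) {
      const fieldSchema = schema.find(s => s.field === key);

      if (!fieldSchema || !fieldSchema.isSensitive) {
        // Recurse so sensitive fields inside nested objects and arrays are found.
        result[key] = anonymizeObject(value, schema, options);
        continue;
      }

      result[key] = applyTransformation(value, fieldSchema.type, options);
    }
    return result;
  }

  return data;
}

function applyTransformation(value: any, type: string, options: AnonymizeOptions): any {
  // Missing values carry nothing to anonymize.
  if (value === null || value === undefined) {
    return value;
  }

  switch (type) {
    case 'email':
      return anonymizeEmail(String(value));
    case 'phone':
      return maskPhoneNumber(String(value));
    case 'address':
      return generalizeLocation(String(value));
    case 'date':
      return options.shiftDates ? shiftDate(new Date(value)) : value;
    case 'id':
      return options.hashIds ? hashValue(String(value)) : value;
    case 'name':
      return hashValue(String(value)).substring(0, 12);
    default:
      // Unknown sensitive types are redacted rather than passed through.
      return '[REDACTED]';
  }
}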
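The `name` case above replaces a name with a hex fragment, which keeps records linkable but looks nothing like a name in test fixtures. If readable output matters, a rough alternative is to map the HMAC digest onto a small list of placeholder names. The following is only a sketch: `pseudonymizeName`, the name lists, and `DEMO_SALT` are hypothetical, and a real list would need to be much larger.

```typescript
import { createHmac } from 'crypto';

// Hypothetical demo key; a real pipeline would load this from a secret store.
const DEMO_SALT = 'demo-salt-not-for-production';

// Tiny illustrative lists; a production list would be far longer.
const FIRST_NAMES = ['Alex', 'Blake', 'Casey', 'Devon', 'Emery', 'Finley', 'Gray', 'Harper'];
const LAST_NAMES = ['Ames', 'Brook', 'Cole', 'Dale', 'Ellis', 'Ford', 'Gale', 'Hale'];

// Map a real name to a stable placeholder name derived from its HMAC.
export function pseudonymizeName(fullName: string, salt = DEMO_SALT): string {
  // Normalize so "Jane Doe" and " jane doe " map to the same pseudonym.
  const normalized = fullName.trim().toLowerCase();
  const digest = createHmac('sha256', salt).update(normalized).digest();

  const first = FIRST_NAMES[digest[0] % FIRST_NAMES.length];
  const last = LAST_NAMES[digest[1] % LAST_NAMES.length];

  // A short hex suffix makes collisions between different people unlikely.
  const suffix = digest.subarray(2, 5).toString('hex');
  return `${first} ${last}-${suffix}`;
}

// Usage: the same input always yields the same pseudonym.
const pseudonym = pseudonymizeName('Jane Doe');
```

As with hashing, the mapping is repeatable for anyone who holds the salt, so this is pseudonymization rather than irreversible anonymization.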

Workflow Integration Patterns

Database Export Anonymization

When exporting production data for staging or testing environments, automate the entire pipeline:

# anonymization-pipeline.yaml
# DATABASE_URL and AWS credentials are assumed to be supplied to the job,
# for example as repository secrets.
name: Data Export Anonymization
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        type: choice
        options: [staging, testing, development]

jobs:
  anonymize:
    runs-on: ubuntu-latest
    steps:
      # The anonymization scripts are assumed to live in this repository.
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Export production data
        run: |
          psql "$DATABASE_URL" -c "COPY users TO STDOUT CSV HEADER" > raw_users.csv

      - name: Anonymize data
        run: |
          npx ts-node anonymize.ts --input raw_users.csv \
            --output anonymized_users.csv \
            --rules rules/production-rules.json

      - name: Verify anonymization
        run: |
          npx ts-node verify-anonymization.ts anonymized_users.csv

      - name: Upload to target environment
        run: |
          aws s3 cp anonymized_users.csv s3://${{ inputs.environment }}-data/

Real-Time API Response Anonymization

For APIs that return sensitive data, create middleware that automatically anonymizes responses:

// anonymization-middleware.ts
import express from 'express';
// FieldSchema is assumed to be exported alongside anonymizeObject.
import { anonymizeObject, FieldSchema } from './recursive-anonymizer';

const app = express();

const RESPONSE_SCHEMA: FieldSchema[] = [
  { field: 'email', type: 'email', isSensitive: true },
  { field: 'phone', type: 'phone', isSensitive: true },
  { field: 'ssn', type: 'string', isSensitive: true },
  { field: 'dateOfBirth', type: 'date', isSensitive: true },
  { field: 'address', type: 'address', isSensitive: true },
];

export function anonymizeApiResponse(data: any): any {
  return anonymizeObject(data, RESPONSE_SCHEMA, {
    preserveRelations: true,
    shiftDates: true,
    hashIds: true,
  });
}

// Express middleware example: wraps res.json so that handlers registered
// after this one for GET /api/users send anonymized payloads.
app.get('/api/users', (req, res, next) => {
  const originalJson = res.json.bind(res);
  res.json = (data) => {
    return originalJson(anonymizeApiResponse(data));
  };
  next();
});

Best Practices for Production Workflows

1. Define Clear Transformation Rules

Document your anonymization rules explicitly. Claude Code can help generate these rules based on your data:

{
  "fieldRules": {
    "email": { "method": "hash", "preserveDomain": true },
    "phone": { "method": "mask", "visibleDigits": 4 },
    "firstName": { "method": "pseudonymize" },
    "lastName": { "method": "pseudonymize" },
    "dateOfBirth": { "method": "shift", "maxOffsetDays": 30 },
    "ssn": { "method": "redact" },
    "creditCard": { "method": "tokenize" }
  }
}
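A rules file like this only helps if something interprets it. As a minimal sketch, the dispatcher below applies three of the methods (`hash`, `mask`, `redact`) to a flat record of strings; `pseudonymize`, `shift`, and `tokenize` are left out for brevity. The `FieldRule` type, `applyRule`, `applyRules`, and `DEMO_SALT` are hypothetical names chosen for illustration, not an existing API.

```typescript
import { createHmac } from 'crypto';

// Hypothetical subset of the rule shapes shown in the JSON above.
type FieldRule =
  | { method: 'hash'; preserveDomain?: boolean }
  | { method: 'mask'; visibleDigits: number }
  | { method: 'redact' };

// Hypothetical demo key; a real pipeline would load this from a secret store.
const DEMO_SALT = 'demo-salt-not-for-production';

const hmac = (value: string): string =>
  createHmac('sha256', DEMO_SALT).update(value).digest('hex').substring(0, 16);

export function applyRule(value: string, rule: FieldRule): string {
  if (rule.method === 'hash') {
    const at = value.lastIndexOf('@');
    // Keep the domain only when asked to and when the value looks like an email.
    if (rule.preserveDomain && at > 0) {
      return `${hmac(value.slice(0, at))}@${value.slice(at + 1)}`;
    }
    return hmac(value);
  }
  if (rule.method === 'mask') {
    const totalDigits = (value.match(/\d/g) ?? []).length;
    let seen = 0;
    // Replace every digit except the last `visibleDigits`; separators are kept.
    return value.replace(/\d/g, d => (++seen <= totalDigits - rule.visibleDigits ? '*' : d));
  }
  return '[REDACTED]'; // method === 'redact'
}

export function applyRules(
  record: Record<string, string>,
  fieldRules: Record<string, FieldRule>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [field, value] of Object.entries(record)) {
    const rule = fieldRules[field];
    // Fields without a rule pass through unchanged.
    out[field] = rule ? applyRule(value, rule) : value;
  }
  return out;
}

// Usage
const rules: Record<string, FieldRule> = {
  email: { method: 'hash', preserveDomain: true },
  phone: { method: 'mask', visibleDigits: 4 },
  ssn: { method: 'redact' },
};
const masked = applyRules(
  { email: 'jane@example.com', phone: '555-123-4567', ssn: '123-45-6789', plan: 'pro' },
  rules
);
// masked.phone is '***-***-4567', masked.ssn is '[REDACTED]', masked.plan stays 'pro'
```

Passing unknown fields through unchanged is a deliberate simplification here; a stricter variant could reject any field that has no rule so new columns are never exported by accident.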

2. Verify Referential Integrity

When anonymizing relational data, ensure foreign key relationships remain consistent:

// preserve-references.ts
import { hashValue } from './anonymizer';

export class ReferencePreserver {
  private idMap: Map<string, string> = new Map();
  
  getAnonymizedId(originalId: string): string {
    if (!this.idMap.has(originalId)) {
      this.idMap.set(originalId, hashValue(originalId));
    }
    return this.idMap.get(originalId)!;
  }
  
  // Use same preserver for all related tables
}
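To illustrate what "use the same preserver for all related tables" means in practice, here is a small self-contained sketch. It re-declares a simplified preserver so the snippet runs on its own, and the `users` and `orders` tables, their fields, and `DEMO_SALT` are made up for the example.

```typescript
import { createHmac } from 'crypto';

// Hypothetical demo key; a real pipeline would load this from a secret store.
const DEMO_SALT = 'demo-salt-not-for-production';

// Simplified stand-in for the ReferencePreserver above, inlined for self-containment.
class DemoReferencePreserver {
  private idMap = new Map<string, string>();

  getAnonymizedId(originalId: string): string {
    let mapped = this.idMap.get(originalId);
    if (mapped === undefined) {
      mapped = createHmac('sha256', DEMO_SALT).update(originalId).digest('hex').substring(0, 16);
      this.idMap.set(originalId, mapped);
    }
    return mapped;
  }
}

// Made-up sample tables: orders.userId is a foreign key to users.id.
const users = [
  { id: 'u1', plan: 'pro' },
  { id: 'u2', plan: 'free' },
];
const orders = [
  { orderId: 'o1', userId: 'u1' },
  { orderId: 'o2', userId: 'u1' },
  { orderId: 'o3', userId: 'u2' },
];

// One preserver instance is shared across both tables.
const preserver = new DemoReferencePreserver();
const anonUsers = users.map(u => ({ ...u, id: preserver.getAnonymizedId(u.id) }));
const anonOrders = orders.map(o => ({ ...o, userId: preserver.getAnonymizedId(o.userId) }));

// Integrity check: every anonymized order must still point at an anonymized user.
const knownUserIds = new Set(anonUsers.map(u => u.id));
const orphanedOrders = anonOrders.filter(o => !knownUserIds.has(o.userId));
```

An empty `orphanedOrders` array indicates that joins between the two anonymized tables still resolve.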

3. Test Your Anonymization

Always verify that anonymization worked correctly before using the data:

// verify-anonymization.ts
export interface VerificationResult {
  passed: boolean;
  issues: string[];
}

export function verifyAnonymization(
  original: any[],
  anonymized: any[],
  sensitiveFields: string[]
): VerificationResult {
  const issues: string[] = [];

  for (const field of sensitiveFields) {
    // Collect the original values, ignoring empty ones that carry no PII
    const originalValues = new Set(
      original
        .map(row => row[field])
        .filter(value => value !== null && value !== undefined && value !== '')
    );

    // Check that no original value appears anywhere in the anonymized column
    anonymized.forEach((row, index) => {
      if (originalValues.has(row[field])) {
        // Report the row index, not the value, so PII does not end up in logs
        issues.push(`Field '${field}' contains an unmasked value at row ${index}`);
      }
    });
  }

  return { passed: issues.length === 0, issues };
}
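Comparing against the original values catches untouched fields, but it will not notice PII that sits in a column nobody marked as sensitive, such as a free-text notes field. A complementary check, sketched below, scans every string value for patterns that look like US Social Security numbers or unmasked phone numbers. The function name `scanForResidualPii` and the two patterns are illustrative assumptions; real data would need patterns tuned to its locale and formats.

```typescript
interface PatternHit {
  row: number;
  field: string;
  pattern: string;
}

// Illustrative patterns only; both assume US-style formatting.
const PII_PATTERNS: Record<string, RegExp> = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
  phone: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/,
};

// Scan every string value in every row and report which pattern matched where.
export function scanForResidualPii(rows: Array<Record<string, unknown>>): PatternHit[] {
  const hits: PatternHit[] = [];
  rows.forEach((row, index) => {
    for (const [field, value] of Object.entries(row)) {
      if (typeof value !== 'string') continue;
      for (const [pattern, regex] of Object.entries(PII_PATTERNS)) {
        if (regex.test(value)) {
          // Record the location only, never the matching text.
          hits.push({ row: index, field, pattern });
        }
      }
    }
  });
  return hits;
}

// Usage: the masked phone numbers pass, the SSN hidden in free text is flagged.
const hits = scanForResidualPii([
  { phone: '***-***-4567', notes: 'called about invoice' },
  { phone: '***-***-0000', notes: 'SSN on file: 123-45-6789' },
]);
// hits contains a single entry: { row: 1, field: 'notes', pattern: 'ssn' }
```

Pattern scans produce false positives and false negatives, so they work best as a safety net next to the field-by-field check rather than as a replacement for it.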

Automating with Claude Code Skills

Create a reusable skill that your entire team can use:

# CLAUDE.md
## Anonymization Skill

You help with data anonymization tasks. When asked to anonymize data:

1. First analyze the data structure and identify sensitive fields
2. Apply appropriate transformation methods based on field type
3. Verify that no PII remains in output
4. Generate transformation rules for future use

Common transformations:
- Email → hash local part, preserve domain
- Phone → mask all but last 4 digits
- Names → pseudonymize consistently
- Dates → shift by random offset
- IDs → hash consistently for referential integrity
- Addresses → generalize to region/city level

Always verify results and log transformation rules used.

Conclusion

Building automated data anonymization workflows with Claude Code keeps real personal data out of test, staging, and analytics environments, which limits the damage if one of those environments leaks, and it does so without slowing development. By defining clear transformation rules, handling nested and relational data correctly, and adding verification steps, you can build pipelines that protect sensitive information at scale.

Start with simple use cases like test data generation, then expand to real-time API response anonymization and database export pipelines. Claude Code’s ability to understand context and generate appropriate code makes this process much more efficient than traditional manual approaches.

Remember: data anonymization is not a one-time task but an ongoing process. Regularly review and update your transformation rules as new data fields are added and privacy regulations evolve.

Built by theluckystrike — More at zovo.one