Claude Code Docker Compose Test Setup Guide

Running Claude Code skills inside Docker Compose gives you repeatable test environments where you can spin up databases, mock APIs, and isolated skill executions without polluting your host system. This guide walks through practical setups for testing skills that interact with external services, databases, and CI pipelines.

Why Use Docker Compose for Skill Testing

When you test Claude Code skills that modify files, call APIs, or manage infrastructure, you need controlled environments. Docker Compose lets you:

Spin up fresh databases for each test run
Mock external APIs that your skills depend on
Run multiple Claude Code instances in isolation
Reproduce CI failures locally
Test skills that require specific runtime versions

The tdd skill, for example, works best when it can create and destroy test databases without affecting your local setup. Similarly, the pdf skill needs controlled file system access that containers provide naturally.

Basic Docker Compose Setup for Claude Code

Create a docker-compose.yml that runs Claude Code in an isolated container with access to your project files:

services:
  claude-code:
    image: node:20-alpine
    working_dir: /app
    volumes:
      - ./project:/app
      - claude_cache:/root/.cache/claude
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    command: tail -f /dev/null
    networks:
      - claude-network

  test-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: testuser
      POSTGRES_PASSWORD: testpass
    networks:
      - claude-network

  mock-api:
    image: mockserver/mockserver:latest
    environment:
      MOCKSERVER_INITIALIZATION_JSON_PATH: /config/init.json
    volumes:
      # Mount the expectations file that the variable above points to
      - ./mockserver:/config:ro
    ports:
      - "1080:1080"
    networks:
      - claude-network

volumes:
  claude_cache:

networks:
  claude-network:
    driver: bridge

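This setup gives you three containers: one for Claude Code, one for a PostgreSQL test database, and one for a mock API server. All three share the claude-network bridge, so they reach each other by service name (test-db:5432, mock-api:1080). The claude-code container idles on tail -f /dev/null so that you can run commands in it with docker compose exec.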
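
Compose substitutes an empty string (with a warning) when ${ANTHROPIC_API_KEY} is unset on the host, so the container starts but API calls fail later. A minimal preflight sketch, assuming a POSIX shell on the host; the helper name require_env is hypothetical and not part of any tool:

```shell
#!/bin/sh
# require_env NAME (hypothetical helper): succeed only if the
# environment variable NAME is set and non-empty.
require_env() {
  eval "_value=\${$1:-}"
  if [ -z "$_value" ]; then
    echo "missing required variable: $1" >&2
    return 1
  fi
}

# Usage (not executed here):
#   require_env ANTHROPIC_API_KEY && docker compose up -d
```

Failing before docker compose up keeps the error close to its cause instead of surfacing as an authentication failure inside the container.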

Testing the tdd Skill in Docker

The tdd skill shines when you need to generate tests against a fresh database. Here’s how to test it:

# Start the environment
docker compose up -d

# Run the tdd skill inside the container
docker compose exec claude-code npx -y @anthropic-ai/claude-code tdd \
  --pattern "src/**/*.ts" \
  --framework jest \
  --database postgres://testuser:testpass@test-db:5432/testdb

The skill generates tests that connect to the containerized PostgreSQL instance. Because the database is isolated, you can run destructive tests without worry. After testing, destroy everything with docker compose down -v to start fresh.
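
PostgreSQL may still be initializing when docker compose up -d returns, so a skill that connects immediately can fail. One way to bridge the gap is a small retry loop around pg_isready, which ships in the postgres image. This is a sketch under those assumptions; the retry helper is hypothetical:

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, sleeping DELAY
# seconds between tries; fail after ATTEMPTS unsuccessful tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (not executed here): wait for PostgreSQL before running the skill.
#   retry 10 2 docker compose exec -T test-db pg_isready -U testuser -d testdb
```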

For the pdf skill, mount a volume containing the documents you want to process:

services:
  claude-code:
    # ... existing config
    volumes:
      - ./project:/app
      - ./documents:/documents:ro
      - claude_cache:/root/.cache/claude

Then run the skill against documents in that folder:

docker compose exec claude-code npx -y @anthropic-ai/claude-code pdf \
  --operation extract \
  --source /documents/report.pdf \
  --output /app/extracted/

Mocking External Services

Skills that call third-party APIs need mocking. Point the skill at the mock-api service (http://mock-api:1080 from inside the Compose network) and define the requests it should answer:

{
  "httpRequest": {
    "method": "POST",
    "path": "/api/v1/users"
  },
  "httpResponse": {
    "statusCode": 201,
    "body": {
      "id": "usr_123",
      "status": "created"
    },
    "delay": {
      "timeUnit": "MILLISECONDS",
      "value": 100
    }
  }
}

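Save this as mockserver/init.json and mount that directory into the mock-api container at /config (for example with a ./mockserver:/config:ro volume entry) so that MOCKSERVER_INITIALIZATION_JSON_PATH resolves; the container then initializes with these expectations. Your skill sees realistic responses without hitting real APIs. This approach works well for testing the supermemory skill when it calls external storage services, or the frontend-design skill when it validates against design system APIs.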
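
Before running a skill against the mock, it helps to confirm that the expectation loaded. The sketch below assumes curl is installed on the host and uses the 1080 port published in the Compose file; the helper name mock_status is hypothetical:

```shell
#!/bin/sh
# mock_status BASE_URL (hypothetical helper): print the HTTP status code
# returned by the mocked POST /api/v1/users endpoint.
mock_status() {
  curl -s -o /dev/null -w '%{http_code}' -X POST "$1/api/v1/users"
}

# Usage (not executed here), from the host via the published port:
#   [ "$(mock_status http://localhost:1080)" = "201" ] && echo "mock ready"
```

A 201 matches the expectation above; a 404 usually means MockServer started without the expectations file.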

Running Multiple Skill Instances

For testing agent swarms or multi-skill workflows, scale the Claude Code service:

docker compose up -d --scale claude-code=3

Each instance gets its own container but shares the network, the ./project bind mount, and the claude_cache volume, so concurrent writes to the same files can collide. This lets you test how skills coordinate through shared state or message queues:

services:
  redis:
    image: redis:7-alpine
    networks:
      - claude-network

  claude-code:
    # ... 
    depends_on:
      - redis
    environment:
      - REDIS_URL=redis://redis:6379

The supermemory skill can use Redis to share context between instances, simulating a multi-agent workflow on your local machine.
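
To address one replica at a time, docker compose exec accepts an --index option. A minimal sketch that runs the same command in every replica, assuming the service was scaled as shown earlier; the helper name in_each_replica is hypothetical:

```shell
#!/bin/sh
# in_each_replica COUNT CMD...: run CMD in replicas 1..COUNT of the
# claude-code service, using the --index option of docker compose exec.
in_each_replica() {
  count=$1
  shift
  i=1
  while [ "$i" -le "$count" ]; do
    docker compose exec -T --index "$i" claude-code "$@" || return 1
    i=$((i + 1))
  done
}

# Usage (not executed here): print each replica's hostname.
#   docker compose up -d --scale claude-code=3
#   in_each_replica 3 hostname
```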

CI Integration

Once your Compose setup works locally, translate it to CI. GitHub Actions example:

jobs:
  test-skill:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
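      - name: Start test environment
        # Assumes the key is stored as a repository secret with this name
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: docker compose up -d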
        
      - name: Run skill tests
        run: |
          docker compose exec -T claude-code npx -y \
            @anthropic-ai/claude-code tdd \
            --pattern "src/**/*.ts" \
            --output /app/tests/
          
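      - name: Run test suite
        # run.sql lives in ./project on the runner (mounted at /app in
        # claude-code only), so feed it to psql over stdin.
        run: |
          docker compose exec -T test-db \
            psql -v ON_ERROR_STOP=1 -U testuser -d testdb \
            < project/tests/run.sql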
      
      - name: Cleanup
        if: always()
        run: docker compose down -v

The -T flag disables pseudo-TTY allocation; CI runners have no terminal attached, so exec can otherwise fail or garble its output. The -v flag on docker compose down removes volumes, ensuring each CI run starts with a completely fresh database.
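
To reproduce the workflow locally with the same guarantee that cleanup always runs, a wrapper along these lines can help. It is a sketch, not part of the workflow itself, and the helper name with_test_env is hypothetical:

```shell
#!/bin/sh
# with_test_env CMD... (hypothetical helper): start the Compose
# environment, run CMD, and always tear the environment down afterwards,
# mirroring the "if: always()" cleanup step in the workflow.
with_test_env() {
  docker compose up -d || { docker compose down -v; return 1; }
  rc=0
  "$@" || rc=$?
  docker compose down -v
  return "$rc"
}

# Usage (not executed here): run.sh stands in for your own test script.
#   with_test_env ./run.sh
```

The wrapper returns the exit status of the wrapped command, so a failing test still fails the script after the containers and volumes are gone.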

Debugging Skills in Containers

When a skill fails inside Docker, attach to the running container:

docker compose exec claude-code sh

From there, you can inspect the skill’s output, check environment variables, and manually run commands to reproduce issues. The Alpine-based image keeps the footprint small, but it ships BusyBox sh rather than bash and only a minimal toolset; install anything else you need with apk add.
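
Dumping environment variables while debugging also prints ANTHROPIC_API_KEY, which is easy to leak into logs or bug reports. A small filter can mask likely secrets first. This is a sketch: the helper name redact_env and the list of name patterns are assumptions to adjust for your own variables:

```shell
#!/bin/sh
# redact_env (hypothetical helper): read NAME=value lines on stdin and
# mask the value of any variable whose name contains KEY, TOKEN,
# PASSWORD, or SECRET.
redact_env() {
  sed -E 's/^([^=]*(KEY|TOKEN|PASSWORD|SECRET)[^=]*)=.*/\1=***/'
}

# Usage (not executed here):
#   docker compose exec -T claude-code env | redact_env
```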

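For persistent debugging sessions, add a docker-compose.override.yml next to your main file; Compose merges it automatically. This one keeps the container idle and mounts a folder of debugging scripts: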

# docker-compose.override.yml
services:
  claude-code:
    command: sleep infinity
    volumes:
      - ./project:/app
      - ./debug-scripts:/debug

Then exec into the container and run your debugging tools from the mounted /debug folder.

Health Checks for Skill Services

Add health checks to ensure services are ready before running skills:

services:
  test-db:
    image: postgres:15-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U testuser -d testdb"]
      interval: 5s
      timeout: 5s
      retries: 5

  claude-code:
    depends_on:
      test-db:
        condition: service_healthy

Docker Compose waits for the database to be healthy before starting Claude Code, preventing connection failures on startup.
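
When a dependent service never starts, the first thing to check is the health status Docker recorded. The sketch below reads it with docker inspect; it assumes the service has a single container and a health check defined, and the helper name service_health is hypothetical:

```shell
#!/bin/sh
# service_health SERVICE: print the health status (starting, healthy, or
# unhealthy) that Docker reports for the service's container.
service_health() {
  container_id=$(docker compose ps -q "$1") || return 1
  [ -n "$container_id" ] || return 1
  docker inspect --format '{{.State.Health.Status}}' "$container_id"
}

# Usage (not executed here):
#   docker compose up -d --wait     # block until health checks pass
#   service_health test-db
```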

Cleanup and Resource Management

Always clean up after testing:

# Remove containers, networks, and named volumes
docker compose down -v

# Remove dangling images to save disk space
docker image prune -f

# Remove unused volumes host-wide, not just for this project
# (Docker 23+ prunes only anonymous volumes unless you add --all)
docker volume prune -f

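For faster subsequent runs, keep the images and named volumes and recreate only the containers:

docker compose down
docker compose up -d --build

This recreates the containers while reusing downloaded image layers and the claude_cache volume. Your skill code is bind-mounted from ./project, so the containers always see the latest version. The --build flag only matters for services that define a build section; the services in this guide all use prebuilt images, so it is optional here.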
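
Because docker compose down without -v leaves named volumes behind, it can be useful to see what a project has accumulated. Compose labels the volumes it creates with the project name, which by default is the directory name. A minimal sketch; the helper name project_volumes is hypothetical:

```shell
#!/bin/sh
# project_volumes PROJECT: list named volumes that Compose created for
# PROJECT, using the label Compose attaches to them.
project_volumes() {
  docker volume ls -q --filter "label=com.docker.compose.project=$1"
}

# Usage (not executed here): "myproject" is a placeholder project name.
#   project_volumes myproject
```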

Summary

Docker Compose provides the isolation and repeatability you need for testing Claude Code skills. Whether you’re running the tdd skill against a fresh database, the pdf skill on isolated documents, or coordinating multiple instances with supermemory, containers give you confidence that your skills work correctly before deploying to production.

The key patterns are: isolate each test run with fresh volumes, mock external dependencies, scale horizontally for multi-agent tests, and mirror your local setup in CI for consistent results.

Built by theluckystrike — More at zovo.one