Claude Code for Shell Operator Workflow Tutorial

Shell operators let you build custom Kubernetes controllers out of shell scripts, which makes them a practical way to wrap an existing CLI tool or to manage resources that live outside the Kubernetes API. Whether you’re wrapping a CLI tool in an operator, building a shell-based automation system, or managing infrastructure as code, Claude Code can take over much of the boilerplate. This tutorial shows how to use Claude Code when building and maintaining shell operator workflows.

Understanding Shell Operators

A shell operator is essentially a program that runs in a loop, watching for events and taking action when something changes. In the Kubernetes ecosystem, operators extend the API to manage custom resources. Shell operators typically work by:

Watching for changes in Custom Resources (CRs)
Running shell commands to reconcile the desired state
Reporting status back to the cluster
Handling retries and error recovery

Claude Code can assist at every stage—from initial operator design to debugging production issues.
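
The last item in the list, retries and error recovery, is not shown elsewhere in this tutorial, so here is a minimal sketch of one way to do it. The helper name retry_with_backoff and its argument order are assumptions made for illustration, not part of any operator framework:

```shell
#!/bin/bash

# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or the attempts run
# out, doubling the delay after each failure.
retry_with_backoff() {
    local max_attempts="$1"
    local delay="$2"
    shift 2

    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "Giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

A call such as retry_with_backoff 5 2 ./backup.sh "test-db" would try the backup up to five times, waiting 2, 4, 8, and 16 seconds between attempts. Doubling the delay keeps a failing dependency from being hammered while it recovers.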

Setting Up Your Operator Project

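Start by describing your operator requirements to Claude. Instead of writing boilerplate code manually, explain what you need. The /shell-operator prefix in the prompts below assumes a skill with that name, as described under Building Operator Skills: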

/shell-operator Create a Kubernetes operator that manages backup operations for a MySQL database. It should watch a Backup custom resource, run mysqldump commands, upload to S3, and report status.

Claude will propose a project structure. The exact layout varies from run to run, but for a shell operator it typically looks like this:

backup-operator/
├── Dockerfile
├── deploy/
│   ├── crd.yaml
│   ├── rbac.yaml
│   └── operator.yaml
├── reconcile.sh
├── backup.sh
└── test/
    ├── integration.sh
    └── mock_data/

Core Operator Patterns

The Reconciliation Loop

Every operator needs a reconciliation loop that watches for changes and takes action. Here’s a pattern Claude often generates:

#!/bin/bash

# Reconciliation loop for shell operator
NAMESPACE="${NAMESPACE:-default}"
RESOURCE_NAME="${RESOURCE_NAME:-}"
RESOURCE_GROUP="${RESOURCE_GROUP:-example.com}"
RESOURCE_VERSION="${RESOURCE_VERSION:-v1}"
RESOURCE_PLURAL="${RESOURCE_PLURAL:-backups}"

# Filter by name only when RESOURCE_NAME is set; an empty value would match nothing
selector=()
if [ -n "$RESOURCE_NAME" ]; then
    selector=(--field-selector="metadata.name=${RESOURCE_NAME}")
fi

# Watch for changes using kubectl. With --watch, -o json prints one object per
# event; jq -c puts each object on a single line so `read` sees it whole.
# No --request-timeout here: it would end the watch after the timeout.
kubectl get "${RESOURCE_PLURAL}.${RESOURCE_VERSION}.${RESOURCE_GROUP}" \
    --namespace="${NAMESPACE}" \
    --watch \
    "${selector[@]}" \
    -o json | jq -c --unbuffered '.' | while read -r item; do
    NAME=$(jq -r '.metadata.name' <<<"$item")
    GENERATION=$(jq -r '.metadata.generation' <<<"$item")

    # Check if reconciliation needed
    ANNOTATION_GEN=$(jq -r '.metadata.annotations["operators.example.com/reconciled-generation"] // "0"' <<<"$item")

    if [ "$GENERATION" != "$ANNOTATION_GEN" ]; then
        echo "Reconciling $NAME (generation: $GENERATION)"
        # </dev/null keeps reconcile.sh from consuming the watch stream
        ./reconcile.sh "$item" </dev/null
    fi
done

Treat a generated loop like this as a starting point. Review its error handling and check edge cases you might not initially consider, such as a dropped watch connection, which ends the loop, before relying on it.

Handling Status Updates

Shell operators need to update resource status. Here’s a common pattern:

#!/bin/bash

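update_status() {
    local name="$1"
    local namespace="$2"
    local phase="$3"
    local message="$4"
    local patch

    # Build the patch with jq so quotes or newlines in $message stay valid JSON
    patch=$(jq -cn --arg phase "$phase" --arg message "$message" \
        '{status: {phase: $phase, message: $message}}')

    # --subresource requires kubectl 1.24 or newer
    kubectl patch backup "$name" \
        --namespace="$namespace" \
        --type=merge \
        --subresource=status \
        --patch="$patch"
}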

# Usage
update_status "backup-001" "production" "Running" "Starting backup process"
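
The reconciliation loop shown earlier skips an object when its operators.example.com/reconciled-generation annotation matches metadata.generation, so something has to write that annotation after a successful reconcile. The following is a minimal sketch of one way to do that. The function name mark_reconciled is hypothetical, and the resource kind backup is carried over from the status example above:

```shell
#!/bin/bash

# mark_reconciled NAME NAMESPACE GENERATION
# Hypothetical helper: records the generation that was just reconciled so the
# watch loop can skip objects whose spec has not changed since.
mark_reconciled() {
    local name="$1"
    local namespace="$2"
    local generation="$3"

    # --overwrite replaces the value left by an earlier reconcile
    kubectl annotate backup "$name" \
        --namespace="$namespace" \
        --overwrite \
        "operators.example.com/reconciled-generation=${generation}"
}
```

Writing the annotation produces another watch event, but a metadata-only change does not increase metadata.generation, so the loop sees matching values and does nothing.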

Debugging Operator Issues

When your operator fails in production, Claude becomes invaluable for debugging. Describe the symptoms:

/shell-operator My backup operator is stuck in "Running" phase. The logs show "exec format error" but the container is running. Help me debug.

Claude will guide you through common issues:

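Exec format error: Usually indicates that the script is missing its shebang line or that the image was built for a different CPU architecture than the node. Windows line endings more often show up as "no such file or directory", because the kernel looks for an interpreter whose name ends in a carriage return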
Permission denied: Check that your script has execute permissions in the container image
Missing dependencies: Verify all required commands are available in your operator image
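
The checks in this list can be run before an image is built. The following is a rough sketch of such a preflight check. The function name check_script and its output wording are assumptions made for this example:

```shell
#!/bin/bash

# check_script PATH [REQUIRED_COMMAND...]
# Hypothetical preflight check: prints one line per problem found and returns
# non-zero if any check fails.
check_script() {
    local script="$1"
    shift
    local problems=0
    local cr cmd
    cr=$(printf '\r')

    if [ ! -f "$script" ]; then
        echo "missing file: $script"
        return 1
    fi

    # The first two bytes of an executable script should be "#!"
    if [ "$(head -c 2 "$script")" != "#!" ]; then
        echo "no shebang: $script"
        problems=$((problems + 1))
    fi

    # A carriage return on line 1 becomes part of the interpreter path
    if head -n 1 "$script" | grep -q "$cr"; then
        echo "CRLF line endings: $script"
        problems=$((problems + 1))
    fi

    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        problems=$((problems + 1))
    fi

    # Remaining arguments are commands the script depends on
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing dependency: $cmd"
            problems=$((problems + 1))
        fi
    done

    [ "$problems" -eq 0 ]
}
```

For the backup operator this might be called as check_script ./backup.sh mysqldump jq kubectl, ideally inside the built image, so that the dependency check reflects what the container actually ships.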

Common Debugging Patterns

# Debug: Enable verbose output
set -x  # Print commands and arguments as they execute

# Debug: Exit on error (helpful during development)
set -e

# Debug: Treat unset variables as error
set -u

# Debug: Capture full output
exec > >(tee /var/log/operator.log) 2>&1

Building Operator Skills

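You can create a Claude Skill specifically for your operator to ensure consistent behavior. In Claude Code, a skill is a directory containing a SKILL.md file: YAML frontmatter carries the name and description, and the instructions follow as Markdown:

# .claude/skills/shell-operator/SKILL.md
---
name: shell-operator
description: "Specialized assistance for building and debugging shell-based Kubernetes operators"
---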

Testing Your Operator

Automated testing is crucial for reliable operators. Claude can help set up comprehensive test suites:

#!/bin/bash

# Unit test for backup function
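test_backup_mysqldump() {
    # Exported so the mock can read it inside the backup.sh child process
    export EXPECTED_ARGS="--single-transaction --quick --lock-tables=false"

    # Mock mysqldump: succeeds only when called with the expected flags
    mysqldump() {
        if [[ "$*" == *"$EXPECTED_ARGS"* ]]; then
            echo "Mock mysqldump called with expected args"
            return 0
        else
            echo "Mock mysqldump called with unexpected args: $*"
            return 1
        fi
    }

    # Run test (the exported function is only visible if backup.sh runs under bash)
    export -f mysqldump
    local result
    result=$(./backup.sh "test-db" 2>&1)

    if echo "$result" | grep -q "Mock mysqldump called with expected args"; then
        echo "✓ Unit test passed"
        return 0
    else
        echo "✗ Unit test failed"
        return 1
    fi
}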

# Integration test using kind
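test_operator_integration() {
    local status=0 phase
    kind create cluster --name operator-test

    # Install CRD and RBAC rules
    kubectl apply -f deploy/crd.yaml
    kubectl apply -f deploy/rbac.yaml

    # Deploy operator
    kubectl apply -f deploy/operator.yaml

    # Create test resource
    kubectl apply -f test/cr.yaml

    # Wait for reconciliation: poll the phase instead of a fixed sleep
    # (--for=jsonpath requires kubectl 1.23 or newer)
    kubectl wait backup/test-backup \
        --for=jsonpath='{.status.phase}'=Completed \
        --timeout=120s || true

    # Verify status
    phase=$(kubectl get backup test-backup -o jsonpath='{.status.phase}')

    if [ "$phase" == "Completed" ]; then
        echo "✓ Integration test passed"
    else
        echo "✗ Integration test failed: phase=$phase"
        status=1
    fi

    # Delete the cluster on failure too, so reruns start clean
    kind delete cluster --name operator-test
    return "$status"
}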
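
With more than a couple of test functions, a small runner keeps the results in one place. The sketch below is one possible approach, assuming every test is a shell function whose name starts with test_ and that signals failure through a non-zero return code. The name run_tests is made up for this example:

```shell
#!/bin/bash

# run_tests: runs every shell function whose name starts with "test_" and
# prints a pass/fail summary. Returns non-zero if any test failed.
run_tests() {
    local passed=0 failed=0 name

    # declare -F prints lines like "declare -f test_foo"
    for name in $(declare -F | awk '{print $3}' | grep '^test_'); do
        # Run each test in a subshell so one test cannot break the next
        if ( "$name" ); then
            passed=$((passed + 1))
        else
            echo "FAILED: $name" >&2
            failed=$((failed + 1))
        fi
    done

    echo "passed=$passed failed=$failed"
    [ "$failed" -eq 0 ]
}
```

Calling run_tests at the end of the test file, after the test functions are defined, gives a CI job a single exit code to check.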

Best Practices

Resource Management

Always handle cleanup properly in shell operators:

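cleanup() {
    # Remove temporary files
    rm -rf /tmp/backup-*

    # Close file descriptor 3 if the script opened it
    exec 3>&-

    # Kill child processes (xargs -r is a GNU extension: skip kill when there are none)
    jobs -p | xargs -r kill
}

# Run cleanup once, on exit. The signal traps only exit, which then fires the
# EXIT trap; trapping the signals with cleanup directly would let the script
# keep running after SIGTERM or SIGINT.
trap cleanup EXIT
trap 'exit 143' TERM
trap 'exit 130' INT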

Secret Handling

Never log secrets:

# Good: Redact sensitive values
log_message() {
    local msg="$1"
    # Quote the pattern so glob characters in the password are matched literally
    echo "[$(date)] ${msg//"${DB_PASSWORD:-}"/******}"
}

# Good: Use sealed secrets or external secret operators
# Reference secrets as files, not environment variables
DB_PASSWORD=$(cat /secrets/db/password)
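
Reading the secret from a file helps only if the password then stays off the command line, where it would show up in process listings. One way to hand it to mysqldump is a private option file passed with --defaults-extra-file. The sketch below assumes the secret is mounted as a plain-text file; the function name write_mysql_defaults is hypothetical, and passwords containing quotes or backslashes would need escaping that is not shown here:

```shell
#!/bin/bash

# write_mysql_defaults SECRET_FILE
# Hypothetical helper: copies the password from a mounted secret file into a
# private MySQL option file and prints the option file's path.
write_mysql_defaults() {
    local secret_file="$1"
    local password defaults_file

    if [ ! -r "$secret_file" ]; then
        echo "cannot read secret file: $secret_file" >&2
        return 1
    fi

    # $(<file) strips the trailing newline that secret files often carry
    password=$(<"$secret_file")

    # mktemp creates the file readable and writable by the owner only
    defaults_file=$(mktemp) || return 1
    printf '[client]\npassword="%s"\n' "$password" > "$defaults_file"
    echo "$defaults_file"
}
```

The returned path would then be used as mysqldump --defaults-extra-file="$defaults_file" followed by the other options; mysqldump expects this option to come first. Remove the file in the cleanup handler once the backup finishes.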

Observability

Add structured logging:

log_json() {
    local level="$1"
    local message="$2"
    local resource="$3"
    
    # -c prints each entry on one line, which line-based log collectors expect
    jq -cn \
        --arg level "$level" \
        --arg message "$message" \
        --arg resource "$resource" \
        --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        '{level: $level, message: $message, resource: $resource, timestamp: $timestamp}'
}

Conclusion

Claude Code shifts shell operator development from writing every script by hand to describing requirements and reviewing the generated code. By using Claude’s capabilities for code generation, debugging, and skill creation, you can build operators faster and catch problems earlier. Start with clear descriptions of your operator’s purpose, use skills to maintain consistency, and always test thoroughly before deploying to production.

The key is treating Claude as a partner in your development workflow—not just a code generator, but a debugger, reviewer, and advisor who can help you navigate the complexities of operator development.

Built by theluckystrike — More at zovo.one