Claude Code for Shell Operator Workflow Tutorial
Shell operators are fundamental to infrastructure automation, enabling you to build custom controllers that manage resources outside the Kubernetes API. Whether you’re creating a Kubernetes Operator that wraps a CLI tool, building a shell-based automation system, or managing infrastructure as code, Claude Code can dramatically accelerate your workflow. This tutorial shows you how to use Claude Code effectively when building and maintaining shell operator workflows.
Understanding Shell Operators
A shell operator is essentially a program that runs in a loop, watching for events and taking action when something changes. In the Kubernetes ecosystem, operators extend the API to manage custom resources. Shell operators typically work by:
- Watching for changes in Custom Resources (CRs)
- Running shell commands to reconcile the desired state
- Reporting status back to the cluster
- Handling retries and error recovery
Claude Code can assist at every stage—from initial operator design to debugging production issues.
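The "handling retries and error recovery" step above can be sketched as a small wrapper that re-runs a failing command with an increasing delay. This is a minimal illustration under stated assumptions: `retry_with_backoff` is a hypothetical helper name, not part of kubectl or any operator framework, and the doubling delay is just one reasonable policy.

```shell
#!/bin/bash
# retry_with_backoff is a hypothetical helper, not part of kubectl or any framework.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while true; do
        # Run the command; stop as soon as it succeeds
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        # Double the delay before the next attempt
        delay=$((delay * 2))
    done
}
```

A reconcile step could then be wrapped as `retry_with_backoff 5 2 ./reconcile.sh "$item"`, which tries up to five times and waits 2, 4, 8, and 16 seconds between attempts.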
Setting Up Your Operator Project
Start by describing your operator requirements to Claude. Instead of writing boilerplate code manually, explain what you need:
/shell-operator Create a Kubernetes operator that manages backup operations for a MySQL database. It should watch a Backup custom resource, run mysqldump commands, upload to S3, and report status.
Claude will generate a project structure with proper organization. For shell operators, expect output like:
backup-operator/
├── Dockerfile
├── deploy/
│   ├── crd.yaml
│   ├── rbac.yaml
│   └── operator.yaml
├── reconcile.sh
├── backup.sh
└── test/
    ├── integration.sh
    └── mock_data/
Core Operator Patterns
The Reconciliation Loop
Every operator needs a reconciliation loop that watches for changes and takes action. Here’s a pattern Claude often generates:
#!/bin/bash
# Reconciliation loop for shell operator
NAMESPACE="${NAMESPACE:-default}"
RESOURCE_NAME="${RESOURCE_NAME:-}"
RESOURCE_GROUP="${RESOURCE_GROUP:-example.com}"
RESOURCE_VERSION="${RESOURCE_VERSION:-v1}"
RESOURCE_PLURAL="${RESOURCE_PLURAL:-backups}"

# Fully qualified form (plural.version.group) avoids clashes with other CRDs
RESOURCE="${RESOURCE_PLURAL}.${RESOURCE_VERSION}.${RESOURCE_GROUP}"

# Restrict the watch to a single resource only when RESOURCE_NAME is set
SELECTOR=()
if [ -n "$RESOURCE_NAME" ]; then
    SELECTOR=(--field-selector="metadata.name=${RESOURCE_NAME}")
fi

# Watch for changes using kubectl. With --watch, -o json emits a stream of
# JSON objects; jq -c puts each one on its own line for the read loop.
# --request-timeout is omitted because it would end the watch after the timeout.
kubectl get "$RESOURCE" --namespace="$NAMESPACE" --watch -o json "${SELECTOR[@]}" |
jq -c --unbuffered '.' | while read -r item; do
    NAME=$(jq -r '.metadata.name' <<<"$item")
    GENERATION=$(jq -r '.metadata.generation' <<<"$item")

    # Check if reconciliation needed ("0" means never reconciled)
    ANNOTATION_GEN=$(jq -r '.metadata.annotations["operators.example.com/reconciled-generation"] // "0"' <<<"$item")
    if [ "$GENERATION" != "$ANNOTATION_GEN" ]; then
        echo "Reconciling $NAME (generation: $GENERATION)"
        ./reconcile.sh "$item" || echo "Reconcile failed for $NAME" >&2
    fi
done
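Claude can generate a loop like this along with basic error handling, but treat it as a starting point: review how it behaves in edge cases, such as a dropped watch connection or a resource deleted mid-reconcile, before relying on it.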
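The loop above compares `metadata.generation` with a `reconciled-generation` annotation, so something has to write that annotation after a successful reconcile. One possible sketch uses `kubectl annotate`. Here `mark_reconciled` is a hypothetical helper name, and the `backups` default is assumed from the example resource.

```shell
#!/bin/bash
# mark_reconciled is a hypothetical helper. The annotation key matches the one
# the watch loop reads; the "backups" default is assumed from the example.
mark_reconciled() {
    local name="$1" namespace="$2" generation="$3"
    # --overwrite replaces the value left by a previous reconcile
    kubectl annotate "${RESOURCE_PLURAL:-backups}" "$name" \
        --namespace="$namespace" \
        --overwrite \
        "operators.example.com/reconciled-generation=${generation}"
}
```

Inside the loop it could be chained after the reconcile step, for example `./reconcile.sh "$item" && mark_reconciled "$NAME" "$NAMESPACE" "$GENERATION"`. Writing the annotation produces another watch event, but metadata-only changes do not bump the generation, so the next pass should skip the resource.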
Handling Status Updates
Shell operators need to update resource status. Here’s a common pattern:
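#!/bin/bash
update_status() {
    local name="$1"
    local namespace="$2"
    local phase="$3"
    local message="$4"
    local patch

    # Build the patch with jq so quotes or newlines in the message stay valid JSON
    patch=$(jq -n --arg phase "$phase" --arg message "$message" \
        '{status: {phase: $phase, message: $message}}')

    # --subresource needs kubectl 1.24 or newer and a CRD with the status subresource enabled
    kubectl patch backup "$name" \
        --namespace="$namespace" \
        --type=merge \
        --subresource=status \
        --patch="$patch"
}

# Usage
update_status "backup-001" "production" "Running" "Starting backup process"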
Debugging Operator Issues
When your operator fails in production, Claude becomes invaluable for debugging. Describe the symptoms:
/shell-operator My backup operator is stuck in "Running" phase. The logs show "exec format error" but the container is running. Help me debug.
Claude will guide you through common issues:
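- Exec format error: Usually means the kernel cannot execute the entrypoint as given, either because the script is missing its shebang line or because a binary in the image was built for a different CPU architecture than the node. Windows (CRLF) line endings more often show up as "no such file or directory" or "bad interpreter", since the interpreter path then ends in a carriage return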
- Permission denied: Check that your script has execute permissions in the container image
- Missing dependencies: Verify all required commands are available in your operator image
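The checks in this list can be scripted and run at container start or in CI. The following is a rough sketch, not a definitive tool: `check_script` and `check_commands` are hypothetical helper names, and the set of checks simply mirrors the items above.

```shell
#!/bin/bash
# Hypothetical preflight helpers; the names are illustrative only.

# Print one line per problem found in a script file; return 1 if any was found.
check_script() {
    local file="$1" problems=0 first_line=""
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Read only the first line to look for a shebang
    IFS= read -r first_line < "$file" || true
    case "$first_line" in
        '#!'*) ;;
        *) echo "no shebang: $file"; problems=1 ;;
    esac
    # A carriage return anywhere in the file suggests Windows line endings
    if grep -q $'\r' "$file"; then
        echo "CRLF line endings: $file"
        problems=1
    fi
    if [ ! -x "$file" ]; then
        echo "not executable: $file"
        problems=1
    fi
    return "$problems"
}

# Print each required command that is not on PATH; return 1 if any is missing.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For the backup operator this might be called as `check_script ./reconcile.sh && check_commands kubectl jq mysqldump` before the watch loop starts.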
Common Debugging Patterns
# Debug: Enable verbose output
set -x # Print commands and arguments as they execute
# Debug: Exit on error (helpful during development)
set -e
# Debug: Treat unset variables as error
set -u
# Debug: Capture full output
exec > >(tee /var/log/operator.log) 2>&1
Building Operator Skills
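You can create a Claude Skill specifically for your operator to ensure consistent behavior. A skill is a directory containing a SKILL.md file (for example .claude/skills/shell-operator/SKILL.md) whose YAML frontmatter names and describes it:
---
name: shell-operator
description: "Specialized assistance for building and debugging shell-based Kubernetes operators"
---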
Testing Your Operator
Automated testing is crucial for reliable operators. Claude can help set up comprehensive test suites:
#!/bin/bash
# Unit test for backup function
test_backup_mysqldump() {
    # Exported so the mock can read it inside the backup.sh child process
    export EXPECTED_ARGS="--single-transaction --quick --lock-tables=false"

    # Mock mysqldump: succeed only when the expected flags are present
    mysqldump() {
        echo "Mock mysqldump called with: $*"
        [[ "$*" == *"$EXPECTED_ARGS"* ]]
    }

    # Run test (export -f only reaches backup.sh if it runs under bash)
    export -f mysqldump
    local result status=0
    result=$(./backup.sh "test-db" 2>&1) || status=$?

    if [ "$status" -eq 0 ] && grep -q "Mock mysqldump called" <<<"$result"; then
        echo "✓ Unit test passed"
        return 0
    else
        echo "✗ Unit test failed (exit $status): $result"
        return 1
    fi
}
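# Integration test using kind
test_operator_integration() {
    local phase status=0
    kind create cluster --name operator-test || return 1

    # Install CRD
    kubectl apply -f deploy/crd.yaml
    # Deploy operator
    kubectl apply -f deploy/operator.yaml
    # Create test resource
    kubectl apply -f test/cr.yaml

    # Wait for reconciliation (polls the status instead of a fixed sleep;
    # --for=jsonpath needs kubectl 1.23 or newer)
    kubectl wait backup/test-backup \
        --for=jsonpath='{.status.phase}'=Completed --timeout=120s || true

    # Verify status
    phase=$(kubectl get backup test-backup -o jsonpath='{.status.phase}')
    if [ "$phase" = "Completed" ]; then
        echo "✓ Integration test passed"
    else
        echo "✗ Integration test failed: phase=$phase"
        status=1
    fi

    # Always delete the cluster, even when the test fails
    kind delete cluster --name operator-test
    return "$status"
}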
Best Practices
Resource Management
Always handle cleanup properly in shell operators:
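cleanup() {
    # Remove temporary files
    rm -rf /tmp/backup-*
    # Close file descriptor 3 if the script opened it
    exec 3>&-
    # Kill child processes that are still running
    jobs -p | xargs -r kill 2>/dev/null
}
# Run cleanup once on any exit; signals become a normal exit so the EXIT trap fires
trap cleanup EXIT
trap 'exit 143' SIGTERM
trap 'exit 130' SIGINT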
Secret Handling
Never log secrets:
# Good: Use sealed secrets or external secret operators
# Reference secrets as files, not environment variables
DB_PASSWORD=$(cat /secrets/db/password)

# Good: Redact sensitive values
log_message() {
    local msg="$1"
    # Quoting the pattern makes characters such as * or ? in the password match literally
    echo "[$(date)] ${msg//"${DB_PASSWORD}"/******}"
}
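Reading a secret from a mounted file can fail quietly: if the file is missing, a bare `cat` leaves the variable empty and the operator carries on with a blank password. A small guard makes that failure explicit. This is a sketch only: `read_secret` is a hypothetical helper name, and the mount path is taken from the example above.

```shell
#!/bin/bash
# read_secret is a hypothetical helper; the mount path layout is an assumption.
# Prints the secret on stdout, or returns 1 with a message on stderr.
read_secret() {
    local path="$1" value
    if [ ! -r "$path" ]; then
        echo "secret file not readable: $path" >&2
        return 1
    fi
    # Command substitution strips trailing newlines from the file content
    value=$(cat "$path") || return 1
    if [ -z "$value" ]; then
        echo "secret file is empty: $path" >&2
        return 1
    fi
    printf '%s' "$value"
}
```

A caller would then write something like `DB_PASSWORD=$(read_secret /secrets/db/password) || exit 1`, so the operator stops instead of running a backup with empty credentials.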
Observability
Add structured logging:
log_json() {
    local level="$1"
    local message="$2"
    local resource="$3"
    # -c prints one compact JSON object per line, which log collectors can parse line by line
    jq -nc \
        --arg level "$level" \
        --arg message "$message" \
        --arg resource "$resource" \
        --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        '{level: $level, message: $message, resource: $resource, timestamp: $timestamp}'
}
Conclusion
Claude Code shifts shell operator development from writing every script by hand to describing requirements and letting AI generate a solid first draft that you then review and harden. By using Claude’s capabilities for code generation, debugging, and skill creation, you can build more reliable operators faster. Start with clear descriptions of your operator’s purpose, use skills to maintain consistency, and always test thoroughly before deploying to production.
The key is treating Claude as a partner in your development workflow—not just a code generator, but a debugger, reviewer, and advisor who can help you navigate the complexities of operator development.
Built by theluckystrike — More at zovo.one