Claude Skills Guide

Claude Code for Kube State Metrics Workflow: A Complete Guide

Kubernetes has become the backbone of modern cloud-native infrastructure, and understanding what’s happening inside your clusters is crucial for maintaining reliable applications. Kube State Metrics (KSM) plays a vital role in exposing cluster-level objects as Prometheus-compatible metrics, but configuring and extending it can be complex. This is where Claude Code becomes your powerful ally in building efficient Kube State Metrics workflows.

What is Kube State Metrics?

Kube State Metrics is a service that listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects—pods, deployments, services, and more. Unlike node-level metrics (which focus on CPU and memory), KSM focuses on the status and health of your Kubernetes resources.

For example, KSM can tell you:

These metrics are invaluable for understanding cluster health and setting up meaningful alerts.

Setting Up Claude Code for KSM Workflows

Before diving into examples, ensure Claude Code is configured with access to your Kubernetes context. The most effective approach is creating a dedicated skill for Kubernetes operations.

Creating a K8s Operations Skill

# ~/.claude/skills/kubernetes-skill/skill.md
name: Kubernetes Operations
description: Assists with Kubernetes resource management and monitoring

# Context includes kubeconfig and kubectl access
# Focus on Kube State Metrics and Prometheus integration

When working with KSM, provide Claude Code with your cluster context, including which metrics you’re currently tracking and what alerting rules exist. This context helps generate more accurate and relevant configurations.

Practical Examples: Building Your KSM Workflow

Example 1: Deploying and Configuring Kube State Metrics

Claude Code can help you deploy KSM and configure it appropriately for your cluster size:

# KSM Deployment generated with Claude Code guidance
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.10.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
            memory: 190Mi
          limits:
            cpu: 200m
            memory: 250Mi

When asking Claude Code to generate this, specify your cluster’s scale. For large clusters with hundreds of namespaces, you might need to adjust resource limits and potentially use sharding strategies.

Example 2: Creating Custom Metrics with PromQL

Kube State Metrics provides a solid foundation, but you’ll often need custom metrics tailored to your application. Here’s how Claude Code can help:

# Custom PromQL for deployment health
kube_pod_container_status_ready{namespace="production"} 
== 1

# Alert for pods stuck in pending state for more than 5 minutes
kube_pod_status_pending{namespace="production"} == 1
  and
time() - kube_pod_start_time{namespace="production"} > 300

# Deployment replica mismatch alert
kube_deployment_spec_replicas{namespace="production"}
  !=
kube_deployment_status_replicas_available{namespace="production"}

Claude Code excels at writing PromQL queries when you describe what you want to monitor. Provide concrete examples of failure scenarios you’re concerned about, and it can generate appropriate queries.

Example 3: Alerting Rules Configuration

# prometheus-alerts.yaml - Generated with Claude Code
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ksm-alerts
  namespace: monitoring
spec:
  groups:
  - name: kube-state-metrics
    rules:
    - alert: PodNotReady
      expr: |
        kube_pod_status_ready{namespace=~".*", condition="true"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is not ready"
        description: "Pod has been not ready for more than 5 minutes"
    
    - alert: DeploymentReplicasMismatch
      expr: |
        kube_deployment_spec_replicas != kube_deployment_status_replicas_available
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Deployment replicas mismatch"
        description: "Deployment {{ $labels.namespace }}/{{ $labels.deployment }} has mismatched replicas"

Advanced Workflow: Extending Kube State Metrics

Building Custom Metrics Exporters

When KSM doesn’t expose the specific metrics you need, consider building a custom exporter. Claude Code can help scaffold this:

// Custom metrics exporter structure
package main

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "net/http"
)

var (
    customMetric = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "myapp_custom_metric",
            Help: "A custom metric for my application",
        },
        []string{"namespace", "pod"},
    )
)

func init() {
    prometheus.MustRegister(customMetric)
}

func collectCustomMetrics(clientset *kubernetes.Clientset) {
    pods, err := clientset.CoreV1().Pods("").List(context.Background(), metav1.ListOptions{})
    if err != nil {
        return
    }
    
    for _, pod := range pods.Items {
        customMetric.WithLabelValues(pod.Namespace, pod.Name).Set(float64(pod.Status.ContainerStatuses[0].RestartCount))
    }
}

Integrating with Service-Level Objectives (SLOs)

For teams practicing SRE, Claude Code can help translate KSM metrics into meaningful SLOs:

# SLO Configuration Example
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-slo-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: my-application
  endpoints:
  - port: web
    path: /metrics
  namespaceSelector:
    matchNames:
    - production

Actionable Advice for KSM Workflows

Start Simple and Iterate

Begin with the default KSM metrics. They’re battle-tested and cover most common use cases. Only extend with custom metrics when you have specific operational needs that the defaults don’t address.

Resource Planning for KSM

As your cluster grows, KSM resource requirements increase. Here’s a practical guideline:

Cluster Size Recommended Replicas CPU Request Memory Request
< 50 pods 1 100m 190Mi
50-200 pods 1-2 200m 250Mi
200+ pods 2-3 500m 500Mi

Claude Code can help you calculate these based on your actual cluster metrics.

Implementing Cost-Based Alerting

One powerful pattern is correlating KSM metrics with cloud provider costs. Use pod restart counts and resource requests to identify inefficient workloads:

# Finding over-provisioned pods
kube_pod_container_resource_requests{resource="cpu"} 
  / on(pod, namespace) kube_pod_status_ready{condition="true"}
  > 0.5

This helps identify pods requesting more CPU than they actually use.

Monitoring Claude Code Itself

Don’t forget to monitor your AI-assisted workflows. Track metrics like:

This meta-monitoring helps you optimize your own processes.

Conclusion

Claude Code transforms Kube State Metrics from a complex configuration task into a streamlined workflow. By using AI assistance for generating manifests, writing PromQL queries, and designing alerting strategies, you can build robust Kubernetes monitoring faster while reducing errors.

The key is providing Claude Code with clear context about your cluster environment and specific monitoring requirements. Start with basic deployments, then incrementally add custom metrics as your observability needs evolve.

Remember: effective Kubernetes monitoring isn’t about collecting every possible metric—it’s about collecting the right metrics that help you understand system health and respond quickly to issues. Claude Code can help you identify and implement exactly what you need.

Built by theluckystrike — More at zovo.one