Automating OpenTelemetry Setup: AI Tools Compared

Generating individual OTel spans is one thing. Standing up a full OpenTelemetry pipeline — collector config, SDK initialization, exporter routing, sampling rules — is the harder problem. This guide covers using AI tools to automate the entire OTel setup process, from collector YAML to service-side SDK wiring.

What “Automated Setup” Means

The full OTel setup involves:

  1. Collector config — receivers, processors, exporters, pipelines in otelcol-config.yaml
  2. SDK initialization — per-service tracing/metrics/logs setup
  3. Exporter routing — sending to Jaeger, Prometheus, Grafana Cloud, Datadog, or Honeycomb
  4. Sampling strategy — head-based vs. tail-based, rate limiting
  5. Context propagation — W3C TraceContext headers between services

AI tools compress this from 2-3 hours of docs-reading to 15 minutes of iteration.

Claude for Collector Configuration

Claude handles OpenTelemetry Collector configuration YAML well, including multi-pipeline setups.

Prompt:

Generate an OpenTelemetry Collector config that:
- Receives traces via OTLP gRPC (port 4317) and HTTP (port 4318)
- Receives Prometheus metrics via scrape
- Processes: batch traces, add resource attributes (service.version from env)
- Exports traces to Jaeger (localhost:14250) and Grafana Cloud OTLP endpoint
- Exports metrics to Prometheus remote write
- Uses tail sampling: keep 100% of error traces, 10% of successful traces

Claude’s output:

# otelcol-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:8888']

processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
  resource:
    attributes:
      - key: service.version
        value: ${env:SERVICE_VERSION}
        action: upsert
  tail_sampling:
    decision_wait: 10s
    num_traces: 50000  # must cover decision_wait x incoming trace rate; too low a value evicts traces before a decision
    expected_new_traces_per_sec: 100
    policies:
      - name: errors-policy
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: probabilistic-policy
        type: probabilistic
        probabilistic: {sampling_percentage: 10}

exporters:
  jaeger:
    endpoint: localhost:14250
    tls:
      insecure: true
  otlphttp/grafana:
    endpoint: https://otlp-gateway-prod-us-central-0.grafana.net/otlp
    headers:
      authorization: Basic ${env:GRAFANA_CLOUD_TOKEN}
  prometheusremotewrite:
    endpoint: https://prometheus-prod-us-central-0.grafana.net/api/prom/push
    headers:
      authorization: Basic ${env:GRAFANA_CLOUD_TOKEN}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resource, tail_sampling, batch]
      exporters: [jaeger, otlphttp/grafana]
    metrics:
      receivers: [otlp, prometheus]
      processors: [resource, batch]
      exporters: [prometheusremotewrite]

Claude correctly orders the processors (resource before sampling before batch) and uses environment-variable substitution for secrets. One caveat to verify before deploying: the dedicated jaeger exporter was deprecated and removed from collector-contrib in 2023, so on current images (including the 0.97.0 image used later in this guide) it no longer exists. Export to Jaeger through an otlp exporter pointed at Jaeger's OTLP port (4317) instead, which the all-in-one image enables via COLLECTOR_OTLP_ENABLED=true.

Why Processor Order Matters

A common error in hand-written OTel collector configs is putting batch before tail_sampling. Tail sampling needs to see all spans for a trace before making a sampling decision — if you batch first, spans can get flushed to the exporter before the sampler has a complete picture. Claude gets this ordering right without being prompted.
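The flush behavior that causes the problem can be sketched in a few lines of Python (illustrative names, not the collector's actual implementation): the batch processor forwards spans downstream as soon as either threshold trips, with no regard for whether a trace is complete.

```python
import time

class BatchBuffer:
    """Sketch of batch-processor semantics: flush on size or timeout.

    The real processor flushes on a background timer; this version checks
    the timeout on each add() to keep the sketch single-threaded.
    """

    def __init__(self, send_batch_size=1024, timeout=5.0, export=print):
        self.size = send_batch_size
        self.timeout = timeout
        self.export = export          # downstream exporter callback
        self.buf = []
        self.last_flush = time.monotonic()

    def add(self, span):
        self.buf.append(span)
        # flush when the batch is full or the timeout has elapsed,
        # even if some traces in the buffer are still incomplete
        if len(self.buf) >= self.size or time.monotonic() - self.last_flush >= self.timeout:
            self.flush()

    def flush(self):
        if self.buf:
            self.export(self.buf)
        self.buf = []
        self.last_flush = time.monotonic()
```

If this buffer sits before a tail sampler in the pipeline, spans leave in partial batches before the sampler has seen the whole trace, which is exactly why the sampler must come first.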

The decision_wait of 10 seconds tells the tail sampler to hold traces in memory for up to 10 seconds before making a decision. For services with long-running operations (database transactions, external API calls), you may need to increase this. Claude adds a comment in the generated config when the prompt mentions slow operations.
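The two policies in the config above can be sketched as a single decision function. This is a simplified Python illustration, not the collector's code: the real probabilistic policy hashes the trace ID, but the idea of a deterministic, ID-derived verdict is the same.

```python
def keep_trace(spans, sampling_percentage=10.0):
    """Tail-sampling decision for a fully buffered trace."""
    # errors-policy: any ERROR span keeps the entire trace
    if any(span.get("status") == "ERROR" for span in spans):
        return True
    # probabilistic-policy: derive a stable bucket from the trace ID so
    # every collector instance reaches the same verdict for this trace
    bucket = int(spans[0]["trace_id"], 16) % 10_000
    return bucket < sampling_percentage * 100
```

Because the decision is a pure function of the buffered trace, it can only run after decision_wait has given all spans a chance to arrive.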

Node.js SDK Initialization

For Node.js microservices, the SDK setup is the most commonly generated piece:

// Generated by Claude for a Node.js/Express service
// instrumentation.js — must be loaded before any other imports
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: process.env.SERVICE_NAME ?? 'unknown-service',
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
    'deployment.environment': process.env.NODE_ENV ?? 'development',
  }),
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4317',
  }),
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4317',
    }),
    exportIntervalMillis: 30_000,
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-http': { enabled: true },
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
    }),
  ],
});

sdk.start();

process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});

Claude uses the current semantic-convention constants (ATTR_SERVICE_NAME) rather than the deprecated SemanticResourceAttributes object. GPT-4 sometimes uses the old SemanticResourceAttributes.SERVICE_NAME import path, which was removed in SDK 2.x.

Preloading Instrumentation in Production

In production Node.js services, you want instrumentation loaded before any application code runs. Claude recommends preloading via a Node flag rather than a top-of-file import: --require for CommonJS builds, or --import (Node 20.6+) when the instrumentation file uses ESM syntax, as the one above does:

// package.json
{
  "scripts": {
    "start": "node --import ./instrumentation.js dist/server.js"
  }
}

This guarantees instrumentation is active before any module-level code executes, which matters for capturing startup-time database connection events that would otherwise be missed.

Python Auto-Instrumentation Setup

# Generated by GPT-4 for FastAPI — comparable quality to Claude here
# instrumentation.py
import os
from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.resources import Resource, SERVICE_NAME, SERVICE_VERSION
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor


def setup_telemetry(app=None):
    resource = Resource.create({
        SERVICE_NAME: os.getenv("SERVICE_NAME", "unknown"),
        SERVICE_VERSION: os.getenv("SERVICE_VERSION", "0.0.0"),
        "deployment.environment": os.getenv("ENVIRONMENT", "development"),
    })

    otlp_endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317")

    # Tracing
    tracer_provider = TracerProvider(resource=resource)
    tracer_provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=otlp_endpoint, insecure=True))
    )
    trace.set_tracer_provider(tracer_provider)

    # Metrics
    metric_reader = PeriodicExportingMetricReader(
        OTLPMetricExporter(endpoint=otlp_endpoint, insecure=True),
        export_interval_millis=30_000,
    )
    meter_provider = MeterProvider(resource=resource, metric_readers=[metric_reader])
    metrics.set_meter_provider(meter_provider)

    # Auto-instrumentation
    if app:
        FastAPIInstrumentor.instrument_app(app)
    SQLAlchemyInstrumentor().instrument()
    HTTPXClientInstrumentor().instrument()

Go SDK Setup

GPT-4 and Claude both handle the Go OTel SDK, but the boilerplate is more verbose than in Python or Node.js:

// Generated by Claude for a Go/Gin service
// otel.go
package telemetry

import (
    "context"
    "os"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
    "google.golang.org/grpc"
)

func InitTracer(ctx context.Context) (func(context.Context) error, error) {
    endpoint := os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
    if endpoint == "" {
        endpoint = "localhost:4317"
    }

    exporter, err := otlptracegrpc.New(ctx,
        otlptracegrpc.WithEndpoint(endpoint),
        otlptracegrpc.WithDialOption(grpc.WithBlock()),
        otlptracegrpc.WithInsecure(),
    )
    if err != nil {
        return nil, err
    }

    res := resource.NewWithAttributes(
        semconv.SchemaURL,
        semconv.ServiceName(os.Getenv("SERVICE_NAME")),
        semconv.ServiceVersion(os.Getenv("SERVICE_VERSION")),
    )

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(res),
        sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.1))),
    )
    otel.SetTracerProvider(tp)

    return tp.Shutdown, nil
}

Claude uses semconv/v1.26.0 (current as of this writing), whereas GPT-4 sometimes pulls an older semconv version that has deprecated attribute names. For Go specifically, always verify the semconv import version matches your OTel SDK version.
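For contrast with the tail sampler in the collector config, the TraceIDRatioBased head sampler used here decides from the trace ID alone, before any span completes. A rough Python sketch of the idea follows; the exact byte selection and scaling differ between SDKs, and this version mirrors the Go SDK's approach of comparing part of the trace ID against a ratio-scaled bound.

```python
def head_sample(trace_id_hex: str, ratio: float = 0.1) -> bool:
    """Deterministic head-sampling decision from a 32-char hex trace ID."""
    # interpret the low 8 bytes of the trace ID as an unsigned int,
    # then drop the top bit so the value fits in 63 bits
    value = int(trace_id_hex[16:32], 16) >> 1
    bound = int(ratio * (1 << 63))  # ratio-scaled upper bound
    return value < bound
```

Every service computes the same verdict from the same trace ID, which is what makes head sampling work without coordination; the tradeoff is that it cannot privilege error traces the way tail sampling can.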

Docker Compose Integration

Claude generates the full Docker Compose setup for local development:

# Generated by Claude
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.97.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otelcol-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "8888:8888"   # Collector metrics
    depends_on:
      - jaeger

  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "14250:14250"  # gRPC
      - "16686:16686"  # UI
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  your-service:
    build: .
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      - SERVICE_NAME=your-service
      - SERVICE_VERSION=1.0.0
    depends_on:
      - otel-collector

Health Check Wiring

One gap in both Claude's and GPT-4's Docker Compose output is the missing health check on the collector, which means app containers can start before the collector is ready to receive spans, so the earliest spans are lost. A related gap to watch for: inside the Compose network the collector must address Jaeger by service name (jaeger, not localhost) in its exporter endpoint. Add this to the otel-collector service:

  otel-collector:
    # ...existing config...
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
      interval: 5s
      timeout: 3s
      retries: 5

  your-service:
    depends_on:
      otel-collector:
        condition: service_healthy

The collector only serves this endpoint when the health_check extension is enabled in its config; Claude generates the extension when asked, but omits it from the Docker Compose health-check wiring by default. Also verify the probe command works in your image: some official collector images ship without a shell or wget, in which case the wget-based test above needs a different probe or image variant.
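The extension itself is a small block in the collector config. A minimal sketch, using the health_check extension's standard fields:

```yaml
# enables the liveness endpoint on :13133 that the Compose healthcheck probes
extensions:
  health_check:
    endpoint: 0.0.0.0:13133

service:
  extensions: [health_check]
  # ...pipelines as before...
```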

Propagation and Context Headers

A frequently missed piece of OTel setup is configuring W3C TraceContext propagation. Claude generates this correctly when asked:

// Add to instrumentation.js — W3C propagation (required for distributed tracing across services)
import { CompositePropagator, W3CBaggagePropagator, W3CTraceContextPropagator } from '@opentelemetry/core';
import { propagation } from '@opentelemetry/api';

propagation.setGlobalPropagator(
  new CompositePropagator({
    propagators: [
      new W3CTraceContextPropagator(),
      new W3CBaggagePropagator(),
    ],
  })
);

Without explicit propagator configuration, the NodeSDK registers W3C TraceContext and Baggage by default, but that default can be overridden by the OTEL_PROPAGATORS environment variable or by other telemetry packages loaded in the process. Setting the propagator explicitly makes the behavior deterministic and avoids subtle distributed-tracing failures where spans from different services don't correlate.
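What actually travels between services is the traceparent header defined by the W3C Trace Context spec: version, trace ID, parent span ID, and flags, hyphen-separated. A minimal sketch of building and parsing it, using illustrative helpers rather than the SDK's API:

```python
def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its four fields."""
    version, trace_id, span_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": flags == "01",
    }
```

The downstream service parses this header, adopts the trace ID, and records the span ID as its parent, which is how spans from separate processes stitch into one trace.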

Tool Comparison for OTel Automation

| Setup Task | Claude | GPT-4 | Copilot |
| --- | --- | --- | --- |
| Collector YAML | Excellent — correct pipeline ordering | Good | Weak |
| SDK initialization (Node.js) | Excellent — current API | Good — check imports | Good inline |
| SDK initialization (Python) | Excellent | Excellent | Good inline |
| SDK initialization (Go) | Excellent — current semconv | Good — check semconv version | Weak |
| Tail sampling config | Excellent | Good | No |
| Docker Compose wiring | Excellent | Good | No |
| Semantic convention currency | Current (ATTR_*) | Sometimes outdated | Context-dependent |
| Context propagation setup | Correct when asked | Correct when asked | Omits it |
| Health check wiring | Omits by default | Omits by default | No |

Prompting Strategy

The most effective approach is a two-pass strategy:

Pass 1 — Generate the full collector config and SDK setup together in one prompt, giving Claude the full topology (services, exporters, sampling requirements). This produces a coherent config where all the pieces reference the same endpoints.

Pass 2 — Ask for the Docker Compose integration separately, referencing the collector config from pass 1. This ensures the service names, ports, and environment variables are consistent.

Trying to generate everything in one mega-prompt produces configs where the collector and SDK initialization use different port numbers or endpoint formats. Splitting into two passes with explicit context in the second prompt avoids this.


Built by theluckystrike — More at zovo.one