Claude Skills Guide

Claude Code Error Rate Limit 429 — How to Handle

When you’re deep in a coding session with Claude Code, the last thing you want is your workflow interrupted by an HTTP 429 error. This status code means you’ve hit a rate limit—the server is throttling your requests because you’ve sent too many in a short time window. Understanding how to handle this error gracefully keeps your development momentum intact.

What Triggers the 429 Error in Claude Code

Claude Code imposes rate limits to ensure fair resource allocation across all users. Several scenarios commonly trigger this error:

Excessive API calls within a minute — If you’re using Claude Code with API integrations, sending dozens of requests per minute will hit the threshold. This is especially common when automating tasks with skills like xlsx for bulk spreadsheet operations or pdf for batch document processing.

Concurrent session limits — Running multiple Claude Code sessions simultaneously can exhaust your allowed connections. Each active session consumes resources, and the system throttles when you exceed the concurrent limit.

Long-running conversations — Extended sessions with thousands of message exchanges may gradually approach rate limits. The supermemory skill, which searches through your conversation history, can trigger additional API calls that compound over time.

Repeated tool invocations — Skills that call external tools repeatedly—like tdd running multiple test cycles, or frontend-design generating numerous iterations—can trigger throttling if the tool invocations happen too rapidly.

Immediate Response: Recognizing the Error

When a 429 error occurs, Claude Code typically displays a clear message:

Error: HTTP 429 — Too Many Requests
Rate limit exceeded. Please wait before retrying.

The error message often includes a Retry-After header indicating how many seconds to wait. This is your key to recovery—honoring this wait time prevents further throttling and gets you back to coding faster.

Implementing Retry Logic

The most robust approach to handling rate limits is implementing automatic retry with exponential backoff. Here’s a practical pattern you can use in your Claude Code workflows:

async function claudeRequestWithRetry(requestFn, maxRetries = 3) {
  let attempt = 0;
  
  while (attempt < maxRetries) {
    try {
      return await requestFn();
    } catch (error) {
      if (error.status === 429) {
        const retryAfter = error.headers?.['retry-after'] || Math.pow(2, attempt);
        console.log(`Rate limited. Waiting ${retryAfter}s before retry ${attempt + 1}/${maxRetries}`);
        await sleep(retryAfter * 1000);
        attempt++;
      } else {
        throw error;
      }
    }
  }
  throw new Error('Max retries exceeded');
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

This pattern doubles the wait time with each failed attempt (1 second, then 2 seconds, then 4 seconds), giving the server time to recover while minimizing total wait time.

Practical Strategies for Different Workflows

Bulk Processing with the xlsx Skill

When using xlsx to process large datasets, batch your operations instead of sending individual requests for each row:

# Instead of processing one row at a time
for row in data:
    await process_row(row)  # Triggers rate limit with many calls

# Batch the work into chunks
batch_size = 50
for i in range(0, len(data), batch_size):
    batch = data[i:i + batch_size]
    await process_batch(batch)  # Fewer, larger requests
    await sleep(2000)  # Brief pause between batches

Document Generation with the pdf Skill

The pdf skill excels at generating documents, but generating dozens in rapid succession triggers rate limits. Space out your requests:

import asyncio

async def generate_documents_safely(docs):
    results = []
    for i, doc in enumerate(docs):
        result = await pdf.generate(doc)
        results.append(result)
        
        # Wait longer every 10 documents
        if (i + 1) % 10 == 0:
            print(f"Pausing after {i + 1} documents...")
            await asyncio.sleep(10)
        elif i < len(docs) - 1:
            await asyncio.sleep(1)  # Brief pause between each
            
    return results

Test-Driven Development with the tdd Skill

When tdd runs multiple test cycles, build pauses into your workflow:

# Run tdd with controlled pacing
for test_file in test_files; do
    # Invoke skill: /tdd run-tests $test_file
    # Wait 3 seconds between test cycles
    sleep 3
done

Design Iterations with frontend-design

The frontend-design skill may generate multiple mockups. Request them sequentially rather than in parallel:

// Sequential requests with delays
const mockups = ['landing-page', 'dashboard', 'profile'];
for (const type of mockups) {
  const result = await claude.invoke('frontend-design', { type });
  console.log(`Generated: ${type}`);
  await delay(2000);  // Space out each generation
}

Monitoring Your Rate Limit Usage

Keep track of your request patterns to avoid hitting limits proactively. Several approaches help:

Log request timestamps — Maintain a simple log showing when you make API calls:

const requestLog = [];

function logRequest() {
  requestLog.push(Date.now());
  // Keep only last 60 seconds of requests
  const cutoff = Date.now() - 60000;
  while (requestLog.length && requestLog[0] < cutoff) {
    requestLog.shift();
  }
}

function getRequestsPerMinute() {
  return requestLog.length;
}

Set up warnings — Before executing a batch operation, check your recent request rate:

async function safeBatchOperation(operations) {
  if (getRequestsPerMinute() > 40) {
    console.log('Approaching rate limit. Waiting 30s...');
    await sleep(30000);
  }
  // Proceed with operations
}

Checkpointing for Long Tasks

When rate limits interrupt long-running workflows, implement checkpointing to preserve progress:

# checkpoint.py - save progress before continuing
import json
import os

def save_checkpoint(task_id, progress):
    checkpoint_file = f".claude/checkpoints/{task_id}.json"
    os.makedirs(os.path.dirname(checkpoint_file), exist_ok=True)
    with open(checkpoint_file, 'w') as f:
        json.dump(progress, f)

def load_checkpoint(task_id):
    checkpoint_file = f".claude/checkpoints/{task_id}.json"
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file, 'r') as f:
            return json.load(f)
    return None

After a rate limit interruption, resume from the last checkpoint rather than restarting from scratch. This is especially valuable when using batch processing workflows — instead of many small operations, consolidate into fewer, larger ones:

# Instead of many small operations:
claude "fix this function"
claude "now fix this one"

# Do this:
claude "fix these three functions: [list them all at once]"

Best Practices Summary

  1. Honor the Retry-After header — It’s your clearest guide to when you can resume
  2. Use exponential backoff — Starting with short waits and increasing prevents hammering the server
  3. Batch operations — Fewer, larger requests beat many small ones
  4. Space out bulk work — Adding deliberate delays between operations keeps you under limits
  5. Monitor proactively — Track your request rate before hitting errors
  6. Consider upgrading — If you consistently hit limits, higher-tier plans often offer increased quotas

When Rate Limits Become a Pattern

If you frequently encounter 429 errors despite implementing these strategies, consider whether your workflow architecture needs adjustment. Skills like supermemory can help you track which operations consume the most requests, enabling you to optimize high-traffic patterns.

For teams using Claude Code at scale, implementing request queuing with a dedicated service can abstract rate limiting away from individual developers, allowing everyone to work without manual throttling concerns.

Built by theluckystrike — More at zovo.one