Remote Work Tools

A playbook is a documented procedure that any team member can execute without needing the author present. For remote teams across time zones, good playbooks turn what would be a “wake someone up” situation into a “follow these steps” situation. The difference is in the structure and completeness of the template.

This guide provides templates for three core playbook types: incident response, deployment, and onboarding.


Incident Response Playbook Template

# Incident: [INCIDENT-NAME]

**Severity**: P1 / P2 / P3
**Status**: Active / Resolved
**Incident Commander**: @[owner]
**Started**: YYYY-MM-DD HH:MM UTC
**Resolved**: YYYY-MM-DD HH:MM UTC (fill when resolved)

---

## What Is Happening

One paragraph plain-language description of the incident. What is affected? Who is affected? What is the user-visible impact?

> Example: "The payments API is returning 502 errors for ~40% of checkout attempts. Approximately 200 users per hour are unable to complete purchases. The error started at 14:23 UTC."

## Current Status

> Example: "Identified root cause (database connection pool exhausted). Implementing fix. ETA 30 minutes."

## Timeline

| Time (UTC) | Event |
|------------|-------|
| 14:23 | First error alerts fired |
| 14:31 | On-call acknowledged |
| 14:45 | Root cause identified |
| 15:10 | Fix deployed |
| 15:15 | Confirmed resolved |

## Impact

- **Services affected**: payments-api, checkout-frontend
- **Error rate**: ~40%
- **Users affected**: ~200/hour
- **Revenue impact**: ~$4,000/hour estimate
- **External customers notified**: [Yes/No] via [status.yourcompany.com]

---

## Diagnosis Steps

Run these commands to gather context:

```bash
# Check service health
kubectl get pods -n production | grep payments

# View recent error logs
kubectl logs -n production deployment/payments-api --since=30m | grep ERROR | tail -50

# Check database connectivity
kubectl exec -n production deployment/payments-api -- \
 pg_isready -h $DB_HOST -p 5432

# View active DB connections
psql -h $DB_HOST -U postgres -c \
 "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"

Mitigation Options

Option Risk ETA Steps
Restart pods Low 2 min kubectl rollout restart deployment/payments-api -n production
Roll back deployment Low 5 min kubectl rollout undo deployment/payments-api -n production
Increase DB pool size Medium 15 min Edit DB_POOL_SIZE env var and redeploy
Enable maintenance mode Medium 2 min Set MAINTENANCE_MODE=true in config and redeploy

Resolution

What was done to resolve the incident. What was the root cause?

Follow-up Actions


Deployment Playbook Template

# Deployment: [SERVICE-NAME] v[VERSION]

**Deployer**: @[name]
**Date**: YYYY-MM-DD
**Environment**: staging / production
**Deploy type**: Standard / Hotfix / Rollback
**PR / Release**: [link]

---

## Pre-Deployment Checklist

### Code
- [ ] PR approved by required reviewers
- [ ] All CI checks passing
- [ ] CHANGELOG updated
- [ ] Migration scripts reviewed (if applicable)

### Staging Verified
- [ ] Deployed to staging successfully
- [ ] Smoke tests passing on staging
- [ ] New feature tested on staging

### Dependencies
- [ ] Dependent services notified
- [ ] External APIs/webhooks compatible with new version
- [ ] Feature flags configured for gradual rollout

### Rollback Plan
- [ ] Previous version noted: `v[PREVIOUS_VERSION]`
- [ ] Rollback command tested: `kubectl rollout undo deployment/[service]`
- [ ] Database migrations are reversible: [Yes / No — explain if No]

---

## Deployment Steps

### 1. Announce

Post in #deployments Slack channel:

Deploying [service] v[version] to production. Changes: [brief summary] Risk: Low / Medium / High Rollback ready: yes


### 2. Deploy

```bash
# Tag and push (if not automated)
git tag v[version]
git push origin v[version]

# Trigger deploy (if manual)
kubectl set image deployment/[service] [service]=[registry]/[service]:v[version] -n production

# Wait for rollout
kubectl rollout status deployment/[service] -n production --timeout=5m

3. Verify

# Check pods are running
kubectl get pods -n production -l app=[service]

# Confirm new version
kubectl describe deployment/[service] -n production | grep Image

# Check error rate (first 5 minutes)
# Run this every 60 seconds x5
curl -s "https://monitoring.yourcompany.com/api/v1/query?query=rate(http_requests_total{service='[service]',status=~'5..'}[1m])" \
 | jq '.data.result[0].value[1]'

4. Post-Deploy


Rollback Procedure

If error rate increases or critical errors appear:

# Immediate rollback
kubectl rollout undo deployment/[service] -n production
kubectl rollout status deployment/[service] -n production

# Verify rollback
kubectl describe deployment/[service] -n production | grep Image

Post in #deployments:

ROLLBACK: [service] rolled back to v[previous_version].
Reason: [brief explanation]
Investigation ongoing in #incidents

---

## Onboarding Playbook Template

```markdown
# Onboarding: [ENGINEER_NAME]

**Start Date**: YYYY-MM-DD
**Role**: [role]
**Manager**: @[manager]
**Buddy**: @[buddy]
**Team**: [team name]

---

## Week 1: Foundation

### Day 1 — Access and Setup

**IT/Admin tasks (Manager)**
- [ ] Google Workspace account created
- [ ] GitHub org invitation sent
- [ ] Slack invitation sent
- [ ] 1Password team invitation sent
- [ ] PagerDuty account created (if on-call eligible)
- [ ] AWS/GCP/Azure console access configured

**Environment setup (New Hire)**

```bash
# Clone the onboarding repo for setup scripts
git clone git@github.com:your-org/onboarding.git
cd onboarding && make setup

Day 2-3 — Codebase Orientation

Day 4-5 — Process and Context


Week 2: Contributing


Ongoing: 30/60/90 Day Goals

30 Days

60 Days

90 Days


Key Resources

| Resource | Link | |———-|——| | Architecture overview | [link] | | API documentation | [link] | | Deployment runbook | [link] | | Incident response playbook | [link] | | On-call rotation | [link] | | Team calendar | [link] |


---

## Storing and Accessing Playbooks

Playbooks rot if they're not maintained. The best storage is wherever your team already looks:

```bash
# Notion database with properties
Title: [text]
Type: [Incident / Deployment / Onboarding / Process]
Owner: [person]
Last Reviewed: [date]
Status: [Active / Draft / Archived]

# GitHub Wiki (version-controlled)
docs/
├── playbooks/
│   ├── incident-response.md
│   ├── deployment-standard.md
│   └── onboarding.md

Trigger playbook reminders via GitHub Actions to review stale playbooks:

name: Playbook Review Reminder
on:
  schedule:
    - cron: '0 9 1 * *'  # First of each month

jobs:
  remind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check for stale playbooks
        run: |
          find docs/playbooks/ -name "*.md" -mtime +90 \
            -exec echo "STALE PLAYBOOK: {}" \; \
            | while read msg; do
                curl -s -X POST \
                  -H 'Content-type: application/json' \
                  --data "{\"text\":\"$msg — please review and update\"}" \
                  "${{ secrets.SLACK_WEBHOOK }}"
              done