Remote Work Tools

Distributed teams face a problem with no easy answer: production incidents will happen at 3am in someone's timezone. Forcing your team into 24/7 on-call coverage burns people out. Building a rotation that respects timezones, distributes load fairly, and compensates appropriately requires deliberate architecture.

Why Remote On-Call is Harder

Co-located teams: “Let’s rotate who stays late.” Distributed teams have to solve for timezone coverage, fair distribution of load, and appropriate compensation.

Poor rotations create hero culture: “Sarah always saves us, let’s keep her on-call” ends with Sarah burning out and leaving.

On-Call Rotation Models

Model 1: Single Engineer On-Call (One Person)

Setup: One person on-call for 1 week at a time. Rotates every Monday.

Pros:

Cons:

When to use: Teams < 8 people, or SLA allows 4-hour response time (non-critical incidents)

Compensation: Each on-call week = 1 comp day off the following week. Incidents that wake you = $100-300 per incident (on top of salary) or 4 hours comp time.

Model 2: Primary + Escalation Engineer (Two Person)

Setup: Engineer A is primary (paged first); Engineer B is escalation (paged if A does not acknowledge within 15 minutes).

Pros:

Cons:

When to use: Teams 8-16 people, critical infrastructure

Compensation: Primary earns $150-250 per incident or 2h comp time. Escalation (if paged) earns $50-100 or 1h comp time. Both get 1 comp day per week.

Model 3: Timezone-Based Rotation (Multiple Regions)

Setup: Rotate by week, but each person is on call only during their own business hours plus night.

Pros:

Cons:

When to use: Teams 12+ with global distribution, critical SLA (99.99%)

Tool setup:

Example escalation:

Compensation:

Model 4: Follow-the-Sun Rotation

Setup: Incident ownership passes from timezone to timezone as the Earth rotates, so each region handles incidents during its own business hours.
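A follow-the-sun handoff boils down to a lookup from the current UTC hour to the region that owns new incidents. The sketch below is a minimal illustration that assumes three hypothetical 8-hour shifts (APAC 00:00-08:00 UTC, EU 08:00-16:00 UTC, US 16:00-24:00 UTC); the article does not specify boundaries, so treat these as placeholders for your own business hours.

```python
from datetime import datetime, timezone

# Hypothetical follow-the-sun shifts as [start, end) hours in UTC.
# Each region owns new incidents during (roughly) its own business hours.
SHIFTS = [
    ("APAC", 0, 8),   # 08:00-16:00 SGT (UTC+8)
    ("EU", 8, 16),    # 09:00-17:00 CET (UTC+1)
    ("US", 16, 24),   # 08:00-16:00 PST (UTC-8)
]


def owning_region(moment: datetime) -> str:
    """Return the region that owns new incidents at a timezone-aware datetime."""
    hour = moment.astimezone(timezone.utc).hour
    for region, start, end in SHIFTS:
        if start <= hour < end:
            return region
    raise ValueError(f"no shift covers UTC hour {hour}")


def check_full_coverage() -> None:
    """Raise if any UTC hour is unowned or owned by more than one region."""
    for hour in range(24):
        owners = [r for r, start, end in SHIFTS if start <= hour < end]
        if len(owners) != 1:
            raise AssertionError(f"UTC hour {hour} owned by {owners}")


check_full_coverage()
print(owning_region(datetime(2024, 3, 5, 10, 30, tzinfo=timezone.utc)))  # EU
```

Keeping the shifts as data and checking that they tile all 24 hours catches gaps before they turn into unowned incidents.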

Pros:

Cons:

When to use: Teams whose SLA allows more than 4 hours to respond (e.g., non-critical infrastructure, batch jobs)

Tool setup:

Compensation: No emergency pay (all business hours), just standard salary.

Tool Comparison

PagerDuty

Price: $49/user/month (Standard), $199/user/month (Enterprise)
Best for: Critical infrastructure, large teams

Features:

Example config:

  1. Create schedule “US-After-Hours” (5pm PT - 8am PT)
  2. Create schedule “EU-Business-Hours” (8am CET - 5pm CET)
  3. Set escalation: If US doesn’t ack in 5 min, escalate to EU
  4. Rotation: 1-week on-call per person

Cost analysis: $49 × 8 people = $392/month. High but justified for critical systems.

Opsgenie (Atlassian)

Price: Free tier (1 on-call schedule, 5 users), $29/user/month (Standard)
Best for: Teams 5-20, budget-conscious

Features:

Example config:

  1. Create schedule with time windows
  2. Set escalation policy: email (15 min) → Slack (30 min) → SMS (60 min)
  3. Rotation pattern: 1-week on-call

Cost analysis: Free tier covers 1 schedule + 5 people (sufficient for small teams). If you outgrow the free tier, the paid plan is $29 × 8 = $232/month for 8 people.

Google Calendar + Slack Bot

Price: Free (if already using Google Workspace + Slack)
Best for: Teams < 10, simple rotations

Setup:

  1. Create shared “On-Call” calendar
  2. Add events: “US On-Call: John (5pm-8am)” with color coding
  3. Slack bot reads the calendar and posts who’s on duty to #on-call

Example:

John: Mon-Sun 5pm PT - 8am PT
Sarah: Mon-Sun 5pm CET - 8am CET
Mike: Mon-Sun 5pm SGT - 8am SGT
Repeat next week
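The bot’s core job is a lookup: given the calendar’s on-call events, find whoever is on duty right now and format a message. Here is a minimal sketch, assuming the events have already been fetched from the calendar; fetching and posting to Slack are left out, and the `OnCallShift` type and message text are hypothetical. Times use fixed standard-time offsets, so DST is ignored.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone


@dataclass
class OnCallShift:
    """A daily recurring on-call window, as it might appear on the shared calendar."""
    engineer: str
    tz: timezone   # fixed UTC offset; DST is deliberately ignored in this sketch
    start: time    # local start time
    end: time      # local end time; earlier than start means it crosses midnight

    def covers(self, moment: datetime) -> bool:
        local = moment.astimezone(self.tz).time()
        if self.start < self.end:
            return self.start <= local < self.end
        # Overnight window, e.g. 17:00 -> 08:00
        return local >= self.start or local < self.end


PT = timezone(timedelta(hours=-8), "PT")
CET = timezone(timedelta(hours=1), "CET")
SGT = timezone(timedelta(hours=8), "SGT")

# The three calendar events from the example above.
SHIFTS = [
    OnCallShift("John", PT, time(17), time(8)),
    OnCallShift("Sarah", CET, time(17), time(8)),
    OnCallShift("Mike", SGT, time(17), time(8)),
]


def on_duty_message(moment: datetime) -> str:
    """Build the text a bot could post to #on-call."""
    names = [s.engineer for s in SHIFTS if s.covers(moment)]
    if not names:
        return "No one is on call right now."
    return "On call now: " + ", ".join(names)


print(on_duty_message(datetime(2024, 3, 5, 3, 0, tzinfo=timezone.utc)))
```

Because the windows overlap, the bot may name more than one person; decide in advance which of them takes the page.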

Cost analysis: Free if you already have Google Workspace and Slack, but there is no SMS or escalation automation.

Grafana OnCall

Price: Free (basic), $240/month (Pro)
Best for: Teams already using Grafana, good observability integration

Features:

Cost analysis: $240/month (fixed, not per-user). Good ROI for Grafana-heavy shops.
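The per-user vs. flat pricing trade-off is easy to tabulate. This is a minimal sketch using the list prices quoted in this section (check current pricing before relying on them). It assumes the Opsgenie free tier applies only when the team fits within 5 users and 1 schedule, and it ignores annual-billing discounts.

```python
def monthly_cost(tool: str, users: int, schedules: int = 1) -> int:
    """Monthly USD cost per tool, using the prices quoted in this article."""
    if tool == "PagerDuty Standard":
        return 49 * users
    if tool == "Opsgenie":
        # Free tier: 1 on-call schedule, up to 5 users; otherwise $29/user/month
        if users <= 5 and schedules <= 1:
            return 0
        return 29 * users
    if tool == "Grafana OnCall Pro":
        return 240  # flat fee, not per-user
    if tool == "Google Calendar + Slack":
        return 0  # assumes Workspace and Slack are already paid for
    raise ValueError(f"unknown tool: {tool}")


TOOLS = ["PagerDuty Standard", "Opsgenie", "Grafana OnCall Pro", "Google Calendar + Slack"]

for team_size in (5, 8, 12):
    costs = {tool: monthly_cost(tool, team_size) for tool in TOOLS}
    print(team_size, costs)
```

With these numbers, Grafana’s flat fee undercuts PagerDuty Standard from 5 users ($245 vs. $240) and Opsgenie from 9 users ($261 vs. $240).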

Scenario: 12-person team across US, EU, APAC

Step 1: Define on-call windows

US:   5pm PT - 8am PT (15h, covers evening + morning)
EU:   8am CET - 5pm CET (9h, covers business + overlap)
APAC: 5pm SGT - 8am SGT (15h, covers evening + morning)
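It is worth checking that the windows actually tile the day. Here is a minimal sketch that converts each window to UTC hours, assuming standard-time offsets (PT = UTC-8, CET = UTC+1, SGT = UTC+8) and ignoring DST. Under those assumptions the three windows as listed leave 00:00-01:00 UTC uncovered and overlap between 07:00 and 16:00 UTC (all three regions at once from 09:00), so adjust the boundaries before building schedules.

```python
# Standard-time UTC offsets in hours; DST is ignored in this sketch.
OFFSETS = {"US": -8, "EU": 1, "APAC": 8}

# Local on-call windows from Step 1 as (start_hour, end_hour); end < start wraps midnight.
WINDOWS = {"US": (17, 8), "EU": (8, 17), "APAC": (17, 8)}


def utc_hours(region: str) -> set[int]:
    """Return the set of UTC hours (0-23) that a region's window covers."""
    start, end = WINDOWS[region]
    length = (end - start) % 24  # e.g. 17 -> 8 gives 15 hours
    first_utc = (start - OFFSETS[region]) % 24
    return {(first_utc + i) % 24 for i in range(length)}


def coverage_report() -> tuple[list[int], list[int]]:
    """Return (uncovered UTC hours, UTC hours covered by more than one region)."""
    counts = [0] * 24
    for region in WINDOWS:
        for hour in utc_hours(region):
            counts[hour] += 1
    gaps = [h for h in range(24) if counts[h] == 0]
    overlaps = [h for h in range(24) if counts[h] > 1]
    return gaps, overlaps


for region in WINDOWS:
    print(region, len(utc_hours(region)), "hours")
print(coverage_report())
```

Re-running the check with summer offsets (UTC-7, UTC+2, UTC+8) is a quick way to see how DST shifts the gaps.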

Step 2: Define rotation

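Each region rotates weekly through its own 4 engineers, so each person is on call 1 week in 4, or 13 weeks/year. A 1-week-in-8 cadence (6.5 weeks/year) would take 8 engineers per region.

Week 1:  US: John   | EU: Sarah  | APAC: Mike
Week 2:  US: Jessica| EU: Philipp| APAC: Kim
...
Week 5:  (cycle repeats)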
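Each region’s rotation is a plain round-robin over its roster. This is a minimal sketch assuming 4 engineers per region, as in Step 5. Only the first two names per region come from the table; “US-3”, “EU-4”, and the like are placeholders.

```python
# Rosters per region. The first two names come from the table above;
# "US-3", "EU-4", etc. are placeholders for the remaining engineers.
ROSTERS = {
    "US": ["John", "Jessica", "US-3", "US-4"],
    "EU": ["Sarah", "Philipp", "EU-3", "EU-4"],
    "APAC": ["Mike", "Kim", "APAC-3", "APAC-4"],
}


def on_call_for_week(week: int) -> dict[str, str]:
    """Return who is on call in each region for a 1-indexed week number."""
    return {region: roster[(week - 1) % len(roster)] for region, roster in ROSTERS.items()}


def weeks_per_year(roster_size: int) -> float:
    """Weeks each person spends on call per year with a 1-week round-robin."""
    return 52 / roster_size


for week in range(1, 6):
    print(week, on_call_for_week(week))
print(weeks_per_year(4), weeks_per_year(8))  # 13.0 6.5
```

Week 5 lands back on John, Sarah, and Mike, which is where the cycle repeats.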

Step 3: Set compensation

Step 4: Set escalation policy

1. Primary on-call (5min to ack)
   ↓
2. Escalation engineer (10min timeout)
   ↓
3. On-call manager (20min timeout)
   ↓
4. VP Engineering (hard page)
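The chain can be expressed as a timeline of when each level is paged if nobody acknowledges. Here is a minimal sketch, assuming each timeout runs from the moment that level is paged, so the VP is paged 35 minutes after the alert. Paging tools may interpret timeouts differently, so confirm against your tool’s escalation rules.

```python
from typing import Optional

# Escalation chain from Step 4: (level, minutes to wait for an ack before escalating).
# None marks the last level, which has nowhere further to escalate.
CHAIN = [
    ("Primary on-call", 5),
    ("Escalation engineer", 10),
    ("On-call manager", 20),
    ("VP Engineering", None),
]


def page_times() -> list[tuple[str, int]]:
    """Minute (after the alert fires) at which each level is paged if nobody acks."""
    schedule: list[tuple[str, int]] = []
    elapsed = 0
    for level, timeout in CHAIN:
        schedule.append((level, elapsed))
        if timeout is not None:
            elapsed += timeout
    return schedule


def paged_levels(ack_minute: Optional[float]) -> list[str]:
    """Levels paged before the alert is acknowledged (None means never acked).

    The first level is always paged; each later level is paged only if its
    page time arrives before the ack.
    """
    times = page_times()
    return [times[0][0]] + [
        level for level, at in times[1:] if ack_minute is None or at < ack_minute
    ]


print(page_times())
print(paged_levels(7))  # ['Primary on-call', 'Escalation engineer']
```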

Step 5: Test in PagerDuty or Opsgenie

  1. Create 3 schedules (US, EU, APAC)
  2. Add 4 people to each (rotating weekly)
  3. Create escalation policy chaining them
  4. Send test alert from Slack/monitoring system

Red Flags in On-Call Rotations

Red flag 1: One person takes 3x incidents per rotation

Indicates: Unfair incident distribution, or that person is better at fixing things (promote them, don’t burn them out).

Fix: Analyze incident sources, fix root causes, rotate on-call.
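A quick way to spot this is to count incidents per responder and flag anyone well above the typical load. This is a minimal sketch, assuming you can export one responder name per incident from your paging tool. The 3x-the-median threshold mirrors the red flag above, and the sample data is hypothetical.

```python
from collections import Counter
from statistics import median


def overloaded_responders(responders: list[str], factor: float = 3.0) -> list[str]:
    """Return responders whose incident count is at least `factor` x the median.

    Only responders with at least one incident appear in the counts.
    """
    counts = Counter(responders)
    typical = median(counts.values())
    return sorted(name for name, n in counts.items() if n >= factor * typical)


# Hypothetical export: one entry per incident, naming who handled it.
incidents = ["Sarah"] * 9 + ["John"] * 3 + ["Mike"] * 2 + ["Kim"] * 3

print(overloaded_responders(incidents))  # ['Sarah']
```

Using the median rather than the mean keeps one overloaded person from inflating the baseline they are compared against.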

Red flag 2: On-call engineer sleeps with the pager, then works a full day

Indicates: The rotation is unsustainable, and incidents cluster at night.

Fix: Give comp time (e.g., 3+ hours of broken sleep = next day off), reduce on-call frequency, or hire a contractor for night shifts.

Red flag 3: On-call engineer avoids their on-call week by calling in sick

Indicates: Rotation is unfair, compensation is inadequate, or culture is blame-heavy.

Fix: Review rotations for fairness, increase compensation, fix blame culture (blameless post-mortems).

Red flag 4: Manager is never on-call

Indicates: Culture problem (managers avoid pain), or you’re protecting managers from reality.

Fix: Managers take on-call rotation. They understand incidents better, and it builds empathy.

Compensation Framework

Scenario | Payment
On-call during business hours, no incidents | Included in salary, no extra pay
On-call during business hours, 1-2 incidents | +$100-150 for the week
On-call at night (9pm-7am), no incidents | $300-400/week or 8h comp time
On-call at night (9pm-7am), 1+ incidents | $400-600/week or 12h comp time
Critical incident (3+ hours, wakes you at 3am) | $200-300 + 8h comp time
On-call manager (escalation calls only) | $500/week (usually as a salary bump)
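The compensation table can be encoded as a lookup so every on-call week is paid the same way. This is a minimal sketch that describes the payout for one scenario. The scenario keys and the `Payout` type are hypothetical, and how scenarios combine within a single week is left to your policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payout:
    cash_min: int       # USD
    cash_max: int       # USD
    comp_hours: int     # hours of comp time
    cash_or_time: bool  # True: engineer picks cash OR comp time; False: gets both


# Rows of the compensation table; the keys are hypothetical labels.
TABLE = {
    "business_no_incidents": Payout(0, 0, 0, False),
    "business_1_2_incidents": Payout(100, 150, 0, False),
    "night_no_incidents": Payout(300, 400, 8, True),
    "night_with_incidents": Payout(400, 600, 12, True),
    "critical_incident": Payout(200, 300, 8, False),
    "oncall_manager": Payout(500, 500, 0, False),
}


def describe(scenario: str) -> str:
    """Human-readable payout for one scenario."""
    p = TABLE[scenario]
    if p.cash_max == 0:
        cash = "no extra pay"
    elif p.cash_min == p.cash_max:
        cash = f"${p.cash_max}"
    else:
        cash = f"${p.cash_min}-{p.cash_max}"
    if p.comp_hours == 0:
        return cash
    joiner = " or " if p.cash_or_time else " + "
    return f"{cash}{joiner}{p.comp_hours}h comp time"


for scenario in TABLE:
    print(scenario, "->", describe(scenario))
```

The `cash_or_time` flag captures the table’s distinction between “or” rows (the engineer chooses) and the critical-incident row, which pays both.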

Why offer comp time as well as cash? Some people prefer time off. Offer both and let them choose.

FAQ

Q: What’s a fair on-call frequency? A: 1 week in 8 (6.5 weeks/year) is a common benchmark. 1 week in 6 is aggressive. 1 week in 12 is cushy.

Q: Do we page the CEO? A: Only for critical incidents: “service down for all users” or “data loss in progress”. Not for “p99 latency is high” or “one customer affected”.

Q: What if someone refuses on-call? A: It’s part of the job for most engineering roles. If someone refuses, you either: (1) hire someone else, (2) restructure the role (not infrastructure-facing), or (3) hire an ops/SRE team to absorb on-call.

Q: How do we prevent on-call burnout? A: (1) Limit frequency, (2) Pay well, (3) Invest in automation (fewer incidents = fewer pages), (4) Blameless post-mortems (reduce anxiety), (5) Give comp time.

Q: Should interns be on-call? A: No. Keep on-call to engineers with 2+ years of experience and deep system knowledge.

Q: What if we’re a startup and can’t pay an on-call premium? A: Use Model 1 (single engineer, simple rotation) + comp time (a day off after each on-call week). It’s less fair but sustainable for an early-stage team.


Built by theluckystrike — More at zovo.one