Best Tools for Remote Team Post-Mortems
A post-mortem that happens three weeks after an incident, written by one person from memory, with no action items tracked to completion, is theater. Remote teams need a structured, blame-free process that starts within 48 hours, collects input asynchronously, and produces tracked action items that close before the next incident.
The Anatomy of a Good Remote Post-Mortem
- Timeline: Exact timestamps, who noticed what and when, what actions were taken
- Impact: Affected users/customers, duration, severity
- Root cause: Not “human error” — the systemic conditions that made the error possible
- Contributing factors: What made detection or resolution slower
- Action items: Specific, assigned, with due dates — not “improve monitoring”
The process has to work across time zones. Nobody should be blocked waiting for a live meeting to add their observations.
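The fields above map naturally onto a simple record. A minimal sketch of that structure (the class and field names are illustrative, not from any tool's API):

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass
class ActionItem:
    description: str   # specific, e.g. "Add error rate alert at 2% threshold"
    owner: str         # a person, not a team alias
    due: date
    done: bool = False


@dataclass
class PostMortem:
    title: str
    severity: str                      # "P1" / "P2" / "P3"
    started: datetime                  # UTC
    resolved: datetime                 # UTC
    timeline: list[tuple[datetime, str, str]] = field(default_factory=list)  # (time, event, actor)
    root_cause: str = ""               # systemic, never "human error"
    contributing_factors: list[str] = field(default_factory=list)
    went_well: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def duration_minutes(self) -> int:
        return int((self.resolved - self.started).total_seconds() // 60)

    def open_items(self) -> list[ActionItem]:
        return [a for a in self.action_items if not a.done]
```

Whatever tool you pick, the post-mortem should be reducible to something like this; if a field has nowhere to live, the tool is hiding part of the process.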
Tool 1: Notion (Best for Async Input)
Notion’s commenting system lets distributed team members add observations to specific sections of a post-mortem doc without waiting for a meeting. Use a database with templates for consistent structure.
Post-Mortem Database Template in Notion:
```markdown
# Incident: [Brief Description] — [Date]

**Severity**: P1 / P2 / P3
**Duration**: [start time] → [end time] (X minutes)
**Services affected**:
**Customers affected**: ~N users

---

## Timeline

| Time (UTC) | Event | Actor |
|------------|-------|-------|
| 14:32 | Alert fired: error rate > 5% | PagerDuty |
| 14:35 | @alice acknowledged | Alice |
| 14:41 | Identified bad deploy at 14:20 | Alice |
| 14:45 | Rollback initiated | Alice |
| 14:51 | Error rate returned to baseline | Auto |

---

## What Happened
[Narrative description — written collaboratively via comments]

## Root Cause
[The systemic reason this happened — not "someone made a mistake"]

## Contributing Factors
- [What slowed detection]
- [What slowed resolution]

## What Went Well
- [Things that worked correctly during the incident]

## Action Items

| Item | Owner | Due | Status |
|------|-------|-----|--------|
| Add error rate alert at 2% threshold | @ops | 2026-04-01 | Open |
| Write runbook for rollback procedure | @alice | 2026-04-05 | Open |
```
Notion API to create a post-mortem from an incident:
```python
#!/usr/bin/env python3
# create-postmortem.py
import os
from datetime import datetime, timezone

import requests

NOTION_TOKEN = os.environ["NOTION_TOKEN"]
PM_DATABASE_ID = os.environ["NOTION_PM_DATABASE_ID"]


def create_postmortem(title: str, severity: str, service: str) -> str:
    response = requests.post(
        "https://api.notion.com/v1/pages",
        headers={
            "Authorization": f"Bearer {NOTION_TOKEN}",
            "Notion-Version": "2022-06-28",
            "Content-Type": "application/json",
        },
        json={
            "parent": {"database_id": PM_DATABASE_ID},
            "properties": {
                "Name": {"title": [{"text": {"content": title}}]},
                "Severity": {"select": {"name": severity}},
                "Service": {"rich_text": [{"text": {"content": service}}]},
                "Date": {"date": {"start": datetime.now(timezone.utc).date().isoformat()}},
                "Status": {"select": {"name": "Draft"}},
            },
            "children": [
                {
                    "object": "block",
                    "type": "heading_2",
                    "heading_2": {"rich_text": [{"text": {"content": "Timeline"}}]},
                },
                {
                    "object": "block",
                    "type": "paragraph",
                    "paragraph": {"rich_text": [{"text": {"content": "Add timeline entries as table rows below."}}]},
                },
            ],
        },
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()
    print(f"Post-mortem created: {data['url']}")
    return data["url"]


# Triggered from your incident management tool or PagerDuty webhook
if __name__ == "__main__":
    create_postmortem(
        title=f"Incident: Payment service 500s — {datetime.now(timezone.utc):%Y-%m-%d}",
        severity="P1",
        service="payments",
    )
```
Tool 2: Jeli (Purpose-Built)
Jeli (now part of PagerDuty) imports PagerDuty/Opsgenie timelines, Slack message history, and deployment logs automatically, so the distributed team adds annotations and context instead of building a timeline from scratch.
PagerDuty webhook to create Jeli investigation:
```shell
# Configure in PagerDuty → Integrations → Webhooks
# Endpoint: https://app.jeli.io/api/v1/incidents/pagerduty
# Event: incident.triggered (P1/P2 only)

# Manually create an investigation from CLI
curl -X POST https://app.jeli.io/api/v1/investigations \
  -H "Authorization: Bearer $JELI_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Payment service outage 2026-03-22",
    "severity": "sev1",
    "summary": "Payment service returned 503s for 12 minutes",
    "slack_channel_id": "C0XXXXXXXXX"
  }'
```
Jeli automatically imports the Slack conversation from the incident channel into the timeline, making async post-mortems much faster.
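If you are not using Jeli, you can approximate the Slack import yourself with Slack's Web API. A minimal sketch using `conversations.history` (the token and channel-ID environment variable names are assumptions; pagination and threaded replies are ignored):

```python
#!/usr/bin/env python3
# slack-timeline.py — dump an incident channel as post-mortem timeline rows
from datetime import datetime, timezone

import requests


def fetch_channel_messages(token: str, channel_id: str) -> list[dict]:
    """Fetch up to 200 recent messages via Slack's conversations.history."""
    resp = requests.get(
        "https://slack.com/api/conversations.history",
        headers={"Authorization": f"Bearer {token}"},
        params={"channel": channel_id, "limit": 200},
        timeout=10,
    )
    resp.raise_for_status()
    body = resp.json()
    if not body.get("ok"):
        raise RuntimeError(f"Slack API error: {body.get('error')}")
    return body["messages"]


def as_timeline_rows(messages: list[dict]) -> str:
    """Render messages oldest-first as a markdown timeline table."""
    rows = ["| Time (UTC) | Event | Actor |", "|------------|-------|-------|"]
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        rows.append(f"| {when:%H:%M} | {msg.get('text', '')} | {msg.get('user', '?')} |")
    return "\n".join(rows)
```

Run `as_timeline_rows(fetch_channel_messages(...))` after the incident and paste the output into the Timeline section; the bot token needs the `channels:history` scope.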
Tool 3: GitHub Issues (Free, Integrated)
For teams already living in GitHub, a structured GitHub Issue template is the lowest-friction option.
`.github/ISSUE_TEMPLATE/postmortem.md`:

```markdown
---
name: Post-Mortem
about: Document an incident for learning and improvement
title: "Post-Mortem: [Brief description] — [YYYY-MM-DD]"
labels: post-mortem, needs-review
assignees: ""
---

## Summary
**Severity**: <!-- P1/P2/P3 -->
**Duration**: <!-- HH:MM UTC → HH:MM UTC (X minutes) -->
**Impact**: <!-- N users affected, X% error rate -->

## Timeline
<!-- Use UTC timestamps -->
| Time | Event | Actor |
|------|-------|-------|
| | | |

## Root Cause
<!-- Systemic cause — not human error -->

## Contributing Factors
-

## What Went Well
-

## Action Items
<!-- Use task list format so items show in issue sidebar -->
- [ ] @owner: Description of action item by YYYY-MM-DD
- [ ] @owner: Description of action item by YYYY-MM-DD

## Lessons Learned
<!-- What would you tell another team experiencing the same incident? -->
```
A script, run weekly from GitHub Actions, that flags overdue action items and posts them to Slack:
```python
#!/usr/bin/env python3
# check-pm-actions.py — run weekly, flag overdue action items
import os
import re
from datetime import datetime, timezone

import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
REPO = os.environ["GITHUB_REPO"]  # e.g. "acme/platform"
SLACK_HOOK = os.environ.get("SLACK_WEBHOOK_URL", "")

headers = {"Authorization": f"Bearer {GITHUB_TOKEN}"}
issues = requests.get(
    f"https://api.github.com/repos/{REPO}/issues",
    headers=headers,
    params={"labels": "post-mortem", "state": "open", "per_page": 50},
    timeout=10,
).json()

overdue = []
today = datetime.now(timezone.utc)
for issue in issues:
    if "pull_request" in issue:
        continue  # the issues endpoint also returns pull requests
    body = issue.get("body") or ""  # body is null for empty issues
    # Find unchecked task items with due dates: "- [ ] @owner: ... by YYYY-MM-DD"
    matches = re.findall(r"- \[ \] .*?by (\d{4}-\d{2}-\d{2})", body)
    for date_str in matches:
        due = datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
        if due < today:
            overdue.append({
                "title": issue["title"],
                "url": issue["html_url"],
                "due": date_str,
            })

if overdue and SLACK_HOOK:
    lines = "\n".join(f"• <{i['url']}|{i['title']}> (due {i['due']})" for i in overdue)
    requests.post(
        SLACK_HOOK,
        json={"text": f":warning: *Overdue post-mortem action items*\n{lines}"},
        timeout=10,
    )
```
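The script can be scheduled with a workflow along these lines (the file path, script location, and `SLACK_WEBHOOK_URL` secret name are assumptions for your repo):

```yaml
# .github/workflows/pm-action-check.yml
name: Post-mortem action item check
on:
  schedule:
    - cron: "0 9 * * 1"   # Mondays, 09:00 UTC
  workflow_dispatch: {}    # allow manual runs

jobs:
  check:
    runs-on: ubuntu-latest
    permissions:
      issues: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install requests
      - run: python scripts/check-pm-actions.py
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPO: ${{ github.repository }}
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
```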
Running the Async Post-Mortem Process
- Hour 0: Incident resolved. Create the post-mortem document immediately with just the title and a timeline stub. Don't write conclusions yet.
- Hours 1–24: Everyone involved adds their observations asynchronously. Use comments for additions, not edits. Stick to facts, not blame.
- Hours 24–48: The incident lead synthesizes the timeline into a root cause analysis and drafts action items with owners (who must be consulted, not just assigned).
- Hour 48: The review comment period opens. The team has 48 hours to add corrections.
- Hour 96: The document is marked final. Action items are filed as tickets.
- Week 4: Action item owners report progress. Incomplete items get rescheduled, not silently dropped.
Related Reading
- Best Tools for Remote Team Changelog Review
- Best Tools for Remote Team Sprint Velocity
- How to Create Automated Rollback Systems
Built by theluckystrike — More at zovo.one