Use case · Runbooks

Runbooks the team actually opens at 3 a.m.

Most runbooks rot in a Confluence space nobody touches between incidents. A BookSlash runbook board lives at b/runbook-<service> and holds live dashboards, code blocks, decision logs, and post-mortems. It stays current because the team uses it, not because someone audits it.

The shape of the work

Three steps to a runbook that survives.

A runbook is only useful if someone opens it under stress. Make it impossible to lose, fast to scan, and easy to update.

  1. Give it a slug per service

    b/runbook-checkout, b/runbook-payments. The on-call rotation references slugs, not Confluence URLs. The slug stays even when the service moves.

  2. Put dashboards on the board

    Embed the live dashboard, the alert thresholds, the recent SEV history. The runbook is the dashboard, not a link to it.

  3. Updates land where the work is

    After every incident, the on-call engineer updates the runbook from the same board they used to mitigate. The audit log records who changed what.
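
The slug convention from step 1 can be sketched in a few lines. This is a minimal illustration, not BookSlash's own code: the helper name `runbook_slug` and the normalization rule (lowercase, non-alphanumerics collapsed to hyphens) are assumptions; only the b/runbook-<service> pattern comes from the text above.

```python
import re


def runbook_slug(service: str) -> str:
    """Build a b/runbook-<service> slug from a service name (hypothetical helper)."""
    # Assumed normalization: lowercase, and collapse every run of characters
    # that is not a letter or digit into a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", service.strip().lower()).strip("-")
    if not name:
        raise ValueError("service name must contain at least one letter or digit")
    return f"b/runbook-{name}"


if __name__ == "__main__":
    for service in ["checkout", "payments", "Order Router"]:
        print(runbook_slug(service))
```

Deriving the slug from the service name alone, rather than from a URL or a team name, is what lets the on-call rotation keep the same reference when the service moves.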

Runbook board template

b/runbook-checkout — eight blocks for a 3 a.m. page.

Designed for fast scanning under stress. Live dashboards first, decision criteria second, the human stuff (who to escalate to) at the bottom.

b/runbook-checkout
8 nodes
  1. Service overview

    One paragraph: what the service does, who owns it, where it sits in the architecture mind map.

  2. Live dashboard

    Embedded Datadog/Grafana board. Latency, error rate, saturation. Loads in 40ms.

  3. Alert thresholds

    Table node: which alert fires at which value, who pages, on which schedule. Single source of truth.

  4. Decision tree

    Flow diagram: 5xx spike? Check this. Database slow? Check that. Three branches, ten nodes, calm under stress.

  5. Recent SEVs

    Last five incidents with one-line summaries and links to post-mortems. Pattern recognition built in.

  6. Code snippets

    kubectl commands, Postgres queries, deploy rollbacks. Code blocks with copy buttons. No retyping at 3 a.m.

  7. Escalation path

    Who to page, in what order, with their slug-resolved phone number. Fall back to b/escalate-checkout if the on-call engineer cannot reach them.

  8. Post-mortem template

    Pinned at the bottom. Filled in after every SEV. The board grows; institutional knowledge stays.
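
To make the decision-tree block (node 4) concrete, here is a minimal sketch of how such a tree could be written down as data before it is drawn as a flow diagram. The `Node` type, the `walk` function, and the specific branches are illustrative assumptions; the source only says the tree starts from questions like "5xx spike?" and "Database slow?".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str  # a question to ask, or the action to take at a leaf
    branches: dict[str, "Node"] = field(default_factory=dict)  # answer -> next node


# Hypothetical tree for b/runbook-checkout; the real branches belong to the team.
TREE = Node("5xx spike?", {
    "yes": Node("Recent deploy?", {
        "yes": Node("Roll back the last deploy"),
        "no": Node("Check upstream dependencies"),
    }),
    "no": Node("Database slow?", {
        "yes": Node("Check slow queries and connection saturation"),
        "no": Node("Escalate via b/escalate-checkout"),
    }),
})


def walk(node: Node, answers: list[str]) -> str:
    """Follow the answers from the root; return the text of the node reached."""
    for answer in answers:
        if not node.branches:  # already at a leaf, ignore extra answers
            break
        node = node.branches[answer]
    return node.text


if __name__ == "__main__":
    print(walk(TREE, ["no", "yes"]))
```

Keeping each node to one short question or one action is what makes the diagram scannable: the reader answers yes or no and follows the line.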

Why this works

Six things a Confluence space cannot do.

One slug per service

b/runbook-<service> is the same shortcut for every engineer, every browser, every device. Page received → type slug → on the runbook in 40ms.

Live dashboards, not screenshots

The runbook IS the dashboard. Latency, errors, saturation in real time. Numbers from a screenshot are already wrong.

Code blocks with copy buttons

Mitigation queries, kubectl invocations, deploy rollbacks — copy-paste ready. No retyping a Confluence code block at 3 a.m.

A flow diagram for the decision tree

Branches make better runbooks than paragraphs. Look at the symptom, follow the line, take the action.

Audit log of every edit

Who changed the alert threshold last Tuesday? The audit log answers in two clicks. Retention is 90 days on Pro, 365 days on Enterprise.

Post-mortem on the same board

The mitigation team writes the post-mortem on the board they used during the incident. No transcribing into a doc the next day.

We ditched two wikis and a "links" channel. The on-call rotation went from "where's the dashboard?" to "type b/oncall."


Renata Coleman

Eng Lead · Halberd Mobility

−28%

Mean time to mitigate


Frequently asked

Questions, answered.

No. Paging tools such as PagerDuty handle paging and rotation. BookSlash holds the runbook content the on-call engineer reads after they get paged. Most teams put a slug to the runbook directly in their PagerDuty escalation policy notes.

BookSlash embeds load the source dashboard (Datadog, Grafana, Honeycomb, etc.) in an iframe with the workspace’s authenticated session. The runbook viewer sees current data, not a snapshot.

Yes. Per-board permissions let you restrict edit rights to specific roles or members. Most teams give all engineers read access, senior engineers and managers edit access, and use the audit log to track changes.

Audit logs are tamper-evident and exportable (90 days of retention on Pro; 365 days on Enterprise, with NDJSON export). Most SOC 2 audits accept BookSlash audit log exports as evidence of change control. Email [email protected] for the SOC 2 status update.
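
For teams that work with the NDJSON export, a question like "who changed the alert thresholds on a given day?" reduces to filtering one JSON object per line. The sketch below assumes a hypothetical record shape (`ts`, `actor`, `board`, `node`, `action`); the actual export schema is not documented here and may differ.

```python
import io
import json
from datetime import date, datetime

# Hypothetical export lines; field names and values are assumptions for illustration.
SAMPLE = """\
{"ts": "2024-05-07T02:14:00Z", "actor": "alice", "board": "b/runbook-checkout", "node": "Alert thresholds", "action": "edit"}
{"ts": "2024-05-07T09:30:00Z", "actor": "bob", "board": "b/runbook-checkout", "node": "Code snippets", "action": "edit"}
{"ts": "2024-05-09T11:00:00Z", "actor": "carol", "board": "b/runbook-payments", "node": "Alert thresholds", "action": "edit"}
"""


def edits_to_node(lines, board, node, day):
    """Yield (timestamp, actor) for edits to one node of one board on a UTC date."""
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines in the export
            continue
        event = json.loads(line)
        # Replace the trailing "Z" so fromisoformat accepts it on older Pythons.
        ts = datetime.fromisoformat(event["ts"].replace("Z", "+00:00"))
        if event["board"] == board and event["node"] == node and ts.date() == day:
            yield ts, event["actor"]


hits = list(edits_to_node(io.StringIO(SAMPLE), "b/runbook-checkout",
                          "Alert thresholds", date(2024, 5, 7)))

if __name__ == "__main__":
    for ts, actor in hits:
        print(ts.isoformat(), actor)
```

Because NDJSON is read line by line, the same function works on a file handle for a large export without loading it all into memory.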

Start with one team. Roll out when it sticks.


Runbook boards — The runbook your team actually opens · BookSlash