
Ransomware Attack Playbook for CTOs: Decisions, Containment, Recovery, and Insurance

February 15, 2026 · By The CTO · 13 min read


In 2021, Colonial Pipeline shut down operations after a ransomware event, and the company later confirmed a $4.4 million payment. In 2023, MGM Resorts reported a cyber incident that drove a $100 million hit to adjusted property EBITDAR. Those numbers frame the real problem for CTOs: ransomware is a business outage with legal, financial, and people costs, not “just” malware.

This playbook is the version I wish every CTO had printed, tested, and funded before the first encrypted server shows up.

Ransomware attack playbook: what it is and what it must cover

A ransomware playbook is a pre-agreed set of decisions, roles, and technical steps that lets you contain an intrusion, keep people safe, and restore systems under time pressure.

A good playbook runs four tracks at the same time:

  • Technical response: isolate, preserve evidence, eradicate, restore.
  • Business continuity: keep revenue and customer support running.
  • Legal and regulatory: privilege, breach notice, sanctions risk.
  • Financial: cyber insurance, forensics vendors, ransom decision flow.

Here’s the definition I use with boards and exec teams.

A ransomware playbook is a set of pre-committed choices that buys speed in the first 24 hours by settling the hard questions in advance.

Most CTOs I talk to focus on tooling and backups. You need both, but they fall apart without pre-decisions, because the attacker moves faster than your approval chain.

Core components to write down and rehearse:

  • Roles: incident commander, deputy, comms lead, legal lead, IT lead, cloud lead.
  • Authority: who can disconnect networks, disable SSO, and shut down production.
  • Evidence rules: what to collect, where to store it, who can touch it.
  • Restore order: what comes back first, and what stays off.
  • External calls: insurer hotline, breach coach, IR firm, outside counsel, FBI.

Ransomware response is a leadership exercise that happens to include technical steps.

What to decide in advance for a ransomware response

The first hour is not the time to debate risk tolerance. Write these decisions down, get signatures, and revisit them every 6 to 12 months.

Define your “stop the bleeding” authority

Someone needs the power to break glass without a meeting.

Decide these in advance:

  • Shutdown authority: who can take production offline.
  • Identity authority: who can disable SSO, rotate IdP keys, and revoke sessions.
  • Network authority: who can isolate segments and block egress.
  • Cloud authority: who can lock down IAM, stop instances, and freeze snapshots.

If you run teams of 150 plus, put a deputy on each authority. People get pulled into side quests during incidents, and you don’t want a single point of failure in your decision chain.

Set recovery objectives that match ransomware reality

RPO and RTO targets often live in a slide deck. Ransomware turns them into a contract with your future self.

Pick targets per system tier:

  • Tier 0: identity, DNS, PKI, secrets, logging.
  • Tier 1: payments, order flow, customer auth.
  • Tier 2: internal tools, analytics, batch jobs.

Write down numbers. Example targets that I’ve seen work for SaaS:

  • Tier 0: RTO 4 hours, RPO 15 minutes.
  • Tier 1: RTO 12 hours, RPO 1 hour.
  • Tier 2: RTO 72 hours, RPO 24 hours.
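Those tier targets only matter if you check them against your actual backup cadence. A minimal sketch, using the example numbers above (tier names and the `rpo_gap` helper are illustrative, not a standard API):

```python
from datetime import timedelta

# Example tier targets from this playbook (the numbers are illustrative).
TARGETS = {
    "tier0": {"rto": timedelta(hours=4),  "rpo": timedelta(minutes=15)},
    "tier1": {"rto": timedelta(hours=12), "rpo": timedelta(hours=1)},
    "tier2": {"rto": timedelta(hours=72), "rpo": timedelta(hours=24)},
}

def rpo_gap(tier: str, backup_interval: timedelta) -> timedelta:
    """Return how far the backup cadence misses the tier's RPO (zero if compliant)."""
    return max(timedelta(0), backup_interval - TARGETS[tier]["rpo"])

# A Tier 1 system backed up every 4 hours misses its 1-hour RPO by 3 hours.
print(rpo_gap("tier1", timedelta(hours=4)))
```

Run a check like this per system and put the gaps in front of whoever signed the targets.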

The catch is Tier 0. If your IdP or secrets store is compromised, your restore order changes fast.

Pre-negotiate cyber insurance workflows

Cyber insurance can help, but it comes with rules and gates. You want to learn those rules on a calm Tuesday, not while your finance team is asking how to pay an IR firm.

Decide and document:

  • Carrier requirements: which IR firms are “approved” and billable.
  • Hotline path: who calls, from what number, with what policy ID.
  • Consent gates: what actions require insurer approval.
  • Ransom handling: whether the policy covers negotiation and payment.

Ask your broker for the insurer’s breach response panel list and keep it offline. Print it. Store it in a safe.

Also ask for the policy’s sublimits. Many policies cap ransomware payments, business interruption, and digital forensics separately.

For a concrete reference point, CISA’s ransomware guidance stresses preparation, response planning, and tested backups as core controls, and it aligns well with what insurers now ask for in underwriting questionnaires. See CISA Stop Ransomware guidance.

Decide your ransom payment stance and your sanctions checks

You need a stance, even if the stance is “we decide case by case.” Case by case still needs a process, owners, and a clock.

Two constraints shape this:

  • Sanctions risk: paying a sanctioned entity can create legal exposure.
  • Trust impact: customers judge you by recovery speed and honesty.

OFAC published an advisory in 2020 that warns about sanctions risks tied to ransomware payments. Your counsel should build this into the decision flow. See OFAC advisory on ransomware payments.

Write down who owns:

  • sanctions screening
  • negotiation
  • payment mechanics
  • proof of decryption testing

And decide where crypto payment capability lives, if you allow it at all. Don’t try to stand up a wallet while your systems are on fire.

Build a communications plan that assumes systems are down

Ransomware often takes out email, chat, and SSO. Plan like you won’t have any of them.

Pre-stage:

  • Out-of-band chat: Signal group, phone tree, or a secondary tenant.
  • War room: a physical room and a video bridge.
  • Customer status page: separate credentials, separate hosting.
  • Press holding statement: reviewed by counsel.

If you want a tool to keep this organized, treat it like a living operational asset in Command Center at /command-center. Track owners, last test date, and gaps like you would for SLOs.

Step-by-step ransomware response: first hour, first day, first week

This section is written like a runbook. It assumes you have an incident commander and a deputy.

First hour: contain, preserve, and stop spread

Your goal is to stop encryption and stop data theft.

  1. Declare the incident. Open an incident channel in your out-of-band system.
  2. Start an incident log. One scribe, timestamped entries, no gaps.
  3. Isolate affected endpoints. Disconnect them from the network; don’t power them off, or you lose volatile memory, unless counsel or your IR firm directs otherwise.
  4. Disable risky identity paths. Revoke sessions, rotate tokens, block legacy auth.
  5. Block egress. Stop outbound traffic to suspicious IPs and domains.
  6. Preserve evidence. Snapshot disks, collect memory where possible, export logs.
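Step 2, the incident log, can be as simple as an append-only JSON-lines file: one scribe, timestamped entries, machine-readable for the insurer and IR firm later. A minimal sketch (the class name and file path are illustrative):

```python
import json
from datetime import datetime, timezone

class IncidentLog:
    """Append-only, timestamped incident log: one scribe, no gaps.
    JSON lines keep the record machine-readable for counsel,
    the insurer, and the IR firm."""

    def __init__(self, path: str):
        self.path = path

    def entry(self, actor: str, action: str, detail: str = "") -> dict:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "detail": detail,
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return record

log = IncidentLog("incident-2026-02-15.jsonl")
log.entry("scribe", "declared", "Incident declared; out-of-band channel opened")
log.entry("it-lead", "isolated", "Pulled network on 3 affected endpoints")
```

Keep the log on a clean machine outside the compromised environment.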

The evidence point matters. Your insurer and counsel will ask for it. Your IR firm will need it.

CISA’s joint advisories often stress that ransomware crews steal data before encryption. Treat this as both an outage and a breach until proven otherwise.

First day: scope the blast radius and stabilize the business

You need answers to four questions.

What systems are encrypted, what systems are exfiltrated, what access path did they use, and what is still safe to restore?

Do this work in parallel:

  • Forensics track: IR firm starts triage, identifies initial access vector.
  • Identity track: rotate secrets, reset privileged accounts, review MFA.
  • Infrastructure track: rebuild clean admin workstations, lock down management planes.
  • Business track: decide what customer functions you can run safely.

If you run Kubernetes, assume cluster admin tokens are compromised until you prove otherwise. Rotate service account tokens, review RBAC bindings, and check admission controller logs.

If you run Windows domains, assume Active Directory is the center of gravity. Many ransomware crews target AD first.

Microsoft’s incident response guidance emphasizes identity control, privileged access hygiene, and rapid containment steps. Use it as a checklist for your identity track. See Microsoft ransomware guidance.

First week: restore in a clean room, then harden

Restoration is where teams make the costliest mistakes. They restore fast, then they re-infect.

Run recovery like a migration:

  • Build a clean environment: new accounts, new keys, new admin devices.
  • Restore Tier 0 first: identity, secrets, logging, DNS.
  • Restore Tier 1 next: revenue paths, customer auth, support tooling.
  • Validate before reconnect: scan, test, and monitor.
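The restore order above works best as gated waves: each tier must pass validation before the next one comes online. A minimal sketch (the wave contents and validator are placeholders you would replace with real scans and smoke tests):

```python
# Hypothetical clean-room restore plan: each wave must pass its
# validation gate before the next tier is brought online.
RESTORE_WAVES = [
    ("tier0", ["identity", "secrets", "logging", "dns"]),
    ("tier1", ["payments", "customer-auth", "support-tooling"]),
]

def run_restore(waves, validate):
    """Restore tier by tier; halt at the first failed validation."""
    restored = []
    for tier, systems in waves:
        restored.extend(systems)
        if not validate(tier, systems):  # scan, smoke-test, monitor
            raise RuntimeError(f"{tier} failed validation; halt before reconnect")
    return restored

# Example: a validator that (hypothetically) passes everything.
print(run_restore(RESTORE_WAVES, lambda tier, systems: True))
```

The point of the hard stop is cultural as much as technical: nobody reconnects Tier 1 while Tier 0 validation is red.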

Use immutable backups where you can. Attackers often delete or encrypt backups.

NIST’s incident handling guide, SP 800-61, gives a clear lifecycle for preparation, detection, containment, eradication, and recovery. It’s not ransomware-specific, but it keeps teams disciplined under stress. See NIST SP 800-61 Rev. 2.

If you want a structured way to turn incident data into root cause, run your post-incident analysis through Split Cause at /splitcause. Graph the identity events, lateral movement, and encryption start times. Then test which causal chain fits the evidence.

Cyber insurance and ransomware: how to make it help, not slow you down

Cyber insurance can pay for forensics, counsel, notification, and business interruption. It can also bog you down if you treat it like paperwork instead of an incident partner with constraints.

What to ask your broker before renewal

Ask these questions in writing and keep the answers offline.

  • Panel vendors: which IR firms and negotiators are pre-approved.
  • Retentions: what you pay before coverage starts.
  • Sublimits: ransomware, business interruption, and data restoration caps.
  • Waiting periods: business interruption often has a time deductible.
  • Proof requirements: logs, invoices, and incident timelines.

Also ask how the carrier defines “system failure” versus “security event.” That wording changes payouts.

How to run the insurer relationship during an incident

Treat the insurer like a stakeholder with a strict interface.

  • Single point of contact: one person owns insurer comms.
  • Daily update cadence: one short call, one written summary.
  • Pre-approved spend: get written consent for big vendor costs.

This is where your incident log pays off. It becomes the shared record that keeps everyone aligned.

The ransom decision matrix (print this)

This matrix is blunt on purpose.

| Decision factor | What to measure | Green | Yellow | Red |
| --- | --- | --- | --- | --- |
| Restore capability | Clean backups and tested restore time | RTO met in 24 to 72 hours | Partial restores, missing Tier 0 | No viable restore path |
| Data theft evidence | Exfil indicators, leak site claims | No exfil evidence | Some exfil signals | Confirmed exfil of regulated data |
| Safety and critical services | Patient care, public safety, utilities | No safety impact | Degraded operations | Safety risk or legal duty to operate |
| Sanctions risk | Counsel and OFAC screening | Cleared | Unclear attribution | Likely sanctioned entity |
| Trust impact | Customer contracts, SLAs, comms plan | Clear comms, short outage | Medium outage | Long outage, unclear comms |
| Recurrence risk | Initial access closed, identity clean | Closed and verified | Partially closed | Still open or unknown |

Use this matrix to drive a decision meeting with counsel, CEO, and board rep. Keep the meeting to 30 minutes. Decide next actions, not feelings.
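One way to keep that meeting honest is worst-color scoring: the overall posture is the worst individual factor, because a single Red (a likely sanctioned entity, for example) can veto payment on its own. A minimal sketch (factor names and the `overall` helper are illustrative):

```python
# Worst-color scoring for the ransom decision matrix: the overall
# posture is the single worst factor, since one Red can veto payment.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def overall(ratings: dict) -> str:
    """ratings: factor name -> 'green' | 'yellow' | 'red'."""
    return max(ratings.values(), key=SEVERITY.__getitem__)

ratings = {
    "restore_capability": "yellow",
    "data_theft_evidence": "green",
    "sanctions_risk": "red",
}
print(overall(ratings))  # red
```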

Enterprise implications for CTOs: why ransomware changes your operating model

  1. Your identity system becomes critical infrastructure. If Okta, Entra ID, or AD falls, everything else follows. Treat identity like Tier 0 and fund it like production.

  2. Backups become a product with SLOs. “We have backups” means nothing without restore tests. Run quarterly restore drills and track success rate. Aim for 95 percent restore success on Tier 1 data sets.

  3. Your vendor surface becomes your attack surface. MSP tools, remote access, and SaaS admin accounts show up in real incidents. Use a formal third-party review with our vendor risk assessment guide at /tools/vendor-risk-assessment.

  4. Your org chart becomes part of your security posture. Teams that can’t coordinate under stress lose days. Use our engineering org design patterns guide at /tools/engineering-org-design to clarify incident roles and escalation paths.

CTO recommendations: immediate actions, policy, and architecture principles

Immediate actions (next 30 days)

  1. Run a ransomware tabletop. Use a 2-hour scenario with a fake ransom note and a dead Slack. Measure time to declare, time to isolate, and time to contact insurer.

  2. Inventory Tier 0 systems. List IdP, secrets, DNS, PKI, logging, endpoint management. Put owners and restore steps in one doc.

  3. Test restores for two Tier 1 services. Pick one database and one object store. Restore into an isolated account and run app smoke tests.

  4. Create an offline contact sheet. Include insurer hotline, breach coach, IR firm, outside counsel, PR, and FBI field office.

  5. Harden admin workstations. Issue dedicated admin devices, block email and web browsing on them, and require phishing-resistant MFA.

For operational tracking, keep these actions visible in Command Center at /command-center. Treat them like a migration plan with dates and owners.

Policy framework (what to write down and get signed)

  1. Authority policy. Name who can shut down production, disable SSO, and isolate networks.

  2. Ransom stance policy. Define who decides, what inputs they need, and how sanctions checks work.

  3. Evidence handling policy. Define what gets imaged, where it’s stored, and who has access.

  4. Communications policy. Define internal comms, customer comms, and regulator comms paths.

If you need a structure for the incident review itself, use our incident postmortem template at /tools/incident-postmortem. It keeps action items tied to owners and dates.

Architecture principles (how to reduce blast radius)

  1. Segmentation by default. Separate user networks, server networks, and management planes. Block east-west traffic unless needed.

  2. Immutable logs. Ship logs to a separate account with write-once controls. Attackers delete logs to slow forensics.

  3. Backup isolation. Use separate credentials, separate accounts, and immutability controls. Test restores quarterly.

  4. Least privilege for humans and workloads. Review IAM and RBAC quarterly. Remove stale admin roles within 7 days of role change.

  5. Dependency visibility. Map service dependencies so you know what to restore first. Use our microservices dependency mapping guide at /tools/microservices-dependency-mapper.
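Once you have a dependency map, restore order falls out of a topological sort: dependencies come back before their dependents. A minimal sketch using Python's standard-library `graphlib` (the service names and dependency map are hypothetical):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> the services it needs running
# first. Restoring in topological order guarantees that dependencies
# come back before their dependents.
DEPS = {
    "idp": set(),
    "secrets": {"idp"},
    "payments-db": {"secrets"},
    "payments-api": {"payments-db", "idp"},
}

restore_order = list(TopologicalSorter(DEPS).static_order())
print(restore_order)
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is itself useful: a cycle in your map means two services you can only restore together.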

Bigger picture: ransomware is now a board-level reliability problem

Ransomware keeps rising because it works. It hits the soft spots in modern companies: shared identity, shared admin tools, and fast-moving teams with weak change control.

The best CTOs treat ransomware readiness like SRE work. They set targets, run drills, and measure recovery like they measure uptime. They also invest in people systems, because burnout and confusion show up fast during week-long incidents.

The question is simple: if your chat, email, and SSO died at 9:17 AM, who would lead, and what would they do in the first 30 minutes?
