Documenting Failure Scenarios

Document what happens when things go wrong. Failure scenarios help teams prepare for incidents and build resilient systems.

January 23, 2026

Every system fails eventually. Networks partition. Databases go down. Third-party services time out. The difference between chaos and a controlled response is preparation. Failure scenarios document what happens when a component fails and how the system should respond.

Adding a Failure Scenario Block

Failure Scenario Fields

Impact Examples

Scenario	Impact Description
Write failure	All write operations fail. Users cannot save changes.
Checkout blocked	Checkout flow blocked. Revenue loss during outage.
Read degradation	Read operations degrade to cached data. Users see stale information.
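
For teams that also want these entries in a machine-readable form, the rows above could be modelled roughly as follows. This is a minimal sketch, not the NeoArc block's actual schema: the `FailureScenario` class, its field names, and the severity assigned to each row are illustrative assumptions. The four severity names are taken from the colour legend used later in this article.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Lower value sorts first, so CRITICAL comes before LOW.
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass(frozen=True)
class FailureScenario:
    # Hypothetical field names, chosen for illustration only.
    name: str
    impact: str
    severity: Severity


# Impact text comes from the table above; severities are assumed for the example.
scenarios = [
    FailureScenario(
        "Read degradation",
        "Read operations degrade to cached data. Users see stale information.",
        Severity.MEDIUM,
    ),
    FailureScenario(
        "Write failure",
        "All write operations fail. Users cannot save changes.",
        Severity.CRITICAL,
    ),
    FailureScenario(
        "Checkout blocked",
        "Checkout flow blocked. Revenue loss during outage.",
        Severity.HIGH,
    ),
]

# List the most severe scenarios first, for example in an on-call summary.
by_severity = sorted(scenarios, key=lambda s: s.severity)
for scenario in by_severity:
    print(f"{scenario.severity.name:<8} {scenario.name}: {scenario.impact}")
```

Keeping the impact as a full sentence pair (what breaks, then who feels it) mirrors the table and makes each record readable without further context.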

Severity Levels

This example uses the NeoArc Failure Scenario content block.

Scenario Status

Track the validation state of each scenario:

This example uses the NeoArc Failure Scenario content block.

Visualising Failure Dependencies

Graph diagrams help visualise how failures cascade through a system. This example shows component dependencies with failure severity indicated by colour: red for critical, orange for high, green for medium, and blue for low-impact components. External dependencies are shown in indigo.
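
The cascade that such a diagram shows can also be computed from a dependency map. The sketch below assumes a hypothetical set of components (`web`, `api`, `payments`, and so on) and a hypothetical `affected_by` helper. It is one way to approximate the blast radius of a failure, not a description of how the diagram itself is generated.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "web": ["api"],
    "api": ["database", "cache", "payments"],
    "worker": ["queue", "database"],
    "database": [],
    "cache": [],
    "queue": [],
    "payments": [],  # stands in for an external dependency
}


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that depends, directly or transitively, on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk outward from the failed component.
    affected: set[str] = set()
    pending = deque([failed])
    while pending:
        current = pending.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                pending.append(dependent)
    return affected


print(sorted(affected_by("database", DEPENDS_ON)))
```

A component with many transitive dependents is a natural candidate for a critical severity rating, while a leaf such as `web` in this example affects nothing downstream.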

Categories of Failure Scenarios

Consider documenting scenarios in these categories:

Infrastructure Failures

Database unavailability, cache layer failure, message queue unavailability, storage system failure, network partitions

Dependency Failures

Third-party API outages, internal service unavailability, authentication provider issues, CDN failures

Capacity Failures

Traffic spikes beyond capacity, resource exhaustion, rate limiting triggered, queue backlogs

Data Failures

Data corruption, replication lag, schema migration issues, backup restoration needs

Using Failure Scenarios

Failure scenarios are living documents. They should be:

Activity	Description
Reviewed during design	Think through failures before building
Updated after incidents	Real incidents reveal gaps in documentation
Referenced during on-call	Engineers should know where to find them
Tested periodically	Chaos engineering validates that documented behaviour matches reality
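
One way to keep periodic testing honest is to flag scenarios that have not been exercised recently. The sketch below is illustrative only: the `ScenarioRecord` class, the `last_tested` field, the sample dates, and the 90-day threshold are all assumptions, not part of the Failure Scenario block.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class ScenarioRecord:
    # Hypothetical record; None means the scenario has never been tested.
    name: str
    last_tested: Optional[date]


def overdue(records, today, max_age=timedelta(days=90)):
    """Return the names of scenarios never tested or last tested more than `max_age` ago."""
    return [
        record.name
        for record in records
        if record.last_tested is None or today - record.last_tested > max_age
    ]


# Sample data invented for the example.
records = [
    ScenarioRecord("Database unavailability", date(2025, 12, 1)),
    ScenarioRecord("Third-party API outage", date(2025, 6, 1)),
    ScenarioRecord("Queue backlog", None),
]

stale = overdue(records, today=date(2026, 1, 23))
print(stale)
```

The threshold is a team decision. A shorter window suits critical scenarios, and a longer one may be enough for low-impact ones.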

The failure scenarios on this documentation site were created with the same blocks you will use to document your own system's resilience.

Next Steps

What-If Analysis

Explore hypothetical changes and their impacts


Creating Risk Registers

Full risk tracking


Documenting Risks

Individual risk documentation
