NeoArc Studio

Documenting Failure Scenarios

Document what happens when things go wrong. Failure scenarios help teams prepare for incidents and build resilient systems.

Every system fails eventually. Networks partition. Databases go down. Third-party services time out. The difference between chaos and controlled response is preparation. Failure scenarios document what happens when components fail and how the system should respond.

Adding a Failure Scenario Block

Failure Scenario Fields

Impact Examples

Scenario | Impact Description
Write failure | All write operations fail. Users cannot save changes.
Checkout blocked | Checkout flow blocked. Revenue loss during outage.
Read degradation | Read operations degrade to cached data. Users see stale information.
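If you also keep scenarios in code or structured data alongside the documentation, each row above maps naturally onto a small record. The sketch below is a minimal illustration, not the NeoArc block's actual schema: the `FailureScenario` class, its field names, and the severity assigned to each scenario are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    # Level names follow the critical/high/medium/low scale used in this guide.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class FailureScenario:
    # Hypothetical record shape; not the NeoArc content block schema.
    name: str           # short scenario label
    impact: str         # what users or the business experience
    severity: Severity  # illustrative rating, chosen for this example only


SCENARIOS = [
    FailureScenario(
        "Write failure",
        "All write operations fail. Users cannot save changes.",
        Severity.CRITICAL,
    ),
    FailureScenario(
        "Checkout blocked",
        "Checkout flow blocked. Revenue loss during outage.",
        Severity.CRITICAL,
    ),
    FailureScenario(
        "Read degradation",
        "Read operations degrade to cached data. Users see stale information.",
        Severity.MEDIUM,
    ),
]

# List each scenario with its assumed severity.
for scenario in SCENARIOS:
    print(f"[{scenario.severity.value}] {scenario.name}: {scenario.impact}")
```

Writing the impact as a plain sentence about what users experience, as the table does, keeps the record useful to on-call engineers who may not know the component internals.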

Severity Levels

This example uses the NeoArc Failure Scenario content block.

Scenario Status

Track the validation state of each scenario:

This example uses the NeoArc Failure Scenario content block.

Visualising Failure Dependencies

Graph diagrams help visualise how failures cascade through a system. This example shows component dependencies with failure severity indicated by colour: red for critical, orange for high, green for medium, and blue for low-impact components. External dependencies are shown in indigo.
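The cascade that such a diagram shows can also be computed from a dependency map. The following is a minimal sketch under stated assumptions: the component names and the `DEPENDS_ON` map are hypothetical, and `affected_by` is an illustrative helper, not part of NeoArc Studio.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "web-app": ["api"],
    "api": ["database", "cache", "payments-provider"],
    "database": [],
    "cache": [],
    "payments-provider": [],  # stands in for an external dependency
}


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges: component -> components that depend on it.
    dependents: dict[str, list[str]] = {name: [] for name in depends_on}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    # Breadth-first walk up the inverted graph from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    # The failed component itself is not counted as "affected by" its own failure.
    affected.discard(failed)
    return affected


print(sorted(affected_by("database", DEPENDS_ON)))  # ['api', 'web-app']
```

A component with many transitive dependents is a natural candidate for a higher severity rating, since its failure reaches more of the system.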

Categories of Failure Scenarios

Consider documenting scenarios in these categories:

Infrastructure Failures
Database unavailability, cache layer failure, message queue unavailability, storage system failure, network partitions
Dependency Failures
Third-party API outages, internal service unavailability, authentication provider issues, CDN failures
Capacity Failures
Traffic spikes beyond capacity, resource exhaustion, rate limiting triggered, queue backlogs
Data Failures
Data corruption, replication lag, schema migration issues, backup restoration needs
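One way to use these categories is as a coverage checklist: list the scenarios you have documented and see which categories have none. The sketch below assumes a hypothetical list of documented scenarios; the `Category` enum mirrors the four headings above, and `missing_categories` is an illustrative helper.

```python
from enum import Enum


class Category(Enum):
    INFRASTRUCTURE = "Infrastructure Failures"
    DEPENDENCY = "Dependency Failures"
    CAPACITY = "Capacity Failures"
    DATA = "Data Failures"


# Hypothetical inventory of documented scenarios: (name, category).
documented = [
    ("Database unavailability", Category.INFRASTRUCTURE),
    ("Third-party API outage", Category.DEPENDENCY),
    ("Replication lag", Category.DATA),
]


def missing_categories(scenarios: list[tuple[str, Category]]) -> list[Category]:
    """Return the categories that have no documented scenario, in definition order."""
    covered = {category for _, category in scenarios}
    return [category for category in Category if category not in covered]


print([c.value for c in missing_categories(documented)])  # ['Capacity Failures']
```

An empty category is a prompt for discussion rather than proof of a gap; some systems may have no meaningful scenario in a given category.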

Using Failure Scenarios

Failure scenarios are living documents. They should be:

Activity | Description
Reviewed during design | Think through failures before building
Updated after incidents | Real incidents reveal gaps in documentation
Referenced during on-call | Engineers should know where to find them
Tested periodically | Chaos engineering validates that documented behaviour matches reality
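Periodic testing is easier to sustain when stale scenarios are flagged automatically. The sketch below is one possible approach, assuming you record a last-tested date per scenario; the 90-day threshold, the dates, and the `overdue_for_testing` helper are all illustrative assumptions.

```python
from datetime import date, timedelta


def overdue_for_testing(
    last_tested: dict[str, date],
    today: date,
    max_age_days: int = 90,  # assumed review interval; pick one that suits your team
) -> list[str]:
    """Return scenario names whose last test is older than `max_age_days`, sorted by name."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, tested in last_tested.items() if tested < cutoff)


# Hypothetical test history for two scenarios.
history = {
    "Write failure": date(2024, 1, 10),
    "Checkout blocked": date(2024, 5, 1),
}

print(overdue_for_testing(history, today=date(2024, 6, 1)))  # ['Write failure']
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.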

The failure scenarios you see in this documentation site were created using the same blocks you will use to document your own system resilience.

Next Steps

What-If Analysis
Explore hypothetical changes and their impacts
Creating Risk Registers
Full risk tracking
Documenting Risks
Individual risk documentation