Failure Scenario Block
Document what happens when things go wrong: capture failure modes, detection signals, recovery strategies, and lessons learned to improve architectural resilience.
Failure scenarios describe how the system behaves when components fail, covering detection mechanisms, immediate responses, recovery strategies, and impacts. Well-documented failure scenarios improve incident response and inform architecture decisions that improve resilience.
When to Use
Block Properties
| Property | Required | Description |
|---|---|---|
| Title | Yes | Short name for the failure scenario |
| Description | Yes | Detailed description of the failure |
| Impact | Yes | Business and technical impact of the failure |
| Severity | No | Severity level: low, medium, high, or critical |
| Detection Signals | No | Array of signals indicating this failure has occurred |
| Immediate Response | No | First actions when failure is detected |
| Recovery Strategy | No | How to restore normal operation |
| Time to Recover | No | Expected recovery duration |
| Data Impact | No | Impact on data integrity or availability |
| User Impact | No | How users are affected by this failure |
| Lessons Learned | No | Insights from testing or actual failures |
| Status | No | Current state: hypothetical, tested, occurred, or mitigated |
| Review Date | No | When to review this scenario |
| Notes | No | Additional context or observations |
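The property table above can be read as a record with three required fields and a set of optional ones. The following is a minimal sketch of that shape in Python, not the tool's actual schema or API: the class name `FailureScenario`, the snake_case field names, and the use of plain strings for durations and dates are all assumptions made for illustration. Only the property names, the required/optional split, and the allowed Severity and Status values come from the table.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class Severity(Enum):
    # Allowed values taken from the Severity row of the table.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class Status(Enum):
    # Allowed values taken from the Status row of the table.
    HYPOTHETICAL = "hypothetical"
    TESTED = "tested"
    OCCURRED = "occurred"
    MITIGATED = "mitigated"


@dataclass
class FailureScenario:
    """Hypothetical model of a failure scenario block (names are assumptions)."""

    # Required properties
    title: str
    description: str
    impact: str
    # Optional properties
    severity: Optional[Union[Severity, str]] = None
    detection_signals: List[str] = field(default_factory=list)
    immediate_response: Optional[str] = None
    recovery_strategy: Optional[str] = None
    time_to_recover: Optional[str] = None
    data_impact: Optional[str] = None
    user_impact: Optional[str] = None
    lessons_learned: Optional[str] = None
    status: Optional[Union[Status, str]] = None
    review_date: Optional[str] = None
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject missing or blank values for the three required properties.
        for name in ("title", "description", "impact"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} is required and must not be empty")
        # Accept plain strings and convert them; unknown values raise ValueError.
        if isinstance(self.severity, str):
            self.severity = Severity(self.severity)
        if isinstance(self.status, str):
            self.status = Status(self.status)
```

Modelling Severity and Status as enumerations means that a value outside the documented set fails at construction time instead of surfacing later in a report or dashboard.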
Severity Values
Status Values
Example: Database Failure
A critical failure scenario for primary database unavailability.
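As an illustration of how such a scenario might be filled in, the sketch below expresses a database-unavailability block as a plain Python dictionary. Every value is hypothetical and written for this example only (the signals, durations, and recovery steps do not describe any real system), and the snake_case keys and the `missing_required` helper are assumptions, not part of the block's actual syntax.

```python
# Required property names, per the Block Properties table.
REQUIRED_KEYS = ("title", "description", "impact")

# Hypothetical example values; none of these describe a real system.
database_failure = {
    "title": "Primary database unavailable",
    "description": "The primary database stops accepting connections.",
    "impact": "Writes fail; features that depend on fresh data degrade.",
    "severity": "critical",
    "detection_signals": [
        "Connection errors from application servers",
        "Database health check failing",
    ],
    "immediate_response": "Page the on-call engineer and confirm the outage.",
    "recovery_strategy": "Fail over to a replica, then restore the primary.",
    "time_to_recover": "15 to 30 minutes (illustrative)",
    "data_impact": "Possible loss of writes not yet replicated.",
    "user_impact": "Users cannot save changes until failover completes.",
    "status": "hypothetical",
}


def missing_required(block: dict) -> list:
    """Return the required keys that are absent or empty in the block."""
    return [key for key in REQUIRED_KEYS if not block.get(key)]


print(missing_required(database_failure))  # prints []
```

A check like `missing_required` is one simple way to catch incomplete scenarios before they are published; a scenario with only a title, for example, would report `["description", "impact"]`.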
Example: External Service Failure
A failure scenario for third-party service unavailability.
Example: Regional Failure
A failure scenario for cloud region unavailability.
Example: Security Incident
A failure scenario for security compromise detection.