AWS SAA-C03: Disaster Recovery Models

Overview of Disaster Recovery

Disaster Recovery in AWS focuses on how an organization responds when a service disruption occurs. It is a critical component of the Reliability Pillar of the AWS Well-Architected Framework. The primary goal is to limit the impact on users by meeting two recovery objectives:

RPO (Recovery Point Objective): Maximum acceptable data loss (measured in time).
RTO (Recovery Time Objective): Maximum acceptable downtime before service is restored.
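A quick worked example helps make the two objectives concrete. The sketch below estimates worst-case data loss and downtime for a setup based on periodic backups, then checks those estimates against target RPO/RTO values. It assumes the worst case: the failure lands just before the next backup reaches the DR Region, so data loss equals the backup interval plus the cross-Region copy delay, and downtime equals detection time plus restore time. The `RecoveryPlan` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical description of a DR setup; all values are in minutes."""
    backup_interval: float   # time between backups/snapshots
    replication_lag: float   # time for a backup to land in the DR Region
    detection_time: float    # time to notice the outage and decide to fail over
    restore_time: float      # time to provision and restore in the DR Region


def worst_case_rpo(plan: RecoveryPlan) -> float:
    # Failure just before the next backup arrives in the DR Region:
    # everything written since the previous backup is lost.
    return plan.backup_interval + plan.replication_lag


def worst_case_rto(plan: RecoveryPlan) -> float:
    # Downtime runs from the failure until service is restored.
    return plan.detection_time + plan.restore_time


def meets_objectives(plan: RecoveryPlan, rpo: float, rto: float) -> bool:
    return worst_case_rpo(plan) <= rpo and worst_case_rto(plan) <= rto


# Hourly backups, 15 min cross-Region copy, 10 min detection, 3 h restore.
hourly = RecoveryPlan(backup_interval=60, replication_lag=15,
                      detection_time=10, restore_time=180)
print(worst_case_rpo(hourly))                      # 75
print(worst_case_rto(hourly))                      # 190
print(meets_objectives(hourly, rpo=120, rto=240))  # True
print(meets_objectives(hourly, rpo=30, rto=240))   # False
```

Shrinking the backup interval or the copy delay improves RPO; only faster detection or faster restore improves RTO.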

The “Transportation” Analogy

To understand the four DR models, imagine you are preparing for a flat tire during a road trip:

Backup & Restore: You have a spare tire in the trunk, but you have to pull over and manually change it (High RTO).
Pilot Light: You have a small “donut” tire already mounted on a special fifth wheel that just needs to be inflated to take the load (Lower RTO).
Warm Standby: You are driving a truck that has dual rear wheels. If one pops, the other is already spinning and carrying the load, though you might need to slow down (Very Low RTO).
Multi-Site Active-Active: You are driving two identical cars at the same time. If one disappears, you are already in the other one (Near-zero RTO).

Core Concepts & Comparison

Strategy	RPO	RTO	Cost	Complexity
Backup & Restore	Hours	Hours (can exceed 24h)	$	Low
Pilot Light	Minutes	Tens of minutes	$$	Medium
Warm Standby	Seconds	Minutes	$$$	High
Multi-Site Active-Active	Near zero	Near zero	$$$$	Very High

Decision Matrix: If/Then Scenarios

If the requirement is the lowest possible cost and high downtime is acceptable, then use Backup & Restore (S3 + AWS Backup).
If you need only the core data tier (e.g., a replicated database) running in Region B, with application servers prepared but switched off until failover, then use Pilot Light.
If you need a “business-critical” app to fail over in minutes, with a scaled-down but fully functional copy already running (and able to take some traffic) in Region B, then use Warm Standby.
If the application is “mission-critical” and needs near-zero downtime and data loss, then use Multi-Site Active-Active (Route 53 + Global Accelerator).
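The if/then scenarios above can be turned into a small selection rule that picks the cheapest strategy able to meet both a target RPO and a target RTO. This is a sketch, not an AWS tool: the `Strategy` tuple, the `choose_strategy` name, and especially the "typical" minute values for each strategy are illustrative assumptions, loosely following the comparison table.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    typical_rpo_min: float
    typical_rto_min: float


# Ordered from cheapest to most expensive. The numbers are illustrative
# assumptions, not AWS-published guarantees.
STRATEGIES = [
    Strategy("Backup & Restore", typical_rpo_min=24 * 60, typical_rto_min=24 * 60),
    Strategy("Pilot Light", typical_rpo_min=10, typical_rto_min=60),
    Strategy("Warm Standby", typical_rpo_min=1, typical_rto_min=15),
    Strategy("Multi-Site Active-Active", typical_rpo_min=0, typical_rto_min=0),
]


def choose_strategy(target_rpo_min: float, target_rto_min: float) -> str:
    """Return the cheapest strategy whose typical RPO and RTO meet both targets."""
    for s in STRATEGIES:
        if s.typical_rpo_min <= target_rpo_min and s.typical_rto_min <= target_rto_min:
            return s.name
    raise ValueError("no strategy meets these targets")


print(choose_strategy(target_rpo_min=24 * 60, target_rto_min=48 * 60))  # Backup & Restore
print(choose_strategy(target_rpo_min=30, target_rto_min=120))           # Pilot Light
print(choose_strategy(target_rpo_min=5, target_rto_min=30))             # Warm Standby
print(choose_strategy(target_rpo_min=0, target_rto_min=0))              # Multi-Site Active-Active
```

Because the list is sorted by cost, the first match is also the cheapest; real decisions also weigh budget, compliance, and team skills, which this rule ignores.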

Exam Tips: Golden Nuggets

Route 53: The standard answer for DNS-based failover between Regions, driven by health checks.
Aurora Global Database: Best for cross-region DR with a typical RPO of < 1 second and RTO of < 1 minute.
S3 Cross-Region Replication (CRR): Copies S3 objects to a bucket in another Region; the usual building block for keeping object data physically outside the primary Region.
CloudFormation: Crucial for “Pilot Light” and “Warm Standby” to ensure infrastructure parity in the secondary region.
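To see what Route 53 failover routing looks like in practice, the sketch below builds the `ChangeBatch` argument that boto3's Route 53 `change_resource_record_sets` call accepts: two records with the same name, one marked `PRIMARY` and tied to a health check, the other marked `SECONDARY`. The function name, record name, IP addresses (documentation ranges), and health check ID are hypothetical placeholders; the code only builds the dictionary and does not call AWS.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary_ip: str, secondary_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch holding a PRIMARY/SECONDARY failover pair."""

    def record(set_id: str, role: str, ip: str,
               health_check_id: Optional[str] = None) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": set_id,      # distinguishes records sharing name and type
            "Failover": role,             # "PRIMARY" or "SECONDARY"
            "TTL": ttl,                   # short TTL so resolvers notice the switch quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id is not None:
            # While this check is unhealthy, Route 53 answers with the SECONDARY record.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Cross-Region failover pair",
        "Changes": [
            record("region-a-primary", "PRIMARY", primary_ip, primary_health_check_id),
            record("region-b-secondary", "SECONDARY", secondary_ip),
        ],
    }


batch = failover_change_batch("app.example.com.", "192.0.2.10", "198.51.100.10",
                              "hc-primary-example")
for change in batch["Changes"]:
    rrset = change["ResourceRecordSet"]
    print(rrset["Failover"], rrset["ResourceRecords"][0]["Value"])
# PRIMARY 192.0.2.10
# SECONDARY 198.51.100.10
```

With credentials configured, the batch would be applied with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. Alias records pointing at load balancers are a common alternative to fixed IP addresses.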

Key Services

AWS Backup: Centralized backup management.
Route 53: Failover routing policies.
RDS Read Replicas: Asynchronous cross-Region replication; promote the replica to a standalone primary during failover.
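The failover step for an RDS cross-Region read replica is promotion. The sketch below wraps boto3's RDS `promote_read_replica` call. It assumes the caller passes an RDS client created for the DR Region (for example `boto3.client("rds", region_name=...)`). The helper name, the instance identifier, and the `FakeRDSClient` stand-in (which lets the example run offline) are hypothetical.

```python
from typing import Any


def promote_dr_replica(rds_client: Any, replica_id: str) -> str:
    """Promote a cross-Region RDS read replica to a standalone, writable primary.

    rds_client is expected to be an RDS client for the DR Region.
    Returns the instance status reported right after the request.
    """
    # Promotion stops replication from the old primary; it is a one-way operation.
    response = rds_client.promote_read_replica(DBInstanceIdentifier=replica_id)
    return response["DBInstance"]["DBInstanceStatus"]


class FakeRDSClient:
    """Offline stand-in that records calls instead of contacting AWS (hypothetical)."""

    def __init__(self) -> None:
        self.calls = []  # keyword arguments of each promote_read_replica call

    def promote_read_replica(self, **kwargs: Any) -> dict:
        self.calls.append(kwargs)
        return {"DBInstance": {"DBInstanceIdentifier": kwargs["DBInstanceIdentifier"],
                               "DBInstanceStatus": "modifying"}}


client = FakeRDSClient()
print(promote_dr_replica(client, "orders-db-replica"))  # modifying
print(client.calls)  # [{'DBInstanceIdentifier': 'orders-db-replica'}]
```

After promotion, traffic still has to be pointed at the new primary, for example through a Route 53 record or the application's connection settings.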

Common Pitfalls

Hardcoded IPs: Use DNS names instead for failover.
No Testing: DR plans that aren’t tested regularly tend to fail during real events.
Service Quotas: Forgetting to increase limits in the DR region.
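The service quota pitfall can be caught with a simple parity check before an incident: compare what the workload needs at full scale against the quotas currently granted in the DR Region. In this sketch both inputs are plain dictionaries of hypothetical quota names and numbers; in practice the current values might come from the Service Quotas API (for example `get_service_quota` on boto3's `service-quotas` client). Treating a missing quota as 0 is an assumption that makes gaps visible.

```python
def quota_shortfalls(needed_at_failover: dict, dr_quotas: dict) -> dict:
    """Return, per quota, how many units the DR Region is short of what failover needs.

    Both dicts map a quota name to a number; a quota missing from dr_quotas counts as 0.
    """
    shortfalls = {}
    for name, needed in needed_at_failover.items():
        available = dr_quotas.get(name, 0)
        if needed > available:
            shortfalls[name] = needed - available
    return shortfalls


# Hypothetical numbers: full production scale vs. the DR Region's current quotas.
needed = {"on_demand_vcpus": 512, "elastic_ips": 4, "rds_instances": 6}
dr = {"on_demand_vcpus": 128, "elastic_ips": 5, "rds_instances": 6}
print(quota_shortfalls(needed, dr))  # {'on_demand_vcpus': 384}
```

Any non-empty result is a quota increase to request ahead of time, since approvals can take longer than an outage allows.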

Quick Patterns

S3 to S3: Cross-Region Replication (CRR).
EBS Snapshot Copy: Copy EBS snapshots (stored in Amazon S3 behind the scenes) to other Regions.
AMI Copy: Copy custom AMIs to other Regions so instances can be launched there.
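For the S3-to-S3 pattern, the sketch below builds the `ReplicationConfiguration` argument used by boto3's S3 `put_bucket_replication` call, with one rule replicating the whole bucket (or a prefix) to a bucket in another Region. The optional Replication Time Control settings tighten the replication window, which directly affects RPO for object data. The helper name, IAM role ARN, and bucket names are hypothetical placeholders, and the code only builds the dictionary.

```python
def crr_configuration(role_arn: str, destination_bucket: str,
                      prefix: str = "", enable_rtc: bool = False) -> dict:
    """Build a one-rule ReplicationConfiguration for S3 cross-Region replication."""
    destination = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if enable_rtc:
        # S3 Replication Time Control: designed to replicate most new objects
        # within 15 minutes; it requires replication metrics to be enabled too.
        destination["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        destination["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [{
            "ID": "dr-cross-region",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},                        # "" means the whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},  # required when Filter is used
            "Destination": destination,
        }],
    }


config = crr_configuration("arn:aws:iam::111122223333:role/s3-crr-example",
                           "dr-bucket-example", enable_rtc=True)
print(config["Rules"][0]["Destination"]["Bucket"])  # arn:aws:s3:::dr-bucket-example
```

The configuration would be applied with `boto3.client("s3").put_bucket_replication(Bucket=..., ReplicationConfiguration=config)`; versioning must be enabled on both the source and destination buckets.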

AWS Disaster Recovery (DR) Strategies