AWS Disaster Recovery (DR) Strategies

Mastering RTO and RPO for the SAA-C03 Exam

Overview of Disaster Recovery

Disaster Recovery in AWS focuses on how an organization responds when a service disruption occurs. It is a critical component of the Reliability Pillar of the AWS Well-Architected Framework. The primary goal is to minimize the impact on users by meeting two key metrics:

  • RPO (Recovery Point Objective): Maximum acceptable data loss (measured in time).
  • RTO (Recovery Time Objective): Maximum acceptable downtime before service is restored.
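
To make the two metrics concrete, here is a minimal sketch that measures the data-loss window and the downtime for one incident and compares them with the targets. The timestamps, the targets, and the function names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def measure_recovery(last_backup, outage_start, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    # Data written after the last recoverable copy is lost: compare with RPO.
    data_loss_window = outage_start - last_backup
    # Time users were without the service: compare with RTO.
    downtime = service_restored - outage_start
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True only if both the RPO and the RTO targets were met."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: nightly backup at 02:00, outage at 09:30, restored at 11:00.
last_backup = datetime(2024, 1, 1, 2, 0)
outage_start = datetime(2024, 1, 1, 9, 30)
service_restored = datetime(2024, 1, 1, 11, 0)

loss, down = measure_recovery(last_backup, outage_start, service_restored)
ok = meets_objectives(loss, down, rpo=timedelta(hours=4), rto=timedelta(hours=2))
print(f"data loss window: {loss}, downtime: {down}, objectives met: {ok}")
```

In this made-up incident the downtime (1.5 hours) fits a 2-hour RTO, but 7.5 hours of data is lost, which misses a 4-hour RPO. The fix for that is more frequent backups or replication, not faster recovery.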

The “Transportation” Analogy

To understand the four DR models, imagine you are preparing for a flat tire during a road trip:

  • Backup & Restore: You have a spare tire in the trunk, but you have to pull over and manually change it (High RTO).
  • Pilot Light: You have a small “donut” tire already mounted on a special fifth wheel that just needs to be inflated to take the load (Lower RTO).
  • Warm Standby: You are driving a truck that has dual rear wheels. If one pops, the other is already spinning and carrying the load, though you might need to slow down (Very Low RTO).
  • Multi-Site Active-Active: You are driving two identical cars at the same time. If one disappears, you are already in the other one (Near-Zero RTO).

Core Concepts & Comparison

Strategy           RPO         RTO         Cost   Complexity
Backup & Restore   Hours       24h+        $      Low
Pilot Light        Minutes     Hours       $$     Medium
Warm Standby       Seconds     Minutes     $$$    High
Multi-Site         Near Zero   Near Zero   $$$$   Very High

Decision Matrix: If/Then Scenarios

  • If the requirement is the lowest possible cost and high downtime is acceptable, then use Backup & Restore (S3 + AWS Backup).
  • If you need data continuously replicated to the DR Region but can leave the application servers switched off until a disaster, then use Pilot Light.
  • If you need a “business-critical” app to fail over in minutes, with a scaled-down but fully functional copy already running in Region B, then use Warm Standby.
  • If the application is “mission-critical” and needs near-zero downtime, then use Multi-Site Active-Active (Route 53 + Global Accelerator).
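
The if/then rules above can be sketched as a small lookup that returns the cheapest strategy whose typical RPO and RTO fit the requirement. The numeric bounds below are illustrative assumptions that stand in for the qualitative "Hours / Minutes / Seconds" values in the comparison table. They are not AWS-published figures.

```python
from datetime import timedelta

# (name, typical RPO, typical RTO), ordered from cheapest to most expensive.
# ASSUMPTION: these numbers are placeholders for the table's qualitative values.
STRATEGIES = [
    ("Backup & Restore", timedelta(hours=24), timedelta(hours=48)),
    ("Pilot Light", timedelta(minutes=30), timedelta(hours=4)),
    ("Warm Standby", timedelta(seconds=30), timedelta(minutes=30)),
    ("Multi-Site Active-Active", timedelta(seconds=1), timedelta(seconds=1)),
]


def cheapest_strategy(required_rpo, required_rto):
    """Return the first (cheapest) strategy that satisfies both targets."""
    for name, rpo, rto in STRATEGIES:
        if rpo <= required_rpo and rto <= required_rto:
            return name
    # Nothing is tighter than active-active, so it is the closest available fit.
    return "Multi-Site Active-Active"


print(cheapest_strategy(timedelta(hours=1), timedelta(hours=6)))
```

Ordering the list by cost mirrors how exam questions are usually framed: pick the least expensive option that still meets the stated RTO and RPO.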

Exam Tips: Golden Nuggets

  • Route 53: The usual answer for DNS-based failover between Regions, using a failover routing policy with health checks.
  • Aurora Global Database: Best for cross-region DR with a typical RPO of < 1 second and RTO of < 1 minute.
  • S3 Cross-Region Replication (CRR): The foundation for most DR strategies to ensure data is physically in another region.
  • CloudFormation: Crucial for “Pilot Light” and “Warm Standby” to ensure infrastructure parity in the secondary region.
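
As an illustration of the Route 53 tip, the sketch below builds the change batch for a PRIMARY/SECONDARY failover record pair as a plain Python dictionary. The domain name, IP addresses (taken from documentation ranges), and health check ID are hypothetical placeholders. The code only constructs the structure and makes no AWS call.

```python
def failover_record(name, ip_address, role, health_check_id=None, ttl=60):
    """Build one UPSERT change for a failover-routed A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": role.lower(),  # distinguishes the two records
        "Failover": role,
        "TTL": ttl,  # a short TTL lets clients pick up a failover quickly
        "ResourceRecords": [{"Value": ip_address}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id):
    """Pair a health-checked PRIMARY record with a SECONDARY record."""
    return {
        "Comment": "DNS failover between the primary and DR Regions",
        "Changes": [
            failover_record(name, primary_ip, "PRIMARY", primary_health_check_id),
            failover_record(name, secondary_ip, "SECONDARY"),
        ],
    }


# Hypothetical values for illustration only.
batch = failover_change_batch(
    "app.example.com.", "203.0.113.10", "198.51.100.10", "hc-primary-placeholder"
)
print(batch["Changes"][0]["ResourceRecordSet"]["Failover"])
```

With boto3, a dictionary of this shape would be passed as the ChangeBatch argument of the Route 53 client's change_resource_record_sets call, together with the hosted zone ID.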

Visualizing DR Flow

[Diagram: the Primary Region (Active) runs the App Server and DB. The DB replicates to the DR Region (Standby), where the app server is Idle/Off and the DB is labeled Read.]

Key Services

  • AWS Backup: Centralized backup management.
  • Route 53: Failover routing policies.
  • RDS Read Replicas: Cross-Region asynchronous data replication.

Common Pitfalls

  • Hardcoded IPs: Use DNS names instead for failover.
  • No Testing: DR plans that aren’t tested fail during real events.
  • Service Quotas: Forgetting to increase limits in the DR region.
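
The service quota pitfall is easy to check mechanically. A minimal sketch, assuming the quota values for each Region have already been collected into dictionaries (for example from the Service Quotas API); the quota names and numbers here are made up:

```python
def quota_gaps(primary_quotas, dr_quotas):
    """Return {quota_name: (primary_value, dr_value)} where DR is lower."""
    gaps = {}
    for name, primary_value in primary_quotas.items():
        # A quota missing from the DR Region is treated as zero.
        dr_value = dr_quotas.get(name, 0)
        if dr_value < primary_value:
            gaps[name] = (primary_value, dr_value)
    return gaps


# Hypothetical quota snapshots for illustration only.
primary = {"On-Demand vCPUs": 512, "Elastic IPs": 20, "VPCs": 5}
dr = {"On-Demand vCPUs": 64, "Elastic IPs": 20}

for name, (want, have) in quota_gaps(primary, dr).items():
    print(f"{name}: primary={want}, DR={have}, request an increase")
```

Running a check like this as part of regular DR testing surfaces limits before a real failover tries to scale up the standby Region.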

Quick Patterns

  • S3 to S3: Cross-Region Replication (CRR).
  • EBS to S3: EBS snapshots (stored in Amazon S3) copied to other Regions.
  • AMI Copy: Copy custom images to other Regions, since AMIs are Region-specific.
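
For the S3 to S3 pattern, the sketch below builds a single-rule replication configuration as a plain dictionary. The role ARN, bucket name, and rule ID are hypothetical placeholders, and the code makes no AWS call. Replication also requires versioning to be enabled on both the source and destination buckets.

```python
def replication_configuration(role_arn, destination_bucket, prefix=""):
    """Build a one-rule S3 replication configuration."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "dr-replication",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix matches every object
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Hypothetical identifiers for illustration only.
config = replication_configuration(
    "arn:aws:iam::111122223333:role/example-replication-role",
    "example-dr-bucket",
)
print(config["Rules"][0]["Destination"]["Bucket"])
```

With boto3, a dictionary of this shape would be passed as the ReplicationConfiguration argument of the S3 client's put_bucket_replication call on the source bucket.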
