1. Study Guide: Understanding the Scope of Failure
In the AWS ecosystem, the fundamental goal of a Solutions Architect is to design systems that can withstand failures. This involves a strategic choice between Multi-AZ (Availability Zone) deployments for high availability and Multi-Region deployments for disaster recovery and global performance.
Multi-AZ is like having a spare tire inside your car. If one tire goes flat (an AZ failure), you can swap it immediately and keep driving without much delay. Multi-Region is like having a second car parked in a different city. If your entire city experiences a massive flood (a Regional failure), you can travel to the other city and use the second car to continue your journey.
Core Concepts: The Well-Architected View
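Under the Reliability Pillar of the AWS Well-Architected Framework, you design on the assumption that "everything fails, all the time" (a maxim attributed to Amazon CTO Werner Vogels). The design question is how large a failure the system must survive.
- Multi-AZ: Focuses on High Availability (HA). It protects against data center outages, power failures, or localized networking issues within one Region. Services such as RDS Multi-AZ typically use synchronous replication, so a committed write already exists in a second AZ before it is acknowledged.
- Multi-Region: Focuses on Disaster Recovery (DR) and Business Continuity. It protects against rare but catastrophic events affecting an entire geographic area. It typically uses asynchronous replication, so a failover can lose the most recent writes that had not yet reached the other Region.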
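The difference between the two replication modes can be sketched with a toy model. This is an illustration only, not an AWS API: `ReplicatedStore`, `simulate`, and the record names are hypothetical, and real replication involves networks, quorums, and timeouts that are ignored here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedStore:
    """Toy primary/replica pair (hypothetical, for illustration only)."""
    synchronous: bool
    replica: List[str] = field(default_factory=list)   # records the replica holds
    backlog: List[str] = field(default_factory=list)   # acknowledged, not yet replicated

    def write(self, record: str) -> None:
        if self.synchronous:
            # Synchronous: the replica has the record before the write is acknowledged.
            self.replica.append(record)
        else:
            # Asynchronous: acknowledge now, ship to the replica later.
            self.backlog.append(record)

    def replicate(self) -> None:
        """Drain the asynchronous backlog to the replica."""
        self.replica.extend(self.backlog)
        self.backlog.clear()

    def fail_over(self) -> List[str]:
        """The primary is lost; return the records that never reached the replica."""
        lost = list(self.backlog)
        self.backlog.clear()
        return lost


def simulate(synchronous: bool) -> List[str]:
    store = ReplicatedStore(synchronous=synchronous)
    store.write("order-1")
    store.replicate()        # the replica catches up
    store.write("order-2")   # acknowledged just before the primary fails
    return store.fail_over()


print(simulate(synchronous=True))   # []
print(simulate(synchronous=False))  # ['order-2']
```

The records returned by `fail_over` are the data-loss window. In RPO terms, synchronous replication gives an RPO of zero for committed writes, while asynchronous replication gives an RPO equal to the replication lag at the moment of failure.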
Comparison Table: Design Trade-offs
| Feature | Multi-AZ Design | Multi-Region Design |
|---|---|---|
| Primary Goal | High Availability (HA) & Fault Tolerance | Disaster Recovery (DR) & low latency for globally distributed users |
| Replication | Synchronous (usually) | Asynchronous (usually) |
| Latency between sites | Low (single-digit ms between AZs) | High (tens to hundreds of ms between Regions) |
| Cost | Moderate (inter-AZ data transfer is low-cost, and free for some managed replication such as RDS Multi-AZ) | High (duplicate stacks + cross-Region data transfer) |
| Complexity | Low (largely handled by managed AWS services) | High (requires DNS/traffic routing logic and a data replication strategy) |
Scenario-Based Decision Matrix
- If you need to survive a single data center failure with zero data loss: Use Multi-AZ.
- If you need an RTO of seconds to minutes and an RPO near zero for AZ-level failures: Use Multi-AZ.
- If you need to serve users in Europe and Asia with sub-100ms latency: Use Multi-Region.
- If you must comply with data sovereignty laws requiring data to stay in a specific country: Use Multi-AZ (within that Region).
- If you are protecting against a Region-wide outage of an AWS service: Use Multi-Region.
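The matrix above can be read as a small decision function. The following is a minimal sketch of that reading, not an official AWS rule set: the `Requirements` fields and `recommend_scope` are hypothetical names, and flagging residency plus multi-Region as a conflict is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical requirement flags, one per row of the decision matrix."""
    survive_az_failure: bool = False
    survive_region_failure: bool = False
    global_low_latency: bool = False
    data_must_stay_in_one_region: bool = False


def recommend_scope(req: Requirements) -> str:
    needs_multi_region = req.survive_region_failure or req.global_low_latency
    if needs_multi_region and req.data_must_stay_in_one_region:
        # Assumption: residency rules win, so this combination needs human review.
        return "conflict: residency vs. multi-region"
    if needs_multi_region:
        return "multi-region"
    if req.survive_az_failure or req.data_must_stay_in_one_region:
        # Residency alone maps to Multi-AZ inside the permitted Region.
        return "multi-az"
    return "single-az"


print(recommend_scope(Requirements(survive_az_failure=True)))            # multi-az
print(recommend_scope(Requirements(global_low_latency=True)))            # multi-region
print(recommend_scope(Requirements(data_must_stay_in_one_region=True)))  # multi-az
```

Multi-Region requirements are checked first because a multi-Region design normally also runs Multi-AZ within each Region, so it covers the AZ-failure case as well.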
Exam Tips: Golden Nuggets
- RDS Multi-AZ vs. Read Replicas: Multi-AZ is for HA (synchronous, automatic failover); Read Replicas are for scaling (asynchronous, can be cross-region).
- S3 Durability: S3 Standard is Multi-AZ by default (data is stored across ≥3 AZs). S3 One Zone-IA stores data in a single AZ, so losing that AZ can mean losing the data.
- Route 53: Use Health Checks and Failover Routing to transition traffic between Regions.
- Aurora Global Database: Usually the expected answer for low-RPO cross-Region disaster recovery on Aurora (replication lag is typically under 1 second, so typical RPO is about 1 second).
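As a sketch of the Route 53 tip, the block below builds the `ChangeBatch` payload for a PRIMARY/SECONDARY failover record pair. It only constructs and prints the dictionary. The domain name, the IP addresses (taken from the documentation-reserved ranges), the set identifiers, and the health check ID are all placeholders, and `failover_change` is a hypothetical helper.

```python
import json
from typing import Optional


def failover_change(name: str, role: str, ip: str, set_id: str,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one UPSERT change for a failover-routed A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,   # distinguishes records sharing a name and type
        "Failover": role,
        "TTL": 60,                 # a short TTL lets clients pick up a failover quickly
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Comment": "Failover routing between two Regions (placeholder values)",
    "Changes": [
        failover_change("app.example.com.", "PRIMARY", "192.0.2.10",
                        "primary-region", "hypothetical-health-check-id"),
        failover_change("app.example.com.", "SECONDARY", "198.51.100.10",
                        "secondary-region"),
    ],
}

# With boto3 installed and credentials configured, this payload could be passed as
# the ChangeBatch argument of the Route 53 client's change_resource_record_sets
# call, together with a HostedZoneId. That call is deliberately not made here.
print(json.dumps(change_batch, indent=2))
```

Route 53 answers with the PRIMARY record while its health check passes and switches to the SECONDARY record when the check fails.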
2. Architectural Infographic
Key Services
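- ELB (Elastic Load Balancing): Distributes traffic across targets in multiple AZs.
- ASG (Auto Scaling group): Can span multiple AZs within a Region and replaces failed instances.
- DynamoDB: Global Tables provide multi-Region, multi-active replication.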
Common Pitfalls
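- Hardcoding AZ names: a name such as us-east-1a can map to a different physical zone in each AWS account. Use AZ IDs (e.g. use1-az1), which refer to the same zone in every account.
- Ignoring Cross-Region Data Transfer costs.
- Assuming Multi-AZ protects against application bugs: a bad deployment or a corrupting write is replicated to every AZ.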
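The AZ-name pitfall can be illustrated with two accounts whose names point at different zones. The sketch assumes input shaped like the response of the EC2 `DescribeAvailabilityZones` API, which reports both `ZoneName` and `ZoneId`. The two sample mappings are invented for the example, and `names_for_zone_ids` is a hypothetical helper.

```python
from typing import Dict, List


def names_for_zone_ids(response: dict, zone_ids: List[str]) -> Dict[str, str]:
    """Translate stable AZ IDs into the AZ names valid for one account."""
    by_id = {az["ZoneId"]: az["ZoneName"] for az in response["AvailabilityZones"]}
    return {zone_id: by_id[zone_id] for zone_id in zone_ids}


# Invented sample data: the same physical zones, named differently per account.
account_a = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az1"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az2"},
]}
account_b = {"AvailabilityZones": [
    {"ZoneName": "us-east-1a", "ZoneId": "use1-az2"},
    {"ZoneName": "us-east-1b", "ZoneId": "use1-az1"},
]}

print(names_for_zone_ids(account_a, ["use1-az1"]))  # {'use1-az1': 'us-east-1a'}
print(names_for_zone_ids(account_b, ["use1-az1"]))  # {'use1-az1': 'us-east-1b'}
```

Storing the AZ ID in shared configuration and resolving it to a name per account keeps resources in different accounts in the same physical zone.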
Quick Patterns
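- Active-Passive: The primary Region serves all traffic; fail over to the secondary Region when the primary is unhealthy.
- Active-Active: Serve traffic from both Regions at once, e.g. with Route 53 latency-based or weighted routing.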
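The two patterns differ only in how a Region is chosen for each request. The sketch below imitates that choice in plain Python. It is a simplified model rather than Route 53's actual algorithm, and the function names, Region names, and latency figures are assumptions made for the example.

```python
from typing import Dict, Optional


def active_passive(healthy: Dict[str, bool], primary: str, secondary: str) -> Optional[str]:
    """Send everything to the primary; use the secondary only if the primary is down."""
    if healthy.get(primary, False):
        return primary
    if healthy.get(secondary, False):
        return secondary
    return None  # nothing healthy to route to


def active_active(healthy: Dict[str, bool], latency_ms: Dict[str, float]) -> Optional[str]:
    """Pick the healthy Region with the lowest latency for this user."""
    candidates = [region for region, ok in healthy.items() if ok and region in latency_ms]
    if not candidates:
        return None
    return min(candidates, key=lambda region: latency_ms[region])


health = {"eu-west-1": True, "ap-southeast-1": True}
user_in_asia = {"eu-west-1": 180.0, "ap-southeast-1": 25.0}  # assumed latencies

print(active_passive(health, "eu-west-1", "ap-southeast-1"))  # eu-west-1
print(active_active(health, user_in_asia))                    # ap-southeast-1

health["ap-southeast-1"] = False  # simulate a Regional failure
print(active_active(health, user_in_asia))                    # eu-west-1
```

In active-active, a Regional failure degrades latency for some users instead of triggering a failover event, at the cost of keeping data writable and consistent in both Regions.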