High Availability (HA) Concepts for SAA-C03
In the AWS ecosystem, High Availability refers to a system’s ability to remain operational and accessible even when individual components fail. Unlike “Fault Tolerance,” which aims for zero downtime (often at a high cost), HA focuses on 99.9% to 99.999% uptime by ensuring rapid, automated recovery and redundancy.
Topics covered in this guide:
- Difference between Availability, Durability, and Fault Tolerance.
- Multi-AZ Architecture as the foundation of HA.
- Regional vs. Zonal Service availability.
- Elastic Load Balancing (ELB) and Auto Scaling Groups (ASG).
- Database HA (RDS Multi-AZ vs. Read Replicas).
- The role of Route 53 in HA.
1. Core Concepts & Definitions
Availability vs. Durability
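Students often confuse these two. Availability is the percentage of time a service is “up” and reachable (e.g., S3 Standard is designed for 99.99% availability). Durability is the likelihood that stored data will not be lost (e.g., S3 is designed for 99.999999999%, or “11 nines,” durability). A service can be temporarily unavailable without losing any data, so the two figures are independent.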
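To make the availability percentages concrete, the sketch below converts a target into a yearly downtime budget. The function name `allowed_downtime_minutes` is illustrative, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes of downtime per year")
```

Each extra “nine” cuts the budget by a factor of ten: roughly 526 minutes per year at 99.9%, 53 at 99.99%, and 5 at 99.999%.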
High Availability vs. Fault Tolerance
| Feature | High Availability (HA) | Fault Tolerance (FT) |
|---|---|---|
| Downtime | Minimal (seconds to minutes) | Zero |
| Cost | Moderate | Very High (requires 2x resources) |
| Complexity | Standard Multi-AZ | Specialized (e.g., fully redundant active-active components) |
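
The cost and downtime trade-off in the table comes from redundancy. As a rough sketch, the standard parallel-availability formula shows how much redundant copies help, under the simplifying assumption that the copies fail independently (real AZ failures are not perfectly independent). The function name is illustrative.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a group of redundant copies when any single copy can serve.

    Assumes each copy fails independently of the others.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0 <= component_availability <= 1:
        raise ValueError("component_availability must be between 0 and 1")
    # The group is down only when every copy is down at the same time.
    return 1 - (1 - component_availability) ** copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        print(copies, f"{parallel_availability(0.99, copies):.6f}")
```

Under this assumption, two copies that are each 99% available combine to about 99.99%, which is why adding a second AZ is usually the cheapest large gain.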
2. Architectural Patterns for HA
The Multi-AZ Principle
The gold standard for SAA-C03 is the Multi-AZ deployment. By distributing resources across at least two Availability Zones within a Region, you protect your application from the failure of a single AZ (one or more data centers).
- Compute: Use an ASG with a “Minimum Capacity” of 2, spread across 2 AZs.
- Database: Use RDS Multi-AZ for synchronous replication and automatic failover.
- Networking: Use an Application Load Balancer (ALB) to route traffic only to healthy instances in multiple AZs.
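The compute bullet above can be sketched as a request for the Auto Scaling API. The helper below only builds the parameter dictionary; the keys are the ones boto3's `create_auto_scaling_group` accepts, while the helper name, the default sizes, the grace period, and every ID in the usage note are placeholder assumptions.

```python
def build_asg_request(name, launch_template_id, subnet_ids, target_group_arn,
                      min_size=2, max_size=4):
    """Build parameters for an Auto Scaling group spread over several subnets.

    The caller must supply subnets that live in different AZs; subnet IDs
    alone do not reveal their AZ, so this function cannot verify that.
    """
    if len(subnet_ids) < 2:
        raise ValueError("provide at least two subnets, each in a different AZ")
    if min_size < 2:
        raise ValueError("a minimum capacity below 2 leaves a single point of failure")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        # Comma-separated subnet IDs tell the group which AZs it may use.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Register instances with the load balancer's target group.
        "TargetGroupARNs": [target_group_arn],
        # Replace instances that fail the load balancer health check.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }
```

With credentials configured, the result could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**request)`, where `request` comes from a call such as `build_asg_request("web-asg", "lt-EXAMPLE", ["subnet-aaa", "subnet-bbb"], "arn:EXAMPLE")` using your own identifiers.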
3. Service-Specific HA Strategies
RDS & Aurora
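For the exam, remember: RDS Multi-AZ is for HA and failover (synchronous replication to a standby in another AZ), while Read Replicas are for read performance and scaling (asynchronous replication). Amazon Aurora storage is HA by default, keeping 6 copies of your data across 3 AZs; for fast compute failover, also run at least one Aurora Replica in a different AZ.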
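The practical difference between synchronous and asynchronous replication is what happens to recent writes when the primary fails. The toy model below is not how RDS is implemented; it is a minimal sketch with invented class and method names that only illustrates the acknowledgement order.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """A primary plus one copy (a standby or a read replica)."""
    synchronous: bool
    primary: list = field(default_factory=list)
    copy: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # written but not yet replicated

    def write(self, row):
        self.primary.append(row)
        if self.synchronous:
            # Multi-AZ style: the write is acknowledged only after the copy has it.
            self.copy.append(row)
        else:
            # Read-replica style: the copy catches up later.
            self.pending.append(row)

    def ship_pending(self):
        """Deliver lagging writes to the asynchronous copy."""
        self.copy.extend(self.pending)
        self.pending.clear()

    def fail_primary(self):
        """Lose the primary, promote the copy, and return the rows that were lost."""
        lost = list(self.pending)
        self.pending.clear()
        self.primary = list(self.copy)
        return lost
```

In this model a synchronous database never reports lost rows on failover, while an asynchronous one loses whatever had not been shipped yet, which is why Read Replicas are not a substitute for Multi-AZ.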
S3 & EFS
S3 is a Regional service; it is inherently HA. Amazon EFS (Elastic File System) is also Regional, allowing multiple EC2 instances in different AZs to access the same file system simultaneously.
Exam Tips and Gotchas
- Single Point of Failure (SPOF): Any architecture with a single EC2 instance, a single AZ, or a non-replicated database is NOT Highly Available.
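- NAT Gateways: A NAT Gateway is a zonal resource. For high availability, deploy one NAT Gateway in each AZ and route each AZ's private subnets to the gateway in the same AZ. If the AZ hosting the only NAT Gateway goes down, private subnets in every other AZ lose outbound internet access.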
- Route 53 Health Checks: Route 53 can remove an unhealthy endpoint from its DNS response. This is the first line of defense for HA across Regions.
- Sticky Sessions: Be careful with “Session Affinity” on ALBs. While useful, it can lead to uneven load distribution, potentially impacting HA during a spike.
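The single-point-of-failure and NAT Gateway tips can be checked mechanically. The sketch below uses plain dictionaries with made-up subnet, gateway, and AZ names; it makes no AWS calls and only models which healthy subnets would lose outbound access when one AZ fails.

```python
def subnets_without_egress(private_subnet_az, subnet_nat, nat_az, failed_az):
    """Return private subnets in healthy AZs that lose outbound internet access.

    private_subnet_az: subnet name -> AZ the subnet lives in
    subnet_nat:        subnet name -> NAT Gateway its route table points to
    nat_az:            NAT Gateway name -> AZ the gateway lives in
    """
    affected = []
    for subnet, az in private_subnet_az.items():
        if az == failed_az:
            continue  # the subnet itself is down, so egress is moot
        if nat_az[subnet_nat[subnet]] == failed_az:
            affected.append(subnet)
    return sorted(affected)


subnets = {"private-a": "az-a", "private-b": "az-b"}

# Design 1: one shared NAT Gateway in az-a.
shared = subnets_without_egress(
    subnets, {"private-a": "nat-a", "private-b": "nat-a"}, {"nat-a": "az-a"}, "az-a")

# Design 2: one NAT Gateway per AZ, each subnet routed to its local gateway.
per_az = subnets_without_egress(
    subnets, {"private-a": "nat-a", "private-b": "nat-b"},
    {"nat-a": "az-a", "nat-b": "az-b"}, "az-a")

print(shared)  # ['private-b']
print(per_az)  # []
```

With the shared gateway, the surviving AZ is cut off even though its instances are healthy; with one gateway per AZ, the failure stays contained.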
Decision Matrix / If–Then Guide
| If the requirement is… | Then choose… |
|---|---|
| Automatic failover for a SQL database | RDS Multi-AZ Deployment |
| Shared storage for EC2 across multiple AZs | Amazon EFS |
| Failover to another Region when the primary endpoint is unhealthy | Route 53 with Multi-Region Failover Routing |
| Zero data loss for a critical application | Synchronous Replication (Multi-AZ) |
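
For the Route 53 failover routing option, a minimal sketch of the record pair looks like the following. The dictionary layout follows the `ChangeBatch` structure that Route 53's `ChangeResourceRecordSets` API expects; the helper name, the 60-second TTL, the `SetIdentifier` values, and all arguments are assumptions for illustration.

```python
def failover_change_batch(domain, primary_ip, secondary_ip, primary_health_check_id):
    """Build a ChangeBatch with a PRIMARY and a SECONDARY failover A record."""

    def record(role, ip, health_check_id=None):
        rrset = {
            "Name": domain,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-endpoint",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": 60,         # short TTL so clients pick up a failover quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            # Route 53 stops returning this record when the health check fails.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [
            record("PRIMARY", primary_ip, primary_health_check_id),
            record("SECONDARY", secondary_ip),
        ]
    }
```

With credentials configured, the result could be submitted with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`, supplying your own hosted zone ID and an existing health check ID.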
Infographic: The HA Architecture Flow
- IAM: Control permissions for failover actions.
- CloudWatch: Trigger ASG scaling based on health metrics.
- VPC: Use subnets in different AZs for network isolation.
- Horizontal Scaling: Adding more instances (ideal for HA).
- Vertical Scaling: Increasing instance size (risky for HA, as it requires a restart).
- Cost: Use Reserved Instances for baseline HA capacity and Spot Instances for additional scaling capacity where workload interruption is acceptable.