AWS Database Services: RDS Multi-AZ
Amazon RDS Multi-AZ (Availability Zone) deployments provide enhanced availability and durability for DB instances, making them a natural choice for production environments. In a Multi-AZ deployment, Amazon RDS automatically provisions and maintains a synchronous “standby” replica in a different Availability Zone.
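Enabling Multi-AZ is a single flag when the instance is created. The sketch below assumes the AWS SDK for Python (boto3), whose RDS `create_db_instance` call accepts a `MultiAZ` parameter. The identifiers, engine, instance class, storage size, user name, and subnet group name are hypothetical placeholders. The client is passed in so the function can be exercised with a stub.

```python
from typing import Any, Dict


def multi_az_instance_params(db_id: str, subnet_group: str, master_password: str) -> Dict[str, Any]:
    """Build keyword arguments for boto3's RDS create_db_instance call.

    Engine, instance class, storage size, and user name are hypothetical placeholders.
    """
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": "postgres",                # placeholder engine
        "DBInstanceClass": "db.m6g.large",   # placeholder instance class
        "AllocatedStorage": 100,             # GiB, placeholder size
        "MasterUsername": "dbadmin",         # placeholder user name
        "MasterUserPassword": master_password,
        "DBSubnetGroupName": subnet_group,   # subnet group must span at least 2 AZs
        "MultiAZ": True,                     # RDS provisions a synchronous standby in another AZ
        "StorageEncrypted": True,            # the standby is encrypted along with the primary
    }


def create_multi_az_instance(rds_client: Any, db_id: str, subnet_group: str, master_password: str) -> Dict[str, Any]:
    """Request a Multi-AZ DB instance from a boto3 RDS client (or a compatible stub)."""
    params = multi_az_instance_params(db_id, subnet_group, master_password)
    return rds_client.create_db_instance(**params)
```

Against a real account you would pass `boto3.client("rds")` as `rds_client`. With `StorageEncrypted=True` and no `KmsKeyId`, RDS uses the default AWS managed KMS key for RDS.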
The Real-World Analogy
Imagine a high-end restaurant kitchen. The Primary Instance is your Head Chef. If the Head Chef gets sick (hardware failure), a Standby Chef (Multi-AZ Standby) in a separate kitchen across town takes over the orders. Because the Standby Chef has been following the same recipe in real time (Synchronous Replication), no orders are lost. The customers (your application) notice only a short pause during the switch and keep getting their food via the same delivery address (DNS Endpoint).
Core Concepts & Architecture
When you enable Multi-AZ on a DB instance, RDS creates a primary DB instance and synchronously replicates its data to a standby instance in a different AZ. This does not scale reads or writes; it exists for High Availability (HA) and recovery from an AZ-level failure (Disaster Recovery within a Region).
- Synchronous Replication: Every write to the primary is replicated to the standby before it is acknowledged, so a committed transaction is persisted in both AZs.
- Automatic Failover: If the primary fails, RDS promotes the standby and updates the endpoint's CNAME (DNS record) to point to it. No manual intervention or connection string change is required, but applications must reconnect and should not cache DNS lookups for long.
- No Read Access: In a Multi-AZ DB instance deployment, you cannot read from the standby instance. It is purely “passive.”
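The failover behavior described in the list above can be pictured with a toy, in-memory model (not an AWS API): a fixed endpoint name whose CNAME target moves from the failed primary to the promoted standby. The endpoint and host names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MultiAZEndpoint:
    """Toy model of a Multi-AZ endpoint: one stable DNS name, a movable CNAME target."""
    endpoint: str
    primary_host: str
    standby_host: str
    dns: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The endpoint name initially resolves to the primary host.
        self.dns[self.endpoint] = self.primary_host

    def resolve(self) -> str:
        """Return the host a reconnecting client would be sent to."""
        return self.dns[self.endpoint]

    def failover(self) -> None:
        # Promote the standby and repoint the CNAME at it. In RDS the failed
        # host is replaced and becomes the new standby; the swap models that end state.
        self.primary_host, self.standby_host = self.standby_host, self.primary_host
        self.dns[self.endpoint] = self.primary_host
```

The application only ever knows `endpoint`. After `failover()` the same name resolves to the former standby, which is why no connection string change is needed, although open connections still have to be re-established.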
Multi-AZ Deployment Options vs. Read Replicas
| Feature | Multi-AZ DB Instance | Multi-AZ DB Cluster | Read Replica |
|---|---|---|---|
| Primary Purpose | High Availability / DR | HA + Read Scaling | Read Scaling |
| Replication | Synchronous | Semisynchronous | Asynchronous |
| Standby Usage | Passive (No Access) | Two Readable Standbys | Readable |
| Failover | Automatic (DNS) | Automatic (DNS) | Manual (Promotion) |
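To make the Replication row concrete, here is a toy latency model (an illustration under simplifying assumptions, not how RDS measures anything): a commit is acknowledged after the local write plus whichever standby acknowledgments the mode requires, with standby acknowledgments assumed to run in parallel. The latency figures used with it are made up.

```python
from typing import List


def commit_latency_ms(local_ms: float, replica_ack_ms: List[float], mode: str) -> float:
    """Toy model of when a commit is acknowledged to the client.

    synchronous:     wait for every standby to acknowledge
    semisynchronous: wait for at least one standby to acknowledge
    asynchronous:    acknowledge after the local write only
    Standby acks are assumed to proceed in parallel with the local write.
    """
    if mode == "asynchronous":
        return local_ms
    if not replica_ack_ms:
        raise ValueError("synchronous modes need at least one standby")
    if mode == "synchronous":
        return max(local_ms, max(replica_ack_ms))
    if mode == "semisynchronous":
        return max(local_ms, min(replica_ack_ms))
    raise ValueError(f"unknown mode: {mode}")
```

With a local write of 1.0 ms and standby acks of 2.5 ms and 4.0 ms, the synchronous mode waits 4.0 ms, semisynchronous 2.5 ms, and asynchronous 1.0 ms. That is why an asynchronous Read Replica adds no commit latency but can lag behind the primary.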
Exam Tips and Gotchas
- The “Scale” Trap: If an exam question asks how to scale read performance, Multi-AZ is the wrong answer. Choose Read Replicas. If the question asks for availability, Multi-AZ is the winner.
- Backup Impact: In Multi-AZ deployments, automated backups are taken from the standby, so for MariaDB, MySQL, Oracle, and PostgreSQL the primary's I/O is not suspended during the backup window.
- Latency: Because replication is synchronous, Multi-AZ has slightly higher write and commit latency than a Single-AZ setup.
- DNS Magic: Failover happens by updating the DNS CNAME. Your application code does not need to change its connection string.
- KMS Encryption: If the primary is encrypted, the standby is also encrypted.
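One way to see the "DNS Magic" tip in practice is to rehearse a failover. The sketch below assumes boto3's `reboot_db_instance` with `ForceFailover=True` (valid only for Multi-AZ instances), the `db_instance_available` waiter, and the `Endpoint` and `AvailabilityZone` fields returned by `describe_db_instances`. The instance identifier is hypothetical, and a production runbook would likely also confirm the failover through RDS events.

```python
from typing import Any, Dict


def _describe(rds_client: Any, db_id: str) -> Dict[str, Any]:
    """Fetch the single DB instance record for db_id."""
    return rds_client.describe_db_instances(DBInstanceIdentifier=db_id)["DBInstances"][0]


def rehearse_failover(rds_client: Any, db_id: str) -> Dict[str, Any]:
    """Force a Multi-AZ failover and report whether the endpoint stayed the same."""
    before = _describe(rds_client, db_id)
    # ForceFailover=True reboots by failing over to the standby (Multi-AZ only).
    rds_client.reboot_db_instance(DBInstanceIdentifier=db_id, ForceFailover=True)
    # Block until the instance reports "available" again.
    rds_client.get_waiter("db_instance_available").wait(DBInstanceIdentifier=db_id)
    after = _describe(rds_client, db_id)
    return {
        "endpoint_unchanged": before["Endpoint"]["Address"] == after["Endpoint"]["Address"],
        "az_before": before["AvailabilityZone"],
        "az_after": after["AvailabilityZone"],
    }
```

A successful rehearsal should show the same endpoint address with the primary now running in the other AZ.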
Decision Matrix / If–Then Guide
- IF the requirement is 99.95% availability for a production SQL Server/Oracle DB… THEN use Multi-AZ.
- IF you need to scale reads for a reporting module… THEN use Read Replicas.
- IF you need HA and also want to use the standby for read traffic… THEN use RDS Multi-AZ DB Clusters (available for MySQL and PostgreSQL).
- IF you need to survive a Regional failure… THEN use Cross-Region Read Replicas or Cross-Region Snapshots.
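The If–Then guide above can be encoded as a small rule function. This is a hypothetical helper that mirrors only the four rules listed; the `Requirements` fields and the engine names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Engines the guide lists for Multi-AZ DB clusters.
CLUSTER_ENGINES = {"mysql", "postgres"}


@dataclass
class Requirements:
    """Hypothetical requirement flags for a workload."""
    high_availability: bool = False
    read_scaling: bool = False
    survive_region_failure: bool = False
    engine: str = "postgres"


def recommend(req: Requirements) -> List[str]:
    """Return the options suggested by the guide's If-Then rules."""
    picks: List[str] = []
    if req.survive_region_failure:
        picks.append("Cross-Region Read Replicas or Cross-Region Snapshots")
    if req.high_availability and req.read_scaling and req.engine in CLUSTER_ENGINES:
        # HA plus readable standbys in one deployment
        picks.append("Multi-AZ DB Cluster")
    elif req.high_availability:
        picks.append("Multi-AZ DB Instance")
        if req.read_scaling:
            # The standby is not readable, so reads need separate replicas.
            picks.append("Read Replicas")
    elif req.read_scaling:
        picks.append("Read Replicas")
    return picks
```

For example, an HA requirement on SQL Server yields only a Multi-AZ DB Instance, while HA plus read scaling on MySQL yields a Multi-AZ DB Cluster.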
Topics Covered
Summary of key subtopics covered in this guide:
- Difference between Synchronous and Asynchronous replication.
- Failover mechanisms (DNS/CNAME updates).
- Multi-AZ Instance vs. Multi-AZ Cluster (Readable standbys).
- Comparison between Multi-AZ and Read Replicas.
- Impact of Multi-AZ on performance and backups.
RDS Multi-AZ Architectural Flow
- IAM & KMS: Integrated for access control and encryption at rest.
- Monitoring: A failover is published as an RDS event; subscribe through RDS event notifications (delivered via Amazon SNS) or Amazon EventBridge, and use CloudWatch for instance metrics.
- VPC: Requires a DB Subnet Group spanning at least 2 AZs.
- Write Latency: Slightly higher due to the “double-write” synchronous nature.
- Backups from the Standby: Automated backups are taken from the standby, so the primary's I/O is not suspended (MariaDB, MySQL, Oracle, PostgreSQL), though latency may rise briefly during the backup.
- Instance Scaling: When you change the instance class, RDS modifies the standby first, fails over to it, and then modifies the old primary, so downtime is roughly one failover.
- Pricing: Roughly 2x the cost of a Single-AZ instance (you are paying for two instances).
- Data Transfer: Data transferred between AZs for Multi-AZ replication is free of charge.
- Use Case: Production ERP, financial systems, and any app that needs an RTO of about two minutes or less.
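The VPC requirement can be checked before enabling Multi-AZ. This sketch assumes the response shape of boto3's `describe_db_subnet_groups` (`DBSubnetGroups[0]["Subnets"][i]["SubnetAvailabilityZone"]["Name"]`); the subnet group name is hypothetical.

```python
from typing import Any, Set


def subnet_group_azs(rds_client: Any, group_name: str) -> Set[str]:
    """Return the distinct Availability Zones covered by a DB subnet group."""
    resp = rds_client.describe_db_subnet_groups(DBSubnetGroupName=group_name)
    subnets = resp["DBSubnetGroups"][0]["Subnets"]
    return {s["SubnetAvailabilityZone"]["Name"] for s in subnets}


def supports_multi_az(rds_client: Any, group_name: str) -> bool:
    """Multi-AZ needs a subnet group spanning at least two AZs."""
    return len(subnet_group_azs(rds_client, group_name)) >= 2
```

Counting distinct AZs rather than subnets matters: two subnets in the same AZ do not satisfy the requirement.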