AWS Certified Solutions Architect (SAA-C03)
Study Guide: Chapter 6 – Deep Dive Into AWS Databases
Quick Reference Infographic
📊
Relational (SQL)
Service: Amazon RDS / Aurora
Best For: Complex queries, transactional integrity (ACID), structured data.
Engines: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server.
⚡
Non-Relational (NoSQL)
Service: DynamoDB / DocumentDB
Best For: High scale, flexible schemas, low-latency access, big data (BASE consistency model).
Scaling: Horizontal (Scaling Out).
🚀
Optimization
Caching: ElastiCache (Redis/Memcached).
Read Scaling: Read Replicas (up to 15 per RDS instance for MySQL, MariaDB, and PostgreSQL; up to 5 for Oracle and SQL Server).
HA: Multi-AZ Deployments.
🔍
Big Data & Analytics
Warehouse: Redshift.
Serverless Query: Athena (SQL on S3).
Streaming: Kinesis.
1. Database Fundamentals
OLTP vs. OLAP
| Feature | OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing) |
|---|---|---|
| Data | Current, operational data | Historical data from various sources |
| Use Case | Business tasks (Order entry) | Data mining/Analytics |
| Speed | Fast queries | Slower, complex queries |
| Normalization | Highly normalized | Denormalized (star/snowflake schemas) |
The ACID vs. BASE Properties
- ACID (SQL): Atomicity, Consistency, Isolation, Durability. Ensures “all-or-nothing” reliability.
- BASE (NoSQL): Basically Available, Soft State, Eventual Consistency. Prioritizes availability and scale over immediate consistency.
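The "all-or-nothing" behavior of ACID can be seen in a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a relational engine. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint rejects any update that would overdraw an account.
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction. Returns True on commit."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)       # both updates commit together
failed = transfer(conn, "alice", "bob", 500)  # debit fails, credit is rolled back
print(ok, failed, balances(conn))
```

The second transfer leaves both balances exactly as they were after the first one, which is the atomicity guarantee. A BASE system could instead accept the write and reconcile replicas later.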
2. Amazon RDS (Relational Database Service)
RDS is a managed service that handles provisioning, patching, and backups.
Key Features
- Multi-AZ: Synchronous replication to a standby instance in a different AZ. Used for High Availability: RDS fails over to the standby automatically if the primary becomes unavailable.
- Read Replicas: Asynchronous replication. Used for Scaling Read Performance. A replica can be promoted to a standalone primary instance.
- Storage Types:
  - General Purpose SSD (Balanced)
  - Provisioned IOPS (High performance)
  - Magnetic (Legacy; supported for backward compatibility)
Exam Tip: If the question asks for High Availability, think Multi-AZ. If it asks for Read Performance, think Read Replicas.
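One way an application might put read replicas to work is to send writes to the primary endpoint and spread reads across the replica endpoints. The following is a minimal sketch under stated assumptions: the `EndpointRouter` class and every hostname in it are hypothetical, and the check for `SELECT` is a deliberate simplification.

```python
from itertools import cycle


class EndpointRouter:
    """Send writes to the primary and rotate reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Simplification: treat only statements starting with SELECT as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Hypothetical hostnames, for illustration only.
router = EndpointRouter(
    primary="mydb.example.internal",
    replicas=["mydb-replica-1.example.internal", "mydb-replica-2.example.internal"],
)

queries = (
    "SELECT * FROM orders",
    "SELECT * FROM customers",
    "INSERT INTO orders VALUES (1)",
    "SELECT 1",
)
targets = [router.endpoint_for(q) for q in queries]
print(targets)
```

Because replication to read replicas is asynchronous, a read routed this way can briefly return stale data. Reads that must see the latest write should go to the primary.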
3. Amazon Aurora
AWS’s proprietary cloud-native relational database, compatible with MySQL and PostgreSQL. AWS cites up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL.
- Aurora Serverless: Automatically scales capacity based on demand. Best for infrequent or unpredictable workloads.
- Global Database: One primary Region and up to 5 read-only secondary Regions. Cross-Region replication lag is typically under 1 second.
- Storage: Self-healing storage that replicates 6 copies of data across 3 AZs.
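The 6-copy layout can be made concrete with a small arithmetic sketch. The quorum sizes below (4 of 6 for writes, 3 of 6 for reads) are not stated in this guide. They are taken as an assumption from AWS's published description of Aurora's storage design, and the `availability` helper is hypothetical.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # 6 copies across 3 AZs

# Assumed quorum sizes (from AWS's published Aurora design, not from this guide).
WRITE_QUORUM = 4
READ_QUORUM = 3


def availability(lost_copies):
    """Report whether writes and reads still reach quorum after losing copies."""
    healthy = TOTAL_COPIES - lost_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}


one_az_down = availability(COPIES_PER_AZ)           # lose a whole AZ (2 copies)
az_plus_one_down = availability(COPIES_PER_AZ + 1)  # lose an AZ plus one more copy
print(one_az_down, az_plus_one_down)
```

Under these assumptions, losing an entire AZ still leaves the cluster writable, and losing an AZ plus one additional copy still leaves it readable.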
4. Amazon DynamoDB (NoSQL Deep Dive)
A serverless key-value and document database providing single-digit millisecond latency.
- DAX (DynamoDB Accelerator): In-memory cache for DynamoDB; reduces latency from milliseconds to microseconds.
- Global Tables: Multi-region, multi-active replication.
- Capacity Modes:
  - Provisioned: You specify Read/Write Capacity Units (RCUs/WCUs).
  - On-Demand: Pay per request. Best for unpredictable traffic.
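Sizing provisioned capacity is a common exam calculation. The sketch below assumes the standard DynamoDB unit definitions: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The function names and the sample workloads are hypothetical.

```python
import math

RCU_ITEM_KB = 4  # one RCU reads up to 4 KB per second (strongly consistent)
WCU_ITEM_KB = 1  # one WCU writes up to 1 KB per second


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """RCUs needed; item size is rounded up to the next 4 KB boundary."""
    units_per_read = math.ceil(item_size_kb / RCU_ITEM_KB)
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half as much
    return math.ceil(units_per_read * reads_per_second)


def write_capacity_units(item_size_kb, writes_per_second):
    """WCUs needed; item size is rounded up to the next 1 KB boundary."""
    return math.ceil(item_size_kb / WCU_ITEM_KB) * writes_per_second


strong = read_capacity_units(6, 100)                               # 6 KB items, 100 reads/s
eventual = read_capacity_units(3, 80, strongly_consistent=False)   # 3 KB items, 80 reads/s
writes = write_capacity_units(2.5, 10)                             # 2.5 KB items, 10 writes/s
print(strong, eventual, writes)
```

For the three sample workloads this gives 200 RCUs, 40 RCUs, and 30 WCUs respectively. In On-Demand mode no such sizing is needed, since billing is per request.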
5. Specialized AWS Databases
- Amazon Neptune: Graph database (Social networks, fraud detection).
- Amazon DocumentDB: Managed MongoDB-compatible document database.
- Amazon Keyspaces: Managed Apache Cassandra-compatible database.
- Amazon QLDB: Ledger database (Immutable, cryptographically verifiable logs).
- Amazon ElastiCache:
  - Redis: Complex data types, persistence, pub/sub.
  - Memcached: Simple key-value, no persistence.
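A common way to use a cache such as ElastiCache in front of a database is the cache-aside (lazy loading) pattern: check the cache first, and on a miss read from the database and store the result. The sketch below is illustrative only. An in-process dictionary stands in for Redis or Memcached, and the `CacheAside` class, the `load` function, and the sample data are hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads with a per-entry time to live (TTL)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value); stand-in for the cache

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit: the database is not touched
        value = self._load(key)  # cache miss or expired entry: read the database
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after the underlying record changes."""
        self._entries.pop(key, None)


db = {"user:1": "Ada"}  # stand-in for the database
db_reads = []


def load(key):
    db_reads.append(key)  # record every trip to the database
    return db[key]


cache = CacheAside(load, ttl_seconds=30)
first = cache.get("user:1")   # miss: loads from the database
second = cache.get("user:1")  # hit: served from the cache
print(first, second, len(db_reads))
```

Only the first `get` reaches the database. After a write, calling `invalidate` (or waiting for the TTL) forces the next read to reload fresh data.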
6. Big Data & Analytics Tools
- Amazon Redshift: SQL-based Data Warehouse. Uses columnar storage for OLAP.
- Amazon Athena: Serverless query service to analyze data in S3 using standard SQL.
- AWS Glue: Serverless ETL (Extract, Transform, Load) service.
- Amazon EMR: Big data platform for Hadoop, Spark, and other distributed frameworks.
- Amazon Kinesis: Real-time streaming data collection and processing.
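To see why columnar storage suits OLAP workloads such as those run on Redshift, compare how an aggregate touches data in a row layout and in a column layout. This is a conceptual sketch with hypothetical sample data, not a description of Redshift's internals.

```python
# Row layout: each record is stored together, as in a typical OLTP table.
rows = [
    {"order_id": 1, "region": "us-east-1", "amount": 120},
    {"order_id": 2, "region": "eu-west-1", "amount": 80},
    {"order_id": 3, "region": "us-east-1", "amount": 50},
]

# Column layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(rows):
    # Every whole row is visited even though only `amount` is needed.
    return sum(row["amount"] for row in rows)


def total_column_store(columns):
    # Only the `amount` column is read; the other columns are never touched.
    return sum(columns["amount"])


row_total = total_row_store(rows)
column_total = total_column_store(columns)
print(row_total, column_total)
```

Both functions return the same total, but the columnar version reads one column instead of every field of every row. On wide tables with many rows, that difference is what makes analytical scans cheaper.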