AWS Database Services: Mastering Indexes
In the AWS ecosystem, specifically for the SAA-C03 exam, “Indexes” primarily refer to the mechanisms used to speed up data retrieval in Amazon DynamoDB (NoSQL) and Amazon Redshift (Data Warehouse). While RDS uses standard SQL indexes, DynamoDB’s Global and Local Secondary Indexes are high-probability exam topics.
The Library Analogy
Imagine a massive library. The Primary Key is the book’s unique ISBN. If you want to find books by Author or Genre without walking through every aisle (a “Full Table Scan”), you need an Index—a separate card catalog that organizes the same books by different attributes.
1. DynamoDB Indexes: GSI vs. LSI
DynamoDB offers two types of secondary indexes to allow querying attributes other than the Primary Key.
| Feature | Global Secondary Index (GSI) | Local Secondary Index (LSI) |
|---|---|---|
| Partition Key | Can be different from the base table. | Must be the SAME as the base table. |
| Sort Key | Can be any attribute. | Must be DIFFERENT from the base table. |
| Creation | At any time (Create/Update/Delete). | Only at Table Creation time. |
| Consistency | Eventual Consistency only. | Strong or Eventual Consistency. |
| Scope | Across all partitions (Global). | Limited to a single partition (Local). |
2. Amazon Redshift: Sort Keys
Redshift doesn’t use traditional indexes like Postgres. Instead, it uses Sort Keys to determine the physical order of data on disk, which is vital for “Zone Map” filtering.
- Compound Sort Keys: Best for filters that follow a prefix of the keys (e.g., filter by Year, then Month).
- Interleaved Sort Keys: Gives equal weight to each column in the index. Best for complex queries filtering on multiple different columns regardless of order.
Exam Tips and Gotchas
- The “LSI Constraint”: You cannot add an LSI to an existing DynamoDB table. If the exam scenario asks to add an index to a live table, the answer is always GSI.
- Sparse Indexes: If an item doesn’t have the attribute defined in the index, it isn’t indexed. Use this to create “filtered” views of your data for cost savings.
- Write Heavy Workloads: Every index adds write overhead. If an exam question mentions slow writes, consider if there are too many unnecessary indexes.
- Projected Attributes: In DynamoDB, you can choose to project
ALL,KEYS_ONLY, orINCLUDEspecific attributes into an index. ProjectingALLincreases storage costs.
Decision Matrix / If–Then Guide
- If you need to query by a different Partition Key → Use GSI.
- If you need Strongly Consistent read results from an index → Use LSI.
- If you need to add an index to an existing table → Use GSI.
- If you are performing range queries on a massive OLAP dataset → Use Redshift Sort Keys.
Topics covered:
Summary of key subtopics covered in this guide:
- Difference between Global Secondary Indexes (GSI) and Local Secondary Indexes (LSI).
- Consistency models (Strong vs. Eventual) for DynamoDB indexes.
- Lifecycle management of indexes (Creation/Deletion constraints).
- Redshift Sort Keys (Compound vs. Interleaved) for performance tuning.
- Cost and performance trade-offs of index projections.
Integrations
Indexes integrate with IAM for fine-grained access control (you can restrict access to specific GSIs) and CloudWatch to monitor ConsumedReadCapacityUnits per index.
Scaling
GSI: Has its own provisioned throughput (RCU/WCU). If a GSI is throttled, it can “back-pressure” and throttle writes on the base table.
Optimization
Use Keys_Only projection to minimize storage costs. Only include attributes in the index that are absolutely necessary for the query application.
Production Use Case
Scenario: An e-commerce app stores orders by OrderID (Partition Key). Users need to search their own orders by Email.
Solution: Create a GSI with Email as the Partition Key. This allows efficient lookups without scanning the entire Orders table.