Cost Management: Amazon EC2 Spot Instances
Amazon EC2 Spot Instances let you use spare EC2 capacity in the AWS Cloud at up to a 90% discount compared to On-Demand prices. The trade-off is that AWS can reclaim (interrupt) these instances with a two-minute warning when it needs the capacity back.
Core Concepts & Mechanics
1. Spot Price & Max Price
The Spot price is set by Amazon EC2 and adjusts gradually based on long-term supply and demand for spare capacity in each Spot capacity pool (one instance type in one Availability Zone). You always pay the current Spot price, not your maximum price. Your instance can be interrupted if the Spot price rises above your maximum price or if EC2 needs the capacity back.
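The pricing rules above can be captured in a small model. This is a simplified sketch, not how EC2 computes prices: `SpotPool`, `effective_max_price`, and `hourly_rate` are hypothetical names, and the prices in the test are made up. It shows two facts from the text: you pay the Spot price rather than your maximum, and an unset maximum defaults to the On-Demand price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpotPool:
    """Hypothetical snapshot of one Spot capacity pool (instance type + AZ)."""
    spot_price: float          # current Spot price set by EC2, USD per hour
    capacity_available: bool   # False once EC2 needs the capacity back


def effective_max_price(max_price: Optional[float], on_demand_price: float) -> float:
    # When no maximum price is specified, EC2 uses the On-Demand price.
    return on_demand_price if max_price is None else max_price


def hourly_rate(pool: SpotPool, on_demand_price: float,
                max_price: Optional[float] = None) -> Optional[float]:
    """Return the rate you pay, or None if this simplified model interrupts the instance."""
    limit = effective_max_price(max_price, on_demand_price)
    if pool.spot_price > limit or not pool.capacity_available:
        return None  # interrupted: price above your max, or capacity reclaimed
    return pool.spot_price  # you pay the Spot price, never your max price
```

Note that a maximum price below the current Spot price only causes an interruption. It never lowers what you pay.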
2. Interruption Handling
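When AWS needs the capacity back, it issues a Spot Instance interruption notice two minutes before the interruption. The notice arrives as an Amazon EventBridge (formerly CloudWatch Events) event and through the Instance Metadata Service (IMDS). Use that window to save state or drain connections. The interruption behavior can be terminate (the default), stop, or hibernate. Stop and hibernate are available only for persistent requests, such as persistent Spot requests or fleets of type maintain.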
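On the instance itself, the notice appears at the IMDS path `latest/meta-data/spot/instance-action` as JSON with an `action` field and a UTC `time` field. Until a notice is issued, that path returns HTTP 404. The sketch below assumes IMDSv2 (a session token fetched with `PUT /latest/api/token`) and uses only the standard library. The function names are hypothetical, and the drain logic is left to your application.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional, Tuple

IMDS_BASE = "http://169.254.169.254/latest"


def parse_instance_action(body: str, now: datetime) -> Tuple[str, float]:
    """Parse a spot/instance-action document into (action, seconds remaining)."""
    doc = json.loads(body)
    deadline = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return doc["action"], (deadline - now).total_seconds()


def fetch_instance_action(timeout: float = 1.0) -> Optional[str]:
    """Query IMDSv2; return the raw JSON, or None if no interruption is scheduled."""
    token_request = urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as resp:
        token = resp.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS_BASE}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption notice has been issued yet
            return None
        raise
```

On an instance, you would call `fetch_instance_action()` every few seconds. When it returns a body, pass it to `parse_instance_action(body, datetime.now(timezone.utc))` and start checkpointing or draining. For centralized handling, an EventBridge rule that matches the detail-type `EC2 Spot Instance Interruption Warning` is the off-instance alternative.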
3. Spot Fleets & Diversification
A Spot Fleet is a collection of Spot Instances (and optionally On-Demand Instances) launched to meet a target capacity. To maximize availability, diversify: instead of requesting one specific instance type (e.g., c5.large), specify several instance types across multiple Availability Zones. Each instance type in each Availability Zone is a separate capacity pool, so spreading across pools reduces the risk of all instances being reclaimed at once.
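Diversification amounts to listing one launch-template override per (instance type, subnet) pair. The sketch below builds a `SpotFleetRequestConfig` dictionary in the shape that boto3's `ec2.request_spot_fleet(SpotFleetRequestConfig=...)` accepts. It uses the `capacityOptimized` allocation strategy and a `maintain` fleet so that interrupted capacity is replaced. The launch template ID, subnet IDs, and role ARN are placeholders, and the helper name is hypothetical.

```python
from itertools import product
from typing import Any, Dict, List


def build_spot_fleet_config(
    launch_template_id: str,
    instance_types: List[str],
    subnet_ids: List[str],
    target_capacity: int,
    iam_fleet_role_arn: str,
) -> Dict[str, Any]:
    """Build a diversified Spot Fleet config: one override per (type, subnet) pool."""
    overrides = [
        {"InstanceType": instance_type, "SubnetId": subnet_id}
        for instance_type, subnet_id in product(instance_types, subnet_ids)
    ]
    return {
        "IamFleetRole": iam_fleet_role_arn,
        "TargetCapacity": target_capacity,
        "AllocationStrategy": "capacityOptimized",  # prefer pools with spare capacity
        "Type": "maintain",                          # replace interrupted instances
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": launch_template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }
        ],
    }
```

With three instance types and two subnets in different Availability Zones, the fleet can draw from six capacity pools instead of one.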
Comparison Table: EC2 Purchasing Models
| Feature | On-Demand | Reserved / Savings Plans | Spot Instances |
|---|---|---|---|
| Cost Savings | Baseline (0%) | Up to 72% | Up to 90% |
| Availability | Not interrupted by AWS (launch capacity not reserved) | Not interrupted; capacity reserved only with zonal RIs or a Capacity Reservation | Interruptible (2-minute notice) |
| Best For | Unpredictable short-term workloads | Steady-state production apps | Fault-tolerant, flexible workloads |
| Payment | Per second or per hour, no commitment | 1- or 3-year term: All Upfront, Partial Upfront, or No Upfront | Per second or per hour at the current Spot price |
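To make the "Cost Savings" row concrete, the sketch below applies each model's maximum discount to one instance running for a month. The $0.10/hour On-Demand rate is hypothetical, and the discounts are the "up to" figures from the table, so real savings will usually be lower. The 730-hour month is a common approximation (8,760 hours / 12).

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12

# "Up to" discounts from the table; real savings depend on instance type,
# Region, commitment term, and (for Spot) current market conditions.
MAX_DISCOUNT = {
    "On-Demand": 0.00,
    "Reserved / Savings Plans": 0.72,
    "Spot": 0.90,
}


def monthly_cost(on_demand_hourly: float, discount: float,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of one instance at a given discount off the On-Demand rate."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_hourly * (1.0 - discount) * hours


if __name__ == "__main__":
    price = 0.10  # hypothetical On-Demand rate, USD per hour
    for model, discount in MAX_DISCOUNT.items():
        print(f"{model:<26} ${monthly_cost(price, discount):8.2f} / month")
```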
Decision Matrix: If-Then Guide
- IF the workload is a critical production database THEN use On-Demand or Reserved Instances.
- IF the workload is a stateless web tier that can scale in/out THEN use a mix of On-Demand and Spot.
- IF the workload is a Big Data batch job (EMR) that can resume from checkpoints THEN use Spot Instances.
- IF you need to minimize cost for a non-urgent CI/CD pipeline THEN use Spot Instances.
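The if-then rules above can be encoded as an ordered check, with the most critical condition evaluated first. The `Workload` fields and the function name are hypothetical. The final On-Demand fallback is an assumption that the guide does not state: it covers workloads that match none of the listed rules.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    critical_stateful: bool   # e.g., a critical production database
    stateless_scalable: bool  # e.g., a web tier that scales in/out
    interruptible: bool       # e.g., checkpointed EMR batch jobs, non-urgent CI/CD


def recommend_purchase_model(w: Workload) -> str:
    """Apply the if-then guide in order, most critical rule first."""
    if w.critical_stateful:
        return "On-Demand or Reserved Instances"
    if w.stateless_scalable:
        return "Mix of On-Demand and Spot"
    if w.interruptible:
        return "Spot Instances"
    # Fallback (assumption, not part of the guide): no tolerance for interruption.
    return "On-Demand"
```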
Exam Tips and Gotchas
- The 2-Minute Warning: This is the most commonly tested technical detail. Design your application to handle it gracefully, for example by checkpointing work and draining connections.
- Statelessness: Spot Instances are a poor fit for stateful applications. If your app stores data only on the instance without persisting it elsewhere, an interruption can lose that data.
- Spot Blocks (Discontinued): "Spot Blocks" (Spot Instances with a defined duration) are no longer offered. Focus on standard Spot behavior.
- Interruption Billing: If AWS interrupts a Spot Instance during its first hour, you are not charged for that usage. After the first hour, you pay for the time used. If you stop or terminate the instance yourself, you pay for all the time used, including the first hour.
- Instance Rebalance Recommendation: AWS can signal that a Spot Instance is at elevated risk of interruption. This signal can arrive before the 2-minute interruption notice, giving you more time to act.
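The interruption-billing tip can be expressed as a small function. This is a simplified sketch for an operating system billed per second (such as Linux). It assumes a constant Spot price for the whole run, ignores any minimum billing increment, and uses a hypothetical function name and prices.

```python
SECONDS_PER_HOUR = 3600


def spot_charge(seconds_used: int, spot_price_per_hour: float,
                interrupted_by_aws: bool) -> float:
    """Simplified Spot charge for a per-second-billed OS.

    Assumes a constant Spot price and ignores any minimum billing increment.
    """
    if interrupted_by_aws and seconds_used < SECONDS_PER_HOUR:
        return 0.0  # AWS interrupted during the first hour: no charge
    # Otherwise (you ended it, or it ran past the first hour): pay for time used.
    return spot_price_per_hour * seconds_used / SECONDS_PER_HOUR
```

For example, a 30-minute run at $0.036/hour costs nothing if AWS interrupts it. The same run costs about $0.018 if you terminate it yourself.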
Topics Covered
- Spot Price vs. Max Price mechanics.
- Interruption behavior (Terminate, Stop, Hibernate).
- Spot Instance Interruption Notices (2-minute rule).
- Spot Fleet diversification strategies.
- Integration with Auto Scaling Groups (ASG).
- Cost optimization for Batch, EMR, and CI/CD.
Spot Instance Architecture & Lifecycle
Service Ecosystem
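ASG: Mix Spot and On-Demand Instances in one group with a mixed instances policy, using the “capacity-optimized” (or “price-capacity-optimized”) Spot allocation strategy.
Amazon EMR: Use Spot for task nodes (compute only, no HDFS storage) while keeping the primary (formerly "master") node, and typically the core nodes, On-Demand.
AWS Batch: A managed Spot compute environment selects instance types and launches Spot capacity for you, based on the allocation strategy you choose.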
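For the ASG entry, the On-Demand/Spot mix is controlled by the `InstancesDistribution` section of a `MixedInstancesPolicy`. The sketch below builds that structure as the dictionary boto3's `autoscaling.create_auto_scaling_group(MixedInstancesPolicy=...)` expects. It also includes a helper that computes the resulting split. The launch template name and instance types are placeholders. The split helper is a hypothetical illustration that handles only splits that divide evenly, because it does not model how Auto Scaling rounds other cases.

```python
from typing import Any, Dict, List, Tuple


def build_mixed_instances_policy(
    launch_template_name: str,
    instance_types: List[str],
    on_demand_base: int,
    on_demand_pct_above_base: int,
) -> Dict[str, Any]:
    """MixedInstancesPolicy with a fixed On-Demand base and Spot for the rest."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # Several instance types = several Spot pools to draw from.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


def on_demand_spot_split(desired: int, base: int, pct_above_base: int) -> Tuple[int, int]:
    """Return (on_demand, spot) counts; only exact, evenly divisible cases."""
    above = desired - base
    if above < 0 or (above * pct_above_base) % 100:
        raise ValueError("example only handles splits that divide evenly")
    on_demand = base + above * pct_above_base // 100
    return on_demand, desired - on_demand
```

With a base of 1 and 25% On-Demand above the base, a group of 9 instances runs 3 On-Demand and 6 Spot. The base instance keeps a minimum footprint that interruptions cannot remove.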
Scaling & Reliability
Diversification: Use multiple instance types (e.g., m5.large, r5.large) to increase the chance of fulfillment.
Hibernate: For instances that take a long time to boot or warm up, set the interruption behavior to hibernate. The RAM contents are then saved to the EBS root volume and restored when the instance resumes.
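Interruption behavior is set in the `InstanceMarketOptions` block of an EC2 `RunInstances` call or a launch template. Stop and hibernate require a persistent Spot request, while terminate also works with a one-time request. The sketch below builds that block and picks the request type from the behavior. The helper name is hypothetical. Hibernation has further prerequisites, such as a supported instance type and an encrypted EBS root volume large enough to hold RAM, so check the current EC2 documentation before relying on it.

```python
from typing import Any, Dict

VALID_BEHAVIORS = {"terminate", "stop", "hibernate"}


def spot_market_options(interruption_behavior: str = "terminate") -> Dict[str, Any]:
    """Build an InstanceMarketOptions block for a Spot launch."""
    if interruption_behavior not in VALID_BEHAVIORS:
        raise ValueError(f"unknown interruption behavior: {interruption_behavior}")
    # Stop/hibernate require a persistent request; this sketch uses one-time
    # for terminate, although persistent + terminate is also allowed.
    request_type = "one-time" if interruption_behavior == "terminate" else "persistent"
    return {
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": request_type,
            "InstanceInterruptionBehavior": interruption_behavior,
        },
    }
```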
Cost Optimization
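Production Case: A media company uses Spot Instances for video transcoding, with jobs queued in Amazon SQS. If a node is interrupted before it deletes its message, the message becomes visible again once the visibility timeout expires, and another worker retries the job.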
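The retry in this case comes from SQS's visibility timeout rather than from an explicit retry step. The toy in-memory queue below is a hypothetical stand-in, not the boto3 SQS API. It shows the mechanism: a received message is hidden for a fixed period, and it reappears if the worker never deletes it, for example because its Spot Instance was interrupted.

```python
from typing import Dict, Optional, Tuple


class ToyVisibilityQueue:
    """In-memory stand-in for an SQS queue (hypothetical; not the boto3 API)."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._bodies: Dict[int, str] = {}
        self._visible_at: Dict[int, float] = {}  # time when each message reappears
        self._next_id = 0

    def send(self, body: str) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._bodies[msg_id] = body
        self._visible_at[msg_id] = 0.0
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        for msg_id in sorted(self._bodies):
            if self._visible_at[msg_id] <= now:
                # Hide the message while a worker processes it.
                self._visible_at[msg_id] = now + self.visibility_timeout
                return msg_id, self._bodies[msg_id]
        return None  # nothing visible right now

    def delete(self, msg_id: int) -> None:
        # A worker deletes the message only after the job completes.
        self._bodies.pop(msg_id, None)
        self._visible_at.pop(msg_id, None)
```

In real SQS, the corresponding operations are `ReceiveMessage` and `DeleteMessage`, and the visibility timeout is a queue attribute. Set it longer than a typical job runs. Because the same job can run more than once, make transcoding jobs idempotent.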
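Savings Tip: Leave the maximum price unset unless you have a specific reason to cap it. It then defaults to the On-Demand price, and you still pay only the current Spot price. A lower maximum does not reduce your bill; it only increases the chance of interruption.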