GCP Associate Cloud Engineer: Spot & Preemptible VMs
In Google Cloud, compute costs can be a significant part of your budget. Spot VMs (and their legacy predecessor, Preemptible VMs) offer a way to access Google’s spare compute capacity at a massive discount—often 60-91% cheaper than standard rates—with the trade-off that Google can reclaim that capacity at any time.
The “Standby Passenger” Analogy
Imagine you are flying from New York to London. A Standard VM is like a confirmed ticket; you have a guaranteed seat. A Spot VM is like a “Standby” ticket. You fly on the same plane and use the same amenities for a fraction of the cost, but if a full-paying passenger shows up and the flight is full, the airline asks you to give up the seat. You only have the seat because it was going to be empty anyway. The analogy has one limit: a Spot VM can be reclaimed at any point while it is running, not only before “takeoff.”
Detail Elaboration: How it Works
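Spot VMs use the same machine types and deliver the same performance as standard Compute Engine instances. What differs is the provisioning model and availability policy: Spot VMs have no SLA, cannot live migrate during host maintenance, and can be preempted whenever Compute Engine needs the capacity back, for example to serve standard VMs. Before preempting an instance, Compute Engine sends it a preemption notice.
- Termination Signal: Your VM receives an ACPI G2 Soft Off signal. If the VM has not stopped after 30 seconds, Compute Engine forces it off with an ACPI G3 Mechanical Off signal.
- Shutdown Scripts: You have up to 30 seconds (best effort) to run a shutdown script that saves state, uploads logs, or takes the instance out of a load balancer's rotation.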
- Provisioning Model: You define the VM as “Spot” during creation. Preemptible VMs are the older version with a 24-hour maximum runtime; Spot VMs do not have this time limit.
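The preemption notice and shutdown window described above can be handled from inside the guest. The following is a minimal sketch, assuming a Linux guest where the worker process receives SIGTERM once the OS begins shutting down after the soft-off signal; whether it does depends on how the process is supervised. `Worker`, `GRACE_SECONDS`, and the `save_checkpoint` callback are hypothetical names chosen for illustration, not part of any Google Cloud library.

```python
import signal
import time

GRACE_SECONDS = 30  # assumed cleanup budget, matching the 30-second notice


class Worker:
    """Processes items until asked to stop, then saves its progress."""

    def __init__(self, items, save_checkpoint):
        self.items = list(items)
        self.save_checkpoint = save_checkpoint  # hypothetical callback that persists progress
        self.done = 0
        self.stop_requested = False
        self.deadline = None

    def request_stop(self, signum=None, frame=None):
        # Keep the handler tiny: record the request and the cleanup deadline.
        self.stop_requested = True
        self.deadline = time.monotonic() + GRACE_SECONDS

    def seconds_left(self):
        # Time remaining for cleanup, or None if no stop was requested.
        if self.deadline is None:
            return None
        return max(0.0, self.deadline - time.monotonic())

    def run(self):
        for _ in self.items[self.done:]:
            if self.stop_requested:
                break
            self.done += 1  # placeholder for the real work on one item
        # Save progress whether the loop finished or was interrupted.
        self.save_checkpoint(self.done)
        return self.done


def install_handler(worker):
    # Must be called from the main thread; SIGTERM is what most init
    # systems send to processes during an OS shutdown.
    signal.signal(signal.SIGTERM, worker.request_stop)
```

Checking the flag between items, rather than doing cleanup inside the handler, keeps each unit of work intact and leaves the remaining seconds for saving state.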
Core Concepts & Best Practices
Cost Optimization
Use Spot VMs for workloads where the cost of a “retry” is lower than the cost of “guaranteed uptime.” If redoing interrupted work costs less than the premium you would pay for standard capacity, Spot is the cheaper choice.
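The retry-versus-uptime trade-off above can be made concrete with a little arithmetic. This sketch assumes a simple model in which interruptions force some fraction of the work to be redone; the hourly rates are hypothetical and are not real Google Cloud prices.

```python
def expected_cost(hourly_rate, job_hours, rework_fraction=0.0):
    """Cost of a job when rework_fraction of the work is redone after interruptions."""
    if hourly_rate < 0 or job_hours < 0 or rework_fraction < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_rate * job_hours * (1.0 + rework_fraction)


def spot_is_cheaper(standard_rate, spot_rate, job_hours, rework_fraction):
    """True when Spot stays cheaper even after paying for the redone work."""
    spot_cost = expected_cost(spot_rate, job_hours, rework_fraction)
    return spot_cost < expected_cost(standard_rate, job_hours)


def break_even_rework(standard_rate, spot_rate):
    """Rework fraction at which Spot and standard capacity cost the same."""
    if spot_rate <= 0:
        raise ValueError("spot_rate must be positive")
    return standard_rate / spot_rate - 1.0


# Hypothetical rates: 1.00 per hour standard, 0.30 per hour Spot (a 70% discount).
standard, spot = 1.00, 0.30
print(expected_cost(standard, 10))                # standard cost of a 10-hour job
print(expected_cost(spot, 10, 0.25))              # Spot cost if 25% of the work is redone
print(spot_is_cheaper(standard, spot, 10, 0.25))
print(break_even_rework(standard, spot))
```

Under this model a 70% discount tolerates a large amount of rework before Spot stops paying off, which is why restartable batch jobs are the usual fit. The model ignores the cost of finishing late, which is what rules out deadline-bound work.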
Reliability & Scalability
To use Spot VMs reliably, you must design for fault tolerance. Use Managed Instance Groups (MIGs) with Spot VMs. If one is reclaimed, the MIG will automatically attempt to recreate it when capacity becomes available again.
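As an illustration of how an instance template for such a MIG might mark its VMs as Spot, the sketch below builds the relevant part of a template body as a plain Python dictionary. The field names follow the `scheduling` block of the Compute Engine REST API as best understood here and should be checked against the current API reference. The template name and machine type are hypothetical, and disks and network interfaces are left out.

```python
import json


def spot_scheduling(termination_action="STOP"):
    """Build a scheduling block that requests the Spot provisioning model."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be STOP or DELETE")
    return {
        "provisioningModel": "SPOT",
        "instanceTerminationAction": termination_action,
        # Spot VMs cannot restart automatically or live migrate.
        "automaticRestart": False,
        "onHostMaintenance": "TERMINATE",
    }


def spot_instance_template(name, machine_type):
    """Assemble a partial template body; disks and networks are omitted."""
    return {
        "name": name,
        "properties": {
            "machineType": machine_type,
            "scheduling": spot_scheduling("DELETE"),
        },
    }


# Hypothetical template name and machine type.
template = spot_instance_template("batch-worker-template", "e2-standard-4")
print(json.dumps(template, indent=2))
```

The termination action is set to `DELETE` here on the assumption that the MIG recreates instances from the template, so nothing on a preempted VM needs to be kept.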
Comparison: VM Pricing Models
| Feature | Standard VM | Spot VM | Preemptible VM (Legacy) |
|---|---|---|---|
| Price Discount | 0% (Baseline) | 60-91% Discount | 60-91% Discount |
| Max Runtime | Unlimited | Unlimited (until reclaimed) | 24 Hours |
| Termination Notice | N/A | 30 Seconds | 30 Seconds |
| Availability SLA | High (99.9%+) | None (No SLA) | None (No SLA) |
| Best Use Case | Production Databases | Batch Jobs, CI/CD | Legacy Batch Jobs |
Decision Matrix: When to use Spot?
- IF the task can be interrupted and resumed later THEN use Spot VMs.
- IF the task is a production database with high availability requirements THEN use Standard VMs.
- IF you are running a containerized workload on GKE that is fault-tolerant THEN use Spot Nodes.
- IF you have a strict deadline (e.g., payroll processing) THEN avoid Spot VMs.
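The rules above can be written as a small function. This is only a sketch: the matrix does not say which rule wins when several apply, so the precedence below (availability and deadline constraints first) is an assumption, and the function name and return labels are hypothetical.

```python
def recommend_provisioning(interruptible, strict_deadline=False,
                           ha_database=False, fault_tolerant_gke=False):
    """Map workload traits to a provisioning choice, following the decision matrix."""
    # Assumed precedence: hard constraints rule out Spot before anything else.
    if ha_database or strict_deadline:
        return "standard"
    if fault_tolerant_gke:
        return "spot-nodes"
    if interruptible:
        return "spot"
    # Anything that cannot tolerate interruption stays on standard capacity.
    return "standard"


print(recommend_provisioning(interruptible=True))
print(recommend_provisioning(interruptible=True, strict_deadline=True))
print(recommend_provisioning(interruptible=True, fault_tolerant_gke=True))
```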
Exam Tips: Golden Nuggets
- The 30-Second Rule: Remember that Spot/Preemptible VMs get a 30-second preemption notice. The window is best effort, so treat it as “up to 30 seconds” when designing shutdown scripts. This is a common exam question.
- GPU Support: You can attach GPUs to Spot VMs. The GPUs are billed at the lower Spot GPU price and are reclaimed together with the VM.
- Preemptible vs. Spot: On the ACE exam, “Spot” is the current terminology. If you see “Preemptible,” remember it has a hard 24-hour limit, whereas Spot does not.
- Not for State: Never store persistent data on a Spot VM’s local SSD without a backup strategy, as local SSD data is lost when the VM is preempted.
Spot VM Architecture & Lifecycle
Figure: The Spot VM lifecycle, from provisioning to automatic reclamation.
Key GCP Services
- Compute Engine: Spot VM instances.
- GKE: Spot Node Pools for cost-effective clusters.
- Dataproc: Spot VMs as secondary worker nodes for big data jobs.
Common Pitfalls
- Assuming Spot VMs will always be available (Inventory can run out).
- Forgetting to handle the 30-second shutdown signal.
- Using Spot VMs for sensitive, real-time user transactions.
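One way to avoid missing a preemption is to ask the metadata server whether the instance has been preempted, in addition to relying on the shutdown signal. The sketch below reads the `preempted` value, which reports `TRUE` or `FALSE`. It assumes the code runs on a Compute Engine VM, since the metadata host is not reachable elsewhere. The `fetch` parameter is a hypothetical injection point added here so the logic can be exercised off-VM.

```python
import urllib.request

PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_metadata(url, timeout=5):
    """Read one value from the metadata server; works only inside a Compute Engine VM."""
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def is_preempted(fetch=fetch_metadata):
    """Return True when the metadata server reports that this VM was preempted."""
    return fetch(PREEMPTED_URL).upper() == "TRUE"
```

A worker could call `is_preempted()` between units of work and start its cleanup as soon as it returns `True`.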
Quick Architecture
- MIG + Autoscaler: Set “Spot” in the Instance Template.
- Load Balancer: Ensure “Connection Draining” is configured.
- Cloud Storage: Checkpoint data frequently to GCS.
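The checkpointing idea can be sketched as a job that records its progress and resumes from the last record after a preemption. To stay self-contained, this sketch writes the checkpoint to a local file as a stand-in for a Cloud Storage object; the function names, the JSON layout, and the `checkpoint_every` interval are all hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write to a temp file, then rename, so an interrupted write leaves the old checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"next_index": next_index}, handle)
    os.replace(tmp_path, path)


def process(items, path, handle_item, checkpoint_every=10):
    """Process items from the last checkpoint onward, checkpointing periodically."""
    start = load_checkpoint(path)
    for index in range(start, len(items)):
        handle_item(items[index])
        if (index + 1) % checkpoint_every == 0:
            save_checkpoint(path, index + 1)
    # Record completion so a rerun does no further work.
    save_checkpoint(path, len(items))
```

Items handled after the most recent checkpoint are processed again on resume, so `handle_item` should be safe to repeat. A smaller `checkpoint_every` reduces the repeated work at the cost of more frequent writes.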