GCP Associate Cloud Engineer: Spot & Preemptible VMs

In Google Cloud, compute costs can be a significant part of your budget. Spot VMs (and their legacy predecessor, Preemptible VMs) offer a way to access Google’s spare compute capacity at a massive discount—often 60-91% cheaper than standard rates—with the trade-off that Google can reclaim that capacity at any time.

The “Standby Passenger” Analogy

Imagine you are flying from New York to London. A Standard VM is like a confirmed ticket; you have a guaranteed seat. A Spot VM is like a “Standby” ticket: you fly on the same plane with the same amenities for a fraction of the cost, but if the flight fills up with full-fare passengers, the airline gives your seat away. You only had the seat because it would otherwise have gone empty. The analogy breaks down in one respect: a Spot VM can be reclaimed while it is already running, not just before it starts.

How It Works

Spot VMs use the same machine types and deliver the same performance as standard Compute Engine instances. The difference lies in the availability policy: Spot VMs carry no SLA, and when Google Cloud needs that capacity back (for example, for a customer paying full price), it sends a preemption notice to your instance.

  • Termination Signal: Your VM receives an ACPI G2 Soft Off signal 30 seconds before it is shut down.
  • Shutdown Scripts: You have 30 seconds to run a shutdown script to save state, upload logs, or notify a load balancer.
  • Provisioning Model: You define the VM as “Spot” during creation. Preemptible VMs are the older version with a 24-hour maximum runtime; Spot VMs do not have this time limit.
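
The 30-second window described above can be handled inside an application as well as in a shutdown script. The following is a minimal Python sketch, assuming the guest OS turns the ACPI soft-off into a SIGTERM for running processes (typical on Linux, but an assumption here). The names `run_cleanup`, `handle_sigterm`, and the 5-second safety margin are illustrative, not part of any Google API.

```python
import signal
import threading
import time

NOTICE_SECONDS = 30.0  # notice period described in the list above
SAFETY_MARGIN = 5.0    # assumption: keep headroom before the hard power-off

stop_event = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only set a flag; the main loop does the real cleanup."""
    stop_event.set()


def install_handler():
    """Register the handler; call this once at program start."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_cleanup(tasks, budget=NOTICE_SECONDS - SAFETY_MARGIN, clock=time.monotonic):
    """Run (name, callable) tasks in priority order until the budget is spent.

    Returns the names of the tasks that were started and finished.
    A task that is already running is not interrupted, so keep each one short.
    """
    deadline = clock() + budget
    completed = []
    for name, task in tasks:
        if clock() >= deadline:
            break  # out of time: skip lower-priority tasks
        task()
        completed.append(name)
    return completed
```

A worker would call `install_handler()` at startup, check `stop_event.is_set()` between units of work, and then call `run_cleanup` with the most important task (for example, saving state) listed first.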

Core Concepts & Best Practices

Cost Optimization

Use Spot VMs for workloads where the cost of a “retry” is lower than the cost of “guaranteed uptime.” In other words, the discount must outweigh the extra compute time spent redoing work that was lost to preemptions.
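
The retry trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly rate, the number of preemptions, and the rework time per preemption are hypothetical inputs, and the model ignores anything other than billed compute hours (for example, the cost of finishing late).

```python
def spot_job_cost(standard_hourly_rate, discount, job_hours,
                  preemptions, rework_hours_each):
    """Estimated Spot cost when each preemption forces some work to be redone."""
    spot_rate = standard_hourly_rate * (1.0 - discount)
    billed_hours = job_hours + preemptions * rework_hours_each
    return spot_rate * billed_hours


def break_even_rework_hours(discount, job_hours):
    """Total rework hours at which Spot costs the same as Standard.

    Derived from (1 - d) * (H + R) = H, which gives R = H * d / (1 - d).
    """
    return job_hours * discount / (1.0 - discount)


# Hypothetical numbers: $1.00/hour standard rate, 60% discount,
# a 10-hour job preempted 3 times, losing 0.5 hours of work each time.
standard = 1.00 * 10
spot = spot_job_cost(1.00, 0.60, 10, preemptions=3, rework_hours_each=0.5)
print(f"standard: ${standard:.2f}, spot: ${spot:.2f}")
```

Under these assumed numbers the Spot run bills 11.5 hours at 40% of the standard rate, so it stays cheaper. At a 60% discount, a 10-hour job would need about 15 hours of redone work before Standard became the cheaper option.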

Reliability & Scalability

To use Spot VMs reliably, you must design for fault tolerance. Use Managed Instance Groups (MIGs) with Spot VMs. If one is reclaimed, the MIG will automatically attempt to recreate it when capacity becomes available again.
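
A MIG takes its VM settings from an instance template, so the Spot choice ends up in the template's scheduling settings. The Python sketch below builds that fragment as a plain dictionary. The field names follow the `scheduling` block of the Compute Engine instance resource as understood at the time of writing; treat them as assumptions and verify them against the current API reference. The helper function names are hypothetical.

```python
def spot_scheduling(termination_action="STOP"):
    """Scheduling fragment for a Spot VM (field names assumed from the Compute Engine API)."""
    if termination_action not in ("STOP", "DELETE"):
        raise ValueError("termination_action must be 'STOP' or 'DELETE'")
    return {
        "provisioningModel": "SPOT",
        # What happens to the VM when it is preempted.
        "instanceTerminationAction": termination_action,
        # Spot VMs cannot restart automatically or live-migrate.
        "automaticRestart": False,
        "onHostMaintenance": "TERMINATE",
    }


def legacy_preemptible_scheduling():
    """Scheduling fragment for a legacy Preemptible VM (24-hour maximum runtime)."""
    return {
        "preemptible": True,
        "automaticRestart": False,
        "onHostMaintenance": "TERMINATE",
    }


print(spot_scheduling("DELETE"))
```

The default of `"STOP"` here is only an illustrative choice. Either way, the VM does not restart on its own; recreating it is the MIG's job.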

Comparison: VM Pricing Models

Feature            | Standard VM          | Spot VM                     | Preemptible VM (Legacy)
Price Discount     | 0% (Baseline)        | 60-91% Discount             | 60-91% Discount
Max Runtime        | Unlimited            | Unlimited (until reclaimed) | 24 Hours
Termination Notice | N/A                  | 30 Seconds                  | 30 Seconds
Availability SLA   | High (99.9%+)        | None (No SLA)               | None (No SLA)
Best Use Case      | Production Databases | Batch Jobs, CI/CD           | Legacy Batch Jobs

Decision Matrix: When to use Spot?

  • IF the task can be interrupted and resumed later THEN use Spot VMs.
  • IF the task is a production database with high availability requirements THEN use Standard VMs.
  • IF you are running a containerized workload on GKE that is fault-tolerant THEN use Spot Nodes.
  • IF you have a strict deadline (e.g., payroll processing) THEN avoid Spot VMs.
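
The rules above can be written as a small Python function for self-testing. This is a study aid, not an official decision procedure: the function name, the parameter names, and the order in which the rules are checked (availability and deadline requirements win over cost) are assumptions of this sketch.

```python
def recommend_capacity(interruptible,
                       needs_high_availability=False,
                       strict_deadline=False,
                       fault_tolerant_gke=False):
    """Map the decision matrix to a recommendation string."""
    # Hard requirements come first: HA databases and strict deadlines avoid Spot.
    if needs_high_availability or strict_deadline:
        return "Standard VM"
    # Fault-tolerant containerized workloads on GKE use Spot nodes.
    if fault_tolerant_gke:
        return "GKE Spot nodes"
    # Anything that can be interrupted and resumed later is a Spot candidate.
    if interruptible:
        return "Spot VM"
    return "Standard VM"


print(recommend_capacity(interruptible=True))                        # batch job
print(recommend_capacity(interruptible=True, strict_deadline=True))  # payroll run
```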

Exam Tips: Golden Nuggets

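  • The 30-Second Rule: Always remember that Spot/Preemptible VMs give 30 seconds of notice. Treat this as an upper bound rather than a guarantee: a shutdown script that needs longer will be cut off. This is a common exam question regarding shutdown scripts.
  • GPU Support: You can attach GPUs to Spot VMs. The GPUs are billed at the lower Spot price and are reclaimed together with the VM.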
  • Preemptible vs. Spot: On the ACE exam, “Spot” is the current terminology. If you see “Preemptible,” remember it has a hard 24-hour limit, whereas Spot does not.
  • Not for State: Never store persistent data on a Spot VM’s local SSD without a backup strategy, as local SSD data is lost when the VM is preempted.

Spot VM Architecture & Lifecycle

Request Spot VM → VM Running → Preemption Notice Issued (30s) → Terminated

The Spot VM Lifecycle: From Provisioning to Automatic Reclamation

Key GCP Services

  • Compute Engine: Spot VM instances.
  • GKE: Spot Node Pools for cost-effective clusters.
  • Dataproc: Use Spot VMs for secondary worker nodes in big data clusters.

Common Pitfalls

  • Assuming Spot VMs will always be available (spare capacity can run out).
  • Forgetting to handle the 30-second shutdown signal.
  • Using Spot VMs for sensitive, real-time user transactions.

Quick Architecture

  • MIG + Autoscaler: Set “Spot” in the Instance Template.
  • Load Balancer: Ensure “Connection Draining” is configured.
  • Cloud Storage: Checkpoint data frequently to GCS.
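
The checkpointing item above can be sketched as a resume-from-checkpoint loop in Python. To keep the example self-contained, a local JSON file stands in for a Cloud Storage object; in a real deployment the two checkpoint functions would read and write GCS instead. All function names and the `every` interval are hypothetical.

```python
import json
import os


def load_checkpoint(path):
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write to a temporary file, then rename, so a preemption cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process_all(items, path, handle, every=10, should_stop=lambda: False):
    """Process items, checkpointing every `every` items and on a stop request.

    Returns the index of the next unprocessed item.
    """
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        if should_stop():
            save_checkpoint(path, i)
            return i
        handle(items[i])
        if (i + 1) % every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, len(items))
    return len(items)
```

When a replacement VM starts the same job, `load_checkpoint` lets it continue from the last saved index instead of from the beginning. A smaller `every` value loses less work per preemption at the price of more checkpoint writes.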
