ACE Exam Guide: Autoscaling & Load Distribution

Mastering Managed Instance Groups and Cloud Load Balancing

Topic Overview

In Google Cloud, Autoscaling and Load Distribution are the twin pillars of high availability and cost-efficiency. Autoscaling ensures you have exactly the number of Compute Engine resources needed to handle current demand, while Load Balancing acts as the traffic cop, directing incoming requests to the healthiest and closest available resources.

The “Busy Restaurant” Analogy

Imagine a popular restaurant. Autoscaling is like the manager calling in extra waiters when a line forms outside, and sending them home when the rush ends to save on wages. Load Distribution is the hostess at the front door; she doesn't let everyone crowd one table, but seats guests across all open tables so no single waiter is overwhelmed.

Detail Elaboration: Managed Instance Groups (MIGs)

To use autoscaling in GCP, you must use Managed Instance Groups (MIGs). A MIG uses an Instance Template to create identical VMs. Autoscaling works based on policies you define:

  • CPU Utilization: Scale when average CPU across the group hits a threshold (e.g., 60%).
  • Load Balancing Capacity: Scale to keep each backend at a target fraction of the serving capacity reported by the Load Balancer.
  • Cloud Pub/Sub: Scale based on the number of undelivered messages in a subscription, exposed as a Cloud Monitoring metric (great for batch processing).
  • Custom Metrics: Scale based on application-specific data via Cloud Monitoring.
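As a minimal sketch of the CPU-based policy above, you can attach an autoscaler to an existing MIG with a single gcloud command. The group name (`web-mig`) and zone are placeholders for illustration:

```shell
# Attach a CPU-based autoscaling policy to an existing MIG.
# "web-mig" and the zone are placeholder values; adjust for your project.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --target-cpu-utilization=0.60 \
    --cool-down-period=90
```

Note that `--target-cpu-utilization` takes a fraction (0.60 = 60%), and `--cool-down-period` is in seconds.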

Core Concepts & Best Practices

The “What” and “Why”

  • Reliability: Health checks automatically recreate failed instances (Self-healing).
  • Scalability: Scale out (add more VMs) rather than scaling up (making one VM bigger) to handle web-scale traffic.
  • Cost Optimization: Use Predictive Autoscaling to look at historical data and spin up VMs before the rush starts.
  • Operational Excellence: Use Blue/Green or Rolling Updates within MIGs to deploy code without downtime.
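A rolling update within a MIG can be sketched as follows; `my-mig` and `web-template-v2` are hypothetical names, and the surge/unavailable settings shown here keep capacity constant during the rollout:

```shell
# Roll the MIG to a new instance template with zero planned downtime:
# up to 3 extra VMs are created at a time (--max-surge), and no
# existing VM is taken offline before its replacement is healthy.
gcloud compute instance-groups managed rolling-action start-update my-mig \
    --zone=us-central1-a \
    --version=template=web-template-v2 \
    --max-surge=3 \
    --max-unavailable=0
```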

Comparison: Google Cloud Load Balancers

| Feature            | External HTTP(S)   | Network TCP/UDP     | Internal HTTP(S)   |
|--------------------|--------------------|---------------------|--------------------|
| Scope              | Global or Regional | Regional            | Regional           |
| Traffic Type       | Layer 7 (Web)      | Layer 4 (Non-HTTP)  | Layer 7 (Private)  |
| Proxy/Pass-through | Proxy              | Pass-through        | Proxy              |
| Use Case           | Websites, APIs     | Gaming, VoIP        | Microservices      |

Decision Matrix: Which Service to Choose?

  • If your traffic is global web traffic (Port 80/443), Then use Global External HTTP(S) Load Balancing.
  • If you need to preserve the Client IP address directly at the VM, Then use Network TCP/UDP Load Balancing (Pass-through).
  • If you want to scale based on a queue of work, Then use Pub/Sub-based Autoscaling.
  • If you need high availability across regions, Then use Global Load Balancing with backends in multiple regions.
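The queue-of-work case from the matrix can be sketched with a Cloud Monitoring metric on a worker MIG. The names (`worker-mig`, subscription `work-queue`) are assumptions; the `single-instance-assignment` value means "each VM can handle ~100 queued messages":

```shell
# Scale a worker MIG on Pub/Sub backlog via a Cloud Monitoring metric.
# "worker-mig" and the "work-queue" subscription are placeholder names.
gcloud compute instance-groups managed set-autoscaling worker-mig \
    --zone=us-central1-a \
    --min-num-replicas=0 \
    --max-num-replicas=20 \
    --update-stackdriver-metric=pubsub.googleapis.com/subscription/num_undelivered_messages \
    --stackdriver-metric-single-instance-assignment=100 \
    --stackdriver-metric-filter='resource.type = "pubsub_subscription" AND resource.labels.subscription_id = "work-queue"'
```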

Exam Tips: Golden Nuggets

  • Distractor Alert: Unmanaged Instance Groups cannot autoscale. If the question mentions autoscaling, the answer must involve a MIG.
  • Cooldown Period: If instances are being created/deleted too fast (flapping), adjust the “Cooldown period” to allow instances to finish booting before scaling again.
  • Health Checks: Always ensure your Firewall rules allow the GCP Load Balancer IP ranges (35.191.0.0/16 and 130.211.0.0/22) to reach your instances.
  • Instance Templates: These are immutable. To change a MIG’s configuration, you must create a new template and update the MIG.
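The health-check tip above translates into one firewall rule. The rule name, network, and target tag below are placeholders, but the two source ranges are the documented Google health-check/probe ranges:

```shell
# Allow Google's health-check and LB probe ranges to reach backends
# on port 80. Rule name, network, and target tag are placeholders.
gcloud compute firewall-rules create allow-lb-health-checks \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80 \
    --source-ranges=35.191.0.0/16,130.211.0.0/22 \
    --target-tags=web-backend
```

Without this rule, health checks fail, the backends are marked unhealthy, and the load balancer serves no traffic even though the VMs are running.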

Architectural Flow: Request to Resource

User → Cloud Load Balancer → Managed Instance Group (monitored and resized by the Autoscaler)

Key GCP Services

  • MIGs: Group of identical VMs.
  • Instance Templates: The “Blueprint”.
  • Health Checks: Monitor app status.
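The three services above fit together in two commands: first the immutable "blueprint", then the group stamped from it. All names, the machine type, and the image family are illustrative assumptions:

```shell
# 1. Create an immutable instance template (the "blueprint").
gcloud compute instance-templates create web-template-v1 \
    --machine-type=e2-medium \
    --image-family=debian-12 \
    --image-project=debian-cloud \
    --tags=web-backend

# 2. Create a MIG of three identical VMs stamped from that template.
gcloud compute instance-groups managed create web-mig \
    --zone=us-central1-a \
    --template=web-template-v1 \
    --size=3
```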

Common Pitfalls

  • Forgetting to allow Firewall traffic for Health Checks.
  • Setting min-instances to 0 for production web apps (the first requests after a quiet period wait for a VM to boot).
  • Using Regional LB for Global traffic.

Architecture Patterns

  • Multi-Region: Global HTTP(S) LB + MIGs in us-east1 and europe-west1.
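The multi-region pattern can be sketched by attaching regional MIGs to one global backend service. All resource names are placeholders, and the MIGs and health check are assumed to already exist:

```shell
# One global backend service fronting MIGs in two regions.
# "web-backend", "web-hc", "mig-east", and "mig-eu" are placeholders.
gcloud compute backend-services create web-backend --global \
    --protocol=HTTP \
    --health-checks=web-hc \
    --load-balancing-scheme=EXTERNAL_MANAGED

gcloud compute backend-services add-backend web-backend --global \
    --instance-group=mig-east --instance-group-zone=us-east1-b

gcloud compute backend-services add-backend web-backend --global \
    --instance-group=mig-eu --instance-group-zone=europe-west1-b
```

The global HTTP(S) load balancer then routes each user to the closest healthy backend, failing over across regions automatically.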
  • Internal: Internal LB for DB tier.
