🚀 Autoscaling in Kubernetes: HPA vs VPA vs KEDA — Explained from Basics to Pro
When you run applications in Kubernetes, one of your biggest concerns is:
“How do I scale my app to handle more traffic automatically — without over-provisioning?”
That’s where Kubernetes autoscaling comes in.
In this blog, we’ll dive deep into HPA, VPA, and KEDA — three of the most important autoscaling mechanisms in the Kubernetes world. You’ll learn:
- What each one does
- When (and when not) to use them
- How they compare
- Real-world examples
- YAML samples you can adapt

☸️ What is Autoscaling in Kubernetes?
Autoscaling lets your Kubernetes cluster adjust workloads dynamically based on metrics like:
- CPU or memory usage
- Queue depth
- Number of requests
- Custom metrics
- External events (messages, schedules)
Without autoscaling, you’d have to manually add or remove pods, which defeats the whole point of container orchestration.
🔄 1. Horizontal Pod Autoscaler (HPA)
📌 What is HPA?
HPA automatically adds or removes pod replicas in a Deployment, ReplicaSet, or StatefulSet based on CPU usage, memory usage, or custom metrics.
Think of it as:
“When load increases, spin up more pods. When load drops, scale down.”
✅ Use Case:
- Web apps with fluctuating user traffic
- APIs with request-based workloads
🔧 How It Works:
- HPA controller watches pod metrics (from metrics-server)
- Compares current usage with the target
- Adjusts replicas up or down
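Under the hood, the controller uses the proportional formula from the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). So 2 replicas averaging 120% CPU against a 60% target become ceil(2 × 120 / 60) = 4 replicas.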
🧪 Example YAML:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
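If scale-down feels too jumpy, autoscaling/v2 also lets you shape how fast HPA reacts. A minimal sketch (the window and rate here are illustrative, not recommendations) that would sit under spec: in the HPA above:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # look back 5 minutes before scaling down
    policies:
    - type: Percent
      value: 50                      # remove at most 50% of replicas...
      periodSeconds: 60              # ...per minute
```

This is how you tame the “flapping” problem covered in the gotchas below.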
📏 2. Vertical Pod Autoscaler (VPA)
📌 What is VPA?
While HPA scales out (pods), VPA adjusts the resources of a single pod — it increases or decreases CPU and memory requests/limits for containers.
“Make the pod stronger instead of multiplying it.”
✅ Use Case:
- Backend jobs or batch workloads
- ML training tasks
- Apps whose resource needs drift over time but don’t benefit from more replicas
🔧 How It Works:
- VPA monitors pod performance
- Suggests or updates resource settings
- Can either just recommend (updateMode: "Off") or automatically apply changes — note that "Auto" mode applies new requests by evicting and recreating pods
🧪 Example YAML:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
```
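You can also put guardrails on what VPA is allowed to request. A hedged sketch using resourcePolicy (the bounds are illustrative) that would sit under spec: in the VPA above:

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: "*"    # apply to every container in the pod
    minAllowed:
      cpu: 100m
      memory: 128Mi
    maxAllowed:
      cpu: "2"
      memory: 4Gi
```

Bounding maxAllowed keeps a misbehaving workload from being vertically scaled into your whole node.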
⚡ 3. KEDA (Kubernetes-based Event-Driven Autoscaler)
📌 What is KEDA?
KEDA enables event-driven scaling in Kubernetes — scaling based on:
- Kafka topic lag
- RabbitMQ queue depth
- Azure Blob count
- AWS SQS, Google Pub/Sub, Prometheus queries, etc.
It’s perfect for workloads that don’t rely on CPU or memory but respond to external triggers.
✅ Use Case:
- Serverless, event-driven apps
- Message consumers, background workers
- Event-based microservices (e.g., IoT, stream processors)
🔧 How It Works:
- Uses Scalers (prebuilt integrations)
- Deploys a ScaledObject to define autoscaling logic
- Works with HPA under the hood, but enables external metrics
🧪 Example YAML (for Kafka):
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  pollingInterval: 30   # check Kafka every 30s
  cooldownPeriod: 60    # wait 60s after the last active trigger before scaling to zero
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: my-cluster-kafka:9092
      consumerGroup: my-consumer-group  # required so KEDA can compute lag (name illustrative)
      topic: my-topic
      lagThreshold: "100"
```
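Roughly speaking, KEDA sizes the consumer at totalLag / lagThreshold replicas — 1,500 messages of lag with a threshold of 100 means 15 pods — clamped between minReplicaCount and maxReplicaCount. For Kafka specifically, KEDA by default won’t scale past the topic’s partition count (allowIdleConsumers: false), since extra consumers in a group would sit idle anyway.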
🔍 Comparison: HPA vs VPA vs KEDA
| Feature | HPA | VPA | KEDA |
|---|---|---|---|
| Scales By | Pod count | CPU/memory requests/limits | Pod count (down to zero) |
| Direction | Horizontal | Vertical | Horizontal |
| Metrics Type | Resource/Custom | Resource usage | External sources (Kafka, SQS) |
| Update Frequency | Fixed control loop (~15s) | Periodic recommender, applied via pod eviction | Per-trigger polling (pollingInterval) |
| Use With | Web apps, APIs | Batch jobs, DBs | Message queues, IoT, Serverless |
| Built-In? | ✅ Native | ✅ Native | ❌ External (install via Helm) |
🧠 Can You Combine Them?
Yes!
✅ HPA + VPA (with caveats):
- You can run both, but don’t point them at the same signal: if HPA scales on CPU or memory, VPA rewriting those requests will fight it
- The safe pattern is HPA on custom/external metrics plus VPA on CPU/memory — or run VPA in "Off" mode for recommendations only
✅ KEDA + HPA:
- KEDA uses HPA under the hood, layering external triggers on top (see the sketch below)
- You can also pair KEDA with VPA for finer-grained right-sizing — the same HPA + VPA caveats apply
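To make KEDA + HPA concrete, here’s a hedged sketch of one ScaledObject mixing a CPU trigger with a Kafka trigger (the names webapp, webapp-group, and my-topic are illustrative). KEDA generates a single HPA behind the scenes, and the workload scales on whichever trigger currently demands more replicas:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: webapp-hybrid-scaler
spec:
  scaleTargetRef:
    name: webapp
  minReplicaCount: 2    # CPU triggers can't scale to zero anyway
  maxReplicaCount: 20
  triggers:
  - type: cpu           # classic resource-based scaling
    metricType: Utilization
    metadata:
      value: "60"
  - type: kafka         # event-driven scaling
    metadata:
      bootstrapServers: my-cluster-kafka:9092
      consumerGroup: webapp-group
      topic: my-topic
      lagThreshold: "100"
```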
💼 Real-World Scenarios
| Scenario | Best Autoscaler |
|---|---|
| E-commerce site scaling with traffic | HPA |
| ML model training with dynamic resource needs | VPA |
| Kafka-based order processing system | KEDA |
| Hybrid pipeline (e.g., APIs + queues) | HPA + KEDA |
| Cost optimization for idle apps | VPA + KEDA |
🔐 Gotchas & Best Practices
| Tip | Why It Matters |
|---|---|
| Set minReplicas / minReplicaCount explicitly | KEDA defaults minReplicaCount to 0 — make scale-to-zero a deliberate choice |
| Always set resource requests | HPA’s Utilization target is a percentage of the request; without requests there’s nothing to compute |
| Tune HPA stabilization windows (behavior) | Aggressive scale-down causes replica flapping — see the behavior snippet above |
| Don’t expect VPA to rescue single-threaded, CPU-bound apps | One process can’t use more than one core; scale out with HPA instead |
| Use Prometheus Adapter for HPA custom metrics | Extend beyond CPU/memory (snippet below) |
| Use KEDA for event-driven use cases | Don’t force HPA where it doesn’t fit |
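For the Prometheus Adapter tip, here’s a hedged sketch of what a custom-metric target looks like in an autoscaling/v2 HPA — the metric name http_requests_per_second is hypothetical and has to actually be exposed through your adapter. This metrics block would replace the CPU metric in the earlier HPA example:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second  # hypothetical metric served by the adapter
    target:
      type: AverageValue
      averageValue: "100"             # aim for ~100 req/s per pod
```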
📦 Tools & Resources
| Tool | Description |
|---|---|
| metrics-server | Required for HPA/VPA to collect pod metrics |
| KEDA | Install via Helm or kubectl |
| Prometheus Adapter | Use Prometheus metrics for HPA |
| Vertical Pod Autoscaler | Installed separately — recommender, updater, and admission-controller components |
| Lens / K9s | Visualize autoscaling in real time |
🏁 Final Thoughts
Kubernetes autoscaling isn’t one-size-fits-all.
- Use HPA when more load should mean more pods.
- Use VPA when you want to right-size the resources inside each pod.
- Use KEDA when your system responds to events, not load.
Mastering autoscaling helps you deliver apps that are not only resilient, but also cost-efficient and responsive to real-world usage.