Autoscaling in Kubernetes: HPA vs VPA vs KEDA, Explained from Basics to Pro
When you run applications in Kubernetes, one of your biggest concerns is:
"How do I scale my app to handle more traffic automatically, without over-provisioning?"
That's where Kubernetes autoscaling comes in.
In this blog, we'll dive deep into HPA, VPA, and KEDA, the three most important autoscaling mechanisms in the Kubernetes world. You'll learn:
- What each one does
- When (and when not) to use them
- How they compare
- Real-world examples
- YAML samples for each approach

What is Autoscaling in Kubernetes?
Autoscaling lets your Kubernetes cluster adjust workloads dynamically based on metrics like:
- CPU or memory usage
- Queue depth
- Number of requests
- Custom metrics
- External events (messages, schedules)
Without autoscaling, you'd have to manually add or remove pods, which defeats the whole point of container orchestration.
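Throughout this post, the YAML samples target a simple webapp Deployment. Here is a minimal assumed manifest (names, image, and numbers are placeholders) so the later examples have something concrete to point at. Note the resource requests: HPA computes utilization percentages relative to them.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
        - name: webapp
          image: nginx:1.27        # stand-in image for the examples
          resources:
            requests:
              cpu: 250m            # HPA utilization % is measured against this
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi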
1. Horizontal Pod Autoscaler (HPA)
What is HPA?
HPA automatically adds or removes pods in a Deployment, ReplicaSet, or StatefulSet based on CPU usage, memory usage, or custom metrics.
Think of it as:
"When load increases, spin up more pods. When load drops, scale down."
Use Cases:
- Web apps with fluctuating user traffic
- APIs with request-based workloads
How It Works:
- The HPA controller watches pod metrics (from metrics-server, or a custom/external metrics API)
- Compares current usage with the target (see the worked formula below)
- Adjusts the replica count up or down
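Under the hood, the controller uses a simple proportional formula (this is straight from the Kubernetes docs):

desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)

For example, 4 pods averaging 90% CPU against a 60% target gives ceil(4 × 90 / 60) = 6 replicas.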
Example YAML:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
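One more knob worth knowing: autoscaling/v2 also supports an optional behavior section that controls how fast HPA scales in each direction. A minimal sketch of a scale-down stabilization window, slotted under spec: of the HPA above (the values are illustrative, not recommendations):

behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # look back 5 minutes before removing pods
    policies:
      - type: Percent
        value: 50          # remove at most 50% of current replicas...
        periodSeconds: 60  # ...per 60-second window

This is the main defense against the "flapping pods" problem covered in the gotchas below.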
2. Vertical Pod Autoscaler (VPA)
What is VPA?
While HPA scales out (adds pods), VPA scales up: it increases or decreases the CPU and memory requests/limits of the containers in each pod.
"Make the pod stronger instead of multiplying it."
Use Cases:
- Backend jobs or batch workloads
- ML training tasks
- Apps with fluctuating but non-concurrent loads
How It Works:
- VPA monitors the actual resource usage of the target's pods
- Recommends new CPU/memory requests based on that usage history
- Can either just recommend or automatically apply changes, controlled by updateMode ("Off", "Initial", "Recreate", or "Auto"); note that applying a change evicts the pod so it restarts with the new requests
Example YAML:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
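In practice you usually want to bound what VPA can set. Here is a sketch using the resourcePolicy field (same myapp target as above; the min/max values are illustrative assumptions, not recommendations):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"   # apply to every container in the pod
        minAllowed:
          cpu: 100m          # never shrink below this
          memory: 128Mi
        maxAllowed:
          cpu: "2"           # never grow beyond 2 cores / 2Gi
          memory: 2Gi

Without bounds, a memory leak can walk VPA's recommendation (and your bill) steadily upward.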
3. KEDA (Kubernetes Event-driven Autoscaling)
What is KEDA?
KEDA enables event-driven scaling in Kubernetes, scaling based on:
- Kafka topic lag
- RabbitMQ queue depth
- Azure Blob count
- AWS SQS, Google Pub/Sub, Prometheus queries, etc.
It's perfect for workloads that don't rely on CPU or memory but respond to external triggers.
Use Cases:
- Serverless, event-driven apps
- Message consumers, background workers
- Event-based microservices (e.g., IoT, stream processors)
How It Works:
- Uses Scalers, prebuilt integrations with queues, databases, and cloud services
- You deploy a ScaledObject that defines the scaling logic for a workload
- Works with HPA under the hood: KEDA creates and manages a regular HPA for you, feeding it external metrics (and it can scale to zero, which HPA alone cannot)
Example YAML (for Kafka):
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  pollingInterval: 30   # check Kafka every 30 seconds
  cooldownPeriod: 60    # wait 60 seconds before scaling back down
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-cluster-kafka:9092
        consumerGroup: my-consumer-group  # required by the Kafka scaler (lag is per consumer group); name is illustrative
        topic: my-topic
        lagThreshold: "100"
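Kafka is only one of KEDA's scalers. To show how little changes between triggers, here is the same ScaledObject's triggers section swapped for a Prometheus query instead; the server address and the query are placeholder assumptions for a typical in-cluster Prometheus:

triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090  # assumed Prometheus endpoint
      query: sum(rate(http_requests_total{app="kafka-consumer"}[2m]))  # hypothetical metric
      threshold: "100"   # add replicas when the query result exceeds 100 per replica

Everything else in the ScaledObject (polling, cooldown, replica bounds) stays the same.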
Comparison: HPA vs VPA vs KEDA
Feature | HPA | VPA | KEDA |
---|---|---|---|
Scales By | Pod count | CPU/memory requests and limits | Pod count (driven by event metrics) |
Direction | Horizontal | Vertical | Horizontal |
Metrics Type | Resource/custom metrics | Resource usage | External sources (Kafka, SQS, Prometheus, ...) |
Update Frequency | Continuous (15s sync by default) | Periodic recommendations, applied by evicting pods | Per pollingInterval |
Use With | Web apps, APIs | Batch jobs, DBs | Message queues, IoT, serverless |
Built-In? | Yes (native) | No (separate install) | No (install via Helm) |
Can You Combine Them?
Yes!
HPA + VPA (with caveats):
- You can run both, but HPA scales the pod count while VPA changes resource requests
- Don't let them act on the same metric: if HPA targets CPU utilization while VPA rewrites CPU requests, they fight each other; a common pattern is HPA on custom/external metrics with VPA managing requests
KEDA + HPA:
- KEDA uses HPA under the hood, extending it with external triggers (and you tune the underlying HPA via the ScaledObject, as sketched below)
- You can also combine KEDA with VPA for fine-grained control
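Because KEDA owns the HPA it creates, you tune that HPA through the ScaledObject itself. A sketch using KEDA's advanced section (field names are from the KEDA API; the kafka-consumer target reuses the earlier example):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler
spec:
  scaleTargetRef:
    name: kafka-consumer
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:                # passed through to the HPA that KEDA manages
        scaleDown:
          stabilizationWindowSeconds: 300
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: my-cluster-kafka:9092
        consumerGroup: my-consumer-group
        topic: my-topic
        lagThreshold: "100"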
Real-World Scenarios
Scenario | Best Autoscaler |
---|---|
E-commerce site scaling with traffic | HPA |
ML model training with dynamic resource needs | VPA |
Kafka-based order processing system | KEDA |
Hybrid pipeline (e.g., APIs + queues) | HPA + KEDA |
Cost optimization for idle apps | VPA + KEDA |
Gotchas & Best Practices
Tip | Why It Matters |
---|---|
Always set minReplicas (minReplicaCount in KEDA) | Prevents scaling to zero unexpectedly |
Set resource requests on your pods | HPA utilization targets are computed against requests; VPA bases its recommendations on actual usage |
Tune HPA stabilization windows | Scaling that is too aggressive causes flapping pods |
Don't use HPA on CPU-bound single-threaded apps | They can't use more than one core, so utilization-based scaling won't behave as expected |
Use the Prometheus Adapter for HPA custom metrics (example below) | Extends HPA beyond CPU/memory |
Use KEDA for event-driven use cases | Don't force HPA where it doesn't fit |
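To make the Prometheus Adapter tip concrete, here is a hedged sketch of an HPA driven by a per-pod custom metric. It assumes the adapter is installed and already exposes a metric called http_requests_per_second; both the metric and app names are placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: webapp-custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: webapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods             # per-pod metric served via the custom metrics API
      pods:
        metric:
          name: http_requests_per_second  # assumed adapter-exposed metric
        target:
          type: AverageValue
          averageValue: "100"  # aim for roughly 100 requests/s per pod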
Tools & Resources
Tool | Description |
---|---|
metrics-server | Required for HPA/VPA to collect pod metrics |
KEDA | Install via Helm or kubectl |
Prometheus Adapter | Use Prometheus metrics for HPA |
Vertical Pod Autoscaler | Install separately; ships recommender, updater, and admission controller components |
Lens / K9s | Visualize autoscaling in real time |
Final Thoughts
Kubernetes autoscaling isn't one-size-fits-all.
- Use HPA when you care about resource usage.
- Use VPA when you want to tune performance inside pods.
- Use KEDA when your system responds to events, not load.
Mastering autoscaling helps you deliver apps that are not only resilient, but also cost-efficient and responsive to real-world usage.