GitOps is one of those ideas that sounds like a buzzword… until you run it for 30 days and suddenly you can’t imagine operating Kubernetes without it.
Because GitOps gives you something engineers love:
- A single source of truth
- A deterministic deploy process
- Built-in rollback
- Drift detection
- Auditability
- Repeatability across environments
And it does it with a loop so simple it feels obvious in hindsight:
Git declares what should be running. A controller makes the cluster match Git. Always.
Let’s break it down in a way that’s beginner-friendly, but also practical enough that you can implement it this week.
What GitOps actually means (without hype)
GitOps is an operating model where:
- The desired state of your system is stored in Git (declarative configs)
- Changes happen via pull requests (reviewable, auditable)
- The cluster is continuously reconciled by agents/controllers that pull from Git and apply changes
- Any drift (manual changes) is detected and corrected (or at least flagged)
The GitOps “control loop”
Think of it like Kubernetes itself:
- Kubernetes reconciles pods to match Deployments
- GitOps reconciles clusters to match Git
Desired state (Git) → Reconciler (Argo CD / Flux) → Actual state (Cluster)
…and if Actual ≠ Desired, it corrects.
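The loop above can be sketched in a few lines of Python. This is only a toy model: the dicts stand in for Git (desired) and the cluster (actual), where a real controller would talk to the Kubernetes API and a Git client.

```python
# Minimal sketch of a GitOps reconcile loop. `desired` stands in for Git,
# `actual` for the cluster; names and specs are hypothetical.
def reconcile(desired: dict, actual: dict) -> dict:
    """Make `actual` converge to `desired`; return what changed."""
    changes = {}
    for name, spec in desired.items():
        if actual.get(name) != spec:      # missing or drifted resource
            changes[name] = spec
            actual[name] = spec           # apply desired state
    for name in list(actual):
        if name not in desired:           # removed from Git
            changes[name] = None
            del actual[name]              # prune from cluster
    return changes

desired = {"payments": {"image": "payments:1.8.2", "replicas": 3}}
actual = {"payments": {"image": "payments:1.8.1", "replicas": 3},
          "legacy-job": {"image": "old:1"}}
print(reconcile(desired, actual))
print(actual == desired)  # converged
```

Run it once and the drifted deployment is updated, the resource deleted from Git is pruned, and actual state equals desired state.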
Why GitOps is worth your time (especially for engineers)
Without GitOps, deployments often become:
- “CI pushes to cluster”
- “Someone applied kubectl manually”
- “Helm install is different per person”
- “Prod differs slightly from staging”
- “Nobody knows why it changed”
With GitOps, deployments become:
- “If it’s not in Git, it doesn’t exist”
- “Every change has a PR, reviewer, and history”
- “Clusters converge automatically”
- “Rollback = revert commit”
- “Drift becomes visible”
If you like the idea of treating production like code (not like a pet), GitOps fits perfectly.
The two main players: Argo CD and Flux (what they are)
Both Argo CD and Flux implement GitOps for Kubernetes. They are pull-based: they watch Git and reconcile the cluster.
But they feel different in how they operate:
- Argo CD is often described as “GitOps with a strong UI and application-centric model.”
- Flux is often described as “GitOps as a set of composable controllers (Kubernetes-native first).”
Let’s compare them the way engineers actually choose tools.
Argo CD vs Flux: the practical comparison
1) Mental model
Argo CD
You declare Applications (or generate them via ApplicationSets).
Argo CD is “app-first”: what app, what repo path, what cluster, what sync policy.
Flux
You declare sources and reconciliation objects like GitRepository, Kustomization, HelmRelease.
Flux is “controller-first”: source → reconcile → apply.
2) Developer experience (day-to-day)
Argo CD
- Strong UI: see apps, health, sync status, diffs
- Easy for teams to “see what’s going on”
- Great for multi-team visibility and operations
Flux
- More Kubernetes-native: you interact mostly via YAML + kubectl
- Less UI-oriented (you rely on CRD status, events, dashboards)
- Great if your org prefers Git + CLI workflows over UI
3) Installation and footprint
Argo CD
- One main product with a clear structure
- Generally quick to get a first app synced
Flux
- A toolkit of controllers (source-controller, kustomize-controller, helm-controller, notification-controller, etc.)
- Very modular and clean once you learn the pieces
4) Multi-tenancy and access control
Argo CD
- Strong RBAC model for app access
- UI makes it easy to manage who sees what
- Often chosen by platform teams supporting many app teams
Flux
- Multi-tenancy is usually enforced via Kubernetes RBAC + namespaces + repo layout
- Very solid, but feels more “build your structure” than “use built-in UI RBAC”
5) Handling many apps / many clusters
Both scale well, but patterns differ:
- Argo CD: ApplicationSets can generate hundreds/thousands of Applications cleanly.
- Flux: You structure reconciliation objects per cluster/namespace/path.
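As a sketch of the Argo CD side, an ApplicationSet with a Git directory generator can stamp out one Application per app folder (repo URL, paths, and names below are placeholders, not from this article):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: REPO_PLACEHOLDER
        revision: main
        directories:
          - path: apps/*        # one Application per matching folder
  template:
    metadata:
      name: '{{path.basename}}-staging'
    spec:
      project: default
      source:
        repoURL: REPO_PLACEHOLDER
        targetRevision: main
        path: '{{path}}/overlays/staging'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
```

Adding a new app then becomes "add a folder under apps/ and commit".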
6) Progressive delivery and rollout patterns
- Argo ecosystems often pair well with rollout tooling and app health views.
- Flux ecosystems often pair well with controller-based rollout automation patterns.
(You can do progressive delivery with either—what changes is how you wire it.)
7) Image update automation
- Flux has a strong “image automation” story via dedicated controllers that can update manifests by committing back to Git.
- Argo can do this too, usually via external automation components or pipeline steps that update Git (still GitOps, as long as Git remains the source of truth).
Quick decision guide (real-world)
Pick Argo CD if:
- You want a strong UI for app teams and ops
- You want “applications” as the primary abstraction
- You want faster visibility for drift, diff, health, sync
Pick Flux if:
- You want a Kubernetes-native, modular controller approach
- Your org is comfortable with CRDs + YAML-driven ops
- You want first-class Git-based automation for image updates and composability
Many organizations standardize on one. Some run both (platform choice depends on team needs). If you’re starting fresh and want faster adoption, Argo CD often feels easier for beginners because of the UI. If you want a clean controller toolkit and love Kubernetes primitives, Flux feels elegant.
The GitOps “golden rules” (the parts you must get right)
No matter which tool you choose, GitOps works when you follow these rules:
Rule 1: Git is the source of truth
If someone changes the cluster manually, that’s drift.
Either:
- GitOps corrects it (self-heal), or
- GitOps flags it and you fix via PR
Rule 2: Reconciliation is continuous
Not “deploy once.”
It’s “always converge.”
Rule 3: Changes flow through PRs
You want:
- review
- audit trail
- rollback by revert
- fewer “who changed prod?” mysteries
Rule 4: Separate “what” from “how”
- “What”: desired state (manifests/helm/kustomize)
- “How”: GitOps controller reconciliation and policies
This separation keeps the system maintainable.
Step-by-step GitOps setup (works for Argo CD or Flux)
Below is a practical blueprint you can implement in a real platform team.
Step 1: Decide your repo strategy (this matters more than the tool)
There are three common approaches:
A) Monorepo (apps + environments together)
Structure
repo/
  apps/
    payments/
    catalog/
  clusters/
    prod/
    staging/
Pros: simple, one place to look
Cons: permissions and ownership can get messy at scale
B) App repo + environment repo (most scalable)
- Each service owns its app manifests in its repo
- Platform owns an “environments” repo that composes versions
Pros: clean ownership, good for large orgs
Cons: needs good release process for promotion
C) Repo per cluster (infrastructure-first orgs)
Pros: cluster config isolated
Cons: cross-cluster consistency becomes harder
If you’re unsure, choose B (app repo + environment repo). It scales best.
Step 2: Choose a deployment packaging style (Kustomize or Helm)
- Kustomize is great for overlays (dev/stage/prod)
- Helm is great for reusable charts with values per env
You can mix them, but start with one for simplicity.
Step 3: Create environment overlays (dev/stage/prod)
Example with Kustomize overlays:
apps/payments/
  base/
    deployment.yaml
    service.yaml
    kustomization.yaml
  overlays/
    staging/
      kustomization.yaml
      patch.yaml
    prod/
      kustomization.yaml
      patch.yaml
Base contains common config.
Overlays change replicas, resources, env vars, ingress, etc.
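As a small illustration (file contents hypothetical), the staging overlay might look like this, with both files shown in one snippet:

```yaml
# apps/payments/overlays/staging/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: patch.yaml
images:
  - name: payments          # assumed image name used in base/deployment.yaml
    newTag: "1.8.2"
---
# apps/payments/overlays/staging/patch.yaml (illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 2               # staging runs fewer replicas than prod
```

The base stays untouched; each environment only declares its deltas.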
Step 4: Add a promotion workflow
A clean GitOps promotion is simple:
- Merge to main updates staging
- A PR from staging → prod updates prod
Promotion becomes:
- a PR with a diff
- a review
- a merge
- an automatic reconciliation
No special “release day ritual.”
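For example, promoting an image tag is often a one-line change in the prod overlay (tag values hypothetical):

```diff
 # apps/payments/overlays/prod/kustomization.yaml
 images:
   - name: payments
-    newTag: "1.8.1"
+    newTag: "1.8.2"
```

That diff is the entire release; the reviewer sees exactly what changes in prod.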
Step 5: Decide your drift policy
Three common policies:
- Self-heal ON (cluster always forced back to Git). Best for: mature teams, strict compliance, “no manual changes” culture
- Self-heal OFF but alert. Best for: adoption phase, when teams still occasionally hotfix manually
- Hybrid: self-heal for some namespaces, alert-only for others. Best for: gradual rollout
Step 6: Handle secrets properly (do NOT wing this)
Beginner mistake: putting secrets directly in Git. Don’t.
Good patterns:
- Encrypt secrets in Git (git-stored but encrypted)
- Or store secrets in a secret manager and sync them into Kubernetes
- Or use sealed/encrypted secret resources
Core idea: Git can contain secrets only if they’re encrypted and safe-by-design.
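As one example of the "encrypted in Git" pattern, a SealedSecret (from the Bitnami sealed-secrets project; names and ciphertext below are placeholders) is safe to commit because only the in-cluster controller holds the decryption key:

```yaml
# Safe to commit: encryptedData holds ciphertext produced by the
# `kubeseal` CLI, never the raw value.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: payments-db
  namespace: payments
spec:
  encryptedData:
    DB_PASSWORD: AgBy...PLACEHOLDER_CIPHERTEXT
```

The controller decrypts it into a normal Kubernetes Secret inside the cluster; Git only ever sees ciphertext.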
Real examples: Argo CD vs Flux YAML
Let’s make this concrete with real YAML.
Example 1: Argo CD “Application” (deploy one app)
This is how Argo CD typically models a deployment:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: REPO_PLACEHOLDER
    targetRevision: main
    path: apps/payments/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
What this means
- “Argo CD, keep payments-staging synced to this Git path”
- prune: true removes resources deleted from Git
- selfHeal: true fixes drift automatically
Example 2: Flux “GitRepository + Kustomization” (deploy one app)
This is how Flux often models the same thing using Kubernetes-native objects:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-repo
  namespace: flux-system
spec:
  interval: 1m
  url: REPO_PLACEHOLDER
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-staging
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-repo
  path: ./apps/payments/overlays/staging
  prune: true
  targetNamespace: payments
  wait: true
  timeout: 2m
What this means
- “Flux, pull this repo every minute”
- “Apply that path every 5 minutes into this namespace”
- “Prune removed objects”
- “Wait for readiness”
What you should notice
- Argo CD: a single “Application” object is the main thing you manage.
- Flux: you manage “where the source is” + “how to reconcile it.”
Both are GitOps. Both are valid. The best choice depends on team preferences and operating style.
GitOps patterns that work (and why they work)
Now the fun part: the patterns that make GitOps feel powerful instead of painful.
Pattern 1: “App of Apps” (platform-friendly)
Instead of creating 100 app definitions manually, you define one “root” that points to everything.
Why it’s great
- onboarding a new app = add a folder + commit
- less manual wiring
- consistent structure
Best for
- platform teams managing many apps
Pattern 2: “One folder per cluster” (clean multi-cluster)
Structure example:
clusters/
  prod-cluster-1/
    kustomization.yaml
    apps.yaml
  staging-cluster-1/
    kustomization.yaml
    apps.yaml
Why it’s great
- you can reason about a cluster’s desired state in one place
- disaster recovery becomes straightforward
Pattern 3: “Environment overlays” (dev/stage/prod without chaos)
This is how you avoid copy-paste manifests:
- base/ = shared config
- overlays/staging/ = staging-specific patches
- overlays/prod/ = prod-specific patches
Why it’s great
- consistent changes across environments
- minimal diffs for promotion
Pattern 4: PR-based promotion (the safest release pipeline)
Promotion becomes a PR that changes only what’s necessary (often just image tag or chart version).
Why it’s great
- the diff tells the story
- the review catches mistakes
- rollback is easy (revert PR)
Pattern 5: “Cost + security guardrails via policy”
GitOps makes it easy to standardize:
- resource limits
- approved registries
- required labels
- network policies
- ingress rules
You can enforce these with admission policies so “bad config never lands.”
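As one way to wire this up (assuming Kyverno as the admission controller, which this article does not mandate), a policy can reject workloads without resource limits:

```yaml
# Hypothetical Kyverno policy: block Pods whose containers lack limits.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-container-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
```

Because the policy itself lives in Git, the guardrail is versioned and reviewed like everything else.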
Pattern 6: Drift strategy by environment
A practical approach:
- Prod: self-heal ON, prune ON (strict)
- Staging: self-heal ON, prune ON (strict)
- Dev: self-heal OFF but alert (flexible)
This avoids early adoption pain while still keeping prod clean.
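In Argo CD terms (illustrative Application spec fragments, not complete manifests), that split maps directly to the syncPolicy field:

```yaml
# prod and staging: strict convergence
syncPolicy:
  automated:
    prune: true
    selfHeal: true
---
# dev: no automated sync; drift shows up as OutOfSync and can be alerted on
syncPolicy: {}
```

The same idea applies in Flux by enabling or disabling prune per Kustomization and routing drift events to alerts.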
Pattern 7: “Bootstrap” a cluster the same way every time
A cluster is “GitOps-ready” when:
- GitOps tool installed
- policies installed
- base namespaces and platform add-ons installed
- app sync begins
If you can bootstrap consistently, scaling becomes easy.
GitOps anti-patterns (the traps that make teams quit)
This is the section that saves you months.
Anti-pattern 1: “CI pushes directly to the cluster”
Symptom: pipeline runs kubectl apply or helm upgrade into prod
Why it’s bad: Git is no longer the source of truth; auditing and rollback get messy
Fix: CI should update Git (create a PR / commit), and GitOps reconciles
Anti-pattern 2: “Secrets in plain text in Git”
Symptom: Secret YAML committed with real passwords/tokens
Why it’s bad: permanent exposure risk (even if you delete later)
Fix: encrypted secrets or external secret systems
Anti-pattern 3: “One giant repo with no ownership boundaries”
Symptom: everyone commits everywhere, reviews become meaningless
Why it’s bad: permission chaos, blame culture, accidental changes
Fix: define ownership by folders + code owners + protected branches, or use separate repos
Anti-pattern 4: “No pruning, ever”
Symptom: you remove resources from Git, but they stay running in cluster
Why it’s bad: resource sprawl, zombie workloads, hidden cost and risk
Fix: enable prune where safe (especially in non-prod), and phase it into prod
Anti-pattern 5: “Self-heal ON while teams still do manual hotfixes”
Symptom: engineers apply a quick fix, GitOps immediately reverts it
Why it’s bad: people start fighting the system
Fix: define an incident workflow: hotfix PRs, or temporary sync freeze policy
Anti-pattern 6: “GitOps used only for apps, not for platform add-ons”
Symptom: ingress controller installed manually; policies drift; add-ons inconsistent
Why it’s bad: clusters diverge and become snowflakes
Fix: manage platform add-ons via GitOps too (with separate ownership boundaries)
Anti-pattern 7: “Over-templating everything”
Symptom: charts and templates so abstract no one understands them
Why it’s bad: debug time increases, onboarding slows
Fix: keep templates simple; optimize for readability first
The “Day 2 operations” playbook (what nobody tells beginners)
GitOps isn’t “install tool and done.” You operate it.
1) How to do a safe rollback
Rollback in GitOps = revert commit
- Revert the PR that introduced the change
- GitOps reconciles back to known good state
This is the cleanest rollback model you can have.
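The mechanics are plain Git. The sketch below uses a throwaway repo to demonstrate; in real life you revert the promotion commit in your env repo and let the controller reconcile:

```shell
# Demonstrate rollback-by-revert in a throwaway repo (paths hypothetical).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "demo"

echo "newTag: 1.8.1" > kustomization.yaml
git add kustomization.yaml && git commit -qm "deploy payments 1.8.1"

echo "newTag: 1.8.2" > kustomization.yaml
git commit -qam "promote payments 1.8.2"

# 1.8.2 turns out to be bad: revert the promotion commit.
git revert --no-edit HEAD >/dev/null

cat kustomization.yaml   # back to the known-good tag
```

After the revert lands on the tracked branch, the GitOps controller converges the cluster back to 1.8.1 with no manual kubectl involved.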
2) How to handle emergencies without breaking GitOps
Pick one:
- Fast PR hotfix (best)
- Temporary sync pause (for extreme incidents)
- Manual change + follow-up PR (last resort)
But always end with: Git matches the final state.
3) How to upgrade GitOps safely
- Upgrade controllers like any other platform component
- Test in staging cluster first
- Keep an upgrade runbook
4) How to keep repo hygiene healthy
- enforce review
- keep diffs small
- avoid “mega PRs”
- add change notes in commit messages
A practical “first GitOps implementation” blueprint
If you want a strong, low-drama rollout, do this:
Week 1: Inform + structure
- pick repo strategy (start with app repo + env repo if possible)
- define labels/tags/ownership
- set up staging GitOps first
Week 2: Ship one service end-to-end
- onboard 1 service with Kustomize overlays or Helm values
- enable prune in staging
- do a promotion PR from staging → prod
- practice rollback by revert
Week 3: Scale patterns
- template the onboarding process (copyable folder structure)
- add policy guardrails
- decide drift policy for prod
Week 4: Multi-team readiness
- define ownership boundaries
- add app onboarding checklist
- document incident workflow
This is how GitOps becomes a habit, not a fight.
FAQ (the questions readers always ask)
“Is GitOps only for Kubernetes?”
It’s most popular with Kubernetes because reconciliation is natural there, but the model works anywhere you can declare desired state and reconcile continuously.
“Do I still need CI/CD?”
Yes. CI builds and tests artifacts. GitOps handles deploying desired state to clusters. They complement each other.
“What if I want approvals before production changes?”
That’s exactly what PR reviews and branch protections do well. GitOps strengthens approvals instead of bypassing them.
“How do I prevent people from changing the cluster manually?”
Use RBAC to restrict write access and rely on GitOps. But also create an incident process so people don’t feel trapped.
Final take: choose the tool, but commit to the model
Argo CD vs Flux is a real choice—but GitOps success comes more from:
- repo structure,
- promotion workflow,
- secret handling,
- drift strategy,
- ownership boundaries,
- and operational habits
Pick one, implement the patterns, avoid the anti-patterns, and GitOps will feel like a superpower.
1) Best GitOps tool choice by platform (EKS / AKS / GKE)
AWS EKS
Best default: Argo CD
Why: easiest adoption (UI), great multi-team ops, very common in EKS platforms.
Flux is also great if you want a more “Kubernetes-native controller toolkit” style and strong Git-based automation.
Azure AKS
Best default: Flux
Why: Flux fits cleanly into Kubernetes-native workflows and is commonly used in AKS setups.
Argo CD is equally valid if you want UI-first operations and app-centric visibility.
Google GKE
Best default: Argo CD or Flux (tie)
If your teams want UI + app health views → Argo CD
If your teams prefer CRDs + CLI + controller composition → Flux
2) Helm vs Kustomize: what to choose (practical rule)
Choose Helm when:
- You need reusable packaging across many teams/services
- You want a clean values-per-environment model
- You use many third-party charts (ingress, monitoring, cert-manager, etc.)
Choose Kustomize when:
- You want simple overlays and fewer moving parts
- Your manifests are mostly hand-written YAML
- You want patch-based env diffs that are very readable
My 2026 default:
- Helm for platform add-ons (ingress, cert-manager, monitoring, external-dns)
- Kustomize for application overlays (dev/stage/prod)
This hybrid is extremely common and works well.
3) Single-cluster vs Multi-cluster: the repo strategy you should use
Single-cluster (prod + staging in one cluster or single cluster total)
Best repo approach: “App repo + Env repo” (still)
Because you’ll grow into multi-cluster anyway.
Structure
env-repo/
  clusters/
    main/
      platform/   # cluster add-ons
      apps/       # app definitions pointing to app repos
  policies/
  shared/
Multi-cluster (recommended for serious orgs)
Best repo approach: “One folder per cluster” in env-repo
Structure
env-repo/
  clusters/
    eks-prod-ap-south-1/
    eks-staging-ap-south-1/
    aks-prod-eastus/
    gke-prod-us-central1/
  tenants/
    team-a/
    team-b/
  platform/
    ingress/
    monitoring/
    logging/
This is the cleanest way to avoid snowflake clusters.
4) Ready-to-use “best structure” for each combination
Below are the best patterns for each of the 12 combinations.
A) AWS EKS + Helm + Single-cluster
Best pattern: Argo CD ApplicationSets + Helm values per env
Use when: you want fast onboarding + UI visibility
Repo idea:
env-repo/
  apps/
    payments/
      values-prod.yaml
      values-staging.yaml
  appsets/
    apps.yaml   # generates Applications for each app/env
Why it works: Helm is great for values overlays, Argo CD makes it very visible and easy.
B) AWS EKS + Helm + Multi-cluster
Best pattern: Argo CD ApplicationSets + “cluster generator” + Helm
Repo idea:
env-repo/
  clusters/
    eks-prod/
      apps/
      platform/
    eks-staging/
      apps/
      platform/
  appsets/
    apps-by-cluster.yaml
Why it works: cluster-based separation + scalable app generation.
C) AWS EKS + Kustomize + Single-cluster
Best pattern: Argo CD Applications pointing to Kustomize overlays
Repo idea:
app-repo/
  apps/payments/
    base/
    overlays/
      staging/
      prod/

env-repo/
  argocd-apps/
    payments-staging.yaml
    payments-prod.yaml
Why it works: simple overlays, readable diffs, easy promotion via PR.
D) AWS EKS + Kustomize + Multi-cluster
Best pattern: “One folder per cluster” + Kustomize overlays + App-of-apps
Repo idea:
env-repo/
  clusters/
    eks-prod/
      kustomization.yaml
      apps/
    eks-staging/
      kustomization.yaml
      apps/
Why it works: prevents cluster drift and makes bootstrap repeatable.
E) Azure AKS + Helm + Single-cluster
Best pattern: Flux HelmRelease + values per env
Repo idea:
env-repo/
  clusters/
    aks-main/
      sources/
      releases/
        payments-helmrelease.yaml
      values/
        payments-prod.yaml
        payments-staging.yaml
Why it works: Flux Helm controller feels natural and clean for AKS workflows.
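A HelmRelease in that layout might look like the sketch below (chart name, repo source, and values are illustrative; the apiVersion shown is the Flux v2 helm-controller's GA version, so adjust to what your Flux installation serves):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: payments
  namespace: payments
spec:
  interval: 5m
  chart:
    spec:
      chart: payments          # hypothetical chart name
      version: "1.2.x"
      sourceRef:
        kind: HelmRepository
        name: internal-charts  # hypothetical HelmRepository source
        namespace: flux-system
  values:
    replicaCount: 2
    image:
      tag: "1.8.2"
```

Per-environment values files then differ only in the handful of keys each environment overrides.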
F) Azure AKS + Helm + Multi-cluster
Best pattern: Flux + per-cluster folder + shared charts
Repo idea:
env-repo/
clusters/
aks-prod/
releases/
values/
aks-staging/
releases/
values/
shared/
charts/
Why it works: keeps per-cluster state isolated and repeatable.
G) Azure AKS + Kustomize + Single-cluster
Best pattern: Flux GitRepository + Kustomization per env
Repo idea:
env-repo/
  clusters/
    aks-main/
      kustomizations/
        payments-prod.yaml
        payments-staging.yaml
  apps/
    payments/
      overlays/prod
      overlays/staging
Why it works: very Kubernetes-native, minimal moving parts.
H) Azure AKS + Kustomize + Multi-cluster
Best pattern: Flux + cluster folders + Kustomize overlays
Repo idea:
env-repo/
  clusters/
    aks-prod/
      kustomizations/
    aks-staging/
      kustomizations/
  apps/
    payments/overlays/...
Why it works: scales cleanly and avoids cross-cluster coupling.
I) GKE + Helm + Single-cluster
Best pattern: Argo CD (UI-first) OR Flux (controller-first)
- If teams are many → Argo CD
- If platform is small, CLI-heavy → Flux
Repo idea: similar to the EKS/AKS layouts above, depending on tool choice.
J) GKE + Helm + Multi-cluster
Best pattern: Argo CD ApplicationSets if you want global visibility
or Flux per cluster if you want strict Kubernetes-native control.
Repo idea: one folder per cluster.
K) GKE + Kustomize + Single-cluster
Best pattern: Kustomize overlays + Argo CD Applications
Repo idea: app-repo overlays + env-repo app definitions.
L) GKE + Kustomize + Multi-cluster
Best pattern: cluster folders + App-of-apps (Argo CD)
or Flux per cluster Kustomization objects.
Repo idea: one folder per cluster.
5) My “best default architecture” (safe, scalable, team-friendly)
If you want the best long-term setup that works in almost every org:
Tool:
- Argo CD (best for adoption + visibility)
- (Optional later) use Flux-style automation for image updates, or implement a Git-based image update pipeline
Packaging:
- Helm for platform add-ons
- Kustomize for application overlays
Top-level repo layout (recommended)
- App repos: each service owns manifests/chart
- Env repo: platform owns what is deployed where
App repo
service-repo/
  deploy/
    base/
    overlays/
      staging/
      prod/
Env repo
env-repo/
  clusters/
    prod/
      platform/
      apps/
    staging/
      platform/
      apps/
  policies/
  shared/
This gives:
- clear ownership
- clean promotion PRs
- scalable multi-cluster
EKS multi-cluster is where GitOps shines the most. Below is a battle-tested blueprint you can copy: repo structure, patterns, sample manifests (Argo CD and Flux options), promotion flow, drift rules, and the anti-patterns to avoid.
The best GitOps blueprint for EKS multi-cluster (2026-ready)
What you’re building
A system where:
- Each EKS cluster is bootstrapped the same way (repeatable)
- Platform add-ons (ingress, cert-manager, monitoring) are managed via GitOps
- Apps are deployed consistently across clusters
- Promotion is a PR (staging → prod)
- Drift is detected and handled cleanly
- Teams have ownership boundaries
1) Recommended architecture (simple and scalable)
My strong default for EKS multi-cluster
- Argo CD as the GitOps engine (best visibility + easiest adoption)
- Kustomize overlays for apps (clean diffs, easy promotion)
- Helm for platform add-ons (most vendors ship Helm charts)
This “Helm for platform, Kustomize for apps” hybrid is common and very stable.
(If you prefer Flux, a Flux-ready mapping of this layout is included at the end.)
2) Repo strategy (this matters more than the tool)
Use 2 repos (best ownership model)
A) platform-env-repo (owned by platform team)
This repo defines:
- clusters
- platform add-ons
- what apps run where
B) app-repos (owned by each app team)
Each service repo contains its deploy manifests (base + overlays).
This keeps teams independent, and promotion becomes controlled and auditable.
3) Folder structure you can copy (multi-cluster)
Platform repo: platform-env-repo
platform-env-repo/
  clusters/
    eks-staging/
      bootstrap/
      platform/
      apps/
    eks-prod/
      bootstrap/
      platform/
      apps/
  platform/
    ingress/
    cert-manager/
    external-dns/
    metrics/
    logging/
    monitoring/
    security/
    namespaces/
  tenants/
    team-payments/
    team-catalog/
  policies/
    required-labels/
    resource-limits/
    allowed-registries/
  shared/
    kustomize-bases/
    helm-values/
What goes where (very practical)
- clusters/eks-staging/platform/ → cluster add-ons for staging
- clusters/eks-staging/apps/ → what apps run in staging
- platform/ → reusable platform modules
- policies/ → guardrails (admission rules, baseline limits)
- tenants/ → team namespaces, quotas, RBAC
App repo: payments-service
payments-service/
  deploy/
    base/
      deployment.yaml
      service.yaml
      hpa.yaml
      kustomization.yaml
    overlays/
      staging/
        kustomization.yaml
        patch.yaml
      prod/
        kustomization.yaml
        patch.yaml
4) The promotion model (how staging → prod works)
Promotion PR = change only one small thing
- image tag (recommended)
- or Helm chart version
- or config value
Example promotion workflow
- App team merges code → CI builds image payments:1.8.2
- A PR updates the staging overlay to 1.8.2
- GitOps syncs staging
- After validation, a PR updates the prod overlay to 1.8.2
- GitOps syncs prod
- Rollback = revert the prod PR
This keeps history clean and rollback instant.
5) Argo CD setup for EKS multi-cluster (recommended)
Pattern: “App of Apps” per cluster (cleanest)
Each cluster has one “root” Argo app that points to cluster folder.
clusters/eks-staging/bootstrap/root-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eks-staging-root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: REPO_PLACEHOLDER
    targetRevision: main
    path: clusters/eks-staging
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
clusters/eks-staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - platform
  - apps
clusters/eks-staging/platform/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../platform/namespaces
  - ../../platform/ingress
  - ../../platform/cert-manager
  - ../../platform/monitoring
  - ../../platform/security
clusters/eks-staging/apps/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - payments-app.yaml
  - catalog-app.yaml
App definitions (apps can live in their own repos)
clusters/eks-staging/apps/payments-app.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: APP_REPO_PLACEHOLDER
    targetRevision: main
    path: deploy/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
For prod, use the same Application with path: deploy/overlays/prod and namespace: payments.
6) Drift handling rules (multi-cluster best practice)
Staging cluster
- selfHeal: true
- prune: true
- Fast iteration, keep it clean
Prod cluster
- selfHeal: true (recommended)
- prune: true (recommended)
- Enforce an incident workflow for emergencies:
- If emergency manual change occurs, you must follow with a PR that makes Git match reality.
Rule to publish internally:
“Prod is Git-driven. If you patch prod manually, you owe the repo a PR immediately.”
7) Secrets (must-do for EKS GitOps)
Do not store plaintext secrets in Git.
Safe patterns:
- Store secrets in a cloud secret manager and sync to Kubernetes
- Or store encrypted secrets in Git
- Or use sealed/encrypted secret custom resources
Your GitOps repo should never contain raw passwords/tokens.
8) Multi-cluster patterns that work extremely well
Pattern A: One folder per cluster (most maintainable)
You already saw this. It prevents “snowflake clusters”.
Pattern B: “Platform first, apps second”
Bootstrap order:
- namespaces + RBAC + policies
- ingress/cert/observability
- apps
This reduces weird failures.
Pattern C: Shared platform modules
Put common add-ons under /platform and reference from cluster folders.
- one change updates all clusters (via PR)
- clusters remain consistent
Pattern D: App onboarding template
Every app team must ship:
- deploy/base
- deploy/overlays/staging
- deploy/overlays/prod
- resource requests/limits
- health checks
- HPA (if applicable)
This eliminates “special snowflake apps”.
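A base deployment that meets that checklist might look like the sketch below (all names, ports, and numbers illustrative; HPA and Service omitted for brevity):

```yaml
# deploy/base/deployment.yaml (illustrative baseline every app ships)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments
  template:
    metadata:
      labels:
        app: payments
    spec:
      containers:
        - name: payments
          image: payments:1.8.2
          resources:
            requests:           # scheduling baseline
              cpu: 100m
              memory: 128Mi
            limits:             # required by the policy guardrails
              cpu: 500m
              memory: 256Mi
          readinessProbe:       # health checks from the checklist
            httpGet:
              path: /healthz
              port: 8080
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
```

Overlays then only patch replicas, the image tag, and environment-specific config.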
9) Anti-patterns to avoid (EKS multi-cluster edition)
- CI directly kubectl-applies to prod (breaks Git as source of truth)
- No pruning (zombie resources stay forever, cost + risk)
- Secrets in Git (even once = permanent exposure)
- One giant repo with no ownership (chaos and accidental prod changes)
- Manual changes in prod without follow-up PR (drift becomes normal)
- Different add-ons per cluster “because reasons” (clusters diverge over time)
10) If you prefer Flux instead of Argo CD (quick mapping)
Flux multi-cluster is typically:
- one folder per cluster
- a GitRepository source per cluster
- Kustomization objects per cluster/platform/apps
Structure stays almost identical; only the “app definition objects” change.