Mohammad Gufran Jahangir · February 15, 2026

GitOps is one of those ideas that sounds like a buzzword… until you run it for 30 days and suddenly you can’t imagine operating Kubernetes without it.

Because GitOps gives you something engineers love:

  • A single source of truth
  • A deterministic deploy process
  • Built-in rollback
  • Drift detection
  • Auditability
  • Repeatability across environments

And it does it with a loop so simple it feels obvious in hindsight:

Git declares what should be running. A controller makes the cluster match Git. Always.

Let’s break it down in a way that’s beginner-friendly, but also practical enough that you can implement it this week.


What GitOps actually means (without hype)

GitOps is an operating model where:

  1. The desired state of your system is stored in Git (declarative configs)
  2. Changes happen via pull requests (reviewable, auditable)
  3. The cluster is continuously reconciled by agents/controllers that pull from Git and apply changes
  4. Any drift (manual changes) is detected and corrected (or at least flagged)

The GitOps “control loop”

Think of it like Kubernetes itself:

  • Kubernetes reconciles pods to match Deployments
  • GitOps reconciles clusters to match Git

Desired state (Git) → Reconciler (Argo CD / Flux) → Actual state (Cluster)
…and if Actual ≠ Desired, it corrects.


Why GitOps is worth your time (especially for engineers)

Without GitOps, deployments often become:

  • “CI pushes to cluster”
  • “Someone applied kubectl manually”
  • “Helm install is different per person”
  • “Prod differs slightly from staging”
  • “Nobody knows why it changed”

With GitOps, deployments become:

  • “If it’s not in Git, it doesn’t exist”
  • “Every change has a PR, reviewer, and history”
  • “Clusters converge automatically”
  • “Rollback = revert commit”
  • “Drift becomes visible”

If you like the idea of treating production like code (not like a pet), GitOps fits perfectly.


The two main players: Argo CD and Flux (what they are)

Both Argo CD and Flux implement GitOps for Kubernetes. They are pull-based: they watch Git and reconcile the cluster.

But they feel different in how they operate:

  • Argo CD is often described as “GitOps with a strong UI and application-centric model.”
  • Flux is often described as “GitOps as a set of composable controllers (Kubernetes-native first).”

Let’s compare them the way engineers actually choose tools.


Argo CD vs Flux: the practical comparison

1) Mental model

Argo CD

You declare Applications (or generate them via ApplicationSets).
Argo CD is “app-first”: what app, what repo path, what cluster, what sync policy.

Flux

You declare sources and reconciliation objects like GitRepository, Kustomization, HelmRelease.
Flux is “controller-first”: source → reconcile → apply.


2) Developer experience (day-to-day)

Argo CD

  • Strong UI: see apps, health, sync status, diffs
  • Easy for teams to “see what’s going on”
  • Great for multi-team visibility and operations

Flux

  • More Kubernetes-native: you interact mostly via YAML + kubectl
  • Less UI-oriented (you rely on CRD status, events, dashboards)
  • Great if your org prefers Git + CLI workflows over UI

3) Installation and footprint

Argo CD

  • One main product with a clear structure
  • Generally quick to get a first app synced

Flux

  • A toolkit of controllers (source-controller, kustomize-controller, helm-controller, notification-controller, etc.)
  • Very modular and clean once you learn the pieces

4) Multi-tenancy and access control

Argo CD

  • Strong RBAC model for app access
  • UI makes it easy to manage who sees what
  • Often chosen by platform teams supporting many app teams

Flux

  • Multi-tenancy is usually enforced via Kubernetes RBAC + namespaces + repo layout
  • Very solid, but feels more “build your structure” than “use built-in UI RBAC”

5) Handling many apps / many clusters

Both scale well, but patterns differ:

  • Argo CD: ApplicationSets can generate hundreds/thousands of Applications cleanly.
  • Flux: You structure reconciliation objects per cluster/namespace/path.

6) Progressive delivery and rollout patterns

  • Argo ecosystems often pair well with rollout tooling and app health views.
  • Flux ecosystems often pair well with controller-based rollout automation patterns.

(You can do progressive delivery with either—what changes is how you wire it.)


7) Image update automation

  • Flux has a strong “image automation” story via dedicated controllers that can update manifests by committing back to Git.
  • Argo can do this too, usually via external automation components or pipeline steps that update Git (still GitOps, as long as Git remains the source of truth).
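As a sketch of the Flux side of this, the image-automation controllers are configured with three objects: one that scans a registry, one that picks a tag, and one that commits the update back to Git. The API versions reflect recent Flux releases, and the image name, author identity, and paths below are illustrative assumptions:

```yaml
# Flux image automation sketch (image URL, author, and paths are placeholders)
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: payments
  namespace: flux-system
spec:
  image: registry.example.com/payments   # hypothetical registry/image
  interval: 5m
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: payments
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: payments
  policy:
    semver:
      range: 1.x                         # select the newest 1.x tag
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: payments
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform-repo
  git:
    commit:
      author:
        name: fluxbot
        email: flux@example.com
      messageTemplate: "chore: update payments image"
  update:
    path: ./apps/payments
    strategy: Setters                    # updates manifests marked with setter comments
```

Note that the controller commits back to Git, so Git remains the source of truth even though no human opened the PR.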

Quick decision guide (real-world)

Pick Argo CD if:

  • You want a strong UI for app teams and ops
  • You want “applications” as the primary abstraction
  • You want faster visibility for drift, diff, health, sync

Pick Flux if:

  • You want a Kubernetes-native, modular controller approach
  • Your org is comfortable with CRDs + YAML-driven ops
  • You want first-class Git-based automation for image updates and composability

Many organizations standardize on one. Some run both (platform choice depends on team needs). If you’re starting fresh and want faster adoption, Argo CD often feels easier for beginners because of the UI. If you want a clean controller toolkit and love Kubernetes primitives, Flux feels elegant.


The GitOps “golden rules” (the parts you must get right)

No matter which tool you choose, GitOps works when you follow these rules:

Rule 1: Git is the source of truth

If someone changes the cluster manually, that’s drift.
Either:

  • GitOps corrects it (self-heal), or
  • GitOps flags it and you fix via PR

Rule 2: Reconciliation is continuous

Not “deploy once.”
It’s “always converge.”

Rule 3: Changes flow through PRs

You want:

  • review
  • audit trail
  • rollback by revert
  • fewer “who changed prod?” mysteries

Rule 4: Separate “what” from “how”

  • “What”: desired state (manifests/helm/kustomize)
  • “How”: GitOps controller reconciliation and policies

This separation keeps the system maintainable.


Step-by-step GitOps setup (works for Argo CD or Flux)

Below is a practical blueprint you can implement in a real platform team.

Step 1: Decide your repo strategy (this matters more than the tool)

There are three common approaches:

A) Monorepo (apps + environments together)

Structure

repo/
  apps/
    payments/
    catalog/
  clusters/
    prod/
    staging/

Pros: simple, one place to look
Cons: permissions and ownership can get messy at scale

B) App repo + environment repo (most scalable)

  • Each service owns its app manifests in its repo
  • Platform owns an “environments” repo that composes versions

Pros: clean ownership, good for large orgs
Cons: needs good release process for promotion

C) Repo per cluster (infrastructure-first orgs)

Pros: cluster config isolated
Cons: cross-cluster consistency becomes harder

If you’re unsure, choose B (app repo + environment repo). It scales best.


Step 2: Choose a deployment packaging style (Kustomize or Helm)

  • Kustomize is great for overlays (dev/stage/prod)
  • Helm is great for reusable charts with values per env

You can mix them, but start with one for simplicity.


Step 3: Create environment overlays (dev/stage/prod)

Example with Kustomize overlays:

apps/payments/
  base/
    deployment.yaml
    service.yaml
    kustomization.yaml
  overlays/
    staging/
      kustomization.yaml
      patch.yaml
    prod/
      kustomization.yaml
      patch.yaml

Base contains common config.
Overlays change replicas, resources, env vars, ingress, etc.
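A minimal sketch of the two kustomization.yaml files in that structure (the patch filename matches the tree above; everything else is standard Kustomize):

```yaml
# apps/payments/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
---
# apps/payments/overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: patch.yaml   # e.g. bumps replicas and resources for prod
```

The staging overlay looks identical, with its own patch.yaml.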


Step 4: Add a promotion workflow

A clean GitOps promotion is simple:

  • Merge to main updates staging
  • A PR from staging → prod updates prod

Promotion becomes:

  • a PR with a diff
  • a review
  • a merge
  • an automatic reconciliation

No special “release day ritual.”
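In practice, a promotion PR often touches a single line. With Kustomize's image transformer, the prod overlay might look like this (the version "1.8.2" is illustrative):

```yaml
# apps/payments/overlays/prod/kustomization.yaml
# A promotion PR changes only newTag
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: payments      # image name as referenced in base/deployment.yaml
    newTag: "1.8.2"
```

The PR diff is one line, so the review takes seconds and the rollback (revert) is equally small.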


Step 5: Decide your drift policy

Three common policies:

  • Self-heal ON (cluster always forced back to Git)
    Best for: mature teams, strict compliance, “no manual changes” culture
  • Self-heal OFF but alert
    Best for: adoption phase, when teams still occasionally hotfix manually
  • Hybrid: self-heal for some namespaces, alert-only for others
    Best for: gradual rollout
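In Argo CD terms, these policies map to the Application's syncPolicy. The fragments below are sketches, not a complete manifest:

```yaml
# Fragments of an Argo CD Application spec, one per drift policy

# Self-heal ON: drift is reverted automatically
syncPolicy:
  automated:
    prune: true
    selfHeal: true

# Self-heal OFF but alert: no automated sync; drift shows as
# OutOfSync in the UI (pair this with notifications)
syncPolicy: {}
```

The hybrid policy is simply a mix: self-healing Applications for some namespaces, manual-sync Applications for others.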

Step 6: Handle secrets properly (do NOT wing this)

Beginner mistake: putting secrets directly in Git. Don’t.

Good patterns:

  • Encrypt secrets in Git (git-stored but encrypted)
  • Or store secrets in a secret manager and sync them into Kubernetes
  • Or use sealed/encrypted secret resources

Core idea: Git can contain secrets only if they’re encrypted and safe-by-design.
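As one example of the "secret manager + sync" pattern, here is a sketch using the External Secrets Operator. The store name and secret path are illustrative assumptions; sealed-secrets or SOPS-encrypted manifests are equally valid choices:

```yaml
# External Secrets Operator sketch (store name and remote key are placeholders)
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # hypothetical ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: payments-db            # Kubernetes Secret created by the operator
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db-password
```

Only this reference lives in Git; the actual password never does.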


Real examples: Argo CD vs Flux YAML

Time to make this concrete with real YAML.

Example 1: Argo CD “Application” (deploy one app)

This is how Argo CD typically models a deployment:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: REPO_PLACEHOLDER
    targetRevision: main
    path: apps/payments/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - CreateNamespace=true

What this means

  • “Argo CD, keep payments-staging synced to this Git path”
  • prune: true removes resources deleted from Git
  • selfHeal: true fixes drift automatically

Example 2: Flux “GitRepository + Kustomization” (deploy one app)

This is how Flux often models the same thing using Kubernetes-native objects:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-repo
  namespace: flux-system
spec:
  interval: 1m
  url: REPO_PLACEHOLDER
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-staging
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: platform-repo
  path: ./apps/payments/overlays/staging
  prune: true
  targetNamespace: payments
  wait: true
  timeout: 2m

What this means

  • “Flux, pull this repo every minute”
  • “Apply that path every 5 minutes into this namespace”
  • “Prune removed objects”
  • “Wait for readiness”

What you should notice

  • Argo CD: a single “Application” object is the main thing you manage.
  • Flux: you manage “where the source is” + “how to reconcile it.”

Both are GitOps. Both are valid. The best choice depends on team preferences and operating style.


GitOps patterns that work (and why they work)

Now the fun part: the patterns that make GitOps feel powerful instead of painful.

Pattern 1: “App of Apps” (platform-friendly)

Instead of creating 100 app definitions manually, you define one “root” that points to everything.

Why it’s great

  • onboarding a new app = add a folder + commit
  • less manual wiring
  • consistent structure

Best for

  • platform teams managing many apps

Pattern 2: “One folder per cluster” (clean multi-cluster)

Structure example:

clusters/
  prod-cluster-1/
    kustomization.yaml
    apps.yaml
  staging-cluster-1/
    kustomization.yaml
    apps.yaml

Why it’s great

  • you can reason about a cluster’s desired state in one place
  • disaster recovery becomes straightforward

Pattern 3: “Environment overlays” (dev/stage/prod without chaos)

This is how you avoid copy-paste manifests:

  • base/ = shared config
  • overlays/staging/ = staging-specific patches
  • overlays/prod/ = prod-specific patches

Why it’s great

  • consistent changes across environments
  • minimal diffs for promotion

Pattern 4: PR-based promotion (the safest release pipeline)

Promotion becomes a PR that changes only what’s necessary (often just image tag or chart version).

Why it’s great

  • the diff tells the story
  • the review catches mistakes
  • rollback is easy (revert PR)

Pattern 5: “Cost + security guardrails via policy”

GitOps makes it easy to standardize:

  • resource limits
  • approved registries
  • required labels
  • network policies
  • ingress rules

You can enforce these with admission policies so “bad config never lands.”
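For instance, assuming Kyverno as the admission controller (other policy engines work the same way), a guardrail requiring a team label might look like this; the label name is illustrative:

```yaml
# Kyverno sketch: reject Deployments without a 'team' label
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "All Deployments must carry a 'team' label."
        pattern:
          metadata:
            labels:
              team: "?*"   # any non-empty value
```

Because the policy itself lives in Git, the guardrail is versioned and reviewed like everything else.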


Pattern 6: Drift strategy by environment

A practical approach:

  • Prod: self-heal ON, prune ON (strict)
  • Staging: self-heal ON, prune ON (strict)
  • Dev: self-heal OFF but alert (flexible)

This avoids early adoption pain while still keeping prod clean.


Pattern 7: “Bootstrap” a cluster the same way every time

A cluster is “GitOps-ready” when:

  • GitOps tool installed
  • policies installed
  • base namespaces and platform add-ons installed
  • app sync begins

If you can bootstrap consistently, scaling becomes easy.


GitOps anti-patterns (the traps that make teams quit)

This is the section that saves you months.

Anti-pattern 1: “CI pushes directly to the cluster”

Symptom: pipeline runs kubectl apply or helm upgrade into prod
Why it’s bad: Git is no longer the source of truth; auditing and rollback get messy
Fix: CI should update Git (create a PR / commit), and GitOps reconciles


Anti-pattern 2: “Secrets in plain text in Git”

Symptom: Secret YAML committed with real passwords/tokens
Why it’s bad: permanent exposure risk (even if you delete later)
Fix: encrypted secrets or external secret systems


Anti-pattern 3: “One giant repo with no ownership boundaries”

Symptom: everyone commits everywhere, reviews become meaningless
Why it’s bad: permission chaos, blame culture, accidental changes
Fix: define ownership by folders + code owners + protected branches, or use separate repos


Anti-pattern 4: “No pruning, ever”

Symptom: you remove resources from Git, but they stay running in cluster
Why it’s bad: resource sprawl, zombie workloads, hidden cost and risk
Fix: enable prune where safe (especially in non-prod), and phase it into prod


Anti-pattern 5: “Self-heal ON while teams still do manual hotfixes”

Symptom: engineers apply a quick fix, GitOps immediately reverts it
Why it’s bad: people start fighting the system
Fix: define an incident workflow: hotfix PRs, or temporary sync freeze policy


Anti-pattern 6: “GitOps used only for apps, not for platform add-ons”

Symptom: ingress controller installed manually; policies drift; add-ons inconsistent
Why it’s bad: clusters diverge and become snowflakes
Fix: manage platform add-ons via GitOps too (with separate ownership boundaries)


Anti-pattern 7: “Over-templating everything”

Symptom: charts and templates so abstract no one understands them
Why it’s bad: debug time increases, onboarding slows
Fix: keep templates simple; optimize for readability first


The “Day 2 operations” playbook (what nobody tells beginners)

GitOps isn’t “install tool and done.” You operate it.

1) How to do a safe rollback

Rollback in GitOps = revert commit

  • Revert the PR that introduced the change
  • GitOps reconciles back to known good state

This is the cleanest rollback model you can have.

2) How to handle emergencies without breaking GitOps

Pick one:

  • Fast PR hotfix (best)
  • Temporary sync pause (for extreme incidents)
  • Manual change + follow-up PR (last resort)

But always end with: Git matches the final state.

3) How to upgrade GitOps safely

  • Upgrade controllers like any other platform component
  • Test in staging cluster first
  • Keep an upgrade runbook

4) How to keep repo hygiene healthy

  • enforce review
  • keep diffs small
  • avoid “mega PRs”
  • add change notes in commit messages

A practical “first GitOps implementation” blueprint

If you want a strong, low-drama rollout, do this:

Week 1: Inform + structure

  • pick repo strategy (start with app repo + env repo if possible)
  • define labels/tags/ownership
  • set up staging GitOps first

Week 2: Ship one service end-to-end

  • onboard 1 service with Kustomize overlays or Helm values
  • enable prune in staging
  • do a promotion PR from staging → prod
  • practice rollback by revert

Week 3: Scale patterns

  • template the onboarding process (copyable folder structure)
  • add policy guardrails
  • decide drift policy for prod

Week 4: Multi-team readiness

  • define ownership boundaries
  • add app onboarding checklist
  • document incident workflow

This is how GitOps becomes a habit, not a fight.


FAQ (the questions readers always ask)

“Is GitOps only for Kubernetes?”

It’s most popular with Kubernetes because reconciliation is natural there, but the model works anywhere you can declare desired state and reconcile continuously.

“Do I still need CI/CD?”

Yes. CI builds and tests artifacts. GitOps handles deploying desired state to clusters. They complement each other.

“What if I want approvals before production changes?”

That’s exactly what PR reviews and branch protections do well. GitOps strengthens approvals instead of bypassing them.

“How do I prevent people from changing the cluster manually?”

Use RBAC to restrict write access and rely on GitOps. But also create an incident process so people don’t feel trapped.


Final take: choose the tool, but commit to the model

Argo CD vs Flux is a real choice—but GitOps success comes more from:

  • repo structure,
  • promotion workflow,
  • secret handling,
  • drift strategy,
  • ownership boundaries,
  • and operational habits

Pick one, implement the patterns, avoid the anti-patterns, and GitOps will feel like a superpower.


1) Best GitOps tool choice by platform (EKS / AKS / GKE)

AWS EKS

Best default: Argo CD
Why: easiest adoption (UI), great multi-team ops, very common in EKS platforms.

Flux is also great if you want a more “Kubernetes-native controller toolkit” style and strong Git-based automation.

Azure AKS

Best default: Flux
Why: Flux fits cleanly into Kubernetes-native workflows and is commonly used in AKS setups.

Argo CD is equally valid if you want UI-first operations and app-centric visibility.

Google GKE

Best default: Argo CD or Flux (tie)
If your teams want UI + app health views → Argo CD
If your teams prefer CRDs + CLI + controller composition → Flux


2) Helm vs Kustomize: what to choose (practical rule)

Choose Helm when:

  • You need reusable packaging across many teams/services
  • You want a clean values-per-environment model
  • You use many third-party charts (ingress, monitoring, cert-manager, etc.)

Choose Kustomize when:

  • You want simple overlays and fewer moving parts
  • Your manifests are mostly hand-written YAML
  • You want patch-based env diffs that are very readable

My 2026 default:

  • Helm for platform add-ons (ingress, cert-manager, monitoring, external-dns)
  • Kustomize for application overlays (dev/stage/prod)
    This hybrid is extremely common and works well.

3) Single-cluster vs Multi-cluster: the repo strategy you should use

Single-cluster (prod + staging in one cluster or single cluster total)

Best repo approach: “App repo + Env repo” (still)
Because you’ll grow into multi-cluster anyway.

Structure

env-repo/
  clusters/
    main/
      platform/        # cluster add-ons
      apps/            # app definitions pointing to app repos
  policies/
  shared/

Multi-cluster (recommended for serious orgs)

Best repo approach: “One folder per cluster” in env-repo
Structure

env-repo/
  clusters/
    eks-prod-ap-south-1/
    eks-staging-ap-south-1/
    aks-prod-eastus/
    gke-prod-us-central1/
  tenants/
    team-a/
    team-b/
  platform/
    ingress/
    monitoring/
    logging/

This is the cleanest way to avoid snowflake clusters.


4) Ready-to-use “best structure” for each combination

Below are the best patterns for each of the 12 combinations.


A) AWS EKS + Helm + Single-cluster

Best pattern: Argo CD ApplicationSets + Helm values per env

Use when: you want fast onboarding + UI visibility
Repo idea:

env-repo/
  apps/
    payments/
      values-prod.yaml
      values-staging.yaml
  appsets/
    apps.yaml   # generates Applications for each app/env

Why it works: Helm is great for values overlays, Argo CD makes it very visible and easy.
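A sketch of what appsets/apps.yaml could contain, using a list generator to stamp out one Application per environment (the app name and values-file convention follow the tree above; the repo URL stays a placeholder):

```yaml
# ApplicationSet sketch: one Application per environment
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: payments
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - env: staging
          - env: prod
  template:
    metadata:
      name: "payments-{{env}}"
    spec:
      project: default
      source:
        repoURL: REPO_PLACEHOLDER
        targetRevision: main
        path: apps/payments
        helm:
          valueFiles:
            - "values-{{env}}.yaml"
      destination:
        server: https://kubernetes.default.svc
        namespace: payments
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```

Adding an environment becomes a one-line change to the generator list.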


B) AWS EKS + Helm + Multi-cluster

Best pattern: Argo CD ApplicationSets + “cluster generator” + Helm

Repo idea:

env-repo/
  clusters/
    eks-prod/
      apps/
      platform/
    eks-staging/
      apps/
      platform/
  appsets/
    apps-by-cluster.yaml

Why it works: cluster-based separation + scalable app generation.


C) AWS EKS + Kustomize + Single-cluster

Best pattern: Argo CD Applications pointing to Kustomize overlays

Repo idea:

app-repo/
  apps/payments/
    base/
    overlays/
      staging/
      prod/
env-repo/
  argocd-apps/
    payments-staging.yaml
    payments-prod.yaml

Why it works: simple overlays, readable diffs, easy promotion via PR.


D) AWS EKS + Kustomize + Multi-cluster

Best pattern: “One folder per cluster” + Kustomize overlays + App-of-apps

Repo idea:

env-repo/
  clusters/
    eks-prod/
      kustomization.yaml
      apps/
    eks-staging/
      kustomization.yaml
      apps/

Why it works: prevents cluster drift and makes bootstrap repeatable.


E) Azure AKS + Helm + Single-cluster

Best pattern: Flux HelmRelease + values per env

Repo idea:

env-repo/
  clusters/
    aks-main/
      sources/
      releases/
        payments-helmrelease.yaml
      values/
        payments-prod.yaml
        payments-staging.yaml

Why it works: Flux Helm controller feels natural and clean for AKS workflows.
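As a sketch of what releases/payments-helmrelease.yaml might contain (chart name, chart repo URL, and the values ConfigMap are illustrative; API versions reflect recent Flux releases):

```yaml
# Flux HelmRelease sketch (chart repo and values source are placeholders)
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: internal-charts
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.example.com     # hypothetical chart repository
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: payments
  namespace: payments
spec:
  interval: 5m
  chart:
    spec:
      chart: payments
      version: "1.x"
      sourceRef:
        kind: HelmRepository
        name: internal-charts
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: payments-values-prod      # holds the per-env values content
```

The per-env values file from the tree above becomes the referenced ConfigMap (or a Secret, for sensitive values).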


F) Azure AKS + Helm + Multi-cluster

Best pattern: Flux + per-cluster folder + shared charts

Repo idea:

env-repo/
  clusters/
    aks-prod/
      releases/
      values/
    aks-staging/
      releases/
      values/
  shared/
    charts/

Why it works: keeps per-cluster state isolated and repeatable.


G) Azure AKS + Kustomize + Single-cluster

Best pattern: Flux GitRepository + Kustomization per env

Repo idea:

env-repo/
  clusters/
    aks-main/
      kustomizations/
        payments-prod.yaml
        payments-staging.yaml
  apps/
    payments/
      overlays/prod
      overlays/staging

Why it works: very Kubernetes-native, minimal moving parts.


H) Azure AKS + Kustomize + Multi-cluster

Best pattern: Flux + cluster folders + Kustomize overlays

Repo idea:

env-repo/
  clusters/
    aks-prod/
      kustomizations/
    aks-staging/
      kustomizations/
  apps/
    payments/overlays/...

Why it works: scales cleanly and avoids cross-cluster coupling.


I) GKE + Helm + Single-cluster

Best pattern: Argo CD (UI-first) OR Flux (controller-first)

  • If teams are many → Argo CD
  • If platform is small, CLI-heavy → Flux

Repo idea: similar to the EKS/AKS layouts above, depending on tool choice.


J) GKE + Helm + Multi-cluster

Best pattern: Argo CD ApplicationSets if you want global visibility
or Flux per cluster if you want strict Kubernetes-native control.

Repo idea: one folder per cluster.


K) GKE + Kustomize + Single-cluster

Best pattern: Kustomize overlays + Argo CD Applications

Repo idea: app-repo overlays + env-repo app definitions.


L) GKE + Kustomize + Multi-cluster

Best pattern: cluster folders + App-of-apps (Argo CD)
or Flux per cluster Kustomization objects.

Repo idea: one folder per cluster.


5) My “best default architecture” (safe, scalable, team-friendly)

If you want the best long-term setup that works in almost every org:

Tool:

  • Argo CD (best for adoption + visibility)
  • (Optional later) use Flux-style automation for image updates, or implement a Git-based image update pipeline

Packaging:

  • Helm for platform add-ons
  • Kustomize for application overlays

Top-level repo layout (recommended)

  • App repos: each service owns manifests/chart
  • Env repo: platform owns what is deployed where

App repo

service-repo/
  deploy/
    base/
    overlays/
      staging/
      prod/

Env repo

env-repo/
  clusters/
    prod/
      platform/
      apps/
    staging/
      platform/
      apps/
  policies/
  shared/

This gives:

  • clear ownership
  • clean promotion PRs
  • scalable multi-cluster

EKS + Kubernetes + multi-cluster is where GitOps shines the most. Below is a battle-tested blueprint you can copy: repo structure, patterns, sample manifests (Argo CD and Flux options), promotion flow, drift rules, and the anti-patterns to avoid.


The best GitOps blueprint for EKS multi-cluster (2026-ready)

What you’re building

A system where:

  • Each EKS cluster is bootstrapped the same way (repeatable)
  • Platform add-ons (ingress, cert-manager, monitoring) are managed via GitOps
  • Apps are deployed consistently across clusters
  • Promotion is a PR (staging → prod)
  • Drift is detected and handled cleanly
  • Teams have ownership boundaries

1) Recommended architecture (simple and scalable)

My strong default for EKS multi-cluster

  • Argo CD as the GitOps engine (best visibility + easiest adoption)
  • Kustomize overlays for apps (clean diffs, easy promotion)
  • Helm for platform add-ons (most vendors ship Helm charts)

This “Helm for platform, Kustomize for apps” hybrid is common and very stable.

(If you prefer Flux instead, see the quick mapping in section 10 at the end of this blueprint.)


2) Repo strategy (this matters more than the tool)

Use 2 repos (best ownership model)

A) platform-env-repo (owned by platform team)

This repo defines:

  • clusters
  • platform add-ons
  • what apps run where

B) app-repos (owned by each app team)

Each service repo contains its deploy manifests (base + overlays).

This keeps teams independent, and promotion becomes controlled and auditable.


3) Folder structure you can copy (multi-cluster)

Platform repo: platform-env-repo

platform-env-repo/
  clusters/
    eks-staging/
      bootstrap/
      platform/
      apps/
    eks-prod/
      bootstrap/
      platform/
      apps/

  platform/
    ingress/
    cert-manager/
    external-dns/
    metrics/
    logging/
    monitoring/
    security/
    namespaces/

  tenants/
    team-payments/
    team-catalog/

  policies/
    required-labels/
    resource-limits/
    allowed-registries/

  shared/
    kustomize-bases/
    helm-values/

What goes where (very practical)

  • clusters/eks-staging/platform/ → cluster add-ons for staging
  • clusters/eks-staging/apps/ → “what apps run in staging”
  • platform/ → reusable platform modules
  • policies/ → guardrails (admission rules, baseline limits)
  • tenants/ → team namespaces, quotas, RBAC

App repo: payments-service

payments-service/
  deploy/
    base/
      deployment.yaml
      service.yaml
      hpa.yaml
      kustomization.yaml
    overlays/
      staging/
        kustomization.yaml
        patch.yaml
      prod/
        kustomization.yaml
        patch.yaml
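A typical deploy/overlays/prod/patch.yaml in that tree is a strategic-merge patch over the base Deployment; the replica count and resource figures below are illustrative:

```yaml
# deploy/overlays/prod/patch.yaml (values are examples, not recommendations)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: payments        # matched by name against the base container
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi
```

The staging patch is the same shape with smaller numbers, which keeps the staging-to-prod diff tiny and readable.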

4) The promotion model (how staging → prod works)

Promotion PR = change only one small thing

  • image tag (recommended)
  • or Helm chart version
  • or config value

Example promotion workflow

  1. App team merges code → builds image payments:1.8.2
  2. A PR updates staging overlay to 1.8.2
  3. GitOps syncs staging
  4. After validation, a PR updates prod overlay to 1.8.2
  5. GitOps syncs prod
  6. Rollback = revert prod PR

This keeps history clean and rollback instant.


5) Argo CD setup for EKS multi-cluster (recommended)

Pattern: “App of Apps” per cluster (cleanest)

Each cluster has one “root” Argo app that points to cluster folder.

clusters/eks-staging/bootstrap/root-app.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: eks-staging-root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: REPO_PLACEHOLDER
    targetRevision: main
    path: clusters/eks-staging
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

clusters/eks-staging/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - platform
  - apps

clusters/eks-staging/platform/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../platform/namespaces
  - ../../platform/ingress
  - ../../platform/cert-manager
  - ../../platform/monitoring
  - ../../platform/security

clusters/eks-staging/apps/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - payments-app.yaml
  - catalog-app.yaml

App definitions (apps can live in their own repos)

clusters/eks-staging/apps/payments-app.yaml

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: APP_REPO_PLACEHOLDER
    targetRevision: main
    path: deploy/overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: payments
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

For prod, same app but path: deploy/overlays/prod and namespace: payments.


6) Drift handling rules (multi-cluster best practice)

Staging cluster

  • selfHeal: true
  • prune: true
  • Fast iteration, keep it clean

Prod cluster

  • selfHeal: true (recommended)
  • prune: true (recommended)
  • But enforce an incident workflow for emergencies:
    • If emergency manual change occurs, you must follow with a PR that makes Git match reality.

Rule to publish internally:

“Prod is Git-driven. If you patch prod manually, you owe the repo a PR immediately.”


7) Secrets (must-do for EKS GitOps)

Do not store plaintext secrets in Git.

Safe patterns:

  • Store secrets in a cloud secret manager and sync to Kubernetes
  • Or store encrypted secrets in Git
  • Or use sealed/encrypted secret custom resources

Your GitOps repo should never contain raw passwords/tokens.


8) Multi-cluster patterns that work extremely well

Pattern A: One folder per cluster (most maintainable)

You already saw this. It prevents “snowflake clusters”.

Pattern B: “Platform first, apps second”

Bootstrap order:

  1. namespaces + RBAC + policies
  2. ingress/cert/observability
  3. apps

This reduces weird failures.

Pattern C: Shared platform modules

Put common add-ons under /platform and reference from cluster folders.

  • one change updates all clusters (via PR)
  • clusters remain consistent

Pattern D: App onboarding template

Every app team must ship:

  • deploy/base
  • deploy/overlays/staging
  • deploy/overlays/prod
  • resource requests/limits
  • health checks
  • HPA (if applicable)

This eliminates “special snowflake apps”.
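As one piece of that template, the HPA item could ship as a file like this (target name and thresholds are illustrative):

```yaml
# deploy/base/hpa.yaml - part of the onboarding template
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Because every app ships the same files in the same places, onboarding reviews become a checklist instead of an investigation.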


9) Anti-patterns to avoid (EKS multi-cluster edition)

  1. CI directly kubectl-applies to prod (breaks Git as source of truth)
  2. No pruning (zombie resources stay forever, cost + risk)
  3. Secrets in Git (even once = permanent exposure)
  4. One giant repo with no ownership (chaos and accidental prod changes)
  5. Manual changes in prod without follow-up PR (drift becomes normal)
  6. Different add-ons per cluster “because reasons” (clusters diverge over time)

10) If you prefer Flux instead of Argo CD (quick mapping)

Flux multi-cluster is typically:

  • one folder per cluster
  • a GitRepository source per cluster
  • Kustomization objects per cluster/platform/apps

Structure stays almost identical; only the “app definition objects” change.

