If you’ve ever had one “shared” cloud account/project that slowly turned into a jungle—random resources, unclear ownership, surprise bills, and “who created this?” mysteries—then you already understand why governance matters.
The trick is: governance shouldn’t feel like bureaucracy.
Great governance feels like power steering: it makes it easier to move fast safely.
This guide will show you how to design multi-account / multi-project guardrails that scale across AWS/Azure/GCP—using a simple model you can implement even if you’re starting from scratch.

What “multi-account / multi-project governance” really means
You’re splitting your cloud into many “containers”:
- AWS: multiple accounts inside an organization
- Azure: multiple subscriptions under management groups
- GCP: multiple projects under folders (inside an org)
Why? Because isolation is a superpower:
- Blast-radius control: a mistake in one account/project doesn’t kill everything.
- Cleaner security boundaries: permissions are easier to reason about.
- Better billing & ownership: cost is assigned correctly.
- Faster teams: teams can self-serve within safe boundaries.
But isolation creates a new problem:
“How do we keep hundreds of accounts/projects consistent, secure, and cost-controlled—without manually policing them?”
Answer: guardrails.
Guardrails: the 3 types (memorize this)
Every scalable governance program uses these three guardrail types:
1) Prevent (stop bad things from happening)
Examples:
- Block public storage buckets by default
- Block creating resources in unapproved regions
- Require encryption at rest
- Require tags/labels for ownership
2) Detect (spot issues fast)
Examples:
- Central logging
- Security findings aggregator
- Drift detection for IaC
- Budget and anomaly alerts
3) Correct (auto-fix or fast-fix)
Examples:
- Auto-remediate public exposure
- Auto-attach required policies
- Auto-delete unattached volumes after X days (non-prod)
- Auto-open a ticket with owner + evidence
Scaling secret: You don’t need 200 rules.
You need 20–30 high-impact guardrails applied consistently, automatically.
The outcome you’re building (the “landing zone” in plain English)
A scalable setup has a reliable “platform skeleton”:
- A hierarchy (org → folders/management groups → accounts/projects)
- A standard way to create accounts/projects (a vending machine)
- A baseline security configuration applied automatically
- Central logging, security visibility, and cost visibility
- A guardrail catalog (what’s enforced, why, and how to request exceptions)
Let’s build it step-by-step.
Step-by-step blueprint: guardrails that scale
Step 1 — Decide your account/project strategy (simple patterns)
The 3 most common patterns
Pattern A: By environment
- Prod account/project
- Stage account/project
- Dev account/project
Good when you want strong isolation between environments.
Pattern B: By team or product
- Payments account/project
- Search account/project
- Data platform account/project
Good when teams are autonomous and you want clear ownership.
Pattern C: Hybrid (best for most orgs)
- A few “shared platform” accounts/projects (logging, networking, CI, security)
- Many product/team accounts/projects
- Environments separated inside each product/team boundary, or split into prod/non-prod
This scales best in real life.
Real example: a sane starting layout
- Shared services: Identity, logging, security tooling
- Networking: Hub connectivity, shared DNS, shared egress
- Workloads: One account/project per product/team (prod + non-prod separated)
Step 2 — Create a “home for everything” hierarchy
This is non-negotiable. Without hierarchy, governance becomes manual.
Example hierarchy (cloud-agnostic)
- Org Root
  - Platform
    - Logging
    - Security
    - Networking
  - Workloads
    - Team A
    - Team B
  - Sandbox
    - Personal sandboxes
  - Quarantine
    - Suspicious or non-compliant accounts/projects moved here automatically
Why this works:
- Platform is locked down tightly
- Workloads have freedom within limits
- Sandbox exists but is controlled
- Quarantine gives you an “oops, contain it” button
Step 3 — Build the “Account/Project Vending Machine” (AVM/PVM)
This is the moment governance becomes scalable.
Instead of asking admins to manually create accounts/projects, teams request them through a standard process that automatically applies:
- Baseline policies
- Logging and monitoring
- Networking defaults
- Tag/label standards
- Budget defaults
- Access patterns (who gets admin, who gets read-only)
What a request form should collect (keep it minimal)
- Team name
- Application/product name
- Environment (prod/non-prod)
- Data classification (public/internal/confidential/restricted)
- Expected monthly spend range (small/medium/large)
- Owner group (email/identity group)
Output of vending machine (what gets created)
- New account/project/subscription
- Standard IAM roles/groups
- Logging shipped to central place
- Baseline security policies enforced
- Budget + anomaly alerts configured
- Mandatory tags/labels
- Default network route strategy (hub/spoke)
Scaling win: New accounts/projects become safe by default.
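As a sketch of what "vending" can look like in practice, each request can become one call to a shared Terraform module. The module path, name, and inputs below are hypothetical placeholders, not a published module; they just mirror the request form above:

```hcl
# Hypothetical vending-machine module call; module source and inputs
# are illustrative placeholders, not a published module.
module "account_payments_prod" {
  source = "./modules/account-vending"

  team                = "payments"
  app                 = "payments-api"
  environment         = "prod"
  data_classification = "confidential"
  owner_group         = "payments-team@example.com"
  monthly_budget_usd  = 5000

  # The module itself would then apply the baseline automatically:
  # policies, central logging, budget alerts, default roles, tags.
}
```

The point of the pattern is that the request is declarative and reviewable, while everything governance-critical lives inside the module, not in each team's hands.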
The Guardrail Catalog (the part that actually prevents chaos)
Below are the guardrails that matter most at scale. You can copy this as your “starter guardrail catalog.”
Guardrail Group 1 — Identity & access (least privilege at scale)
Guardrail 1: No long-lived human access keys
- Prevent: block creating permanent keys (or heavily restrict)
- Operate: use SSO + short-lived access
Real example:
If an engineer leaves, you don’t want to hunt for scattered access keys across 200 accounts. With SSO, access stops centrally.
Guardrail 2: Separate human access from machine access
- Humans: SSO roles
- Machines: workload identities (service accounts/managed identities)
Real example:
A CI runner should not use an engineer’s credentials. Give it a dedicated identity with scoped permissions.
Guardrail 3: Permission boundaries (or role templates)
- You allow teams to create roles/users, but only within a boundary.
Example rule:
“Teams can create IAM roles, but those roles can’t grant admin, can’t disable logging, can’t modify org policies.”
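On AWS, one common way to implement that rule is an IAM permission boundary attached to every team-created role. This is an illustrative sketch of the shape, not a complete boundary policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGeneralUse",
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    },
    {
      "Sid": "DenyGovernanceTampering",
      "Effect": "Deny",
      "Action": [
        "organizations:*",
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "config:StopConfigurationRecorder",
        "iam:DeleteRolePermissionsBoundary"
      ],
      "Resource": "*"
    }
  ]
}
```

Because the boundary is an upper limit, roles created under it can never exceed it, no matter what permissions the team attaches.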
Guardrail 4: Break-glass access is controlled and audited
- Only for emergencies
- Strong MFA
- Alerts on use
- Short session duration
Guardrail Group 2 — Networking (reduce blast radius and surprise spend)
Guardrail 5: Default deny inbound from the internet (unless explicitly approved)
- Prevent accidental public exposure
- Use approved patterns for public apps
Real example:
Public endpoints must go through an approved ingress layer or gateway with WAF controls, not random public IPs.
Guardrail 6: Approved regions only
- Prevent resources in random regions
- Helps compliance and cost predictability
Real example:
A developer spins up GPU instances in an expensive region “just testing.” Region guardrails stop this instantly.
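On AWS this is typically a service control policy (SCP) using the aws:RequestedRegion condition key; the region list here is a placeholder, and the NotAction block exempts global services that would otherwise break:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUnapprovedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "sts:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["eu-west-1", "eu-central-1"]
        }
      }
    }
  ]
}
```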
Guardrail 7: Central egress controls (where practical)
- Reduce data exfiltration risk
- Improve visibility
- Make security teams happy without slowing devs
Guardrail Group 3 — Data protection (the “don’t get breached” essentials)
Guardrail 8: Encryption at rest is mandatory
- Storage, databases, disks
Guardrail 9: Public storage is blocked by default
- Storage buckets/containers should not be public unless explicitly approved
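On AWS, a simple enforcement pattern is to enable account-level S3 Block Public Access everywhere, then use an SCP so nobody can switch it back off:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyDisablingS3BlockPublicAccess",
      "Effect": "Deny",
      "Action": "s3:PutAccountPublicAccessBlock",
      "Resource": "*"
    }
  ]
}
```

Exceptions (a genuinely public bucket) then go through the exception process rather than an ad-hoc console toggle.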
Guardrail 10: Secrets must be stored in a secrets manager
- Prevent secrets in code, configs, or environment variables exposed in logs
Real example:
Someone prints env vars in a debug log. If secrets are not in env vars, damage is reduced.
Guardrail 11: Backup defaults for critical tiers
- Prod databases: backup + retention
- Non-prod: lighter policies
Guardrail Group 4 — Logging, detection, and audit (your “black box” recorder)
Guardrail 12: Centralized audit logs are mandatory
- All accounts/projects ship audit logs to a central location
- Immutable retention for a defined period
Guardrail 13: Central security findings aggregation
- One view of security posture across everything
Guardrail 14: Alerts for high-risk changes
- Disabling logging
- Changing org policies
- Opening wide network access
- Creating public endpoints
- Sudden spend spikes
Real example:
If someone disables audit logging (even accidentally), that should page the platform/security team immediately.
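Detection alone is good; prevention is better. On AWS you can pair the alert with an SCP that denies tampering with the audit trail outright:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProtectAuditLogging",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "cloudtrail:UpdateTrail",
        "config:StopConfigurationRecorder",
        "config:DeleteConfigurationRecorder"
      ],
      "Resource": "*"
    }
  ]
}
```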
Guardrail Group 5 — Cost & resource hygiene (FinOps-friendly by design)
Guardrail 15: Tag/label enforcement
Minimum tags:
- app, team, env, owner, cost_center, managed_by
Real example:
If a resource is untagged, it can’t be created (or it gets quarantined).
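A minimal AWS example of enforcement at creation time: an SCP that denies launching EC2 instances when the request is missing a team tag. A real catalog extends this per tag key and per service; this shows the pattern only:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyUntaggedInstances",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "Null": { "aws:RequestTag/team": "true" }
      }
    }
  ]
}
```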
Guardrail 16: Budgets and anomaly alerts per account/project
- A small default budget is better than no budget
- Alert at 50%, 80%, 100%
Guardrail 17: Sandbox quotas and time limits
- Sandboxes are for learning, not for running production workloads forever.
Example rule:
“Sandbox resources auto-expire after 7–14 days unless extended.”
Guardrail 18: Approved instance/resource families for non-prod
- Prevent someone from creating huge instances “for testing”
Guardrail Group 6 — Deployment and change control (without slowing delivery)
Guardrail 19: Infrastructure as Code is the default path
- Not “no console ever,” but:
- production changes go through IaC
- console is for investigation, not permanent change
Guardrail 20: Drift detection
- If prod differs from IaC, flag it quickly
Guardrail 21: Standard pipelines for provisioning
- The “vending machine” should be the normal route, not a special route.
How guardrails “scale” technically (the automation model)
You don’t scale by writing policies.
You scale by making policies repeatable and centrally managed.
The scalable model looks like this:
- One place to define guardrails (policy-as-code repository)
- One pipeline to apply guardrails across the hierarchy
- One system to report compliance (dashboard)
- One exception process (time-bound approvals)
“Guardrails as Code” (the mindset)
Treat guardrails like:
- versioned code
- reviewed changes
- tested policies
- staged rollouts
- rollback ability
Real example:
You introduce a new “deny public storage” policy in warn-only mode first, monitor impact, then enforce.
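If you enforce policies inside Kubernetes with OPA Gatekeeper, this staged rollout maps directly onto enforcementAction: a constraint runs in dryrun first (violations reported, nothing blocked), then flips to deny. The constraint below uses the K8sRequiredLabels template from the Gatekeeper policy library as an example:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-label
spec:
  enforcementAction: dryrun   # warn-only: flip to "deny" once impact is known
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["team"]
```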
The exception process (this is where most governance fails)
If you don’t design exceptions, people bypass governance.
A good exception process:
- Has an owner and a business reason
- Is time-bound (expires automatically)
- Is logged and visible
- Has compensating controls
Example exception
A team needs a public storage bucket for a public dataset.
Approved exception conditions:
- Only a specific bucket name pattern
- Only read-only public access
- Access logging enabled
- Data classification = public
- Exception expires in 30 days unless renewed
This keeps governance strict and practical.
A real scenario walkthrough (so it clicks)
Situation:
You have 50 teams. Everyone shares one cloud account/project. Bills are chaotic.
You implement:
- Create org hierarchy: Platform / Workloads / Sandbox / Quarantine
- Build account/project vending machine
- Move each team to its own account/project
- Apply baseline guardrails:
- central logging
- region restrictions
- tag enforcement
- no public storage by default
- budgets and anomaly alerts
- Add “detect + correct”:
- alert on public exposure
- auto-quarantine for repeated non-compliance
- Establish weekly governance heartbeat:
- review exceptions
- review top anomalies
- measure compliance score
What changes in 30 days:
- “Who owns this?” becomes obvious (tags + account ownership)
- Security incidents become containable (blast radius reduced)
- Cloud cost becomes explainable (per team/project budgets)
- Engineers ship faster (self-serve accounts/projects)
Minimum Viable Guardrails (MVG): the fastest safe start
If you’re starting today, do these first:
- Central audit logging across all accounts/projects
- SSO + no long-lived human keys
- Approved regions only
- Block public storage by default
- Mandatory tags/labels
- Budget + anomaly alerts
- A standard account/project vending machine
- Quarantine mechanism for non-compliance
That set alone prevents most expensive mistakes.
Maturity levels (so you know what “good” looks like)
Level 1: Manual governance
- Policies exist, but applied inconsistently
- Lots of exceptions and drift
Level 2: Baseline automated
- Vending machine exists
- Core guardrails applied automatically
- Central logging and budgets in place
Level 3: Detect + auto-remediate
- Security posture centralized
- Auto-fix for common misconfigurations
- Compliance reporting is reliable
Level 4: Policy-as-code + engineering integration
- Guardrails tested and deployed like software
- Cost and security checks integrated into CI/CD
- Exceptions are controlled and expiring
Common mistakes (and how to avoid them)
Mistake 1: “We’ll tag later”
Fix: enforce tags at creation time. Retro-tagging never finishes.
Mistake 2: Too many policies too early
Fix: start with MVG. Add guardrails only when they address real incidents or recurring waste.
Mistake 3: No exception process
Fix: time-bound exceptions with compensating controls. Without this, people bypass governance.
Mistake 4: No ownership mapping
Fix: every account/project needs an owner group and a cost owner.
Mistake 5: Governance that blocks delivery
Fix: build paved roads (vending machine + templates). Make the safe path the easiest path.
Final takeaway (the sentence to remember)
Guardrails that scale are not “rules.” They are automated defaults, clear ownership, fast feedback, and a safe self-service path.
If you implement:
- a hierarchy,
- a vending machine,
- a minimal guardrail catalog,
- centralized logging/security/cost views,
- and a clean exception process,
…you’ll get governance that helps engineers move faster instead of slowing them down.
Below is a ready-to-copy "Guardrail Catalog" tailored for a Kubernetes-heavy AWS setup (EKS + multi-account), written so it still works if you later expand to Azure/GCP.
You can paste it into your internal wiki as your baseline governance standard.
Multi-account governance guardrail catalog (AWS + EKS)
Scope and goals (copy/paste)
Scope: All AWS accounts under the organization, including shared platform accounts and workload accounts running EKS and managed cloud services.
Goals:
- Keep teams fast: self-service within safe boundaries
- Prevent top security + cost mistakes by default
- Centralize logs, identity, security findings, and cost visibility
- Keep exceptions possible, controlled, time-bound, and auditable
Guardrail types: Prevent (block), Detect (alert), Correct (auto-remediate)
Enforcement levels:
- BLOCK = must not happen (hard stop)
- WARN = allowed but alerts + ticket
- MONITOR = visible and tracked
1) Organization structure (baseline layout)
Accounts you should have
Core platform accounts
- Management / Org Admin (very locked down)
- Security tooling (Security Hub, GuardDuty admin, findings aggregation)
- Log archive (central CloudTrail, Config, VPC Flow Logs)
- Shared networking (optional hub if you use hub-spoke)
- Shared CI/CD (optional)
Workload accounts
- One per product/team (recommended), with separate prod and non-prod if needed
Sandbox accounts
- For experiments, with strict budgets + expiry
Quarantine account / OU
- For accounts/projects that violate critical guardrails repeatedly
OU layout (simple and scalable)
- Platform OU (most restricted)
- Workloads OU (standard restrictions)
- Sandbox OU (tight cost controls)
- Quarantine OU (maximum restrictions)
2) Account “Vending Machine” standard (what every new account gets)
When a new workload account is created, it must automatically include:
Identity & access
- SSO integration enabled
- Default roles: Admin (restricted), Operator, ReadOnly, Audit
- Break-glass role configured + alarms
Logging & audit
- Org-level CloudTrail enabled and sent to Log Archive
- AWS Config enabled + aggregator in Security account
- VPC Flow Logs enabled (at least for shared VPCs / critical networks)
Security posture
- GuardDuty enabled and delegated to Security account
- Security Hub enabled with aggregation
- Baseline Config rules enabled
Cost controls
- Budget created with alerts (50/80/100%)
- Cost Anomaly Detection monitors for the account (or OU)
- Mandatory tagging enforcement (details below)
Network baseline
- Default deny inbound pattern
- Region restrictions (only approved regions)
- No public endpoints by default unless via approved pattern
3) Mandatory tags (cost + ownership + ops)
Required tags on all supported resources:
- app (service/product)
- team (owner team)
- env (prod/stage/dev)
- owner (email or group)
- cost_center
- managed_by (terraform/helm/manual)
Policy rule:
- BLOCK creation of resources missing required tags (where supported).
- For resources that can’t be tagged, they must be categorized as shared/platform and tracked in a shared-cost model.
4) Guardrails catalog (the actual rules)
Below is the “starter set” that prevents most chaos. Keep it tight and high-impact.
A) Identity & access guardrails
- SSO required for humans
- Level: BLOCK
- Rule: No long-lived human access keys; no direct IAM users for humans.
- Detect: alert on any IAM user creation outside break-glass patterns.
- Break-glass access
- Level: WARN (use only in emergencies)
- Rule: break-glass role requires MFA, has short session duration, and triggers immediate alert + incident ticket.
- Least privilege via role templates
- Level: BLOCK
- Rule: teams must use approved role templates; admin permissions restricted in workload accounts.
- Permission boundaries for team-created roles
- Level: BLOCK
- Rule: even if teams create roles, permission boundaries prevent granting org-admin/security-disabling powers.
- Workload identity (IRSA) required
- Level: BLOCK
- Rule: Kubernetes workloads must use IRSA for AWS access; no node-instance-role “god access”.
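With IRSA, a pod gets AWS permissions through an annotated ServiceAccount instead of inheriting the node's instance role. A sketch, with the namespace and role ARN as placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payments-api
  namespace: payments
  annotations:
    # Placeholder ARN: the IAM role's trust policy must allow this
    # service account via the cluster's OIDC provider.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/payments-api
```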
B) Network & exposure guardrails
- Approved regions only
- Level: BLOCK
- Rule: restrict resource creation to approved regions.
- No public S3 by default
- Level: BLOCK
- Rule: block public bucket policies and public ACLs unless exception approved.
- No public inbound admin ports
- Level: BLOCK
- Rule: block wide-open inbound rules for SSH/RDP/admin ports; enforce approved access methods.
- Public endpoints must use approved ingress patterns
- Level: BLOCK/WARN depending on maturity
- Rule: public apps must go through an approved ingress layer (e.g., ALB Ingress Controller + WAF if applicable), not random public IPs.
- Egress visibility
- Level: MONITOR → WARN
- Rule: track major egress paths and large egress spikes; alert on unusual outbound traffic patterns.
C) Data protection guardrails
- Encryption at rest
- Level: BLOCK
- Rule: encryption required for EBS, RDS, S3, EFS (where applicable).
- Secrets management
- Level: BLOCK
- Rule: no secrets in code, container images, ConfigMaps, or plaintext env vars for production.
- Kubernetes: require Secret store integration or encrypted secrets approach.
- Backups
- Level: BLOCK for prod, WARN for non-prod
- Rule: production databases/storage require backup policies and retention aligned to data classification.
- Data classification rules
- Level: BLOCK
- Rule: restricted data cannot be placed in sandbox accounts or public-facing storage.
D) Logging, monitoring, and audit guardrails
- CloudTrail must stay on
- Level: BLOCK
- Rule: deny disabling CloudTrail / Config / log delivery.
- Detect: alert on any attempt.
- Central log retention
- Level: BLOCK
- Rule: minimum retention for audit logs; immutable storage for log archive.
- Security findings aggregation
- Level: MONITOR → WARN
- Rule: all accounts must report to central Security account.
- KPI: coverage percentage by OU.
E) Kubernetes (EKS) guardrails that actually scale
- Pod Security baseline
- Level: BLOCK for prod, WARN for non-prod initially
- Rule: enforce Pod Security Standards (baseline/restricted) depending on workload needs.
- No privileged containers by default
- Level: BLOCK
- Rule: deny privileged pods, hostPID/hostNetwork, hostPath mounts unless exception.
- Resource requests required
- Level: WARN → BLOCK
- Rule: pods must set CPU/memory requests at minimum (prevents cluster waste and scheduling chaos).
- Namespace isolation
- Level: WARN
- Rule: teams deploy into dedicated namespaces; apply RBAC boundaries per namespace.
- NetworkPolicies for prod namespaces
- Level: WARN → BLOCK
- Rule: prod namespaces require default-deny + explicit allow rules (start with critical apps).
- Image provenance
- Level: WARN → BLOCK
- Rule: only allow images from approved registries; block the `latest` tag in prod.
- Runtime access
- Level: WARN
- Rule: restrict `kubectl exec` in prod; log all access; require ticket/approval for sensitive namespaces.
- Ingress standardization
- Level: WARN
- Rule: use one approved ingress strategy per cluster; enforce TLS; disallow plaintext for prod.
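Two of these guardrails need nothing beyond built-in Kubernetes objects: Pod Security Standards are enforced with namespace labels, and default-deny is a NetworkPolicy with an empty podSelector. Namespace name below is a placeholder:

```yaml
# Pod Security Standards via the built-in pod-security admission labels
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Default-deny: all ingress and egress blocked until explicitly allowed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments-prod
spec:
  podSelector: {}             # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]
```

Start with `warn` on the namespace label and `enforce: baseline` for legacy workloads, then tighten to `restricted` as teams remediate.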
F) Cost and hygiene guardrails (FinOps-friendly)
- Budgets per account + anomaly detection
- Level: WARN
- Rule: alerts at 50/80/100% and anomalies > threshold.
- Sandbox quotas and expiry
- Level: BLOCK
- Rule: sandbox resources expire after N days unless extended.
- Enforce: scheduled cleanup + owner notification.
- Orphan cleanup
- Level: WARN → Correct
- Targets: unattached EBS volumes, old snapshots, idle load balancers, unused EIPs, outdated AMIs (non-prod first)
- Non-prod scheduling
- Level: WARN
- Rule: non-prod environments should shut down on a schedule off-hours unless the business requires 24/7.
G) Delivery and change management guardrails
- Infrastructure as Code for prod
- Level: BLOCK
- Rule: prod infra changes must go through IaC pipeline (Terraform/CloudFormation).
- Console allowed for investigation, not permanent drift.
- Drift detection
- Level: WARN
- Rule: detect drift in prod stacks; create tickets with owner + diff.
- Policy-as-code
- Level: MONITOR → WARN
- Rule: guardrails are versioned, reviewed, tested, and rolled out progressively by OU.
5) Exception process (simple, strict, and fast)
An exception must include:
- Business reason
- Owner
- Affected resources
- Compensating controls (extra logging, narrower scope, time limits)
- Expiration date
Rules:
- Exceptions are time-bound (default 30 days)
- Auto-expire unless renewed
- Logged and visible in a central register
- Repeated exceptions trigger a “fix the root cause” task
Example exception: public S3 bucket for public dataset
- Allowed only with:
- Read-only public access
- Access logs enabled
- Explicit bucket naming convention
- Approved data classification = public
- Auto-expiry in 30 days
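Under such an exception, the bucket policy itself can encode the narrow scope. Bucket name is a placeholder; note that only s3:GetObject is granted, so listing and writing stay private:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadOnlyDataset",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::public-dataset-example-bucket/*"
    }
  ]
}
```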
6) Rollout plan (so this doesn’t break teams)
Week 1–2: Inform mode
- Turn on detection everywhere
- Create dashboards: compliance %, untagged spend %, public exposure count
- Start with WARN, not BLOCK (except critical items like CloudTrail off)
Week 3–6: Block the top-risk items
Start blocking:
- Disabling audit logs
- Public storage exposure
- Unapproved regions
- Long-lived human keys
- Privileged pods in prod
Week 7–12: Expand to deeper controls
- Enforce tagging
- Enforce NetworkPolicies for prod namespaces
- Enforce image source policies
- Enforce IaC-only changes for prod
7) Governance KPIs (track these monthly)
- % accounts onboarded to baseline (target 100%)
- Allocated spend % via tags (target 90%+)
- # critical guardrail violations (trend down)
- Mean time to remediate critical findings (trend down)
- % prod namespaces with NetworkPolicies (trend up)
- # exceptions active + expired (keep low, auto-expire working)
8) “Paved roads” (how you keep engineers happy)
Governance scales when the safe path is easiest:
Provide:
- Terraform modules/templates for common patterns (VPC, EKS, RDS, S3)
- Standard Helm charts (logging/metrics, ingress, baseline policies)
- Golden pipelines (build → scan → deploy)
- Standard service blueprint (namespace + RBAC + NetworkPolicy + ingress + budget tags)
Result: engineers stop “inventing” infrastructure, and governance becomes effortless.