CI/CD sounds simple until you’re responsible for a production system and realize:
- A “successful build” doesn’t mean the app is safe to ship
- “Deploy” isn’t the end — verification and rollback readiness matter more
- One bad release can burn hours of on-call time if rollback isn’t engineered in
This guide gives you a reference CI/CD pipeline you can adopt for most cloud apps (containers + Kubernetes or similar). It’s beginner-friendly, but it’s also how mature teams structure reliable delivery.
By the end, you’ll have a pipeline that:
- Builds reproducible artifacts
- Scans code + dependencies + container + IaC
- Deploys safely (staging → production)
- Verifies with smoke tests and health checks
- Rolls back fast with minimal panic
No fluff. Just steps, patterns, and examples.
The reference pipeline in one view
Here’s the shape we’re building (you’ll implement it step-by-step):
PR Pipeline (fast feedback)
- Lint + unit tests
- SAST + dependency scan + secret scan
- Build container (optionally) + quick container scan
- Report results back to PR
Main/Release Pipeline (shipping)
- Build → version → tag
- Run full test suite
- Scan (code, deps, container, IaC)
- Create SBOM + sign artifacts
- Push image to registry
- Deploy to staging
- Smoke tests + optional integration tests
- Promote to production (progressive)
- Post-deploy verification + monitoring gates
- Rollback plan ready at every step
The “golden rules” (these prevent 90% of CI/CD pain)
1) Build once, deploy many
Never rebuild the same version in staging and production.
You build an immutable artifact (example: a container image with a unique tag), then promote it across environments.
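As a sketch, promotion can be as small as a retag-and-push; the registry paths and image names below are placeholders, not part of the original guide:

```shell
# Promote an already-built image to a release tag instead of rebuilding it.
# Image references are illustrative placeholders.
promote() {
  src="$1"   # e.g. registry.example.com/app:sha-abc1234
  dst="$2"   # e.g. registry.example.com/app:1.6.0
  docker pull "$src" &&
  docker tag "$src" "$dst" &&
  docker push "$dst"
}
```

The artifact's bytes never change between environments; only the tag pointing at it does.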
2) Everything is versioned
- App version
- Container image tag
- Helm chart / manifest version
- Database migration version (if applicable)
If you can’t name what’s running, you can’t roll it back confidently.
3) Security gates are part of the pipeline, not a meeting
Scanning must run automatically. If it’s manual, it won’t happen consistently.
4) Rollback is a feature, not a reaction
A mature pipeline always assumes: some release will fail.
So it makes rollback fast, safe, and boring.
What you need before building the pipeline
You can follow this guide with any CI system. The concepts stay the same.
Minimal prerequisites
- A Git repo for your app
- A Dockerfile (or build definition)
- A container registry (private is fine)
- A deployment target (Kubernetes recommended, but any environment works)
- Secrets storage (CI secret store or cloud secret manager)
- A way to run tests
Suggested repo structure (simple and scalable)
```
repo/
  app/                 # source code
  Dockerfile
  .ci/                 # pipeline scripts (optional)
  deploy/
    k8s/               # manifests or helm chart
      base/
      overlays/
        dev/
        stage/
        prod/
  scripts/
    smoke-test.sh
    migrate.sh
```
Step 1 — BUILD (fast, reproducible, traceable)
Your build stage should answer 3 questions:
- Can we reproduce this build later?
- Can we trace this build to a commit?
- Can we trust the artifact is the same in every environment?
Build checklist (what “good” looks like)
- ✅ Deterministic dependencies (lock files pinned)
- ✅ Version injected from Git commit or tag
- ✅ Unit tests run (at least)
- ✅ Container image built with a unique tag
- ✅ Artifact stored (image pushed OR saved as build output)
A practical versioning scheme
Use something humans and machines can read:
- `1.6.0` for releases (tags)
- `1.6.0+sha-abc1234` for traceability
- Container tag examples: `app:1.6.0`, `app:sha-abc1234`, `app:1.6.0-sha-abc1234`
Rule: Production should deploy release tags or commit tags, not “latest”.
Example: container build commands (conceptual)
- Build: `docker build -t app:sha-abc1234 .`
- Test: run unit tests in CI
- Push: `docker push app:sha-abc1234`
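As a sketch, composing the traceable tag can live in a small helper; the `app` image name and the git commands in the usage comment are illustrative:

```shell
# Compose a traceable image tag from a release version and a short commit sha.
image_tag() {
  version="$1"
  sha="$2"
  printf 'app:%s-sha-%s\n' "$version" "$sha"
}

# Typical CI usage (assumes git is available in the build job):
#   version=$(git describe --tags --abbrev=0)
#   sha=$(git rev-parse --short=7 HEAD)
#   docker build -t "$(image_tag "$version" "$sha")" .
```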
Step 2 — SCAN (catch issues before they become incidents)
Scanning isn’t one tool. It’s coverage.
A strong pipeline scans 5 areas:
- Secrets scan (accidental keys in code)
- SAST (static code vulnerabilities)
- SCA / dependency scan (libraries you import)
- Container image scan (OS packages + known CVEs)
- IaC scan (Terraform/Kubernetes misconfigurations)
The right way to gate scans (so dev velocity stays high)
PR gating (fast)
Block PRs only for:
- leaked secrets
- critical vulnerabilities with known exploitability
- clearly unsafe IaC patterns (public buckets, open security groups, privileged pods)
Everything else becomes:
- warnings
- tickets
- backlog items
Release gating (strict)
Before production, you enforce:
- no critical vulnerabilities without exception approval
- no embedded secrets
- baseline security policies satisfied
Real example: a sensible vulnerability policy
- Critical: block release (unless exception approved)
- High: block if internet-facing + reachable, otherwise ticket
- Medium/Low: ticket + fix in next sprint
- Accepted risk: record with expiry date (don’t accept forever)
This keeps you secure and shipping.
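One way to enforce the "Critical: block release" tier in CI is a small gate around your scanner. Trivy is used here as an example; the tool choice is an assumption, not a requirement of this guide:

```shell
# Block the release when the scanner reports critical findings.
# Trivy is an illustrative choice; swap in your scanner of choice.
gate_image() {
  image="$1"
  if ! trivy image --severity CRITICAL --exit-code 1 "$image"; then
    echo "BLOCKED: critical vulnerabilities in $image"
    return 1
  fi
  echo "PASSED: $image"
}
```

Wiring exceptions in (an approval file or ticket reference) can layer on top of this without weakening the default.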
Step 3 — PACKAGE (SBOM + signing + provenance)
This is where many pipelines level up.
Why SBOM matters (engineer explanation)
An SBOM is simply: “What exactly is inside this artifact?”
If a library vulnerability drops tomorrow, you can answer:
“Are we affected? Which services? Which versions?”
What to generate and store
- SBOM file (for the image/build)
- Build metadata (commit, build ID, dependency lock hash)
- Optional: signature for the image
Result: your pipeline produces artifacts you can audit and trust.
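A sketch of the generation step, using Syft as an example SBOM tool (the tool and the SPDX JSON output format are assumptions; adapt to your stack):

```shell
# Generate an SBOM for a built image and print the output filename.
make_sbom() {
  image="$1"
  out="sbom-${image##*:}.json"       # e.g. sbom-sha-abc1234.json
  syft "$image" -o spdx-json > "$out" || return 1
  echo "$out"
}
```

Store the file next to the build metadata (commit, build ID) so an audit can tie artifact, SBOM, and source together.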
Step 4 — DEPLOY (staging first, then production)
Deploy should be boring and repeatable.
Environment strategy that works for most teams
- dev: fast, flexible, may use mocks
- stage: production-like, used for final verification
- prod: controlled, progressive releases, strict gates
Promotion rule
Only deploy to production from:
- a tagged release, or
- an approved commit from main
No “random branch deploys to prod.”
The reference deployment flow (staging → prod)
Stage deployment
- Apply manifests / chart to staging
- Wait for rollout complete
- Run smoke tests
- Optional: integration tests
- Capture deployment report (versions, rollout time, test results)
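The staging steps above can be sketched as one script. The kubectl paths follow the repo layout suggested earlier, and `run_smoke_tests` stands in for `scripts/smoke-test.sh`; all names are placeholders:

```shell
# Deploy to staging, wait for the rollout, then verify with smoke tests.
deploy_stage() {
  kubectl apply -k deploy/k8s/overlays/stage || return 1
  kubectl rollout status deployment/app --timeout=300s || return 1
  run_smoke_tests || return 1
  echo "stage deploy verified"
}
```

If any step fails, the function returns non-zero and the pipeline stops before production is touched.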
Prod deployment (progressive)
- Deploy canary (small percentage)
- Verify (metrics + logs + error rates)
- Gradually increase traffic
- Full rollout
- Post-deploy verification window
Step 5 — VERIFY (the step teams skip… and regret)
Deploying is not the same as shipping.
You need automated checks that answer:
- Is the service responding?
- Is latency acceptable?
- Are error rates normal?
- Are key workflows working?
Smoke tests (simple and powerful)
A smoke test is a short script that:
- hits `/health` and core endpoints
- checks auth works (if relevant)
- validates one “golden path” transaction
Example smoke-test script behavior
- Call health endpoint
- Call one API endpoint with a test token
- Validate response schema
- Exit non-zero if any check fails
You run smoke tests:
- after staging deploy
- after production canary
- after full rollout
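A minimal sketch of such a script, assuming curl is available; the base URL, token variable, and endpoint paths are placeholders for your service:

```shell
# Minimal smoke test: health endpoint plus one golden-path call.
# BASE URL, TEST_TOKEN, and endpoint paths are illustrative assumptions.
smoke_test() {
  base="$1"
  curl -fsS "$base/health" > /dev/null \
    || { echo "FAIL: /health"; return 1; }
  curl -fsS -H "Authorization: Bearer $TEST_TOKEN" \
    "$base/api/v1/ping" > /dev/null \
    || { echo "FAIL: /api/v1/ping"; return 1; }
  echo "smoke tests passed"
}
```

Exit non-zero on any failure so the pipeline treats a broken golden path the same as a failed build.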
Step 6 — ROLLBACK (fast, safe, and predictable)
Rollback isn’t one button. It’s a design.
Two rollback types you must plan for
A) Application rollback (easy)
When the app code is bad:
- Roll back to the previous image tag / chart version
- Re-route traffic back to stable version
This should be automated and fast.
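With Kubernetes, "automated and fast" can be as small as a rollout undo; the deployment name is a placeholder:

```shell
# Roll back to the previous revision and wait for it to settle.
rollback_app() {
  kubectl rollout undo deployment/app || return 1
  kubectl rollout status deployment/app --timeout=300s || return 1
  echo "rolled back to previous revision"
}
```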
B) Data rollback (hard)
When migrations or data changes are involved:
- You often cannot “undo” safely
So you use the expand/contract pattern:
- Expand: add new columns/fields in a backward-compatible way
- Deploy app that writes both old and new (if needed)
- Migrate data safely
- Contract: remove old fields later (after stability)
Rule: If a release includes a breaking DB change, rollback becomes risky.
So engineer DB changes to be backward-compatible.
Rollback strategies you should know (choose based on risk)
1) Rolling rollback (basic)
- Re-deploy previous version
- Works when traffic can tolerate brief disruption
2) Blue/Green (clean rollback)
- Two environments: Blue (current), Green (new)
- Switch traffic to Green
- If bad: switch back to Blue
Rollback is almost instant (traffic switch).
3) Canary (best balance)
- Send small traffic to new version
- If metrics degrade: stop canary and revert
This reduces blast radius dramatically.
What should trigger an automatic rollback?
Pick a short verification window (example: 10–20 minutes after deployment).
If any of these break thresholds, rollback automatically:
- Error rate over X%
- Latency p95 over Y ms
- CrashLoopBackOff / unhealthy pods
- Failed smoke tests
Important: Auto-rollback should be conservative.
You don’t want flapping. Use sensible thresholds.
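A conservative decision function might look like this sketch; the thresholds (2% errors, 1000 ms p95) are example values, and the metric inputs are assumed to come from your monitoring system:

```shell
# Decide whether to auto-rollback based on verification-window signals.
# Returns 0 (rollback) or 1 (keep the release). Thresholds are illustrative.
should_rollback() {
  error_rate_pct="$1"   # integer percent, e.g. 3
  p95_ms="$2"           # integer milliseconds, e.g. 850
  smoke_status="$3"     # "pass" or "fail"
  [ "$smoke_status" = "fail" ] && return 0
  [ "$error_rate_pct" -gt 2 ] && return 0
  [ "$p95_ms" -gt 1000 ] && return 0
  return 1
}
```

Keep the thresholds loose at first and tighten them as you learn your service's normal range; that is what prevents flapping.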
The Reference Pipeline (copyable blueprint)
Below is a pipeline blueprint written in a generic CI style so you can adapt it to any CI tool.
PR Pipeline (fast feedback)
Goal: prevent unsafe code from merging.
Stages
- `lint`
- `unit_test`
- `secret_scan`
- `sast_scan`
- `dependency_scan`
- `iac_scan` (if deploy files changed)
- `build_check` (optional container build)
Outputs
- PR status checks (pass/fail)
- Security report summary
- Artifact only if needed (not always)
Main Pipeline (ship)
Goal: produce a trusted artifact and deploy progressively.
Stages
- `build`
  - set version from git
  - run unit tests
  - build image `app:sha`
- `scan`
  - run secret scan
  - SAST + dependency scan
  - image scan
  - IaC scan
- `package`
  - generate SBOM
  - sign artifact (optional)
- `push`
  - push image to registry
  - publish SBOM + metadata
- `deploy_stage`
  - deploy image tag to staging
- `test_stage`
  - smoke tests + optional integration tests
- `promote_prod`
  - manual approval or policy gate (depending on org)
- `deploy_prod_canary`
  - canary release
- `verify_canary`
  - metrics checks + smoke tests
- `deploy_prod_full`
- `verify_prod`
- `rollback_if_needed`
  - automated rollback logic on failure
Real example walkthrough (from commit to production)
Let’s play out a real release:
Day 1: Developer opens PR
- Lint fails → fixed quickly
- Dependency scan flags a vulnerable library (high severity)
- Policy says: ticket created, but PR allowed because it’s not exploitable in this path
Result: dev velocity stays high, risk is tracked.
Day 2: Merge to main
Main pipeline runs:
- Build creates `app:sha-abc1234`
- Scans pass
- SBOM generated
- Image pushed
Day 2: Deploy to staging
- Staging deploy succeeds
- Smoke test fails because a config value is missing
Pipeline stops. Nothing reaches prod.
Fix is made, pipeline re-runs.
Day 3: Production canary
- 5% traffic routed to new version
- Error rate rises above threshold within 3 minutes
Auto-rollback triggers:
- Traffic goes back to stable version
- Incident avoided
- Pipeline marks release as failed with logs + metrics snapshot
That is a mature pipeline: fast failure, tiny blast radius, safe rollback.
Common mistakes (and how to avoid them)
Mistake 1: “We’ll add scanning later”
Later never comes. Add scanning early with soft gates, then tighten.
Mistake 2: Rebuilding per environment
This destroys traceability. Build once, promote.
Mistake 3: “Rollback = redeploy previous”
That’s only true if data changes are backward-compatible.
Mistake 4: No post-deploy verification
Deploying without verification is gambling.
Mistake 5: Secrets in CI variables forever
Rotate secrets, use short-lived credentials where possible, and audit access.
CI/CD maturity levels (so you know what to aim for)
Level 1: Basic
- Build + unit tests + deploy
(works until your first serious incident)
Level 2: Safe
- Add scanning + staging + smoke tests
(now you block common disasters)
Level 3: Reliable
- Progressive delivery + automated rollback
(blast radius becomes small)
Level 4: Trusted
- SBOM + signing + policy gates + audit trails
(you can prove what you shipped and why)
Final “reference checklist” (use this as your implementation guide)
Build
- deterministic dependencies
- versioning from Git
- immutable artifacts (image tags)
- tests in CI
Scan
- secret scan
- SAST
- dependency scan
- container scan
- IaC scan
- clear gating policy
Deploy
- staging first
- promotion-only to prod
- progressive delivery for prod
Verify
- smoke tests automated
- metrics-based gates
- post-deploy verification window
Rollback
- rollback tested regularly
- backward-compatible DB strategy
- canary abort or traffic switch available