Deployment lifecycle, rollout strategies, container delivery, CI/CD flow, and production release best practices.

| Strategy | Downtime | Risk | Complexity | Best For |
|---|---|---|---|---|
| Rolling | None | Medium | Low | Stateless services, small teams |
| Blue-Green | None | Low | Medium | Full switch deployments, databases |
| Canary | None | Low | High | Gradual rollout, metric validation |
| A/B Testing | None | Low | High | Feature experiments, user testing |
| Shadow | None | Very Low | High | Load testing on production traffic |
| Feature Flags | None | Very Low | Low | Progressive feature rollout |
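
A common way to implement blue-green on Kubernetes is a label switch on the Service selector: both versions run side by side, and one edit moves all traffic. A minimal sketch (service and label names are hypothetical):

```yaml
# Sketch: blue-green cutover by flipping a Service selector label
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
    version: blue   # flip to "green" to cut all traffic over at once
  ports:
    - port: 80
      targetPort: 3000
```

Because the old Deployment stays running, rollback is the same one-line change in reverse.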

| # | Principle | Summary |
|---|---|---|
| 1 | Codebase | One codebase per app, many deploys |
| 3 | Config | Store config in env vars, not in code |
| 4 | Backing Services | Treat databases/cache as attached resources |
| 5 | Build/Release/Run | Strictly separate stages |
| 8 | Concurrency | Scale out via process model |
| 9 | Disposability | Fast startup, graceful shutdown |
| 10 | Dev/Prod Parity | Keep environments as similar as possible |
| 11 | Logs | Treat logs as event streams |
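
Factor 3 in practice: keep config in the environment so the same image runs unchanged in every deploy. A hypothetical docker-compose sketch:

```yaml
# Sketch: config injected via environment, never baked into the image
services:
  web:
    image: registry.example.com/web-app:v1.2.0
    env_file: .env            # per-environment values, not committed to Git
    environment:
      NODE_ENV: production    # non-secret overrides are fine inline
```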

| Registry | Provider | Auth Method |
|---|---|---|
| Docker Hub | Docker Inc. | docker login (username/token) |
| ECR | AWS | aws ecr get-login-password \| docker login |
| GCR / Artifact Registry | Google Cloud | gcloud auth configure-docker |
| ACR | Azure | az acr login |
| GitHub Container Registry | GitHub | ghcr.io (PAT or GITHUB_TOKEN) |
| Quay | Red Hat | docker login quay.io |

```dockerfile
# ── Multi-stage Dockerfile (Node.js) ──
# Stage 1: Build (the build step needs dev dependencies, so install everything)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npm prune --omit=dev && npm cache clean --force

# Stage 2: Runtime
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./
USER nonroot:nonroot
EXPOSE 3000
# Distroless has no shell or curl, so the health check must be an exec-form node call
HEALTHCHECK --interval=30s --timeout=3s \
  CMD ["/nodejs/bin/node", "-e", "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
# The distroless nodejs image already sets node as ENTRYPOINT; pass the script as CMD
CMD ["dist/server.js"]
```

| Driver | Scope | Use Case |
|---|---|---|
| bridge (default) | Single host | Containers on same host communicate via internal DNS |
| host | Single host | Container shares host network stack (no isolation) |
| overlay | Multi-host (Swarm) | Cross-host communication for Docker Swarm |
| macvlan | Single host | Container gets a real MAC address on the LAN |
| none | Single host | No networking — fully isolated container |
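
In Compose, the default bridge behavior shows up as a user-defined network with built-in DNS: containers reach each other by service name. A minimal sketch with hypothetical service names:

```yaml
# Sketch: two services on one bridge network; "api" reaches "db" by hostname "db"
services:
  api:
    image: registry.example.com/api:v1.0.0
    networks: [backend]
  db:
    image: postgres:16
    networks: [backend]
networks:
  backend:
    driver: bridge
```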

| Practice | How |
|---|---|
| Non-root user | USER 1000:1000 in Dockerfile |
| Read-only FS | docker run --read-only --tmpfs /tmp |
| No SUID binaries | Use distroless or scratch images |
| Secrets | Use Docker secrets or mounted files, never ENV for secrets |
| Image scanning | Trivy, Snyk, docker scout |
| Minimal image | Alpine, distroless, or DockerSlim |
| Capabilities drop | --cap-drop ALL --cap-add NET_BIND_SERVICE |
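
Several of these practices combine naturally in one Compose service definition. A hardening sketch (the image name is hypothetical):

```yaml
services:
  app:
    image: registry.example.com/web-app:v1.2.0
    user: "1000:1000"           # run as non-root
    read_only: true             # read-only root filesystem
    tmpfs:
      - /tmp                    # writable scratch space only
    cap_drop: [ALL]             # drop every capability...
    cap_add: [NET_BIND_SERVICE] # ...then re-add only what the app needs
```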

| Resource | Use Case | Key Features |
|---|---|---|
| Deployment | Stateless apps | Rolling updates, rollbacks, ReplicaSet management |
| StatefulSet | Databases, queues | Stable identities, persistent volumes, ordered pods |
| DaemonSet | Logging, monitoring agents | Runs on every node (or subset via taints) |
| Job | Batch processing | Runs to completion, supports parallelism |
| CronJob | Scheduled tasks | Cron schedule, concurrency policy, history limits |
| ReplicaSet | Pod replication | Ensures N pods running (usually via Deployment) |
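
As an example of the Job row: a batch workload that runs five pods to completion, two at a time (names are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: image-resize            # hypothetical batch task
spec:
  completions: 5                # run to completion 5 times total
  parallelism: 2                # at most 2 pods at once
  backoffLimit: 3               # retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: registry.example.com/resize-worker:v1.0.0
```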

| Service | Reachability | Use Case |
|---|---|---|
| ClusterIP | Cluster internal only | Internal microservice communication |
| NodePort | NodeIP:Port (30000-32767) | Dev/test external access |
| LoadBalancer | Cloud LB + NodePort | Production external traffic |
| Headless (None) | Direct pod IPs via DNS | StatefulSet pods, custom load balancing |

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v1.2.0
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: db-url
      terminationGracePeriodSeconds: 30
```

| Resource | Purpose |
|---|---|
| ConfigMap | Non-sensitive configuration (env vars, config files) |
| Secret | Sensitive data (passwords, tokens, TLS certs) |
| PersistentVolume (PV) | Cluster-wide storage resource |
| PersistentVolumeClaim (PVC) | Pod request for storage (bound to PV) |
| StorageClass | Dynamic provisioning (gp3, io1, standard) |
| Ingress | HTTP/S routing with TLS, path-based routing |
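
The ConfigMap row in practice: hold non-secret settings in a ConfigMap and load them into a pod as environment variables (a sketch with hypothetical names and keys):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: web-app-config
data:
  LOG_LEVEL: info
  CACHE_TTL: "300"       # values are always strings in a ConfigMap
---
# In the pod spec, every key becomes an environment variable:
# containers:
#   - name: web-app
#     envFrom:
#       - configMapRef:
#           name: web-app-config
```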

| Feature | What It Does |
|---|---|
| HPA | Scales pods based on CPU/memory/custom metrics |
| VPA | Adjusts resource requests/limits automatically |
| Cluster Autoscaler | Adds/removes nodes based on pending pods |
| PodDisruptionBudget | Min/max available pods during disruptions |
| TopologySpreadConstraints | Spread pods across zones/nodes |
| PriorityClass | Preemption ordering for resource contention |
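
The HPA row as a manifest, targeting the web-app Deployment above at 70% average CPU (a sketch under those assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```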

Never run kubectl apply manually in production — if it is not in Git, it does not exist.

| Platform | Language Support | Timeout | Key Features |
|---|---|---|---|
| AWS Lambda | Node, Python, Java, Go, Rust, .NET | 15 min | Layers, SnapStart, Provisioned Concurrency |
| Azure Functions | C#, JS, Python, Java, PowerShell | 10 min (Consumption) | Durable Functions, deployment slots, managed identity |
| Google Cloud Functions | Node, Python, Go, Java, Ruby | 9 min (1st gen), 60 min (2nd gen) | Eventarc, Cloud Run-based (2nd gen) |
| Cloudflare Workers | JS, TS, Wasm | 10ms CPU (free), 30s CPU (paid) | Edge runtime, KV storage, R2 |
| Vercel Edge | JS, TS, Go, Rust (Wasm) | 30s | Edge SSR, ISR, streaming |
| Deno Deploy | TS, JS, Wasm | 60s | Edge-native, zero-config deploy |

| Trigger Type | Examples |
|---|---|
| HTTP / REST | API Gateway, ALB, Function URL |
| Queue / Message | SQS, SNS, EventBridge, Pub/Sub, Service Bus |
| Timer / Schedule | EventBridge Scheduler, CronTrigger |
| Blob / Storage | S3, Azure Blob, GCS object changes |
| Database | DynamoDB Streams, Firebase, Supabase triggers |
| Stream | Kinesis, Kafka (via EventBridge) |

```yaml
# ── Serverless Framework config ──
service: my-api
frameworkVersion: '3'
provider:
  name: aws
  runtime: nodejs20.x
  architecture: arm64
  stage: production
  region: us-east-1
  environment:
    NODE_ENV: production
  httpApi:
    cors: true
functions:
  api:
    handler: src/handler.handler
    events:
      - httpApi:
          path: /{proxy+}
          method: ANY
    provisionedConcurrency: 5 # Reduce cold starts
resources:
  Resources:
    ApiLogGroup:
      Type: AWS::Logs::LogGroup
      Properties:
        RetentionInDays: 30
```

| Platform | Best For | Key Features |
|---|---|---|
| Heroku | Quick startups, low ops | Git push deploy, add-ons, Review Apps |
| Render | Modern Heroku alternative | Docker, background workers, auto-deploy |
| Railway | Full-stack apps | PR environments, infra-as-code, monorepo support |
| Fly.io | Edge/low-latency | Docker-based, global regions, persistent volumes |
| Northflank | Multi-service apps | Docker compose, CI/CD, managed DBs |
| Google App Engine | Google Cloud native | Auto-scaling, versions, traffic splitting |

| Concept | Details |
|---|---|
| Buildpacks | Auto-detect language, install deps, build artifact |
| Procfile | Defines process types: web, worker, clock |
| Environment vars | Primary config mechanism (12-factor) |
| Add-ons | Managed services (Postgres, Redis, SendGrid) |
| Slug/Image | Compiled artifact uploaded to dyno/instance |
| Release | Immutable artifact + config version |

```
# ── Procfile (Heroku / Render / Railway) ──
web: npm start
worker: node worker.js
release: npx prisma migrate deploy
clock: node scheduler.js
```

| Platform | Build | Key Features |
|---|---|---|
| Vercel | Git → auto-detect | Edge SSR, ISR, Preview deploys, Analytics |
| Netlify | Git → auto-detect | Edge Functions, Split Testing, Forms, Identity |
| Cloudflare Pages | Git → auto-detect | Global CDN, Workers integration, R2 |
| GitHub Pages | Git branch | Free, Jekyll native, Actions-based deploys |
| AWS Amplify | Git → auto-detect | Fullstack (auth, API, hosting), SSR support |
| Firebase Hosting | firebase deploy | CDN, Cloud Functions, real-time DB |

| Strategy | When to Use | Platforms |
|---|---|---|
| SSG (Static) | Blogs, docs, marketing pages | All platforms |
| SSR (Server) | Dynamic, personalized content | Vercel, Netlify, Cloudflare Pages |
| ISR (Incremental) | Frequently updated static pages | Vercel, Netlify ISR |
| CSR (Client) | Highly interactive apps, dashboards | All platforms |
| Streaming SSR | Large pages, progressive loading | Next.js, Remix on Vercel |
| Edge SSR | Ultra-low latency dynamic pages | Vercel Edge, Cloudflare Workers |

```js
// ── Next.js Deployment Configuration (next.config.js) ──
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'standalone', // For Docker / bare metal
  images: {
    remotePatterns: [{
      protocol: 'https',
      hostname: 'cdn.example.com',
    }],
  },
  experimental: {
    serverActions: true,
  },
};

module.exports = nextConfig;

// For Vercel: git push → auto-detect & deploy
// For Docker: use output: 'standalone' with node:alpine
// For Netlify: use @netlify/plugin-nextjs
```

| Layer | Tool | Purpose |
|---|---|---|
| Reverse Proxy | nginx / Caddy | TLS termination, static serving, rate limiting |
| App Server | PM2 / systemd | Process management, auto-restart, log rotation |
| Database | PostgreSQL / MySQL | Managed via systemd, daily backups |
| Load Balancer | HAProxy / nginx | Distribute traffic across app instances |
| SSL/TLS | Let's Encrypt / Certbot | Free auto-renewing TLS certificates |
| Firewall | UFW / iptables | Port filtering, IP whitelisting |

| Manager | Type | Best For |
|---|---|---|
| systemd | System-level | Production Linux servers (built-in) |
| PM2 | Node.js | Node.js apps, cluster mode, graceful reload |
| supervisord | General purpose | Multi-language process management |
| runit | System-level | Lightweight init system (used by Void Linux) |

```bash
#!/bin/bash
# ── Bare Metal Deploy Script ──
set -euo pipefail
APP_DIR="/opt/myapp"
BRANCH="main"
echo ">>> Pulling latest code..."
cd "$APP_DIR" && git pull origin "$BRANCH"
echo ">>> Installing dependencies..."
npm ci                      # dev dependencies are needed for the build step
echo ">>> Running database migrations..."
npx prisma migrate deploy
echo ">>> Building application..."
npm run build
echo ">>> Restarting service..."
sudo systemctl restart myapp
sudo systemctl status myapp --no-pager
echo ">>> Health check..."
sleep 3 && curl -sf http://localhost:3000/health || exit 1
echo "Deploy successful!"
```

| Service | Cloud | Engines |
|---|---|---|
| RDS | AWS | PostgreSQL, MySQL, MariaDB, SQL Server, Oracle |
| Cloud SQL | GCP | PostgreSQL, MySQL, SQL Server |
| Azure SQL / Azure Database | Azure | SQL Server, PostgreSQL, MySQL |
| PlanetScale | Serverless | MySQL-compatible, branching, non-blocking schema changes |
| Neon | Serverless | PostgreSQL, branching, auto-scaling, serverless driver |
| Supabase | Serverless | PostgreSQL, auth, realtime, storage built-in |

| Tool | Language | Key Features |
|---|---|---|
| Prisma Migrate | Node/TS | Declarative schema, auto-generate migrations |
| Flyway | Java/Kotlin | Version-controlled SQL migrations |
| Liquibase | Java/Kotlin | XML/YAML/JSON changelogs, rollback support |
| EF Core Migrations | C#/.NET | Code-first model changes |
| Atlas | Go | Declarative HCL schema, plan-and-apply workflow |
| golang-migrate | Go | CLI + library, Go/SQL source files |
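
Migrations belong in the pipeline, not on a laptop. A hypothetical GitHub Actions step running Prisma Migrate as a gated deploy step (the secret name is an assumption):

```yaml
# Sketch: apply pending migrations before rolling out the new version
- name: Run database migrations
  run: npx prisma migrate deploy
  env:
    DATABASE_URL: ${{ secrets.DATABASE_URL }}   # hypothetical secret name
```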

| Strategy | How | Trade-off |
|---|---|---|
| Vertical | Bigger instance (more CPU/RAM) | Simple but has a ceiling, brief downtime |
| Read replicas | Direct reads to replicas | Write is single-node; eventual consistency |
| Connection pooling | PgBouncer, ProxySQL | Reduces connections to the DB server |
| Sharding | Partition data across DB instances | Complex; app must be shard-aware |
| Partitioning | Split tables by range/hash/list | Transparent to app; single DB |
| Caching | Redis, Memcached in front of DB | Reduces read load; cache invalidation complexity |

| Feature | Details |
|---|---|
| Automated backups | Daily snapshots (AWS RDS, Cloud SQL) |
| Point-in-time recovery | Restore to any second (PITR), 7-35 day retention |
| Cross-region replication | DR standby in another region |
| Export/Import | pg_dump, mysqldump, or managed export |
| Backup testing | Regularly restore to a test environment |
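
For clusters without a managed backup service, a nightly pg_dump can be scheduled as a Kubernetes CronJob. A sketch (names, the secret, and the PVC destination are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-backup
spec:
  schedule: "0 3 * * *"         # nightly at 03:00
  concurrencyPolicy: Forbid     # never let backup runs overlap
  successfulJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump "$DATABASE_URL" | gzip > /backup/db-$(date +%F).sql.gz
              envFrom:
                - secretRef:
                    name: app-secrets      # hypothetical; provides DATABASE_URL
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: pg-backup-pvc   # hypothetical claim
```

Pair this with the backup-testing row above: a backup that has never been restored is not a backup.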

| Platform | Config Format | Runners | Notes |
|---|---|---|---|
| GitHub Actions | YAML (.github/workflows) | GitHub-hosted or self-hosted | Largest marketplace; free for public repos |
| GitLab CI | .gitlab-ci.yml | GitLab-hosted or self-hosted | Built-in container registry, environments |
| Azure DevOps | azure-pipelines.yml | Microsoft-hosted or self-hosted | Deep Azure integration, release gates |
| Jenkins | Jenkinsfile | Self-hosted agents | Most flexible, plugin ecosystem, heavy setup |
| CircleCI | .circleci/config.yml | Cloud or self-hosted | Docker layer caching, orbs |
| Bitbucket Pipelines | bitbucket-pipelines.yml | Atlassian-hosted | Integrated with Bitbucket, Docker support |

| Practice | Why |
|---|---|
| Cache dependencies | Save 50-80% build time (npm, pip, Docker layers) |
| Parallel jobs | Run lint, test, security scan simultaneously |
| Artifact promotion | Build once, deploy same artifact everywhere |
| Environment protection | Require approval for production deploys |
| Secret scanning | TruffleHog, gitleaks in CI pipeline |
| Fast feedback | Under 10 min total pipeline time |
| Immutable artifacts | Version-tagged Docker images, never mutable tags |

```yaml
name: CI/CD Pipeline
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20, cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npm run test
      - run: npm run test:e2e
  build-push:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write   # required for GITHUB_TOKEN to push to ghcr.io
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/org/app:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy:
    needs: build-push
    runs-on: ubuntu-latest
    environment: production
    steps:
      - run: echo "Deploy image ${{ github.sha }} to production"
```

| Tool | Language | State Mgmt | Best For |
|---|---|---|---|
| Terraform | HCL | Remote state (S3, GCS) | Multi-cloud, mature ecosystem, modules |
| Pulumi | Python, TS, Go, C# | Pulumi Cloud / backend | Teams preferring real languages over HCL |
| AWS CDK | TypeScript, Python, Java | CloudFormation | AWS-native, object-oriented IaC |
| CloudFormation | JSON/YAML | AWS managed | AWS-native, no extra tooling |
| Bicep | Bicep DSL | Azure managed | Azure-native, cleaner than ARM templates |
| Crossplane | YAML (CRDs) | Kubernetes | K8s-native multi-cloud IaC |

| Criteria | Terraform | Pulumi |
|---|---|---|
| Language | HCL (declarative DSL) | General-purpose (TS, Python, Go) |
| Learning curve | Moderate (learn HCL) | Low (use existing language skills) |
| Ecosystem | Massive (3,000+ providers) | Growing (Terraform providers compatible) |
| Testing | terratest (Go), pytest | Native unit testing per language |
| State | S3/GCS/remote backend | Pulumi Cloud / S3 / local |
| GitOps | Atlantis, TF Controller | Pulumi Deployments |

```hcl
# ── Terraform: EC2 + RDS + VPC ──
terraform {
  required_version = ">= 1.5"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
  backend "s3" {
    bucket = "my-terraform-state"
    key    = "prod/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" { region = "us-east-1" }

variable "db_password" {
  type      = string
  sensitive = true
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  cidr            = "10.0.0.0/16"
  azs             = ["us-east-1a", "us-east-1b"]
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.3.0/24", "10.0.4.0/24"]
}

resource "aws_db_instance" "app" {
  engine              = "postgres"
  engine_version      = "16"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  db_name             = "appdb"
  username            = "admin"
  password            = var.db_password
  skip_final_snapshot = true # fine for dev; keep final snapshots in prod
}
```

| Solution | Provider | Key Features |
|---|---|---|
| HashiCorp Vault | Self-hosted or HCP | Dynamic secrets, PKI, encryption-as-service, transit |
| AWS Secrets Manager | AWS | Auto-rotation, RDS integration, cross-account access |
| AWS SSM Parameter Store | AWS | Hierarchical params, KMS encryption, generous free tier |
| Azure Key Vault | Azure | Keys, secrets, certificates, managed HSM |
| GCP Secret Manager | GCP | Auto-replication, versioning, IAM integration |
| 1Password / Doppler | SaaS | Developer-friendly, CLI sync, team management |

| Approach | How | Trade-off |
|---|---|---|
| K8s native Secrets | Base64 encoded, etcd | Simple but not encrypted by default |
| Sealed Secrets | Bitnami, encrypt with cert | GitOps-friendly, store sealed data in Git |
| External Secrets Operator | Sync from Vault/AWS/GCP | Centralized, auto-sync, enterprise-grade |
| SOPS + Flux | Encrypt with age/KMS | GitOps-native, encrypted files in repo |
| CSI Secrets Store | Mount secrets as volumes | Pod-level secret injection, auto-rotation |
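
The External Secrets Operator row as a manifest: sync a value out of a cloud secret manager into a native Kubernetes Secret (a sketch; store, key path, and names are hypothetical):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # hypothetical SecretStore
    kind: ClusterSecretStore
  target:
    name: app-secrets            # the K8s Secret to create and keep in sync
  data:
    - secretKey: db-url
      remoteRef:
        key: prod/app/db-url     # hypothetical path in the backing store
```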

| Pillar | Tool | What It Tells You |
|---|---|---|
| Metrics | Prometheus, Grafana, Datadog | Numbers over time (CPU, latency, error rate) |
| Logs | ELK Stack, Loki, CloudWatch | Discrete events (errors, requests, stack traces) |
| Traces | Jaeger, Zipkin, OpenTelemetry | Request lifecycle across services |

| Term | Definition | Example |
|---|---|---|
| SLI | Service Level Indicator (the metric) | Request latency p99 < 200ms |
| SLO | Service Level Objective (the target) | 99.9% of requests under 200ms |
| SLA | Service Level Agreement (the contract) | 99.9% uptime or credit refund |
| Error Budget | Allowed failures per period | 0.1% of 30 days = 43.2 min downtime |
| Burn Rate | How fast error budget is consumed | Burn rate 1x = normal; 6x = page immediately |
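
A burn-rate alert in Prometheus rule form: for a 99.9% SLO, a 6x burn rate means errors exceed roughly 0.6% of requests, which exhausts the monthly budget in about five days. A sketch (metric names are hypothetical):

```yaml
groups:
  - name: slo-burn-rate
    rules:
      - alert: ErrorBudgetBurnHigh
        # 6x burn on a 99.9% SLO: >0.6% of requests failing over the last hour
        expr: |
          sum(rate(http_requests_total{code=~"5.."}[1h]))
            / sum(rate(http_requests_total[1h])) > 0.006
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Error budget burning at >6x the sustainable rate"
```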

| Tool | Type | Best For |
|---|---|---|
| PagerDuty | On-call management | Escalation policies, incident workflows |
| OpsGenie | On-call management | Team routing, alert deduplication |
| Grafana Alerts | Metric-based | Prometheus/Grafana stack native alerts |
| AWS CloudWatch Alarms | Cloud-native | AWS resource metrics and logs |
| Sentry | Error tracking | Application exceptions, breadcrumbs, releases |

| Signal | Question | Metric Example |
|---|---|---|
| Latency | Is the service fast enough? | p50, p95, p99 response time |
| Traffic | How much load? | Requests/sec, concurrent connections |
| Errors | Are things failing? | Error rate %, HTTP 5xx count |
| Saturation | Are we near capacity? | CPU %, memory %, queue depth |

| Category | Tool | What It Scans |
|---|---|---|
| Container Image | Trivy, Snyk, Grype | CVEs in OS packages and dependencies |
| SAST | SonarQube, CodeQL, Semgrep | Static code analysis for vulnerabilities |
| DAST | OWASP ZAP, Burp Suite | Running application vulnerability scanning |
| Dependency | npm audit, Snyk, Dependabot | Known CVEs in npm/pip/maven packages |
| Secret Detection | gitleaks, trufflehog | Hardcoded secrets in code/repo history |
| IaC Scanning | Checkov, tfsec, Kics | Misconfigurations in Terraform/CloudFormation |
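
Image scanning fits naturally as a CI gate. A hypothetical GitHub Actions step using the Trivy action (image name and version pin are assumptions):

```yaml
# Sketch: fail the pipeline on critical/high CVEs in the built image
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@0.24.0
  with:
    image-ref: ghcr.io/org/app:${{ github.sha }}
    severity: CRITICAL,HIGH
    exit-code: '1'          # non-zero exit fails the job on findings
```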

| Practice | How |
|---|---|
| TLS Everywhere | Let's Encrypt, managed certs, no HTTP |
| VPC / Private Subnets | No public DB endpoints, private subnets for app tiers |
| Security Groups | Least-privilege inbound/outbound rules |
| WAF | AWS WAF, Cloudflare WAF, OWASP rule sets |
| RBAC / IAM | Principle of least privilege, no admin access for apps |
| Service Accounts | Workload identity, IRSA, managed identity |

| Pattern | Description | Complexity |
|---|---|---|
| Active-Active | All regions serve traffic simultaneously | High (data sync, split brain) |
| Active-Passive | One active, standby takes over on failure | Medium (DNS failover) |
| Pilot Light | Minimal resources in DR region, scale on failover | Low (slower RTO than warm standby) |
| Warm Standby | Scaled-down replica in DR, scale up on failover | Medium |

| Term | Definition |
|---|---|
| RPO | Recovery Point Objective — max acceptable data loss time |
| RTO | Recovery Time Objective — max acceptable downtime |
| DNS Failover | Route 53 health checks, Cloudflare failover routing |
| Global LB | AWS Global Accelerator, Cloudflare, GCP Global LB |
| Anycast | Single IP announced from multiple POPs worldwide |

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: web-app
spec:
  replicas: 10
  strategy:
    canary:
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
      canaryService: web-app-canary
      stableService: web-app-stable
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: web-app-canary
```

| Platform | Hosting | Key Features |
|---|---|---|
| LaunchDarkly | SaaS | Enterprise-grade, targeting, experimentation, gradual rollout |
| Unleash | Self-hosted / SaaS | Open-source, SDKs for all languages, gradual rollout |
| Flipt | Self-hosted | Open-source, Git-backed, declarative config |
| Statsig | SaaS | Feature flags + experimentation + A/B testing |
| PostHog | SaaS / Self-hosted | Feature flags + product analytics |
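
Flipt's Git-backed approach means flags live in the repo as declarative config. A sketch of the shape (flag names hypothetical; consult the Flipt docs for the full schema):

```yaml
# Sketch: Flipt-style declarative flag definition stored in Git
namespace: default
flags:
  - key: new-checkout        # hypothetical flag
    name: New Checkout
    description: Gradual rollout of the rewritten checkout flow
    enabled: false           # off by default; ramp up via rollout rules
```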

| Tool | What It Does | Integrates With |
|---|---|---|
| Argo Rollouts | Canary/blue-green for K8s | Istio, Nginx, ALB, SMI |
| Flagger | Canary + A/B + blue-green | Istio, Linkerd, App Mesh, Nginx |
| Traefik Mesh | Service mesh with traffic mgmt | K8s native, mTLS, traffic splitting |

| # | Check | Owner |
|---|---|---|
| 1 | All tests pass (unit, integration, E2E) | CI/CD |
| 2 | Security scan clean (no critical CVEs) | Security |
| 3 | Database migration tested on staging | DBA / Backend |
| 4 | Backward-compatible API changes verified | Backend |
| 5 | Environment variables configured | DevOps |
| 6 | Rollback procedure documented and tested | DevOps |
| 7 | Monitoring dashboards and alerts in place | SRE |
| 8 | Change request approved (if required) | Manager |

| # | Verification | Method |
|---|---|---|
| 1 | Health endpoint returns 200 | curl / Synthetic monitor |
| 2 | Error rate is within normal range | Grafana / Datadog |
| 3 | Latency p99 is acceptable | APM / traces |
| 4 | Key user flows work (smoke tests) | Automated E2E |
| 5 | Database queries performing well | DB monitoring |
| 6 | No new error spikes in logs | Log aggregation |
| 7 | Feature flags enabled as planned | Feature flag dashboard |
| 8 | Communicate deploy to stakeholders | Slack / Status page |

| # | Step |
|---|---|
| 1 | Identify the previous stable version / image tag |
| 2 | Run rollback command (kubectl rollout undo, etc.) |
| 3 | Verify old version is running (health checks) |
| 4 | Run database migration rollback (if needed) |
| 5 | Check error rates and latency return to normal |
| 6 | Communicate rollback and RCA timeline |
| 7 | Create incident ticket and schedule post-mortem |

| # | Rule |
|---|---|
| 1 | Always write a down migration alongside every up migration |
| 2 | Add columns first (deploy), remove columns second (next release) |
| 3 | Never rename columns — add new + deprecate old |
| 4 | Test migrations on a production data copy |
| 5 | Lock tables only if necessary and keep it brief |
| 6 | Use a migration lock to prevent concurrent execution |
| 7 | Verify data integrity after migration with automated checks |
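
Rules 1-3 in practice, as a Liquibase-style YAML changeset: the expand step adds a nullable/defaulted column with a paired rollback, and the old column (if any) is only dropped in a later release (table and column names are hypothetical):

```yaml
databaseChangeLog:
  - changeSet:
      id: add-email-verified        # expand step: safe with old and new app code
      author: deploy-bot
      changes:
        - addColumn:
            tableName: users
            columns:
              - column:
                  name: email_verified
                  type: boolean
                  defaultValueBoolean: false
      rollback:                     # explicit down migration (rule 1)
        - dropColumn:
            tableName: users
            columnName: email_verified
```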

| Criteria | Docker | Kubernetes | Serverless | PaaS | Bare Metal |
|---|---|---|---|---|---|
| Setup Complexity | Low | High | Low | Very Low | Medium |
| Ops Overhead | Medium | High | Very Low | Low | High |
| Cost (small) | Low | Medium | Free tier | Low | Fixed |
| Cost (large scale) | Medium | Optimized | Can spike | Premium | Most cost-effective |
| Scalability | Manual | Auto (HPA/CA) | Instant auto | Auto (limited) | Manual |
| Latency | Low | Low | Cold start risk | Low | Lowest |
| Control | High | Full | Very Limited | Limited | Full |
| Vendor Lock-in | Low | Low | High | Medium | None |
| Best Team Size | 1-5 | 5+ | 1-3 | 1-5 | 3+ (with ops) |
| Good For | Simple services | Microservices | APIs, events | Startups, MVPs | Regulated, legacy |

| Company | Approach | Why |
|---|---|---|
| Stripe | Kubernetes + Edge | Microservices at scale, global payments |
| Vercel | Edge Functions | Frontend SSR at the edge, globally |
| Linear | Serverless + Edge | Real-time app, minimal ops team |
| GitLab | Kubernetes (GKE) | Self-hosted by customers, complex workloads |
| Basecamp | Bare Metal | Full control, predictable costs, 20+ years |
| Notion | Kubernetes | Complex collaboration, high availability |