Kubernetes Deployment Strategies

Choose the right approach for updating your applications with zero to minimal downtime

Overview

Kubernetes deployment strategies define how to update Pods or sets of Pods when deploying new application versions or container images. The right deployment strategy helps maintain system stability, minimize downtime, and ensure a smooth transition between versions.

Deployment Strategy Considerations

When choosing a deployment strategy, consider these factors:

  • Acceptable downtime window
  • Application architecture compatibility
  • Available resources
  • Risk tolerance
  • User impact
  • Rollback requirements

graph TD
    A[Application Update Needed] --> B{Downtime Acceptable?}
    B -->|Yes| C[Recreate Strategy]
    B -->|No| D{Testing with Real Users?}
    D -->|Yes| E[Canary Deployment]
    D -->|No| F{Need Instant Rollback?}
    F -->|Yes| G[Blue/Green Deployment]
    F -->|No| H[Rolling Update Deployment]
    style C fill:#f9f,stroke:#333,stroke-width:2px
    style E fill:#bbf,stroke:#333,stroke-width:2px
    style G fill:#ffa,stroke:#333,stroke-width:2px
    style H fill:#bfb,stroke:#333,stroke-width:2px


Basic Deployment Strategies

1. RollingUpdate (Default)

The RollingUpdate strategy gradually replaces Pods running the previous version with Pods running the new version, in small batches controlled by the maxSurge and maxUnavailable settings.

sequenceDiagram
    participant User
    participant Svc as Service
    participant Old as Old Pods
    participant New as New Pods
    Note over Old: 3 pods running v1
    User->>Svc: Traffic routed to old pods
    Note over Old,New: Deployment update triggered
    Svc->>Old: Traffic continues
    Note over New: 1 new pod created
    Note over Old: 1 old pod terminated
    Svc->>Old: Traffic to remaining v1 pods
    Svc->>New: Traffic to new v2 pod
    Note over New: Another new pod created
    Note over Old: Another old pod terminated
    Svc->>Old: Traffic to remaining v1 pod
    Svc->>New: Traffic to v2 pods
    Note over New: Final new pod created
    Note over Old: Final old pod terminated
    Svc->>New: All traffic to v2 pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # Maximum number of pods that can be unavailable during update
      maxSurge: 1        # Maximum number of pods that can be created above desired number
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2
        ports:
        - containerPort: 8080

Key Parameters:

  • maxUnavailable: Maximum number (or percentage) of Pods that may be unavailable during the update
  • maxSurge: Maximum number (or percentage) of extra Pods that may be created above the desired replica count

Benefits:

  • Little to no downtime when readiness probes are configured correctly
  • No separate environment required; resource overhead is limited to maxSurge
  • Built into the Deployment controller, so no additional tooling is needed

Limitations:

  • Old and new versions serve traffic at the same time, so releases must be backward compatible
  • Rollout and rollback are gradual rather than instantaneous
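
The update is usually driven and observed with kubectl; a minimal sketch, assuming the Deployment above (named my-app, with a container also named my-app):

# Trigger the update by changing the image, then watch progress
kubectl set image deployment/my-app my-app=my-app:v2
kubectl rollout status deployment/my-app

# Roll back to the previous revision if the new version misbehaves
kubectl rollout undo deployment/my-app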


2. Recreate Strategy

The Recreate strategy terminates all existing pods before creating new ones.

sequenceDiagram
    participant User
    participant Svc as Service
    participant Old as Old Pods (v1)
    participant New as New Pods (v2)
    Note over Old: 3 pods running v1
    User->>Svc: Traffic routed to old pods
    Svc->>Old: Requests served
    Note over Old,New: Deployment update triggered
    Note over Old: All v1 pods terminated
    User->>Svc: Requests fail (downtime)
    Note over New: All v2 pods created
    User->>Svc: Traffic resumes
    Svc->>New: Requests served by v2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2
        ports:
        - containerPort: 8080

Benefits:

  • Simple to reason about: only one version runs at any time
  • No extra resource overhead beyond the normal replica count
  • Suitable when versions cannot coexist (for example, incompatible database schema changes)

Limitations:

  • Causes downtime between terminating the old Pods and the new Pods becoming ready
  • The length of the outage depends on image pull time and application startup time
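
A short usage sketch, assuming the manifest above is saved as deployment.yaml: applying the change and watching the Pods lets you observe the downtime window directly.

# Apply the Recreate deployment and watch old Pods terminate before new ones start
kubectl apply -f deployment.yaml
kubectl get pods -l app=my-app -w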


Strategy Comparison

| Strategy | Downtime | Resource Usage | Complexity | Rollback Speed | Best For |
| --- | --- | --- | --- | --- | --- |
| RollingUpdate | Minimal | Normal | Low | Gradual | General-purpose applications with backward compatibility |
| Recreate | Yes | Minimal | Very low | Immediate | Development environments, major version changes |
| Blue/Green | None | Double | High | Immediate | Mission-critical applications requiring zero downtime |
| Canary | None | Variable | High | Fast (remove the canary) | Risk-sensitive applications needing gradual validation |


Advanced Deployment Strategies

1. Blue/Green Deployment

Blue/Green deployment maintains two identical environments, with only one serving production traffic at a time.

graph TD
    A[User Traffic] --> B[Service]
    subgraph "Before Switch"
        B -->|Active| C[Blue Environment<br/>Version 1]
        D[Green Environment<br/>Version 2]
    end
    subgraph "After Switch"
        E[Service] -->|Traffic Switched| G[Green Environment<br/>Version 2]
        F[Blue Environment<br/>Version 1]
    end
    style C fill:#1E88E5,stroke:#0D47A1,color:#FFF
    style G fill:#43A047,stroke:#2E7D32,color:#FFF

Implementation in Kubernetes:

# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: blue
  template:
    metadata:
      labels:
        app: my-app
        version: blue
    spec:
      containers:
      - name: my-app
        image: my-app:v1
        ports:
        - containerPort: 8080
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
      version: green
  template:
    metadata:
      labels:
        app: my-app
        version: green
    spec:
      containers:
      - name: my-app
        image: my-app:v2
        ports:
        - containerPort: 8080
---
# Service routing to blue (switch to green during deployment)
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    version: blue  # Switch between blue and green
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

Switching Between Environments:

# Update service to point to green environment
kubectl patch service my-app -p '{"spec":{"selector":{"version":"green"}}}'

# To rollback, point back to blue environment
kubectl patch service my-app -p '{"spec":{"selector":{"version":"blue"}}}'
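
Before and after the switch, you can confirm which environment the Service is selecting; a small verification sketch using the Service defined above:

# Show which version label the Service currently routes to
kubectl get service my-app -o jsonpath='{.spec.selector.version}'

# Confirm which Pod IPs are behind the Service
kubectl get endpoints my-app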

Benefits:

  • Instant cutover and instant rollback: switching traffic is a single Service selector change
  • The new version can be tested inside the cluster before it receives production traffic
  • Only one version serves production traffic at a time

Limitations:

  • Requires roughly double the resources while both environments run
  • Shared state such as database schemas still needs careful migration planning
  • In-flight requests to the old environment may be cut off at switch time


2. Canary Deployment

Canary deployment routes a small percentage of traffic to the new version for testing before full rollout.

graph LR
    A[User Traffic] --> B[Load Balancer / Ingress]
    B -->|90%| C[Stable Version<br/>v1]
    B -->|10%| D[Canary Version<br/>v2]
    style C fill:#1E88E5,stroke:#0D47A1,color:#FFF
    style D fill:#FB8C00,stroke:#EF6C00,color:#FFF

Implementation Options:

  1. Using Kubernetes Services and multiple deployments:
# Stable deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-stable
spec:
  replicas: 9  # 90% of traffic
  selector:
    matchLabels:
      app: my-app
      track: stable
  template:
    metadata:
      labels:
        app: my-app
        track: stable
    spec:
      containers:
      - name: my-app
        image: my-app:v1
---
# Canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-canary
spec:
  replicas: 1  # 10% of traffic
  selector:
    matchLabels:
      app: my-app
      track: canary
  template:
    metadata:
      labels:
        app: my-app
        track: canary
    spec:
      containers:
      - name: my-app
        image: my-app:v2
---
# Service routing to both deployments
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app  # Selects both stable and canary pods
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  2. Using Ingress for HTTP traffic splitting:
# Ingress with canary annotations (NGINX ingress controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-canary
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "20"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-canary
            port:
              number: 80
  3. Using a service mesh like Istio:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
  - myapp.example.com
  http:
  - route:
    - destination:
        host: my-app-stable
        subset: v1
      weight: 80
    - destination:
        host: my-app-canary
        subset: v2
      weight: 20
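
How traffic is shifted depends on which option you chose; a hedged sketch of typical adjustments, with resource names following the manifests above:

# Option 1: shift traffic by changing the pod ratio (e.g. 7:3 for a 70/30 split)
kubectl scale deployment/my-app-stable --replicas=7
kubectl scale deployment/my-app-canary --replicas=3

# Option 2: raise the NGINX ingress canary weight
kubectl annotate ingress my-app-canary nginx.ingress.kubernetes.io/canary-weight="50" --overwrite

# Option 3: edit the Istio VirtualService weights and re-apply
# (assuming the VirtualService is saved as virtualservice.yaml)
kubectl apply -f virtualservice.yaml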

Canary Implementation Approaches:

| Method | Pros | Cons |
| --- | --- | --- |
| Pod ratio adjustment | Simple to implement; works with any application | Less precise traffic control; limited by pod granularity |
| Ingress controllers | More precise percentages; HTTP header-based routing | Only works for HTTP/HTTPS; requires a specific ingress controller |
| Service mesh (Istio) | Fine-grained traffic control; advanced metrics and tracing; multiple routing criteria | Complex to set up; requires service mesh installation; performance overhead |

Benefits:

  • New versions are validated against real traffic while limiting the blast radius
  • Problems affect only the small share of users routed to the canary
  • Rollback is simple: remove or scale down the canary

Limitations:

  • Requires traffic splitting (pod ratios, an ingress controller, or a service mesh) plus good monitoring
  • Both versions run concurrently, so backward compatibility is still required
  • The overall rollout is slower than a direct update


Deployment Health Checks

Kubernetes uses probes to determine the health and readiness of your applications, which are critical for successful deployments.

1. Readiness Probe

Indicates when a pod is ready to accept traffic. During rolling updates, new pods won’t receive traffic until they pass readiness checks.

spec:
  containers:
  - name: my-app
    readinessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3

Available Probe Types:

  • httpGet: Performs an HTTP GET request against the container
  • tcpSocket: Checks that a TCP port accepts connections
  • exec: Runs a command inside the container and checks its exit code
  • grpc: Calls the standard gRPC health-checking service (newer Kubernetes versions)

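The same readinessProbe block accepts any of these mechanisms; a brief sketch of the alternative forms (the ports and command are illustrative, and the grpc form assumes the container exposes the gRPC health service):

    readinessProbe:
      tcpSocket:
        port: 5432            # passes when the TCP port accepts connections
    # --- or ---
    readinessProbe:
      exec:
        command: ["cat", "/tmp/ready"]   # passes when the command exits with code 0
    # --- or ---
    readinessProbe:
      grpc:
        port: 9090            # passes when the gRPC health check reports SERVING
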
2. Liveness Probe

Detects when a pod enters an unhealthy state and needs to be restarted.

spec:
  containers:
  - name: my-app
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3

3. Startup Probe

Indicates when an application has started successfully, giving slow-starting containers extra time before liveness and readiness probes take over. In the example below, failureThreshold: 30 × periodSeconds: 10 allows up to 300 seconds for startup.

spec:
  containers:
  - name: my-app
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30
      periodSeconds: 10

Critical Probe Parameters:
  • initialDelaySeconds: Time to wait before first probe
  • periodSeconds: How often to perform the probe
  • timeoutSeconds: Probe timeout period
  • successThreshold: Minimum consecutive successes to consider probe successful
  • failureThreshold: Number of failures before giving up

Incorrect probe configuration can cause unnecessary restarts or prevent successful deployments!


Advanced Configuration and Tools

1. Progressive Delivery with Argo Rollouts

Argo Rollouts extends Kubernetes with advanced deployment strategies.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app-rollout
spec:
  replicas: 5
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 1h}
      - setWeight: 40
      - pause: {duration: 1h}
      - setWeight: 60
      - pause: {duration: 1h}
      - setWeight: 80
      - pause: {duration: 1h}
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: my-app:v2
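
The Rollout above pauses for an hour at each step; a hedged usage sketch for observing and promoting it, assuming the Argo Rollouts kubectl plugin is installed:

# Watch the rollout progress through its canary steps
kubectl argo rollouts get rollout my-app-rollout --watch

# Skip the remaining pauses and promote the new version fully
kubectl argo rollouts promote my-app-rollout

# Abort the rollout and return traffic to the stable version
kubectl argo rollouts abort my-app-rollout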

2. Advanced A/B Testing Configuration

A/B testing directs specific users to different versions based on request attributes such as headers or cookies.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ab-test
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-by-header: "User-Country"
    nginx.ingress.kubernetes.io/canary-by-header-value: "US"
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app-experiment
            port:
              number: 80
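
With this configuration, only requests carrying the matching header reach the experiment service; a quick verification sketch (the header value mirrors the annotation above):

# Requests with the matching header are routed to my-app-experiment
curl -H "User-Country: US" http://myapp.example.com/

# Requests without it continue to hit the main backend
curl http://myapp.example.com/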

3. Deployment Protection with Pod Disruption Budget (PDB)

PDBs ensure a minimum number of replicas stays available during voluntary disruptions such as node drains and evictions, which often coincide with deployments and cluster upgrades.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
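
A short usage sketch: checking how much disruption the budget currently allows, and seeing it enforced during a node drain (the node name is illustrative):

# Show allowed disruptions for the budget
kubectl get pdb my-app-pdb

# Drains respect the PDB and wait rather than violate minAvailable
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data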


Common Pitfalls and Best Practices

Common Issues to Avoid

Deployment Pitfalls:
  • Resource constraints: Insufficient CPU/memory requests or limits
  • Probe misconfiguration: Aggressive timeouts or inappropriate health checks
  • Image issues: Using 'latest' tag or unreliable registries
  • Database schema changes: Not planning for backward compatibility
  • No rollback plan: Not preserving revision history
  • Insufficient monitoring: Inability to detect deployment issues

Best Practices for Successful Deployments

Deployment Best Practices:
  • Version control: Use specific immutable image tags, never 'latest'
  • Proper health checks: Implement meaningful readiness/liveness probes
  • Resource planning: Set appropriate CPU and memory requests/limits
  • Automated testing: Run thorough tests before deployment
  • Monitoring: Implement metrics for deployment success rates and performance
  • Rollback readiness: Maintain revision history and test rollback procedures (see the sketch after this list)
  • Documentation: Document deployment procedures, dependencies, and requirements
  • Small, frequent updates: Prefer smaller, incremental changes over large updates
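
To support the rollback-readiness practice above, a minimal sketch assuming a Deployment named my-app: keep enough revision history and rehearse the rollback commands before you need them.

# Keep the last 10 ReplicaSet revisions available for rollback
kubectl patch deployment my-app -p '{"spec":{"revisionHistoryLimit":10}}'

# List revisions and roll back to a specific one
kubectl rollout history deployment/my-app
kubectl rollout undo deployment/my-app --to-revision=2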


Monitoring Deployments

Proper monitoring is essential for successful deployments and quick reaction to issues.

Key Metrics to Monitor

| Metric Type | Examples | Why Monitor |
| --- | --- | --- |
| Application Performance | Response time, error rate, throughput | Detect performance regressions in the new version |
| Resource Utilization | CPU usage, memory consumption, network traffic | Identify resource leaks or inefficiencies |
| Deployment Status | Pod status, rollout progress, availability | Track deployment progress and success |
| Business Metrics | Conversion rates, user engagement, transaction volume | Measure the business impact of the deployment |

Example Prometheus Queries

# Success rate of HTTP requests
sum(rate(http_requests_total{status=~"2.."}[5m])) / sum(rate(http_requests_total[5m]))

# Error rate change after deployment
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))

# Response time percentiles
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

# Container restarts during deployment
sum(changes(container_start_time_seconds[1h])) by (pod)
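
Queries like these can also drive automated alerts after a rollout; a hedged sketch of a PrometheusRule (assumes the Prometheus Operator CRDs are installed; the threshold is illustrative):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: deployment-alerts
spec:
  groups:
  - name: deployment.rules
    rules:
    - alert: HighErrorRateAfterDeploy
      # Fire when more than 5% of requests return 5xx for 5 minutes
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Error rate above 5% after deployment"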

Visualizing Deployments

gantt
    title Deployment Timeline
    dateFormat YYYY-MM-DD HH:mm
    axisFormat %H:%M
    section Deployment
    Deployment Starts         :a1, 2023-01-01 12:00, 2m
    First Pods Updated        :a2, after a1, 3m
    50% Complete              :a3, after a2, 5m
    Deployment Complete       :a4, after a3, 2m
    section Metrics
    Baseline Metrics          :b1, 2023-01-01 11:55, 5m
    Canary Metrics            :b2, 2023-01-01 12:02, 8m
    Full Deployment Metrics   :b3, after a4, 10m
    section Validation
    Automated Tests           :c1, after a1, 10m
    Manual Validation         :c2, after a4, 15m
    Decision to Keep/Rollback :milestone, after c2, 0m


Conclusion and Strategy Selection

The choice of deployment strategy depends on your specific requirements, with tradeoffs between simplicity, downtime, resource usage, and risk.

graph TD
    A[Deployment Strategy Selection] --> B[Application Requirements]
    A --> C[Infrastructure Capabilities]
    A --> D[Team Experience]
    A --> E[Business Constraints]
    B --> F[Availability Requirements]
    B --> G[Version Compatibility]
    B --> H[Resource Requirements]
    C --> I[Kubernetes Version]
    C --> J[Available Tools]
    C --> K[Cluster Capacity]
    D --> L[Operations Experience]
    D --> M[Monitoring Capabilities]
    E --> N[Risk Tolerance]
    E --> O[User Impact]
    F & G & H & I & J & K & L & M & N & O --> P[Strategy Decision]
    style P fill:#f9f,stroke:#333,stroke-width:4px
