Resolving Kubernetes Pod Restart Errors

How to ensure stable application startup and shutdown during Pod restarts




Overview

When Pods are restarted in a Kubernetes environment, whether by a rolling update or by failure recovery, applications may not be ready to serve traffic or may not shut down safely, which can surface as errors during the restart.

This article introduces three practical methods to address these common issues in production. The key is to optimize probe settings, leverage lifecycle hooks, and configure termination grace periods.



Method 1: Optimize Probe Settings

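When a Pod starts, Kubernetes uses the configured readinessProbe to decide whether the application is ready to receive traffic, and the livenessProbe to decide whether the container should be restarted. If these settings are too strict for the real startup time, a container that is still initializing can be restarted by the livenessProbe or kept out of traffic by the readinessProbe.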

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2

readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 2
  timeoutSeconds: 2
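
To see how these fields interact, the timing can be worked out by hand. The following is a minimal sketch rather than part of the original configuration: `probe_lower_bound` is a hypothetical helper name, and the result is only a lower bound because it ignores timeoutSeconds and any variance in when the kubelet actually runs each probe.

```shell
# probe_lower_bound INITIAL_DELAY PERIOD THRESHOLD
# Hypothetical helper: earliest time, in seconds after the container starts,
# at which THRESHOLD consecutive probe results can have been collected.
# The first probe cannot run before INITIAL_DELAY, and each further result
# needs one more PERIOD.
probe_lower_bound() {
  echo $(( $1 + ($3 - 1) * $2 ))
}

# Values taken from the configuration above:
probe_lower_bound 30 10 2   # readinessProbe, successThreshold 2 -> prints 40
probe_lower_bound 30 10 3   # livenessProbe, failureThreshold 3 -> prints 50
```

Under these assumptions the Pod cannot become Ready before about 40 seconds, and a container that never answers is restarted no sooner than about 50 seconds after it starts.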

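Key strategies: set initialDelaySeconds long enough to cover initialization, use failureThreshold and periodSeconds to tolerate brief failures before the container is restarted or removed from traffic, and use successThreshold on the readinessProbe to require consecutive successes before the Pod receives traffic.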



Method 2: Use Lifecycle Hooks and Termination Grace Period

To ensure in-flight requests are handled safely when a Pod is terminated, configure a preStop hook and set a termination grace period.

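spec:
  terminationGracePeriodSeconds: 60  # Upper bound on preStop time plus shutdown time after SIGTERM
  containers:
    - name: app
      lifecycle:
        postStart:
          exec:
            # Runs right after the container is created; the container is not Running until it returns
            command: ["/bin/sh", "-c", "sleep 5"]
        preStop:
          exec:
            # Runs before SIGTERM is sent: ask the app to stop, then wait for in-flight requests
            # (requires curl in the container image)
            command: ["/bin/sh", "-c", "curl -X POST http://localhost:8889/shutdown && sleep 30"]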
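
The preStop hook and the application's own shutdown share the same grace period: the hook runs first, SIGTERM is sent once it returns, and the container is killed when terminationGracePeriodSeconds expires. A quick budget check can catch a grace period that is too short. This is a minimal sketch: `check_grace_budget` is a hypothetical helper, and the 20-second shutdown time is an assumed figure to replace with your own measurement.

```shell
# check_grace_budget GRACE_SECONDS PRESTOP_SECONDS APP_SHUTDOWN_SECONDS
# Hypothetical helper: verify that the preStop delay plus the application's
# own shutdown time fits inside terminationGracePeriodSeconds.
check_grace_budget() {
  grace=$1
  needed=$(( $2 + $3 ))
  if [ "$needed" -le "$grace" ]; then
    echo "ok: ${needed}s needed, ${grace}s allowed"
  else
    echo "too short: ${needed}s needed, ${grace}s allowed"
    return 1
  fi
}

# 60s grace period and 30s preStop sleep from the manifest above,
# plus an assumed 20s for the application to exit after SIGTERM:
check_grace_budget 60 30 20   # prints "ok: 50s needed, 60s allowed"
```

If the check fails, either raise terminationGracePeriodSeconds or shorten the sleep in the preStop hook.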

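Benefits: the preStop hook runs before the container receives SIGTERM, so the application can stop accepting new requests and finish in-flight ones, and terminationGracePeriodSeconds bounds how long Kubernetes waits before forcibly killing the container.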



Method 3: Adjust Probe Timing Based on Actual Startup Time

Setting initialDelaySeconds by guesswork can lead to failures. Instead, measure your application’s real startup time and add a buffer.

# Check Pod conditions
kubectl get pod [pod-name] -o yaml | grep -A 5 "conditions:"

# Check application startup time from logs
kubectl logs [pod-name]

# Manually test readiness
curl http://[pod-ip]:8889/health_check


If startup takes 10 seconds:

readinessProbe:
  initialDelaySeconds: 15  # Actual time + buffer
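
The measurement itself can be scripted instead of read off the logs. The following is a minimal sketch under stated assumptions: `measure_startup` and `suggest_initial_delay` are hypothetical helper names, the 5-second default buffer is an arbitrary choice, and the timer starts when polling begins, so the script should be launched as soon as the container starts.

```shell
# measure_startup URL [MAX_SECONDS]
# Hypothetical helper: poll a health endpoint once per second until it
# answers successfully, then print the elapsed seconds.
measure_startup() {
  url=$1
  max=${2:-120}
  start=$(date +%s)
  # -f treats HTTP 4xx/5xx as failure, -s is silent, --max-time caps each try
  until curl -fs -o /dev/null --max-time 2 "$url"; do
    if [ $(( $(date +%s) - start )) -ge "$max" ]; then
      echo "endpoint not healthy after ${max}s" >&2
      return 1
    fi
    sleep 1
  done
  echo $(( $(date +%s) - start ))
}

# suggest_initial_delay STARTUP_SECONDS [BUFFER_SECONDS]
# Hypothetical helper: measured startup time plus a safety buffer.
suggest_initial_delay() {
  echo $(( $1 + ${2:-5} ))
}

# Example (replace [pod-ip] with the real address):
#   startup=$(measure_startup "http://[pod-ip]:8889/health_check")
#   suggest_initial_delay "$startup"
suggest_initial_delay 10 5   # 10s startup + 5s buffer -> prints 15
```

Running the measurement several times and using the slowest result gives a safer value than a single run.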



Sequence Summary

  1. Pod is created
  2. InitContainer runs (if present)
  3. Waits for initialDelaySeconds
  4. Starts readinessProbe
  5. On success → Pod is Ready → Registered to ELB → Starts receiving traffic
  6. On termination, preStop hook runs → blocks new requests → graceful shutdown during termination grace period



Conclusion

Stable service operation in Kubernetes requires more than just starting Pods. You must consider when your application is truly ready to receive traffic and how to safely handle shutdowns.

The settings for readinessProbe, livenessProbe, lifecycle, and terminationGracePeriodSeconds are basic but powerful tools. Tuned to the measured behavior of your application, they help maintain service continuity during rolling updates and reduce user-facing errors.


