October 21, 2025

Resolving Kubernetes Pod Restart Errors

How to ensure stable application startup and shutdown during Pod restarts

Overview

When Pods restart in a Kubernetes environment, whether from rolling updates or failure recovery, applications may not be ready to serve traffic or may not shut down safely, leading to issues such as:

Service response delays or failures
Restart loops caused by readiness/liveness probe failures
Intermittent errors experienced by users

This article introduces three practical methods to address these common issues in production. The key is to optimize probe settings, leverage lifecycle hooks, and configure termination grace periods.

Method 1: Optimize Probe Settings

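When a Pod starts, Kubernetes uses the configured readinessProbe to decide whether the application can receive traffic, and the livenessProbe to decide whether the container must be restarted. If the settings are too strict, a container that hasn't finished initializing is either kept out of the Service endpoints (readiness) or restarted before it ever becomes healthy (liveness).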

livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2

readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 2
  timeoutSeconds: 2

Key strategies:

initialDelaySeconds: Allow enough time for the application to fully start before the first probe (e.g., increase from 1s to 30s)
timeoutSeconds: Add buffer for network latency and slow responses (e.g., increase from 1s to 2s)
successThreshold: Require consecutive successes before marking the Pod as Ready (e.g., increase from 1 to 2); this applies to readinessProbe only, since livenessProbe requires a value of 1
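
The effect of these values is easier to judge as arithmetic. The sketch below is a rough model, assuming probes fire exactly every periodSeconds once initialDelaySeconds has passed, and ignoring kubelet jitter and response time. The class and function names are illustrative and not part of any Kubernetes API. It estimates the earliest moment a Pod can turn Ready and how long an application has to become healthy before the livenessProbe restarts it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Subset of probe fields; defaults mirror the Kubernetes defaults."""
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    success_threshold: int = 1
    failure_threshold: int = 3


def earliest_ready_seconds(probe: Probe) -> int:
    """Earliest time after container start at which readiness can pass.

    The first probe runs after the initial delay; every additional
    required success costs one more period.
    """
    return probe.initial_delay_seconds + (probe.success_threshold - 1) * probe.period_seconds


def liveness_startup_budget_seconds(probe: Probe) -> int:
    """Time at which the last liveness probe that can still pass is sent.

    If the application is not answering by then, the failure threshold
    is reached and the container is restarted.
    """
    return probe.initial_delay_seconds + (probe.failure_threshold - 1) * probe.period_seconds


# Values taken from the manifests above.
readiness = Probe(initial_delay_seconds=30, period_seconds=10, timeout_seconds=2, success_threshold=2)
liveness = Probe(initial_delay_seconds=30, period_seconds=10, timeout_seconds=2)

print(earliest_ready_seconds(readiness))          # 40
print(liveness_startup_budget_seconds(liveness))  # 50
```

Under these assumptions the Pod cannot receive traffic before roughly 40 seconds, and an application that needs more than about 50 seconds to answer /health_check would be restarted in a loop.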

Method 2: Use Lifecycle Hooks and Termination Grace Period

To ensure in-flight requests are handled safely when a Pod is terminated, configure a preStop hook and set a termination grace period.

spec:
  terminationGracePeriodSeconds: 60  # Upper bound for preStop + SIGTERM handling; SIGKILL follows
  containers:
    - name: app
      lifecycle:
        postStart:
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]
        preStop:
          exec:
            command: ["/bin/sh", "-c", "curl -X POST http://localhost:8889/shutdown && sleep 30"]
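
The preStop hook and the application's own shutdown share the same grace period, so the numbers have to add up. A minimal sketch of that check follows, using a hypothetical helper name and the values from the manifest above. The time taken by the curl call is assumed to be negligible.

```python
def sigterm_budget_seconds(grace_period_seconds: int, pre_stop_seconds: int) -> int:
    """Seconds left for the application to exit after the preStop hook returns.

    The grace period starts counting when termination begins, so time spent
    in preStop is subtracted from what remains for SIGTERM handling.
    """
    remaining = grace_period_seconds - pre_stop_seconds
    if remaining <= 0:
        raise ValueError(
            "preStop uses the whole grace period; the container would be "
            "killed before it can shut down on its own"
        )
    return remaining


# terminationGracePeriodSeconds: 60, preStop sleeps for 30 seconds.
print(sigterm_budget_seconds(60, 30))  # 30
```

If the sleep is ever raised toward 60 seconds, terminationGracePeriodSeconds likely needs to grow with it.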

Benefits:

Calls shutdown API before termination to block new requests
Allows existing connections to complete before shutdown
Keeps the preStop hook and the application's SIGTERM handling within terminationGracePeriodSeconds, after which the container is force-killed
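
The manifest assumes the application exposes a /shutdown endpoint. The article does not show that side, so the following is only a sketch of what it might look like, using Python's standard http.server module. The /work path, the Handler class, and make_server are hypothetical. After /shutdown is called, new work is refused with 503 while /health_check keeps returning 200.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

draining = threading.Event()  # set once /shutdown has been called


class Handler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: str) -> None:
        payload = body.encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        if self.path == "/health_check":
            # Stays healthy during the drain so the probes keep passing.
            self._reply(200, "ok")
        elif self.path == "/work":  # hypothetical business endpoint
            if draining.is_set():
                self._reply(503, "shutting down")
            else:
                self._reply(200, "done")
        else:
            self._reply(404, "not found")

    def do_POST(self):
        if self.path == "/shutdown":
            draining.set()  # refuse new work from now on
            self._reply(200, "draining")
        else:
            self._reply(404, "not found")

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host: str = "127.0.0.1", port: int = 8889) -> ThreadingHTTPServer:
    """Build the server; call serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), Handler)
```

Keeping /health_check green during the drain is a design choice for this sketch: the livenessProbe in the article shares that path, and a terminating Pod is taken out of the Service endpoints regardless of what its readiness check returns.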

Method 3: Adjust Probe Timing Based on Actual Startup Time

Setting initialDelaySeconds by guesswork can lead to failures. Instead, measure your application’s real startup time and add a buffer.

# Check Pod conditions
kubectl get pod [pod-name] -o yaml | grep -A 5 "conditions:"

# Check application startup time from logs
kubectl logs [pod-name]

# Manually test readiness
curl http://[pod-ip]:8889/health_check

If startup takes 10 seconds:

readinessProbe:
  initialDelaySeconds: 15  # Actual time + buffer
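
One way to replace guesswork with numbers is to compute the startup time from two timestamps and size the delay from several measurements. The sketch below assumes second-precision UTC timestamps in the form Kubernetes reports in Pod status (for example a container's startedAt) and a 5-second buffer to match the example above. Both helper names are hypothetical, and the timestamps are made-up sample values.

```python
import math
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. 2025-10-21T10:00:00Z


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed between two UTC timestamps."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT)
    ended = datetime.strptime(end, TIMESTAMP_FORMAT)
    return int((ended - started).total_seconds())


def suggested_initial_delay(startup_samples, buffer_seconds=5):
    """Slowest observed startup, rounded up, plus a fixed buffer."""
    if not startup_samples:
        raise ValueError("need at least one measurement")
    return math.ceil(max(startup_samples)) + buffer_seconds


# Container started at 10:00:00; the application logged that it was
# listening at 10:00:10.
startup = seconds_between("2025-10-21T10:00:00Z", "2025-10-21T10:00:10Z")
print(startup)                                   # 10
print(suggested_initial_delay([8, startup, 9]))  # 15
```

Sizing from the slowest of several restarts, rather than a single run, makes the value less sensitive to one unusually fast start.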

Sequence Summary

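Pod is created
InitContainer runs (if present)
Application container starts; postStart hook runs (if configured)
Kubelet waits for initialDelaySeconds
readinessProbe starts
On success → Pod is Ready → added to Service endpoints and registered with the load balancer (e.g., ELB) → starts receiving traffic
On termination, preStop hook runs → application blocks new requests → graceful shutdown completes within the termination grace period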

Conclusion

Stable service operation in Kubernetes requires more than just starting Pods. You must consider when your application is truly ready to receive traffic and how to safely handle shutdowns.

The settings for readinessProbe, livenessProbe, lifecycle, and terminationGracePeriodSeconds are basic but powerful tools. With these, you can maintain service continuity during rolling updates without user-facing errors.

Somaz
