Resolving Kubernetes Pod Restart Errors
How to ensure stable application startup and shutdown during Pod restarts

Overview
When Pods are restarted in a Kubernetes environment, for example during rolling updates or failure recovery, the application may not be ready to serve traffic or may not shut down safely, leading to issues such as:
- Service response delays or failures
- Restart loops caused by readiness/liveness probe failures
- Intermittent errors experienced by users
This article introduces three practical methods for addressing these common production issues: optimizing probe settings, using lifecycle hooks together with a termination grace period, and tuning probe timing to the application's measured startup time.
Method 1: Optimize Probe Settings
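When a Pod starts, Kubernetes uses the configured readinessProbe to decide whether the application is ready to receive traffic, and the livenessProbe to decide whether the container must be restarted. If these settings are too strict, a Pod that has not finished initializing may be restarted by a failing livenessProbe, or kept out of the Service's endpoints by a failing readinessProbe.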
```yaml
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 2
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health_check
    port: 8889
    scheme: HTTP
  initialDelaySeconds: 30
  periodSeconds: 10
  successThreshold: 2
  timeoutSeconds: 2
```
Key strategies:
- initialDelaySeconds: Allow enough time for the application to start fully (e.g., increase from 1s to 30s).
- timeoutSeconds: Add a buffer for network latency and slow responses (e.g., increase from 1s to 2s).
- successThreshold: Require a more stable state before the Pod is marked Ready (e.g., increase from 1 to 2). This applies to the readinessProbe only; Kubernetes requires successThreshold to be 1 for a livenessProbe.
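To see how these fields interact, here is a rough Python model of probe timing. It is an approximation: it ignores the kubelet's probe jitter and the time each probe takes (timeoutSeconds), and `Probe`, `earliest_ready`, `liveness_deadline` and `restart_loop_risk` are illustrative helpers, not Kubernetes APIs.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """The probe fields used above (seconds and counts)."""
    initial_delay: int          # initialDelaySeconds
    period: int                 # periodSeconds
    failure_threshold: int = 3  # failureThreshold
    success_threshold: int = 1  # successThreshold (must be 1 for liveness)


def earliest_ready(readiness: Probe) -> int:
    """Earliest time (s after container start) the readinessProbe can mark the Pod Ready.

    The first probe runs at about initial_delay, and success_threshold
    consecutive successes are needed, one per period.
    """
    return readiness.initial_delay + (readiness.success_threshold - 1) * readiness.period


def liveness_deadline(liveness: Probe) -> int:
    """Time (s after container start) of the last liveness probe before a restart.

    If the app fails failure_threshold probes in a row, the kubelet restarts
    the container; the last of those probes fires at about this time.
    """
    return liveness.initial_delay + (liveness.failure_threshold - 1) * liveness.period


def restart_loop_risk(startup_seconds: float, liveness: Probe) -> bool:
    """True if an app needing startup_seconds to answer would be restarted first."""
    return startup_seconds > liveness_deadline(liveness)


readiness = Probe(initial_delay=30, period=10, success_threshold=2)
liveness = Probe(initial_delay=30, period=10)
print(earliest_ready(readiness))    # 40
print(liveness_deadline(liveness))  # 50
```

With the settings above, the Pod can become Ready at about 40 seconds, and the application must answer the livenessProbe by about 50 seconds. If the earlier liveness configuration used initialDelaySeconds: 1 with the same period and threshold (an assumption; only the old initialDelaySeconds is given), an application needing 30 seconds to start would be restarted at about 21 seconds, every time, which is the restart loop described in the overview.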
Method 2: Use Lifecycle Hooks and Termination Grace Period
To ensure in-flight requests are handled safely when a Pod is terminated, configure a preStop hook and set a termination grace period.
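```yaml
spec:
  terminationGracePeriodSeconds: 60  # Total time for preStop plus SIGTERM handling before SIGKILL
  containers:
    - name: app
      lifecycle:
        postStart:
          exec:
            # Runs alongside the entrypoint; the container is not marked Running until it finishes
            command: ["/bin/sh", "-c", "sleep 5"]
        preStop:
          exec:
            # Requires curl to be installed in the container image
            command: ["/bin/sh", "-c", "curl -X POST http://localhost:8889/shutdown && sleep 30"]
```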
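Benefits:
- Calls the application's shutdown API before the container receives SIGTERM, so the application can stop accepting new requests
- The sleep gives existing connections time to complete before shutdown
- Ensures a graceful shutdown within terminationGracePeriodSeconds (the time the preStop hook runs counts against this period)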
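The preStop hook above relies on the application exposing `POST /shutdown` on port 8889, which is not shown here. Below is a minimal, hypothetical Python sketch (standard library only) of one way to implement it: calling `/shutdown` sets a draining flag, after which `/health_check` returns 503 so the readinessProbe (and any load balancer health check on that path) starts failing, while requests already being served are unaffected. The handler, the 503 status code, and the `http_probe` helper are assumptions for illustration.

```python
import http.server
import threading
import urllib.error
import urllib.request


def make_handler(draining: threading.Event):
    """Build a handler for the (hypothetical) /health_check and /shutdown endpoints."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def _reply(self, code: int, body: str) -> None:
            data = body.encode()
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def do_GET(self) -> None:
            if self.path == "/health_check":
                # Fail the probe once draining, so no new traffic is routed here.
                self._reply(503 if draining.is_set() else 200,
                            "draining" if draining.is_set() else "ok")
            else:
                self._reply(200, "hello")  # stand-in for real application traffic

        def do_POST(self) -> None:
            if self.path == "/shutdown":
                draining.set()  # called by the preStop hook
                self._reply(200, "draining")
            else:
                self._reply(404, "not found")

        def log_message(self, format, *args) -> None:
            pass  # silence per-request logging

    return Handler


def start_server(port: int = 8889):
    """Start the app in a background thread; returns (server, draining flag)."""
    draining = threading.Event()
    server = http.server.ThreadingHTTPServer(("0.0.0.0", port), make_handler(draining))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, draining


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Mimic an httpGet probe: any status from 200 to 399 counts as success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        return 200 <= err.code < 400
    except OSError:
        return False  # connection refused, timeout, etc.
```

In the configuration above, the livenessProbe also uses `/health_check`, so with this design the livenessProbe would start failing during draining as well. Giving readiness its own path is one way to keep draining from affecting liveness.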
Method 3: Adjust Probe Timing Based on Actual Startup Time
Setting initialDelaySeconds by guesswork can lead to failures. Instead, measure your application's real startup time and add a buffer.
```shell
# Check Pod conditions
kubectl get pod [pod-name] -o yaml | grep -A 5 "conditions:"
# Check application startup time from logs (--timestamps prefixes each line with its time)
kubectl logs --timestamps [pod-name]
# Manually test readiness
curl http://[pod-ip]:8889/health_check
```
If startup takes 10 seconds:
```yaml
readinessProbe:
  initialDelaySeconds: 15  # Actual time + buffer
```
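Running the curl check by hand only gives a rough feel for startup time. A small Python sketch can automate it: poll the health endpoint until it first succeeds, then add a buffer. Start it as soon as the container starts, from a machine that can reach the Pod IP (as with the curl command), and ideally with the livenessProbe relaxed so it does not restart the container mid-measurement. `wait_until_healthy` and `suggest_initial_delay` are hypothetical helpers, and the fixed 5-second buffer simply mirrors the 10s to 15s example; it is an assumption, not a rule.

```python
import http.client
import math
import time
import urllib.request


def wait_until_healthy(url: str, poll_interval: float = 0.5, give_up_after: float = 300.0) -> float:
    """Poll url (like running the curl check in a loop) and return seconds until it first succeeds."""
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if 200 <= resp.status < 400:  # same success range as an httpGet probe
                    return time.monotonic() - start
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, timed out, or returned 4xx/5xx
        if time.monotonic() - start > give_up_after:
            raise TimeoutError(f"{url} not healthy after {give_up_after}s")
        time.sleep(poll_interval)


def suggest_initial_delay(startup_seconds: float, buffer_seconds: float = 5) -> int:
    """Measured startup time plus a buffer, rounded up to whole seconds."""
    return math.ceil(startup_seconds + buffer_seconds)


print(suggest_initial_delay(10))  # 15
```

A typical use would be `suggest_initial_delay(wait_until_healthy("http://[pod-ip]:8889/health_check"))`, with the real Pod IP substituted. Measuring a few restarts and taking the slowest gives a safer baseline than a single run.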
Sequence Summary
- Pod is created
- InitContainer runs (if present)
- Waits for initialDelaySeconds
- Starts readinessProbe
- On success → Pod is Ready → Registered to ELB → Starts receiving traffic
- On termination, the preStop hook runs → new requests are blocked → SIGTERM is sent → graceful shutdown within the termination grace period
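The termination half of this sequence is where the grace-period arithmetic matters, because terminationGracePeriodSeconds covers both the preStop hook and the application's own SIGTERM handling. The hypothetical helper below is a simplified model (it ignores, for example, what the kubelet does when a preStop hook overruns) that lays out when SIGTERM and, if needed, SIGKILL arrive.

```python
from typing import List, Tuple


def termination_timeline(grace: float, prestop: float, app_exit: float) -> List[Tuple[float, str]]:
    """Simplified shutdown timeline for one container, in seconds after Pod deletion.

    grace    -- terminationGracePeriodSeconds (60 in the Method 2 example)
    prestop  -- how long the preStop hook runs (a little over 30 s for curl + sleep 30)
    app_exit -- how long the app needs to exit after it receives SIGTERM
    """
    events = [(0.0, "Pod marked Terminating; preStop starts (endpoint removal runs in parallel)")]
    # The grace period clock covers the preStop hook as well as SIGTERM handling.
    sigterm_at = min(prestop, grace)
    events.append((sigterm_at, "SIGTERM sent to the container"))
    if sigterm_at + app_exit <= grace:
        events.append((sigterm_at + app_exit, "app exited cleanly"))
    else:
        events.append((grace, "grace period expired: SIGKILL"))
    return events


for t, event in termination_timeline(grace=60, prestop=30, app_exit=10):
    print(f"{t:>5.1f}s  {event}")
```

With a 60-second grace period and a preStop hook that takes about 30 seconds, the application has roughly 30 seconds left after SIGTERM. If it needs longer to finish its work, either raise terminationGracePeriodSeconds or shorten the sleep in the preStop hook.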
Conclusion
Stable service operation in Kubernetes requires more than just starting Pods. You must consider when your application is truly ready to receive traffic and how to safely handle shutdowns.
The readinessProbe, livenessProbe, lifecycle, and terminationGracePeriodSeconds settings are basic but powerful tools. With them, you can keep the service available during rolling updates and minimize user-facing errors.