Troubleshooting Kubernetes Networks in Cilium CNI Environments

Understanding eBPF-based packet processing and effective monitoring strategies in mixed Cilium + IPVS environments

Overview

Cilium is a next-generation CNI plugin based on eBPF that can be used alongside traditional kube-proxy or completely replace it. This guide covers the unique characteristics and troubleshooting methods for mixed Cilium + kube-proxy environments.

The lab exercises in this guide use the same environment as the Complete Kubernetes Internal Network Analysis: Packet Flow Tracing in IPVS Mode article.


Environment Characteristics

Before diving into troubleshooting, it’s essential to understand your current environment configuration.


Lab Environment Overview

network-test-info.sh

#!/bin/bash

NAMESPACE="network-test"

echo "======================================"
echo "  Kubernetes Network Test Environment"
echo "======================================"
echo ""

# Service IP
SERVICE_IP=$(kubectl get svc nginx-service -n $NAMESPACE -o jsonpath='{.spec.clusterIP}')
echo "Service ClusterIP: $SERVICE_IP"
echo ""

# Client Pod IP
CLIENT_IP=$(kubectl get pod client-pod -n $NAMESPACE -o jsonpath='{.status.podIP}' 2>/dev/null)
CLIENT_NODE=$(kubectl get pod client-pod -n $NAMESPACE -o jsonpath='{.spec.nodeName}' 2>/dev/null)
echo "Client Pod:"
echo "  IP: $CLIENT_IP"
echo "  Node: $CLIENT_NODE"
echo ""

# Backend Pod IPs
echo "Backend Pods:"
kubectl get pods -n $NAMESPACE -l app=nginx -o custom-columns=NAME:.metadata.name,IP:.status.podIP,NODE:.spec.nodeName --no-headers | while read name ip node; do
    echo "  $name"
    echo "    IP: $ip"
    echo "    Node: $node"
done
echo ""

# Endpoints
echo "Endpoints:"
ENDPOINTS=$(kubectl get endpoints nginx-service -n $NAMESPACE -o jsonpath='{.subsets[0].addresses[*].ip}' 2>/dev/null)
echo "  $ENDPOINTS"
echo ""

# First backend IP
BACKEND_IP=$(echo $ENDPOINTS | awk '{print $1}')

echo "======================================"
echo "  Useful Commands"
echo "======================================"
echo ""
echo "# Export variables:"
echo "export SERVICE_IP=$SERVICE_IP"
echo "export CLIENT_IP=$CLIENT_IP"
echo "export BACKEND_IP=$BACKEND_IP"
echo ""
echo "# IPVS rules on $CLIENT_NODE:"
echo "ssh $CLIENT_NODE 'sudo ipvsadm -Ln | grep -A 5 $SERVICE_IP'"
echo ""
echo "# tcpdump on client pod:"
echo "kubectl exec -it client-pod -n $NAMESPACE -- tcpdump -i any -nn \"host $SERVICE_IP or host $BACKEND_IP\" -w /tmp/client.pcap"
echo ""
echo "# tcpdump on node:"
echo "ssh $CLIENT_NODE 'sudo tcpdump -i any -nn \"host $SERVICE_IP or host $BACKEND_IP\" -w /tmp/node.pcap'"
echo ""


./network-test-info.sh 
======================================
  Kubernetes Network Test Environment
======================================

Service ClusterIP: 10.233.9.232

Client Pod:
  IP: 10.233.66.94
  Node: k8s-compute-02

Backend Pods:
  nginx-54c98b4f84-ftvlg
    IP: 10.233.66.111
    Node: k8s-compute-02
  nginx-54c98b4f84-tgmgh
    IP: 10.233.65.212
    Node: k8s-compute-01
  nginx-54c98b4f84-vzgjc
    IP: 10.233.67.86
    Node: k8s-compute-03

Endpoints:
  10.233.65.212 10.233.66.111 10.233.67.86

======================================
  Useful Commands
======================================

# Export variables:
export SERVICE_IP=10.233.9.232
export CLIENT_IP=10.233.66.94
export BACKEND_IP=10.233.65.212

# IPVS rules on k8s-compute-02:
ssh k8s-compute-02 'sudo ipvsadm -Ln | grep -A 5 10.233.9.232'

# tcpdump on client pod:
kubectl exec -it client-pod -n network-test -- tcpdump -i any -nn "host 10.233.9.232 or host 10.233.65.212" -w /tmp/client.pcap

# tcpdump on node:
ssh k8s-compute-02 'sudo tcpdump -i any -nn "host 10.233.9.232 or host 10.233.65.212" -w /tmp/node.pcap'


Step 1: Verify Current Environment

# 1. Check kube-proxy status
kubectl get pods -n kube-system | grep kube-proxy

# 2. Check kube-proxy mode
kubectl get cm kube-proxy -n kube-system -o yaml | grep mode

# 3. Check Cilium KubeProxyReplacement setting
kubectl -n kube-system exec -it ds/cilium -- cilium status | grep KubeProxyReplacement
# Expected output
kube-proxy-6lw4l         1/1     Running   1 (70d ago)   103d
mode: ipvs
KubeProxyReplacement:    False
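The values in the expected output can also be extracted by a script. The sketch below is a pair of hypothetical helpers (kpr_value and proxy_mode are names introduced here, not kubectl or Cilium features) that parse the text printed by the Step 1 commands. It assumes the cilium status line has the form shown above, with the value as the first word after the colon.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that parse the text printed by the Step 1 commands.
# Both read stdin, so they accept live output or a saved copy.

# Print the first word after "KubeProxyReplacement:" in `cilium status` output.
kpr_value() {
  awk '/^KubeProxyReplacement:/ { print $2; exit }'
}

# Print the value of the "mode:" key from the kube-proxy ConfigMap YAML,
# with any surrounding double quotes removed.
proxy_mode() {
  awk '$1 == "mode:" { gsub(/"/, "", $2); print $2; exit }'
}
```

For example: kubectl -n kube-system exec ds/cilium -- cilium status | kpr_value. Dropping -it when piping avoids the carriage returns a TTY can append to each line.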


Step 2: Environment Type Classification

| kube-proxy | Mode | KubeProxyReplacement | Environment Type |
|---|---|---|---|
| ✅ Running | ipvs | False | Cilium + IPVS Mixed |
| ✅ Running | iptables | False | Cilium + iptables Mixed |
| ❌ Not Running | - | True | Cilium Full Replacement |
| ✅ Running | ipvs | - (no Cilium) | Pure IPVS |
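The classification table can be encoded as a small decision function. This is a sketch that assumes you have already collected the three inputs (whether kube-proxy is running, its mode, and the KubeProxyReplacement value); classify_env is a hypothetical name, and combinations the table does not cover are reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper that encodes the environment classification table.
#   $1: "yes" if kube-proxy Pods are running, anything else if not
#   $2: kube-proxy mode ("ipvs" or "iptables"); ignored without kube-proxy
#   $3: KubeProxyReplacement from `cilium status`; empty if Cilium is absent
classify_env() {
  local proxy_running=$1 mode=$2 kpr=$3
  if [ "$proxy_running" = "yes" ]; then
    case "$kpr:$mode" in
      False:ipvs)     echo "Cilium + IPVS Mixed" ;;
      False:iptables) echo "Cilium + iptables Mixed" ;;
      :ipvs)          echo "Pure IPVS" ;;   # no Cilium status available
      *)              echo "Unknown ($kpr / $mode)" ;;
    esac
  elif [ "$kpr" = "True" ]; then
    echo "Cilium Full Replacement"
  else
    echo "Unknown (no kube-proxy, KubeProxyReplacement=$kpr)"
  fi
}
```

For the lab cluster, classify_env yes ipvs False prints "Cilium + IPVS Mixed".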


Primary Symptom: IPVS Statistics Show Zero


Symptom Verification

sudo ipvsadm -Ln --stats | grep -A 5 "10.233.9.232"
TCP  10.233.9.232:80 rr
  -> 10.233.65.212:80             Masq    0      0          0
  -> 10.233.66.111:80             Masq    0      0          0
  -> 10.233.67.86:80              Masq    0      0          0

Every counter is 0, yet curl http://nginx-service from the client Pod succeeds.
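To check the counter from a script instead of by eye, the hypothetical helper below reads ipvsadm -Ln --stats output and prints the Conns column of one virtual service. It assumes the standard --stats column layout (Prot, LocalAddress:Port, Conns, InPkts, OutPkts, InBytes, OutBytes). In a pure IPVS setup this number grows with each new connection; in the mixed setup it is expected to stay at 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ipvsadm -Ln --stats` output on stdin and print the
# Conns counter (column 3) of one virtual service, or "not-found".
#   $1: virtual service as IP:PORT, e.g. 10.233.9.232:80
ipvs_vip_conns() {
  awk -v vip="$1" '
    ($1 == "TCP" || $1 == "UDP" || $1 == "SCTP") && $2 == vip {
      print $3; found = 1; exit
    }
    END { if (!found) print "not-found" }
  '
}
```

For example: sudo ipvsadm -Ln --stats | ipvs_vip_conns 10.233.9.232:80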


Root Cause

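Cilium's eBPF programs translate the Service IP to a backend Pod IP before the traffic reaches IPVS. Depending on the Cilium configuration, this happens at the socket level (socket-based load balancing rewrites the destination in the connect() system call) or in the eBPF program attached to the Pod's network interface, which rewrites each packet as it leaves the Pod. In both cases IPVS never sees a packet addressed to the ClusterIP.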

graph TB
    subgraph "Packet Processing Flow"
        App["Application (curl)"]
        eBPF["Cilium eBPF Hook<br/>(Socket-level)"]
        Socket["Socket Layer"]
        Transport["Transport (TCP)"]
        Network["Network (IP)"]
        IPVS["IPVS/Netfilter<br/>(Rules exist but no packets)"]
        DataLink["Data Link Layer"]
        App -->|syscall| eBPF
        eBPF -->|"Intercepted here!"| Socket
        Socket --> Transport
        Transport --> Network
        Network -.->|"Packets never reach"| IPVS
        Network --> DataLink
    end


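Consequences

- ipvsadm --stats counters stay at 0 even though the Service is serving traffic.
- conntrack -L shows no entries for the Service IP.
- The IPVS rules still exist, because kube-proxy keeps programming them, but they are not on the packet path.
- Monitoring therefore has to rely on Cilium's own view (cilium bpf lb list) or on raw packets (tcpdump).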


Cilium Environment Monitoring Methods


Tool Effectiveness Summary

| Method | Works? | Recommended | Notes |
|---|---|---|---|
| cilium bpf lb list | ✅ Perfect | ⭐⭐⭐⭐⭐ | Most reliable method! |
| tcpdump | ✅ Perfect | ⭐⭐⭐⭐⭐ | Packet-level analysis |
| cilium monitor | ⚠️ Partial | ⭐⭐ | L7 not visible |
| conntrack -L | ❌ No | - | Bypassed by eBPF |
| cilium service list | ❌ Empty | - | Disabled in mixed mode |
| ipvsadm --stats | ❌ Always 0 | - | Reference only |


Method 1: eBPF Load Balancing Map (★★★★★)

kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep 10.233.9.232
10.233.9.232:80/TCP (1)      10.233.67.86:80/TCP (79) (1)                    
10.233.9.232:80/TCP (3)      10.233.65.212:80/TCP (79) (3)                   
10.233.9.232:80/TCP (2)      10.233.66.111:80/TCP (79) (2)                   
10.233.9.232:80/TCP (0)      0.0.0.0:0 (79) (0) [ClusterIP, non-routable]

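Interpretation:

- The frontend 10.233.9.232:80/TCP has three backend slots (1 to 3), pointing at the three nginx Pod IPs listed under Endpoints.
- Slot 0 (backend 0.0.0.0:0, flagged [ClusterIP, non-routable]) is the Service's own entry, not a backend.
- 79 is the reverse-NAT ID Cilium uses to rewrite replies back to the ClusterIP.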
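Because the eBPF map is the source of truth here, it is handy to pull the backend list out of it directly. The sketch below is a hypothetical helper (lb_backends is not a Cilium command) that assumes the column layout shown above (frontend, slot, backend, ...) and drops the slot-0 entry whose backend is 0.0.0.0:0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `cilium bpf lb list` output on stdin and print the
# backends of one frontend, one per line, sorted. The slot-0 entry, whose
# backend column is 0.0.0.0:0, describes the Service itself and is skipped.
#   $1: frontend exactly as Cilium prints it, e.g. 10.233.9.232:80/TCP
lb_backends() {
  awk -v fe="$1" '$1 == fe && $3 != "0.0.0.0:0" { print $3 }' | LC_ALL=C sort
}
```

For example, kubectl -n kube-system exec ds/cilium -- cilium bpf lb list | lb_backends 10.233.9.232:80/TCP should list the same three Pod IPs as the Endpoints.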


Method 2: Cilium Service List (★)

This does not work in mixed environments.

kubectl -n kube-system exec -it ds/cilium -- cilium service list | grep nginx
# No output

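Reason: The plain-text output of cilium service list identifies each Service by its frontend address (ClusterIP:port), not by name, so grep nginx matches nothing even when entries exist. Filter on the ClusterIP (grep 10.233.9.232) before concluding that the list is empty in mixed mode.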

Use cilium bpf lb list instead: it queries the eBPF load-balancing maps that the datapath actually uses.


Method 3: conntrack Real-Time Monitoring (★)

This does not work in Cilium eBPF environments.

# Attempt 1
sudo conntrack -L -p tcp --dport 80 | grep 10.233.9.232
conntrack v1.4.8 (conntrack-tools): 6 flow entries have been shown.
# No Service IP related connections!
# Attempt 2 (without port filter)
sudo conntrack -L | grep 10.233.9.232
conntrack v1.4.8 (conntrack-tools): 888 flow entries have been shown.
# Still no Service IP related connections!

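Cause: Cilium rewrites the Service IP to a backend IP in eBPF before the packet reaches Netfilter, and tracks these connections in its own eBPF connection-tracking maps (viewable with cilium bpf ct list global). The kernel conntrack table therefore never holds an entry with the ClusterIP as its destination.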
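To make the conntrack check explicit, the hypothetical helper below counts entries whose original-direction destination (the first dst= field on each line of conntrack -L output) equals a given IP. With the ClusterIP it is expected to print 0 in this setup; whether entries with a backend Pod IP appear instead depends on how Cilium forwards the traffic, so treat that as something to observe rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `conntrack -L` output on stdin and count entries
# whose original-direction destination (the first dst= field) equals $1.
ct_count_orig_dst() {
  awk -v ip="$1" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^dst=/) {          # first dst= belongs to the original direction
          if ($i == "dst=" ip) n++
          break
        }
      }
    }
    END { print n + 0 }
  '
}
```

For example: sudo conntrack -L 2>/dev/null | ct_count_orig_dst 10.233.9.232 (the summary line goes to stderr and is discarded).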


Method 4: tcpdump (★★★★★)

The most reliable tool: it shows the packets themselves, whichever component translated them.

# All traffic related to Service IP
sudo tcpdump -i any -nn host 10.233.9.232

Terminal 1:

ssh k8s-compute-02
sudo tcpdump -i any -nn "host 10.233.9.232 or port 80"

Terminal 2:

kubectl exec -it client-pod -n network-test -- bash -c 'for i in {1..5}; do echo "Request $i"; curl -s http://nginx-service; sleep 1; done'

Terminal 1 Output:

12:15:36.180603 In  10.233.66.94.34376 > 10.233.9.232.80: Flags [S], seq 1234567890
12:15:36.180621 Out 10.233.66.94.34376 > 10.233.65.212.80: Flags [S], seq 1234567890
12:15:36.180869 P   10.233.65.212.80 > 10.233.66.94.34376: Flags [S.], seq 9876543210
12:15:36.180901 Out 10.233.9.232.80 > 10.233.66.94.34376: Flags [S.], seq 9876543210

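You can observe the packets both before and after translation: the SYN leaves the client addressed to the ClusterIP and is forwarded to 10.233.65.212 (DNAT), and the SYN-ACK's source is rewritten from 10.233.65.212 back to 10.233.9.232 (reverse NAT).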
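The same pairing can be pulled out of a saved capture. The hypothetical helper below reads tcpdump -nn text on stdin and, for each client socket, reports when a later SYN goes to a different destination than the first one, which is the pre/post-DNAT pair seen above. It locates fields relative to the ">" token, assuming the layout in the output above; an extra interface name or protocol token before the addresses does not affect it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `tcpdump -nn` text on stdin and, for each client
# socket, print "client first-destination => later-destination" whenever a
# later SYN from that socket goes somewhere other than the first SYN did.
# Retransmitted SYNs can produce repeated lines.
syn_translations() {
  awk '
    {
      # Find the ">" separating source and destination; skip other lines.
      for (i = 1; i < NF; i++) if ($i == ">") break
      if ($i != ">" || i < 2) next
      # Only plain SYNs ("[S],"), not SYN-ACKs ("[S.],").
      if ($(i + 2) != "Flags" || $(i + 3) != "[S],") next
      src = $(i - 1)
      dst = $(i + 1)
      sub(/:$/, "", dst)              # tcpdump appends ":" to the destination
      if (!(src in first)) first[src] = dst
      else if (dst != first[src]) print src, first[src], "=>", dst
    }
  '
}
```

For example: tcpdump -r /tmp/capture.pcap -nn | syn_translations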


Method 5: Cilium Monitor (★★)

Works with limitations.

kubectl -n kube-system exec -it ds/cilium -- cilium monitor

Results:

# Attempt L7 traffic (may not work)
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type l7
Press Ctrl-C to quit
# No nginx related traffic
# Only other application traffic visible

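Cause: L7 events are produced only for traffic that Cilium redirects through its L7 proxy, for example when an L7 network policy or L7 visibility applies to the Pod. Plain HTTP between these Pods is handled entirely in eBPF at L3/L4, so no L7 records are generated for it.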


Practical Troubleshooting Scenarios


Scenario 1: Service Responds Normally but IPVS Statistics Are Zero

Symptom:

# curl works fine
kubectl exec -it client-pod -n network-test -- curl http://nginx-service
Backend POD-1: nginx-54c98b4f84-ftvlg

# IPVS statistics are 0
sudo ipvsadm -Ln --stats | grep -A 5 10.233.9.232
TCP  10.233.9.232:80 rr
  -> 10.233.65.212:80             Masq    0      0          0

Diagnosis:

# 1. Check Cilium eBPF map (most important!)
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep 10.233.9.232

# 2. Verify actual packets with tcpdump
sudo tcpdump -i any -nn host 10.233.9.232 -c 10

Conclusion: This is normal behavior in Cilium mixed environments. IPVS statistics should be ignored.


Scenario 2: Connections Not Visible in conntrack

Symptom:

sudo conntrack -L | grep 10.233.9.232
conntrack v1.4.8 (conntrack-tools): 888 flow entries have been shown.
# Nothing found

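Cause: As in Method 3, Cilium translates the Service IP in eBPF before Netfilter sees the packet, so the kernel conntrack table never records the ClusterIP.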

Solution:

# Use tcpdump instead of conntrack
sudo tcpdump -i any -nn "host 10.233.9.232 or port 80"

# Or check Cilium eBPF map
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep 10.233.9.232

Conclusion: It’s normal if conntrack shows nothing. Use tcpdump instead.


Scenario 3: Traffic Not Visible in Cilium Monitor

Symptom:

kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type l7
# No nginx traffic

Solution:

# Check regular monitor
kubectl -n kube-system exec -it ds/cilium -- cilium monitor

# Or use tcpdump
sudo tcpdump -i any -nn port 80

Conclusion: An empty L7 monitor is expected unless L7 visibility is enabled for this traffic.


Scenario 4: Service Connection Failure

Symptom:

kubectl exec -it client-pod -n network-test -- curl http://nginx-service
curl: (7) Failed to connect to nginx-service port 80: Connection timed out

Diagnostic Steps:

# 1. Verify Service exists
kubectl get svc nginx-service -n network-test

# 2. Check Endpoints
kubectl get endpointslices -n network-test

# 3. Check Cilium eBPF map (most important!)
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep 10.233.9.232

# 4. Check IPVS rules (reference only)
sudo ipvsadm -Ln | grep 10.233.9.232

# 5. Test direct Pod access
BACKEND_IP=$(kubectl get pods -n network-test -l app=nginx -o jsonpath='{.items[0].status.podIP}')
kubectl exec -it client-pod -n network-test -- curl http://$BACKEND_IP

# 6. Check packets with tcpdump
sudo tcpdump -i any -nn host 10.233.9.232
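Steps 2 and 3 should agree: every backend in the eBPF map should be an Endpoint and vice versa. The hypothetical helper below compares two IP lists and reports the differences; how you extract each list (for example, stripping ":80/TCP" from the bpf lb list output) is left to you and is an assumption of this sketch. It returns a non-zero status on mismatch so it can be used in an if.

```shell
#!/usr/bin/env bash
# Hypothetical helper (bash: uses process substitution): compare two
# whitespace-separated IP lists and report addresses found in only one.
#   $1: backend IPs taken from `cilium bpf lb list` (port/protocol stripped)
#   $2: endpoint IPs from the Service's EndpointSlices
_sorted_ips() {
  local ip
  for ip in $1; do printf '%s\n' "$ip"; done | LC_ALL=C sort -u   # word-split on purpose
}

compare_backends() {
  local only_bpf only_k8s
  only_bpf=$(LC_ALL=C comm -23 <(_sorted_ips "$1") <(_sorted_ips "$2"))
  only_k8s=$(LC_ALL=C comm -13 <(_sorted_ips "$1") <(_sorted_ips "$2"))
  if [ -z "$only_bpf" ] && [ -z "$only_k8s" ]; then
    echo "match"
    return 0
  fi
  if [ -n "$only_bpf" ]; then
    echo "only in eBPF map: $(printf '%s\n' "$only_bpf" | paste -sd ' ' -)"
  fi
  if [ -n "$only_k8s" ]; then
    echo "only in Endpoints: $(printf '%s\n' "$only_k8s" | paste -sd ' ' -)"
  fi
  return 1
}
```

The Endpoint list could come from kubectl get endpointslices -n network-test -l kubernetes.io/service-name=nginx-service -o jsonpath='{.items[*].endpoints[*].addresses[*]}'.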

Possible Causes and Solutions:

| Symptom | Cause | Solution |
|---|---|---|
| Service not found | Not created | Re-run kubectl expose |
| No Endpoints | Pods not running | Check Pod status |
| eBPF map empty | Cilium agent issue | Restart Cilium |
| Direct access fails | Network Policy | Review policies |
| No packets visible | CNI issue | Check CNI logs |
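The table can be read as an ordered checklist in which the first failing check points at the most likely cause. The hypothetical helper below encodes that reading; the ordering is an assumption (top to bottom, which also matches the order of the diagnostic steps), not a rule stated by Cilium or Kubernetes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: walk the table top to bottom and print the first row
# whose symptom matches. Each argument is "yes" or "no":
#   $1 Service exists         $2 Endpoints present
#   $3 eBPF map has backends  $4 direct Pod access works
#   $5 packets visible in tcpdump
suggest_fix() {
  if   [ "$1" != "yes" ]; then echo "Not created: re-run kubectl expose"
  elif [ "$2" != "yes" ]; then echo "Pods not running: check Pod status"
  elif [ "$3" != "yes" ]; then echo "Cilium agent issue: restart Cilium"
  elif [ "$4" != "yes" ]; then echo "Network Policy: review policies"
  elif [ "$5" != "yes" ]; then echo "CNI issue: check CNI logs"
  else echo "No row in the table matches"
  fi
}
```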

Cilium Agent Restart:

kubectl rollout restart ds/cilium -n kube-system
kubectl rollout status ds/cilium -n kube-system


Packet Flow Comparison


Pure IPVS Environment

graph TB
    subgraph "Pure IPVS Packet Flow"
        Client["Client Pod<br/>10.233.66.94"]
        NetStack1["Network Stack"]
        Netfilter["Netfilter<br/>PREROUTING"]
        IPVS["IPVS<br/>DNAT performed"]
        PodNet1["Pod Network"]
        Backend1["Backend Pod<br/>10.233.65.212"]
        Client -->|"10.233.66.94 → 10.233.9.232:80"| NetStack1
        NetStack1 --> Netfilter
        Netfilter --> IPVS
        IPVS -->|"10.233.66.94 → 10.233.65.212:80"| PodNet1
        PodNet1 --> Backend1
    end


Cilium Mixed Environment (Current Setup)

graph TB
    subgraph "Cilium Mixed Packet Flow"
        Client2["Client Pod<br/>10.233.66.94"]
        eBPF["Cilium eBPF Hook<br/>DNAT at Socket-level"]
        Socket["Socket Layer"]
        NetStack2["Network Stack"]
        IPVS2["Netfilter/IPVS<br/>(Rules exist, no packets)"]
        PodNet2["Pod Network"]
        Backend2["Backend Pod<br/>10.233.65.212"]
        Client2 -->|"curl syscall"| eBPF
        eBPF -->|"10.233.66.94 → 10.233.65.212:80"| Socket
        Socket --> NetStack2
        NetStack2 -.->|"Packets never reach"| IPVS2
        NetStack2 --> PodNet2
        PodNet2 --> Backend2
    end


Key Differences

| Aspect | IPVS Environment | Cilium Mixed Environment |
|---|---|---|
| Processing Location | Netfilter (PREROUTING) | Socket-level (syscall) |
| Processing Timing | Mid network stack | Immediately after syscall |
| conntrack | Recorded | Bypassed |
| IPVS Involvement | Actual processing | Rules only |
| Performance | Very fast | Faster (syscall bypass) |
| Monitoring | ipvsadm, conntrack | tcpdump, cilium bpf |


Useful Command Reference


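Cilium eBPF Map Commands (Most Important!)

# Service load-balancing entries (frontend -> backends)
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list

# Entries for one Service IP
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep 10.233.9.232

# Cilium's own connection-tracking table
kubectl -n kube-system exec -it ds/cilium -- cilium bpf ct list global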


tcpdump Commands

# All traffic related to Service IP
sudo tcpdump -i any -nn host 10.233.9.232

# Between Client and Service
sudo tcpdump -i any -nn "host 10.233.66.94 and host 10.233.9.232"

# HTTP traffic only
sudo tcpdump -i any -nn port 80

# Save to file for later analysis
sudo tcpdump -i any -nn host 10.233.9.232 -w /tmp/capture.pcap
tcpdump -r /tmp/capture.pcap -nn -v


Cilium Status Commands

# Full status
kubectl -n kube-system exec -it ds/cilium -- cilium status

# Check KubeProxyReplacement
kubectl -n kube-system exec -it ds/cilium -- cilium status | grep KubeProxyReplacement

# Cilium version
kubectl -n kube-system exec -it ds/cilium -- cilium version


Cilium Endpoint Commands

# Endpoint list
kubectl -n kube-system exec -it ds/cilium -- cilium endpoint list

# Search by specific IP
kubectl -n kube-system exec -it ds/cilium -- cilium endpoint list | grep 10.233.66.111


Cilium Monitor Commands

# All events
kubectl -n kube-system exec -it ds/cilium -- cilium monitor

# Drop events
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type drop


IPVS Commands (Reference Only)

# Check rules
sudo ipvsadm -Ln | grep -A 5 "10.233.9.232"

# Statistics (always 0)
sudo ipvsadm -Ln --stats


conntrack Commands (Doesn’t Work)

# Can try but Service IP won't appear
sudo conntrack -L | grep 10.233.9.232


Environment-Specific Monitoring Strategies


Cilium + IPVS Mixed Environment (KubeProxyReplacement: False)

In this environment, Cilium eBPF processes packets at the Socket-level first, so traditional IPVS/conntrack-based monitoring tools don’t work properly.

Priority 1: tcpdump

# Most reliable method - always works
sudo tcpdump -i any -nn "host $SERVICE_IP or port 80"

Priority 2: Cilium eBPF Map

# Check Service load balancing status
kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list | grep $SERVICE_IP

Priority 3: Kubernetes Resources

# Check backend status via EndpointSlice
kubectl get endpointslices -n network-test -o wide

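Tools to Avoid:

- ipvsadm --stats: counters always read 0
- conntrack -L: Service IP connections never appear
- cilium service list: empty in this setup
- cilium monitor --type l7: shows nothing unless L7 visibility is enabled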


Cilium Full Replacement Environment (KubeProxyReplacement: True)

An environment where kube-proxy doesn’t exist and Cilium handles all Service processing.

Priority 1: Cilium Service List

# Only works in full replacement environment
kubectl -n kube-system exec -it ds/cilium -- cilium service list

Priority 2: Cilium eBPF Map

kubectl -n kube-system exec -it ds/cilium -- cilium bpf lb list

Priority 3: Cilium Monitor

# Detailed info available when L7 visibility is enabled
kubectl -n kube-system exec -it ds/cilium -- cilium monitor --type l7
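The two priority lists above depend only on the KubeProxyReplacement value, so they can be captured in a lookup. A sketch, with monitoring_plan as a hypothetical name; values other than True and False (such as those printed by older Cilium releases) are rejected rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the monitoring priority for a given
# KubeProxyReplacement value, highest priority first, one tool per line.
monitoring_plan() {
  case "$1" in
    False)
      printf '%s\n' "tcpdump" "cilium bpf lb list" "kubectl get endpointslices" ;;
    True)
      printf '%s\n' "cilium service list" "cilium bpf lb list" "cilium monitor --type l7" ;;
    *)
      echo "unknown KubeProxyReplacement value: $1" >&2
      return 1 ;;
  esac
}
```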


Mixed vs Full Replacement Comparison

| Aspect | Mixed Environment | Full Replacement |
|---|---|---|
| kube-proxy | Running | Not present |
| KubeProxyReplacement | False | True |
| Service Management | kube-proxy | Cilium |
| Packet Processing | Cilium eBPF | Cilium eBPF |
| IPVS Rules | Exist but unused | None |
| cilium service list | Empty | Works |
| cilium bpf lb list | Works | Works |
| Resource Usage | High (duplicate) | Low |
| Complexity | High | Low |


Pros and Cons of Mixed Environment

Advantages:

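Disadvantages:

- Duplicate resource usage: kube-proxy keeps programming IPVS rules that the datapath does not use.
- Higher operational complexity.
- Misleading monitoring: ipvsadm statistics and conntrack show nothing for Service traffic, so the familiar tools cannot be trusted.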


Consider Cilium full replacement if you meet these conditions:


Migration Considerations


Transitioning from Mixed to Full Replacement

Pre-flight Checks:

# Check Cilium version
kubectl -n kube-system exec -it ds/cilium -- cilium version

# Check current configuration
helm get values cilium -n kube-system

# Check cluster status
kubectl -n kube-system exec -it ds/cilium -- cilium status --verbose

Migration Considerations:

Post-Migration Verification:

# Verify kube-proxy removal
kubectl get pods -n kube-system | grep kube-proxy

# Verify Cilium full replacement
kubectl -n kube-system exec -it ds/cilium -- cilium status | grep KubeProxyReplacement

# Check Service list (should work now)
kubectl -n kube-system exec -it ds/cilium -- cilium service list
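The three post-migration checks can be combined into a single pass/fail test. This is a sketch with a hypothetical helper name; it assumes you have already counted the remaining kube-proxy Pods and the Services printed by cilium service list, and it accepts Strict because older Cilium releases report that value for full replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the post-migration checks pass.
#   $1: number of kube-proxy Pods still present
#   $2: KubeProxyReplacement value from `cilium status`
#       (True; older Cilium releases report Strict for full replacement)
#   $3: number of Services listed by `cilium service list` (header excluded)
migration_ok() {
  local proxy_pods=$1 kpr=$2 services=$3
  [ "$proxy_pods" -eq 0 ] || return 1
  case "$kpr" in
    True|Strict) ;;
    *) return 1 ;;
  esac
  [ "$services" -gt 0 ]
}
```

The first argument could come from kubectl get pods -n kube-system -l k8s-app=kube-proxy --no-headers 2>/dev/null | wc -l, assuming the kube-proxy Pods carry the usual k8s-app=kube-proxy label.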


Lab Environment Cleanup

After completing the lab, clean up the created resources:

# Delete test namespace
kubectl delete namespace network-test

# Clean up local files
rm -f client.pcap node.pcap backend.pcap network-test-info.sh

# Clean up capture files on node
ssh k8s-compute-02 'sudo rm -f /tmp/*.pcap'

Verify Deletion:

kubectl get all -n network-test
# Output: No resources found in network-test namespace.


Conclusion


Key Takeaways

Understanding the unique characteristics of Cilium mixed environments is essential for effective troubleshooting.

The key is knowing which tools work and which don't in your specific environment.


Cilium + IPVS Mixed Environment Characteristics

Monitoring Strategy

Troubleshooting Approach

Recommendations



References