IPVS vs iptables in Kubernetes

A comprehensive comparison of kube-proxy modes for service networking

Overview

Kubernetes Service networking is implemented through the kube-proxy component, which is responsible for routing traffic to the appropriate backend Pods. This component supports multiple proxy modes, with iptables and IPVS being the most commonly used options. Understanding the differences between these modes is crucial for optimizing your cluster’s networking performance.

What You'll Learn
  • How kube-proxy implements Service networking in Kubernetes
  • The architecture and operation of iptables and IPVS proxy modes
  • Performance characteristics and scalability considerations
  • How to configure and switch between proxy modes
  • Best practices for production environments

graph TD
    A[Client] --> B[Service VIP]
    B --> C{kube-proxy}
    C -->|iptables mode| D[iptables rules]
    C -->|IPVS mode| E[IPVS virtual servers]
    D --> F[Pod 1]
    D --> G[Pod 2]
    D --> H[Pod 3]
    E --> F
    E --> G
    E --> H
    style A fill:#f9f9f9,stroke:#333,stroke-width:2px
    style B fill:#bbdefb,stroke:#333,stroke-width:2px
    style C fill:#ffcc80,stroke:#333,stroke-width:2px
    style D fill:#c8e6c9,stroke:#333,stroke-width:2px
    style E fill:#e1bee7,stroke:#333,stroke-width:2px
    style F fill:#dcedc8,stroke:#333,stroke-width:2px
    style G fill:#dcedc8,stroke:#333,stroke-width:2px
    style H fill:#dcedc8,stroke:#333,stroke-width:2px


The Role of kube-proxy in Kubernetes

kube-proxy is a network proxy that runs on each node in your Kubernetes cluster, implementing part of the Kubernetes Service concept. Its primary responsibilities are to:

  1. Watch the Kubernetes API server for changes to Service and Endpoints objects
  2. Maintain network rules that allow communication to Pods via Kubernetes Services
  3. Perform connection forwarding or load balancing across a set of backend Pods
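To make this concrete, you can inspect the objects kube-proxy consumes directly; a quick sketch, assuming a Service named my-service exists (the name is a placeholder):

# The Service's virtual IP (ClusterIP) that clients connect to
kubectl get service my-service -o wide

# The backing Pod IP:port pairs that kube-proxy turns into forwarding rules
kubectl get endpoints my-service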
Evolution of kube-proxy Modes

Kubernetes has evolved through several proxy implementations:

  • userspace (original mode): Simple but inefficient
  • iptables (default since Kubernetes 1.2): Better performance
  • IPVS (added in Kubernetes 1.8, stable in 1.11): Designed for high throughput


Concepts of IPVS and iptables

1️⃣ iptables Mode

iptables is a packet filtering and NAT (Network Address Translation) framework built into the Linux kernel. It defines rules that packets must pass through and decides whether to allow, deny, forward, or modify network traffic.

sequenceDiagram
    participant C as Client
    participant S as Service (ClusterIP)
    participant I as iptables Rules
    participant P as Pods
    C->>S: Request to Service VIP
    S->>I: Packet hits PREROUTING chain
    Note over I: iptables evaluates chains sequentially
    I->>I: Match ClusterIP and port
    I->>I: Apply probability-based load balancing
    I->>P: DNAT to selected Pod IP:Port
    P->>C: Response (SNAT on return)

How iptables mode works in Kubernetes:

  1. kube-proxy watches for Service and Endpoint changes
  2. For each Service, it creates iptables rules in multiple chains:
    • KUBE-SERVICES: Entry point for service processing
    • KUBE-SVC-XXX: Chain for each service with probability-based rule selection
    • KUBE-SEP-XXX: Chain for each service endpoint with DNAT rules
# Example iptables rules for a Service with ClusterIP 10.96.0.1 and port 80
# Main chain entry
iptables -t nat -A PREROUTING -j KUBE-SERVICES
iptables -t nat -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-XYZABC

# Load balancing rules (3 endpoints with equal probability)
iptables -t nat -A KUBE-SVC-XYZABC -m statistic --mode random --probability 0.33333 -j KUBE-SEP-1
iptables -t nat -A KUBE-SVC-XYZABC -m statistic --mode random --probability 0.50000 -j KUBE-SEP-2
iptables -t nat -A KUBE-SVC-XYZABC -j KUBE-SEP-3

# Endpoint translation rules
iptables -t nat -A KUBE-SEP-1 -p tcp -j DNAT --to-destination 10.244.1.2:8080
iptables -t nat -A KUBE-SEP-2 -p tcp -j DNAT --to-destination 10.244.2.3:8080
iptables -t nat -A KUBE-SEP-3 -p tcp -j DNAT --to-destination 10.244.3.4:8080
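Note how the probabilities cascade: the first rule matches one third of new connections; of the remaining two thirds, the second rule matches half (again one third overall); whatever is left falls through to the final, unconditional rule. This is how iptables approximates equal-weight load balancing with purely sequential rules.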

Advantages of iptables mode:

  • Stability: Mature and widely used on Linux systems since the early 2000s
  • Simplicity: Easy to configure and troubleshoot with standard Linux tools
  • Built-in connection tracking: Provides stateful packet filtering without additional configuration
  • Low resource usage: Efficient in small clusters with few services and endpoints
  • Reliable failover: Dead endpoints are automatically removed from the rotation

Disadvantages of iptables mode:

  • O(n) rule processing complexity: The first packet of each new connection must traverse the rule chains linearly
  • Performance degradation with scale: Significant slowdown with many services and endpoints
  • Update latency: Rule changes become expensive with thousands of services
  • Limited load balancing capabilities: Only random probability-based distribution
  • Connection resets during updates: Connections can be dropped when rules change


2️⃣ IP Virtual Server (IPVS) Mode

IPVS is a layer 4 load balancer in the Linux kernel designed specifically for high performance and scalability. Unlike iptables, which processes rules sequentially, IPVS uses a hash table for efficient lookup.

sequenceDiagram
    participant C as Client
    participant S as Service (ClusterIP)
    participant V as IPVS Table
    participant P as Pods
    C->>S: Request to Service VIP
    S->>V: Packet lookup in IPVS table
    Note over V: Hash-based lookup (O(1))
    V->>V: Apply selected scheduling algorithm
    V->>P: Forward to selected Pod
    P->>C: Response

How IPVS mode works in Kubernetes:

  1. kube-proxy creates an IPVS virtual server for each Service IP:Port
  2. Each backend Pod is added as a real server to the virtual server
  3. IPVS handles load balancing using its selected scheduling algorithm
  4. kube-proxy still uses a small set of iptables rules (backed by ipset) for tasks such as masquerading and packet filtering, but the load-balancing NAT itself is done by IPVS
# Example IPVS configuration for a Service
# View with: ipvsadm -ln

# Virtual Server (Service VIP and port)
-A -t 10.96.0.1:80 -s rr

# Real Servers (Endpoints)
-a -t 10.96.0.1:80 -r 10.244.1.2:8080 -m masq
-a -t 10.96.0.1:80 -r 10.244.2.3:8080 -m masq
-a -t 10.96.0.1:80 -r 10.244.3.4:8080 -m masq
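To see how traffic is actually being spread across the real servers, ipvsadm can also print per-backend counters; a quick check against the example VIP above:

# Packet/byte counters per virtual and real server
ipvsadm -Ln --stats

# Connections currently tracked for this virtual service
ipvsadm -Lnc | grep 10.96.0.1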

IPVS Supported Load Balancing Algorithms:

| Algorithm | Description | Use Case |
| --- | --- | --- |
| rr | Round-Robin: simple rotation through backends | General-purpose, similar backends |
| lc | Least Connection: routes to the server with the fewest active connections | Workloads with variable connection durations |
| dh | Destination Hashing: selects a backend from the destination IP | When specific clients need specific backends |
| sh | Source Hashing: selects a backend from the source IP (session affinity) | Stateful applications requiring sticky sessions |
| sed | Shortest Expected Delay: considers both active connections and weights | Heterogeneous backend capacity |
| nq | Never Queue: assigns to an idle server if one is available | Mixed-load environments |
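Outside of kube-proxy, ipvsadm lets you change a virtual server's scheduler on the fly, which is handy for experimenting with the algorithms above; a sketch against the example VIP:

# Switch the virtual server to least-connection scheduling
# (kube-proxy will revert this on its next sync; set the scheduler
# in the kube-proxy ConfigMap for a permanent change)
ipvsadm -E -t 10.96.0.1:80 -s lc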

Advantages of IPVS mode:

  • O(1) lookup complexity: Hash tables provide consistent performance regardless of cluster size
  • High performance: Optimized for large-scale load balancing with low latency
  • Multiple load balancing algorithms: Flexible options for different workloads
  • Connection persistence: Support for maintaining client affinity to backends
  • Better scalability: Efficiently handles thousands of services with minimal degradation
  • Lower CPU usage: More efficient at scale compared to iptables

Disadvantages of IPVS mode:

  • Complexity: Requires additional kernel modules and configuration
  • Dependency management: Needs conntrack modules properly configured
  • Limited packet filtering: Still requires some iptables rules for certain functionality
  • Debugging difficulty: Less familiar to many administrators
  • Potential kernel compatibility issues: Requires specific kernel modules


Performance Comparison

As Kubernetes clusters grow in size, the difference in performance between iptables and IPVS becomes more pronounced.

xychart-beta
    title "Service Creation Time vs. Number of Services"
    x-axis [100, 500, 1000, 5000, 10000]
    y-axis "Creation Time (sec)" 0 --> 300
    line [5, 25, 50, 250, 300] "iptables"
    line [4, 20, 38, 95, 120] "IPVS"

Real-world Performance Data

In a large-scale Kubernetes environment (5000+ services):

  • iptables mode: Service sync time can exceed 5 minutes
  • IPVS mode: Service sync time typically under 2 minutes
  • CPU usage for kube-proxy can be 3-4x higher in iptables mode at scale
  • Connection throughput in IPVS mode can be up to 5x higher than iptables


iptables vs IPVS: Detailed Comparison

| 🔑 Feature | 📃 iptables | 🚀 IPVS |
| --- | --- | --- |
| Implementation | NAT-based packet filtering framework | Layer 4 load balancer using hash tables |
| Lookup complexity | O(n), linear in the number of rules | O(1), constant time via hash tables |
| Rule structure | Chains of sequential rules | Hash table of virtual/real servers |
| Load balancing methods | Random selection with probabilities | Multiple algorithms (rr, lc, dh, sh, sed, nq) |
| Session affinity | Limited support (ClientIP) | Strong support via multiple methods |
| Rule updates | Slow at scale, can affect traffic | Fast updates with minimal disruption |
| Production readiness | Very stable, extensively tested | Stable since Kubernetes 1.11 |
| Memory usage | Lower in small clusters | Higher baseline, more efficient at scale |
| CPU usage | Increases linearly with services | Remains relatively constant with scale |
| Connection tracking | Built-in, simple | Relies on conntrack (more complex) |
| Configuration complexity | Simple and familiar | Requires additional modules and tuning |
| Debugging tools | Widely available (iptables-save) | Less common (ipvsadm) |
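The last row is worth internalizing, because day-to-day inspection differs between the modes; run on a node, the equivalent checks look roughly like this:

# iptables mode: dump the NAT rules kube-proxy installed
iptables-save -t nat | grep KUBE- | head -n 20

# IPVS mode: dump virtual servers and their real servers
ipvsadm -Ln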


Configuring kube-proxy Mode

Checking Current Mode

To check which mode your kube-proxy is using:

# View kube-proxy ConfigMap
kubectl -n kube-system get configmap kube-proxy -o yaml | grep mode

# Or check kube-proxy logs
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep "Using"
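kube-proxy also reports its active mode over HTTP on its metrics port (10249 by default), which avoids grepping logs; queried from the node itself:

# Returns "iptables" or "ipvs"
curl http://localhost:10249/proxyMode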

Switching to IPVS Mode

To switch to IPVS mode, follow these steps:

  1. First, ensure your nodes have the required kernel modules:
# Check for required modules
lsmod | grep -e ip_vs -e nf_conntrack

# Load modules if needed
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack
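Note that modprobe only loads the modules until the next reboot. On systemd-based distributions you can persist them via /etc/modules-load.d; a minimal sketch:

# Load the IPVS modules automatically at boot
cat > /etc/modules-load.d/ipvs.conf << EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF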
  2. Edit the kube-proxy ConfigMap:
kubectl -n kube-system edit configmap kube-proxy
  3. Update the mode in the ConfigMap:
# Find this section
data:
  config.conf: |-
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"  # Change from "iptables" to "ipvs"
    # Add IPVS-specific settings if needed
    ipvs:
      scheduler: "rr"  # Default is rr (round robin)
  4. Restart kube-proxy pods:
kubectl -n kube-system delete pods -l k8s-app=kube-proxy
  5. Verify the change:
# Check if IPVS modules are loaded
kubectl -n kube-system exec -it $(kubectl -n kube-system get pods -l k8s-app=kube-proxy -o name | head -n 1) -- lsmod | grep ip_vs

# Check IPVS rules
kubectl -n kube-system exec -it $(kubectl -n kube-system get pods -l k8s-app=kube-proxy -o name | head -n 1) -- ipvsadm -ln
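Keep in mind that if the required kernel modules are missing, kube-proxy falls back to iptables mode rather than failing outright, so a "successful" switch can silently leave you on iptables until you verify the rules as shown above.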


IPVS Optimization for Production

For optimal IPVS performance in production environments, consider the following optimizations:

Connection Tracking Tuning

# Increase connection tracking table size
sysctl -w net.netfilter.nf_conntrack_max=1000000

# Increase timeout for UDP connections
sysctl -w net.netfilter.nf_conntrack_udp_timeout=60
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=180

# Make sure these settings persist
cat > /etc/sysctl.d/95-ipvs.conf << EOF
net.netfilter.nf_conntrack_max=1000000
net.netfilter.nf_conntrack_tcp_timeout_established=900
net.netfilter.nf_conntrack_udp_timeout=60
net.netfilter.nf_conntrack_udp_timeout_stream=180
EOF
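
# Apply all files under /etc/sysctl.d immediately, without a reboot
sysctl --system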

IPVS Scheduler Selection

Choose the appropriate scheduling algorithm based on your workload:

# In kube-proxy ConfigMap
ipvs:
  scheduler: "rr"  # Options: rr, lc, dh, sh, sed, nq
  # Use "sh" for sticky sessions based on client IP
  # Use "lc" for workloads with long or variable-length connections
  # Use "sed" when backends have heterogeneous capacity

Graceful Termination

Consider increasing terminationGracePeriodSeconds in your deployments to allow connections to drain properly before Pod termination.
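A minimal sketch of raising the grace period on an existing Deployment (my-app is a placeholder name, and 60 seconds is an arbitrary example value):

kubectl patch deployment my-app --type merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'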


When to Use Each Mode

🏷 Use iptables Mode When:

  • Small cluster size: Running fewer than ~200 services
  • Development environments: Simplicity is more important than performance
  • Kernel compatibility: Working with older or restricted kernels
  • Low resource requirements: Operating in constrained environments
  • Familiar tooling: The team has strong experience with iptables debugging

🏷 Use IPVS Mode When:

  • Large cluster size: Running hundreds or thousands of services
  • Production environments: Performance and scale are critical
  • Advanced load balancing: You need specific algorithms or sticky sessions
  • High throughput: Processing many connections per second
  • Frequent Service changes: You need faster service update times

graph TD A["Decision: iptables or IPVS?"] --> B{"Cluster Size?"} B -->|< 100 services| C["iptables"] B -->|≥ 100 services| D["IPVS"] A --> E{"Environment?"} E -->|Development/Testing| C E -->|Production| D A --> F{"Load Balancing?"} F -->|Basic Round-Robin| C F -->|Advanced Algorithms| D A --> G{"Resource Constraints?"} G -->|Limited Resources| C G -->|Performance Priority| D style A fill:#ffcc80,stroke:#333,stroke-width:2px style C fill:#c8e6c9,stroke:#333,stroke-width:2px style D fill:#e1bee7,stroke:#333,stroke-width:2px


🔐 Summary
iptables → Simple, reliable, and better for small, low-traffic environments. Excellent default choice for getting started with Kubernetes.

IPVS → High-performance, scalable, and ideal for production systems with complex networking requirements. Worth the additional setup complexity for large clusters.
Important Note

The legacy userspace proxy mode has been deprecated for years and was removed entirely in Kubernetes 1.26, leaving iptables and IPVS as the standard options. If you're still running userspace mode, migrate to one of these two.


