Implementing Redis Redlock on Kubernetes — A Complete Guide to Distributed Locks
Principles of Redlock, safe implementation patterns on Kubernetes, operational pitfalls, and alternatives
Overview
Concurrency control is one of the trickiest problems in distributed systems.
When multiple instances contend for the same resource, you need a reliable locking mechanism to maintain data consistency and prevent race conditions.
A naive single Redis instance with SET NX is unsafe under network partitions, instance failures, and clock drift. To address these risks, Redis creator Salvatore Sanfilippo proposed the Redlock algorithm.
This post covers how Redlock works, safe implementation patterns on Kubernetes, must-have operational checks,
and how Redlock compares to DB- and Consul/etcd-based locking — all from a production perspective.
This guide targets teams operating Redis-based distributed locks reliably in on-premises or hybrid Kubernetes environments. It covers concepts, design, and operations, illustrated with code and configuration examples.
What is Redlock?
Key ideas
- A client must acquire the lock on a majority (N/2+1) of independent Redis instances to succeed.
- Locks use a TTL and a unique token to validate ownership and guarantee automatic release.
- Compute effective validity by accounting for clock drift, and release any partially acquired locks on failure.
Why multiple instances?
- Eliminate single points of failure (SPOF)
- Prevent split-brain/effective double-locks during network partitions
- Gain tolerance to inter-node clock skew
How Redlock Works (Deep dive)
1) Acquire
- Attempt to acquire the lock concurrently across instances with a unique token and TTL.
- Sum the elapsed time and compute the effective validity relative to the TTL.
```python
def acquire_lock(resource_name, ttl):
    start_time = current_time()            # ms
    successful_locks = 0
    unique_value = generate_unique_id()    # ownership token

    # Try every Redis instance (the algorithm sends these requests concurrently)
    for redis_instance in redis_instances:
        if redis_instance.set(resource_name, unique_value, px=ttl, nx=True):
            successful_locks += 1

    elapsed_time = current_time() - start_time
    drift = (ttl * 0.01) + 2               # clock drift allowance (ms)

    # Success conditions: majority acquired and valid time left
    if successful_locks >= (len(redis_instances) // 2) + 1:
        validity_time = ttl - elapsed_time - drift
        if validity_time > 0:
            return True, validity_time

    # On failure, unlock on all instances
    release_lock(resource_name, unique_value)
    return False, 0
```
2) Success conditions
- Majority of instances acquired
- TTL − (elapsed + drift) > 0
3) Release
- Delete only if the stored token matches the owner token (atomicity)
- On failure/timeout, release any partial acquisitions
Lua script
```lua
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
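For reference, a minimal sketch of running this ownership check from a Node.js client with ioredis; the resource key and token are whatever was used at acquire time. Redlock client libraries perform this step internally, so it is shown here only to illustrate the release rule.
```javascript
const Redis = require('ioredis');

const RELEASE_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`;

// Atomically delete the key only if this client's token still owns it.
// Run it against every instance that may hold the lock; 1 means released.
async function releaseOnInstance(client, resource, token) {
  return client.eval(RELEASE_SCRIPT, 1, resource, token);
}
```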
Common Kubernetes Pitfalls
Many teams assume that deploying Redis as a StatefulSet with a Headless Service makes Redlock safe by default. It does not, for several reasons:
- Headless DNS limitations: Even with multiple A records, many Redis clients connect to only the first IP (see the sketch after this list).
- Lack of independence: Co-locating pods on the same node/rack/storage overlaps failure domains.
- Client awareness: Clients may not treat pods as distinct, independently monitored instances.
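To make the first pitfall concrete, here is a minimal sketch (the headless Service name redis-headless is illustrative): resolving the Service yields every pod IP, yet a client configured with the hostname still talks to a single instance.
```javascript
const dns = require('dns').promises;
const Redis = require('ioredis');

async function showHeadlessPitfall() {
  // The headless Service resolves to one A record per ready pod...
  const podIps = await dns.resolve4('redis-headless.default.svc.cluster.local');
  console.log('Pod IPs behind the Service:', podIps);

  // ...but a client configured with the Service hostname still opens a
  // single connection, typically to whichever address it resolves first.
  const client = new Redis({ host: 'redis-headless.default.svc.cluster.local', port: 6379 });
  console.log('Connected to just one of them:', await client.ping());
  client.disconnect();
}
```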
Correct Implementation Patterns (Kubernetes)
Method 1 — Per-pod Services (recommended)
- Expose a dedicated Service per Redis pod to create clear, stable endpoints.
- The application creates independent connections to each Service for the Redlock cluster.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-0
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-0
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-1
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-1
  ports:
    - port: 6379
      targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-2
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-2
  ports:
    - port: 6379
      targetPort: 6379
```
Method 2 — Pod anti-affinity
- Enforce spread policies to avoid co-scheduling on the same node/rack.
- Maximize independence across failure domains (node/power/network/storage).
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - redis-cluster
              topologyKey: kubernetes.io/hostname
```
Method 3 — DNS-based dynamic discovery
- Periodically resolve headless endpoints to refresh the connection pool dynamically.
- Tune connection/retry/timeout policies conservatively for Redlock assumptions.
```javascript
const dns = require('dns').promises;
const Redis = require('ioredis');
const Redlock = require('redlock');

async function getRedisInstances() {
  // Resolve every address behind the headless Service
  const addresses = await dns.lookup('redis-service', { all: true });
  return addresses.map(addr => ({
    host: addr.address,
    port: 6379
  }));
}

async function createRedlock() {
  const instances = await getRedisInstances();
  return new Redlock(instances.map(config => new Redis(config)));
}
```
Real-world Implementation Patterns
Application layer (e.g., Node.js, Java)
- Use the same lock key across at least three independent Redis instances to achieve majority consensus.
- Maintain separate event/error handlers per connection and observe connection health independently.
- Configure Redlock client retry counts/delays/jitter and drift factors explicitly.
Node.js (ioredis + redlock)
```javascript
const Redis = require('ioredis');
const Redlock = require('redlock');

class RedlockService {
  constructor() {
    // Independent connections through the per-pod Services
    this.redisClients = [
      new Redis({
        host: 'redis-0.default.svc.cluster.local',
        port: 6379,
        retryDelayOnFailover: 100,
        maxRetriesPerRequest: 3
      }),
      new Redis({
        host: 'redis-1.default.svc.cluster.local',
        port: 6379,
        retryDelayOnFailover: 100,
        maxRetriesPerRequest: 3
      }),
      new Redis({
        host: 'redis-2.default.svc.cluster.local',
        port: 6379,
        retryDelayOnFailover: 100,
        maxRetriesPerRequest: 3
      })
    ];

    this.redlock = new Redlock(this.redisClients, {
      retryCount: 3,
      retryDelay: 200,   // ms between retries
      retryJitter: 200,  // ms of random jitter
      driftFactor: 0.01  // clock drift allowance relative to the TTL
    });

    this.setupEventHandlers();
  }

  setupEventHandlers() {
    this.redisClients.forEach((client, index) => {
      client.on('connect', () => {
        console.log(`Redis-${index} connected`);
      });
      client.on('error', (err) => {
        console.error(`Redis-${index} error:`, err.message);
      });
    });

    this.redlock.on('clientError', (err) => {
      console.error('Redlock client error:', err);
    });
  }

  async acquireLock(resource, ttl = 10000) {
    try {
      console.log(`Attempting to acquire lock: ${resource}`);
      const lock = await this.redlock.acquire([resource], ttl);
      console.log(`Lock acquired: ${resource}`);
      return lock;
    } catch (error) {
      console.error(`Failed to acquire lock: ${resource}`, error.message);
      throw error;
    }
  }

  async releaseLock(lock) {
    try {
      await lock.release();
      console.log(`Lock released: ${lock.resources}`);
    } catch (error) {
      console.error('Failed to release lock:', error.message);
      throw error;
    }
  }
}

// Usage example
async function criticalSection() {
  const redlockService = new RedlockService();
  try {
    const lock = await redlockService.acquireLock('user-action-123', 15000);
    try {
      // Work performed in the critical section
      console.log('Performing critical work...');
      await performCriticalOperation();
      console.log('Work completed');
    } finally {
      await redlockService.releaseLock(lock);
    }
  } catch (error) {
    console.error('Work failed:', error.message);
  }
}
```
Java (Redisson)
```java
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class RedlockService {

    private static final Logger log = LoggerFactory.getLogger(RedlockService.class);

    // One independent Redisson client per Redis instance (via the per-pod Services)
    private final List<RedissonClient> clients;

    public RedlockService() {
        this.clients = List.of(
                createClient("redis://redis-0.default.svc.cluster.local:6379"),
                createClient("redis://redis-1.default.svc.cluster.local:6379"),
                createClient("redis://redis-2.default.svc.cluster.local:6379"));
    }

    private RedissonClient createClient(String address) {
        Config config = new Config();
        config.useSingleServer().setAddress(address);
        return Redisson.create(config);
    }

    public boolean executeWithLock(String lockName, int ttlSeconds, Runnable task) {
        // RedissonRedLock succeeds only when a majority of the instances grant the lock
        RLock[] locks = clients.stream()
                .map(client -> client.getLock(lockName))
                .toArray(RLock[]::new);
        RedissonRedLock lock = new RedissonRedLock(locks);
        try {
            if (lock.tryLock(10, ttlSeconds, TimeUnit.SECONDS)) {
                try {
                    log.info("Lock acquired: {}", lockName);
                    task.run();
                    return true;
                } finally {
                    lock.unlock();
                    log.info("Lock released: {}", lockName);
                }
            } else {
                log.warn("Failed to acquire lock: {}", lockName);
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Interrupted while acquiring lock: {}", lockName, e);
            return false;
        }
    }
}
```
Lock lifecycle
- Set TTL to 2–3× expected work time (including buffer).
- For long-running work, include lock extend/refresh and define rollback/retry on failure (a sketch follows this list).
- Release locks only after verifying the ownership token (avoid cross-release).
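A sketch of the extend/refresh pattern with node-redlock; the renewal interval and the withExtendedLock helper are illustrative choices, not library defaults:
```javascript
// Keep a lock alive during long-running work.
// Assumes `redlock` is the Redlock instance from the earlier example.
async function withExtendedLock(redlock, resource, ttl, work) {
  let lock = await redlock.acquire([resource], ttl);

  // Re-extend at roughly half the TTL so there is always validity time left.
  const timer = setInterval(async () => {
    try {
      lock = await lock.extend(ttl);
    } catch (err) {
      // Extension failed: the lock may already be lost, so stop renewing
      // and let the work's own abort/rollback path take over.
      clearInterval(timer);
      console.error('Lock extension failed:', err.message);
    }
  }, ttl / 2);

  try {
    return await work();
  } finally {
    clearInterval(timer);
    await lock.release().catch(() => {}); // the lock may have already expired
  }
}
```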
Operations Checklist
1) Clock synchronization
- Enable NTP/Chrony on all nodes and build a drift monitoring dashboard.
```bash
# Check NTP synchronization state
chronyc sources -v

# Reduce clock drift by adding a close, reliable time source
echo "server time.google.com iburst" >> /etc/chrony.conf
systemctl restart chronyd
```
2) TTL/timeout policies
- Keep connection/command timeouts low and use short, shallow retries for fast failure (a client-configuration sketch follows the TTL example below).
- Derive TTLs from workload profiling, including peak scenarios.
```javascript
const expectedWorkTime = 5000; // 5 seconds
const safetyMargin = 2;
const ttl = expectedWorkTime * safetyMargin; // 10 seconds
```
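For the timeout side of this policy, a sketch of conservative ioredis client settings; the values are illustrative starting points to be tuned per environment:
```javascript
const Redis = require('ioredis');

// Fail fast: an instance that cannot answer quickly should be treated as
// unavailable for this Redlock attempt rather than blocking the quorum.
const client = new Redis({
  host: 'redis-0.default.svc.cluster.local',
  port: 6379,
  connectTimeout: 500,      // ms to establish the TCP connection
  commandTimeout: 300,      // ms per command before it is rejected
  maxRetriesPerRequest: 1,  // shallow retries only
  retryStrategy: times => (times > 3 ? null : 200) // stop reconnecting after 3 tries
});
```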
3) Observability
- Collect and alert on the following metrics (an instrumentation sketch follows the table):
| Metric | Purpose |
|---|---|
| redlock_acquire_success_total | Lock acquisition success trend |
| redlock_acquire_failure_total | Diagnose lock acquisition failures |
| redlock_acquire_duration_seconds | Distribution of acquisition latency |
| redlock_validity_time_remaining | Observe remaining validity time |
| redis_connection_failures_total | Per-instance connection stability |
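One way to emit these metrics from a Node.js client is sketched below with prom-client; instrumentedAcquire is a hypothetical wrapper around the Redlock call:
```javascript
const client = require('prom-client');

const acquireSuccess = new client.Counter({
  name: 'redlock_acquire_success_total',
  help: 'Lock acquisition success trend'
});
const acquireFailure = new client.Counter({
  name: 'redlock_acquire_failure_total',
  help: 'Lock acquisition failures'
});
const acquireDuration = new client.Histogram({
  name: 'redlock_acquire_duration_seconds',
  help: 'Distribution of acquisition latency'
});

// Wrap the acquire call so every attempt is measured and counted.
async function instrumentedAcquire(redlock, resource, ttl) {
  const endTimer = acquireDuration.startTimer();
  try {
    const lock = await redlock.acquire([resource], ttl);
    acquireSuccess.inc();
    return lock;
  } catch (err) {
    acquireFailure.inc();
    throw err;
  } finally {
    endTimer();
  }
}
```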
4) Failure scenarios
- Provide safe abort/rollback paths when lock extension fails (see the sketch after this list).
- After partial success/failure, release on all instances to restore consistency.
- Rehearse network partitions, node failures, and clock skew scenarios in advance.
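As a sketch of the abort/rollback idea: before committing the outcome of a critical section, verify the lock still has validity time left (node-redlock lock objects expose an expiration timestamp in milliseconds); commitWork and rollbackWork are hypothetical placeholders for application logic:
```javascript
// Abort instead of committing when the lock can no longer be trusted.
async function guardedCommit(lock, commitWork, rollbackWork) {
  const remainingMs = lock.expiration - Date.now();
  if (remainingMs <= 0) {
    // The TTL may have expired, so another holder could own the lock now.
    await rollbackWork();
    throw new Error('Lock validity expired before commit; work rolled back');
  }
  await commitWork();
}
```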
Alternatives and Comparison
vs. Database-based locking
| Aspect | Redlock | Database Lock |
|---|---|---|
| Performance | Very fast | Relatively slower |
| Complexity | Medium | Simple |
| Consistency | Best-effort (not strictly guaranteed) | Strong (transactional) |
| Failure recovery | Automatic TTL expiry | Manual/transactional release |
vs. Consul/etcd-based locking
| Aspect | Redlock | Consul/etcd |
|---|---|---|
| Infrastructure complexity | Low | High |
| Performance | Very fast | Fast |
| Consensus | Majority vote | Raft |
| Ecosystem | Redis-centric | Microservices/Orchestration |
When to Use Redlock
Well-suited for
- Real-time systems where very low latency and high throughput matter
- Teams that already operate Redis with low operational overhead
- Simple lock semantics (acquire/extend/release) without stringent consensus requirements
Avoid when
- Domains requiring strict strong consistency (e.g., finance)
- Environments with frequent partitions or poor time synchronization
- Workloads where consensus correctness is paramount (prefer Raft-based approaches)
Conclusion: Safe Redlock on Kubernetes
Redlock is elegant, but correct implementation is everything.
On Kubernetes, a StatefulSet alone is insufficient — you need independent access paths to each Redis instance and separation of failure domains.
Combine that with clock sync, TTL/timeout discipline, observability, and failure drills to achieve operational reliability. Redlock is not a universal answer; depending on requirements, DB- or Consul/etcd-based locking may be a better fit.
References
- Redis Redlock official documentation
- Martin Kleppmann’s critique of Redlock
- Kubernetes StatefulSet guide
- Redis clustering best practices
- Kubernetes Storage Concepts