Implementing Redis Redlock on Kubernetes — A Complete Guide to Distributed Locks

Principles of Redlock, safe implementation patterns on Kubernetes, operational pitfalls, and alternatives


Overview

Concurrency control is one of the trickiest problems in distributed systems.

When multiple instances contend for the same resource, you need a reliable locking mechanism to maintain data consistency and prevent race conditions.

A naive single Redis instance with SET NX is unsafe under network partitions, instance failures, and clock drift. To address these risks, Redis creator Salvatore Sanfilippo proposed the Redlock algorithm.
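For reference, the naive pattern looks like this (a minimal sketch using ioredis; the atomicity of SET with NX and PX is real, but the single node remains a single point of failure):

const Redis = require('ioredis');

// Single-instance lock: SET key token NX PX <ttl> succeeds only if the key
// does not already exist, setting value and expiry atomically.
async function naiveLock(redis, resource, token, ttlMs) {
  const result = await redis.set(resource, token, 'PX', ttlMs, 'NX');
  return result === 'OK';  // 'OK' when acquired, null when already held
}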

This post covers how Redlock works, safe implementation patterns on Kubernetes, must-have operational checks, and how Redlock compares to DB- and Consul/etcd-based locking, all from a production perspective.

Scope

This guide targets teams operating Redis-based distributed locks reliably in on-premises or hybrid Kubernetes environments. It covers concepts, design, and operations, with code sketches where they clarify the mechanics.


What is Redlock?

Redlock is a distributed locking algorithm for Redis. Instead of trusting a single Redis node, a client attempts to acquire the same lock on N independent Redis instances (typically 5) and considers the lock held only if a majority of them grant it within a bounded time.


Key ideas

- One lock key per resource, written with SET NX PX so acquisition and expiry are set atomically.
- A random unique value per acquisition, so only the holder can release the lock.
- A majority quorum (N/2 + 1) across independent instances, rather than one node's opinion.
- A validity time computed as TTL minus elapsed acquisition time minus a clock-drift allowance.


Why multiple instances?

A single instance is a single point of failure: if it crashes, fails over asynchronously, or is partitioned away, the lock's guarantees vanish with it. With a majority quorum over independent instances, the lock survives the loss of a minority, as the sketch below shows.

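The quorum arithmetic in a few lines (illustrative numbers):

// floor(N/2) + 1 grants are required, so N - quorum instances may fail
// simultaneously without blocking lock acquisition.
const N = 5;
const quorum = Math.floor(N / 2) + 1;  // 3
console.log(`quorum=${quorum}, tolerates ${N - quorum} failures`);  // tolerates 2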

How Redlock Works (Deep dive)


1) Acquire

The client attempts the lock on every instance, then checks quorum and remaining validity. A Python-style sketch (release_lock is the all-instances release from step 3):
import time
import uuid

def acquire_lock(redis_instances, resource_name, ttl_ms):
    start_time = time.monotonic()
    successful_locks = 0
    unique_value = str(uuid.uuid4())  # token that identifies this holder

    # Ask every instance for the lock (production clients do this
    # concurrently, with a per-instance timeout far smaller than the TTL)
    for redis_instance in redis_instances:
        try:
            if redis_instance.set(resource_name, unique_value, px=ttl_ms, nx=True):
                successful_locks += 1
        except Exception:
            continue  # an unreachable instance is simply a lost vote

    elapsed_ms = (time.monotonic() - start_time) * 1000
    drift = (ttl_ms * 0.01) + 2  # clock drift allowance from the Redlock spec

    # Success requires a majority AND time left on the lock
    quorum = len(redis_instances) // 2 + 1
    if successful_locks >= quorum:
        validity_time = ttl_ms - elapsed_ms - drift
        if validity_time > 0:
            return True, validity_time

    # Otherwise release on ALL instances (a grant may have arrived late)
    release_lock(redis_instances, resource_name, unique_value)
    return False, 0


2) Success conditions

Acquisition succeeds only if both conditions hold: (a) a majority of instances (N/2 + 1, e.g., 3 of 5) granted the lock, and (b) the remaining validity time, TTL - elapsed - drift, is positive. For example, with ttl = 10000 ms, elapsed = 150 ms, and drift = 10000 * 0.01 + 2 = 102 ms, the lock is valid for another 9748 ms.

3) Release

Release must be atomic: verify the stored value is still our unique token, and delete only if it is. Separate GET and DEL commands could delete a lock another client acquired after our TTL expired, so the check-and-delete runs as one Lua script:

-- Release only if we still hold the lock (atomic check-and-delete)
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
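Invoking the script from Node.js could look like this (a sketch using ioredis; run it against every instance, not only the ones known to have granted the lock):

// Returns 1 if this instance held our lock and deleted it, 0 otherwise.
async function releaseOne(redis, resource, token) {
  const script = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end`;
  return redis.eval(script, 1, resource, token);  // 1 = number of KEYS
}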


Common Kubernetes Pitfalls

Many teams assume that deploying Redis as a StatefulSet with a Headless Service makes Redlock safe by default. It does not, for several reasons:

- A regular (load-balanced) Service hides the individual pods: the client cannot guarantee its N connections land on N distinct instances, so quorum counting breaks, as the sketch after this list shows.
- Without anti-affinity, several Redis pods can be scheduled on the same node, collapsing supposedly independent failure domains into one.
- Pods restart with empty state by default, so a restarted instance can immediately grant a lock that another client still holds elsewhere.
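The first pitfall in code form (an anti-pattern sketch; redis-service stands for a hypothetical load-balanced Service):

const Redis = require('ioredis');
const Redlock = require('redlock');

// WRONG: all three connections go through one load-balanced Service, so
// they may land on the same pod and "3 of 3 votes" can come from one node.
const clients = [0, 1, 2].map(() => new Redis({ host: 'redis-service', port: 6379 }));
const redlock = new Redlock(clients);  // the quorum math is meaningless here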


Correct Implementation Patterns (Kubernetes)


Method 1 — Per-pod Services

Give each Redis pod its own Service, so clients can address every instance independently:
apiVersion: v1
kind: Service
metadata:
  name: redis-0
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-0
  ports:
    - port: 6379
      targetPort: 6379

---
apiVersion: v1
kind: Service
metadata:
  name: redis-1
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-1
  ports:
    - port: 6379
      targetPort: 6379

---
apiVersion: v1
kind: Service
metadata:
  name: redis-2
spec:
  selector:
    app: redis-cluster
    statefulset.kubernetes.io/pod-name: redis-cluster-2
  ports:
    - port: 6379
      targetPort: 6379


Method 2 — Pod anti-affinity

Spread the Redis pods across nodes, so a single node failure costs at most one quorum vote:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - redis-cluster
            topologyKey: kubernetes.io/hostname


Method 3 — DNS-based dynamic discovery

Resolve the Service name to the current pod IPs and build one client per address:
const dns = require('dns').promises;
const Redis = require('ioredis');
const Redlock = require('redlock');

// Resolve the headless Service to the current set of pod IPs.
// Caveat: pod IPs change on restarts, so re-resolve and rebuild the
// client set rather than caching it forever.
async function getRedisInstances() {
  const addresses = await dns.lookup('redis-service', { all: true });
  return addresses.map(addr => ({
    host: addr.address,
    port: 6379
  }));
}

async function createRedlock() {
  const instances = await getRedisInstances();
  return new Redlock(instances.map(config => new Redis(config)));
}


Real-world Implementation Patterns


Application layer (e.g., Node.js, Java)

Node.js

const Redis = require('ioredis');
const Redlock = require('redlock');

class RedlockService {
  constructor() {
    // Connections using per-pod Services
    this.redisClients = [
      'redis-0.default.svc.cluster.local',
      'redis-1.default.svc.cluster.local',
      'redis-2.default.svc.cluster.local'
    ].map(host => new Redis({
      host,
      port: 6379,
      maxRetriesPerRequest: 3  // fail fast; Redlock treats errors as lost votes
    }));

    this.redlock = new Redlock(this.redisClients, {
      retryCount: 3,     // retry a contended acquisition 3 times
      retryDelay: 200,   // ms between retries
      retryJitter: 200,  // randomization to avoid thundering herds
      driftFactor: 0.01  // drift allowance: driftFactor * ttl + 2 ms
    });

    this.setupEventHandlers();
  }

  setupEventHandlers() {
    this.redisClients.forEach((client, index) => {
      client.on('connect', () => {
        console.log(`Redis-${index} connected`);
      });
      
      client.on('error', (err) => {
        console.error(`Redis-${index} error:`, err.message);
      });
    });

    this.redlock.on('error', (err) => {
      console.error('Redlock error:', err);
    });
  }

  async acquireLock(resource, ttl = 10000) {
    try {
      console.log(`Attempting to acquire lock: ${resource}`);
      const lock = await this.redlock.acquire([resource], ttl);
      console.log(`Lock acquired: ${resource}`);
      return lock;
    } catch (error) {
      console.error(`Failed to acquire lock: ${resource}`, error.message);
      throw error;
    }
  }

  async releaseLock(lock) {
    try {
      await lock.release();
      console.log(`Lock released: ${lock.resources}`);
    } catch (error) {
      console.error(`Failed to release lock:`, error.message);
      throw error;
    }
  }
}

// Usage example
async function criticalSection() {
  const redlockService = new RedlockService();
  
  try {
    const lock = await redlockService.acquireLock('user-action-123', 15000);
    
    try {
      // Work performed in critical section
      console.log('Performing critical work...');
      await performCriticalOperation();
      console.log('Work completed');
    } finally {
      await redlockService.releaseLock(lock);
    }
  } catch (error) {
    console.error('Work failed:', error.message);
  }
}

Java (Redisson)

import java.util.List;
import java.util.concurrent.TimeUnit;

import org.redisson.Redisson;
import org.redisson.RedissonRedLock;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;
import org.springframework.stereotype.Service;

import lombok.extern.slf4j.Slf4j;

@Slf4j
@Service
public class RedlockService {

    // One client per independent Redis instance. A replicated/cluster config
    // would treat the nodes as a single deployment and defeat the quorum.
    private final List<RedissonClient> clients;

    public RedlockService() {
        this.clients = List.of(
            createClient("redis://redis-0.default.svc.cluster.local:6379"),
            createClient("redis://redis-1.default.svc.cluster.local:6379"),
            createClient("redis://redis-2.default.svc.cluster.local:6379"));
    }

    private RedissonClient createClient(String address) {
        Config config = new Config();
        config.useSingleServer().setAddress(address);
        return Redisson.create(config);
    }

    public boolean executeWithLock(String lockName, int ttlSeconds, Runnable task) {
        // Combine one RLock per instance into a majority-based RedLock
        RLock lock = new RedissonRedLock(
            clients.get(0).getLock(lockName),
            clients.get(1).getLock(lockName),
            clients.get(2).getLock(lockName));

        try {
            // Wait up to 10s for the lock; it auto-expires after ttlSeconds
            if (lock.tryLock(10, ttlSeconds, TimeUnit.SECONDS)) {
                try {
                    log.info("Lock acquired: {}", lockName);
                    task.run();
                    return true;
                } finally {
                    lock.unlock();
                    log.info("Lock released: {}", lockName);
                }
            } else {
                log.warn("Failed to acquire lock: {}", lockName);
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Interrupted while acquiring lock: {}", lockName, e);
            return false;
        }
    }
}


Lock lifecycle

Acquire (quorum vote) -> work inside the validity window -> extend before the window closes if the task legitimately needs more time -> release on every instance. Code must never keep working past the validity window without a successful extension.
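With the redlock library (assuming its v5 API), the using helper drives this lifecycle and extends the lock automatically while the routine runs; doStepOne and doStepTwo are placeholders:

async function processUserAction() {
  // If extension ever fails, the signal aborts and the work must stop.
  await redlock.using(['user-action-123'], 15000, async (signal) => {
    await doStepOne();
    if (signal.aborted) throw signal.error;  // lock lost: abort immediately
    await doStepTwo();
  });
}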

Operations Checklist


1) Clock synchronization

# Check NTP synchronization state
chronyc sources -v

# Reduce clock drift
echo "server time.google.com iburst" >> /etc/chrony.conf
systemctl restart chronyd


2) TTL/timeout policies

// TTL must comfortably exceed the worst-case critical-section duration
const expectedWorkTime = 5000;  // 5 seconds of expected work
const safetyMargin = 2;         // headroom for GC pauses, slow I/O, retries
const ttl = expectedWorkTime * safetyMargin;  // 10 seconds


3) Observability

Metric                              Purpose
redlock_acquire_success_total       Lock acquisition success trend
redlock_acquire_failure_total       Diagnose lock acquisition failures
redlock_acquire_duration_seconds    Distribution of acquisition latency
redlock_validity_time_remaining     Observe remaining validity time
redis_connection_failures_total     Per-instance connection stability
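Wiring up two of these metrics might look like this (a sketch using the prom-client package):

const client = require('prom-client');

const acquireSuccess = new client.Counter({
  name: 'redlock_acquire_success_total',
  help: 'Lock acquisition success trend'
});
const acquireDuration = new client.Histogram({
  name: 'redlock_acquire_duration_seconds',
  help: 'Distribution of acquisition latency'
});

async function acquireInstrumented(redlock, resource, ttl) {
  const stopTimer = acquireDuration.startTimer();  // measures seconds
  try {
    const lock = await redlock.acquire([resource], ttl);
    acquireSuccess.inc();
    return lock;
  } finally {
    stopTimer();
  }
}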


4) Failure scenarios

Drill these regularly, not just on paper (see the sketch below):

- Single instance down: acquisition must still succeed via the remaining majority.
- Network partition: a client cut off from the majority must fail fast, not spin.
- Clock jump (NTP step, paused VM): the validity-time math degrades; alert on drift.
- Pod eviction or node drain while a lock is held: work must stop at the validity boundary.
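A rough drill for the first scenario (a sketch against the RedlockService above; ioredis's disconnect stands in for a real outage):

// With one of three instances down, 2/3 still forms a majority.
async function drillSingleInstanceFailure(redlockService) {
  redlockService.redisClients[2].disconnect();  // simulate one node outage

  const lock = await redlockService.acquireLock('drill-resource', 5000);
  console.log('Quorum held with one instance down');
  await redlockService.releaseLock(lock);

  redlockService.redisClients[2].connect();  // restore for the next drill
}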

Alternatives and Comparison


vs. Database-based locking

Aspect             Redlock                 Database Lock
Performance        Very fast               Relatively slower
Complexity         Medium                  Simple
Consistency        Eventual                Strong (transactional)
Failure recovery   Automatic TTL expiry    Manual/transactional release
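For scale, a database-based lock can be as small as a PostgreSQL advisory lock (a sketch using the pg package; the key 123 is arbitrary):

const { Client } = require('pg');

async function withAdvisoryLock(connectionString, work) {
  const db = new Client({ connectionString });
  await db.connect();
  try {
    // Session-level try-lock: exactly one session gets true at a time
    const { rows } = await db.query('SELECT pg_try_advisory_lock(123) AS ok');
    if (!rows[0].ok) return false;
    try {
      await work();
      return true;
    } finally {
      await db.query('SELECT pg_advisory_unlock(123)');
    }
  } finally {
    await db.end();
  }
}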


vs. Consul/etcd-based locking

Aspect                      Redlock          Consul/etcd
Infrastructure complexity   Low              High
Performance                 Very fast        Fast
Consensus                   Majority vote    Raft
Ecosystem                   Redis-centric    Microservices/Orchestration
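And the etcd equivalent (a sketch assuming the etcd3 npm client; host and key are illustrative):

const { Etcd3 } = require('etcd3');

const etcd = new Etcd3({ hosts: 'etcd.default.svc.cluster.local:2379' });

// The lock rides on a 5-second lease that is kept alive while the callback
// runs; if the client dies, the lease expires and the lock frees itself.
async function criticalWithEtcd() {
  await etcd.lock('user-action-123').ttl(5).do(async () => {
    // critical section, protected by Raft-backed consensus
  });
}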


When to Use Redlock


Well-suited for

- Efficiency locks: preventing duplicate work (double sends, repeated batch runs) where a rare overlap is annoying but tolerable.
- Short critical sections that fit comfortably inside a TTL.
- Stacks that already run Redis and need low-latency, high-throughput locking.


Avoid when

- Correctness depends on mutual exclusion under every failure mode; pair the lock with fencing tokens or use a consensus store instead.
- Critical sections are long-running or unbounded, forcing impractically large TTLs.
- Clock discipline is weak (unsynchronized NTP, frequently paused VMs), which undermines the validity-time math.

Conclusion: Safe Redlock on Kubernetes

Redlock is elegant, but correct implementation is everything.

On Kubernetes, a StatefulSet alone is insufficient — you need independent access paths to each Redis instance and separation of failure domains.

Combine that with clock sync, TTL/timeout discipline, observability, and failure drills to achieve operational reliability. Redlock is not a universal answer; depending on requirements, DB- or Consul/etcd-based locking may be a better fit.



References

- Distributed Locks with Redis (Redlock): https://redis.io/topics/distlock
- Martin Kleppmann, "How to do distributed locking": https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
- Salvatore Sanfilippo, "Is Redlock safe?": http://antirez.com/news/101