Advanced Ceph Erasure Code Configuration and Performance Optimization

Complete guide to implementing Erasure Code in Ceph clusters for storage efficiency and data protection


Overview

In large-scale storage systems, achieving both reliability and efficiency simultaneously presents significant challenges. Ceph addresses this through two distinct data protection methods: Replica and Erasure Code.

This article focuses primarily on Erasure Code configuration and practical implementation, highlighting its superior storage efficiency. We’ll explore the fundamental differences between Replica and Erasure Code approaches, provide step-by-step configuration guides, and address common troubleshooting scenarios.

Erasure Code offers substantial storage savings while maintaining high availability, making it ideal for environments where storage cost optimization is critical without compromising data durability.



What is Erasure Code?

Erasure Code is a data protection method that divides data into k data chunks and computes m additional coding (parity) chunks, storing the complete dataset across N = k + m chunks.

This approach enables data recovery even when some blocks are lost, providing resilience through mathematical redundancy rather than simple replication.


Key Characteristics:

  • Data is split into k equally sized data chunks
  • m additional coding (parity) chunks are computed from the data chunks
  • Any k of the k + m chunks are sufficient to reconstruct the original data, so up to m chunks can be lost
  • Usable capacity is k / (k + m) of the raw capacity

Example Configuration:

# k=3, m=2 configuration
# - 3 data chunks + 2 parity chunks = 5 total chunks
# - Can survive up to 2 chunk failures
# - Storage efficiency: 3/5 = 60% (vs 33% for 3-way replication)



Replica vs Erasure Code Comparison

Aspect               | Replica Method               | Erasure Code Method
---------------------|------------------------------|------------------------------
Data Protection      | Identical data copies        | Data segmentation + parity
Storage Efficiency   | Very low (33% for 3-replica) | High (60% for k=3, m=2)
Recovery Speed       | Fast (simple copy)           | Slower (parity calculation)
CPU/Compute Load     | Low                          | High (parity operations)
Network Overhead     | High (full data copies)      | Lower (distributed chunks)
Suitable Environment | Fast recovery is critical    | Storage space optimization
Minimum OSDs         | Equal to replica count       | k + m
Failure Tolerance    | n - 1 failures (n replicas)  | m failures
Comparison between Replica and Erasure Code strategies



Infrastructure Requirements


Minimum Cluster Configuration

Component       | Requirement            | Recommendation
----------------|------------------------|-----------------------------
OSDs            | k + m (minimum)        | k + m + 2 (for maintenance)
Memory          | 4 GB per OSD           | 8 GB per OSD
Network         | 1 Gbps                 | 10 Gbps
CPU             | 1 core per OSD         | 2 cores per OSD
Failure Domains | k + m distinct domains | Well-distributed topology
Minimum hardware and topology recommendations


Example Cluster Topology

# 3-node cluster example for k=3, m=2
Node 1: 2 OSDs (rack-1)
Node 2: 2 OSDs (rack-2)  
Node 3: 1 OSD (rack-3)
Total: 5 OSDs across 3 failure domains
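
If the hosts are not already grouped into separate racks in the CRUSH map, rack buckets can be created and the hosts moved under them so that each chunk lands in a distinct failure domain. The rack and host names below are placeholders matching the example topology, not output from a real cluster:

# Create rack buckets under the default root
ceph osd crush add-bucket rack-1 rack
ceph osd crush add-bucket rack-2 rack
ceph osd crush add-bucket rack-3 rack
ceph osd crush move rack-1 root=default
ceph osd crush move rack-2 root=default
ceph osd crush move rack-3 root=default

# Move each host (and its OSDs) under its rack
ceph osd crush move node1 rack=rack-1
ceph osd crush move node2 rack=rack-2
ceph osd crush move node3 rack=rack-3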



Erasure Code Profile Configuration


Step 1: Verify Cluster Status

# Check Ceph version
ceph --version

# Verify cluster health
ceph -s

# Check OSD status
ceph osd tree

# Verify available failure domains
ceph osd crush tree


Step 2: Create Erasure Code Profile


Profile Parameters Explanation:

# Core Parameters
k=3                        # Number of data chunks
m=2                        # Number of parity chunks
plugin=jerasure           # Erasure coding algorithm

# Advanced Parameters
technique=reed_sol_van    # Specific technique within plugin
crush-failure-domain=host # Failure domain for chunk placement
crush-device-class=ssd    # Restrict to specific device class
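
The parameters above are passed to ceph osd erasure-code-profile set. A command along the following lines creates the ec-profile-advanced profile referenced in the rest of this guide (adjust the values to your topology):

# Create the erasure code profile used in later steps
ceph osd erasure-code-profile set ec-profile-advanced \
  k=3 m=2 \
  plugin=jerasure \
  technique=reed_sol_van \
  crush-failure-domain=host \
  crush-device-class=ssd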


Step 3: Verify Profile Creation

# List all erasure code profiles
ceph osd erasure-code-profile ls

# Display profile details
ceph osd erasure-code-profile get ec-profile-advanced

# Expected output:
# crush-device-class=ssd
# crush-failure-domain=host
# k=3
# m=2
# plugin=jerasure
# technique=reed_sol_van



Pool Creation and Management


Step 1: Calculate Placement Groups (PGs)

# Formula: (OSDs * 100) / (k + m) / pool_count
# Example: (12 OSDs * 100) / (3 + 2) / 2 pools = 120 PGs
# Round up to the nearest power of 2: 128 PGs

# For production environments, use Ceph PG calculator:
# https://ceph.io/pgcalc/
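
The same calculation can be scripted against a live cluster. This is a hypothetical helper (not a Ceph tool) that assumes the k=3, m=2 profile and the two pools from the example above:

# Derive a suggested pg_num from the live OSD count (assumes k+m=5 and 2 pools)
OSDS=$(ceph osd ls | wc -l)
TARGET=$(( OSDS * 100 / 5 / 2 ))

# Round up to the next power of two
PG_NUM=1
while [ "$PG_NUM" -lt "$TARGET" ]; do PG_NUM=$(( PG_NUM * 2 )); done
echo "Suggested pg_num: $PG_NUM"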


Step 2: Create Erasure Code Pool

# Create pool with calculated PGs
ceph osd pool create ec-pool-storage 128 erasure ec-profile-advanced

# Verify pool creation
ceph osd pool ls detail | grep ec-pool-storage

# Check pool status
ceph osd pool get ec-pool-storage all


Step 3: Configure Pool for Applications

# The options below are alternatives - enable only the application the pool will actually serve

# Enable for RGW (Object Storage)
ceph osd pool application enable ec-pool-storage rgw

# Enable for RBD (Block Storage) - requires metadata pool
ceph osd pool create ec-pool-metadata 32 replicated
ceph osd pool application enable ec-pool-metadata rbd
ceph osd pool application enable ec-pool-storage rbd

# Enable for CephFS (File Storage)
ceph osd pool application enable ec-pool-storage cephfs
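
Note that RBD and CephFS can only store data on an erasure-coded pool when partial overwrites are allowed, which requires BlueStore OSDs. If the pool will back block or file storage, enable that flag as well:

# Required for RBD and CephFS data on erasure-coded pools (BlueStore only)
ceph osd pool set ec-pool-storage allow_ec_overwrites true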



Performance Optimization


Step 1: CRUSH Map Optimization

# Extract current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Edit crushmap.txt to optimize placement rules
# Add custom rules for erasure code pools

# Compile and inject updated CRUSH map
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin

Custom CRUSH Rule Example:

# Custom rule in crushmap.txt
rule ec_ssd_rule {
    id 1
    type erasure
    min_size 3
    max_size 6
    step take default class ssd
    # use indep (not firstn) when selecting OSDs for erasure-coded pools
    step chooseleaf indep 0 type host
    step emit
}
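
The compiled map can also be sanity-checked with crushtool's test mode before injecting it. The sketch below assumes the rule id 1 defined above and a k=3, m=2 pool (5 chunks per object):

# Simulate placements for rule id 1 with 5 chunks per object
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 5 --show-mappings | head -20

# Summary statistics - every input should reach the full result size of 5
crushtool -i crushmap-new.bin --test --rule 1 --num-rep 5 --show-statistics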


Step 2: Configure Pool Parameters

# Set optimal parameters for erasure code pool
ceph osd pool set ec-pool-storage pg_num 128
ceph osd pool set ec-pool-storage pgp_num 128

# Optimize for read performance
ceph osd pool set ec-pool-storage fast_read true

# Configure compression (optional)
ceph osd pool set ec-pool-storage compression_algorithm zstd
ceph osd pool set ec-pool-storage compression_mode force


Step 3: Monitor Performance Metrics

# Check pool statistics
ceph df detail

# Monitor I/O performance
ceph osd perf

# Check placement group distribution for the pool
ceph pg ls-by-pool ec-pool-storage | head -20

# Monitor erasure code operations
ceph daemon osd.0 perf dump | grep erasure



RBD Integration with Erasure Code


Step 1: Create Metadata and Data Pools

# Create replicated metadata pool
ceph osd pool create rbd-metadata 32 replicated

# Create erasure code data pool
ceph osd pool create rbd-ec-data 128 erasure ec-profile-advanced

# Enable RBD application
ceph osd pool application enable rbd-metadata rbd
ceph osd pool application enable rbd-ec-data rbd


Step 2: Configure RBD with Erasure Code Backend

# Create RBD image with erasure code data pool
rbd create --size 10G --data-pool rbd-ec-data rbd-metadata/test-image

# Verify image configuration
rbd info rbd-metadata/test-image

# Expected output:
# rbd image 'test-image':
#     size 10 GiB in 2560 objects
#     order 22 (4 MiB objects)
#     data_pool: rbd-ec-data


Step 3: Kubernetes StorageClass Configuration

# rbd-ec-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-erasure-code
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: "b9127830-b0cc-4e34-aa47-9d1a2e9949a8"
  pool: "rbd-metadata"
  dataPool: "rbd-ec-data"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: kube-system
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
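
Once the StorageClass is applied, any PVC that references it is provisioned with its data on the erasure-coded pool. A minimal test claim might look like this (the claim name is illustrative):

# Apply the StorageClass, then create a test claim against it
kubectl apply -f rbd-ec-storageclass.yaml

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-ec-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd-erasure-code
  resources:
    requests:
      storage: 5Gi
EOF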



Troubleshooting Common Issues


Issue 1: Degraded Data Redundancy

Symptoms:

HEALTH_WARN
Degraded data redundancy: 122 pgs undersized

Root Causes:

  • Fewer OSDs or failure domains than the k + m chunks the profile requires
  • OSDs marked down or out, leaving PGs without enough placement targets
  • A CRUSH rule that cannot place every chunk in a distinct failure domain

Resolution Steps:

# Check OSD distribution
ceph osd tree

# Verify failure domain configuration
ceph osd crush tree

# Add more OSDs if needed
ceph-volume lvm create --data /dev/sdX

# Check PG mapping
ceph pg ls-by-pool ec-pool-storage | head -10

# Force PG repair if necessary
ceph pg repair <pg_id>


Issue 2: Pool Application Not Enabled

Symptoms:

HEALTH_WARN
1 pool(s) do not have an application enabled

Resolution:

# Enable appropriate application
ceph osd pool application enable ec-pool-storage rgw

# Verify application status
ceph osd pool application get ec-pool-storage


Issue 3: Slow Recovery Performance

Symptoms:

  • Recovery or backfill after an OSD failure progresses very slowly
  • ceph -s reports low recovery throughput for extended periods
  • Client I/O latency increases while recovery is running

Optimization Steps:

# Raise recovery throughput (higher values speed up recovery
# at the cost of additional load on client I/O)
ceph config set osd osd_recovery_max_active 5
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_sleep 0

# Monitor recovery progress
ceph -s | grep recovery

# Check individual OSD performance
ceph daemon osd.0 perf dump | grep recovery


Issue 4: CRUSH Rule Conflicts

Symptoms:

HEALTH_ERR
Pool 'ec-pool-storage' has no available OSDs for pg X.Y

Resolution:

# Check current CRUSH rules
ceph osd crush rule ls
ceph osd crush rule dump

# Verify pool's CRUSH rule
ceph osd pool get ec-pool-storage crush_rule

# Create appropriate CRUSH rule
ceph osd crush rule create-erasure ec-rule ec-profile-advanced

# Apply rule to pool
ceph osd pool set ec-pool-storage crush_rule ec-rule



Monitoring and Maintenance


Step 1: Setup Monitoring Scripts
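
The cron job in the next step expects a script at /usr/local/bin/ceph-ec-monitor.sh. A minimal sketch of what such a script might contain is shown below; the checks are illustrative and should be adapted to your pools and alerting setup:

#!/bin/bash
# /usr/local/bin/ceph-ec-monitor.sh - hypothetical example, adjust pool names as needed
set -uo pipefail

echo "=== $(date -Is) ==="

# Overall cluster health summary
ceph health

# Capacity and usage per pool (includes erasure code pools)
ceph df

# Placement group state summary; anything other than active+clean deserves attention
ceph pg stat

# Erasure code pool settings worth tracking over time
ceph osd pool get ec-pool-storage all | grep -E 'erasure_code_profile|pg_num|size'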


Step 2: Automated Health Checks

# Create cron job for regular monitoring
cat > /etc/cron.d/ceph-monitoring << 'EOF'
# Monitor Ceph erasure code pools every 15 minutes
*/15 * * * * root /usr/local/bin/ceph-ec-monitor.sh >> /var/log/ceph-monitoring.log 2>&1
EOF


Step 3: Performance Baselines

# Establish performance baselines
echo "=== Storage Efficiency Comparison ===" > /var/log/ceph-baseline.log

# Calculate raw storage utilization from the TOTAL line of `ceph df`
# (column positions can vary slightly between Ceph releases)
TOTAL_RAW=$(ceph df | grep TOTAL | awk '{print $2}')
TOTAL_USED=$(ceph df | grep TOTAL | awk '{print $4}')
EFFICIENCY=$(echo "scale=2; $TOTAL_USED / $TOTAL_RAW * 100" | bc)

echo "Raw Storage: $TOTAL_RAW" >> /var/log/ceph-baseline.log
echo "Used Storage: $TOTAL_USED" >> /var/log/ceph-baseline.log
echo "Efficiency: $EFFICIENCY%" >> /var/log/ceph-baseline.log

# I/O performance baseline (keep the benchmark objects so the read test has data,
# then clean them up afterwards)
rados bench -p ec-pool-storage 30 write --no-cleanup > /var/log/ceph-write-baseline.log
rados bench -p ec-pool-storage 30 seq > /var/log/ceph-read-baseline.log
rados -p ec-pool-storage cleanup



Best Practices and Recommendations


Production Deployment Guidelines

  1. Capacity Planning
    • Plan for k+m+2 OSDs minimum for maintenance windows
    • Consider network bandwidth for erasure code operations
    • Account for increased CPU requirements
  2. Failure Domain Design
    • Distribute OSDs across physical racks or availability zones
    • Ensure each failure domain can handle chunk placement
    • Plan for correlated failures (power, network)
  3. Performance Optimization
    • Use fast SSDs for metadata pools
    • Optimize CRUSH rules for data locality
    • Monitor and tune recovery parameters


When to Choose Erasure Code

Scenario            | Recommendation  | Rationale
--------------------|-----------------|------------------------------------
Cold Storage        | Erasure Code    | Storage efficiency priority
Backup Systems      | Erasure Code    | Cost optimization critical
Archive Data        | Erasure Code    | Infrequent access patterns
Hot Database        | Replica         | Performance critical
Real-time Analytics | Replica         | Low latency required
Mixed Workloads     | Hybrid approach | Balance efficiency and performance
Guidelines for selecting Replica vs Erasure Code


Configuration Templates
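
A consolidated template is sketched below. It simply re-uses the profile, pool names, and values from the examples in this article and should be adapted (k/m, device class, PG counts, application) to the target cluster:

# 1. Erasure code profile
ceph osd erasure-code-profile set ec-profile-advanced \
  k=3 m=2 plugin=jerasure technique=reed_sol_van \
  crush-failure-domain=host crush-device-class=ssd

# 2. Erasure-coded data pool and replicated metadata pool
ceph osd pool create ec-pool-storage 128 erasure ec-profile-advanced
ceph osd pool create rbd-metadata 32 replicated

# 3. Allow partial overwrites so RBD/CephFS can use the EC pool (BlueStore only)
ceph osd pool set ec-pool-storage allow_ec_overwrites true

# 4. Tag each pool with the application it serves
ceph osd pool application enable ec-pool-storage rbd
ceph osd pool application enable rbd-metadata rbd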



Conclusion

Erasure Code represents a significant advancement in storage efficiency for Ceph clusters, offering substantial cost savings while maintaining high durability.

The configuration complexity is offset by dramatic improvements in storage utilization, making it essential for large-scale deployments.


Key Achievements:

  1. Storage Efficiency: 60-80% usable-to-raw ratio (depending on k and m) vs 33% with 3-way replication
  2. Scalability: Better resource utilization for large datasets
  3. Flexibility: Multiple profiles for different use cases
  4. Integration: Seamless Kubernetes and application integration


Operational Benefits:

  • Lower cost per usable terabyte than 3-way replication
  • Durability preserved: data survives up to m simultaneous chunk failures
  • Usable by RGW, RBD (with a replicated metadata pool), and CephFS workloads

Future Considerations:

Mastering Erasure Code configuration enables organizations to build cost-effective, highly durable storage systems that scale efficiently with growing data requirements.

“Erasure Code transforms storage economics by delivering enterprise-grade durability at a fraction of traditional replication costs.”


