Deploying and Operating Rook-Ceph on Kubernetes: Complete Implementation Guide

A comprehensive guide to deploying Ceph storage on Kubernetes using Rook operator with Helm charts and operational best practices




Overview

In this article, we’ll cover the complete process of deploying and operating Ceph storage in a Kubernetes cluster using Rook-Ceph.

Rook is an open-source orchestration tool that manages Ceph in a Kubernetes-native way, abstracting complex distributed storage systems into Kubernetes resources for easy installation and operation. It helps simplify the deployment and management of sophisticated storage infrastructure.

In this implementation, we built a Kubernetes cluster with Kubespray on Google Cloud VMs, then installed the Rook operator and a Rook-Ceph cluster using Helm charts. We configured dedicated storage nodes for the Ceph OSDs and customized Rook's values.yaml to set up the MON/MGR/OSD/MDS/Object Store components.

Subsequently, we configured an RBD-based StorageClass so that Pods can consume Ceph block storage, and finally enabled the Ceph Dashboard for web-based cluster monitoring.



What is Cephadm?

Cephadm is Ceph’s latest deployment and management tool, introduced starting with the Ceph Octopus release.

It’s designed to simplify deploying, configuring, managing, and scaling Ceph clusters. It can bootstrap a cluster with a single command and deploys Ceph services using container technology.

Cephadm doesn’t rely on external configuration tools like Ansible, Rook, or Salt. However, these external configuration tools can be used to automate tasks not performed by cephadm itself.
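For reference, bootstrapping a standalone cluster with cephadm really is a single command; a minimal sketch, assuming the cephadm binary is already installed on the host and using a placeholder monitor IP:

# Bootstrap a single-node Ceph cluster (placeholder monitor IP)
sudo cephadm bootstrap --mon-ip 10.77.101.18

# Inspect the containerized daemons cephadm deployed
sudo cephadm shell -- ceph -s

This guide uses Rook instead, so cephadm is shown only for comparison.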



What is Rook-Ceph?

Ceph is a scalable distributed storage solution for block storage, object storage, and shared file systems, proven through years of production deployments.

Rook is an open-source orchestration framework for deploying, operating, and managing distributed storage systems on Kubernetes. It focuses on automatically managing Ceph within Kubernetes clusters and integrating storage solutions into cloud-native environments.

Using Rook, you can easily deploy and manage complex distributed storage systems like Ceph while leveraging Kubernetes’ automation and management capabilities.



Kubernetes Installation

Refer to the linked article for Kubernetes installation.

Important note: Storage nodes must have at least 32GB of memory.


Infrastructure Requirements

Master Node (Control Plane)

Component     IP             CPU   Memory
test-server   10.77.101.18   16    32G
Master (control plane) node specification

Worker Nodes

Component             IP             CPU   Memory
test-server-agent     10.77.101.12   16    32G
test-server-storage   10.77.101.16   16    32G
Worker node specifications


Additional Storage Node Configuration

The storage node needs additional configuration so that it provides dedicated data disks for the Ceph OSDs (the sdb, sdc, and sdd devices used later in this guide):
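The exact snippet depends on how the VMs were provisioned. As a hedged sketch, assuming the nodes are managed with gcloud, the three 50GB data disks that later show up as sdb, sdc, and sdd could be created and attached like this (disk names and zone are placeholders):

# Create and attach three 50GB data disks to the storage node (placeholder names and zone)
for disk in data1 data2 data3; do
  gcloud compute disks create test-server-storage-${disk} --size=50GB --zone=asia-northeast3-a
  gcloud compute instances attach-disk test-server-storage \
    --disk=test-server-storage-${disk} --zone=asia-northeast3-a
done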


Verify Installation

kubectl get nodes
NAME                  STATUS   ROLES           AGE   VERSION
test-server           Ready    control-plane   55s   v1.29.1
test-server-agent     Ready    <none>          55s   v1.29.1
test-server-storage   Ready    <none>          55s   v1.29.1



Rook-Ceph Operator Installation


Clone Repository and Prepare Values

git clone https://github.com/rook/rook.git
cd ~/rook/deploy/charts/rook-ceph
cp values.yaml somaz-values.yaml


Install Rook Operator using Helm

# Add the Rook Helm repository (optional when installing from the local chart checkout, as below)
helm repo add rook-release https://charts.rook.io/release
helm repo update

# Install using local git repository
helm install --create-namespace --namespace rook-ceph rook-ceph . -f somaz-values.yaml

# Upgrade (if needed)
helm upgrade --create-namespace --namespace rook-ceph rook-ceph . -f somaz-values.yaml


Verify Operator Installation

# Check Custom Resource Definitions
kubectl get crd | grep ceph

# Verify operator pod
kubectl get pods -n rook-ceph
NAME                                 READY   STATUS    RESTARTS   AGE
rook-ceph-operator-799cf9f45-946r6   1/1     Running   0          28s



Rook-Ceph Cluster Installation


Prepare Storage Disks

First, verify and initialize disks on the test-server-storage node:

# Check available disks
lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0     7:0    0  63.9M  1 loop /snap/core20/2105
loop1     7:1    0 368.2M  1 loop /snap/google-cloud-cli/207
loop2     7:2    0  40.4M  1 loop /snap/snapd/20671
loop3     7:3    0  91.9M  1 loop /snap/lxd/24061
sda       8:0    0    50G  0 disk
├─sda1    8:1    0  49.9G  0 part /
├─sda14   8:14   0     4M  0 part
└─sda15   8:15   0   106M  0 part /boot/efi
sdb       8:16   0    50G  0 disk
sdc       8:32   0    50G  0 disk
sdd       8:48   0    50G  0 disk

# Initialize disks
sudo wipefs -a /dev/sdb
sudo wipefs -a /dev/sdc
sudo wipefs -a /dev/sdd
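If the disks were previously used by Ceph or LVM, wipefs alone may not be enough. A commonly used extra step is to zap the partition table and overwrite the first blocks as well (this destroys all data on the device):

# Optional: for disks that still carry old Ceph/LVM metadata (repeat for sdc and sdd)
sudo sgdisk --zap-all /dev/sdb
sudo dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct,dsync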


Configure Cluster Values

cd ~/rook/deploy/charts/rook-ceph-cluster
cp values.yaml somaz-values.yaml


Edit the somaz-values.yaml file with the following key configurations:

configOverride: |
  [global]
  mon_allow_pool_delete = true
  osd_pool_default_size = 1
  osd_pool_default_min_size = 1

# Enable debugging toolbox
toolbox:
  enabled: true

cephClusterSpec:
  mon:
    count: 1
    allowMultiplePerNode: false

  mgr:
    count: 1
    allowMultiplePerNode: false
    modules:
      - name: pg_autoscaler
        enabled: true

  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "test-server-storage"
        devices:
          - name: "sdb"
          - name: "sdc"
          - name: "sdd"

cephBlockPools:
  - name: ceph-blockpool
    spec:
      failureDomain: host
      replicated:
        size: 1

cephFileSystems:
  - name: ceph-filesystem
    spec:
      metadataPool:
        replicated:
          size: 1
      dataPools:
        - failureDomain: host
          replicated:
            size: 1
    storageClass:
      enabled: false
      isDefault: false
      name: ceph-filesystem

cephObjectStores:
  - name: ceph-objectstore
    spec:
      metadataPool:
        failureDomain: host
        replicated:
          size: 1
      dataPool:
        failureDomain: host
        erasureCoded:
          dataChunks: 2
          codingChunks: 1
      preservePoolsOnDelete: true
      gateway:
        port: 80
        instances: 1
        priorityClassName: system-cluster-critical
    storageClass:
      enabled: false
      name: ceph-bucket
      reclaimPolicy: Delete
      volumeBindingMode: "Immediate"
      parameters:
        region: us-east-1


Install Ceph Cluster

# Install Ceph cluster using Helm
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph . -f somaz-values.yaml

# Upgrade (if needed)
helm upgrade --create-namespace --namespace rook-ceph rook-ceph-cluster --set operatorNamespace=rook-ceph . -f somaz-values.yaml


Troubleshooting Installation Issues

If installation fails, clean up and restart:

# Delete Ceph cluster
helm delete -n rook-ceph rook-ceph-cluster

# Delete resources
kubectl delete cephblockpool -n rook-ceph ceph-blockpool
kubectl delete cephobjectstore -n rook-ceph ceph-objectstore
kubectl delete cephfilesystem -n rook-ceph ceph-filesystem
kubectl delete cephfilesystemsubvolumegroups.ceph.rook.io -n rook-ceph ceph-filesystem-csi

# Delete Ceph CRDs
kubectl get crd -o name | grep 'ceph.' | xargs kubectl delete
kubectl get crd -o name | grep 'objectbucket.' | xargs kubectl delete

# Delete operator
helm delete -n rook-ceph rook-ceph

# Verify cleanup
kubectl get all -n rook-ceph

# Reinitialize disks
sudo wipefs -a /dev/sdb
sudo wipefs -a /dev/sdc
sudo wipefs -a /dev/sdd
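# Also clear Rook's state directory on every storage node (assumes the default
# dataDirHostPath of /var/lib/rook); leftover state here can block a clean reinstall
sudo rm -rf /var/lib/rook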

# Reinstall
cd ~/rook/deploy/charts/rook-ceph
helm install --create-namespace --namespace rook-ceph rook-ceph \
  . -f somaz-values.yaml

cd ~/rook/deploy/charts/rook-ceph-cluster
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster \
  --set operatorNamespace=rook-ceph . -f somaz-values.yaml


Monitor Installation Progress

Check operator logs for troubleshooting:

kubectl logs -n rook-ceph deploy/rook-ceph-operator | less



Verification and Testing


Verify Installation

Installation takes approximately 10 minutes. The HEALTH_WARN message "11 pool(s) have no replicas configured" is expected here, because every pool was created with a replicated size of 1 for this test setup.
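Progress can also be followed through the CephCluster resource itself; the PHASE and HEALTH columns show when the cluster is ready:

# Watch the cluster reach the Ready phase
kubectl -n rook-ceph get cephcluster -w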

# Check all pods
kubectl get pods -n rook-ceph
NAME                                                            READY   STATUS      RESTARTS        AGE
csi-cephfsplugin-7ldgx                                          2/2     Running     1 (2m39s ago)   3m26s
csi-cephfsplugin-ffmgn                                          2/2     Running     0               3m26s
csi-cephfsplugin-provisioner-67b5c7f475-5s659                   5/5     Running     1 (2m32s ago)   3m26s
csi-cephfsplugin-provisioner-67b5c7f475-lrkpm                   5/5     Running     0               3m26s
# ... additional pods

# Check storage classes
kubectl get storageclass
NAME                   PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   3m13s

# Check Ceph version
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph version
ceph version 18.2.1 (7fe91d5d5842e04be3b4f514d6dd990c54b29c76) reef (stable)

# Check Ceph cluster status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
  cluster:
    id:     e0949fd6-77ca-4572-bc62-bfe06d5127b8
    health: HEALTH_WARN
            11 pool(s) have no replicas configured

  services:
    mon: 1 daemons, quorum a (age 4m)
    mgr: a(active, since 3m)
    osd: 3 osds: 3 up (since 3m), 3 in (since 3m)

  data:
    volumes: 1/1 healthy
    pools:   11 pools, 137 pgs
    objects: 22 objects, 595 KiB
    usage:   94 MiB used, 150 GiB / 150 GiB avail
    pgs:     137 active+clean



Advanced Ceph Commands

# Check health details (HEALTH_WARN can be ignored for replica=1)
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail

# Check placement group status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph pg dump_stuck

# Check OSD status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd status
ID  HOST                  USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  test-server-storage  34.6M  49.9G      0        0       0        0   exists,up
 1  test-server-storage  38.6M  49.9G      0        0       2      106   exists,up
 2  test-server-storage  34.6M  49.9G      0        0       0        0   exists,up

# Check OSD tree
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree

# List OSD pools
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd lspools

# Check Ceph resources
kubectl get cephblockpool -n rook-ceph
kubectl get cephobjectstore -n rook-ceph
kubectl get cephfilesystem -n rook-ceph
kubectl get cephfilesystemsubvolumegroups.ceph.rook.io -n rook-ceph



Testing Storage Functionality


Create Test Pod with Persistent Volume

cat <<EOF > test-pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-using-ceph-rbd
spec:
  containers:
  - name: my-container
    image: nginx
    volumeMounts:
    - mountPath: "/var/lib/www/html"
      name: mypd
  volumes:
  - name: mypd
    persistentVolumeClaim:
      claimName: ceph-rbd-pvc
EOF

kubectl apply -f test-pod.yaml


Verify Test Resources

kubectl get pv,pod,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS   REASON   AGE
persistentvolume/pvc-27e58696-061e-40f3-afb4-f7c992179ffe   1Gi        RWO            Delete           Bound    default/ceph-rbd-pvc   ceph-block              25s

NAME                     READY   STATUS    RESTARTS   AGE
pod/pod-using-ceph-rbd   1/1     Running   0          25s

NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/ceph-rbd-pvc   Bound    pvc-27e58696-061e-40f3-afb4-f7c992179ffe   1Gi        RWO            ceph-block     25s
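To confirm the volume is actually writable, run a quick check inside the test Pod (names match the manifest above):

# Write and read back a file on the RBD-backed mount
kubectl exec pod-using-ceph-rbd -- sh -c 'echo "hello ceph" > /var/lib/www/html/index.html'
kubectl exec pod-using-ceph-rbd -- cat /var/lib/www/html/index.html

# The mount should show up as an rbd device
kubectl exec pod-using-ceph-rbd -- df -h /var/lib/www/html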



Ceph Dashboard Configuration


Create External Dashboard Service

cat <<EOF > rook-ceph-mgr-dashboard-external.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
    mgr_role: active
  sessionAffinity: None
  type: NodePort
EOF

kubectl apply -f rook-ceph-mgr-dashboard-external.yaml
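After applying the Service, look up the NodePort that Kubernetes assigned; it is the port used in the firewall rule and dashboard URL below (31571 in this walkthrough):

# The NodePort appears in the PORT(S) column, e.g. 8443:<node-port>/TCP
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https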


Configure Firewall Access

Since the nodes run on Google Cloud, the dashboard NodePort (31571 in this walkthrough) must be opened in the VPC firewall alongside SSH. The rule below is managed with Terraform:

## Firewall ##
resource "google_compute_firewall" "nfs_server_ssh" {
  name    = "allow-ssh-nfs-server"
  network = var.shared_vpc

  allow {
    protocol = "tcp"
    ports    = ["22", "31571"]
  }

  source_ranges = ["${var.public_ip}/32"]
  target_tags   = [var.kubernetes_server, var.kubernetes_client]

  depends_on = [module.vpc]
}

Dashboard Access Configuration


Get Default Password

# Retrieve dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
.BzmuOklD<D)~T_@7,pr


Reset Password (Optional)

# Create new password file
cat <<EOF > password.txt
somaz@2024
EOF

# Get tools pod name
kubectl get pods -n rook-ceph | grep tools
rook-ceph-tools-d99b58dc-4xjpr                                  1/1     Running     0          55m

# Copy password file to tools pod
kubectl -n rook-ceph cp password.txt rook-ceph-tools-d99b58dc-4xjpr:/tmp/password.txt

# Update dashboard credentials
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph dashboard set-login-credentials admin -i /tmp/password.txt


Access Dashboard

  1. URL: https://[NODE-IP]:[NODE-PORT]
  2. Username: admin
  3. Password: Retrieved from secret or reset password



The dashboard provides comprehensive monitoring, including overall cluster health, OSD and pool status, capacity usage, and daemon logs.




Production Considerations


High Availability Configuration

For production deployments:

cephClusterSpec:
  mon:
    count: 3  # Odd number for quorum
    allowMultiplePerNode: false

  mgr:
    count: 2  # Active-standby configuration
    allowMultiplePerNode: false

  storage:
    config:
      osdsPerDevice: "1"
      metadataDevice: "md0"  # Use SSD for metadata


Resource Requirements
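
As a rough, illustrative sketch, per-daemon requests and limits can be set in the cluster chart values; the figures below are placeholders rather than official sizing guidance:

cephClusterSpec:
  resources:
    mon:
      requests:
        cpu: "500m"
        memory: "1Gi"
    mgr:
      requests:
        cpu: "500m"
        memory: "1Gi"
    osd:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        memory: "8Gi"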


Security Best Practices

  1. Network Security: Configure proper firewall rules
  2. RBAC: Implement Kubernetes RBAC for Rook resources
  3. Encryption: Enable encryption at rest and in transit (see the sketch after this list)
  4. Dashboard Security: Use strong passwords and HTTPS
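
For encryption, a minimal sketch in the cluster values; it assumes raw devices as in this setup and a recent Rook release that supports msgr2 on-wire encryption:

cephClusterSpec:
  network:
    connections:
      encryption:
        enabled: true           # msgr2 encryption in transit between Ceph daemons
  storage:
    config:
      encryptedDevice: "true"   # provision OSDs on dm-crypt devices (encryption at rest)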


Monitoring and Alerting

  1. Prometheus Integration: Enable Ceph metrics collection (see the sketch after this list)
  2. Grafana Dashboards: Import Ceph-specific dashboards
  3. Alert Rules: Configure alerts for cluster health
  4. Log Aggregation: Centralize Ceph logs for analysis
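
A minimal sketch of enabling metrics collection in the rook-ceph-cluster chart values, assuming the Prometheus Operator is already installed in the cluster:

monitoring:
  enabled: true                # creates ServiceMonitor resources for the Ceph exporter
  createPrometheusRules: true  # installs default Ceph alert rules (verify the option name for your chart version)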



Conclusion

This comprehensive guide covered the complete deployment and operation of Rook-Ceph on Kubernetes, from infrastructure preparation through dashboard configuration.


Key Achievements:

  1. Infrastructure Setup: Properly configured Kubernetes cluster with dedicated storage nodes
  2. Operator Deployment: Successfully installed Rook operator using Helm
  3. Cluster Configuration: Deployed Ceph cluster with proper resource allocation
  4. Storage Integration: Configured storage classes and tested persistent volumes
  5. Management Interface: Activated dashboard for monitoring and management


Best Practices Learned:

  1. Disk Preparation: Wipe data disks before handing them to Rook, otherwise OSD provisioning can fail
  2. Test vs. Production Settings: A single MON and size-1 pools are acceptable only for test clusters; production needs three MONs and replicated pools
  3. Node Sizing: Dedicate storage nodes with sufficient memory (at least 32GB) for OSD workloads


Future Enhancements:

  1. Additional StorageClasses: Enable the CephFS and Object Store StorageClasses that were left disabled in this setup
  2. Monitoring: Integrate Prometheus and Grafana for cluster metrics and alerting
  3. Security: Enable encryption at rest and in transit for production workloads

Rook-Ceph provides a powerful, Kubernetes-native solution for distributed storage, enabling organizations to run stateful applications with confidence in cloud-native environments.


