Kubernetes Operators and Custom Resource Definitions (CRDs)

Extending Kubernetes with domain-specific automation and custom APIs

Overview

Kubernetes Operators and Custom Resource Definitions (CRDs) are powerful mechanisms for extending Kubernetes functionality beyond its built-in capabilities. This guide covers what they are, how they work together, and how to develop your own extensions to Kubernetes.

graph TD
    A[Kubernetes API Server] --> B[Built-in Resources]
    A --> C[Custom Resources]
    B --> D[Pods]
    B --> E[Services]
    B --> F[Deployments]
    C --> G[CRDs]
    G --> H[Custom Controller/Operator]
    H --> I[Domain-specific Logic]
    I --> J[Automated Operations]
    style A fill:#326ce5,stroke:#fff,color:#fff
    style G fill:#ffa726,stroke:#fff,color:#000
    style H fill:#ffa726,stroke:#fff,color:#000
    style I fill:#ffa726,stroke:#fff,color:#000

Key Concepts

At its core, Kubernetes operates on a simple principle: define the desired state of your system, and controllers will work to maintain that state. Operators extend this paradigm to complex, application-specific operations by encoding operational knowledge into software.


Kubernetes Operators: Automation for Complex Applications

What is an Operator?

An Operator is a Kubernetes-native application that watches and takes action on specific resources, implementing domain-specific operational knowledge in code.

sequenceDiagram
    participant U as User
    participant A as Kubernetes API
    participant O as Operator
    participant CR as Custom Resource
    participant K as Kubernetes Objects
    U->>A: Create/Update Custom Resource
    A->>CR: Store in etcd
    A->>O: Notify via Watch API
    O->>CR: Read desired state
    O->>K: Create/Update required objects
    O->>CR: Update status with current state
    U->>A: Get Custom Resource status
    A->>U: Return current status

Operator Components:

  1. Custom Resource Definition (CRD):
    • Defines a new resource type in the Kubernetes API
    • Describes the schema and validation rules
  2. Controller:
    • Implements the control loop logic
    • Watches for changes to custom resources
    • Takes actions to reconcile actual state with desired state
  3. Domain-specific Knowledge:
    • Encoded operational expertise
    • Application lifecycle management rules
    • Complex operational procedures automated in code

The Origin of Operators

The concept of Operators was introduced by CoreOS (now part of Red Hat) in 2016. They were designed to solve the challenge of managing stateful applications in Kubernetes, which require domain-specific knowledge for operations like scaling, upgrading, and backup/restore.

The name "Operator" comes from the human operators who traditionally managed such complex applications manually.


The Operator Pattern

Operators follow the Kubernetes reconciliation pattern:

  1. Observe: Monitor the state of the custom resource
  2. Analyze: Compare current state with desired state
  3. Act: Execute changes to reach the desired state
  4. Update: Record the current state in the resource’s status

graph LR
    A[Observe] --> B[Analyze]
    B --> C[Act]
    C --> D[Update Status]
    D --> A
    style A fill:#4CAF50,stroke:#fff,color:#fff
    style B fill:#2196F3,stroke:#fff,color:#fff
    style C fill:#FFC107,stroke:#000,color:#000
    style D fill:#9C27B0,stroke:#fff,color:#fff
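
The same loop, reduced to a schematic Go sketch. Every type and helper here is an illustrative stand-in, not a real framework API:

package main

import "fmt"

// Hypothetical stand-ins for a resource's spec and the observed cluster state.
type Spec struct{ Replicas int }
type State struct{ Replicas int }

func observe() State          { return State{Replicas: 2} }          // 1. Observe
func desiredFor(s Spec) State { return State{Replicas: s.Replicas} } // desired state from the spec

func act(want State) error { // 3. Act: create/update objects to converge (stubbed)
    fmt.Println("scaling to", want.Replicas)
    return nil
}

func reconcile(spec Spec) error {
    actual, want := observe(), desiredFor(spec)
    if actual != want { // 2. Analyze: compare actual state with desired state
        if err := act(want); err != nil {
            return err
        }
    }
    // 4. Update: a real controller writes the observed state into .status here
    return nil
}

func main() { _ = reconcile(Spec{Replicas: 3}) }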

Why Use Operators?

Benefits of Operators
  • Automation: Codify complex operational tasks to reduce manual intervention
  • Consistency: Ensure identical operations across environments
  • Reliability: Reduce human error through tested, automated procedures
  • Domain-specific: Encode specialized knowledge for specific applications
  • Kubernetes-native: Integrate with existing Kubernetes tools and workflows

Real-world Applications

Operators are particularly valuable for managing:

| Application Type | Operational Challenges | Example Operators |
|---|---|---|
| Databases | Replication configuration; scaling with data integrity; backup and restore; version upgrades | PostgreSQL Operator; MongoDB Community Operator; Redis Operator; Cassandra Operator |
| Message Queues | Cluster configuration; topic management; network partitioning | Kafka Operator; RabbitMQ Operator |
| Infrastructure | Certificate management; network policy enforcement; monitoring configuration | cert-manager; Prometheus Operator; Istio Operator |


Operator Maturity Model

Operators range in sophistication according to the Red Hat Operator Maturity Model:

graph TD
    A[Level 1: Basic Install] --> B[Level 2: Seamless Upgrades]
    B --> C[Level 3: Full Lifecycle]
    C --> D[Level 4: Deep Insights]
    D --> E[Level 5: Auto Pilot]
    style A fill:#c5e1a5,stroke:#333,stroke-width:1px
    style B fill:#aed581,stroke:#333,stroke-width:1px
    style C fill:#8bc34a,stroke:#333,stroke-width:1px
    style D fill:#689f38,stroke:#333,stroke-width:1px
    style E fill:#33691e,stroke:#333,stroke-width:1px,color:#fff

  1. Level 1 - Basic Install: Automated application installation and configuration
  2. Level 2 - Seamless Upgrades: Patch and minor version upgrades
  3. Level 3 - Full Lifecycle: Application backups, failure recovery
  4. Level 4 - Deep Insights: Application metrics, alerts, optimized configurations based on usage
  5. Level 5 - Auto Pilot: Automatic scaling, auto-tuning, anomaly detection and resolution


Custom Resource Definitions (CRDs): Extending the Kubernetes API

What are CRDs?

CRDs allow you to define new resource types that extend the Kubernetes API, making custom resources feel like native Kubernetes objects.

Custom Resources vs. Built-in Resources

Custom resources work just like native Kubernetes resources:

  • Accessed through the same API endpoints
  • Managed with kubectl and other Kubernetes clients
  • Secured through RBAC and admission controls
  • Stored in etcd alongside built-in resources
  • Watched for changes through the same mechanisms
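
For example, once a CRD such as the Database type defined later in this guide is installed, the standard kubectl verbs work unchanged:

# Custom types are discoverable and scriptable like built-ins
kubectl api-resources --api-group=example.com
kubectl get databases                      # or the short name: kubectl get db
kubectl describe database my-production-db
kubectl delete database my-production-db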

CRD Architecture

graph TD
    A[CustomResourceDefinition] -->|defines| B[Custom Resource Type]
    B -->|instances of| C[Custom Resource Objects]
    D[Controller/Operator] -->|watches| C
    D -->|creates/updates| E[Kubernetes Resources]
    style A fill:#FFC107,stroke:#333,stroke-width:1px
    style B fill:#FF9800,stroke:#333,stroke-width:1px
    style C fill:#FF5722,stroke:#333,stroke-width:1px
    style D fill:#673AB7,stroke:#333,stroke-width:1px,color:#fff
    style E fill:#2196F3,stroke:#333,stroke-width:1px,color:#fff

CRD Components

1. The CRD Itself

Defines the schema, validation, and naming for your custom resource type:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    singular: database
    shortNames:
    - db
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        required: ["spec"]
        properties:
          spec:
            type: object
            required: ["engine", "version"]
            properties:
              engine:
                type: string
                enum: ["postgres", "mysql", "mongodb"]
              version:
                type: string
              replicas:
                type: integer
                minimum: 1
                default: 1
              storage:
                type: string
                pattern: "^\\d+Gi$"
          status:
            type: object
            properties:
              phase:
                type: string
              readyReplicas:
                type: integer
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Version
      type: string
      jsonPath: .spec.version
    - name: Replicas
      type: integer
      jsonPath: .spec.replicas
    - name: Status
      type: string
      jsonPath: .status.phase
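
Because a CRD is itself just a Kubernetes object, registering the new type is an ordinary apply (file name assumed):

kubectl apply -f database-crd.yaml
kubectl get crd databases.example.com   # confirm the new type is registered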

2. Custom Resources

Instances of your custom type that users create:

apiVersion: example.com/v1
kind: Database
metadata:
  name: my-production-db
spec:
  engine: postgres
  version: "13.4"
  replicas: 3
  storage: "20Gi"

3. Controller/Operator

Software that watches for custom resources and takes actions to align the actual state with the desired state.


Advanced CRD Features

Schema Validation

The schema section of a CRD defines the structure and validation rules for your custom resource. In the Database CRD above, the enum restricts engine to three values, minimum enforces at least one replica, and pattern requires storage sizes like 20Gi. Invalid objects are rejected by the API server at admission time, before any controller sees them.
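
For example, this object would be rejected on kubectl apply (illustrative):

apiVersion: example.com/v1
kind: Database
metadata:
  name: bad-db
spec:
  engine: oracle    # fails the enum: must be postgres, mysql, or mongodb
  version: "19c"
  replicas: 0       # fails the minimum: replicas must be >= 1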

Additional Printer Columns

Define custom columns for kubectl get output:

$ kubectl get databases
NAME              ENGINE     VERSION   REPLICAS   STATUS
my-production-db  postgres   13.4      3          Running
dev-database      mysql      8.0       1          Provisioning

Subresources

Subresources expose dedicated API endpoints for a custom resource. apiextensions.k8s.io/v1 supports two: status, which lets controllers update .status independently of .spec (enabled for the Database CRD above), and scale, which makes kubectl scale and the Horizontal Pod Autoscaler work with your resource.
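
A per-version sketch of both; the field paths assume the Database schema above, and the scale stanza is an illustrative addition rather than part of the earlier CRD:

subresources:
  status: {}                                  # serves /status for status-only updates
  scale:
    specReplicasPath: .spec.replicas          # read and written by kubectl scale
    statusReplicasPath: .status.readyReplicas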


Building Operators: Tools and Frameworks

Multiple frameworks exist to simplify operator development:

| Framework | Description | Best For |
|---|---|---|
| Kubebuilder | Developed by Kubernetes SIG API Machinery; Go-based; includes scaffolding, testing tools, and libraries | Complex operators; large-scale projects; deep integration |
| Operator SDK | Part of Red Hat's Operator Framework; supports Go, Ansible, and Helm; integrates with Operator Lifecycle Manager | Multi-language support; quick prototyping; OLM deployment |
| KUDO | CNCF sandbox project; declarative approach; operator building for non-programmers | Simple operators; non-developers; quick adoption |
| Kopf | Python-based framework; lightweight design; focus on simplicity | Python developers; rapid prototyping; smaller projects |


Deep Dive: Kubebuilder Framework

Kubebuilder is a popular framework for building Kubernetes APIs and controllers using Go.

Project Structure

.
├── Dockerfile                # Container image definition
├── Makefile                  # Build, test, deploy commands
├── PROJECT                   # Project metadata
├── api/                      # CRD API definitions
│   └── v1/
│       ├── database_types.go # Custom resource type definition
│       ├── groupversion_info.go
│       └── zz_generated.deepcopy.go
├── config/                   # Kubernetes manifests
│   ├── crd/                  # Generated CRD YAML
│   ├── rbac/                 # Role-based access control
│   ├── manager/              # Controller manager deployment
│   └── samples/              # Example custom resources
├── controllers/              # Controller implementation
│   ├── database_controller.go
│   └── suite_test.go
└── main.go                   # Entry point

Core Components in Code

1. API Type Definition:

// Database is the Schema for the databases API
type Database struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   DatabaseSpec   `json:"spec,omitempty"`
    Status DatabaseStatus `json:"status,omitempty"`
}

// DatabaseSpec defines the desired state of Database
type DatabaseSpec struct {
    Engine   string `json:"engine"`
    Version  string `json:"version"`
    Replicas int    `json:"replicas,omitempty"`
    Storage  string `json:"storage,omitempty"`
}

// DatabaseStatus defines the observed state of Database
type DatabaseStatus struct {
    Phase         string `json:"phase,omitempty"`
    ReadyReplicas int    `json:"readyReplicas,omitempty"`
}
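
In practice the validation rules in the CRD YAML shown earlier are not written by hand: Kubebuilder generates them from marker comments on the spec fields. A sketch of how DatabaseSpec could be annotated to produce the same enum, minimum, default, and pattern rules (regenerated with make manifests):

// DatabaseSpec annotated with Kubebuilder validation markers
type DatabaseSpec struct {
    // +kubebuilder:validation:Enum=postgres;mysql;mongodb
    Engine string `json:"engine"`

    Version string `json:"version"`

    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:default=1
    Replicas int `json:"replicas,omitempty"`

    // +kubebuilder:validation:Pattern=`^\d+Gi$`
    Storage string `json:"storage,omitempty"`
}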

2. Controller Implementation:

// DatabaseReconciler reconciles a Database object
type DatabaseReconciler struct {
    client.Client
    Log    logr.Logger
    Scheme *runtime.Scheme
}

// +kubebuilder:rbac:groups=example.com,resources=databases,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=example.com,resources=databases/status,verbs=get;update;patch

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    log := r.Log.WithValues("database", req.NamespacedName)

    // Fetch the Database instance
    var database examplecomv1.Database
    if err := r.Get(ctx, req.NamespacedName, &database); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Your reconciliation logic here
    // 1. Check if StatefulSet exists, create if not
    // 2. Ensure StatefulSet matches desired state
    // 3. Update status with current state

    return ctrl.Result{}, nil
}
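
A minimal sketch of those three steps, assuming a hypothetical statefulSetFor helper that renders the desired StatefulSet from the Database spec (imports such as appsv1 "k8s.io/api/apps/v1" and apierrors "k8s.io/apimachinery/pkg/api/errors" are assumed; drift detection is simplified):

func (r *DatabaseReconciler) reconcileStatefulSet(ctx context.Context, db *examplecomv1.Database) error {
    var sts appsv1.StatefulSet
    err := r.Get(ctx, client.ObjectKey{Namespace: db.Namespace, Name: db.Name}, &sts)
    if apierrors.IsNotFound(err) {
        desired := statefulSetFor(db) // hypothetical: build the StatefulSet from db.Spec
        // Owner reference: the StatefulSet is garbage-collected with the Database
        if err := ctrl.SetControllerReference(db, desired, r.Scheme); err != nil {
            return err
        }
        return r.Create(ctx, desired)
    } else if err != nil {
        return err
    }

    // Reconcile the fields we own, e.g. the replica count
    replicas := int32(db.Spec.Replicas)
    if sts.Spec.Replicas == nil || *sts.Spec.Replicas != replicas {
        sts.Spec.Replicas = &replicas
        if err := r.Update(ctx, &sts); err != nil {
            return err
        }
    }

    // Reflect observed state through the status subresource
    db.Status.ReadyReplicas = int(sts.Status.ReadyReplicas)
    if db.Status.ReadyReplicas == db.Spec.Replicas {
        db.Status.Phase = "Running"
    } else {
        db.Status.Phase = "Provisioning"
    }
    return r.Status().Update(ctx, db)
}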

3. Main Function:

func main() {
    // Setup manager, scheme, controllers
    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
        Scheme:             scheme,
        MetricsBindAddress: metricsAddr,
        Port:               9443,
        LeaderElection:     enableLeaderElection,
    })
    if err != nil {
        setupLog.Error(err, "unable to start manager")
        os.Exit(1)
    }

    // Register controller
    if err = (&controllers.DatabaseReconciler{
        Client: mgr.GetClient(),
        Log:    ctrl.Log.WithName("controllers").WithName("Database"),
        Scheme: mgr.GetScheme(),
    }).SetupWithManager(mgr); err != nil {
        setupLog.Error(err, "unable to create controller", "controller", "Database")
        os.Exit(1)
    }

    // Start manager
    setupLog.Info("starting manager")
    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        setupLog.Error(err, "problem running manager")
        os.Exit(1)
    }
}
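
main calls SetupWithManager, which Kubebuilder scaffolds alongside the reconciler; a typical form looks like this (the Owns clause assumes the controller creates StatefulSets, as sketched earlier):

func (r *DatabaseReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&examplecomv1.Database{}).  // reconcile on Database events
        Owns(&appsv1.StatefulSet{}).    // ...and on changes to owned StatefulSets
        Complete(r)
}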

Common Development Commands

# Initialize new project
kubebuilder init --domain example.com

# Create API (generates CRD and controller)
kubebuilder create api --group example --version v1 --kind Database

# Generate manifests
make manifests

# Install CRDs
make install

# Run controller locally (development)
make run

# Build and push container image
make docker-build docker-push IMG=example.com/operator:v0.1.0

# Deploy to cluster
make deploy IMG=example.com/operator:v0.1.0


Development Workflow

graph LR
    A[Design CRD] --> B[Define Types in Go]
    B --> C[Implement Controller Logic]
    C --> D[Test Locally]
    D -->|Issues| C
    D -->|Success| E[Build & Deploy]
    E --> F[Monitor & Iterate]
    style A fill:#bbdefb,stroke:#333,stroke-width:1px
    style B fill:#90caf9,stroke:#333,stroke-width:1px
    style C fill:#64b5f6,stroke:#333,stroke-width:1px
    style D fill:#42a5f5,stroke:#333,stroke-width:1px
    style E fill:#2196f3,stroke:#333,stroke-width:1px,color:#fff
    style F fill:#1976d2,stroke:#333,stroke-width:1px,color:#fff

Stages of Operator Development

| Stage | Activities | Best Practices |
|---|---|---|
| Design | Define CRD specs; plan controller logic; design API schema | Keep the schema simple and focused; define clear validation rules; consider a versioning strategy early; document field purposes and constraints |
| Implementation | Write controller code; implement reconciliation; add validation | Follow idempotent design patterns; use owner references for created resources; handle edge cases and error conditions; leverage finalizers for cleanup logic |
| Testing | Unit tests; integration tests; E2E testing | Mock external dependencies; test error-handling scenarios; use envtest for controller tests; create test fixtures for common scenarios |
| Deployment | Build container image; deploy to cluster; monitor performance | Use minimal base images; set resource requests/limits; configure proper RBAC permissions; implement health checks and metrics |


Best Practices

CRD Design:
  • Follow Kubernetes conventions for naming, structure, and versioning
  • Use semantic versioning for your API versions
  • Make fields optional when possible to improve forward compatibility
  • Set defaults for optional fields to simplify usage
  • Use strong validation to catch misconfigurations early
  • Document all fields with clear descriptions

Controller Implementation:
  • Design for idempotency - controllers should safely retry operations
  • Handle errors gracefully with appropriate retries and backoffs
  • Implement informative logging for debugging and auditing
  • Update status frequently to reflect the current state
  • Add finalizers for proper cleanup on deletion (sketched below)
  • Set owner references for automatic cleanup of dependent resources

Testing and Reliability:
  • Write comprehensive tests for controller logic
  • Test edge cases and recovery scenarios
  • Implement leader election for high availability
  • Export metrics for performance monitoring
  • Add thorough documentation for users of your operator
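
Finalizers in particular are easy to get wrong. A minimal sketch using controller-runtime's controllerutil helpers, with a hypothetical finalizer name and cleanup function:

const dbFinalizer = "example.com/database-cleanup" // hypothetical finalizer name

func (r *DatabaseReconciler) handleFinalizer(ctx context.Context, db *examplecomv1.Database) (deleted bool, err error) {
    if db.DeletionTimestamp.IsZero() {
        // Object is live: ensure our finalizer is set so deletion waits for cleanup
        if controllerutil.AddFinalizer(db, dbFinalizer) {
            return false, r.Update(ctx, db)
        }
        return false, nil
    }
    // Object is being deleted: clean up, then release the finalizer
    if controllerutil.ContainsFinalizer(db, dbFinalizer) {
        if err := r.cleanupExternalResources(ctx, db); err != nil { // hypothetical cleanup
            return true, err
        }
        controllerutil.RemoveFinalizer(db, dbFinalizer)
        return true, r.Update(ctx, db)
    }
    return true, nil
}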

Real-world Example: Simple Namespace Synchronization Operator

This example demonstrates a basic operator that synchronizes configurations across namespaces.

CRD Definition

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: namespacesynchronizations.sync.example.com
spec:
  group: sync.example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                sourceNamespace:
                  type: string
                targetNamespaces:
                  type: array
                  items:
                    type: string
                resources:
                  type: array
                  items:
                    type: object
                    properties:
                      kind:
                        type: string
                      name:
                        type: string
            status:
              type: object
              properties:
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      lastTransitionTime:
                        type: string
                      reason:
                        type: string
                      message:
                        type: string
  scope: Cluster
  names:
    plural: namespacesynchronizations
    singular: namespacesynchronization
    kind: NamespaceSynchronization
    shortNames:
    - nssync

Custom Resource Example

apiVersion: sync.example.com/v1
kind: NamespaceSynchronization
metadata:
  name: config-sync
spec:
  sourceNamespace: base-config
  targetNamespaces:
  - team-a
  - team-b
  - team-c
  resources:
  - kind: ConfigMap
    name: shared-config
  - kind: Secret
    name: common-credentials

Controller Logic Overview

The controller would (a Go sketch of the core loop follows the list):

  1. Watch for NamespaceSynchronization resources
  2. List the specified resources in the source namespace
  3. For each target namespace:
    • Create or update the resources
    • Handle differences in metadata (e.g., namespace)
    • Apply owner references for tracking
  4. Update the status with sync results
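
A minimal sketch of the ConfigMap branch of that loop, assuming generated syncv1 types and a reconciler scaffolded like the DatabaseReconciler above (corev1, metav1, and controllerutil imports assumed; Secrets would follow the same pattern):

func (r *NamespaceSynchronizationReconciler) syncConfigMap(ctx context.Context, sync *syncv1.NamespaceSynchronization, name string) error {
    // 2. Read the source object from the source namespace
    var source corev1.ConfigMap
    key := client.ObjectKey{Namespace: sync.Spec.SourceNamespace, Name: name}
    if err := r.Get(ctx, key, &source); err != nil {
        return err
    }
    // 3. Create or update a copy in every target namespace
    for _, ns := range sync.Spec.TargetNamespaces {
        cm := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Namespace: ns, Name: name}}
        _, err := controllerutil.CreateOrUpdate(ctx, r.Client, cm, func() error {
            cm.Data = source.Data
            // Cluster-scoped owner: copies are tracked and garbage-collected with the CR
            return ctrl.SetControllerReference(sync, cm, r.Scheme)
        })
        if err != nil {
            return err
        }
    }
    return nil
}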


