Who Ran kubectl edit? — Building a Kubernetes Cluster Drift Detection Tool

Detect and report Kubernetes cluster drift with kube-diff CLI and GitHub Action

Overview

When operating Kubernetes clusters, a common problem arises: the actual cluster state diverges from the manifests defined in Git — known as Cluster Drift. Someone uses kubectl edit to manually change replicas, or kubectl scale to quickly adjust capacity, and suddenly the Git source and cluster state are out of sync.

kubectl diff can compare manifests against the cluster, but it has no Helm support, handles Kustomize only through its -k flag, and produces a raw unified diff that is difficult to read.

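This post introduces two tools built to solve these problems: the kube-diff CLI and the kube-diff-action GitHub Action. The table below compares kubectl diff with kube-diff: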

| Feature | kubectl diff | kube-diff |
|---|---|---|
| Input | YAML files (Kustomize via -k) | Helm / Kustomize / plain YAML |
| Output | Raw unified diff | Per-resource colorized diff + summary |
| New resources | Full content dump | NEW label |
| Deleted detection | Not supported | Detects resources only in cluster |
| CI integration | Exit code only | JSON / Markdown report output |
| Filtering | None | Namespace, kind, label selector filter |


1. Project Structure


kube-diff (CLI)

kube-diff/
├── cmd/
│   ├── main.go                # Entry point
│   └── cli/
│       ├── root.go            # Cobra root command
│       ├── file.go            # file subcommand
│       ├── helm.go            # helm subcommand
│       ├── kustomize.go       # kustomize subcommand
│       ├── version.go         # version subcommand
│       └── run.go             # Shared comparison logic
├── internal/
│   ├── source/                # Manifest loaders (file, helm, kustomize)
│   ├── cluster/               # K8s dynamic client fetcher
│   ├── diff/                  # Normalization & unified diff
│   └── report/                # Color/JSON/Markdown output
├── examples/
│   ├── file/                  # Plain YAML examples
│   ├── helm/                  # Helm chart examples
│   └── kustomize/             # Kustomize overlay examples
├── scripts/
│   ├── demo.sh                # Demo script
│   └── demo-clean.sh          # Demo cleanup
├── .goreleaser.yml            # GoReleaser config
└── Makefile


kube-diff-action (GitHub Action)

kube-diff-action/
├── action.yml                 # Composite Action definition
├── scripts/
│   ├── install.sh             # kube-diff binary installation
│   ├── run.sh                 # kube-diff execution & output setup
│   └── comment.sh             # PR comment create/update
└── .github/workflows/
    ├── ci.yml                 # CI (ShellCheck, kind cluster tests)
    ├── release.yml            # Release automation
    └── use-action.yml         # Smoke Test (released action verification)


2. Core Design Decisions


2.1 Kubernetes Dynamic Client

Instead of depending on specific resource types, kube-diff uses a dynamic client with unstructured.Unstructured to handle any kind of Kubernetes resource.

// internal/cluster/fetcher.go
func (f *Fetcher) Get(ctx context.Context, apiVersion, kind, namespace, name string) (*unstructured.Unstructured, error) {
    gvr, err := f.resolveGVR(apiVersion, kind)
    if err != nil {
        return nil, err
    }

    var resource *unstructured.Unstructured
    if namespace != "" {
        resource, err = f.client.Resource(gvr).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
    } else {
        resource, err = f.client.Resource(gvr).Get(ctx, name, metav1.GetOptions{})
    }
    return resource, err
}
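The excerpt does not show resolveGVR. As a rough illustration of what such a lookup has to produce, the sketch below splits apiVersion into group and version and guesses the plural resource name from the kind. The GVR type and guessGVR function are hypothetical stand-ins written for this post, not kube-diff's code, and the pluralization is deliberately naive: it fails for irregular kinds such as Endpoints, which is why real tools typically consult the API server's discovery data instead.

```go
package main

import (
	"fmt"
	"strings"
)

// GVR mirrors the shape of schema.GroupVersionResource from
// k8s.io/apimachinery; it is redeclared so the sketch has no dependencies.
type GVR struct {
	Group    string
	Version  string
	Resource string
}

// guessGVR is a hypothetical stand-in for resolveGVR. It splits apiVersion
// into group and version, then guesses the plural resource name from the
// kind with naive English pluralization.
func guessGVR(apiVersion, kind string) GVR {
	group, version := "", apiVersion // core group: "v1" has no slash
	if g, v, found := strings.Cut(apiVersion, "/"); found {
		group, version = g, v
	}

	resource := strings.ToLower(kind)
	switch {
	case strings.HasSuffix(resource, "s"):
		resource += "es" // Ingress -> ingresses
	case strings.HasSuffix(resource, "y"):
		resource = strings.TrimSuffix(resource, "y") + "ies" // NetworkPolicy -> networkpolicies
	default:
		resource += "s" // Deployment -> deployments
	}
	return GVR{Group: group, Version: version, Resource: resource}
}

func main() {
	fmt.Printf("%+v\n", guessGVR("apps/v1", "Deployment"))
	fmt.Printf("%+v\n", guessGVR("v1", "ConfigMap"))
}
```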


2.2 Shared Comparison Logic

All three subcommands — file, helm, and kustomize — share the same runDiff() function. Only the Source interface implementation differs.

// cmd/cli/run.go — Core flow
func runDiff(cmd *cobra.Command, src source.Source) error {
    // 1. Load local resources
    resources, err := src.Load()
    if err != nil {
        return err
    }

    // 2. Filter (namespace, kind, label selector)
    // ...

    // 3. Fetch corresponding resources from cluster
    fetcher, err := cluster.NewFetcher(kubeconfig, kubeContext)
    if err != nil {
        return err
    }
    for _, r := range resources {
        clusterObj, err := fetcher.Get(ctx, r.APIVersion, r.Kind, r.Namespace, r.Name)
        if err != nil {
            // ... (handling of lookup errors is omitted in this excerpt)
        }
        // 4. Compare diff
        result, err := diff.Compare(r.Object, clusterObj)
        if err != nil {
            return err
        }
        results = append(results, result)
    }

    // 5. Output report (color, plain, json, markdown)
    summary := report.NewSummary(results)
    // ...
}
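To make the role of the Source interface concrete, here is a minimal, dependency-free sketch. The Resource struct, the two toy implementations (staticSource, brokenSource), and the listIDs consumer are all hypothetical names chosen for illustration; the only thing taken from the post is the idea of a Load() method that each mode implements differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Resource is a simplified stand-in for a loaded manifest.
type Resource struct {
	APIVersion, Kind, Namespace, Name string
}

// Source is anything that can produce a list of resources.
type Source interface {
	Load() ([]Resource, error)
}

// staticSource is a hypothetical in-memory Source.
type staticSource struct{ items []Resource }

func (s staticSource) Load() ([]Resource, error) { return s.items, nil }

// brokenSource is a hypothetical Source whose rendering step always fails.
type brokenSource struct{}

func (brokenSource) Load() ([]Resource, error) {
	return nil, errors.New("render failed")
}

// listIDs depends only on the interface, so it works with any Source.
func listIDs(src Source) ([]string, error) {
	resources, err := src.Load()
	if err != nil {
		return nil, fmt.Errorf("load resources: %w", err)
	}
	ids := make([]string, 0, len(resources))
	for _, r := range resources {
		ids = append(ids, r.Kind+"/"+r.Name)
	}
	return ids, nil
}

func main() {
	src := staticSource{items: []Resource{
		{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "kube-diff-demo", Name: "demo-app"},
	}}
	ids, err := listIDs(src)
	fmt.Println(ids, err)

	_, err = listIDs(brokenSource{})
	fmt.Println(err)
}
```

Because the consumer never inspects the concrete type, adding another input mode would only require another Load() implementation.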


2.3 Kubernetes Default Value Normalization

Kubernetes automatically injects various default values into resources. Comparing these directly would generate a massive number of false positives.

For example, when a Deployment is applied, the cluster automatically adds fields such as progressDeadlineSeconds, revisionHistoryLimit, and dnsPolicy.

These defaults are normalized and removed per Kind:

// internal/diff/normalize.go
func Normalize(obj *unstructured.Unstructured) *unstructured.Unstructured {
    // Remove common metadata (managedFields, uid, resourceVersion, etc.)
    // ...

    // Remove Kind-specific defaults
    switch kind {
    case "Deployment", "StatefulSet":
        normalizeDeploymentSpec(spec)  // progressDeadlineSeconds, strategy, etc.
    case "Service":
        normalizeServiceSpec(spec)     // clusterIP, sessionAffinity, etc.
    case "Namespace":
        normalizeNamespaceSpec(obj)    // spec.finalizers, etc.
    case "Pod":
        normalizePodSpec(spec)
    case "Job":
        normalizeJobSpec(spec)
    case "DaemonSet":
        normalizeDaemonSetSpec(spec)
    }
    return normalized
}

Thanks to this normalization, only meaningful differences appear in the diff output.
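The following is a minimal sketch of the idea using plain nested maps instead of unstructured.Unstructured. The helper names removeNested and normalize are illustrative, and the field list is only the handful named in this post, so treat it as a picture of the technique rather than kube-diff's actual normalizer.

```go
package main

import "fmt"

// removeNested deletes the value at the given path inside a nested map.
// It does nothing if any intermediate key is missing or is not a map.
func removeNested(obj map[string]any, path ...string) {
	if len(path) == 0 {
		return
	}
	cur := obj
	for _, key := range path[:len(path)-1] {
		next, ok := cur[key].(map[string]any)
		if !ok {
			return
		}
		cur = next
	}
	delete(cur, path[len(path)-1])
}

// normalize strips server-populated metadata and, for Deployments, two
// defaulted spec fields. The field selection is illustrative only.
func normalize(obj map[string]any) {
	removeNested(obj, "metadata", "managedFields")
	removeNested(obj, "metadata", "uid")
	removeNested(obj, "metadata", "resourceVersion")

	if obj["kind"] == "Deployment" {
		removeNested(obj, "spec", "progressDeadlineSeconds")
		removeNested(obj, "spec", "revisionHistoryLimit")
	}
}

func main() {
	obj := map[string]any{
		"kind": "Deployment",
		"metadata": map[string]any{
			"name":            "demo-app",
			"uid":             "1234",
			"resourceVersion": "42",
		},
		"spec": map[string]any{
			"replicas":                2,
			"progressDeadlineSeconds": 600,
		},
	}
	normalize(obj)
	fmt.Println(obj)
}
```

One trade-off worth noting: removing a field unconditionally also hides real drift on that field. A more careful variant might drop a field only when the local manifest omits it and the cluster value equals the known default.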


2.4 Exit Code Design

| Code | Meaning |
|---|---|
| 0 | No changes detected |
| 1 | Changes detected |
| 2 | Error occurred |

In CI, exit code 1 means “drift exists” — not an error. The GitHub Action treats both exit code 0 and 1 as success, and only considers 2 as failure.


3. Sample Application

Here are the sample resources used in the demo.


3.1 Plain YAML (file mode)

Namespace

# examples/file/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kube-diff-demo

ConfigMap

# examples/file/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
  namespace: kube-diff-demo
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"

Deployment

# examples/file/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: kube-diff-demo
  labels:
    app: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: nginx:1.25
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
          envFrom:
            - configMapRef:
                name: demo-config

Service

# examples/file/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: kube-diff-demo
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP


3.2 Helm Chart (helm mode)

Chart.yaml

# examples/helm/demo-chart/Chart.yaml
apiVersion: v2
name: demo-chart
description: Demo Helm chart for kube-diff examples
version: 0.1.0
type: application

values.yaml

# examples/helm/demo-chart/values.yaml
replicaCount: 2
image: nginx:1.25
namespace: kube-diff-demo

config:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"

templates/deployment.yaml

templates/configmap.yaml

templates/service.yaml

values-drift.yaml (intentionally different values)

# examples/helm/values-drift.yaml
replicaCount: 3
image: nginx:1.26
namespace: kube-diff-demo

config:
  APP_ENV: staging
  LOG_LEVEL: debug
  MAX_CONNECTIONS: "200"
  NEW_KEY: "added"


3.3 Kustomize (kustomize mode)

base/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: kube-diff-demo

resources:
  - configmap.yaml
  - deployment.yaml
  - service.yaml

overlays/dev/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: kube-diff-demo

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: demo-app
    patch: |
      - op: replace
        path: /spec/replicas
        value: 3
      - op: replace
        path: /spec/template/spec/containers/0/image
        value: nginx:1.26

  - target:
      kind: ConfigMap
      name: demo-config
    patch: |
      - op: replace
        path: /data/LOG_LEVEL
        value: debug
      - op: add
        path: /data/NEW_KEY
        value: added


4. CLI Usage and Demo


4.1 Installation

# Homebrew
brew install somaz94/tap/kube-diff

# Krew (kubectl plugin)
kubectl krew install diff2

# Binary
curl -sL https://github.com/somaz94/kube-diff/releases/latest/download/kube-diff_linux_amd64.tar.gz | tar xz
sudo mv kube-diff /usr/local/bin/

# From source
go install github.com/somaz94/kube-diff/cmd@latest


4.2 Basic Usage

# Plain YAML comparison
kube-diff file ./manifests/ -n production

# Helm chart comparison
kube-diff helm ./my-chart --values values-prod.yaml --release my-release -n production

# Kustomize overlay comparison
kube-diff kustomize ./overlays/production -n production


4.3 Filtering

# Kind filter
kube-diff file ./manifests/ -n production -k Deployment,Service

# Label selector filter
kube-diff file ./manifests/ -n production -l app=nginx,env=prod

# Combined
kube-diff file ./manifests/ -n production -k Deployment -l app=nginx
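For readers curious how such filters might combine, here is a small sketch: a resource is kept only if its kind is in the -k list (when given) and every key=value pair of the -l selector matches its labels. parseSelector and keep are hypothetical helpers; the sketch supports equality terms only, and the case-insensitive kind match is an assumption rather than documented kube-diff behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSelector parses an equality-based selector such as "app=nginx,env=prod".
func parseSelector(s string) (map[string]string, error) {
	want := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return want, nil
	}
	for _, pair := range strings.Split(s, ",") {
		key, value, found := strings.Cut(strings.TrimSpace(pair), "=")
		if !found || key == "" {
			return nil, fmt.Errorf("invalid selector term %q", pair)
		}
		want[key] = value
	}
	return want, nil
}

// keep reports whether a resource passes both the kind and label filters.
// An empty kinds list or an empty selector matches everything.
func keep(kind string, labels map[string]string, kinds []string, selector map[string]string) bool {
	if len(kinds) > 0 {
		matched := false
		for _, k := range kinds {
			if strings.EqualFold(k, kind) { // assumption: case-insensitive match
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	for key, value := range selector {
		if got, ok := labels[key]; !ok || got != value {
			return false
		}
	}
	return true
}

func main() {
	selector, err := parseSelector("app=nginx,env=prod")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	kinds := strings.Split("Deployment,Service", ",")
	labels := map[string]string{"app": "nginx", "env": "prod"}
	fmt.Println(keep("Deployment", labels, kinds, selector))
	fmt.Println(keep("ConfigMap", labels, kinds, selector))
}
```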


4.4 Output Formats

# Colorized (default)
kube-diff file ./manifests/ -n production

# JSON
kube-diff file ./manifests/ -n production -o json

# Markdown
kube-diff file ./manifests/ -n production -o markdown

# Summary only
kube-diff file ./manifests/ -n production -s


4.5 Demo Results

Run the full demo with make demo-all. Below are the key results from each phase.

Phase 2 — No Drift

Right after deploying manifests to the cluster, all resources show as unchanged:

✓ OK     ConfigMap/demo-config (namespace: kube-diff-demo)
✓ OK     Deployment/demo-app (namespace: kube-diff-demo)
✓ OK     Namespace/kube-diff-demo
✓ OK     Service/demo-app (namespace: kube-diff-demo)

Summary: 4 resources — 4 unchanged

Phase 3 — Manual Changes on Cluster

# Manually change replicas
kubectl scale deploy/demo-app --replicas=5 -n kube-diff-demo

# Add a new key to ConfigMap
kubectl patch configmap demo-config -n kube-diff-demo --type merge -p '{"data":{"DEBUG_MODE":"true"}}'

Phase 4 — Drift Detected

~ CHANGED ConfigMap/demo-config (namespace: kube-diff-demo)
--- cluster
+++ local
@@ -1,7 +1,6 @@
 apiVersion: v1
 data:
     APP_ENV: production
-    DEBUG_MODE: "true"
     LOG_LEVEL: info
     MAX_CONNECTIONS: "100"

~ CHANGED Deployment/demo-app (namespace: kube-diff-demo)
--- cluster
+++ local
@@ -6,7 +6,7 @@
     name: demo-app
     namespace: kube-diff-demo
 spec:
-    replicas: 5
+    replicas: 2
     selector:

✓ OK     Namespace/kube-diff-demo
✓ OK     Service/demo-app (namespace: kube-diff-demo)

Summary: 4 resources — 2 changed, 2 unchanged

The manually added DEBUG_MODE and the changed replicas: 5 are accurately detected.
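kube-diff reports these differences as a unified diff of the rendered YAML. For a flat string map such as ConfigMap data, the same information can be derived with a much simpler key-level comparison, sketched below. The mapDiff type and diffData function are hypothetical and are not how kube-diff computes its output.

```go
package main

import (
	"fmt"
	"sort"
)

// mapDiff lists keys by how they differ between local and cluster state.
type mapDiff struct {
	OnlyInCluster []string // present in the cluster, missing locally
	OnlyInLocal   []string // present locally, missing in the cluster
	Different     []string // present in both with different values
}

// diffData compares two flat string maps key by key.
func diffData(local, cluster map[string]string) mapDiff {
	var d mapDiff
	for key, clusterValue := range cluster {
		localValue, ok := local[key]
		switch {
		case !ok:
			d.OnlyInCluster = append(d.OnlyInCluster, key)
		case localValue != clusterValue:
			d.Different = append(d.Different, key)
		}
	}
	for key := range local {
		if _, ok := cluster[key]; !ok {
			d.OnlyInLocal = append(d.OnlyInLocal, key)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(d.OnlyInCluster)
	sort.Strings(d.OnlyInLocal)
	sort.Strings(d.Different)
	return d
}

func main() {
	local := map[string]string{
		"APP_ENV": "production", "LOG_LEVEL": "info", "MAX_CONNECTIONS": "100",
	}
	cluster := map[string]string{
		"APP_ENV": "production", "LOG_LEVEL": "info", "MAX_CONNECTIONS": "100",
		"DEBUG_MODE": "true", // added by the kubectl patch in Phase 3
	}
	fmt.Printf("%+v\n", diffData(local, cluster))
}
```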

Phase 7 — JSON Output

{
  "total": 4,
  "changed": 2,
  "new": 0,
  "deleted": 0,
  "unchanged": 2,
  "resources": [
    {
      "kind": "ConfigMap",
      "name": "demo-config",
      "namespace": "kube-diff-demo",
      "status": "changed"
    },
    {
      "kind": "Deployment",
      "name": "demo-app",
      "namespace": "kube-diff-demo",
      "status": "changed"
    },
    {
      "kind": "Namespace",
      "name": "kube-diff-demo",
      "status": "unchanged"
    },
    {
      "kind": "Service",
      "name": "demo-app",
      "namespace": "kube-diff-demo",
      "status": "unchanged"
    }
  ]
}
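A report in this shape is straightforward to consume from other tooling. The sketch below decodes it with encoding/json and lists every resource whose status is not "unchanged". The struct and function names are my own, and the field set is inferred from the sample above rather than from a published schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resourceStatus matches one entry of the "resources" array shown above.
type resourceStatus struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
	Status    string `json:"status"`
}

// report matches the top-level fields of the sample JSON output.
type report struct {
	Total     int              `json:"total"`
	Changed   int              `json:"changed"`
	New       int              `json:"new"`
	Deleted   int              `json:"deleted"`
	Unchanged int              `json:"unchanged"`
	Resources []resourceStatus `json:"resources"`
}

// drifted returns "Kind/Name" for every resource that is not unchanged.
func drifted(data []byte) ([]string, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return nil, fmt.Errorf("parse report: %w", err)
	}
	var out []string
	for _, res := range r.Resources {
		if res.Status != "unchanged" {
			out = append(out, res.Kind+"/"+res.Name)
		}
	}
	return out, nil
}

func main() {
	sample := []byte(`{
	  "total": 2, "changed": 1, "new": 0, "deleted": 0, "unchanged": 1,
	  "resources": [
	    {"kind": "ConfigMap", "name": "demo-config", "namespace": "kube-diff-demo", "status": "changed"},
	    {"kind": "Namespace", "name": "kube-diff-demo", "status": "unchanged"}
	  ]
	}`)
	names, err := drifted(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```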

Phase 7 — Markdown Output

## kube-diff Report

**4** resources — **2** changed, 2 unchanged

| Status | Resource | Namespace |
|--------|----------|-----------|
| CHANGED | ConfigMap/demo-config | kube-diff-demo |
| CHANGED | Deployment/demo-app | kube-diff-demo |
| OK | Namespace/kube-diff-demo | - |
| OK | Service/demo-app | kube-diff-demo |


5. GitHub Action Implementation


5.1 Why Composite Action?

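kube-diff-action is implemented as a Composite Action rather than a Docker Action. The reasoning is straightforward: the action only needs to download and run an existing CLI binary, so there is no container image to build and startup is fast.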


5.2 action.yml


5.3 install.sh — Binary Installation

install.sh detects the OS and architecture automatically and downloads the latest binary from GitHub Releases.

Note
  • With set -euo pipefail, an unset VERSION environment variable causes an unbound variable error. The VERSION="${VERSION:-latest}" default is essential.
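The script itself is shell, but the two decisions it makes (fall back to latest when VERSION is empty, and pick an asset by OS and architecture) can be illustrated in Go. This is a sketch under assumptions: the asset pattern is generalized from the single kube-diff_linux_amd64.tar.gz name shown in the installation section, and the pinned-version path follows GitHub's usual releases/download/<tag>/ layout; neither is confirmed for every platform or release.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

const repoURL = "https://github.com/somaz94/kube-diff"

// versionOrLatest mirrors the shell default VERSION="${VERSION:-latest}":
// an unset or empty value falls back to "latest".
func versionOrLatest(v string) string {
	if v == "" {
		return "latest"
	}
	return v
}

// downloadURL builds the release asset URL for a platform.
// Assumption: every platform uses the kube-diff_<os>_<arch>.tar.gz pattern.
func downloadURL(version, goos, goarch string) string {
	asset := fmt.Sprintf("kube-diff_%s_%s.tar.gz", goos, goarch)
	if versionOrLatest(version) == "latest" {
		return repoURL + "/releases/latest/download/" + asset
	}
	return repoURL + "/releases/download/" + version + "/" + asset
}

func main() {
	// os.Getenv returns "" for an unset variable, so no crash like `set -u`.
	fmt.Println(downloadURL(os.Getenv("VERSION"), runtime.GOOS, runtime.GOARCH))
}
```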


5.4 run.sh — Execution & Output Setup

#!/usr/bin/env bash
set -euo pipefail

# Build command
CMD="kube-diff ${INPUT_SOURCE} ${INPUT_PATH}"

# Helm-specific flags
if [[ "${INPUT_SOURCE}" == "helm" ]]; then
  if [[ -n "${INPUT_VALUES}" ]]; then
    : # Add values files (omitted here)
  fi
  [[ -n "${INPUT_RELEASE}" ]] && CMD+=" -r ${INPUT_RELEASE}"
fi

# Global flags
[[ -n "${INPUT_NAMESPACE}" ]] && CMD+=" -n ${INPUT_NAMESPACE}"
[[ -n "${INPUT_KIND}" ]] && CMD+=" -k ${INPUT_KIND}"
[[ -n "${INPUT_SELECTOR}" ]] && CMD+=" -l ${INPUT_SELECTOR}"

# Execute & capture result
set +e
RESULT=$(eval "${CMD}" 2>&1)
EXIT_CODE=$?
set -e

# Set GitHub Output (multiline)
{
  echo "result<<KUBE_DIFF_EOF"
  echo "${RESULT}"
  echo "KUBE_DIFF_EOF"
} >> "${GITHUB_OUTPUT}"

# Exit code 0 (no changes), 1 (changes detected) → success / 2 (error) → failure
if [[ ${EXIT_CODE} -eq 2 ]]; then
  exit 1
fi
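The heredoc-style block written to GITHUB_OUTPUT follows GitHub's multiline syntax: name<<DELIMITER, the value, then the delimiter on its own line. Here is a small Go sketch of the same formatting, with one extra safety check that the script above does not perform: it refuses a value that contains the delimiter as a line of its own, since that would end the value early. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// formatMultilineOutput renders one multiline entry for the GITHUB_OUTPUT file.
// It returns an error if any line of value equals the delimiter.
func formatMultilineOutput(name, value, delimiter string) (string, error) {
	for _, line := range strings.Split(value, "\n") {
		if line == delimiter {
			return "", fmt.Errorf("value contains delimiter line %q", delimiter)
		}
	}
	return fmt.Sprintf("%s<<%s\n%s\n%s\n", name, delimiter, value, delimiter), nil
}

func main() {
	out, err := formatMultilineOutput("result", "line one\nline two", "KUBE_DIFF_EOF")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```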


5.5 comment.sh — PR Comment

comment.sh embeds a <!-- kube-diff-action --> marker in the comment body. If a comment containing the marker already exists on the PR, it is updated; otherwise a new one is created.
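The update-or-create decision comes down to searching existing comment bodies for the marker. The sketch below shows that logic in Go with an in-memory list; the comment type and both helpers are hypothetical, and the GitHub API calls that would fetch, create, or update comments are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is the hidden HTML comment that identifies the action's PR comment.
const marker = "<!-- kube-diff-action -->"

// comment is a simplified stand-in for a PR comment returned by an API.
type comment struct {
	ID   int64
	Body string
}

// findMarked returns the ID of the first comment containing the marker.
func findMarked(comments []comment) (int64, bool) {
	for _, c := range comments {
		if strings.Contains(c.Body, marker) {
			return c.ID, true
		}
	}
	return 0, false
}

// withMarker prefixes the report so a later run can find the comment again.
func withMarker(report string) string {
	return marker + "\n" + report
}

func main() {
	existing := []comment{
		{ID: 1, Body: "LGTM"},
		{ID: 2, Body: withMarker("## kube-diff Report")},
	}
	if id, ok := findMarked(existing); ok {
		fmt.Println("update comment", id)
	} else {
		fmt.Println("create new comment")
	}
}
```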


5.6 kube-diff-action Usage Example

- name: Check drift
  id: diff
  uses: somaz94/kube-diff-action@v1
  with:
    source: file
    path: ./manifests/
    namespace: production
    output: markdown

- name: Fail if drift
  if: steps.diff.outputs.has-changes == 'true'
  run: |
    echo "::error::Drift detected — review the PR comment for details"
    exit 1


6. CI/CD Pipeline


6.1 kube-diff CI

# .github/workflows/ci.yml (excerpt)
test-unit:
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
    - run: go test ./... -v -race -cover

e2e:
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-go@v5
    - uses: helm/kind-action@v1      # Create kind cluster
    - run: make build
    - run: make demo-all             # Run full demo


6.2 kube-diff-action CI

Tests file/helm modes in a real Kubernetes environment using a kind cluster:

# .github/workflows/ci.yml (excerpt)
test-action-file:
  steps:
    - uses: actions/checkout@v4
    - name: Setup kind cluster
      uses: helm/kind-action@v1
    - name: Apply test manifests
      run: kubectl apply -f /tmp/test-manifests/
    - name: Run kube-diff action
      id: diff
      uses: ./                       # Test local action
      with:
        source: file
        path: /tmp/test-manifests/
        namespace: default
        comment: 'false'


6.3 Smoke Test (Released Action)

A workflow that verifies the released somaz94/kube-diff-action@v1 works correctly:

# .github/workflows/use-action.yml
on:
  workflow_dispatch:
  workflow_run:
    workflows: ["Create release"]
    types: [completed]

jobs:
  smoke-test-file:
    steps:
      - uses: somaz94/kube-diff-action@v1  # Use released version
        with:
          source: file
          path: /tmp/test-manifests/
          namespace: default
          comment: 'false'


7. Distribution — GoReleaser, Homebrew, Krew


7.1 GoReleaser Configuration


When a tag is pushed, GoReleaser automatically:

  1. Builds binaries for 6 platforms (linux/darwin/windows × amd64/arm64)
  2. Creates a GitHub Release with attached binaries
  3. Updates the Homebrew tap (somaz94/homebrew-tap)
  4. Updates the Krew plugin index (somaz94/krew-index)


7.2 kube-diff-action Version Management

Since kube-diff-action downloads the latest kube-diff binary at runtime in install.sh, users automatically get the newest version on their next run when kube-diff is upgraded.

The v1 tag is kept pointing to the latest release commit using major-tag-action, so somaz94/kube-diff-action@v1 always resolves to the newest v1.x release.
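The core of a moving major tag is deriving "v1" from a release tag such as "v1.4.2" and then re-pointing that tag at the new commit. The derivation step can be sketched as follows; majorTag is a hypothetical helper written for this post, not code from major-tag-action.

```go
package main

import (
	"fmt"
	"strings"
)

// majorTag extracts the moving major tag ("v1") from a release tag ("v1.4.2").
func majorTag(tag string) (string, error) {
	if !strings.HasPrefix(tag, "v") {
		return "", fmt.Errorf("tag %q does not start with v", tag)
	}
	major, _, _ := strings.Cut(strings.TrimPrefix(tag, "v"), ".")
	if major == "" {
		return "", fmt.Errorf("tag %q has no major version", tag)
	}
	for _, r := range major {
		if r < '0' || r > '9' {
			return "", fmt.Errorf("tag %q has a non-numeric major version", tag)
		}
	}
	return "v" + major, nil
}

func main() {
	for _, tag := range []string{"v1.4.2", "v12.0.0", "1.0.0"} {
		major, err := majorTag(tag)
		fmt.Println(tag, "->", major, err)
	}
}
```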


8. Real-World Use Cases


CLI Usage

| Scenario | Command |
|---|---|
| Pre-deploy drift check | kube-diff file ./manifests/ -n production |
| Post-incident audit | kube-diff file ./manifests/ -n production -o json > drift-report.json |
| GitOps sync verification | kube-diff kustomize ./overlays/production -n production |
| Multi-cluster comparison | kube-diff file ./manifests/ --context prod-cluster -n app |
| Helm upgrade preview | kube-diff helm ./chart/ -f values-prod.yaml -r my-release -n production |


GitHub Action Usage

PR Drift Gate: automatically check drift on PRs that change manifests.

Scheduled Drift Monitoring: check drift every 6 hours and send Slack alerts.


Conclusion

Building kube-diff taught several valuable lessons:

  1. Kubernetes default value normalization is critical — Naively comparing local and cluster resources generates massive false positives. Defaults like progressDeadlineSeconds, revisionHistoryLimit, and dnsPolicy must be accurately removed per Kind to produce meaningful diffs.

  2. Exit code design matters — Without distinguishing “changes detected” from “error” in CI, workflows fail on every run. The exit code 0/1/2 scheme cleanly separates these cases.

  3. Composite Actions are lightweight and fast — When leveraging an existing CLI binary, Composite Actions are far more practical than Docker Actions. No build time — just download the binary and go.

  4. Runtime latest-version download strategy — Since kube-diff-action downloads the kube-diff binary at runtime, upgrading the CLI automatically gives Action users the latest features.

As of v0.2.1, kube-diff supports file/helm/kustomize comparison, namespace/kind/label filtering, and color/plain/json/markdown output. If you want to quickly detect drift in your Kubernetes clusters and automate monitoring in CI, give it a try.


