Who Ran kubectl edit? — Building a Kubernetes Cluster Drift Detection Tool
Detect and report Kubernetes cluster drift with kube-diff CLI and GitHub Action
Overview
When operating Kubernetes clusters, a common problem arises: the actual cluster state diverges from the manifests defined in Git — known as Cluster Drift. Someone uses kubectl edit to manually change replicas, or kubectl scale to quickly adjust capacity, and suddenly the Git source and cluster state are out of sync.
While kubectl diff can compare manifests, it only accepts already-rendered input (plain YAML files, or a Kustomize directory through its -k flag), cannot render Helm charts on its own, and produces raw unified diff output that is difficult to read.
This post introduces two tools built to solve these problems:
- kube-diff — A Go CLI tool that compares plain YAML, Helm charts, and Kustomize overlays against live cluster state
- kube-diff-action — A GitHub Actions Composite Action that integrates kube-diff into CI/CD pipelines
| | kubectl diff | kube-diff |
|---|---|---|
| Input | YAML files only | Helm / Kustomize / plain YAML |
| Output | Raw unified diff | Per-resource colorized diff + summary |
| New resources | Full content dump | NEW label |
| Deleted detection | Not supported | Detects resources only in cluster |
| CI integration | Exit code only | JSON / Markdown report output |
| Filtering | None | Namespace, kind, label selector filter |
Table of Contents
- 1. Project Structure
- 2. Core Design Decisions
- 3. Sample Application
- 4. CLI Usage and Demo
- 5. GitHub Action Implementation
- 6. CI/CD Pipeline
- 7. Distribution — GoReleaser, Homebrew, Krew
- 8. Real-World Use Cases
- Conclusion
- References
1. Project Structure
kube-diff (CLI)
kube-diff/
├── cmd/
│   ├── main.go              # Entry point
│   └── cli/
│       ├── root.go          # Cobra root command
│       ├── file.go          # file subcommand
│       ├── helm.go          # helm subcommand
│       ├── kustomize.go     # kustomize subcommand
│       ├── version.go       # version subcommand
│       └── run.go           # Shared comparison logic
├── internal/
│   ├── source/              # Manifest loaders (file, helm, kustomize)
│   ├── cluster/             # K8s dynamic client fetcher
│   ├── diff/                # Normalization & unified diff
│   └── report/              # Color/JSON/Markdown output
├── examples/
│   ├── file/                # Plain YAML examples
│   ├── helm/                # Helm chart examples
│   └── kustomize/           # Kustomize overlay examples
├── scripts/
│   ├── demo.sh              # Demo script
│   └── demo-clean.sh        # Demo cleanup
├── .goreleaser.yml          # GoReleaser config
└── Makefile
kube-diff-action (GitHub Action)
kube-diff-action/
├── action.yml               # Composite Action definition
├── scripts/
│   ├── install.sh           # kube-diff binary installation
│   ├── run.sh               # kube-diff execution & output setup
│   └── comment.sh           # PR comment create/update
└── .github/workflows/
    ├── ci.yml               # CI (ShellCheck, kind cluster tests)
    ├── release.yml          # Release automation
    └── use-action.yml       # Smoke Test (released action verification)
2. Core Design Decisions
2.1 Kubernetes Dynamic Client
Instead of depending on specific resource types, kube-diff uses a dynamic client with unstructured.Unstructured to handle any kind of Kubernetes resource.
// internal/cluster/fetcher.go
func (f *Fetcher) Get(ctx context.Context, apiVersion, kind, namespace, name string) (*unstructured.Unstructured, error) {
	gvr, err := f.resolveGVR(apiVersion, kind)
	if err != nil {
		return nil, err
	}
	var resource *unstructured.Unstructured
	if namespace != "" {
		resource, err = f.client.Resource(gvr).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
	} else {
		resource, err = f.client.Resource(gvr).Get(ctx, name, metav1.GetOptions{})
	}
	return resource, err
}
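The post does not show resolveGVR, so the following is only a rough, standard-library sketch of what resolving a group/version/resource involves. The GVR struct and the kindToResource table are hypothetical stand-ins; a real tool would more likely ask the API server's discovery endpoint for the plural resource name.

```go
package main

import (
	"fmt"
	"strings"
)

// GVR is a local stand-in for a group/version/resource triple.
// It is not the client-go type.
type GVR struct {
	Group, Version, Resource string
}

// kindToResource is a hypothetical static lookup used only in this sketch.
var kindToResource = map[string]string{
	"ConfigMap":  "configmaps",
	"Deployment": "deployments",
	"Namespace":  "namespaces",
	"Service":    "services",
}

// resolveGVR splits apiVersion into group and version and looks up the
// plural resource name for kind. "v1" alone means the core (empty) group.
func resolveGVR(apiVersion, kind string) (GVR, error) {
	group, version := "", apiVersion
	if i := strings.LastIndex(apiVersion, "/"); i >= 0 {
		group, version = apiVersion[:i], apiVersion[i+1:]
	}
	resource, ok := kindToResource[kind]
	if !ok {
		return GVR{}, fmt.Errorf("unknown kind %q", kind)
	}
	return GVR{Group: group, Version: version, Resource: resource}, nil
}

func main() {
	gvr, err := resolveGVR("apps/v1", "Deployment")
	fmt.Println(gvr, err)
}
```

The split between "apps/v1" and the bare "v1" is the part that carries over to real clusters; the lookup table is the part to replace.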
2.2 Shared Comparison Logic
All three subcommands — file, helm, and kustomize — share the same runDiff() function. Only the Source interface implementation differs.
// cmd/cli/run.go (core flow, excerpt)
func runDiff(cmd *cobra.Command, src source.Source) error {
	// 1. Load local resources
	resources, err := src.Load()
	// 2. Filter (namespace, kind, label selector)
	// ...
	// 3. Fetch corresponding resources from cluster
	fetcher, _ := cluster.NewFetcher(kubeconfig, kubeContext)
	for _, r := range resources {
		clusterObj, err := fetcher.Get(ctx, r.APIVersion, r.Kind, r.Namespace, r.Name)
		// 4. Compare diff
		result, _ := diff.Compare(r.Object, clusterObj)
		results = append(results, result)
	}
	// 5. Output report (color, plain, json, markdown)
	summary := report.NewSummary(results)
	// ...
}
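To make the "only the Source differs" idea concrete, here is a minimal sketch of such an interface. The Load method and the Resource field names come from the excerpt above; the return type, the map-based Object, and the staticSource loader are assumptions made for illustration.

```go
package main

import "fmt"

// Resource holds the fields runDiff reads from each loaded manifest.
// Object is a plain map here; the real type is not shown in the post.
type Resource struct {
	APIVersion, Kind, Namespace, Name string
	Object                            map[string]any
}

// Source is the one method every subcommand's loader has to provide.
type Source interface {
	Load() ([]Resource, error)
}

// staticSource is a hypothetical in-memory loader for illustration only.
type staticSource struct{ resources []Resource }

func (s staticSource) Load() ([]Resource, error) { return s.resources, nil }

// describe plays the role of the shared flow: it never needs to know
// whether the resources came from files, a chart, or an overlay.
func describe(src Source) ([]string, error) {
	resources, err := src.Load()
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(resources))
	for _, r := range resources {
		out = append(out, r.Kind+"/"+r.Name)
	}
	return out, nil
}

func main() {
	src := staticSource{resources: []Resource{
		{APIVersion: "v1", Kind: "ConfigMap", Namespace: "kube-diff-demo", Name: "demo-config"},
		{APIVersion: "apps/v1", Kind: "Deployment", Namespace: "kube-diff-demo", Name: "demo-app"},
	}}
	names, err := describe(src)
	fmt.Println(names, err)
}
```

Adding a fourth input type under this design would mean writing one more Load implementation and leaving the comparison code alone.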
2.3 Kubernetes Default Value Normalization
Kubernetes automatically injects various default values into resources. Comparing these directly would generate a massive number of false positives.
For example, when a Deployment is applied, the cluster automatically adds fields like:
- spec.progressDeadlineSeconds: 600
- spec.revisionHistoryLimit: 10
- spec.strategy.type: RollingUpdate
- spec.template.spec.dnsPolicy: ClusterFirst
- spec.template.spec.restartPolicy: Always
- Container terminationMessagePath, terminationMessagePolicy
- Container port protocol: TCP
- And more…
These defaults are normalized and removed per Kind:
// internal/diff/normalize.go
func Normalize(obj *unstructured.Unstructured) *unstructured.Unstructured {
	// Remove common metadata (managedFields, uid, resourceVersion, etc.)
	// ...
	// Remove Kind-specific defaults
	switch kind {
	case "Deployment", "StatefulSet":
		normalizeDeploymentSpec(spec) // progressDeadlineSeconds, strategy, etc.
	case "Service":
		normalizeServiceSpec(spec) // clusterIP, sessionAffinity, etc.
	case "Namespace":
		normalizeNamespaceSpec(obj) // spec.finalizers, etc.
	case "Pod":
		normalizePodSpec(spec)
	case "Job":
		normalizeJobSpec(spec)
	case "DaemonSet":
		normalizeDaemonSetSpec(spec)
	}
	return normalized
}
Thanks to this normalization, only meaningful differences appear in the diff output.
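The bodies of the normalize helpers are not shown, so here is one possible approach on plain maps, assuming a default is dropped only when the value equals the known default. The stripDefaults name and the deploymentDefaults table are hypothetical; the two default values are the ones listed earlier in this section.

```go
package main

import "fmt"

// deploymentDefaults holds two of the Deployment spec defaults that the
// cluster fills in when the manifest omits them.
var deploymentDefaults = map[string]any{
	"progressDeadlineSeconds": 600,
	"revisionHistoryLimit":    10,
}

// stripDefaults deletes keys whose value equals the known default, so a
// field that was explicitly set to something else survives the cleanup.
func stripDefaults(spec, defaults map[string]any) {
	for key, def := range defaults {
		if v, ok := spec[key]; ok && v == def {
			delete(spec, key)
		}
	}
}

func main() {
	clusterSpec := map[string]any{
		"replicas":                2,
		"progressDeadlineSeconds": 600,
		"revisionHistoryLimit":    3, // explicitly set, not the default
	}
	stripDefaults(clusterSpec, deploymentDefaults)
	fmt.Println(clusterSpec)
}
```

Applying the same function to both the local and the cluster object keeps the comparison symmetric: a default that appears on only one side stops showing up as drift.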
2.4 Exit Code Design
| Code | Meaning |
|---|---|
| 0 | No changes detected |
| 1 | Changes detected |
| 2 | Error occurred |
In CI, exit code 1 means “drift exists” — not an error. The GitHub Action treats both exit code 0 and 1 as success, and only considers 2 as failure.
3. Sample Application
Here are the sample resources used in the demo.
3.1 Plain YAML (file mode)
Namespace
# examples/file/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kube-diff-demo
ConfigMap
# examples/file/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
  namespace: kube-diff-demo
data:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
Deployment
# examples/file/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: kube-diff-demo
  labels:
    app: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: app
          image: nginx:1.25
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
          envFrom:
            - configMapRef:
                name: demo-config
Service
# examples/file/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-app
  namespace: kube-diff-demo
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: ClusterIP
3.2 Helm Chart (helm mode)
Chart.yaml
# examples/helm/demo-chart/Chart.yaml
apiVersion: v2
name: demo-chart
description: Demo Helm chart for kube-diff examples
version: 0.1.0
type: application
values.yaml
# examples/helm/demo-chart/values.yaml
replicaCount: 2
image: nginx:1.25
namespace: kube-diff-demo
config:
  APP_ENV: production
  LOG_LEVEL: info
  MAX_CONNECTIONS: "100"
templates/deployment.yaml
templates/configmap.yaml
templates/service.yaml
values-drift.yaml (intentionally different values)
# examples/helm/values-drift.yaml
replicaCount: 3
image: nginx:1.26
namespace: kube-diff-demo
config:
  APP_ENV: staging
  LOG_LEVEL: debug
  MAX_CONNECTIONS: "200"
  NEW_KEY: "added"
3.3 Kustomize (kustomize mode)
base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-diff-demo
resources:
  - configmap.yaml
  - deployment.yaml
  - service.yaml
overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-diff-demo
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: demo-app
    patch: |
      - op: replace
        path: /spec/replicas
        value: 3
      - op: replace
        path: /spec/template/spec/containers/0/image
        value: nginx:1.26
  - target:
      kind: ConfigMap
      name: demo-config
    patch: |
      - op: replace
        path: /data/LOG_LEVEL
        value: debug
      - op: add
        path: /data/NEW_KEY
        value: added
4. CLI Usage and Demo
4.1 Installation
# Homebrew
brew install somaz94/tap/kube-diff
# Krew (kubectl plugin)
kubectl krew install diff2
# Binary
curl -sL https://github.com/somaz94/kube-diff/releases/latest/download/kube-diff_linux_amd64.tar.gz | tar xz
sudo mv kube-diff /usr/local/bin/
# From source
go install github.com/somaz94/kube-diff/cmd@latest
4.2 Basic Usage
# Plain YAML comparison
kube-diff file ./manifests/ -n production
# Helm chart comparison
kube-diff helm ./my-chart --values values-prod.yaml --release my-release -n production
# Kustomize overlay comparison
kube-diff kustomize ./overlays/production -n production
4.3 Filtering
# Kind filter
kube-diff file ./manifests/ -n production -k Deployment,Service
# Label selector filter
kube-diff file ./manifests/ -n production -l app=nginx,env=prod
# Combined
kube-diff file ./manifests/ -n production -k Deployment -l app=nginx
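How kube-diff implements these filters is not shown; the sketch below is one plausible reading of the -k and -l flags, assuming exact kind matching and equality-only label terms. The Resource type and both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Resource is a trimmed-down stand-in carrying only what the filters need.
type Resource struct {
	Kind, Name string
	Labels     map[string]string
}

// parseSelector turns "app=nginx,env=prod" into a map. Only equality
// terms are handled; set-based selectors are out of scope for this sketch.
func parseSelector(selector string) (map[string]string, error) {
	want := map[string]string{}
	if selector == "" {
		return want, nil
	}
	for _, term := range strings.Split(selector, ",") {
		key, value, ok := strings.Cut(term, "=")
		if !ok || strings.TrimSpace(key) == "" {
			return nil, fmt.Errorf("invalid selector term %q", term)
		}
		want[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return want, nil
}

// matches reports whether r passes both filters. An empty kinds list or an
// empty selector matches everything.
func matches(r Resource, kinds []string, want map[string]string) bool {
	if len(kinds) > 0 {
		found := false
		for _, k := range kinds {
			if k == r.Kind {
				found = true
				break
			}
		}
		if !found {
			return false
		}
	}
	for key, value := range want {
		if r.Labels[key] != value {
			return false
		}
	}
	return true
}

func main() {
	want, err := parseSelector("app=nginx,env=prod")
	if err != nil {
		panic(err)
	}
	r := Resource{Kind: "Deployment", Name: "web", Labels: map[string]string{"app": "nginx", "env": "prod"}}
	fmt.Println(matches(r, []string{"Deployment", "Service"}, want))
}
```

Both filters narrow the result, so combining -k and -l behaves like a logical AND in this sketch.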
4.4 Output Formats
# Colorized (default)
kube-diff file ./manifests/ -n production
# JSON
kube-diff file ./manifests/ -n production -o json
# Markdown
kube-diff file ./manifests/ -n production -o markdown
# Summary only
kube-diff file ./manifests/ -n production -s
4.5 Demo Results
Run the full demo with make demo-all. Below are the key results from each phase.
Phase 2 — No Drift
Right after deploying manifests to the cluster, all resources show as unchanged:
✓ OK ConfigMap/demo-config (namespace: kube-diff-demo)
✓ OK Deployment/demo-app (namespace: kube-diff-demo)
✓ OK Namespace/kube-diff-demo
✓ OK Service/demo-app (namespace: kube-diff-demo)
Summary: 4 resources — 4 unchanged
Phase 3 — Manual Changes on Cluster
# Manually change replicas
kubectl scale deploy/demo-app --replicas=5 -n kube-diff-demo
# Add a new key to ConfigMap
kubectl patch configmap demo-config -n kube-diff-demo --type merge -p '{"data":{"DEBUG_MODE":"true"}}'
Phase 4 — Drift Detected
~ CHANGED ConfigMap/demo-config (namespace: kube-diff-demo)
--- cluster
+++ local
@@ -1,7 +1,6 @@
apiVersion: v1
data:
APP_ENV: production
- DEBUG_MODE: "true"
LOG_LEVEL: info
MAX_CONNECTIONS: "100"
~ CHANGED Deployment/demo-app (namespace: kube-diff-demo)
--- cluster
+++ local
@@ -6,7 +6,7 @@
name: demo-app
namespace: kube-diff-demo
spec:
- replicas: 5
+ replicas: 2
selector:
✓ OK Namespace/kube-diff-demo
✓ OK Service/demo-app (namespace: kube-diff-demo)
Summary: 4 resources — 2 changed, 2 unchanged
The manually added DEBUG_MODE and the changed replicas: 5 are accurately detected.
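Underneath the report, each resource ends up in one of four buckets: changed, new, deleted, or unchanged. The sketch below shows one way to do that classification on already-normalized objects keyed by "Kind/name". The classify function and the map-based representation are assumptions for illustration, not the tool's actual diff package.

```go
package main

import (
	"fmt"
	"reflect"
)

// classify compares local manifests with cluster objects and returns a
// status per key. Objects present only in the cluster count as deleted
// from the local source.
func classify(local, cluster map[string]map[string]any) map[string]string {
	status := map[string]string{}
	for key, want := range local {
		got, ok := cluster[key]
		switch {
		case !ok:
			status[key] = "new"
		case reflect.DeepEqual(want, got):
			status[key] = "unchanged"
		default:
			status[key] = "changed"
		}
	}
	for key := range cluster {
		if _, ok := local[key]; !ok {
			status[key] = "deleted"
		}
	}
	return status
}

func main() {
	local := map[string]map[string]any{
		"Deployment/demo-app": {"replicas": 2},
		"Service/demo-app":    {"port": 80},
	}
	cluster := map[string]map[string]any{
		"Deployment/demo-app": {"replicas": 5},
		"Service/demo-app":    {"port": 80},
	}
	fmt.Println(classify(local, cluster))
}
```

A structural comparison like this only answers whether something differs; rendering the unified diff for the changed resources is a separate step.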
Phase 7 — JSON Output
{
"total": 4,
"changed": 2,
"new": 0,
"deleted": 0,
"unchanged": 2,
"resources": [
{
"kind": "ConfigMap",
"name": "demo-config",
"namespace": "kube-diff-demo",
"status": "changed"
},
{
"kind": "Deployment",
"name": "demo-app",
"namespace": "kube-diff-demo",
"status": "changed"
},
{
"kind": "Namespace",
"name": "kube-diff-demo",
"status": "unchanged"
},
{
"kind": "Service",
"name": "demo-app",
"namespace": "kube-diff-demo",
"status": "unchanged"
}
]
}
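Because the JSON report has a stable shape, downstream tooling can consume it directly. The following sketch decodes a report with the field names shown above and lists every resource that is not unchanged; the Go type names and the driftedResources helper are my own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceStatus matches one entry of the "resources" array.
type ResourceStatus struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace,omitempty"`
	Status    string `json:"status"`
}

// Report matches the top-level JSON object.
type Report struct {
	Total     int              `json:"total"`
	Changed   int              `json:"changed"`
	New       int              `json:"new"`
	Deleted   int              `json:"deleted"`
	Unchanged int              `json:"unchanged"`
	Resources []ResourceStatus `json:"resources"`
}

// driftedResources parses a report and returns "Kind/name" for every
// resource whose status is not "unchanged".
func driftedResources(data []byte) ([]string, error) {
	var report Report
	if err := json.Unmarshal(data, &report); err != nil {
		return nil, err
	}
	var drifted []string
	for _, r := range report.Resources {
		if r.Status != "unchanged" {
			drifted = append(drifted, r.Kind+"/"+r.Name)
		}
	}
	return drifted, nil
}

func main() {
	sample := []byte(`{"total":2,"changed":1,"new":0,"deleted":0,"unchanged":1,"resources":[
		{"kind":"ConfigMap","name":"demo-config","namespace":"kube-diff-demo","status":"changed"},
		{"kind":"Namespace","name":"kube-diff-demo","status":"unchanged"}]}`)
	drifted, err := driftedResources(sample)
	fmt.Println(drifted, err)
}
```

This is the kind of glue a post-incident audit script might use after redirecting the -o json output to a file.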
Phase 7 — Markdown Output
## kube-diff Report
**4** resources — **2** changed, 2 unchanged
| Status | Resource | Namespace |
|--------|----------|-----------|
| CHANGED | ConfigMap/demo-config | kube-diff-demo |
| CHANGED | Deployment/demo-app | kube-diff-demo |
| OK | Namespace/kube-diff-demo | - |
| OK | Service/demo-app | kube-diff-demo |
5. GitHub Action Implementation
5.1 Why Composite Action?
kube-diff-action is implemented as a Composite Action rather than a Docker Action. The reasoning is straightforward:
- All it does is download and run the kube-diff binary
- No need to build or maintain a Docker image
- Direct access to the runner’s kubeconfig and kubectl
- Faster execution
5.2 action.yml
5.3 install.sh — Binary Installation
Automatically detects OS/Arch and downloads the latest binary from GitHub Releases.
- With set -euo pipefail, an unset VERSION environment variable causes an unbound variable error. The VERSION="${VERSION:-latest}" default is essential.
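The install script itself is not reproduced here, so the sketch below only illustrates the two decisions it has to make: normalizing the machine architecture and choosing a download URL. The asset naming pattern is extrapolated from the single linux_amd64 URL in the installation section, and the tagged-release branch is an assumption about how pinned versions would be fetched.

```go
package main

import (
	"fmt"
	"runtime"
)

// normalizeArch maps uname -m style names onto Go-style arch names.
func normalizeArch(machine string) string {
	switch machine {
	case "x86_64":
		return "amd64"
	case "aarch64":
		return "arm64"
	default:
		return machine
	}
}

// downloadURL builds a release asset URL. An empty version or "latest"
// uses GitHub's latest-release redirect; anything else is treated as a
// tag (assumed layout, not confirmed by the post).
func downloadURL(version, goos, goarch string) string {
	asset := fmt.Sprintf("kube-diff_%s_%s.tar.gz", goos, goarch)
	base := "https://github.com/somaz94/kube-diff/releases"
	if version == "" || version == "latest" {
		return base + "/latest/download/" + asset
	}
	return base + "/download/" + version + "/" + asset
}

func main() {
	fmt.Println(downloadURL("latest", runtime.GOOS, normalizeArch(runtime.GOARCH)))
}
```

Defaulting an empty version to "latest" plays the same role as the VERSION="${VERSION:-latest}" guard in the shell script.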
5.4 run.sh — Execution & Output Setup
#!/usr/bin/env bash
set -euo pipefail
# Build command
CMD="kube-diff ${INPUT_SOURCE} ${INPUT_PATH}"
# Helm-specific flags
if [[ "${INPUT_SOURCE}" == "helm" ]]; then
  # Add values files. The original elides this step; a single file passed
  # through --values is assumed here.
  [[ -n "${INPUT_VALUES}" ]] && CMD+=" --values ${INPUT_VALUES}"
  [[ -n "${INPUT_RELEASE}" ]] && CMD+=" -r ${INPUT_RELEASE}"
fi
# Global flags
[[ -n "${INPUT_NAMESPACE}" ]] && CMD+=" -n ${INPUT_NAMESPACE}"
[[ -n "${INPUT_KIND}" ]] && CMD+=" -k ${INPUT_KIND}"
[[ -n "${INPUT_SELECTOR}" ]] && CMD+=" -l ${INPUT_SELECTOR}"
# Execute & capture result (errexit is disabled so exit code 1 does not abort)
set +e
RESULT=$(eval "${CMD}" 2>&1)
EXIT_CODE=$?
set -e
# Set GitHub Output (multiline)
{
  echo "result<<KUBE_DIFF_EOF"
  echo "${RESULT}"
  echo "KUBE_DIFF_EOF"
} >> "${GITHUB_OUTPUT}"
# Exit code 0 (no changes), 1 (changes detected) → success / 2 (error) → failure
if [[ ${EXIT_CODE} -eq 2 ]]; then
  exit 1
fi
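The three echo lines use the delimiter form that GitHub Actions accepts for multiline values in the GITHUB_OUTPUT file. As a small illustration, the same formatting in Go looks like this; multilineOutput is a hypothetical helper, and the delimiter check is an extra safeguard that the shell script does not perform.

```go
package main

import (
	"fmt"
	"strings"
)

// multilineOutput renders name and value as name<<DELIM, the value lines,
// and a closing DELIM line. It refuses values that contain the delimiter
// as a whole line, since that would end the value early.
func multilineOutput(name, value, delimiter string) (string, error) {
	for _, line := range strings.Split(value, "\n") {
		if line == delimiter {
			return "", fmt.Errorf("value contains the delimiter line %q", delimiter)
		}
	}
	return name + "<<" + delimiter + "\n" + value + "\n" + delimiter + "\n", nil
}

func main() {
	out, err := multilineOutput("result", "Summary: 4 resources\n2 changed, 2 unchanged", "KUBE_DIFF_EOF")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

A fixed delimiter such as KUBE_DIFF_EOF is fine as long as the diff output can never contain that exact line.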
5.5 comment.sh — PR Comment
comment.sh tags its comment with a hidden <!-- kube-diff-action --> marker. On later runs it updates the comment that carries the marker and creates a new one only when none exists.
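The update-or-create decision can be sketched without any API calls. The Comment type and the plan function below are hypothetical; only the marker string comes from the post.

```go
package main

import (
	"fmt"
	"strings"
)

const marker = "<!-- kube-diff-action -->"

// Comment is a minimal stand-in for a PR comment returned by an API.
type Comment struct {
	ID   int
	Body string
}

// findMarked returns the ID of the first comment carrying the marker.
func findMarked(comments []Comment) (int, bool) {
	for _, c := range comments {
		if strings.Contains(c.Body, marker) {
			return c.ID, true
		}
	}
	return 0, false
}

// plan decides whether to update an existing comment or create a new one,
// and returns the body to send either way. id is 0 for "create".
func plan(comments []Comment, report string) (action string, id int, body string) {
	body = marker + "\n" + report
	if existing, ok := findMarked(comments); ok {
		return "update", existing, body
	}
	return "create", 0, body
}

func main() {
	comments := []Comment{
		{ID: 11, Body: "LGTM"},
		{ID: 12, Body: marker + "\n## kube-diff Report\nold results"},
	}
	action, id, _ := plan(comments, "## kube-diff Report\nnew results")
	fmt.Println(action, id)
}
```

The marker keeps a busy PR from accumulating one bot comment per push.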
5.6 kube-diff-action Usage Example
- name: Check drift
  id: diff
  uses: somaz94/kube-diff-action@v1
  with:
    source: file
    path: ./manifests/
    namespace: production
    output: markdown
- name: Fail if drift
  if: steps.diff.outputs.has-changes == 'true'
  run: |
    echo "::error::Drift detected, review the PR comment for details"
    exit 1
6. CI/CD Pipeline
6.1 kube-diff CI
# .github/workflows/ci.yml (excerpt)
test-unit:
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
    - run: go test ./... -v -race -cover
e2e:
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-go@v5
    - uses: helm/kind-action@v1 # Create kind cluster
    - run: make build
    - run: make demo-all # Run full demo
6.2 kube-diff-action CI
Tests file/helm modes in a real Kubernetes environment using a kind cluster:
# .github/workflows/ci.yml (excerpt)
test-action-file:
  steps:
    - uses: actions/checkout@v4
    - name: Setup kind cluster
      uses: helm/kind-action@v1
    - name: Apply test manifests
      run: kubectl apply -f /tmp/test-manifests/
    - name: Run kube-diff action
      id: diff
      uses: ./ # Test local action
      with:
        source: file
        path: /tmp/test-manifests/
        namespace: default
        comment: 'false'
6.3 Smoke Test (Released Action)
A workflow that verifies the released somaz94/kube-diff-action@v1 works correctly:
# .github/workflows/use-action.yml
on:
  workflow_dispatch:
  workflow_run:
    workflows: ["Create release"]
    types: [completed]
jobs:
  smoke-test-file:
    steps:
      - uses: somaz94/kube-diff-action@v1 # Use released version
        with:
          source: file
          path: /tmp/test-manifests/
          namespace: default
          comment: 'false'
7. Distribution — GoReleaser, Homebrew, Krew
7.1 GoReleaser Configuration
When a tag is pushed, GoReleaser automatically:
- Builds binaries for 6 platforms (linux/darwin/windows × amd64/arm64)
- Creates a GitHub Release with attached binaries
- Updates the Homebrew tap (somaz94/homebrew-tap)
- Updates the Krew plugin index (somaz94/krew-index)
7.2 kube-diff-action Version Management
Since kube-diff-action downloads the latest kube-diff binary at runtime in install.sh, users automatically get the newest version on their next run when kube-diff is upgraded.
The v1 tag is kept pointing to the latest release commit using major-tag-action.
As a result, somaz94/kube-diff-action@v1 always resolves to the newest v1.x.y release without users changing their workflows.
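The idea behind a moving major tag is simply to derive v1 from a release tag such as v1.2.3 and re-point it. The helper below sketches only the derivation step; it is not how major-tag-action is implemented, and majorTag is a name I made up.

```go
package main

import (
	"fmt"
	"strings"
)

// majorTag returns "v1" for "v1.2.3". It reports false for tags that do
// not look like vMAJOR.MINOR.PATCH with numeric parts.
func majorTag(tag string) (string, bool) {
	if !strings.HasPrefix(tag, "v") {
		return "", false
	}
	parts := strings.Split(tag[1:], ".")
	if len(parts) != 3 {
		return "", false
	}
	for _, p := range parts {
		if p == "" {
			return "", false
		}
		for _, c := range p {
			if c < '0' || c > '9' {
				return "", false
			}
		}
	}
	return "v" + parts[0], true
}

func main() {
	major, ok := majorTag("v1.4.2")
	fmt.Println(major, ok)
}
```

Pre-release tags such as v1.0.0-rc1 are rejected by this check, which also keeps the stable major tag from moving onto a release candidate.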
8. Real-World Use Cases
CLI Usage
| Scenario | Command |
|---|---|
| Pre-deploy drift check | kube-diff file ./manifests/ -n production |
| Post-incident audit | kube-diff file ./manifests/ -n production -o json > drift-report.json |
| GitOps sync verification | kube-diff kustomize ./overlays/production -n production |
| Multi-cluster comparison | kube-diff file ./manifests/ --context prod-cluster -n app |
| Helm upgrade preview | kube-diff helm ./chart/ -f values-prod.yaml -r my-release -n production |
GitHub Action Usage
PR Drift Gate: automatically check for drift on PRs that change manifests.
Scheduled Drift Monitoring: check for drift every 6 hours and send Slack alerts.
Conclusion
Building kube-diff taught several valuable lessons:
- Kubernetes default value normalization is critical: naively comparing local and cluster resources generates massive false positives. Defaults like progressDeadlineSeconds, revisionHistoryLimit, and dnsPolicy must be accurately removed per Kind to produce meaningful diffs.
- Exit code design matters: without distinguishing “changes detected” from “error” in CI, workflows fail on every run that finds drift. The exit code 0/1/2 scheme cleanly separates these cases.
- Composite Actions are lightweight and fast: when leveraging an existing CLI binary, Composite Actions are far more practical than Docker Actions. There is no image build time; the action just downloads the binary and runs it.
- Runtime latest-version download strategy: since kube-diff-action downloads the kube-diff binary at runtime, upgrading the CLI automatically gives Action users the latest features.
As of v0.2.1, kube-diff supports file/helm/kustomize comparison, namespace/kind/label filtering, and color/plain/json/markdown output. If you want to quickly detect drift in your Kubernetes clusters and automate monitoring in CI, give it a try.