Upgrading a Kubernetes Cluster with Kubespray (2026V.)

A complete guide to upgrading Kubernetes v1.33 to v1.34 using Kubespray with etcd backup, certificate renewal, and rollback strategies

Upgrading a Kubernetes Cluster with Kubespray (2026V.)



Overview

Operating a Kubernetes cluster requires periodic upgrades for security patches, new features, and certificate renewal.

Kubespray supports automated upgrades through the upgrade-cluster.yml playbook, following the order of etcd → Control Plane → Worker. Instead of manually connecting to each node and running kubeadm upgrade, you can upgrade the entire cluster with a single Ansible command.

This guide covers the following key areas:

For cluster installation, refer to the previous post.


Upgrade Principles

There are critical rules that must be followed when upgrading Kubernetes through Kubespray.

Principle Description
One minor version at a time v1.33 → v1.34 ✅, v1.33 → v1.35 ❌
Patch versions are free v1.33.3 → v1.33.5 ✅
Upgrade order etcd → Control Plane → Worker (handled automatically by Kubespray)
Worker upgrade method drain (Pod eviction) → upgrade → uncordon (restore)
etcd backup required Always create a snapshot before upgrading
Staging first Validate in a test environment before applying to production if possible


Kubespray ↔ Kubernetes Version Mapping

Each Kubespray version supports a specific range of Kubernetes versions. Always verify before upgrading.

Kubespray K8s Support Range
v2.25.x v1.29.x ~ v1.31.x
v2.26.x v1.30.x ~ v1.32.x
v2.27.x v1.31.x ~ v1.33.x
v2.28.x v1.32.x ~ v1.34.x
v2.30.0 v1.32.x ~ v1.34.x

For exact mappings, check Kubespray Releases. You can also verify the supported versions for a specific tag with:

git show <tag>:roles/kubespray_defaults/defaults/main/main.yml | head -5


Upgrade Procedure


Step 1. Pre-check — Verify Current Versions

ssh somaz@10.10.10.17
cd ~/kubespray

# Check current Kubespray version
git describe --tags

# Check current K8s version
kubectl get nodes -o wide

# Check certificate expiration
sudo kubeadm certs check-expiration


Step 2. etcd Backup (Required)

Always create an etcd snapshot before upgrading. This is the only rollback mechanism if something goes wrong.


Step 3. Switch Kubespray Version

To upgrade Kubernetes, you must first switch Kubespray itself to a tag that supports the target K8s version.

# Activate venv
source ~/kubespray/venv/bin/activate
cd ~/kubespray

# Fetch latest tags
git fetch --tags
git tag -l | sort -V | tail -20

# Stash any local changes
git stash

# Checkout desired version
git checkout v2.30.0

# Update dependencies
pip install -r requirements.txt


Step 4. Review Configuration Changes (Diff)

When the Kubespray version changes, default configuration values may also change. Compare the sample inventory with your inventory.

# Compare k8s-cluster.yml
diff ~/kubespray/inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml ~/kubespray/inventory/somaz-cluster/group_vars/k8s_cluster/k8s-cluster.yml

# Compare containerd.yml
diff ~/kubespray/inventory/sample/group_vars/all/containerd.yml ~/kubespray/inventory/somaz-cluster/group_vars/all/containerd.yml


Key Changes from v2.28.0 → v2.30.0

File Change Impact
etcd.yml Removed (etcd settings moved elsewhere) No impact if no custom etcd settings
k8s-net-weave.yml Removed (Weave CNI deprecated) No impact when using Cilium
k8s-net-cilium.yml kube_proxy_replacement: partial → false, cilium_extra_values added No impact (default values)
k8s-cluster.yml Cilium-related note added to kube_owner comment Minor change
containerd.yml engine/root → options.Root structure change Comments only (examples)
addons.yml local_path_provisioner_image_tag removed Now auto-managed


Step 5. Set kube_version (Optional)

To specify a particular K8s version, modify the following file. If not set, the default latest version for the Kubespray tag will be used.

vi ~/kubespray/inventory/somaz-cluster/group_vars/k8s_cluster/k8s-cluster.yml
# Add or modify
kube_version: v1.34.3


Step 6. Run the Upgrade


Important

Use upgrade-cluster.yml, not cluster.yml. The latter is for fresh installations only.

The upgrade typically takes 20–40 minutes. Expect API server downtime of 10–30 seconds, and services with only 1 replica may experience brief interruptions during Worker drain.


Step 7. Verify the Upgrade

# Check all node versions
kubectl get nodes -o wide

# Check for unhealthy Pods
kubectl get pods -A | grep -v Running | grep -v Completed

# Cluster info
kubectl cluster-info

# Component status
kubectl get --raw='/readyz?verbose'

# Verify containerd insecure registry configuration is preserved
ssh somaz@10.10.10.18 "cat /etc/containerd/certs.d/harbor.example.com/hosts.toml"


Upgrade Automation Script

The following GitHub repository contains automation scripts that cover the above steps:

somaz94/script-collection - kubespray


Certificate Auto-Renewal

When you run upgrade-cluster.yml with Kubespray, kubeadm automatically renews certificates.


Auto-Renewed Certificates

Certificate Purpose Renewal
apiserver.crt API Server TLS Automatic
apiserver-kubelet-client.crt API → kubelet communication Automatic
front-proxy-client.crt API aggregation Automatic
etcd/server.crt etcd TLS Automatic
etcd/peer.crt etcd peer communication Automatic
admin.conf kubectl access Automatic

kubeadm automatically renews certificates with less than 1 year remaining during upgrades. The default validity period is 1 year.


Check Current Certificate Expiration

sudo kubeadm certs check-expiration


What Happens If You Skip Upgrades for Over 1 Year?

Certificates will expire and the cluster will become unresponsive. In that case, manual renewal is required:

sudo kubeadm certs renew all
sudo systemctl restart kubelet

Recommendation: Performing at least one upgrade within a year automatically renews certificates, eliminating the need for separate maintenance.

You can also use the following script: k8s_certs_renew.sh


Upgrade Error Recovery


health-check Job Failure

[ERROR CreateJob]: Job "upgrade-health-check-xxxxx" in the namespace "kube-system"
did not complete in 15s: no condition of type Complete

This occurs when the health check Job created by kubeadm upgrade does not complete within 15 seconds. It is typically caused by the image being pulled for the first time. Re-running usually resolves the issue since the image will be cached.


Diagnosis

# Check failed Jobs
kubectl get jobs -n kube-system | grep upgrade-health
kubectl describe job <job-name> -n kube-system

# Check Pod status
kubectl get pods -n kube-system | grep upgrade-health


Resolution

# Control Plane may be in SchedulingDisabled state — uncordon first
kubectl uncordon k8s-control-01

# Re-run upgrade
ansible-playbook -i inventory/somaz-cluster/inventory.ini upgrade-cluster.yml -b --become-user=root


containerd Insecure Registry Reset

During upgrades, containerd may be reconfigured and manually modified hosts.toml files can be overwritten. If the configuration is included in the Kubespray inventory’s containerd.yml, it will be automatically preserved.

# Verify configuration is preserved after upgrade
for node in 10.10.10.17 10.10.10.18 10.10.10.19 10.10.10.22; do
  echo "=== $node ==="
  ssh somaz@$node "cat /etc/containerd/certs.d/harbor.example.com/hosts.toml 2>/dev/null || echo 'NOT FOUND'"
  echo ""
done

If the configuration was lost, re-run the containerd tag to restore it:

ansible-playbook -i inventory/somaz-cluster/inventory.ini cluster.yml --tags containerd -b --become-user=root

Important: Never modify insecure registry settings directly on servers. Always manage them through the Kubespray inventory (containerd.yml) to ensure they survive upgrades.


Rollback

Kubespray does not support automatic rollback. If an upgrade fails, you must restore from the etcd backup created in Step 2.

# Restore etcd
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup-YYYYMMDD.db --data-dir=/var/lib/etcd-restore

# Manual etcd data directory replacement is required afterward
# Reference: https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#restoring-an-etcd-cluster

Rollback is a complex and risky operation. Always create an etcd backup before upgrading, and test in a staging environment whenever possible.


Post-Upgrade Inventory Synchronization

After the upgrade is complete, back up the Control Plane’s inventory to your local machine (GitLab, etc.).

# Run from local machine
scp -r somaz@10.10.10.17:~/kubespray/inventory/somaz-cluster ~/my-project/kubespray/inventory-somaz-cluster


Version Check Script

Create a script for quick before/after comparison.

#!/bin/bash
# check-version.sh

echo "=== K8s Version ==="
kubectl get nodes -o wide

echo ""
echo "=== Kubespray Version ==="
cd ~/kubespray && git describe --tags

echo ""
echo "=== Containerd Version ==="
containerd --version

echo ""
echo "=== Cilium Version ==="
kubectl get ds cilium -n kube-system -o=jsonpath='{.spec.template.spec.containers[0].image}'
echo ""

echo ""
echo "=== Certificate Expiration ==="
sudo kubeadm certs check-expiration 2>/dev/null | head -15


Real-World Upgrade Case: v1.33.3 → v1.34.3

The following documents an actual upgrade performed on April 7, 2026.


Environment

Component Before After
Kubespray v2.28.0 v2.30.0
Kubernetes v1.33.3 v1.34.3
Cilium v1.17.3 v1.19.1
containerd v2.1.3 v2.2.1


Process

  1. Created etcd backup
  2. Reviewed configuration changes with upgrade-kubespray.sh --diff-only
  3. Switched Kubespray with upgrade-kubespray.sh --target v2.30.0
  4. Ran upgrade-cluster.yml
  5. Re-ran once due to health-check Job failure
  6. Verified all nodes upgraded to v1.34.3
  7. Confirmed containerd insecure registry settings preserved
  8. Synchronized inventory to local backup


Issue Encountered

health-check Job 15-second timeout

[ERROR CreateJob]: Job "upgrade-health-check-1775526129268" in the namespace "kube-system"
did not complete in 15s

The Control Plane was already upgraded to v1.34.3 but remained in SchedulingDisabled state. Resolved by running kubectl uncordon k8s-control-01 and re-executing the playbook.


Time Breakdown

Phase Duration
etcd backup 1 min
Kubespray switch (git checkout + pip install) 3 min
upgrade-cluster.yml 1st run (failed) 11 min
upgrade-cluster.yml 2nd run (success) ~25 min
Verification 5 min
Total ~45 min


Post-Upgrade Verification

k get nodes
NAME             STATUS   ROLES           AGE    VERSION
k8s-compute-01   Ready    <none>          252d   v1.34.3
k8s-compute-02   Ready    <none>          252d   v1.34.3
k8s-compute-03   Ready    <none>          245d   v1.34.3
k8s-control-01   Ready    control-plane   252d   v1.34.3


Important Notes Summary

Item Description
etcd Backup Always create a snapshot before upgrading — the only rollback mechanism
Minor Version Only one minor version upgrade at a time
Playbook Use upgrade-cluster.yml, not cluster.yml
venv Always ensure Kubespray's Python venv is activated (source venv/bin/activate)
StatefulSet Verify data for DB and other StatefulSet Pods before draining
PDB Check PodDisruptionBudget settings — drain may be blocked
Replicas Maintain at least 2 Deployment replicas for zero-downtime services
Insecure Registry Manage through inventory to survive upgrades safely
Certificates At least one upgrade per year ensures auto-renewal
Off-Peak Timing Perform upgrades during low-traffic periods when possible


Conclusion

Kubernetes cluster upgrades may seem like a burden, but with Kubespray, you can upgrade the entire cluster with a single Ansible command.

Key takeaways:

Building a habit of regular upgrades lets you simultaneously address security patches, certificate renewal, and new feature adoption.



References