Installing Kubernetes with Kubespray and Adding Worker Nodes (2024 Edition)

A comprehensive guide to setting up Kubernetes using Kubespray on GCP


Overview

Kubespray is a powerful tool that combines the flexibility of Ansible with the robustness of Kubernetes, enabling the deployment of production-ready Kubernetes clusters on various infrastructures. This guide provides detailed instructions for installing a Kubernetes cluster using Kubespray on Google Cloud Platform (GCP) and demonstrates how to add worker nodes to scale your cluster.

What is Kubespray?

Kubespray is a composition of Ansible playbooks, inventory, provisioning tools, and domain knowledge for deploying a production-ready Kubernetes cluster. It allows for highly customizable deployments and supports multiple cloud providers, bare metal installations, and virtualized environments.

Key features include:

  • Composable attributes
  • Multiple network plugin support (Calico, Flannel, Cilium, etc.)
  • HA cluster setup
  • Configurable addons
  • Support for most popular Linux distributions

graph TD
    A[Kubespray Repository] --> B[Ansible Playbooks]
    B --> C[Kubernetes Components]
    C --> D[Control Plane Setup]
    C --> E[Node Setup]
    C --> F[Networking]
    C --> G[Add-ons]
    D --> H[API Server]
    D --> I[Controller Manager]
    D --> J[Scheduler]
    D --> K[etcd]
    E --> L[kubelet]
    E --> M[kube-proxy]
    F --> N[Network Plugin]
    F --> O[Service Networking]
    G --> P[DNS]
    G --> Q[Dashboard]
    G --> R[Metrics Server]
    style A fill:#bbdefb,stroke:#333,stroke-width:1px
    style B fill:#90caf9,stroke:#333,stroke-width:1px
    style C fill:#64b5f6,stroke:#333,stroke-width:1px
    style D fill:#42a5f5,stroke:#333,stroke-width:1px
    style E fill:#42a5f5,stroke:#333,stroke-width:1px
    style F fill:#42a5f5,stroke:#333,stroke-width:1px
    style G fill:#42a5f5,stroke:#333,stroke-width:1px


System Configuration

Environment Details

Component            Specification
-------------------  -------------------------------
Operating System     Ubuntu 20.04 LTS (Focal)
Cloud Provider       Google Cloud Platform (Compute Engine)
Kubernetes Version   v1.28.6 (deployed by Kubespray)
CNI Plugin           Calico (default)
Container Runtime    containerd
Python Version       3.10.13

Node Specifications

Control Plane Node
  • Hostname: test-server
  • IP: 10.77.101.62
  • CPU: 2 cores
  • Memory: 8192MB
  • Role: Control Plane + etcd
Worker Nodes
  • Node 1: test-server-agent (10.77.101.57, 2 CPU, 8192MB RAM)
  • Node 2: test-server-agent2 (10.77.101.200, 2 CPU, 8192MB RAM)
  • Role: Worker nodes running application workloads


Infrastructure Setup

Infrastructure as Code (IaC)

We use Terraform to provision our GCP infrastructure. Here are the key resources:

Control Plane Node Configuration

resource "google_compute_address" "test_server_ip" {
  name = var.test_server_ip
}

resource "google_compute_instance" "test_server" {
  name         = var.test_server
  machine_type = "n2-standard-2"
  zone         = "${var.region}-a"
  
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2004-lts"
      size  = 10
    }
  }

  network_interface {
    network    = var.shared_vpc
    subnetwork = "${var.subnet_share}-mgmt-a"
    access_config {
      nat_ip = google_compute_address.test_server_ip.address
    }
  }
  
  # Recommended metadata for Kubernetes nodes
  metadata = {
    "startup-script" = <<-EOF
      #!/bin/bash
      swapoff -a
      sed -i '/swap/d' /etc/fstab
      
      # br_netfilter must be loaded before the bridge sysctl below can be set
      modprobe br_netfilter

      # Set system parameters for Kubernetes
      cat > /etc/sysctl.d/99-kubernetes.conf <<EOF2
      net.bridge.bridge-nf-call-iptables = 1
      net.ipv4.ip_forward = 1
      EOF2
      sysctl --system
    EOF
  }
}

Worker Node Configuration

resource "google_compute_address" "test_server_agent_ip" {
  name = var.test_server_agent_ip
}

resource "google_compute_instance" "test_server_agent" {
  name         = var.test_server_agent
  machine_type = "n2-standard-2"
  zone         = "${var.region}-a"
  
  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2004-lts"
      size  = 10
    }
  }

  network_interface {
    network    = var.shared_vpc
    subnetwork = "${var.subnet_share}-mgmt-a"
    access_config {
      nat_ip = google_compute_address.test_server_agent_ip.address
    }
  }
  
  # Same startup script as control plane for Kubernetes prerequisites
  metadata = {
    "startup-script" = <<-EOF
      #!/bin/bash
      swapoff -a
      sed -i '/swap/d' /etc/fstab
      
      # br_netfilter must be loaded before the bridge sysctl below can be set
      modprobe br_netfilter

      # Set system parameters for Kubernetes
      cat > /etc/sysctl.d/99-kubernetes.conf <<EOF2
      net.bridge.bridge-nf-call-iptables = 1
      net.ipv4.ip_forward = 1
      EOF2
      sysctl --system
    EOF
  }
}
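
With both resources defined, the standard Terraform workflow provisions the instances. A minimal sketch, assuming provider credentials and the referenced variables (region, VPC, subnet names) are already configured:

terraform init
terraform plan -out=k8s.tfplan
terraform apply k8s.tfplan

# Confirm the instances and their external IPs
gcloud compute instances list --filter="name~test-server"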


Prerequisites

Before starting the Kubespray installation, you need to prepare your environment.

System Requirements

Kubespray's documented minimums are modest: roughly 1.5GB of RAM for control plane nodes and 1GB for workers, SSH access with sudo privileges from the deployment host, and internet access on the target servers (or a configured offline mirror). The n2-standard-2 instances used here comfortably exceed these requirements.
Node Preparation

SSH Key Setup

# Generate SSH key if needed
ssh-keygen -t rsa -b 4096 -C "kubespray-deployment"

# Copy SSH key to all nodes
ssh-copy-id somaz@10.77.101.62  # Control plane
ssh-copy-id somaz@10.77.101.57  # Worker 1
ssh-copy-id somaz@10.77.101.200 # Worker 2

# Update /etc/hosts for easier node access
cat << EOF | sudo tee -a /etc/hosts
10.77.101.62 test-server
10.77.101.57 test-server-agent
10.77.101.200 test-server-agent2
EOF
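
Before moving on, confirm passwordless SSH works to every node; a quick check using the somaz user and hostnames from above:

for host in test-server test-server-agent test-server-agent2; do
  ssh -o BatchMode=yes somaz@${host} hostname || echo "SSH to ${host} failed"
done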

Package Installation

# Update package lists
sudo apt-get update

# Install Python 3.10
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get -y update
sudo apt install -y python3.10 python3-pip git python3.10-venv

# Verify Python version
python3.10 --version  # Should show Python 3.10.13

System Configuration

# Disable swap (required for Kubernetes)
sudo swapoff -a
sudo sed -i '/swap/d' /etc/fstab

# Load required kernel modules
cat << EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Configure kernel parameters
cat << EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

sudo sysctl --system
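
To confirm the modules and parameters took effect:

# Both modules should be listed
lsmod | grep -E 'overlay|br_netfilter'

# All three values should print as 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward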


Kubespray Deployment Process

sequenceDiagram
    participant Admin as Operator
    participant Deploy as Deployment Host
    participant CP as Control Plane
    participant Worker as Worker Nodes
    Admin->>Deploy: Clone Kubespray repository
    Admin->>Deploy: Setup Python virtual environment
    Admin->>Deploy: Install dependencies
    Admin->>Deploy: Configure inventory
    Deploy->>CP: Check SSH connectivity
    Deploy->>Worker: Check SSH connectivity
    Admin->>Deploy: Run Ansible playbook
    Deploy->>CP: Install control plane components
    Deploy->>Worker: Install worker components
    Deploy->>CP: Initialize cluster
    Deploy->>Worker: Join nodes to cluster
    CP->>Worker: Establish cluster communication
    Admin->>CP: Configure kubectl
    Admin->>CP: Verify cluster status

1. Clone Repository and Setup Environment

# Clone the Kubespray repository
git clone https://github.com/kubernetes-sigs/kubespray.git

# Setup virtual environment
VENVDIR=kubespray-venv
KUBESPRAYDIR=kubespray
python3.10 -m venv $VENVDIR
source $VENVDIR/bin/activate
cd $KUBESPRAYDIR

# Install dependencies
pip install -U -r requirements.txt

# Check Ansible version
ansible --version
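
For reproducible deployments, check out a release branch or tag rather than running from master. As an assumption worth verifying against the Kubespray release notes, the release-2.24 line targets Kubernetes v1.28.x:

git checkout release-2.24            # assumed mapping to Kubernetes v1.28.x; verify for your version
pip install -U -r requirements.txt   # re-sync dependencies after switching branches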

2. Prepare Ansible Inventory

# Copy sample inventory
cp -rfp inventory/sample inventory/somaz-cluster

# Update inventory with nodes
declare -a IPS=(10.77.101.62 10.77.101.57)
CONFIG_FILE=inventory/somaz-cluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

3. Configure Inventory

The inventory builder writes a YAML inventory to hosts.yaml; this guide uses the equivalent INI format in inventory.ini, and either file can be passed to ansible-playbook with -i. The generated configuration is a reasonable starting point, but we’ll make additional adjustments for our setup.

# inventory/somaz-cluster/inventory.ini
[all]
test-server ansible_host=10.77.101.62  ip=10.77.101.62
test-server-agent ansible_host=10.77.101.57  ip=10.77.101.57

# Control plane node(s)
[kube_control_plane]
test-server

# etcd cluster member(s)
[etcd]
test-server

# Kubernetes worker node(s)
[kube_node]
test-server-agent

# All groups with assigned roles
[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

Advanced Inventory Configuration (Optional)

You can further customize your deployment by editing these additional files:

# Edit group variables for all nodes
vi inventory/somaz-cluster/group_vars/all/all.yml

# Customize Kubernetes-specific parameters
vi inventory/somaz-cluster/group_vars/k8s_cluster/k8s-cluster.yml

# Configure network plugin options
vi inventory/somaz-cluster/group_vars/k8s_cluster/k8s-net-*.yml

Common customizations (a sketch follows below):

  • kube_version: the Kubernetes version to deploy
  • kube_network_plugin: CNI choice (calico, flannel, cilium, etc.)
  • container_manager: the container runtime (containerd by default)
  • Addon toggles such as dashboard_enabled and metrics_server_enabled (see Add-on Configuration below)
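
A minimal sketch of the version and runtime pinning used in this guide; the variable names are Kubespray defaults documented in the sample group_vars:

# inventory/somaz-cluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_version: v1.28.6
kube_network_plugin: calico
container_manager: containerd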

4. Verify Ansible Connectivity

# Test connection to all nodes
ansible all -i inventory/somaz-cluster/inventory.ini -m ping

# Optional: Update apt cache on all nodes
ansible all -i inventory/somaz-cluster/inventory.ini -m apt -a 'update_cache=yes' --become

5. Run Playbook

Now we’re ready to deploy the Kubernetes cluster using Kubespray’s Ansible playbooks.

# Deploy the cluster (this will take 15-30 minutes)
ansible-playbook -i inventory/somaz-cluster/inventory.ini cluster.yml --become

Important:

The cluster deployment can take 15-30 minutes depending on your internet connection speed and server performance. During the deployment process, Ansible will:

  • Install container runtime (containerd by default)
  • Deploy etcd cluster
  • Install Kubernetes components (kubeadm, kubelet, kubectl)
  • Initialize the Kubernetes control plane
  • Join worker nodes to the cluster
  • Deploy network plugins and add-ons

Be patient and monitor the output for any errors. Most issues can be resolved by looking at the Ansible error messages.

6. Configure kubectl

After successful deployment, you need to configure kubectl to interact with your new cluster.

# Create kubectl config directory
mkdir -p ~/.kube

# Copy admin configuration
sudo cp /etc/kubernetes/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

# Setup kubectl autocomplete (optional but recommended)
echo '# kubectl completion and alias' >> ~/.bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -F __start_kubectl k' >> ~/.bashrc
source ~/.bashrc

# Test kubectl
kubectl get nodes


Adding Worker Nodes

One of the key advantages of Kubernetes is its scalability. Let’s add another worker node to our cluster.

1. Update Inventory

First, we need to update our Ansible inventory to include the new worker node.

# Add new node to IPS array
declare -a IPS=(10.77.101.62 10.77.101.57 10.77.101.200)
CONFIG_FILE=inventory/somaz-cluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}

2. Modify inventory.ini

After using the inventory builder, it’s good to check and make sure the new node was properly added to the correct groups.

# inventory/somaz-cluster/inventory.ini
[all]
test-server ansible_host=10.77.101.62  ip=10.77.101.62
test-server-agent ansible_host=10.77.101.57  ip=10.77.101.57
test-server-agent2 ansible_host=10.77.101.200  ip=10.77.101.200

[kube_control_plane]
test-server

[etcd]
test-server

[kube_node]
test-server-agent
test-server-agent2  # Make sure the new node is in the kube_node group

3. Run Scale Playbook

Kubespray includes a special scale.yml playbook specifically for adding new worker nodes without affecting the existing cluster.

# Add new nodes to the cluster
ansible-playbook -i inventory/somaz-cluster/inventory.ini scale.yml --become

Tip:

The scale.yml playbook is designed to only operate on new nodes, which makes it much faster than running the full cluster.yml playbook. It will install the necessary Kubernetes components on the new node and join it to the existing cluster.
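
Once the playbook finishes, the new node should register and reach Ready status within a minute or two:

kubectl get nodes test-server-agent2 -o wide

# Inspect node conditions if it stays NotReady
kubectl describe node test-server-agent2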

Scaling Down (Removing Nodes)

If you need to remove a node from the cluster, Kubespray also provides a removal playbook:

# To remove nodes, first update your inventory to reflect the desired state
# Then run the remove-node playbook
ansible-playbook -i inventory/somaz-cluster/inventory.ini remove-node.yml -e node=test-server-agent2 --become
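
remove-node.yml drains and resets the node before removing it from the cluster. For a node that is already unreachable, Kubespray's remove-node documentation describes flags to skip those steps (noted here as an assumption; confirm against your Kubespray version):

ansible-playbook -i inventory/somaz-cluster/inventory.ini remove-node.yml \
  -e node=test-server-agent2 -e reset_nodes=false -e allow_ungraceful_removal=true --become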


Verification and Monitoring

Verify Cluster Status

# Check nodes
kubectl get nodes -o wide
NAME                 STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
test-server          Ready    control-plane   21m   v1.28.6   10.77.101.62    <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-gcp    containerd://1.7.1
test-server-agent    Ready    <none>          20m   v1.28.6   10.77.101.57    <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-gcp    containerd://1.7.1
test-server-agent2   Ready    <none>          65s   v1.28.6   10.77.101.200   <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-gcp    containerd://1.7.1

# Check system namespace
kubectl get po -n kube-system
NAME                                          READY   STATUS    RESTARTS   AGE
calico-kube-controllers-76475c5546-vb58b      1/1     Running   0          20m
calico-node-hvx95                             1/1     Running   0          20m
calico-node-lq4tg                             1/1     Running   0          20m
calico-node-vftnr                             1/1     Running   0          1m
coredns-77f7cc69db-ctvk6                      1/1     Running   0          19m
coredns-77f7cc69db-h4bbx                      1/1     Running   0          19m
dns-autoscaler-5b576d9b75-pvbwj               1/1     Running   0          19m
kube-apiserver-test-server                    1/1     Running   0          21m
kube-controller-manager-test-server           1/1     Running   0          21m
kube-proxy-5n5tq                              1/1     Running   0          20m
kube-proxy-lx25t                              1/1     Running   0          1m
kube-proxy-s6x8h                              1/1     Running   0          20m
kube-scheduler-test-server                    1/1     Running   0          21m
kubernetes-dashboard-787dd78ffd-jl8bd         1/1     Running   0          19m
metrics-server-67df99fc7d-p8nzk               1/1     Running   0          19m
nginx-proxy-test-server-agent                 1/1     Running   0          20m
nginx-proxy-test-server-agent2                1/1     Running   0          1m
nodelocaldns-fwl5r                            1/1     Running   0          19m
nodelocaldns-hkvk2                            1/1     Running   0          1m
nodelocaldns-szfj6                            1/1     Running   0          19m

Monitoring Cluster Components
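
With metrics-server already deployed (see the pod list above), basic resource monitoring works out of the box. A few useful commands:

kubectl top nodes                  # node-level CPU and memory usage
kubectl top pods -n kube-system    # pod-level usage in the system namespace
kubectl cluster-info               # addresses of the control plane and cluster DNS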



Cluster Maintenance

Upgrading the Cluster

Kubespray makes it easy to upgrade your Kubernetes cluster to newer versions.

# Update Kubespray repository
cd $KUBESPRAYDIR
git fetch --all
git checkout <desired_version_tag>

# Update dependencies
pip install -U -r requirements.txt

# Update inventory parameters for the new version
# Edit inventory/somaz-cluster/group_vars/k8s_cluster/k8s-cluster.yml
# Set kube_version to the desired version

# Run the upgrade playbook
ansible-playbook -i inventory/somaz-cluster/inventory.ini upgrade-cluster.yml --become
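
After the upgrade completes, confirm that every node reports the new version:

kubectl get nodes   # the VERSION column should now show the upgraded kube_version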

Backup and Restore

It’s crucial to back up your etcd data regularly:

# Kubespray has no standalone backup playbook (it snapshots etcd automatically
# during upgrades). For manual backups, take a snapshot on an etcd node;
# cert paths assume Kubespray's default layout under /etc/ssl/etcd/ssl:
sudo ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd_backup.db \
  --endpoints=https://127.0.0.1:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-test-server.pem --key=/etc/ssl/etcd/ssl/admin-test-server-key.pem
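
Restoring follows the standard etcdctl flow; a sketch for a single-member etcd cluster (paths assume Kubespray's host deployment, which manages etcd via systemd and /etc/etcd.env):

# Stop etcd, then restore the snapshot into a fresh data directory
sudo systemctl stop etcd
sudo ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd_backup.db --data-dir /var/lib/etcd-restored

# Point ETCD_DATA_DIR in /etc/etcd.env at the restored directory, then restart
sudo systemctl start etcd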


Troubleshooting

Common Issues and Solutions

Issue                     | Cause                                    | Solution
--------------------------|------------------------------------------|--------------------------------------------------------------
SSH connection failures   | SSH keys not properly set up             | Verify keys with ssh-copy-id and test connections manually
Python dependency errors  | Incompatible Python version or packages  | Use the recommended Python version and ensure the virtual environment is activated
Network plugin failures   | Network configuration issues             | Check node connectivity and firewall rules; verify pods in the kube-system namespace
Node NotReady status      | kubelet not running properly            | Check systemctl status kubelet and the kubelet logs
etcd cluster issues       | etcd member communication problems       | Verify health with etcdctl endpoint health and check the etcd logs

Kubespray Debug Tips

# Run Ansible in verbose mode for detailed output
ansible-playbook -i inventory/somaz-cluster/inventory.ini cluster.yml --become -vvv

# Check logs on nodes
ansible all -i inventory/somaz-cluster/inventory.ini -m shell -a "journalctl -xeu kubelet" --become

# Reset the cluster to start fresh
ansible-playbook -i inventory/somaz-cluster/inventory.ini reset.yml --become


Advanced Configuration

Customizing the Deployment

Kubespray offers extensive customization options through the Ansible inventory. Here are some common customizations:

High Availability Setup

# For HA setup, add multiple control plane nodes in inventory

# Then in group_vars/all/all.yml
loadbalancer_apiserver:
  address: <VIP address>
  port: 6443
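
The inventory side of an HA setup is simply more hosts in the control plane and etcd groups; etcd needs an odd number of members to maintain quorum. A sketch with hypothetical additional nodes (test-server-cp2 and test-server-cp3 are placeholders, not part of this guide's environment):

[kube_control_plane]
test-server
test-server-cp2
test-server-cp3

[etcd]
test-server
test-server-cp2
test-server-cp3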

Custom Network Configuration

# In group_vars/k8s_cluster/k8s-net-calico.yml (for Calico)
calico_ipip_mode: "Always"
calico_vxlan_mode: "Never"
calico_network_backend: "bird"

# Pod CIDR customization
kube_pods_subnet: 10.233.64.0/18

# Service CIDR customization
kube_service_addresses: 10.233.0.0/18

Add-on Configuration

# In group_vars/k8s_cluster/addons.yml
dashboard_enabled: true
metrics_server_enabled: true
ingress_nginx_enabled: true
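
Addon changes take effect after re-running cluster.yml (or upgrade-cluster.yml) against the inventory. A quick way to verify afterwards; ingress-nginx is Kubespray's default namespace for that addon:

ansible-playbook -i inventory/somaz-cluster/inventory.ini cluster.yml --become

kubectl get pods -n kube-system | grep -E 'dashboard|metrics-server'
kubectl get pods -n ingress-nginx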


Conclusion

You have successfully deployed a Kubernetes cluster using Kubespray and added a worker node to scale your infrastructure. This flexible deployment method allows you to create a production-ready Kubernetes environment on various infrastructures, including cloud providers like GCP and on-premises environments.

Kubespray strikes a balance between the simplicity of kubeadm and the flexibility of full custom deployments, making it an excellent choice for teams that need a customizable yet standardized Kubernetes setup.

Next Steps

Now that your Kubernetes cluster is up and running, consider:

  • Setting up persistent storage with CSI drivers
  • Implementing proper networking with an Ingress Controller
  • Configuring monitoring with Prometheus and Grafana
  • Establishing proper backup procedures for the cluster
  • Setting up CI/CD pipelines for your applications


