Updating Kubernetes and Kubespray
My current cluster was running Kubernetes 1.31.1, deployed with a Kubespray checkout slightly newer than 2.27.0. I wanted to bring the cluster up to 1.34.2. Kubernetes upgrades cannot skip minor versions, though, so I have to move to 1.32.10 (or any 1.32 version), then 1.33.5, and finally 1.34.2.
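As a rough sketch of that stepping rule, the helper below lists the minor releases to pass through between two versions. The function name upgrade_path is made up for illustration, and it assumes both versions are 1.x releases written as MAJOR.MINOR.PATCH. It only plans the minors; the patch release you land on depends on what the Kubespray version in use provides.

```shell
# Hypothetical helper: print each minor release to step through, one per line.
# Assumes both arguments are 1.x versions in MAJOR.MINOR.PATCH form.
upgrade_path() {
  from_minor=$(echo "$1" | cut -d. -f2)
  to_minor=$(echo "$2" | cut -d. -f2)
  minor=$((from_minor + 1))
  while [ "$minor" -le "$to_minor" ]; do
    echo "1.$minor"
    minor=$((minor + 1))
  done
}

# Prints 1.32, 1.33 and 1.34 on separate lines.
upgrade_path 1.31.1 1.34.2
```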
Now, to do that, I have to use Kubespray versions that have checksums for the desired Kubernetes version (see roles/kubespray_defaults/vars/main/checksums.yml), and that also support the Kubernetes version I am upgrading from. For example, Kubespray 2.29.0 does not allow upgrading from anything older than 1.32.0, so I cannot jump straight to the latest Kubespray version. In addition, because each Kubespray version only carries checksums for specific releases, it may not include the desired Kubernetes release.
In the past, I always set kube_version in k8s_cluster.yml in my inventory to pin a specific version. However, if one omits this, Kubespray will use the most recent version of Kubernetes that it knows about.
I checked out the 2.28.0 tag of Kubespray and did an upgrade. This brought Kubernetes to 1.32.5. Next, I checked out the 2.29.0 tag of Kubespray and did another upgrade, and that brought Kubernetes to 1.33.5. Finally, I checked out the latest Kubespray commit, saw that it had 1.34.2, and did another upgrade to that version.
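One way to check a Kubespray checkout before committing to it is to look for the Kubernetes version in the checksums file. This is a minimal sketch, not something taken from Kubespray itself. The function name has_k8s_version is made up, and it assumes the file has a top-level kubelet_checksums key with version keys nested under each architecture. It accepts an optional leading "v" on the version keys in case the file uses that style.

```shell
# Hypothetical helper: succeed if the checksums file lists the given
# Kubernetes version under kubelet_checksums (key name is an assumption).
has_k8s_version() {
  checksums_file=$1
  version=$2
  awk -v ver="$version" '
    # An unindented, non-comment line starts a new top-level section.
    /^[^[:space:]#]/ { in_block = ($1 == "kubelet_checksums:") }
    in_block {
      key = $1
      sub(/^v/, "", key)            # tolerate keys written as v1.33.5
      if (key == ver ":") found = 1
    }
    END { exit !found }
  ' "$checksums_file"
}

# Example, run from the root of a Kubespray checkout:
#   has_k8s_version roles/kubespray_defaults/vars/main/checksums.yml 1.33.5 \
#     && echo "1.33.5 is available"
```

Restricting the match to one section avoids false positives from other components in the same file that happen to share a version number.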
Upgrade Process
The first thing I did (once) was to do a “brew update” and “brew upgrade” so that I had the latest tools (and kubectl client).
Next, because I’m using Longhorn, there can be situations where a node has only one replica (even though each feature I have will use three replicas). When an upgrade drains that node, the drain can be delayed long enough to cause a timeout. To prevent this, before starting any upgrade, we can change the node drain policy to allow a quick switch to another node, so that the drain can proceed:
kubectl -n longhorn-system patch setting node-drain-policy \
--type=merge -p '{"value":"block-for-eviction-if-contains-last-replica"}'
For a single upgrade, I do the following:
- Check out the desired Kubespray commit (e.g. git checkout v2.29.0).
- Check the Kubespray requirements.txt file and make sure you have exactly the same versions installed in your environment (e.g. poetry add ansible@10.7.0).
- Rename your inventory directory for Kubernetes (~/workspace/kubernetes/picluster/inventory) to a different name (e.g. mv mycluster{,.previous}).
- Copy the Kubespray inventory/sample directory to the Kubernetes inventory area (e.g. cp -r inventory/sample ../picluster/inventory/mycluster).
- Do a recursive diff, and apply the customizations from your current inventory to the new one. This will involve copying some files (like hosts.yml), and changing config settings in others. I usually add a comment with my initials so that I know what lines are modified.
- Keep in mind that sometimes the default settings change between Kubespray versions, so you will need to adjust accordingly. For example, ingress_nginx_default in 2.27.0 was true, so no change was made, but in 2.28.0 and 2.29.0 the default is false, so it had to be set to true.
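The inventory steps above can be sketched as a small function. The name refresh_inventory and its three arguments (Kubespray checkout, inventory parent directory, inventory name) are placeholders for illustration. The diff only shows what differs; applying the customizations to the new inventory is still a manual step.

```shell
# Hypothetical helper: set the old inventory aside, seed a new one from the
# Kubespray sample, and print a recursive diff between the two.
refresh_inventory() {
  kubespray_dir=$1
  inventory_root=$2
  inv_name=$3
  old_inv="$inventory_root/$inv_name.previous"
  new_inv="$inventory_root/$inv_name"

  # Refuse to overwrite an earlier backup.
  if [ -e "$old_inv" ]; then
    echo "backup already exists: $old_inv" >&2
    return 1
  fi

  mv "$new_inv" "$old_inv" || return 1
  cp -r "$kubespray_dir/inventory/sample" "$new_inv" || return 1

  # diff exits with status 1 when the trees differ, which is expected here.
  diff -ru "$old_inv" "$new_inv" || true
}

# Example, run from the Kubespray checkout:
#   refresh_inventory . ../picluster/inventory mycluster | less
```

In the diff output, lines starting with "-" come from the previous inventory and lines starting with "+" come from the new sample, so a "-" line with one of your customizations is a candidate to carry over.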
Now, the upgrade can be done. I use the command:
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} \
-b -v --private-key=~/.ssh/id_ed25519 upgrade-cluster.yml \
-e "serial=1" -e upgrade_cluster_setup=true
Once completed, assuming success, you can check that all the resources are running. If there is an issue with a node, you can run the upgrade again, adding the argument:
--limit "node1,node4"
Where node4 is the node that had failed, and node1 is one of the control plane nodes (I guess you need to include a control plane node).
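To avoid typing the node list by hand, the value for --limit could be built from the node status. This is only a sketch: the function name limit_for_failed_nodes is made up, and it assumes the column layout printed by kubectl get nodes --no-headers (name first, status second).

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print a comma-separated --limit value. The control plane node given as
# the first argument always comes first, followed by every other node whose
# STATUS does not start with "Ready".
limit_for_failed_nodes() {
  awk -v ctl="$1" '
    $2 !~ /^Ready/ && $1 != ctl { list = list "," $1 }
    END { print ctl list }
  '
}

# Example:
#   kubectl get nodes --no-headers | limit_for_failed_nodes node1
```

If every node is Ready, the output is just the control plane node, so check the result before passing it to ansible-playbook.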
I’ve also had occasions where I just did an install again, and it corrected the nodes that had issues (I had one worker node that was “Not Ready” after the upgrade). Lastly, you can always delete failed pods, in the hope that they restart and work.
When you are all done with upgrades, you can restore the Longhorn node drain policy:
kubectl -n longhorn-system patch setting node-drain-policy \
--type=merge -p '{"value":"block-if-contains-last-replica"}'