December 11

Updating Kubernetes nodes’ OS

With nine nodes in my cluster right now, each running Ubuntu 24.04, I want to ensure that the latest updates are present on the nodes.
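As a quick pre-check, an ad-hoc Ansible command can show what each node is running and what updates are pending. This assumes the same inventory file and kube_node group used later in this post:

ansible kube_node -i inventory/mycluster/hosts.yaml -m command -a "lsb_release -d"
ansible kube_node -i inventory/mycluster/hosts.yaml -b -m command -a "apt list --upgradable"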

I know I can remove the node from the cluster, update the OS, and then re-add the node, but I’m hoping there is an easier way.

I asked ChatGPT, and the two best methods suggested were to create a custom Ansible playbook to do the updates, or to use the Kubernetes Cluster API. The Cluster API would take a lot of effort to set up, so I'm opting for the playbook approach.

The steps suggested are:

  • cordon the node

  • drain the node

  • apply apt updates

  • reboot

  • wait for node to be ready

  • uncordon

ChatGPT provided an example playbook with these steps. My cluster uses Longhorn storage, however, so I want to change the Longhorn node drain policy before the updates are done, so that the drain command doesn't time out waiting on a volume whose only replica is on the node being drained. After the upgrade, the drain policy can be restored.
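For reference, the current policy can be checked first; with a standard Longhorn install, the setting is a Longhorn Setting resource in the longhorn-system namespace:

kubectl -n longhorn-system get setting node-drain-policy -o jsonpath='{.value}'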

The revised playbook (rolling_apt_upgrade.yaml) looks like this:

---
- hosts: kube_node
  serial: 1
  become: yes

  pre_tasks:
    - name: "Set Longhorn node-drain-policy BEFORE rolling updates"
      command: >
        kubectl -n longhorn-system patch setting node-drain-policy
        --type=merge -p '{"value":"block-for-eviction-if-contains-last-replica"}'
      delegate_to: "{{ groups['kube_control_plane'][0] }}"
      run_once: true

  tasks:
    - name: Cordon the node
      command: kubectl cordon {{ inventory_hostname }}
      delegate_to: "{{ groups['kube_control_plane'][0] }}"

    - name: Drain the node
      command: >
        kubectl drain {{ inventory_hostname }}
        --ignore-daemonsets
        --delete-emptydir-data
        --grace-period=30
      delegate_to: "{{ groups['kube_control_plane'][0] }}"

    - name: Apply apt upgrades
      apt:
        upgrade: dist
        update_cache: yes

    - name: Reboot the node
      reboot:

    - name: Wait for node to return to Ready
      command: kubectl get node {{ inventory_hostname }} -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
      register: node_ready
      retries: 40
      delay: 10
      until: node_ready.stdout == "True"
      delegate_to: "{{ groups['kube_control_plane'][0] }}"

    - name: Uncordon the node
      command: kubectl uncordon {{ inventory_hostname }}
      delegate_to: "{{ groups['kube_control_plane'][0] }}"

  post_tasks:
    - name: "Restore Longhorn node-drain-policy AFTER rolling updates"
      command: >
        kubectl -n longhorn-system patch setting node-drain-policy
        --type=merge -p '{"value":"block-if-contains-last-replica"}'
      delegate_to: "{{ groups['kube_control_plane'][0] }}"
      run_once: true

From my ~/workspace/picluster area, with the playbook in the playbooks/ sub-directory, I invoked it with:

ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/rolling_apt_upgrade.yaml
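Two optional ansible-playbook flags are handy here (node1 below is just an example hostname): --list-hosts shows which nodes the play would hit without running anything, and --limit restricts the run to a single node for a trial pass:

ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/rolling_apt_upgrade.yaml --list-hosts
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/rolling_apt_upgrade.yaml --limit node1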

I was having issues on one node, which was not becoming Ready. What I saw was that the node did not know the IP of the API server (lb-apiserver.kubernetes.local), and to resolve this I had to add an entry to /etc/hosts mapping the IP to that name. I suspect the problem is that, on reboot, kubelet is not yet up, so the node cannot use the cluster's DNS to resolve the API server name. I don't have a separate DNS server.
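The /etc/hosts entry is the usual format; the address here (10.1.2.3) is a placeholder for whatever the API server load balancer IP actually is:

10.1.2.3    lb-apiserver.kubernetes.local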

I added an Ansible playbook to do this, in playbooks/update_host_tmpl.yaml, and it can be run with --limit to target a specific node, if desired. I'm adding this to the node prep steps in Part IV of my series on Raspberry PI clusters.
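I haven't reproduced that playbook here, but a minimal sketch of the idea, using lineinfile with a placeholder apiserver_vip variable of my own (the actual template-based playbook may look different), would be:

---
- hosts: kube_node
  become: yes
  vars:
    # Placeholder: set this to the API server load balancer VIP for your cluster
    apiserver_vip: 10.1.2.3
  tasks:
    - name: Ensure the API server name resolves without cluster DNS
      lineinfile:
        path: /etc/hosts
        regexp: 'lb-apiserver\.kubernetes\.local'
        line: "{{ apiserver_vip }} lb-apiserver.kubernetes.local"

It can then be run against a single node with something like:

ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/update_host_tmpl.yaml --limit node1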
