June 3

Kubespray Add-Ons

In Part IV of the PI cluster series, I mention how to set up Kubespray to create a cluster. You can look there for how to set up your inventory and the basic configuration settings for Kubespray. In that series, I also mention how to add more features after the cluster is up. Some are pretty simple, and some require manual steps to get everything set up.

However, you can also have Kubespray install some “add-on” components as part of the cluster bring-up. In many cases, this makes the process more automated and “easier”, but it does have some limitations.

First, you will be using the version and configuration that are defined in Kubespray’s Ansible templates and roles. Granted, you can always customize Kubespray, with the caveat of having to keep your changes up to date with upstream.

Second, removing the feature from a running cluster can be more difficult. You’ll have to manually delete all the resources (e.g. daemonsets, deployments, etc.), some of which may be hard to identify (CRDs, RoleBindings, secrets, etc.). Looking at the Kubespray templates may provide some insight into the resources that were created.

You may be able to find manifests for the feature and version in the feature’s repo, pull them, and use “kubectl delete” on those manifests to remove the feature. Just note that there may be some differences between what is in the repo manifests for a version and what is in the manifests that Kubespray used. I haven’t tried it, but if there is a Helm based version of the feature that matches what Kubespray installed, you might be able to “helm install” over the already installed feature, and then “helm delete”?
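
As a concrete sketch of that manifest-based cleanup, here is roughly what it might look like for the MetalLB add-on. The URL and version below are just examples and would need to match what Kubespray actually deployed:

# See what the add-on created (names here are examples for MetalLB).
kubectl get all -A | grep -i metallb
kubectl get crd | grep -i metallb

# Pull the upstream manifest for the matching version and delete using it.
# The URL/version are illustrative -- verify against the Kubespray templates.
curl -LO https://raw.githubusercontent.com/metallb/metallb/v0.13.9/config/manifests/metallb-native.yaml
kubectl delete -f metallb-native.yaml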

 

Kube VIP (Virtual IP and Service Load Balancing)

To add Kube-VIP as a Kubespray add-on, I did these steps before creating the cluster.

First, I modified the inventory, so that etcd would run on each of my control-plane nodes (versus a mix of control-plane and worker nodes).

Second, in inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml, I enabled strict ARP, used IPVS (instead of iptables) for kube-proxy, and excluded my local network from kube-proxy (so that kube-proxy would not clear entries that were created by IPVS):

kube_proxy_strict_arp: true
kube_proxy_mode: ipvs
kube_proxy_exclude_cidrs: ["CIDR_FOR_MY_LOCAL_NETWORK",]
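
After the cluster is up, a quick sanity check (just a sketch, assuming the kubeadm-style kube-proxy ConfigMap that Kubespray generates) will show whether kube-proxy picked up these settings:

# Inspect the generated kube-proxy configuration for mode, strictARP, and excludeCIDRs.
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -E 'mode:|strictARP|excludeCIDRs'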

Third, I enabled kube-vip in inventory/mycluster/group_vars/k8s_cluster/addons.yml. I turned on ARP (vs BGP), set it up to provide a VIP for the control plane, and specified the API server address and port to use. I also selected load balancing of that VIP. I did not enable load balancing for services, but that is an option too:

kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: VIP_ON_MY_NETWORK
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_lb_enable: true

# kube_vip_services_enabled: false
# kube_vip_enableServicesElection: true
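
Once the cluster is up, you can confirm that kube-vip won leader election and is advertising the VIP. This is just a sketch; it assumes the kube-vip static pod naming and that eth0 is the interface holding the VIP:

# kube-vip runs as a static pod on each control plane node.
kubectl -n kube-system get pods -o wide | grep kube-vip

# On the current leader, the VIP shows up as an extra address on the interface
# (interface name and VIP placeholder are assumptions).
ip addr show eth0 | grep VIP_ON_MY_NETWORK

# The pod logs show the leader election and ARP advertisement activity.
kubectl -n kube-system logs kube-vip-<control-plane-node-name>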

I had tried this out, but found that the kube-vip container was reporting connection refused and permission errors, so leader election was not working for the virtual IP chosen.

I finally found a bug report on this issue when using Kubernetes 1.29 with kube-vip. Essentially, when the first control plane node is starting up, the admin.conf file used for kubectl commands does not yet have the permissions that kube-vip needs at that point in the process. The kube-vip team needs to create their own config file for kubectl. In the meantime, the bug report is trying a work-around in Kubespray that switches to the super-admin.conf file, which does have the needed permissions at that point in time. However, the patch they have does not work. I did more hacking on it, and have this change, which works:

diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
     - kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
     - kube_vip_arp_enabled
 
+- name: Kube-vip | Check if first control plane
+  set_fact:
+    is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
+
 - name: Kube-vip | Write static pod
   template:
     src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
   hostNetwork: true
   volumes:
   - hostPath:
-      path: /etc/kubernetes/admin.conf
+      path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
     name: kubeconfig
 status: {}

 

UPDATE: There is a fix that is in progress, which is a streamlined version of my change. Once that is merged, no patch will be needed.

With this change to Kubespray, I did a cluster create:

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -vvv --private-key=~/.ssh/id_ed25519 cluster.yml

Everything was up and running, but kubectl commands were failing on my Mac, because the ~/.kube/config file uses the FQDN https://lb-apiserver.kubernetes.local:6443 for the server, and there is no DNS info on my Mac for this host name (it does work on the nodes, however). The simple fix was to replace the FQDN with the IP address selected for the VIP.
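
If you want to script that edit, something like this works with the BSD sed on macOS (the VIP is a placeholder; you can just as easily edit ~/.kube/config by hand):

# Replace the unresolvable FQDN with the kube-vip VIP.
sed -i '' 's|https://lb-apiserver.kubernetes.local:6443|https://VIP_ON_MY_NETWORK:6443|' ~/.kube/config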

Now, all requests to that IP are redirected to the node that is currently running the API server. If the node is not available, IPVS will redirect to another control plane node.
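
If you are curious how that redirection is wired up, ipvsadm on a control plane node should show the virtual server that was programmed for the VIP (a sketch; assumes ipvsadm is installed and reuses the VIP placeholder from above):

# The VIP:6443 virtual service should list the control plane nodes as real servers.
sudo ipvsadm -Ln | grep -A 3 VIP_ON_MY_NETWORK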

MetalLB Load Balancer

Instead of setting this up after the cluster was created, you can opt to let Kubespray do this as well. In inventory/mycluster/group_vars/k8s_cluster/addons.yml, I made these changes:

metallb_enabled: true
metallb_speaker_enabled: "{{ metallb_enabled }}"
metallb_namespace: "metallb-system"


metallb_protocol: "layer2"
metallb_config:
  address_pools:
    primary:
      ip_range:
        - FIRST_IP_IN_RANGE-LAST_IP_IN_RANGE
      auto_assign: true
  layer2:
    - primary

Besides enabling the feature, I made sure that it was using layer 2 mode (versus BGP), and under the config, set up an address pool with the range of IPs on my local network that I wanted to use for load balanced IPs. You can specify the range as a CIDR, if desired.

Now, when the cluster is created with Kubespray, MetalLB will be set up, and you can create services of type “LoadBalancer” and an IP from the pool will be assigned.
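
As a quick test (the deployment name and image here are placeholders, not something from the cluster), exposing a deployment with a LoadBalancer service should pull an address from the pool:

# Hypothetical test deployment and LoadBalancer service.
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --type=LoadBalancer --port=80

# EXTERNAL-IP should show an address from the MetalLB pool.
kubectl get svc lb-test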

As mentioned in the disclaimer above, the version of Kubespray I have installs MetalLB 0.13.9. I could have overridden ‘metallb_version’ to a newer version, like ‘v0.14.5’, but the templates for MetalLB in Kubespray use the older v0.11.0 kubebuilder image in several places. To get the same versioning as used when installing MetalLB via Helm, I would have to modify the templates to specify v0.14.0. I also saw other configuration differences from the CRDs used in the Helm version, like setting the tls_min_version argument, and not setting some priority and priorityClassName configurations.

NGINX Ingress

This one is pretty easy to enable, by changing this setting in inventory/mycluster/group_vars/k8s_cluster/addons.yml:

ingress_nginx_enabled: true

When the cluster comes up, there will be an ingress daemonset, which creates ingress controller pods on each node, and an NGINX ingress service with an IP from the MetalLB address pool.

There are example YAML files in the MetalLB/NGINX Ingress post that will allow you to create pods and services, and an ingress resource that allows access via path prefixes.
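
For reference, an ingress of that flavor looks roughly like this. It is only a sketch; the service names and paths are placeholders, not the exact YAML from that post:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - path: /app1
        pathType: Prefix
        backend:
          service:
            name: app1-service   # placeholder service name
            port:
              number: 80
      - path: /app2
        pathType: Prefix
        backend:
          service:
            name: app2-service   # placeholder service name
            port:
              number: 80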



Posted June 3, 2024 by pcm in category "bare-metal", "Kubernetes", "Raspberry PI"