High Availability?
OK, so I have a cluster with three control plane nodes and four worker nodes (currently). However, if I shut down the control plane node that is hosting the API server, I lose API access. 🙁
I’ve been digging around, and it looks like kube-vip would be a good solution: it allows me to create a virtual IP (VIP) for the API server, and it does load balancing and leader election among the control plane nodes, so that if the node currently providing the API fails, the VIP can move to another control plane node. In addition, kube-vip can do load balancing for services (I’m not sure if that makes MetalLB redundant).
Before installing kube-vip, I needed to change the cluster configuration. I changed the inventory so that etcd runs ONLY on the control plane nodes (and not on a mix of control plane and worker nodes).
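As a rough sketch, the relevant inventory groups end up looking something like this (the hostnames are hypothetical, and I’m using the group names from recent Kubespray releases; adjust to however your inventory is actually laid out):
[kube_control_plane]
cp-1
cp-2
cp-3
[etcd]
cp-1
cp-2
cp-3
[kube_node]
worker-1
worker-2
worker-3
worker-4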
Next, I made these changes to inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml:
kube_proxy_mode: ipvs
kube_proxy_strict_arp: true
kube_proxy_exclude_cidrs: ["CIDR_OF_LOCAL_NETWORK",]
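As a purely hypothetical example of that last setting: if the local network were 10.1.0.0/24, the exclusion line would read:
kube_proxy_exclude_cidrs: ["10.1.0.0/24",]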
This has kube-proxy also using IPVS (instead of iptables) and running in strict ARP mode (needed for kube-vip). Lastly, to keep kube-proxy from clearing the IPVS entries created by kube-vip, the local network CIDR must be excluded. With those changes, I re-created the cluster, and was ready to install kube-vip…
There was a Medium article by Chris Kirby on using a Helm install of kube-vip for HA. It used an older version of kube-vip (0.6.4) and values.yaml settings for K3s. I added the Helm repo for kube-vip and pulled the values.yaml file so I could customize it:
mkdir ~/workspace/kubernetes/kube-vip
cd ~/workspace/kubernetes/kube-vip
helm repo add kube-vip https://kube-vip.github.io/helm-charts
helm repo update
wget https://raw.githubusercontent.com/kube-vip/helm-charts/main/charts/kube-vip/values.yaml
Here are the changes I made to the values.yaml, saving it as values-revised.yaml:
6c6
< pullPolicy: IfNotPresent
---
> pullPolicy: Always
8c8
< # tag: "v0.7.0"
---
> tag: "v0.8.0"
11c11
< address: ""
---
> address: "VIP_ON_LOCAL_NETWORK"
20c20
< cp_enable: "false"
---
> cp_enable: "true"
22,23c22,24
< svc_election: "false"
< vip_leaderelection: "false"
---
> svc_election: "true"
> vip_leaderelection: "true"
> vip_leaseduration: "5"
61c62
< name: ""
---
> name: "kube-vip"
86c87,88
< nodeSelector: {}
---
> nodeSelector:
> node-role.kubernetes.io/control-plane: ""
91a94,97
> - effect: NoExecute
> key: node-role.kubernetes.io/control-plane
> operator: Exists
>
93,101c99,104
< # nodeAffinity:
< # requiredDuringSchedulingIgnoredDuringExecution:
< # nodeSelectorTerms:
< # - matchExpressions:
< # - key: node-role.kubernetes.io/master
< # operator: Exists
< # - matchExpressions:
< # - key: node-role.kubernetes.io/control-plane
< # operator: Exists
---
> nodeAffinity:
> requiredDuringSchedulingIgnoredDuringExecution:
> nodeSelectorTerms:
> - matchExpressions:
> - key: node-role.kubernetes.io/control-plane
> operator: Exists
Besides using a newer kube-vip version, this enables load balancing for both the control plane and services, selects nodes via the control-plane label key alone (with no value, unlike the article), adds a NoExecute toleration for the control-plane taint, and sets the node affinity.
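Consolidated, the changed portions of values-revised.yaml look roughly like this (reconstructed from the diff above; the section nesting follows my reading of the chart’s values.yaml, so treat it as a sketch rather than the authoritative layout, and the VIP is a placeholder):
image:
  pullPolicy: Always
  tag: "v0.8.0"
config:
  address: "VIP_ON_LOCAL_NETWORK"
env:
  cp_enable: "true"
  svc_election: "true"
  vip_leaderelection: "true"
  vip_leaseduration: "5"
serviceAccount:
  name: "kube-vip"
nodeSelector:
  node-role.kubernetes.io/control-plane: ""
tolerations:
  # added alongside the chart's default control-plane tolerations
  - effect: NoExecute
    key: node-role.kubernetes.io/control-plane
    operator: Exists
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/control-plane
              operator: Exists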
With this custom values file, I could do the install:
helm install my-kube-vip kube-vip/kube-vip -n kube-system -f values-revised.yaml
With this, all the kube-vip pods were up, and the DaemonSet showed three desired, current, and ready. However, when I changed the server address in ~/.kube/config to the VIP and tried kubectl commands, they failed with an x509 error: the API server certificate was valid for each of the control plane node IPs and the cluster IP, but not for the VIP I’m using.
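For reference, the checks and the kubeconfig change looked something like this (the DaemonSet name comes from the Helm release name, so yours may differ, and I’m assuming the default API port of 6443):
kubectl -n kube-system get daemonset my-kube-vip
kubectl -n kube-system get pods -o wide | grep kube-vip
# in ~/.kube/config, point the cluster at the VIP:
#   server: https://VIP_ON_LOCAL_NETWORK:6443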
This can be fixed by regenerating the API server certificate on every control plane node so that it includes the VIP:
sudo su
cd
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' --insecure-skip-tls-verify > kubeadm.yaml
mv /etc/kubernetes/pki/apiserver.{crt,key} ~
kubeadm init phase certs apiserver --config kubeadm.yaml
In the output, I saw the IPs of the control plane nodes AND the VIP I defined. Next, the kube-apiserver container needs to be stopped and removed, so that a new one is started using the updated certificate.
crictl ps | grep kube-apiserver
crictl stop <ID-of-apiserver>
crictl rm <ID-of-apiserver>
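To double-check that the VIP really made it into the regenerated certificate, something like this on a control plane node will print the Subject Alternative Names (the grep is just to trim the output):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"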
Now, kubectl commands using the VIP will be directed to the control plane node currently serving the API, and if that node is unavailable, the requests will be redirected to another control plane node. You can see this by running arping against the VIP: when leadership changes, the MAC address it reports changes.
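For example (the interface name here is an assumption for my network; substitute your own NIC and VIP):
sudo arping -I eth0 VIP_ON_LOCAL_NETWORK
Each reply shows the MAC answering for the VIP, so after a failover you should see it switch to the new leader’s interface.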
Kind of involved, but this works!
One Slight Complication…
I did have some problems when playing with HA for the API. I had rebooted the control plane node that was actively providing the API. Kube-vip did its job, and IPVS redirected API requests to another control plane node that was “elected” as the new leader. All good so far.
However, when that control plane node came back up, it would appear in the “kubectl get node” output, but it showed as “NotReady” and never seemed to become ready. It appeared that the network was not ready, and the calico-node pod on that node was showing an error. I played around a bit, but couldn’t seem to clear the error.
One thing I did was a Kubespray upgrade-cluster.yml run with the --limit argument, specifying the node and one of the other control plane nodes (so that the control plane “facts” were gathered). The kube-vip pod for the node was still failing with a connection refused error. On the node, I stopped/removed the kube-apiserver container and then the kube-vip container, and after that kube-vip no longer had any errors.
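The cleanup on the node was the same crictl dance as before, just repeated for both containers (the IDs are placeholders):
crictl ps | grep -E "kube-apiserver|kube-vip"
crictl stop <ID-of-apiserver>
crictl rm <ID-of-apiserver>
crictl stop <ID-of-kube-vip>
crictl rm <ID-of-kube-vip>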
The only oddity was that ipvsadm on that node did not show a load balancing entry for the VIP, and the other two control plane nodes only had their own IPs in their load balancing entry for the VIP. I didn’t try rebooting another control plane node.
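For reference, the IPVS state I was looking at came from running this on each control plane node; as I understand it, a healthy node should show a virtual server entry for the VIP on the API port, with the control plane node IPs as real servers:
sudo ipvsadm -L -n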