May 2024 – Off The Record

OpenVPN

After looking at several posts on OpenVPN, I decided to go with this one, which uses Helm, works with Kubernetes (versus just Docker), supports ARM64 processors, and had some easy configuration built-in. It hasn’t been updated in over a year, so I forked the repo and made some changes (see details below).

Here are the steps to set this up…

Pull Repo

To start, pull my version of the k4kratik k8s-openvpn repository:

cd ~/workspace/kubernetes/
git clone https://github.com/pmichali/k8s-openvpn.git
cd k8s-openvpn

Build

When working on a Mac, you can install Docker Desktop to run docker commands from the command line. You can alter the Dockerfile.aarch64 to use a newer Alpine image (and hence a newer OpenVPN image). Build a local copy of the openvpn image:

cd build/
docker build -f Dockerfile.aarch64 -t ${YOUR_DOCKER_ID}/openvpn:latest .

Setup a Docker account at hub.docker.com and create an access token so that you can log in. Push your image up to DockerHub:

docker login
docker push ${YOUR_DOCKER_ID}/openvpn:latest
cd ../deploy/openvpn

Customize

In k8s-openvpn/deploy/openvpn there is a values.yaml file, copy it to ${USER}-values.yaml and customize for your needs. In my case, I did the following changes:

Under ‘image’ ‘repository’, set the username to YOUR_DOCKER_ID, so that it loads your image.
Under the ‘service’ section, used a custom ‘externalPort’ number.
Under the service section, set a ‘loadBalancerIP’ address that is in my local network.
Set ‘DEFAULT_ROUTE_ENABLED: false’ so not using pod’s host route. Instead, will provide route later.
Decided to limit the number of clients by un-commenting ‘max-clients 5’
Under ‘serverConf’ section:
- Added a route to my local network using ‘push “route <NETWORK>/<PREFIX>”‘.
- Added my local DNS server with ‘push “dhcp-option DNS <IP>”‘.
- Added OpenDNS as a backup DNS with ‘push “dhcp-option DNS 208.67.222.222″‘.

You can also change server and client configuration settings in deploy/openvpn/templates/config-openvpn.yaml, if desired.

Deploy

With the desired changes, use helm to deploy OpenVPN:

helm upgrade --install openvpn . -n k8s-openvpn -f ${USER}-values.yaml --create-namespace

Check that the pods, services, deployment, replicas are all up:

kubectl get all -n k8s-openvpn

This will take quite some time (15+ minutes), as it builds all the certificates and keys for the server. Once running, you can log into the pod and check the server config settings in /etc/openvpn/openvpn.conf.

Create Users

With the server running, you can create client configuration files:

cd ../../manage
bash create_user.sh NAME [DOMAIN-NAME]

Once the client config is created, the config file can be imported into your OpenVPN client and you can test connecting. I use the OpenVPN client, which is available on several platforms.

There are two options when creating the client config. With just a (arbitrary) name for the device, it will create a config file (NAME.ovpn) where the client OpenVPN will connect to the OpenVPN server on the local network. In my case, that is the IP address that I specified in the customized values.yaml file with the ‘loadbalancerIP’ setting.

For example, if you set loadbalancerIP to 10.10.10.200 and ‘externalIP’ to 6666, the client will try to connect to 10.10.10.200:6666. Obviously, you can do that only from your local network. To use the, when out at Wi-Fi hot-spots, you can use the next option.

If you also add a domain name argument, then the OpenVPN client will try to connect to a server at that domain. You can purchase a domain name that maps the domain to your home router’s WAN IP address and use a service, like DynDNS to keep the IP updated for the domain (typically you get an IP from your ISP via DHCP and that can change over time). On your router, you can port forward from the ‘externalPort’ specified in the customized values.yaml to that same port on OpenVPN server, which is at the IP specified by ‘loadbalancerIP’.

For example, with loadBalancerIP set to 10.10.10.200 and ‘externalPort’ set to 6666, and a domain mydomain.com, the client would try to connect to mydomain.com:6666, which could be done from anywhere. You would need to make sure the dynamic IP for mydomain.com is pointing to your WAN IP address of your router, and do port forwarding for port 6666 to 10.10.10.200 port 6666.

Ciphers/Digests

When I upgraded the Apline OS for the VPN container, which in turn selects the version of OpenVPN (2.6.10 at the time of this posting), I wanted to make sure that the configuration settings for ciphers/digests were current.

In deploy/openvpn/templates/config-openvpn.yaml there is a section called openvpn.conf, which has the server configuration settings. Here are the pertinent entries in that section:

 auth SHA512
 ...
 tls-version-min 1.2
 ...
 tls-cipher TLS-ECDHE-ECDSA-WITH-AES-256-GCM-SHA384:TLS-ECDHE-RSA-WITH-AES-256-GCM-SHA384:TLS-DHE-RSA-WITH-AES-256-GCM-SHA384:TLS-ECDHE-ECDSA-WITH-CHACHA20-POLY1305-SHA256

With the running OpenVPN pod, you can exec into the pod and run these commands to see the ciphers that are available. For TLS ciphers, you can use this command to see the ciphers for TLS 1.3 and newer, and TLS 1.2 and older:

/usr/sbin/openvpn --show-tls

In my case, as I was supporting TLS 1.2 as a minimum, the existing set of ciphers were in the 1.2 list, so I left it alone. Likewise the following command can show the digests available:

/usr/sbin/openvpn --show-digests

Again, I saw SHA512 in the list, so I left this alone. Lastly, in the values.yaml file where you can customize the ‘cipher’ clause, it now has:

cipher: AES-256-CBC

Prevoiously, it have the value ‘AES-256-GCM’, however, this is not used, when using TLS authentication. Also, I did change the protocol from TCP to UDP, which, as I understand, is more robust.

Details of Modifications Made

build/Dockerfile.aarch64

Using newer alpine image (based on edge tag 20240329)
Updated repo added, to use the newer test repo location – main and community already exist.

deploy/openvpn/templates/config-openvpn.yaml

Removed client config settings that were generating warning log messages with opt-verify set.
Setting auth to sha512 on client and server.
Disabled allowing compression on server and used of compression (security risk).
Added settings that were on client to server for mute, user, group, etc.
Set opt-verify for testing, but then commented out, as it is deprecated.
Specifying TLS min 1.2 on server.

deploy/openvpn/templates/openvpn-deployment.yaml

Turned off node affinity for lifecyle=ondemand. Does not exist on my bare metal cluster.
Newer busybox version 1.35 for init container.

deploy/openvpn/values.yaml

Using my docker hub repo image for openvpn.
Altered ports used for loadbalancer service (arbitrary) and fixed IP.
Using Longhorn for storage class.
Using different client network (arbitrary).
Using udp protocol.
Changed K8s pod and service subnets to match what I use (arbitrary).
Set to redirect all traffic through gateway.
Using AES-256-CBC as default cipher.
Pushed route for DNS servers I wanted.

manage/create_user.sh

Allow to pass domain name vs using published service IP.
Fixed namespace.
Fixed kubectl exec syntax for newer K8s.

manage/revoke_user.sh

Fixed incorrect usage message.
Fixed namespace
Fixed kubectl exec syntax for newer K8s.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

May 13

Cluster Upgrade – Challenge

With my cluster running a kubespray version around 2.23.3 and kubernetes 1.28.2, I wanted to give a try at updating my cluster, as there were newer versions available. There were all sorts of problems along the way, so I’ll try to cover what I did, and what (finally) worked.

For reference, my cluster has longhorn storage, prometheus/grafana/loki, metalLB, nginx-ingress, and velero installed, as well.

But, before doing anything, I decided to move things around a bit in my directory structures, so that I didn’t have git repos inside of my ~/workspace/picluster git repo. I created a ~/workspace/kubernetes and placed several directories as peers in that area:

kubernetes
├── grafana-dashboards-kubernetes
├── ingress
├── kubespray
├── mysql
├── nginx-ingress
├── picluster
└── velero

The rest of the components remained in the picluster area:

kubernetes/picluster
├── inventory
├── longhorn
├── metallb
├── minio
├── minio-k8s
├── monitoring
└── playbooks

With this setup, I proceeded to identify what kubespray version to upgrade to, and whether or not this was a multi-version upgrade or not. I found that the latest release tag was 2.24.0, but there were many more commits since then, so I created a tag at my current version (0f243d751), checked out and created a tag at the desired version (fdf5988ea).

Next, I wanted to make sure that all the tools I’m using match what Kubespray is expecting for the commit that I’m using. There is a requirements.txt file that calls out all the versions. I used ‘poetry show’ to see what versions I had, and then used ‘poetry add COMPONENT==VERSION’ with a version to make sure that there were compatible versions. For example:

poetry add ansible==9.5.1

I copied the sample inventory area into my ~/workspace/kubernetes/picluster/inventory area and merged in my existing hosts.yaml, so that I had any customizations that were originally made in k8s-cluster.yml).

With this, I was ready to go to the kubespray directory and do the upgrade using…

cd ~/workspace/kubernetes/kubespray
ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml  -u ${USER} -v --private-key=~/.ssh/id_ed25519 -e upgrade_cluster_setup=true

THAT’S WHEN I STARTED TO RUN INTO PROBLEMS!

Initially, I saw that the calico-node pods were stuck in a crash loop…

calico-node: error while loading shared libraries: libpcap.so.0.8: cannot open shared object file: No such file or directory

It turns out that the 2.24.0+ release of kubespray uses calico v3.72.2, which has issues on arm64 processors. The choice was to go to v3.72.0, which apparently has a memory leak, or go to v3.72.3, where the problem with the library was fixed. I decided to do the later, but when I overrode calico_version, the upgrade failed, because there is no checksum for that version.

Updating KubeSpray Checksums

I found out that in the kubespray area, there is a scripts directory, with a download_hash.sh script, which would read the updated calico_version in ./roles/kubespray-defaults/defaults/main/download.yml and update the roles/kubespray-defaults/defaults/main/checksums.yml file. Well, it wasn’t as easy as that, because I was using a MacBook and the grep command does not have a -P (perl) option, used in the script. So…

I copied the Dockerfile to HashMaker.Dockerfile, and trimmed it to this:

# syntax=docker/dockerfile:1

FROM ubuntu:22.04@sha256:149d67e29f765f4db62aa52161009e99e389544e25a8f43c8c89d4a445a7ca37

ENV LANG=C.UTF-8 \
    DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /kubespray

# hadolint ignore=DL3008
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update -q \
    && apt-get install -yq --no-install-recommends \
    curl \
    python3 \
    python3-pip \
    sshpass \
    vim \
    openssh-client \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /var/log/*

RUN --mount=type=bind,source=requirements.txt,target=requirements.txt \
    --mount=type=cache,sharing=locked,id=pipcache,mode=0777,target=/root/.cache/pip \
    pip install --no-compile --no-cache-dir -r requirements.txt \
    && find /usr -type d -name '*__pycache__' -prune -exec rm -rf {} \;

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

COPY scripts ./scripts

I copied scripts/download_shas.sh to scripts/download_shas_pcm.sh and made these changes (as inside the container there is no git repo:

9c9
< checksums_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/checksums.yml"
---
> checksums_file="./roles/kubespray-defaults/defaults/main/checksums.yml"
11c11
< default_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/main.yml"
---
> default_file="./roles/kubespray-defaults/defaults/main/main.yml"

With these changes, I did the following to build and run the container, where I could run the scripts/download_hash_pcm.sh script to update the checksum.yml file with the needed checksums…

docker buildx build --platform linux/arm64 -f HashMaker.Dockerfile -t hashmaker:latest .

docker run --rm -it --mount type=bind,source="$(pwd)"/roles,dst=/kubespray/roles   --mount type=bind,source="${HOME}"/.ssh/id_ed25519,dst=/root/.ssh/id_ed25519 hashmaker:latest bash
./scripts/download_shas_pcm.sh
exit

(Yeah, I could have invoked the script instead of running bash and then invoking the script inside the container).

With this one would think that we are ready to do the upgrade. Well, I tried, but I hit some other issues…

Some nodes were updated to 1.29.3 kubernetes, but some were still at 1.28.2
The prometheus/grafana pods were in a crash loop, complaining that there were multiple default datasources.
Longhorn was older 1.5.3, and I figured it would be simple to helm upgrade to 1.6.1 – it wasn’t

Kubernetes Versions (SOLVED)

Someone on Slack said that I need to do the kubespray upgrade with the “-c upgrade_cluster_setup=true” added. I did that, but it did not work and I still have three nodes with 1.29.3 and four with 1.28.2.

I found the problem with the versions. On the four older nodes, at some point kubeadm and/or kubelet were installed (as Ubuntu package). As a result, there was the newer /usr/local/bin/kubelet (v1.29.3), and the package installed /usr/bin/kublet (v1.28.2). For systemd, in addition to the /etc/systemd/system/kubelet.service, which used the /usr/local./bin/kubelet in ExecStart, there was a kubelet.service.d directory with 10-kubeadm.conf file that used /usr/bin/kubelet in ExecStart. This one seemed to take precedence.

To resolve, I removed the Ubuntu kubeadm package, which depended on kubelet, and I removed the kubelet.service.d directory and reloaded systemd. My only guess is that at one point I tried installing kubeadm. Now, upgrades will show all nodes using the newer 1.29.3 kubernetes.

Multiple Default Datasources

I got into real trouble with this one. I tried deleting pods, removing replicasets that were no longer in use, and then tried to helm upgrade kube-prometheus-stack. That caused even more problems, as the upgrade failed and now I had a whole bunch of failing pods and replicasets not ready. The Prometheus pods were complaining about multiple attachments to the same PV (I was using Longhorn storage). I couldn’t clear the errors and could remove PVCs. I’m not sure if the problem was that I didn’t use all the arguments that I used, when I initially installed Prometheus.

I tried updating Longhorn (pulling the 1.6.1 values.yaml, changing policy from Delete to Retain and type from ClusterIP to NodePort, and then helm update with the modified values.yaml), and that was a mess too. Crash loops, and replicasets not working.

I ended up deleting the cluster entirely. I was concerned that maybe there was an issue with upgrading in general, so I installed the older kubespray/kubernetes cluster, without installing any other components (Longhorn, Prometheus), and did an upgrade. Everything worked fine.

I need to retry this, maybe with the upgrade of Prometheus using the same args as install did. I’m also worried about the multiple attachment issue with the PV.

In the meantime, I wanted to trying updating Longhorn…

Longhorn Upgrade

With the original, Longhorn was at 1.5.3, and 1.6.1 is available. I had tried a helm upgrade (after I had upgraded the cluster), and had all sorts of problems. So, I created a new cluster, with the latest Kubernetes, made sure everything was up, and then helm installed 1.5.3, using the modified values.yaml I had with Retain policy and NodePort:

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.5.3 --values values-1.5.3.yaml

I then did a helm upgrade to 1.6.1…

helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version 1.6.1

There were some pods in crash loops, and items not ready. I deleted the older replicasets. It looked like the deployment had annotation for 1.6.1, but was still calling out an image of 1.5.3. Looking at Longhorn notes, I saw that I could use kubectl to upgrade, and even knowing that I did use Helm install/upgrade before, I decided to try it.

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.1/deploy/longhorn.yaml

There were a bunch of warnings when running the command, but all pods came up and the deployment showed 1.6.1 for the image.

I’m not sure if there was something wrong with doing the helm update, if it was because I was customizing the values.yaml files, or if it was because I was using NodePort. With the kubectl apply, the type was set to clusterIP.

I’ve got more research to do here to isolate this issue.

Upgrading again…

I tested installing a 1.28.2 cluster, and then upgraded to 1.29.3 (doing just control plane nodes and etcd node first, and then all the worker nodes). Every pod was up and running, daemonsets/replicasets/deployments were all working, and things were looking pretty good.

There were some pre-upgrade replicatesets that were present (no needed/available instances), so I deleted them. I did a snapshot and backup of a Longhorn volume and that worked as well. I do see two problems so far.

First, under Grafana, the data sources were gone. I could not modify the Loki instance (as built-in), but I created another one. The original was giving connection refused errors. I think that the IP it uses, is the old one. There also was no Prometheus data source. I created one, and used the cluster IP and it works as well.

Second, I tried to do a backup of the Kubernetes cluster using Velero, and it failed. I tried viewing the log, but there was none. When checking ‘velero backup-location get’, it shows that the backup location is not available. It seems like various components are using older IPs/ports?

PROBLEM FOUND… It appears that when an upgrade occurs and the coredns version has changed, a new deployment, replicaset, service, and pods are created with the new version AND they get a new nameserver IP (10.133.0.10). However, the existing pods (and new ones created) are still referring to the old nameserver IP (default is 10.133.0.3). There is a service for that old nameserver IP, but it is not resolving addresses. If you do nslookup and specify the new nameserver IP, it will work, but that doens’t help everything that is running or new pods created, which are using the old nameserver IP.

WORKAROUND: If an install (cluster.yml) is done again, using the exact same settings, the first DNS service becomes active again. One can then delete the newly created service, and the unused replicasets. I tried repeating the upgrade, but that did not resolve the issue.

There does appear to be a download of the new coredns and restart of the systemd-resolved service. I don’t know if there is some mechanism to switch pods to use the new IP or if somehow the new service should have replaced the original and use the same IP.

Things Improving…

After messing with things over a few weeks I found out quite a bit of things…

CoreDNS: I see that with the newer Kubespray master branch versions, they now have checksums for coredns 3.72.3. AS a result, I don’t need to go through the contortions of creating my own branch of Kubespray and building the checksums or coredns 3.72.3. I just picked a newer commit of Kubespray (not the current tagged version, as it still did not have the checksums for coredns 3.72.3.

Upgrading with CoreDNS changes: I found out that with the newer Ubuntu versions the kernels actually have the “dummy” kernel module. I see it in the current 6.5.0-1015-raspi kernel, and I think it was in 1013 and 1014. The implication of this is that, I was unable, in the past, to enable node local DNS in Kubespray, because this module was needed. After updating the OS on my nodes to have this newer kernel, I could then run Kubespray installs and upgrades with ‘enable_nodelocaldns’ setting and now upgrades had a working DNS, even when the version of coredns changed. There were some replicasets that remained and were not active, but the upgrades are working.

Scheduling Disabled: I was seeing several issues when doing upgrades. In one case, I found that a worker node status that was “Ready”, but had “SchedulingDisabled” indicated. I did a “kubectl uncordon NODENAME” and that enabled scheduling. Not sure why it was not completely upgraded.

Upgrading single node: I found that with Kubespray, you can use the command line argument on upgrade (and other commands) –limit “NODE1,NODE2,NIODE3”, to limit the nodes that are affected by the command to one or more that are specified in the limit clause. However, when I did an upgrade, specifying ONLY a worker node, the process failed at this step:

TASK [kubernetes-apps/network_plugin/multus : Multus | Start resources] ********
fatal: [niobe -> {{ groups['kube_control_plane'][0] }}]: FAILED! => {"msg": "Error in jmespath.search in json_query filter plugin:\n'ansible.vars.hostvars.HostVarsVars object' has no attribute 'multus_manifest_2'"}

The problem is, that I don’t have Multus enabled! It turns out that there is a bug in Kubespray, such that you need to have a control plane node included in the limit clause, so that it will parse that Multus is disable and will not attempt to start it up on the worker node. I just re-ran the upgrade specifying one control plane node (already upgraded) and the worker node I wanted to update..

Node name changes: OK, this was stupid. I named my nodes after characters from the movie “The Matrix” (Apoc, Cypher, Morpheus,…). Since the original install, I’ve been playing with updating Kubespray versions, updating Kubernetes, installing things like Prometheus and Longhorn, and working through the problem I had with CoreDNS version changing during upgrades. Recently, I realized that one of my worker nodes was actually named incorrectly. It was “niobi” and not “niobe”. I changed my inventory and rename the hostname on the node. At one point, I decided to retest upgrades (with the node local DNS enabled). I did this by checking out tags that I had created for my repo and the Kubespray repo, performing a clean install, updating the repos to newer tags or the latest commit, updating the Poetry environment so that the correct tool versions were used with the Kubespray version I was trying, and then doing an upgrade. The upgrade was failing on node “niobe”, and it took me a while to realize that when I did the install, the node was named “niobi”, but when I did the upgrade, it was named “niobe” (with the same IP). The (simple) fix, was to do fix the hostname in the inventory, before doing the initial install.

Observations

In the future, I think it is probably best to do the kubernetes/kubespray update separate from other components. In addition, I think the update should be done a node at a time, starting with control plane nodes, and then worker nodes. Kubespray does have a limit option to restrict to a node. They say to run facts.yml to update info on all nodes, update control plane/etcd nodes, and then do worker nodes:

ansible-playbook playbooks/facts.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "kube_control_plane:etcd" -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "morpheus:niobi:switch" -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

I used this on a re-try of the upgrade and the facts and control plane/etc steps worked fine, but I hit an error in the downloading step for the worker nodes. Just note that, with the current Kubespray, you probably should include one control plane node, when upgrading one or more worker nodes, so that the configuration is handled correctly.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off