June 12

Ad-Blocking With PI-Hole

I had Pi-Hole running on a standalone Raspberry PI, but wanted to move it to my Kubernetes cluster. Digging around, I found a useful article on how to add PI-Hole to Kubernetes, which not only covered running PI-Hole, but also running redundant instances, with info on keeping them in sync. It used MetalLB, ingress, and CertManager for Let’s Encrypt certificates, which is something I was interested in.

There was another article, based on using Helm and having some monitoring setup. I may try someday.


A Few Things

First, as expected, the article used an older version of pi-hole (2022.12.1). I tried the latest version (2024.05.0 at this time), but the pods were stuck in crash loops. It turned out that, for the liveness/readiness probes, the YAML specified an HTTP GET at the root of the lighttpd web server. That worked with the 2023.02.1 pihole image, but failed with 2023.02.2.

Trying curl inside the pod showed that the root path returned a 403 Forbidden error. Requesting ‘/admin’ returned a 301 Moved Permanently pointing to ‘/admin/’. Requesting that path returned a 302 Found pointing to ‘login.php’. Requesting ‘/admin/login.php’ finally returned a 200 OK with content.

So, I changed the liveness and readiness probe configuration to add a path field with ‘/admin/login.php’, and then the pods came up successfully.
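
A Kubernetes httpGet probe counts any response code from 200 up to (but not including) 400 as a success, which is why the 403 at the root path kept the pods in a crash loop. Here is a small sketch that replays the requests described above. It is only an illustration: PIHOLE_URL is a hypothetical variable (for example http://localhost when run inside the pod), and nothing is contacted unless it is set.

```shell
#!/usr/bin/env bash

# An httpGet probe passes for any status >= 200 and < 400.
probe_passes() {
  local code="$1"
  [ "$code" -ge 200 ] && [ "$code" -lt 400 ]
}

# Print the HTTP status a URL returns, without following redirects.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# PIHOLE_URL is a hypothetical variable; the loop is skipped when it is unset.
if [ -n "${PIHOLE_URL:-}" ]; then
  for path in / /admin /admin/ /admin/login.php; do
    code="$(http_status "${PIHOLE_URL}${path}")"
    if probe_passes "$code"; then verdict=pass; else verdict=fail; fi
    printf '%-18s %s %s\n' "$path" "$code" "$verdict"
  done
fi
```

Run with PIHOLE_URL set, this prints one line per path with the status code and whether a probe pointed at that path would pass.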

Second, for the PI-Hole admin web pages, I chose a service type of LoadBalancer (instead of ClusterIP plus an ingress). Accessing it locally is fine, as I just use the IP assigned by the load balancer. The article talks about setting up a certificate using Let’s Encrypt to be able to access it remotely.

I already have a domain name, and I’m using Dynamic DNS to redirect that domain to my router’s WAN IP. But, I’m currently port forwarding external HTTP/HTTPS traffic to my standalone Raspberry PI for a music server that uses Let’s Encrypt for certificates.

For now, I think I’ll just access my PI-Hole admin page locally. I will, however, have to figure out how to set up Let’s Encrypt once I move my music server and other web apps to the Kubernetes cluster, so it will be useful to keep this info in mind.


Setting Up PI-Hole

I’m doing the same thing as the article, running three replicas of the PI-Hole pods, and I altered the liveness/readiness check. Here is my manifest.yaml in pieces:

apiVersion: v1
kind: Namespace
metadata:
  name: pihole
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pihole-configmap
  namespace: pihole
data:
  TZ: "America/New_York"

This sets up a namespace for PI-Hole, defines the timezone I'm using, and the upstream DNS servers that I wanted to use (OpenDNS). You can customize, as desired.

apiVersion: v1
kind: Secret
metadata:
  name: pihole-password
  namespace: pihole
type: Opaque
data:
  WEBPASSWORD: <BASE64_ENCODED_PASSWORD>

This is the password that will be used when logging into the PI-Hole admin page. You should encode it using “echo -n 'MY PASSWORD' | base64” and place the encoded string in the WEBPASSWORD attribute.
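
A quick way to check the encoded value before pasting it into the Secret is to decode it again. This is a small sketch with a placeholder password. It uses printf instead of echo, so that no trailing newline can end up in the encoded string.

```shell
#!/usr/bin/env bash

# Placeholder password, replace with your own.
password='MY PASSWORD'

# Encode without a trailing newline.
encoded="$(printf '%s' "$password" | base64)"
echo "WEBPASSWORD: ${encoded}"

# Decode again to confirm the round trip.
decoded="$(printf '%s' "$encoded" | base64 --decode)"
if [ "$decoded" = "$password" ]; then
  echo "round trip OK"
else
  echo "round trip FAILED" >&2
fi
```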

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pihole
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: pihole
  serviceName: pihole
  replicas: 3
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2024.05.0
          envFrom:
            - configMapRef:
                name: pihole-configmap
            - secretRef:
                name: pihole-password
          ports:
            - name: svc-80-tcp-web
              containerPort: 80
              protocol: TCP
            - name: svc-53-udp-dns
              containerPort: 53
              protocol: UDP
            - name: svc-53-tcp-dns
              containerPort: 53
              protocol: TCP
          livenessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - name: pihole-etc-pihole
              mountPath: /etc/pihole
            - name: pihole-etc-dnsmasq
              mountPath: /etc/dnsmasq.d
  volumeClaimTemplates:
    - metadata:
        name: pihole-etc-pihole
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
    - metadata:
        name: pihole-etc-dnsmasq
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi

This is the stateful set that will create three replicas of the PI-Hole pods. I’m using the latest version at this time (2024.05.0), have modified the liveness/readiness checks as mentioned above, and am using PVs (longhorn) for storing configuration.

apiVersion: v1
kind: Service
metadata:
  name: pihole
  namespace: pihole
  labels:
    app: pihole
spec:
  clusterIP: None
  selector:
    app: pihole
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-web-svc
  namespace: pihole
spec:
  selector:
    app: pihole
    statefulset.kubernetes.io/pod-name: pihole-0
  type: LoadBalancer
  ports:
    - name: svc-80-tcp-web
      port: 80
      targetPort: 80
      protocol: TCP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-udp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-udp-dns
      port: 53
      targetPort: 53
      protocol: UDP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-tcp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-tcp-dns
      port: 53
      targetPort: 53
      protocol: TCP

These are the services for the UI and for DNS. Of note, the TCP and UDP DNS services share the same load balancer IP. I used a load balancer for the web UI as well (instead of using ClusterIP and setting up an ingress; maybe that will bite me later).

With this manifest, you can run “kubectl apply -f manifest.yaml” and then watch for all three pods to start up. You should be able to run nslookup/dig commands, using the IP of the DNS service as the server, to verify that DNS is working. For the web UI, use the IP of the pihole-web-svc service with a path of /admin/. Use the password you defined in the manifest to log in and see the ad blocker in operation.
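
The DNS check can be scripted along these lines. This is a sketch under a few assumptions: PI-Hole is in its default blocking mode (where blocked names are answered with 0.0.0.0 or ::), and PIHOLE_DNS_IP and BLOCKED_NAME are hypothetical variables holding the DNS service IP and a domain you expect to be on your blocklist. Nothing is queried unless PIHOLE_DNS_IP is set.

```shell
#!/usr/bin/env bash

# In the default blocking mode, blocked names are answered with the
# unspecified address (0.0.0.0 for A records, :: for AAAA records).
is_blocked_answer() {
  case "$1" in
    0.0.0.0|::) return 0 ;;
    *) return 1 ;;
  esac
}

# PIHOLE_DNS_IP and BLOCKED_NAME are hypothetical variables.
if [ -n "${PIHOLE_DNS_IP:-}" ]; then
  for name in example.com "${BLOCKED_NAME:-}"; do
    [ -n "$name" ] || continue
    answer="$(dig +short "@${PIHOLE_DNS_IP}" "$name" A | head -n 1)"
    if is_blocked_answer "$answer"; then
      echo "${name}: blocked (${answer})"
    else
      echo "${name}: ${answer:-no answer}"
    fi
  done
fi
```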

Keeping The PI-Hole Pods In Sync

As mentioned in the article, we have three PI-Hole pods (one primary, two secondary), but need to keep their databases in sync. To do this, Orbital Sync is used to back up the primary pod’s database, and then restore it to the secondary pods’ databases. Here is the orbital-sync.yaml manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: orbital-sync-config
  namespace: pihole
data:
  PRIMARY_HOST_BASE_URL: "http://pihole-0.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_1_BASE_URL: "http://pihole-1.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_2_BASE_URL: "http://pihole-2.pihole.pihole.svc.cluster.local"

apiVersion: apps/v1
kind: Deployment
name: orbital-sync
namespace: pihole
app: orbital-sync
app: orbital-sync
- name: orbital-sync
image: mattwebbio/orbital-sync:latest
- configMapRef:
name: orbital-sync-config
name: pihole-password
name: pihole-password
name: pihole-password

It runs every minute, and uses the secret that was created with the password to access PI-Hole. You can look at the orbital sync pod log to see that it is backing up and restoring the database among the PI-Holes.
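
The host names in the ConfigMap are not arbitrary. Each pod of a StatefulSet that sits behind a headless service gets a stable DNS name of the form POD.SERVICE.NAMESPACE.svc.cluster.local. Here is a small sketch that builds the three URLs from the names used in the manifests (assuming the default cluster.local cluster domain):

```shell
#!/usr/bin/env bash

# Names taken from the manifests above.
statefulset=pihole
service=pihole
namespace=pihole

# Build the in-cluster URL for the pod with the given ordinal.
pod_url() {
  echo "http://${statefulset}-${1}.${service}.${namespace}.svc.cluster.local"
}

for ordinal in 0 1 2; do
  pod_url "$ordinal"
done
```

If you change the number of replicas, the Orbital Sync ConfigMap needs a matching entry for each additional pod.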


Finishing Touches

Under the UI’s local DNS entries section, I manually entered the hostname (with a .home suffix) and IP address for each of my devices on the local network, so that I can access them as “name.home”.

I did not set up DHCP on PI-Hole, as I use my router’s DHCP configuration.

To use the PI-Hole as the DNS server for all systems in your network, you can specify the IP of the PI-Hole on each host as the only DNS server. If you specify more than one DNS server, based on your OS, it may use the other server(s) at times and bypass the ad-blocking.

For me, I have all my hosts using the router as the primary DNS server. The router is configured to use the PI-Hole as the primary server, and then a public server as the secondary server. Normally, requests would always go to the Pi-Hole, unless for some reason it was down. This was advantageous for two reasons. First, when I had my standalone PI-Hole, if it crashed, there still was DNS resolution. Second, it made it easy to switch from the standalone PI-Hole to the Kubernetes one, by just changing the router configuration.

The only odd thing with this setup is that when I use my laptop away from the network, my router’s IP is (obviously) not available. I’ve been getting around this by using the “Location” feature of macOS, setting up the “Home” location to use my router’s IP for DNS, and the “Roaming” location to use a public DNS server.

I guess I could set things up so that the DNS ports on my domain name (which points to my router using Dynamic DNS) are port forwarded to the PI-Hole IP, but I didn’t want to expose that to the Internet.



Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Ad-Blocking With PI-Hole
June 3

Kubespray Add-Ons

In Part IV of the PI cluster series, I mention how to set up Kubespray to create a cluster. You can look there for how to set up your inventory, and for the basic configuration settings for Kubespray. In that series, I also mention how to add more features after the cluster is up. Some are pretty simple, and some require manual steps to get everything set up.

However, you can also have Kubespray install some “add-on” components, as part of the cluster bring-up. In many cases, this makes the process more automated, and “easier”, but it does have some limitations.

First, you will be using the version and configuration that is defined in Kubespray’s Ansible templates and roles.  Granted, you can always customize Kubespray, with the caveat of having to keep your changes up to date with upstream.

Second, removing the feature on a running cluster can be more difficult. You’ll have to manually delete all the resources (e.g. daemonsets, deployments, etc.), of which, some may be hard to identify (CRDs, RoleBindings, secrets, etc). Looking in the Kubespray templates may provide some insight into the resources that were created.

You may be able to find manifests for the feature and version from the feature’s repo, and pull them and use “kubectl delete” on the manifests to remove the feature. Just note, that there may be some differences, between what is in the repo manifests for a version, and what are in the manifests that Kubespray used. I haven’t tried it, but if there is a Helm based version of the feature that matches what Kubespray installed, you might be able to “helm install” the already installed feature, and then “helm delete”?


Kube VIP (Virtual IP and Service Load Balancing)

To add Kube-VIP as part of the Kubespray add-on, I did these steps, before creating the cluster.

First, I modified the inventory, so that etcd would run on each of my control-plane nodes (versus a mix of control-plane and worker nodes).

Second, in inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml, I enabled strict ARP, used IPVS (instead of iptables) for kube-proxy, and excluded my local network from kube-proxy (so that kube-proxy would not clear entries that were created by IPVS):

kube_proxy_strict_arp: true
kube_proxy_mode: ipvs
kube_proxy_exclude_cidrs: ["CIDR_FOR_MY_LOCAL_NETWORK",]

Third, I enabled kube-vip in inventory/mycluster/group_vars/k8s_cluster/addons.yml. I turned on ARP (vs BGP), enabled the VIP for the control plane, and specified the API server address and port to use. I also chose to load balance that VIP. I did not enable load balancing for services, but that is an option too:

kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: VIP_ON_MY_NETWORK
loadbalancer_apiserver:
  address: "{{ kube_vip_address }}"
  port: 6443
kube_vip_lb_enable: true

# kube_vip_services_enabled: false
# kube_vip_enableServicesElection: true

I had tried this out, but found that the kube-vip container was showing connection refused and permission problems, so leader election was not working for the virtual IP chosen.

I finally found a bug report on the issue when using Kubernetes 1.29 with kube-vip. Essentially, when the first control plane node is starting up, the admin.conf file used for kubectl commands, does not have the permissions needed for kube-vip at that point in the process. The kube-vip team needs to create their own config file for kubectl. In the meantime, the bug report is trying a work-around fix in Kubespray, by switching to the super-admin.conf file, which will have the needed permissions at that point in time. However, the patch they have does not work. I did more hacking to it, and have this change, which works:

diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
- kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
- kube_vip_arp_enabled

+- name: Kube-vip | Check if first control plane
+ set_fact:
+ is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
- name: Kube-vip | Write static pod
src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
hostNetwork: true
- hostPath:
- path: /etc/kubernetes/admin.conf
+ path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
name: kubeconfig
status: {}


UPDATE: There is a fix that is in progress, which is a streamlined version of my change. Once that is merged, no patch will be needed.

With this change to Kubespray, I did a cluster create:

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -vvv --private-key=~/.ssh/id_ed25519 cluster.yml

Everything was up and running, but kubectl commands were failing on my Mac, because the ~/.kube/config file uses the FQDN https://lb-apiserver.kubernetes.local:6443 for the server, and there is no DNS info on my Mac for this host name (it does work on the nodes, however). The simple fix was to replace the FQDN with the IP address selected for the VIP.
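
That replacement can be done with a one-line sed. The sketch below works on a sample file, so nothing real is modified. The VIP shown is a placeholder from the documentation address range, and the sample kubeconfig is trimmed down to the one field that matters here.

```shell
#!/usr/bin/env bash

# Placeholder VIP from the documentation range; use your own.
vip="192.0.2.10"

# Work on a sample file so the real ~/.kube/config is not touched.
kubeconfig="$(mktemp)"
cat > "$kubeconfig" <<'EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
    server: https://lb-apiserver.kubernetes.local:6443
  name: cluster.local
EOF

# Swap the FQDN for the VIP, keeping a .bak copy of the original file.
sed -i.bak "s#https://lb-apiserver\.kubernetes\.local:6443#https://${vip}:6443#" "$kubeconfig"

grep 'server:' "$kubeconfig"
```

To apply it for real, point kubeconfig at ~/.kube/config instead of the temporary file. The .bak copy makes it easy to undo.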

Now, all requests to that IP are redirected to the node that is currently running the API server. If the node is not available, IPVS will redirect to another control plane node.

MetalLB Load Balancer

Instead of setting this up after the cluster was created, you can opt to let Kubespray do this as well. In the inventory/mycluster/group_vars/k8s_cluster/addons.yml, I did these changes:

metallb_enabled: true
metallb_speaker_enabled: "{{ metallb_enabled }}"
metallb_namespace: "metallb-system"

metallb_protocol: "layer2"

metallb_config:
  address_pools:
    primary:
      ip_range:
        - RANGE_OF_IPS_ON_MY_NETWORK
      auto_assign: true
  layer2:
    - primary

Besides enabling the feature, I made sure that it was using layer 2 (vs layer 3), and, under the config, set up an address pool with the range of IPs on my local network that I wanted to use for load balanced IPs. You can specify the range as a CIDR, if desired.

Now, when the cluster is created with Kubespray, MetalLB will be set up and you can change pods/services to use the networking type “LoadBalancer” and an IP from the pool will be assigned.

As mentioned in the disclaimer above, with the version of Kubespray I have, it installs MetalLB 0.13.9. I could have overridden the ‘metallb_version’ to a newer version, like ‘v0.14.5’, but the templates for MetalLB in Kubespray are using the older v0.11.0 kubebuilder image in several places. To get the same versioning as used when installing MetalLB via Helm, I would have to modify the templates to specify v0.14.0. I did see other configuration differences with the CRDs used in the Helm version, like setting the tls_min_version argument and not setting some priority nor priorityClassName configurations.

NGINX Ingress

This one is pretty easy to enable, by changing this setting in inventory/mycluster/group_vars/k8s_cluster/addons.yml:

ingress_nginx_enabled: true

When the cluster comes up, there will be an ingress daemonset, which creates ingress controller pods on each node, and an NGINX ingress service with an IP from the MetalLB address pool.

There are example YAML files in the MetalLB/NGINX Ingress post, that will allow you to create pods and services, and an ingress resource that allows access via path prefixes.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Kubespray Add-Ons
June 1

High Availability?

OK, so I have a cluster with three control plane nodes and four worker nodes (currently). However, if I shut down the control plane node that is hosting the API server, I lose API access. 🙁

I’ve been digging around and it looks like kube-vip would be a good solution, as it allows me to create a virtual IP for the API server, and then does load balancing and leader election between the control plane nodes so that the failure of the node providing the API can switch to another control plane node. In addition, kube-vip can do load balancing between services (I’m not sure if that makes metalLB redundant).

Before installing kube-vip, I needed to change the cluster configuration. I changed the inventory, so that etcd is running ONLY on the control-plane nodes (and not a mix of control plane and worker nodes).

Next, I made these changes to inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml:

kube_proxy_mode: ipvs
kube_proxy_strict_arp: true
kube_proxy_exclude_cidrs: ["CIDR_OF_LOCAL_NETWORK",]

This had kube-proxy also using IPVS (versus iptables), and running in strict ARP mode (needed for kube-vip). Lastly, to prevent kube-proxy from clearing IPVS settings made by kube-vip, the local network IPs must be excluded. With those changes, I re-created a cluster, and was ready to install kube-vip…

There was a Medium article by Chris Kirby on using a Helm install of kube-vip for HA. It used an older version of kube-vip (0.6.4) and used values.yaml settings for K3s. I added the Helm repo for kube-vip, and pulled the values.yaml file to be able to customize it:

mkdir ~/workspace/kubernetes/kube-vip
cd ~/workspace/kubernetes/kube-vip
helm repo add kube-vip https://kube-vip.github.io/helm-charts
helm repo update

wget https://raw.githubusercontent.com/kube-vip/helm-charts/main/charts/kube-vip/values.yaml

Here are the changes I made to the values.yaml, saving it as values-revised.yaml:

< pullPolicy: IfNotPresent
> pullPolicy: Always
< # tag: "v0.7.0"
> tag: "v0.8.0"
< address: ""
< cp_enable: "false"
> cp_enable: "true"
< svc_election: "false"
< vip_leaderelection: "false"
> svc_election: "true"
> vip_leaderelection: "true"
> vip_leaseduration: "5"
< name: ""
> name: "kube-vip"
< nodeSelector: {}
> nodeSelector:
> node-role.kubernetes.io/control-plane: ""
> - effect: NoExecute
> key: node-role.kubernetes.io/control-plane
> operator: Exists
< # nodeAffinity:
< # requiredDuringSchedulingIgnoredDuringExecution:
< # nodeSelectorTerms:
< # - matchExpressions:
< # - key: node-role.kubernetes.io/master
< # operator: Exists
< # - matchExpressions:
< # - key: node-role.kubernetes.io/control-plane
< # operator: Exists
> nodeAffinity:
> requiredDuringSchedulingIgnoredDuringExecution:
> nodeSelectorTerms:
> - matchExpressions:
> - key: node-role.kubernetes.io/control-plane
> operator: Exists

Besides using a newer kube-vip version, this enabled load balancing for control plane nodes and services, selects nodes that have the control-plane attribute (but not a value, like the article), and sets the node affinity.

With this custom values file, I could do the install:

helm install my-kube-vip kube-vip/kube-vip -n kube-system -f values-revised.yaml

With this, all the kube-vip pods were up, and the daemonset showed three desired, current, and ready. However, when I changed the server IP to my VIP in ~/.kube/config and tried kubectl commands, they failed saying that there was a x509 certificate for each of the control plane nodes, and a cluster IP, but not for the VIP I’m using.

This can be fixed by re-generating the certificates on every control plane node:

sudo su
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' --insecure-skip-tls-verify > kubeadm.yaml

mv /etc/kubernetes/pki/apiserver.{crt,key} ~
kubeadm init phase certs apiserver --config kubeadm.yaml

In the output, I saw the IPs of the control plane nodes AND the VIP I defined. Next, the kube-apiserver container needs to be stopped and removed, so that a new one is started.

crictl ps | grep kube-apiserver
crictl stop <ID-of-apiserver>
crictl rm <ID-of-apiserver>
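
Before pointing ~/.kube/config at the VIP, it may be worth confirming that the regenerated certificate really lists it. Here is a sketch of that check. VIP is a hypothetical variable, and the certificate is only read when it is set (run it as root on a control plane node).

```shell
#!/usr/bin/env bash

# Succeeds if the certificate text on stdin lists the given IP as a
# Subject Alternative Name. Expects "openssl x509 -noout -text" output.
san_lists_ip() {
  local ip="$1"
  grep 'IP Address:' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fx "IP Address:${ip}" > /dev/null
}

# VIP is a hypothetical variable; nothing is read when it is unset.
if [ -n "${VIP:-}" ]; then
  if openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
      | san_lists_ip "$VIP"; then
    echo "certificate covers ${VIP}"
  else
    echo "certificate does NOT cover ${VIP}" >&2
  fi
fi
```

The match is on the whole entry, so a VIP of 192.0.2.1 is not mistaken for 192.0.2.10.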

Now, kubectl commands using the VIP will be redirected to the control plane node running the API server, and if that node is unavailable, the requests will be redirected to another control plane node. You can see that by doing arping of the VIP and, when the leadership changes, the MAC displayed will change.

Kind of involved, but this works!

I did have some problems, when playing with HA for the API. I had rebooted the control plane node that was actively providing the API. Kube-vip did its job, and IPVS redirected API requests to another control plane node that was “elected” as the new leader. All good so far.

However, when that control plane node came back up, it would appear in the “kubectl get node” output, but showed as “NotReady”, and it never seemed to become ready. It appeared that the network was not ready, and the calico-node pod was showing an error. I played around a bit, but couldn’t seem to clear the error.

One thing I did was a Kubespray upgrade-cluster.yml with the --limit argument, specifying the node and one of the other control plane nodes (so that control plane “facts” were specified). The kube-vip pod for the node was still failing with a connection refused error. On the node, I stopped/removed the kube-apiserver container and then the kube-vip container, and then kube-vip no longer had any errors.

The only thing was that ipvsadm on the node, did not show a load balancing entry for the VIP, and the other two control plane nodes only had their IPs in the load balancing entry for the VIP. I didn’t try rebooting another control-plane node.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on High Availability?
May 14

OpenVPN


After looking at several posts on OpenVPN, I decided to go with this one, which uses Helm, works with Kubernetes (versus just Docker), supports ARM64 processors, and had some easy configuration built-in. It hasn’t been updated in over a year, so I forked the repo and made some changes (see details below).

Here are the steps to set this up…

Pull Repo

To start, pull my version of the k4kratik k8s-openvpn repository:

cd ~/workspace/kubernetes/
git clone https://github.com/pmichali/k8s-openvpn.git
cd k8s-openvpn



When working on a Mac, you can install Docker Desktop to run docker commands from the command line. You can alter the Dockerfile.aarch64 to use a newer Alpine image (and hence a newer OpenVPN image). Build a local copy of the openvpn image:

cd build/
docker build -f Dockerfile.aarch64 -t ${YOUR_DOCKER_ID}/openvpn:latest .

Set up a Docker account at hub.docker.com and create an access token so that you can log in. Push your image up to DockerHub:

docker login
docker push ${YOUR_DOCKER_ID}/openvpn:latest
cd ../deploy/openvpn


In k8s-openvpn/deploy/openvpn there is a values.yaml file. Copy it to ${USER}-values.yaml and customize it for your needs. In my case, I made the following changes:

  • Under ‘image’ ‘repository’, set the username to YOUR_DOCKER_ID, so that it loads your image.
  • Under the ‘service’ section, used a custom ‘externalPort’ number.
  • Under the service section, set a ‘loadBalancerIP’ address that is in my local network.
  • Set ‘DEFAULT_ROUTE_ENABLED: false’ so that the pod’s host route is not used. Instead, a route is provided later.
  • Decided to limit the number of clients by un-commenting ‘max-clients 5’
  • Under ‘serverConf’ section:
    • Added a route to my local network using ‘push “route <NETWORK>/<PREFIX>”‘.
    • Added my local DNS server with ‘push “dhcp-option DNS <IP>”‘.
    • Added OpenDNS as a backup DNS with ‘push “dhcp-option DNS <IP>”‘.

You can also change server and client configuration settings in deploy/openvpn/templates/config-openvpn.yaml, if desired.



With the desired changes, use helm to deploy OpenVPN:

helm upgrade --install openvpn . -n k8s-openvpn -f ${USER}-values.yaml --create-namespace

Check that the pods, services, deployment, replicas are all up:

kubectl get all -n k8s-openvpn

This will take quite some time (15+ minutes), as it builds all the certificates and keys for the server. Once running, you can log into the pod and check the server config settings in /etc/openvpn/openvpn.conf.


Create Users

With the server running, you can create client configuration files:

cd ../../manage
bash create_user.sh NAME [DOMAIN-NAME]

Once the client config is created, the config file can be imported into your OpenVPN client and you can test connecting. I use the OpenVPN client, which is available on several platforms.

There are two options when creating the client config. With just an (arbitrary) name for the device, it will create a config file (NAME.ovpn) where the OpenVPN client will connect to the OpenVPN server on the local network. In my case, that is the IP address that I specified in the customized values.yaml file with the ‘loadBalancerIP’ setting.

For example, if you set ‘loadBalancerIP’ to an address on your local network and ‘externalPort’ to 6666, the client will try to connect to that address on port 6666. Obviously, you can do that only from your local network. To use the VPN when out at Wi-Fi hot-spots, you can use the next option.

If you also add a domain name argument, then the OpenVPN client will try to connect to a server at that domain. You can purchase a domain name, map it to your home router’s WAN IP address, and use a service like DynDNS to keep the IP updated for the domain (typically you get an IP from your ISP via DHCP, and that can change over time). On your router, you can port forward from the ‘externalPort’ specified in the customized values.yaml to that same port on the OpenVPN server, which is at the IP specified by ‘loadBalancerIP’.

For example, with ‘loadBalancerIP’ set to an address on your local network, ‘externalPort’ set to 6666, and a domain of mydomain.com, the client would try to connect to mydomain.com:6666, which could be done from anywhere. You would need to make sure the dynamic DNS entry for mydomain.com points to the WAN IP address of your router, and port forward port 6666 on the router to port 6666 on the OpenVPN server.



When I upgraded the Alpine OS for the VPN container, which in turn selects the version of OpenVPN (2.6.10 at the time of this posting), I wanted to make sure that the configuration settings for ciphers/digests were current.

In deploy/openvpn/templates/config-openvpn.yaml there is a section called openvpn.conf, which has the server configuration settings. Here are the pertinent entries in that section:

auth SHA512
tls-version-min 1.2

With the OpenVPN pod running, you can exec into the pod and run the following commands to see what is available. For TLS ciphers, this command lists the ciphers for TLS 1.3 and newer, and for TLS 1.2 and older:

/usr/sbin/openvpn --show-tls

In my case, as I was supporting TLS 1.2 as a minimum, the existing set of ciphers were in the 1.2 list, so I left it alone. Likewise the following command can show the digests available:

/usr/sbin/openvpn --show-digests

Again, I saw SHA512 in the list, so I left this alone. Lastly, in the values.yaml file where you can customize the ‘cipher’ clause, it now has:

cipher: AES-256-CBC

Previously, it had the value ‘AES-256-GCM’; however, this is not used when using TLS authentication. Also, I did change the protocol from TCP to UDP, which, as I understand it, is more robust.


Details of Modifications Made


  • Using newer alpine image (based on edge tag 20240329)
  • Updated repo added, to use the newer test repo location – main and community already exist.


  • Removed client config settings that were generating warning log messages with opt-verify set.
  • Setting auth to sha512 on client and server.
  • Disabled allowing compression on server and use of compression (security risk).
  • Added settings that were on client to server for mute, user, group, etc.
  • Set opt-verify for testing, but then commented out, as it is deprecated.
  • Specifying TLS min 1.2 on server.


  • Turned off node affinity for lifecyle=ondemand. Does not exist on my bare metal cluster.
  • Newer busybox version 1.35 for init container.


  • Using my docker hub repo image for openvpn.
  • Altered ports used for loadbalancer service (arbitrary) and fixed IP.
  • Using Longhorn for storage class.
  • Using different client network (arbitrary).
  • Using udp protocol.
  • Changed K8s pod and service subnets to match what I use (arbitrary).
  • Set to redirect all traffic through gateway.
  • Using AES-256-CBC as default cipher.
  • Pushed route for DNS servers I wanted.


  • Allow to pass domain name vs using published service IP.
  • Fixed namespace.
  • Fixed kubectl exec syntax for newer K8s.


  • Fixed incorrect usage message.
  • Fixed namespace
  • Fixed kubectl exec syntax for newer K8s.
Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on OpenVPN
May 13

Cluster Upgrade – Challenge

With my cluster running a kubespray version around 2.23.3 and kubernetes 1.28.2, I wanted to give a try at updating my cluster, as there were newer versions available. There were all sorts of problems along the way, so I’ll try to cover what I did, and what (finally) worked.

For reference, my cluster has longhorn storage, prometheus/grafana/loki, metalLB, nginx-ingress, and velero installed, as well.

But, before doing anything, I decided to move things around a bit in my directory structures, so that I didn’t have git repos inside of my ~/workspace/picluster git repo. I created a ~/workspace/kubernetes and placed several directories as peers in that area:

├── grafana-dashboards-kubernetes
├── ingress
├── kubespray
├── mysql
├── nginx-ingress
├── picluster
└── velero

The rest of the components remained in the picluster area:

├── inventory
├── longhorn
├── metallb
├── minio
├── minio-k8s
├── monitoring
└── playbooks

With this setup, I proceeded to identify what kubespray version to upgrade to, and whether or not this was a multi-version upgrade or not. I found that the latest release tag was 2.24.0, but there were many more commits since then, so I created a tag at my current version (0f243d751), checked out and created a tag at the desired version (fdf5988ea).

Next, I wanted to make sure that all the tools I’m using match what Kubespray is expecting for the commit that I’m using. There is a requirements.txt file that calls out all the versions. I used ‘poetry show’ to see what versions I had, and then used ‘poetry add COMPONENT==VERSION’ with a version to make sure that there were compatible versions. For example:
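
With more than a couple of pinned packages, the ‘poetry add’ commands can be generated from requirements.txt. Here is a sketch that only prints the commands, so that they can be reviewed before running anything. The sample input uses the ansible pin from above, plus one made-up entry to show comment handling.

```shell
#!/usr/bin/env bash

# Read a requirements.txt on stdin and print one "poetry add" command per
# exact pin (NAME==VERSION). Comments and unpinned lines are skipped.
pins_to_poetry_add() {
  sed 's/[[:space:]]*#.*$//' \
    | grep -E '^[A-Za-z0-9._-]+==[A-Za-z0-9.]+$' \
    | while read -r pin; do
        echo "poetry add ${pin}"
      done
}

# Sample input; in practice: pins_to_poetry_add < requirements.txt
pins_to_poetry_add <<'EOF'
ansible==9.5.1
# a comment line
example-package==1.2.3  # hypothetical entry
EOF
```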

poetry add ansible==9.5.1

I copied the sample inventory area into my ~/workspace/kubernetes/picluster/inventory area and merged in my existing hosts.yaml, along with any customizations that were originally made in k8s-cluster.yml.

With this, I was ready to go to the kubespray directory and do the upgrade using…

cd ~/workspace/kubernetes/kubespray
ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -v --private-key=~/.ssh/id_ed25519 -e upgrade_cluster_setup=true

Initially, I saw that the calico-node pods were stuck in a crash loop…

calico-node: error while loading shared libraries: libpcap.so.0.8: cannot open shared object file: No such file or directory

It turns out that the 2.24.0+ release of kubespray uses calico v3.27.2, which has issues on arm64 processors. The choice was to go to v3.27.0, which apparently has a memory leak, or go to v3.27.3, where the problem with the library was fixed. I decided to do the latter, but when I overrode calico_version, the upgrade failed, because there is no checksum for that version.

I found out that in the kubespray area, there is a scripts directory, with a download_hash.sh script, which would read the updated calico_version in ./roles/kubespray-defaults/defaults/main/download.yml and update the roles/kubespray-defaults/defaults/main/checksums.yml file. Well, it wasn’t as easy as that, because I was using a MacBook and the grep command does not have a -P (perl) option, used in the script. So…

I copied the Dockerfile to HashMaker.Dockerfile, and trimmed it to this:

# syntax=docker/dockerfile:1

FROM ubuntu:22.04@sha256:149d67e29f765f4db62aa52161009e99e389544e25a8f43c8c89d4a445a7ca37

ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /kubespray

# hadolint ignore=DL3008
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
apt-get update -q \
&& apt-get install -yq --no-install-recommends \
curl \
python3 \
python3-pip \
sshpass \
vim \
openssh-client \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /var/log/*

RUN --mount=type=bind,source=requirements.txt,target=requirements.txt \
--mount=type=cache,sharing=locked,id=pipcache,mode=0777,target=/root/.cache/pip \
pip install --no-compile --no-cache-dir -r requirements.txt \
&& find /usr -type d -name '*__pycache__' -prune -exec rm -rf {} \;

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

COPY scripts ./scripts

I copied scripts/download_hash.sh to scripts/download_hash_pcm.sh and made these changes (as inside the container there is no git repo):

< checksums_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/checksums.yml"
> checksums_file="./roles/kubespray-defaults/defaults/main/checksums.yml"
< default_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/main.yml"
> default_file="./roles/kubespray-defaults/defaults/main/main.yml"

With these changes, I did the following to build and run the container, where I could run the scripts/download_hash_pcm.sh script to update the checksums.yml file with the needed checksums…

docker buildx build --platform linux/arm64 -f HashMaker.Dockerfile -t hashmaker:latest .

docker run --rm -it --mount type=bind,source="$(pwd)"/roles,dst=/kubespray/roles --mount type=bind,source="${HOME}"/.ssh/id_ed25519,dst=/root/.ssh/id_ed25519 hashmaker:latest bash

(Yeah, I could have invoked the script directly, instead of running bash and then invoking the script inside the container.)

With this, one would think that I was ready to do the upgrade. Well, I tried, but I hit some other issues…

  • Some nodes were updated to Kubernetes 1.29.3, but some were still at 1.28.2.
  • The prometheus/grafana pods were in a crash loop, complaining that there were multiple default datasources.
  • Longhorn was at the older 1.5.3, and I figured it would be simple to helm upgrade to 1.6.1. It wasn’t.

Someone on Slack said that I need to do the kubespray upgrade with “-e upgrade_cluster_setup=true” added. I did that, but it did not work and I still had three nodes with 1.29.3 and four with 1.28.2.

I found the problem with the versions. On the four older nodes, at some point kubeadm and/or kubelet were installed (as Ubuntu packages). As a result, there was the newer /usr/local/bin/kubelet (v1.29.3), and the package-installed /usr/bin/kubelet (v1.28.2). For systemd, in addition to the /etc/systemd/system/kubelet.service, which used /usr/local/bin/kubelet in ExecStart, there was a kubelet.service.d directory with a 10-kubeadm.conf file that used /usr/bin/kubelet in ExecStart. This one seemed to take precedence.

To resolve, I removed the Ubuntu kubeadm package, which depended on kubelet, and I removed the kubelet.service.d directory and reloaded systemd. My only guess is that at one point I tried installing kubeadm. Now, upgrades will show all nodes using the newer 1.29.3 kubernetes.
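
To spot this kind of conflict on a node, ‘systemctl cat kubelet’ will print the unit file along with any drop-ins. The sketch below does something similar with grep, so that every ExecStart line is shown next to the file that sets it. It assumes the unit and drop-ins live under the directory given (packages may also place drop-ins elsewhere, such as /usr/lib/systemd/system), and the function name list_execstart is made up for this example.

```shell
# Print every non-empty ExecStart= line found in unit files and drop-ins
# under the given directory, prefixed with the file that sets it.
list_execstart() {
    # $1: directory to search, for example /etc/systemd/system
    grep -rH '^ExecStart=.' "$1" 2>/dev/null | sort
}

# On a node:
# list_execstart /etc/systemd/system | grep kubelet
```

If more than one kubelet path shows up, the drop-in is the likely culprit, since drop-in settings are applied on top of the main unit file.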

I got into real trouble with the Prometheus/Grafana problem. I tried deleting pods, removing replicasets that were no longer in use, and then tried to helm upgrade kube-prometheus-stack. That caused even more problems, as the upgrade failed and now I had a whole bunch of failing pods and replicasets that were not ready. The Prometheus pods were complaining about multiple attachments to the same PV (I was using Longhorn storage). I couldn’t clear the errors and couldn’t remove the PVCs. I’m not sure if the problem was that I didn’t pass all the arguments that I had used when I initially installed Prometheus.

I tried updating Longhorn (pulling the 1.6.1 values.yaml, changing policy from Delete to Retain and type from ClusterIP to NodePort, and then doing a helm upgrade with the modified values.yaml), and that was a mess too. Crash loops, and replicasets not working.

I ended up deleting the cluster entirely. I was concerned that maybe there was an issue with upgrading in general, so I installed the older kubespray/kubernetes cluster, without installing any other components (Longhorn, Prometheus), and did an upgrade. Everything worked fine.

I need to retry this, maybe with the upgrade of Prometheus using the same args as install did. I’m also worried about the multiple attachment issue with the PV.

In the meantime, I wanted to try updating Longhorn…

With the original, Longhorn was at 1.5.3, and 1.6.1 is available. I had tried a helm upgrade (after I had upgraded the cluster), and had all sorts of problems. So, I created a new cluster, with the latest Kubernetes, made sure everything was up, and then helm installed 1.5.3, using the modified values.yaml I had with Retain policy and NodePort:

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.5.3 --values values-1.5.3.yaml

I then did a helm upgrade to 1.6.1…

helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version 1.6.1

There were some pods in crash loops, and items not ready. I deleted the older replicasets. It looked like the deployment had an annotation for 1.6.1, but was still calling out an image of 1.5.3. Looking at the Longhorn notes, I saw that I could use kubectl to upgrade, and even though I had used Helm to install/upgrade before, I decided to try it.

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.1/deploy/longhorn.yaml

There were a bunch of warnings when running the command, but all pods came up and the deployment showed 1.6.1 for the image.

I’m not sure if there was something wrong with doing the helm upgrade, if it was because I was customizing the values.yaml files, or if it was because I was using NodePort. With the kubectl apply, the type was set to ClusterIP.

I’ve got more research to do here to isolate this issue.

I tested installing a 1.28.2 cluster, and then upgraded to 1.29.3 (doing just control plane nodes and etcd node first, and then all the worker nodes). Every pod was up and running, daemonsets/replicasets/deployments were all working, and things were looking pretty good.

There were some pre-upgrade replicasets still present (with no needed/available instances), so I deleted them. I did a snapshot and backup of a Longhorn volume and that worked as well. I do see two problems so far.

First, under Grafana, the data sources were gone. I could not modify the Loki instance (as it is built-in), but I created another one. The original was giving connection refused errors. I think that the IP it uses is the old one. There also was no Prometheus data source. I created one, using the cluster IP, and it works as well.

Second, I tried to do a backup of the Kubernetes cluster using Velero, and it failed. I tried viewing the log, but there was none. When checking ‘velero backup-location get’, it shows that the backup location is not available. It seems like various components are using older IPs/ports?

PROBLEM FOUND… It appears that when an upgrade occurs and the coredns version has changed, a new deployment, replicaset, service, and pods are created with the new version AND they get a new nameserver IP. However, the existing pods (and any new ones created) still refer to the old (default) nameserver IP. There is a service for that old nameserver IP, but it is not resolving addresses. If you do nslookup and specify the new nameserver IP, it will work, but that doesn’t help everything that is already running, or new pods that are created, which use the old nameserver IP.
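
One way to confirm which nameserver a pod is really using is to pull it out of the pod’s /etc/resolv.conf and compare it with the cluster IPs of the DNS services in kube-system. This is only a sketch: the function name first_nameserver is made up for this example, and POD_NAME is a placeholder for any running pod.

```shell
# Print the first "nameserver" entry from a resolv.conf-style file.
# With no argument (or "-"), the file is read from standard input.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }' "${1:--}"
}

# Compare what a pod uses with the DNS services that exist:
# kubectl exec POD_NAME -- cat /etc/resolv.conf | first_nameserver
# kubectl get svc -n kube-system
```

If the address printed for the pod matches the older service rather than the newly created one, the pod is still pointed at the nameserver that is not resolving.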

WORKAROUND: If an install (cluster.yml) is done again, using the exact same settings, the first DNS service becomes active again. One can then delete the newly created service, and the unused replicasets. I tried repeating the upgrade, but that did not resolve the issue.

There does appear to be a download of the new coredns and restart of the systemd-resolved service. I don’t know if there is some mechanism to switch pods to use the new IP or if somehow the new service should have replaced the original and use the same IP.

After messing with things over a few weeks, I found out quite a few things…

CoreDNS: I see that with the newer Kubespray master branch versions, they now have checksums for coredns 3.72.3. As a result, I don’t need to go through the contortions of creating my own branch of Kubespray and building the checksums for coredns 3.72.3. I just picked a newer commit of Kubespray (not the current tagged version, as it still did not have the checksums for coredns 3.72.3).

Upgrading with CoreDNS changes: I found out that with the newer Ubuntu versions the kernels actually have the “dummy” kernel module. I see it in the current 6.5.0-1015-raspi kernel, and I think it was in 1013 and 1014. The implication of this is that, I was unable, in the past, to enable node local DNS in Kubespray, because this module was needed. After updating the OS on my nodes to have this newer kernel, I could then run Kubespray installs and upgrades with ‘enable_nodelocaldns’ setting and now upgrades had a working DNS, even when the version of coredns changed. There were some replicasets that remained and were not active, but the upgrades are working.

Scheduling Disabled: I was seeing several issues when doing upgrades. In one case, I found a worker node whose status was “Ready”, but which also showed “SchedulingDisabled”. I did a “kubectl uncordon NODENAME” and that enabled scheduling. I’m not sure why it was not completely upgraded.
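
After an upgrade, it is easy to miss a node that was left cordoned. Here is a small sketch that filters the output of ‘kubectl get nodes’ for nodes showing SchedulingDisabled. The function name cordoned_nodes is made up for this example, and it assumes the default column layout, with NAME first and STATUS second.

```shell
# Read `kubectl get nodes` output on stdin and print the names of nodes
# whose STATUS column contains SchedulingDisabled.
cordoned_nodes() {
    awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# List cordoned nodes, and optionally uncordon each of them:
# kubectl get nodes | cordoned_nodes
# kubectl get nodes | cordoned_nodes | xargs -n 1 kubectl uncordon
```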

Upgrading single node: I found that with Kubespray, you can use the command line argument --limit “NODE1,NODE2,NODE3” on upgrade (and other commands) to restrict the nodes that are affected to the ones specified in the limit clause. However, when I did an upgrade, specifying ONLY a worker node, the process failed at this step:

TASK [kubernetes-apps/network_plugin/multus : Multus | Start resources] ********
fatal: [niobe -> {{ groups['kube_control_plane'][0] }}]: FAILED! => {"msg": "Error in jmespath.search in json_query filter plugin:\n'ansible.vars.hostvars.HostVarsVars object' has no attribute 'multus_manifest_2'"}

The problem is that I don’t have Multus enabled! It turns out that there is a bug in Kubespray, such that you need to have a control plane node included in the limit clause, so that it will parse that Multus is disabled and will not attempt to start it up on the worker node. I just re-ran the upgrade specifying one control plane node (already upgraded) and the worker node I wanted to update.
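
To avoid forgetting the control plane node, the limit pattern can be built by a small helper. This is just a sketch; the function name limit_with_control_plane is made up for this example, and CONTROL_PLANE_NODE is a placeholder for one of your own control plane hosts. It joins the names with colons, which Ansible accepts as a separator in host patterns.

```shell
# Build an Ansible --limit pattern that always starts with one control plane
# node, followed by the worker nodes to upgrade.
limit_with_control_plane() {
    # $1: control plane node; remaining arguments: worker nodes
    cp_node=$1
    shift
    pattern=$cp_node
    for node in "$@"; do
        pattern="$pattern:$node"
    done
    printf '%s\n' "$pattern"
}

# Example (appended to the usual upgrade-cluster.yml command line):
# --limit "$(limit_with_control_plane CONTROL_PLANE_NODE niobe)"
```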

Node name changes: OK, this was stupid. I named my nodes after characters from the movie “The Matrix” (Apoc, Cypher, Morpheus,…). Since the original install, I’ve been playing with updating Kubespray versions, updating Kubernetes, installing things like Prometheus and Longhorn, and working through the problem I had with CoreDNS version changing during upgrades. Recently, I realized that one of my worker nodes was actually named incorrectly. It was “niobi” and not “niobe”. I changed my inventory and renamed the hostname on the node. At one point, I decided to retest upgrades (with the node local DNS enabled). I did this by checking out tags that I had created for my repo and the Kubespray repo, performing a clean install, updating the repos to newer tags or the latest commit, updating the Poetry environment so that the correct tool versions were used with the Kubespray version I was trying, and then doing an upgrade. The upgrade was failing on node “niobe”, and it took me a while to realize that when I did the install, the node was named “niobi”, but when I did the upgrade, it was named “niobe” (with the same IP). The (simple) fix was to correct the hostname in the inventory before doing the initial install.

In the future, I think it is probably best to do the kubernetes/kubespray update separate from other components. In addition, I think the update should be done a node at a time, starting with control plane nodes, and then worker nodes. Kubespray does have a limit option to restrict to a node. They say to run facts.yml to update info on all nodes, update control plane/etcd nodes, and then do worker nodes:

ansible-playbook playbooks/facts.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "kube_control_plane:etcd" -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "morpheus:niobi:switch" -u ${USER} -b -v --private-key=~/.ssh/id_ed25519

I used this on a re-try of the upgrade and the facts and control plane/etcd steps worked fine, but I hit an error in the downloading step for the worker nodes. Just note that, with the current Kubespray, you probably should include one control plane node when upgrading one or more worker nodes, so that the configuration is handled correctly.

Category: bare-metal, Kubernetes, Raspberry PI
February 25

MySQL With Replicas on Raspberry PI Kubernetes

I know that I’ll need a database for several projects that I want to run on my Raspberry PI based Kubernetes cluster, so I did some digging for blogs and tutorials on how to set this up.

I found some general articles on how to setup MySQL, and even one that talked about setting up multiple pods so that there are replicas for the database. Cool!

However, I had difficulty finding information on doing this with ARM64 based processors. I found this link on how to run a MySQL operator and InnoDB with multiple replicas for ARM64 processors, but it had two problems. First, it used a fork of the upstream repository for the MySQL operator and had not been updated in over a year, so the images (which were in a repo in that account) were older. Second, it made use of a “mysql-router” image, from a repo in the same account, but it didn’t exist!

So, I spent several days, trying to figure out how to get this to work, and then how to use it with the latest images that are available for ARM64 processors. I could not figure out how to build images from a forked repo, as it seems that the build scripts are setup for Oracle’s CI/CD system and there is no documentation on how to manually build. In any case, using information from this forked repo and after doing a lot of sleuthing, I have it working…

The MySQL Operator repo contains both the operator and the innodbcluster components. They are designed to work with AMD64 based processors, and there is currently no ARM64 support configured. When I asked on the MySQL operator Slack channel in February 2024, they indicated that the effort to support ARM64 has stalled, so I decided to figure out how to use this repo, customizing it to provide the needed support.

I used Helm, rather than manifests, to set things up. First, I set up an area to work in and prepared to access my Raspberry PI Kubernetes cluster:

cd ~/workspace/picluster
poetry shell

mkdir mysql
cd mysql

Add the mysql-operator repo:

helm repo add mysql-operator https://mysql.github.io/mysql-operator/
helm repo update

The operator chart can now be installed, but we need to tell it to use an ARM64 image of the Oracle community version of the operator. Here are the available operator versions to choose from. I’ll use the 8.3.0-2.1.2-aarch64 version:

helm install django-mysql-operator mysql-operator/mysql-operator -n mysql-operator --create-namespace --set image.tag="8.3.0-2.1.2-aarch64"

This creates a bunch of resources, most noticeably a deployment, replica set, and pod for the operator, in the mysql-operator namespace. The name, ‘django-mysql-operator’, is arbitrary. Check to make sure everything is running with:

kubectl get all -n mysql-operator
NAME                                  READY   STATUS    RESTARTS   AGE
pod/mysql-operator-6cc67fd566-v64dp   1/1     Running   0          7h21m

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql-operator   ClusterIP   <none>        9443/TCP   7h21m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql-operator   1/1     1            1           7h21m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-operator-6cc67fd566   1         1         1       7h21m

Next, we can install the helm chart for the MySQL InnoDBCluster. Again, we need to select from available ARM64 versions for the community operator, community router (should be able to use same version), and MySQL server (pick a tag that supports both AMD64 and ARM64; I used 8.0). Since there are so many changes, we’ll use a values.yaml file, instead of command line --set arguments.

We can get the current values.yaml file with:

helm show values mysql-operator/mysql-innodbcluster > innodb-values.yaml

In that file, you can see the defaults that would be applied, like the number of replicas, and can do some additional customizations too. In all cases, if you use a values.yaml file, you MUST provide a root password. For our case, we choose to use self-signed certificates, and specify ARM images for the container, sidecar, and a bunch of init containers. Here are just the changes needed, using the versions I chose at the time of this writing:

cat innodb-values.yaml
    password: "PASSWORD YOU WANT"
# routerInstances: 1
# serverInstances: 3
  useSelfSigned: true
    - name: fixdatadir
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
    - name: initconf
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
    - name: initmysql
      image: mysql/mysql-server:8.0
    - name: mysql
      image: mysql/mysql-server:8.0
    - name: sidecar
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
      - name: router
        image: container-registry.oracle.com/mysql/community-router:8.3.0-aarch64

Using this file, we can create the three MySQL pods using the command:

helm install django-mysql mysql-operator/mysql-innodbcluster -f innodb-values.yaml

It’ll create a deployment, replica set, stateful set, services, three pods, along with three PVs and PVCs, and a new innodbcluster resource and instance. The name provided, ‘django-mysql’, will be the prefix for resources. They will take a while to come up, so be patient. Once the pods and statefulset are up, you will see a router pod created and started:

$ kubectl get all
NAME                                       READY   STATUS    RESTARTS       AGE
pod/django-mysql-0                         2/2     Running   0              6h55m
pod/django-mysql-1                         2/2     Running   0              6h55m
pod/django-mysql-2                         2/2     Running   0              6h55m

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                    AGE
service/django-mysql             ClusterIP   <none>        3306/TCP,33060/TCP,6446/TCP,6448/TCP,6447/TCP,6449/TCP,6450/TCP,8443/TCP   6h55m
service/django-mysql-instances   ClusterIP   None           <none>        3306/TCP,33060/TCP,33061/TCP                                               6h55m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/longhorn-iscsi-installation   7         7         7       7            7           <none>          51d

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/django-mysql-router   1/1     1            1           6h55m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/django-mysql-router-696545f47b   1         1         1       6h55m

NAME                            READY   AGE
statefulset.apps/django-mysql   3/3     6h55m

When everything is running, you can access the zero instance of the MySQL pod with:

kubectl exec -it pod/django-mysql-0 -c mysql -- /bin/bash
bash-4.4$ mysqlsh -u root -p

USE todo_db;
INSERT INTO Todo (task, status) VALUES ('Hello','ongoing');

Enter the password you defined in innodb-values.yaml, and you can then create a database and tables, and populate table entries. If you exec into one of the other MySQL pods, the information will be there as well, but will be read-only.

There are other customizations, like changing the number of replicas, the size of the PVs used, etc.

You can reverse the process, by first deleting the MySQL InnoDBCluster:

helm delete django-mysql

Wait until the pods are gone (it takes a while), and then delete the MySQL operator:

helm delete django-mysql-operator -n mysql-operator

That should get rid of everything, but if not, here are other things that you can delete. Note: My storage class, Longhorn, is set to retain the PVs, so they must be manually deleted (I can’t think of an easier way):

kubectl delete sa default -n mysql-operator
kubectl delete sa mysql-operator-sa -n mysql-operator

kubectl delete pvc datadir-django-mysql-0
kubectl delete pvc datadir-django-mysql-1
kubectl delete pvc datadir-django-mysql-2
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-0")].metadata.name}'`
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-1")].metadata.name}'`
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-2")].metadata.name}'`
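
The per-instance commands above can be collapsed into a loop. This is a minimal sketch, assuming the claims are named datadir-RELEASE-N as in the commands above; the function name pvc_names is made up for this example. The kubectl part is left commented out, so that the generated names can be checked first.

```shell
# Print the PVC names for a release: PREFIX-RELEASE-0 .. PREFIX-RELEASE-(N-1).
pvc_names() {
    # $1: release name, $2: number of instances, $3: claim prefix (default: datadir)
    i=0
    while [ "$i" -lt "$2" ]; do
        printf '%s-%s-%s\n' "${3:-datadir}" "$1" "$i"
        i=$((i + 1))
    done
}

# Look up the retained volume for each claim, then delete the claim and the volume:
# for pvc in $(pvc_names django-mysql 3); do
#     pv=$(kubectl get pv -o jsonpath="{.items[?(@.spec.claimRef.name==\"$pvc\")].metadata.name}")
#     kubectl delete pvc "$pvc"
#     [ -n "$pv" ] && kubectl delete pv "$pv"
# done
```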

I would like to figure out how to create a database and user, as part of the pod creation process, rather than having to exec into the pod and use mysql or mysqlsh apps.

I’d really like to be able to specify a secret for the root password, instead of including it in a values.yaml file.

Category: bare-metal, Kubernetes, Raspberry PI
February 12

S3 Storage In Kubernetes

In Part VII: Cluster Backup, I set up Minio running on my laptop to provide S3 storage that Velero can use to back up the cluster. In this piece, Minio will be set up “in cluster”, using Longhorn. There are a few links discussing how to do this. I didn’t try this method, but did give this a go (with a bunch of modifications), and am documenting it here.

For starters, I’m using the Helm chart for Minio from Bitnami:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

We’ll grab the configuration settings so that they can be modified:

mkdir -p ~/workspace/picluster/minio-k8s
cd ~/workspace/picluster/minio-k8s
helm show values bitnami/minio > minio.yaml

Create a secret to be used to access Minio:

kubectl create secret generic minio-root-user --namespace minio --from-literal=root-password="DESIRED-PASSWORD" --from-literal=root-user="minime"

In minio.yaml, set auth existingSecret to “minio-root-user” so that the secret will be used for authentication, set defaultBucket to “kubernetes”, and set service type to “NodePort”. The Minio deployment can be created:

helm install minio bitnami/minio --namespace minio --values minio.yaml

The Minio console can be accessed by using a browser, a node’s IP and the NodePort port:

kubectl get svc -n minio
NAME    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
minio   NodePort   <none>        9000:32602/TCP,9001:31241/TCP   78m

In this case, point the browser at one of the nodes’ IP addresses, using the NodePort that maps to the console port. Use the username and password you defined above, when creating the secret.
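
Picking the right NodePort out of the PORT(S) column can be scripted. The sketch below takes the column value and a service port, and prints the node port mapped to it. The function name node_port_for is made up for this example, and I am assuming that service port 9001 is the console (with 9000 being the S3 API), which should be checked against your own minio.yaml. NODE_IP is a placeholder.

```shell
# Given the PORT(S) column of a NodePort service and a service port,
# print the node port that maps to it.
node_port_for() {
    # $1: PORT(S) value, for example 9000:32602/TCP,9001:31241/TCP
    # $2: service port to look up
    printf '%s\n' "$1" | tr ',' '\n' |
        awk -F'[:/]' -v port="$2" '$1 == port { print $2 }'
}

# Build the console URL for a node (replace NODE_IP with a real address):
# echo "http://NODE_IP:$(node_port_for '9000:32602/TCP,9001:31241/TCP' 9001)"
```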

Now, we can install Velero, using the default bucket we had created (one could create another bucket from the Minio UI), credentials file, and cluster IP for the Minio service:

cat minio-credentials
aws_access_key_id = minime
aws_secret_access_key = DESIRED-PASSWORD

velero install \
     --provider aws \
     --plugins velero/velero-plugin-for-aws:v1.8.2 \
     --bucket kubernetes \
     --secret-file minio-credentials \
     --use-volume-snapshots=false \
     --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=

The backup location can be checked (and should be available):

velero backup-location get
default   aws        kubernetes      Available   2024-02-12 20:43:23 -0500 EST   ReadWrite     true
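
For scripting this check (say, before kicking off a backup), the output can be filtered for any location that is not in the Available phase. This is a sketch only: the function name unavailable_backup_locations is made up for this example, and it assumes the phase is the fourth column, as in the output above.

```shell
# Read `velero backup-location get` output on stdin and print the names of
# locations whose phase (fourth column) is anything other than Available.
unavailable_backup_locations() {
    awk 'NF > 0 && $1 != "NAME" && $4 != "Available" { print $1 }'
}

# Prints nothing when every backup location is usable:
# velero backup-location get | unavailable_backup_locations
```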

Finally, you can test the backup and restore of a single deployment (using the example from Part VII, where we pulled the velero repo, which has an example NGINX app):

kubectl create namespace nginx-example
kubectl create deployment nginx --image=nginx -n nginx-example

velero backup create nginx-backup --selector app=nginx
velero backup describe nginx-backup
velero backup logs nginx-backup

kubectl delete namespace nginx-example

velero restore create --from-backup nginx-backup
velero restore describe nginx-backup-20240212194128

kubectl delete namespace nginx-example
velero backup delete nginx-backup
velero restore delete nginx-backup

There is a Minio client, although it seems to be designed for use with a cloud-based back end or a local installation. It has predefined aliases for Minio, and is designed to run and terminate on each command. Unfortunately, we need to set a new alias, so that it can be used with later commands. We can hack a way to use it.

First, we need to know the Cluster IP address of the Minio service, so that it can be used later:

kubectl get svc -n minio
NAME    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
minio   NodePort   <none>        9000:32602/TCP,9001:31241/TCP   78m

We get the user/password, and then run the client so that an alias (using cluster IP, in this case) can be created and commands invoked.

export ROOT_USER=$(kubectl get secret --namespace minio minio-root-user -o jsonpath="{.data.root-user}" | base64 -d)
export ROOT_PASSWORD=$(kubectl get secret --namespace minio minio-root-user -o jsonpath="{.data.root-password}" | base64 -d)

kubectl run --namespace minio minio-client \
     --tty -i --rm --restart='Never' \
     --env MINIO_SERVER_HOST=minio \
     --image docker.io/bitnami/minio-client:2024.2.9-debian-11-r0 -- \
mc admin info myminio

Category: bare-metal, Kubernetes, Raspberry PI
February 3

More Power! Adding nodes to cluster

I’ll document the process I used to add two more Raspberry Pi 4s to the cluster that I’ve created in this series.

Preparing The PIs

With two new Raspberry PI 4s, PoE+ hats, SSD drives (2 TB this time), and two more UCTRONICS RM1U-3 trays each with an OLED display, power button, SATA Shield card, and USB3 jumper, I set out to assemble the trays and image them with Ubuntu.

Everything went well, assembling the trays with the Raspberry PIs. In turn, I connected a keyboard, HDMI display, Ethernet cable, and power adapter (as I don’t have PoE hub in my study). Once booted, I followed the steps in Part II of the series, however there were some issues getting the OS installed.

First, the Raspberry PI Imager program has been updated to support PI 5s, so there were multiple menus, tabbed fields, etc. I decided to connect a mouse to the Raspberry PI, rather than enter a maze of tabs and enters and arrows to try to navigate everywhere.

Second, when I went to select the Storage Device, the SSD drive was not showing up. I didn’t know if this was an issue with the UCTRONICS SATA Shield, the different brand of drive, the larger capacity, the newer installer, or the Raspberry PI itself. I did a bunch of different things to try to find out the root cause, and finally found out that to make this work, I needed to image the SSD drive using the Raspberry PI Imager on my Mac, using a SATA to USB adapter, and then place it into the UCTRONICS tray along with the Raspberry PI and it would then boot to the SSD drive.

Third, for one of the two Raspberry PIs, this still did not work. I ended up installing the Raspberry PI OS on an SD card, updating the EEPROM and bootloader, and then net booting the Raspberry PI Installer, after which I was able to get the Raspberry PI to boot from the SSD drive. It is probably a good idea to update the EEPROM and bootloader to the latest anyway.

Initial Setup

As in Part II of the series, I picked IP addresses for the two units, added their MAC addresses into my router so that those IPs were reserved, added the host names to my local DNS server, created SSH keys for each, and used “ssh-copy-id” to copy those keys to all the other nodes and my Mac, and vice versa. Connectivity was all set.

I decided NOT to do the repartitioning mentioned in Part III, and instead leave the drive as one large 2TB (1.8TB actually) drive. My hope is that with Kubernetes, I can monitor problems, so if I see log files getting out of hand, I can deal with it, rather than having fixed partitions for /tmp, /var, /home, etc. I did create a /var/lib/longhorn directory, as I’m not sure if Longhorn would create this automatically.

Node Prep

With SSH access to each of the PIs, I could run through the same Ansible scripts that were used to set up all the other nodes, as outlined in Part IV. Before running the scripts, I added the two nodes (morpheus, switch) to the hosts.yaml file in the inventory as worker nodes. There are currently three master nodes and four worker nodes.

When running these ansible scripts, I specified both hosts at once, rather than doing one at a time. For example:

cd ~/workspace/picluster
ansible-playbook -i "morpheus,switch" playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass
ansible-playbook -i "morpheus,switch" playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519

Now that the nodes are ready, they can be added to the cluster. For a control plane node, the cluster.yml playbook is used:

cd ~/workspace/picluster/kubespray
ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

Then, on each node, restart the NGINX proxy pod with:

crictl ps | grep nginx-proxy | awk '{print $1}' | xargs crictl stop

In our case, these will be worker nodes, and would be added with these commands (using limit so other nodes are not affected):

ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=morpheus scale.yml
ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=switch scale.yml

These two nodes were added just fine, with Kubernetes version v1.28.5, just like the control plane node I added before (my older nodes are still at v1.28.2, and I’m not sure how to update them currently).

Category: bare-metal, Kubernetes, Raspberry PI
January 21

Part X: OpenLENS

Ref: https://github.com/MuhammedKalkan/OpenLens

LENS gives you a way to look at numerous things in your cluster. It consists of the OpenLENS repository, with the core libraries developed by Team LENS and the community. There are other (some commercial) tools, like the IDE, which are built on top of OpenLENS. There are binaries of the free OpenLENS product, and the easiest way to install on the Mac is to use brew:

brew install --cask openlens

You can then run the app and connect to your Kubernetes cluster, by clicking on the “Browse Clusters In The Catalog” button on the home screen. It will show credentials from your ~/.kube directory, and since we installed a cluster and copied over the config to ~/.kube/config, you should see that listed.

You’ll be able to see a summary of the cluster (CPU, memory, pods), along with a list of resources that you can select on the left side of the window:

There are items to view the nodes, pods, secrets, network services, persistent volume claims, Helm charts, cluster roles, custom resource definitions (CRDs), etc. Clicking on an item will allow you to see all the details, and give you the ability to edit the item.

For example, here is part of the screen for the Loki service:

It shows the labels, annotations, IP info, and access info for the service. You can click on the ports link to access the service.

Here is the Prometheus Helm chart:

It shows the version and a description. If you scroll down, you can see information about the Prometheus Helm repo, and how to install, uninstall, and upgrade the chart.

If you check the Helm Releases and pick an item, like Prometheus shown below, you can see all the settings:

In summary, LENS gives you a bunch of visibility into the cluster, from one point.

FYI, the GitHub page for OpenLENS mentions that after 6.3.0, some extensions are not included, but that you can go to the extensions menu, enter “@alebcay/openlens-node-pod-menu”, and install those extensions. I did that, and the status of the extension flipped between enabled and disabled for quite a while. I exited the app, restarted it, and then went to extensions and enabled this extension.

After that, when I viewed a node and selected a pod, the menu that allows you to edit and delete the pod also had buttons to attach to the pod (didn’t seem to work), open a shell in the pod, and view the logs for the containers in the pod. Pretty handy features.

Category: Kubernetes
January 19

Part IX: Load Balancer and Ingress

Ref: https://metallb.universe.tf/

In lieu of having a physical load balancer, this cluster will use MetalLB as a load balancer. In my network, I have a block of IP addresses reserved for DHCP, and I picked a range of IPs for the load balancer to hand out in the cluster.

The first thing to do is to find the latest release of MetalLB:

cd ~/workspace/picluster
poetry shell

mkdir -p ~/workspace/picluster/metallb
cd ~/workspace/picluster/metallb
MetalLB_RTAG=$(curl -s https://api.github.com/repos/metallb/metallb/releases/latest|grep tag_name|cut -d '"' -f 4|sed 's/v//')
echo $MetalLB_RTAG
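If the GitHub API call fails or is rate limited, the variable ends up empty and the later wget builds a bad URL. Here is a slightly more defensive sketch of the same pipeline. The function names latest_tag and fetch_metallb_tag are my own, not part of any tool.

```shell
# Read GitHub "releases/latest" JSON on stdin and print the tag without a leading "v".
latest_tag() {
  grep '"tag_name"' | head -n 1 | cut -d '"' -f 4 | sed 's/^v//'
}

# Fetch the tag and stop early if nothing came back (for example, API rate limiting).
fetch_metallb_tag() {
  local tag
  tag=$(curl -fsS https://api.github.com/repos/metallb/metallb/releases/latest | latest_tag)
  if [ -z "$tag" ]; then
    echo "could not determine the latest MetalLB release" >&2
    return 1
  fi
  echo "$tag"
}
# Usage: MetalLB_RTAG=$(fetch_metallb_tag) && echo "$MetalLB_RTAG"
```

Anchoring the sed expression with ^ strips only a leading "v", rather than the first "v" found anywhere in the tag.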

Download the manifest for that version, apply it, and wait for everything to come up:

wget https://raw.githubusercontent.com/metallb/metallb/v${MetalLB_RTAG}/config/manifests/metallb-native.yaml -O metallb-native-${MetalLB_RTAG}.yaml

kubectl apply -f metallb-native-${MetalLB_RTAG}.yaml
kubectl get pods -n metallb-system --watch
kubectl get all -n metallb-system
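The --watch form above has to be interrupted by hand. For scripting, kubectl wait can block until the pods are Ready instead. This is a sketch only; the function name wait_for_metallb and the 120s default timeout are my own choices.

```shell
# Block until every pod in metallb-system reports Ready, or fail after the timeout.
wait_for_metallb() {
  kubectl wait --namespace metallb-system \
    --for=condition=Ready pods --all \
    --timeout="${1:-120s}"
}
# Usage: wait_for_metallb 180s
```

The command returns non-zero if the timeout expires, so it can gate the configuration step that follows.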

Everything should be running, but MetalLB still needs to be configured for this cluster. Specifically, we need to set up and advertise the address pool(s), which can be given as CIDRs or address ranges, with IPv4 and/or IPv6 addresses. For our case, I’m reserving – for load balancer IPs and using L2 advertisement (ipaddress_pools.yaml):

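---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb-system
spec:
  addresses:
  - START_IP-END_IP   # placeholder for the reserved range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert
  namespace: metallb-system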

Apply this configuration, and then examine it:

kubectl apply -f ipaddress_pools.yaml
ipaddresspool.metallb.io/production created
l2advertisement.metallb.io/l2-advert created

kubectl get ipaddresspools.metallb.io -n metallb-system
production   true          false             [""]

kubectl get l2advertisements.metallb.io -n metallb-system

kubectl describe ipaddresspools.metallb.io production -n metallb-system
Name:         production
Namespace:    metallb-system
Labels:       <none>
Annotations:  <none>
API Version:  metallb.io/v1beta1
Kind:         IPAddressPool
  Creation Timestamp:  2024-01-17T19:05:29Z
  Generation:          1
  Resource Version:    3648847
  UID:                 38491c8a-fdc1-47eb-9299-0f6626845e82
  Auto Assign:       true
  Avoid Buggy I Ps:  false
Events:              <none>

Note: if you don’t want IP addresses auto-assigned, you can add the clause “autoAssign: false”, to the “spec:” section of the IPAddressPool.
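To make the shape of the pool definition concrete, here is a small sketch that prints an IPAddressPool manifest for a given range, with autoAssign as an optional second argument. The function name render_pool is hypothetical, and the range in the usage line is a documentation-only example, not the range used in this cluster.

```shell
# Print an IPAddressPool manifest for the given range; the second argument sets autoAssign.
render_pool() {
  local range="$1" auto="${2:-true}"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb-system
spec:
  addresses:
  - ${range}
  autoAssign: ${auto}
EOF
}
# Usage (192.0.2.240-192.0.2.250 is an example range only):
#   render_pool 192.0.2.240-192.0.2.250 false | kubectl apply -f -
```

With autoAssign set to false, an address from this pool should only be handed out to services that explicitly ask for one.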

To use the load balancer, you can change the type under the “spec:” section from ClusterIP or NodePort to LoadBalancer, by editing the configuration. For example, to change Grafana from NodePort to LoadBalancer, one would use the following to edit the configuration:

kubectl edit -n monitoring svc/prometheusstack-grafana

The type is located near the bottom of the file, in the “spec:” section. Change NodePort to LoadBalancer:

  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    nodePort: 32589
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: prometheusstack
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

When you show the service, you’ll see the load balancer IP that was assigned:

kubectl get svc -n monitoring prometheusstack-grafana
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
prometheusstack-grafana   LoadBalancer   80:32589/TCP   5d23h
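The same change can be made without opening an editor by patching the service. This is a sketch, assuming a merge patch on spec.type is all that is needed; the function name set_loadbalancer is my own.

```shell
# Switch a Service to type LoadBalancer without opening an editor.
set_loadbalancer() {
  local namespace="$1" service="$2"
  kubectl patch service "$service" --namespace "$namespace" \
    --type merge --patch '{"spec":{"type":"LoadBalancer"}}'
}
# Usage: set_loadbalancer monitoring prometheusstack-grafana
```

A patch is easier to repeat across several services than an interactive edit. Keep in mind that a Helm upgrade of the chart may set the type back to whatever the chart values specify.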

Here is a sample deployment (web-app-demo.yaml) to try. It has the LoadBalancer type specified:

---
apiVersion: v1
kind: Namespace
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
  namespace: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: httpd
        image: httpd:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-server-service
  namespace: web
spec:
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer

Apply the configuration and check the IP address:

kubectl apply -f web-app-demo.yaml
kubectl get svc -n web

From the command line, you can do “curl http://IP_ADDRESS” to make sure it works. If you want a specific IP address, you can change the above web-app-demo.yaml to add the following line after the type (note the same indentation level):

  type: LoadBalancer
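The curl check described above can also be scripted, by reading the assigned address from the service status instead of copying it by hand. This is a rough sketch; the function names service_ip and check_service are my own.

```shell
# Print the first external IP that the load balancer assigned to a Service.
service_ip() {
  local namespace="$1" service="$2"
  kubectl get service "$service" --namespace "$namespace" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Fetch the page served at that IP; fails if no IP has been assigned yet.
check_service() {
  local ip
  ip=$(service_ip "$1" "$2")
  [ -n "$ip" ] || { echo "no external IP assigned to $2" >&2; return 1; }
  curl --fail --silent --show-error "http://${ip}/"
}
# Usage: check_service web web-server-service
```

If the service is still waiting for an address, the jsonpath query returns nothing and the function reports that instead of calling curl with an empty host.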

Before removing MetalLB, you should change any services that are using it, to go back to NodePort or ClusterIP as the type. Then, delete the configuration:

kubectl delete -f metallb-native-${MetalLB_RTAG}.yaml

Ref: https://docs.nginx.com/nginx-ingress-controller/technical-specifications/

Ref: https://kubernetes.github.io/ingress-nginx/deploy/

With the load balancer set up and running, we’ll create an Ingress controller using NGINX. You can check the compatibility chart in the technical specifications (first reference above) to select the NGINX version desired. For our purposes, we’ll install from the Helm chart, so that we have the sources and can delete/update the CRDs. I’m currently running Kubernetes 1.28, so either version 1.0.2 or 1.1.2 of the Helm chart will work. Let’s pull the chart for 1.1.2:

cd ~/workspace/picluster/
helm pull oci://ghcr.io/nginxinc/charts/nginx-ingress --untar --version 1.1.2
cd nginx-ingress

Install NGINX Ingress with:

helm install my-nginx --create-namespace -n nginx-ingress .
NAME: my-nginx
LAST DEPLOYED: Fri Jan 19 11:14:16 2024
NAMESPACE: nginx-ingress
STATUS: deployed
The NGINX Ingress Controller has been installed.

If you want to customize settings, you can add the “--values ../my-values.yaml” argument to the install command. First dump the chart’s default options to a separate file with the following command (run from the untarred chart directory; my-values.yaml is just an example name), and then modify them:

helm show values . > ../my-values.yaml

The NGINX service will have an external IP address, as the type is LoadBalancer, and MetalLB will assign an address from the pool (note: you can specify an IP to use in values.yaml).

To test this out, we’ll create a web-based app:

kubectl create deployment demo --image=httpd --port=80
kubectl expose deployment demo

We can then create an ingress entry for a local (dummy) domain and forward local port 8080 to port 80 of the ingress controller service:

kubectl create ingress demo-localhost --class=nginx --rule="demo.localdev.me/*=demo:80"
kubectl port-forward --namespace=nginx-ingress service/my-nginx-nginx-ingress-controller 8080:80 &

To test this out, you can try accessing the URL:

curl http://demo.localdev.me:8080
Handling connection for 8080
<html><body><h1>It works!</h1></body></html>
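Since the NGINX service has an external IP from MetalLB, the same check should also be possible without the port-forward, by sending the request to that IP with the host name in the Host header. I have not verified this against this cluster, so treat it as a sketch; the function name check_ingress_host is my own and INGRESS_IP is a placeholder.

```shell
# Request "/" from the ingress controller's IP while presenting the given host name.
check_ingress_host() {
  local host="$1" ingress_ip="$2"
  curl --silent --show-error --header "Host: ${host}" "http://${ingress_ip}/"
}
# Usage: check_ingress_host demo.localdev.me INGRESS_IP
```

The ingress rule matches on the Host header, so the request is routed by name even though it is addressed to a bare IP.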

If you have a publicly visible domain, you can forward that to the app. I have not tried it, but it looks like the ingress command would look like:

kubectl create ingress demo --class=nginx  --rule YOUR.DOMAIN.COM/=demo:80

Here is an example of path-based routing of requests. First, create two pods and services that will handle the requests:

In apple.yaml:

kind: Pod
apiVersion: v1
metadata:
  name: apple-app
  labels:
    app: apple
spec:
  containers:
    - name: apple-app
      image: hashicorp/http-echo
      args:
        - "-text=apple"
---
kind: Service
apiVersion: v1
metadata:
  name: apple-service
spec:
  selector:
    app: apple
  ports:
    - port: 5678 # Default port for image

In banana.yaml:

kind: Pod
apiVersion: v1
metadata:
  name: banana-app
  labels:
    app: banana
spec:
  containers:
    - name: banana-app
      image: hashicorp/http-echo
      args:
        - "-text=banana"
---
kind: Service
apiVersion: v1
metadata:
  name: banana-service
spec:
  selector:
    app: banana
  ports:
    - port: 5678 # Default port for image

Then, create an ingress-demo.yaml that will route requests by path:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - http:
      paths:
      - path: /apple
        pathType: Prefix
        backend:
          service:
            name: apple-service
            port:
              number: 5678
      - path: /banana
        pathType: Prefix
        backend:
          service:
            name: banana-service
            port:
              number: 5678

Apply the three YAML files. To test, you can access Ingress service IP ( in this example) or any node IP with the prefix:

<head><title>404 Not Found</title></head>
<center><h1>404 Not Found</h1></center>

This redirects the request to the apple service/pod.
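To check both routes in one go, a small loop over the two prefixes works. This is a sketch with a hypothetical function name (check_routes), and INGRESS_IP stands in for the Ingress service IP.

```shell
# Request each routed path from the ingress and print "path -> response body".
check_routes() {
  local base_url="$1" path
  for path in /apple /banana; do
    printf '%s -> %s\n' "$path" "$(curl --silent --show-error "${base_url}${path}")"
  done
}
# Usage: check_routes http://INGRESS_IP
```

Printing the path next to each response makes it obvious when one prefix is routed and the other falls through to the controller’s 404 page.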

To remove NGINX Ingress, you can use “helm delete”.

You must manually update the CRDs, before upgrading NGINX. Pull the new release and then apply the updated CRDs:

cd ~/workspace/picluster
helm pull oci://ghcr.io/nginxinc/charts/nginx-ingress --untar --version VERSION_DESIRED
cd nginx-ingress
kubectl apply -f crds/

You may see this warning, but it can be ignored:

Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply

You should check the release notes, for any other specific actions needed for a new release. You can then upgrade NGINX:

helm upgrade my-nginx . -n nginx-ingress

FYI: At the bottom of the NGINX install page, there are notes on how to upgrade without downtime.

To uninstall, remove the CRDs and then uninstall with Helm, using the release name that was specified when the chart was installed:

kubectl delete -f ~/workspace/picluster/nginx-ingress/crds/
helm uninstall my-nginx -n nginx-ingress
Category: bare-metal, Kubernetes, Raspberry PI