May 13

Cluster Upgrade – Challenge

My cluster was running a kubespray version around 2.23.3 with kubernetes 1.28.2, and since newer versions were available, I wanted to try upgrading it. There were all sorts of problems along the way, so I’ll try to cover what I did, and what (finally) worked.

For reference, my cluster has longhorn storage, prometheus/grafana/loki, metalLB, nginx-ingress, and velero installed, as well.

But, before doing anything, I decided to move things around a bit in my directory structures, so that I didn’t have git repos inside of my ~/workspace/picluster git repo. I created a ~/workspace/kubernetes and placed several directories as peers in that area:

kubernetes
├── grafana-dashboards-kubernetes
├── ingress
├── kubespray
├── mysql
├── nginx-ingress
├── picluster
└── velero

The rest of the components remained in the picluster area:

kubernetes/picluster
├── inventory
├── longhorn
├── metallb
├── minio
├── minio-k8s
├── monitoring
└── playbooks

With this setup, I proceeded to identify which kubespray version to upgrade to, and whether this would be a multi-version upgrade. I found that the latest release tag was 2.24.0, but there were many more commits since then, so I created a tag at my current version (0f243d751), then checked out the desired version (fdf5988ea) and created a tag there as well.
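
A minimal sketch of that tagging step, assuming it is run inside the kubespray clone with the current version checked out. The function name and the tag names in the usage note are made up for illustration:

```shell
# Hypothetical helper: tag the commit currently checked out, switch to the
# target commit, and tag that one too, so both points are easy to return to.
#   $1: tag name for the current commit
#   $2: commit (or ref) to upgrade to
#   $3: tag name for the target commit
tag_and_checkout() {
    git tag "$1" HEAD &&
        git checkout --quiet "$2" &&
        git tag "$3"
}
```

For example, `tag_and_checkout pre-upgrade fdf5988ea upgrade-target` would tag the starting point and then move to the newer commit.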

Next, I wanted to make sure that all the tools I’m using match what Kubespray is expecting for the commit that I’m using. There is a requirements.txt file that calls out all the versions. I used ‘poetry show’ to see what versions I had, and then used ‘poetry add COMPONENT==VERSION’ with a version to make sure that there were compatible versions. For example:

poetry add ansible==9.5.1
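
Rather than pinning each package by hand, the pins could be generated from requirements.txt. This is only a sketch under the assumption that the file uses plain `name==version` lines; the function name is hypothetical, and it prints the commands instead of running them so they can be reviewed first:

```shell
# Hypothetical helper: print one "poetry add" command per pinned
# requirement (lines of the form name==version). Comments, blank lines,
# and unpinned entries are skipped. Nothing is executed.
pins_to_poetry_add() {
    local req_file="$1"
    # Strip trailing comments and whitespace, then keep only name==version lines.
    sed -e 's/[[:space:]]*#.*$//' -e 's/[[:space:]]*$//' "$req_file" |
        grep -E '^[A-Za-z0-9._-]+==[A-Za-z0-9.]+$' |
        while IFS= read -r pin; do
            printf 'poetry add %s\n' "$pin"
        done
}
```

Running `pins_to_poetry_add requirements.txt` from the kubespray directory lists the commands; anything with extras or version ranges is left out and would still need to be handled manually.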

I copied the sample inventory area into my ~/workspace/kubernetes/picluster/inventory area and merged in my existing hosts.yaml, along with any customizations that were originally made in k8s-cluster.yml.

With this, I was ready to go to the kubespray directory and do the upgrade using…

cd ~/workspace/kubernetes/kubespray
ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -v --private-key=~/.ssh/id_ed25519 -e upgrade_cluster_setup=true

Initially, I saw that the calico-node pods were stuck in a crash loop…

calico-node: error while loading shared libraries: libpcap.so.0.8: cannot open shared object file: No such file or directory

It turns out that the 2.24.0+ release of kubespray uses calico v3.27.2, which has issues on arm64 processors. The choice was to go back to v3.27.0, which apparently has a memory leak, or go forward to v3.27.3, where the problem with the library was fixed. I decided to do the latter, but when I overrode calico_version, the upgrade failed, because there was no checksum for that version.

I found out that in the kubespray area, there is a scripts directory, with a download_hash.sh script, which would read the updated calico_version in ./roles/kubespray-defaults/defaults/main/download.yml and update the roles/kubespray-defaults/defaults/main/checksums.yml file. Well, it wasn’t as easy as that, because I was using a MacBook and the grep command does not have a -P (perl) option, used in the script. So…

I copied the Dockerfile to HashMaker.Dockerfile, and trimmed it to this:

# syntax=docker/dockerfile:1

FROM ubuntu:22.04@sha256:149d67e29f765f4db62aa52161009e99e389544e25a8f43c8c89d4a445a7ca37

ENV LANG=C.UTF-8 \
    DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /kubespray

# hadolint ignore=DL3008
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update -q \
    && apt-get install -yq --no-install-recommends \
       curl \
       python3 \
       python3-pip \
       sshpass \
       vim \
       openssh-client \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /var/log/*

RUN --mount=type=bind,source=requirements.txt,target=requirements.txt \
    --mount=type=cache,sharing=locked,id=pipcache,mode=0777,target=/root/.cache/pip \
    pip install --no-compile --no-cache-dir -r requirements.txt \
    && find /usr -type d -name '*__pycache__' -prune -exec rm -rf {} \;

SHELL ["/bin/bash", "-o", "pipefail", "-c"]

COPY scripts ./scripts

I copied scripts/download_hash.sh to scripts/download_hash_pcm.sh and made these changes (as there is no git repo inside the container):

9c9
< checksums_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/checksums.yml"
---
> checksums_file="./roles/kubespray-defaults/defaults/main/checksums.yml"
11c11
< default_file="$(git rev-parse --show-toplevel)/roles/kubespray-defaults/defaults/main/main.yml"
---
> default_file="./roles/kubespray-defaults/defaults/main/main.yml"

With these changes, I did the following to build and run the container, where I could run the scripts/download_hash_pcm.sh script to update the checksums.yml file with the needed checksums…

docker buildx build --platform linux/arm64 -f HashMaker.Dockerfile -t hashmaker:latest .

docker run --rm -it --mount type=bind,source="$(pwd)"/roles,dst=/kubespray/roles --mount type=bind,source="${HOME}"/.ssh/id_ed25519,dst=/root/.ssh/id_ed25519 hashmaker:latest bash
./scripts/download_hash_pcm.sh
exit

(Yeah, I could have invoked the script instead of running bash and then invoking the script inside the container).

With this, one would think that I was ready to do the upgrade. Well, I tried, but I hit some other issues…

  • Some nodes were updated to 1.29.3 kubernetes, but some were still at 1.28.2
  • The prometheus/grafana pods were in a crash loop, complaining that there were multiple default datasources.
  • Longhorn was older 1.5.3, and I figured it would be simple to helm upgrade to 1.6.1 – it wasn’t

Someone on Slack said that I needed to do the kubespray upgrade with “-e upgrade_cluster_setup=true” added. I did that, but it did not work, and I still had three nodes with 1.29.3 and four with 1.28.2.

I found the problem with the versions. On the four older nodes, at some point kubeadm and/or kubelet were installed (as an Ubuntu package). As a result, there was the newer /usr/local/bin/kubelet (v1.29.3), and the package-installed /usr/bin/kubelet (v1.28.2). For systemd, in addition to the /etc/systemd/system/kubelet.service, which used /usr/local/bin/kubelet in ExecStart, there was a kubelet.service.d directory with a 10-kubeadm.conf file that used /usr/bin/kubelet in ExecStart. This one seemed to take precedence.

To resolve, I removed the Ubuntu kubeadm package, which depended on kubelet, and I removed the kubelet.service.d directory and reloaded systemd. My only guess is that at one point I tried installing kubeadm. Now, upgrades will show all nodes using the newer 1.29.3 kubernetes.
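
To see which ExecStart wins on a node, `systemctl cat kubelet` prints the unit file followed by its drop-ins. As a rough illustration of the override behaviour (the last ExecStart assignment read is the one used, and an empty `ExecStart=` clears earlier ones), here is a simplified sketch. The function name is hypothetical, and it ignores line continuations and other systemd parsing details:

```shell
# Hypothetical helper: approximate which ExecStart a service ends up with.
# Pass the main unit file first, then any drop-in .conf files in the order
# systemd would read them. Simplification: the last ExecStart= line wins,
# and an empty "ExecStart=" resets the value.
effective_execstart() {
    awk '/^ExecStart=/ { current = substr($0, length("ExecStart=") + 1) }
         END { if (current != "") print current }' "$@"
}
```

Passing the kubelet.service file followed by the files in the kubelet.service.d directory should print the kubelet path that actually gets started.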

I got into real trouble with the Prometheus/Grafana crash loops. I tried deleting pods, removing replicasets that were no longer in use, and then tried to helm upgrade kube-prometheus-stack. That caused even more problems, as the upgrade failed and now I had a whole bunch of failing pods and replicasets that were not ready. The Prometheus pods were complaining about multiple attachments to the same PV (I was using Longhorn storage). I couldn’t clear the errors and couldn’t remove PVCs. I’m not sure if the problem was that I didn’t pass all the arguments that I had used when I initially installed Prometheus.

I tried updating Longhorn (pulling the 1.6.1 values.yaml, changing policy from Delete to Retain and type from ClusterIP to NodePort, and then doing a helm upgrade with the modified values.yaml), and that was a mess too. Crash loops, and replicasets not working.

I ended up deleting the cluster entirely. I was concerned that maybe there was an issue with upgrading in general, so I installed the older kubespray/kubernetes cluster, without installing any other components (Longhorn, Prometheus), and did an upgrade. Everything worked fine.

I need to retry this, maybe with the upgrade of Prometheus using the same args as install did. I’m also worried about the multiple attachment issue with the PV.

In the meantime, I wanted to try updating Longhorn…

With the original, Longhorn was at 1.5.3, and 1.6.1 is available. I had tried a helm upgrade (after I had upgraded the cluster), and had all sorts of problems. So, I created a new cluster, with the latest Kubernetes, made sure everything was up, and then helm installed 1.5.3, using the modified values.yaml I had with Retain policy and NodePort:

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.5.3 --values values-1.5.3.yaml

I then did a helm upgrade to 1.6.1…

helm upgrade longhorn longhorn/longhorn --namespace longhorn-system --version 1.6.1

There were some pods in crash loops, and items not ready. I deleted the older replicasets. It looked like the deployment had an annotation for 1.6.1, but was still calling out an image of 1.5.3. Looking at the Longhorn notes, I saw that I could use kubectl to upgrade, and even though I had used Helm for the install/upgrade before, I decided to try it.

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.1/deploy/longhorn.yaml

There were a bunch of warnings when running the command, but all pods came up and the deployment showed 1.6.1 for the image.

I’m not sure if there was something wrong with doing the helm upgrade, if it was because I was customizing the values.yaml files, or if it was because I was using NodePort. With the kubectl apply, the type was set to ClusterIP.

I’ve got more research to do here to isolate this issue.

I tested installing a 1.28.2 cluster, and then upgraded to 1.29.3 (doing just control plane nodes and etcd node first, and then all the worker nodes). Every pod was up and running, daemonsets/replicasets/deployments were all working, and things were looking pretty good.

There were some pre-upgrade replicasets that were present (no needed/available instances), so I deleted them. I did a snapshot and backup of a Longhorn volume and that worked as well. I do see two problems so far.

First, under Grafana, the data sources were gone. I could not modify the Loki instance (as it is built-in), but I created another one. The original was giving connection refused errors. I think that the IP it uses is the old one. There also was no Prometheus data source. I created one, using the cluster IP, and it works as well.

Second, I tried to do a backup of the Kubernetes cluster using Velero, and it failed. I tried viewing the log, but there was none. When checking ‘velero backup-location get’, it shows that the backup location is not available. It seems like various components are using older IPs/ports?

PROBLEM FOUND… It appears that when an upgrade occurs and the coredns version has changed, a new deployment, replicaset, service, and pods are created with the new version AND they get a new nameserver IP (10.233.0.10). However, the existing pods (and new ones created) are still referring to the old nameserver IP (default is 10.233.0.3). There is a service for that old nameserver IP, but it is not resolving addresses. If you do nslookup and specify the new nameserver IP, it will work, but that doesn’t help everything that is running or new pods created, which are using the old nameserver IP.
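
A quick way to confirm this kind of mismatch is to compare the nameserver a pod is configured with against the DNS service IPs in the cluster. A minimal sketch, where the function name is hypothetical and the file defaults to /etc/resolv.conf:

```shell
# Hypothetical helper: print the nameserver addresses listed in a
# resolv.conf-style file (defaults to /etc/resolv.conf).
resolv_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Run inside a pod (for example through kubectl exec), this shows which IP the pod will query. If the image has nslookup, `nslookup kubernetes.default ADDRESS` against each candidate IP then shows which DNS service actually answers.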

WORKAROUND: If an install (cluster.yml) is done again, using the exact same settings, the first DNS service becomes active again. One can then delete the newly created service, and the unused replicasets. I tried repeating the upgrade, but that did not resolve the issue.

There does appear to be a download of the new coredns and restart of the systemd-resolved service. I don’t know if there is some mechanism to switch pods to use the new IP or if somehow the new service should have replaced the original and use the same IP.

After messing with things over a few weeks, I found out quite a few things…

Calico: I see that the newer Kubespray master branch versions now have checksums for calico v3.27.3. As a result, I don’t need to go through the contortions of creating my own branch of Kubespray and building the checksums for calico v3.27.3. I just picked a newer commit of Kubespray (not the current tagged version, as it still did not have the checksums for calico v3.27.3).

Upgrading with CoreDNS changes: I found out that with the newer Ubuntu versions the kernels actually have the “dummy” kernel module. I see it in the current 6.5.0-1015-raspi kernel, and I think it was in 1013 and 1014. The implication of this is that, I was unable, in the past, to enable node local DNS in Kubespray, because this module was needed. After updating the OS on my nodes to have this newer kernel, I could then run Kubespray installs and upgrades with ‘enable_nodelocaldns’ setting and now upgrades had a working DNS, even when the version of coredns changed. There were some replicasets that remained and were not active, but the upgrades are working.
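
Before enabling node local DNS, it may be worth checking each node for the module. This is a minimal sketch that only looks for a loadable module file; the function name is hypothetical, and a kernel with the driver built in (rather than as a module) would not be detected this way:

```shell
# Hypothetical helper: succeed if a "dummy" kernel module file exists under
# the given modules directory (defaults to the running kernel's directory).
has_dummy_module() {
    local modules_dir="${1:-/lib/modules/$(uname -r)}"
    # Module files may be compressed (dummy.ko, dummy.ko.zst, dummy.ko.xz, ...).
    [ -n "$(find "$modules_dir" -name 'dummy.ko*' -print 2>/dev/null | head -n 1)" ]
}
```

On a node, `has_dummy_module && echo present` gives a quick answer; `modinfo dummy` is an alternative check.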

Scheduling Disabled: I was seeing several issues when doing upgrades. In one case, I found a worker node whose status was “Ready”, but which also showed “SchedulingDisabled”. I did a “kubectl uncordon NODENAME” and that enabled scheduling. I’m not sure why it was not completely upgraded.
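
To spot nodes left cordoned after an upgrade, the node list can be filtered on the STATUS column. A minimal sketch, with a hypothetical function name, that reads `kubectl get nodes --no-headers` output on stdin:

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the name of every node whose STATUS column contains
# SchedulingDisabled.
cordoned_nodes() {
    awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}
```

Something like `kubectl get nodes --no-headers | cordoned_nodes` lists the candidates, which can then be reviewed and passed to `kubectl uncordon` one at a time.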

Upgrading single node: I found that with Kubespray, you can use the command line argument --limit “NODE1,NODE2,NODE3” on upgrade (and other commands) to restrict the command to the nodes specified in the limit clause. However, when I did an upgrade, specifying ONLY a worker node, the process failed at this step:

TASK [kubernetes-apps/network_plugin/multus : Multus | Start resources] ********
fatal: [niobe -> {{ groups['kube_control_plane'][0] }}]: FAILED! => {"msg": "Error in jmespath.search in json_query filter plugin:\n'ansible.vars.hostvars.HostVarsVars object' has no attribute 'multus_manifest_2'"}

The problem is that I don’t have Multus enabled! It turns out that there is a bug in Kubespray, such that you need to have a control plane node included in the limit clause, so that it will parse that Multus is disabled and will not attempt to start it up on the worker node. I just re-ran the upgrade specifying one control plane node (already upgraded) and the worker node I wanted to update.

Node name changes: OK, this was stupid. I named my nodes after characters from the movie “The Matrix” (Apoc, Cypher, Morpheus,…). Since the original install, I’ve been playing with updating Kubespray versions, updating Kubernetes, installing things like Prometheus and Longhorn, and working through the problem I had with the CoreDNS version changing during upgrades. Recently, I realized that one of my worker nodes was actually named incorrectly. It was “niobi” and not “niobe”. I changed my inventory and renamed the hostname on the node. At one point, I decided to retest upgrades (with node local DNS enabled). I did this by checking out tags that I had created for my repo and the Kubespray repo, performing a clean install, updating the repos to newer tags or the latest commit, updating the Poetry environment so that the correct tool versions were used with the Kubespray version I was trying, and then doing an upgrade. The upgrade was failing on node “niobe”, and it took me a while to realize that when I did the install, the node was named “niobi”, but when I did the upgrade, it was named “niobe” (with the same IP). The (simple) fix was to correct the hostname in the inventory before doing the initial install.

In the future, I think it is probably best to do the kubernetes/kubespray update separate from other components. In addition, I think the update should be done a node at a time, starting with control plane nodes, and then worker nodes. Kubespray does have a limit option to restrict to a node. They say to run facts.yml to update info on all nodes, update control plane/etcd nodes, and then do worker nodes:

ansible-playbook playbooks/facts.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "kube_control_plane:etcd" -u ${USER} -v --private-key=~/.ssh/id_ed25519

ansible-playbook upgrade-cluster.yml -b -i ../picluster/inventory/mycluster/hosts.yaml -e kube_version=v1.29.3 --limit "morpheus:niobi:switch" -u ${USER} -v --private-key=~/.ssh/id_ed25519

I used this on a re-try of the upgrade and the facts and control plane/etcd steps worked fine, but I hit an error in the downloading step for the worker nodes. Just note that, with the current Kubespray, you probably should include one control plane node when upgrading one or more worker nodes, so that the configuration is handled correctly.
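
To avoid forgetting the control plane node, the limit string can be built by a small helper. This is a minimal sketch with a hypothetical function name; which node to pass first is an assumption that depends on your own inventory:

```shell
# Hypothetical helper: build a --limit value that always starts with a
# control plane node, followed by the worker nodes to upgrade.
#   $1: control plane node to include; remaining arguments: worker nodes
build_limit() {
    local control_plane="$1"
    shift
    local limit="$control_plane" node
    for node in "$@"; do
        limit="${limit},${node}"
    done
    printf '%s\n' "$limit"
}
```

The result can be dropped into the upgrade command, for example `--limit "$(build_limit CONTROL_PLANE_NODE morpheus switch)"`, with CONTROL_PLANE_NODE replaced by one of your already-upgraded control plane hosts.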

Category: bare-metal, Kubernetes, Raspberry PI
February 25

MySQL With Replicas on Raspberry PI Kubernetes

I know that I’ll need a database for several projects that I want to run on my Raspberry PI based Kubernetes cluster, so I did some digging for blogs and tutorials on how to set this up.

I found some general articles on how to set up MySQL, and even one that talked about setting up multiple pods so that there are replicas for the database. Cool!

However, I had difficulty finding information on doing this with ARM64 based processors. I found this link on how to run a MySQL operator and InnoDB with multiple replicas for ARM64 processors, but it had two problems. First, it used a fork of the upstream repository for the MySQL operator and had not been updated in over a year, so images (which were in a repo in that account) were older. Second, it made use of a “mysql-router” image, from a repo in the same account, but it didn’t exist!

So, I spent several days trying to figure out how to get this to work, and then how to use it with the latest images that are available for ARM64 processors. I could not figure out how to build images from a forked repo, as it seems that the build scripts are set up for Oracle’s CI/CD system and there is no documentation on how to build manually. In any case, using information from this forked repo and after doing a lot of sleuthing, I have it working…

The MySQL Operator repo contains both the operator and the innodbcluster components. They are designed to work with AMD64 based processors, and there is currently no ARM64 support configured. When I asked on the MySQL operator Slack channel in February 2024, they indicated that the effort to support ARM64 has stalled, so I decided to figure out how to use this repo, customizing it to provide the needed support.

I used Helm, rather than manifests, to set things up. First, I set up an area to work and prepared to access my Raspberry PI Kubernetes cluster:

cd ~/workspace/picluster
poetry shell

mkdir mysql
cd mysql

Add the mysql-operator repo:

helm repo add mysql-operator https://mysql.github.io/mysql-operator/
helm repo update

The operator chart can now be installed, but we need to tell it to use an ARM64 image of the Oracle community version of the operator. Here are the available operator versions to choose from. I’ll use the 8.3.0-2.1.2-aarch64 version:

helm install django-mysql-operator mysql-operator/mysql-operator -n mysql-operator --create-namespace --set image.tag="8.3.0-2.1.2-aarch64"

This creates a bunch of resources, most noticeably a deployment, replica set, and pod for the operator, in the mysql-operator namespace. The name, ‘django-mysql-operator’, is arbitrary. Check to make sure everything is running with:

kubectl get all -n mysql-operator
NAME                                  READY   STATUS    RESTARTS   AGE
pod/mysql-operator-6cc67fd566-v64dp   1/1     Running   0          7h21m

NAME                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql-operator   ClusterIP   10.233.19.231   <none>        9443/TCP   7h21m

NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql-operator   1/1     1            1           7h21m

NAME                                        DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-operator-6cc67fd566   1         1         1       7h21m

Next, we can install the helm chart for the MySQL InnoDBCluster. Again, we need to select from available ARM64 versions for the community operator, community router (should be able to use same version), and MySQL server (pick a tag that supports both AMD64 and ARM64; I used 8.0). Since there are so many changes, we’ll use a values.yaml file, instead of command line --set arguments.

We can get the current values.yaml file with:

helm show values mysql-operator/mysql-innodbcluster > innodb-values.yaml

In that file, you can see the defaults that would be applied, like the number of replicas, and can do some additional customizations too. In all cases, if you use a values.yaml file, you MUST provide a root password. For our case, we choose to use self-signed certificates, and specify ARM images for the container, sidecar, and a bunch of init containers. Here are just the changes needed, using the versions I chose at the time of this writing:

cat innodb-values.yaml
credentials:
  root:
    password: "PASSWORD YOU WANT"
# routerInstances: 1
# serverInstances: 3
tls:
  useSelfSigned: true
podSpec:
  initContainers:
    - name: fixdatadir
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
    - name: initconf
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
    - name: initmysql
      image: mysql/mysql-server:8.0
  containers:
    - name: mysql
      image: mysql/mysql-server:8.0
    - name: sidecar
      image: container-registry.oracle.com/mysql/community-operator:8.3.0-2.1.2-aarch64
router:
  podSpec:
    containers:
      - name: router
        image: container-registry.oracle.com/mysql/community-router:8.3.0-aarch64

Using this file, we can create the three MySQL pods using the command:

helm install django-mysql mysql-operator/mysql-innodbcluster -f innodb-values.yaml

It’ll create a deployment, replica set, stateful set, services, three pods, along with three PVs and PVCs, and a new innodbcluster resource and instance. The name provided, ‘django-mysql’, will be the prefix for resources. They will take a while to come up, so have patience. Once the pods and statefulset are up, you will see a router pod created and started:

$ kubectl get all
NAME                                       READY   STATUS    RESTARTS       AGE
pod/django-mysql-0                         2/2     Running   0              6h55m
pod/django-mysql-1                         2/2     Running   0              6h55m
pod/django-mysql-2                         2/2     Running   0              6h55m

NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                                                                    AGE
service/django-mysql             ClusterIP   10.233.59.48   <none>        3306/TCP,33060/TCP,6446/TCP,6448/TCP,6447/TCP,6449/TCP,6450/TCP,8443/TCP   6h55m
service/django-mysql-instances   ClusterIP   None           <none>        3306/TCP,33060/TCP,33061/TCP                                               6h55m

NAME                                         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/longhorn-iscsi-installation   7         7         7       7            7           <none>          51d

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/django-mysql-router   1/1     1            1           6h55m

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/django-mysql-router-696545f47b   1         1         1       6h55m

NAME                            READY   AGE
statefulset.apps/django-mysql   3/3     6h55m

When everything is running, you can access the zero instance of the MySQL pod with:

kubectl exec -it pod/django-mysql-0 -c mysql -- /bin/bash
bash-4.4$ mysqlsh -u root -p

CREATE DATABASE IF NOT EXISTS todo_db;
USE todo_db;
CREATE TABLE IF NOT EXISTS Todo (task_id int NOT NULL AUTO_INCREMENT, task VARCHAR(255) NOT NULL, status VARCHAR(255), PRIMARY KEY (task_id));
INSERT INTO Todo (task, status) VALUES ('Hello','ongoing');

Enter the password you defined in innodb-values.yaml and you can now create a database, tables, and populate table entries. If you exec into one of the other MySQL pods, the information will be there as well, but will be read-only.

There are other customizations available, like changing the number of replicas, the size of the PVs used, etc.

You can reverse the process, by first deleting the MySQL InnoDBCluster:

helm delete django-mysql

Wait until the pods are gone (it takes a while), and then delete the MySQL operator:

helm delete django-mysql-operator -n mysql-operator

That should get rid of everything, but if not, here are other things that you can delete. Note: My storage class, Longhorn, is set to retain the PVs, so they must be manually deleted (I can’t think of an easier way):

kubectl delete sa default -n mysql-operator
kubectl delete sa mysql-operator-sa -n mysql-operator

kubectl delete pvc datadir-django-mysql-0
kubectl delete pvc datadir-django-mysql-1
kubectl delete pvc datadir-django-mysql-2
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-0")].metadata.name}'`
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-1")].metadata.name}'`
kubectl delete pv `kubectl get pv -A -o jsonpath='{.items[?(@.spec.claimRef.name=="datadir-django-mysql-2")].metadata.name}'`
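
The same cleanup can be wrapped in a small function and looped over the three claims. This is a minimal sketch: the function name and the KUBECTL override variable are hypothetical conveniences, and it assumes the claims live in the current namespace:

```shell
# Hypothetical override so the command can be swapped out (e.g. for testing).
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: delete a PVC and the PV that was bound to it.
delete_claim_and_volume() {
    local claim="$1" pv
    # Look up the PV bound to this claim before the claim is deleted.
    pv=$($KUBECTL get pv -o jsonpath="{.items[?(@.spec.claimRef.name==\"${claim}\")].metadata.name}")
    $KUBECTL delete pvc "$claim"
    # With a Retain reclaim policy the PV outlives its claim, so remove it too.
    if [ -n "$pv" ]; then
        $KUBECTL delete pv "$pv"
    fi
}
```

A loop such as `for i in 0 1 2; do delete_claim_and_volume "datadir-django-mysql-$i"; done` would then cover all three replicas.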

I would like to figure out how to create a database and user, as part of the pod creation process, rather than having to exec into the pod and use mysql or mysqlsh apps.

I’d really like to be able to specify a secret for the root password, instead of including it in a values.yaml file.

Category: bare-metal, Kubernetes, Raspberry PI
February 12

S3 Storage In Kubernetes

In Part VII: Cluster Backup, I set up Minio running on my laptop to provide S3 storage that Velero can use to back up the cluster. In this piece, Minio will be set up “in cluster”, using Longhorn. There are a few links discussing how to do this. I didn’t try this method, but did give this a go (with a bunch of modifications), and am documenting it here.

For starters, I’m using the Helm chart for Minio from Bitnami:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

We’ll grab the configuration settings so that they can be modified:

mkdir -p ~/workspace/picluster/minio-k8s
cd ~/workspace/picluster/minio-k8s
helm show values bitnami/minio > minio.yaml

Create the minio namespace (if it does not already exist) and a secret to be used to access Minio:

kubectl create namespace minio
kubectl create secret generic minio-root-user --namespace minio --from-literal=root-password="DESIRED-PASSWORD" --from-literal=root-user="minime"

In minio.yaml, set auth existingSecret to “minio-root-user” so that the secret will be used for authentication, set defaultBucket to “kubernetes”, and set service type to “NodePort”. The Minio deployment can be created:

helm install minio bitnami/minio --namespace minio --values minio.yaml

The Minio console can be accessed by using a browser, a node’s IP and the NodePort port:

kubectl get svc -n minio
NAME    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
minio   NodePort   10.233.60.69   <none>        9000:32602/TCP,9001:31241/TCP   78m

In this case, using one of the nodes (10.11.12.190), the URL is http://10.11.12.190:31241. Use the username and password you defined above, when creating the secret.

Now, we can install Velero, using the default bucket we had created (one could create another bucket from the Minio UI), credentials file, and cluster IP for the Minio service:

cat minio-credentials
[default]
aws_access_key_id = minime
aws_secret_access_key = DESIRED-PASSWORD

velero install \
     --provider aws \
     --plugins velero/velero-plugin-for-aws:v1.8.2 \
     --bucket kubernetes \
     --secret-file minio-credentials \
     --use-volume-snapshots=false \
     --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://10.233.60.69:9000
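
The minio-credentials file shown above could also be generated rather than typed by hand. A minimal sketch, with a hypothetical function name, that writes the file in the same format with restrictive permissions:

```shell
# Hypothetical helper: write an AWS-style credentials file for Velero.
#   $1: access key (the Minio root user)
#   $2: secret key (the Minio root password)
#   $3: output file (defaults to minio-credentials)
write_minio_credentials() {
    local user="$1" password="$2" out_file="${3:-minio-credentials}"
    # Restrict permissions before writing, since the file holds a secret.
    (
        umask 077
        printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' \
            "$user" "$password" > "$out_file"
    )
}
```

The two values can be read back from the minio-root-user secret with `kubectl get secret ... -o jsonpath` piped through `base64 -d`, so the password never needs to be pasted into the file manually.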

The backup location can be checked (and should be available):

velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        kubernetes      Available   2024-02-12 20:43:23 -0500 EST   ReadWrite     true

Finally, you can test the backup and restore of a single deployment (using the example from Part VII, where we pulled the velero repo, which has an example NGINX app):

kubectl create namespace nginx-example
kubectl create deployment nginx --image=nginx -n nginx-example

velero backup create nginx-backup --selector app=nginx
velero backup describe nginx-backup
velero backup logs nginx-backup

kubectl delete namespace nginx-example

velero restore create --from-backup nginx-backup
velero restore describe nginx-backup-20240212194128

kubectl delete namespace nginx-example
velero backup delete nginx-backup
velero restore delete nginx-backup-20240212194128

There is a Minio client, although it seems to be designed for use with a cloud based back-end or local installation. It has predefined aliases for Minio, and is designed to run and terminate on each command. Unfortunately, we need to set a new alias, so that it can be used with later commands. We can hack a way to use it.

First, we need to know the Cluster IP address of the Minio service, so that it can be used later:

kubectl get svc -n minio
NAME    TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
minio   NodePort   10.233.60.69   <none>        9000:32602/TCP,9001:31241/TCP   78m

We get the user/password, and then run the client so that an alias (using cluster IP 10.233.60.69, in this case) can be created and commands invoked.

export ROOT_USER=$(kubectl get secret --namespace minio minio-root-user -o jsonpath="{.data.root-user}" | base64 -d)
export ROOT_PASSWORD=$(kubectl get secret --namespace minio minio-root-user -o jsonpath="{.data.root-password}" | base64 -d)

kubectl run --namespace minio minio-client \
     --tty -i --rm --restart='Never' \
     --env MINIO_SERVER_ROOT_USER=$ROOT_USER \
     --env MINIO_SERVER_ROOT_PASSWORD=$ROOT_PASSWORD \
     --env MINIO_SERVER_HOST=minio \
     --image docker.io/bitnami/minio-client:2024.2.9-debian-11-r0 -- \
    /bin/bash
mc alias set myminio http://10.233.60.69:9000 $MINIO_SERVER_ROOT_USER $MINIO_SERVER_ROOT_PASSWORD 
mc admin info myminio
...
Category: bare-metal, Kubernetes, Raspberry PI
February 3

More Power! Adding nodes to cluster

I’ll document the process I used to add two more Raspberry Pi 4s to the cluster that I’ve created in this series.

Preparing The PIs

With two new Raspberry PI 4s, PoE+ hats, SSD drives (2 TB this time), and two more UCTRONICS RM1U-3 trays each with an OLED display, power button, SATA Shield card, and USB3 jumper, I set out to assemble the trays and image them with Ubuntu.

Everything went well, assembling the trays with the Raspberry PIs. In turn, I connected a keyboard, HDMI display, Ethernet cable, and power adapter (as I don’t have PoE hub in my study). Once booted, I followed the steps in Part II of the series, however there were some issues getting the OS installed.

First, the Raspberry PI Imager program has been updated to support PI 5s, so there were multiple menus, tabbed fields, etc. I decided to connect a mouse to the Raspberry PI, rather than enter a maze of tabs and enters and arrows to try to navigate everywhere.

Second, when I went to select the Storage Device, the SSD drive was not showing up. I didn’t know if this was an issue with the UCTRONICS SATA Shield, the different brand of drive, the larger capacity, the newer installer, or the Raspberry PI itself. I did a bunch of different things to try to find out the root cause, and finally found out that to make this work, I needed to image the SSD drive using the Raspberry PI Imager on my Mac, using a SATA to USB adapter, and then place it into the UCTRONICS tray along with the Raspberry PI and it would then boot to the SSD drive.

Third, for one of the two Raspberry PIs, this still did not work. I ended up installing the Raspberry PI OS on an SD card, updating the EEPROM and bootloader, and then net booting the Raspberry PI Installer, after which I was able to get the Raspberry PI to boot from the SSD drive. It’s probably a good idea to update the EEPROM and bootloader to the latest anyway.

Initial Setup

As was done in Part II of the series, I picked IP addresses for the two units, added their MAC addresses into my router so that those IPs were reserved, added the host names to my local DNS server, created SSH keys for each, and used “ssh-copy-id” to copy those keys to all the other nodes and my Mac, and vice versa. Connectivity was all set.

I decided NOT to do the repartitioning mentioned in Part III, and instead leave the drive as one large 2TB (1.8TB actually) drive. My hope is that with Kubernetes, I can monitor problems, so if I see log files getting out of hand, I can deal with it, rather than having fixed partitions for /tmp, /var, /home, etc. I did create a /var/lib/longhorn directory – not sure if Longhorn would create this automatically.

Node Prep

With SSH access to each of the PIs, I could run through the same Ansible scripts that were used to set up all the other nodes as outlined in Part IV. Before running the scripts, I added the two nodes (morpheus, switch) to the hosts.yaml file in the inventory as worker nodes. There are currently three master nodes and four worker nodes.

When running these ansible scripts, I specified both hosts at once, rather than doing one at a time. For example:

cd ~/workspace/picluster
ansible-playbook -i "morpheus,switch" playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass
ansible-playbook -i "morpheus,switch" playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519
...

Now that the nodes are ready, they can be added to the cluster. For a control plane node, the cluster.yml playbook is used:

cd ~/workspace/picluster/kubespray
ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

Then, on each node, restart the NGINX proxy pod with:

crictl ps | grep nginx-proxy | awk '{print $1}' | xargs crictl stop

In our case, these will be worker nodes, which are added with the scale.yml playbook (using --limit so that other nodes are not affected):

ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=morpheus scale.yml
ansible-playbook -i ../inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=switch scale.yml
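If more workers get added later, the per-node invocations could be generated with a small loop. This is only a sketch: the scale_cmd helper is a hypothetical name, and it prints each command instead of running it, so the output can be reviewed first.

```shell
# Hypothetical helper: print (do not run) the scale.yml command for one node.
scale_cmd() {
  user=${USER:-$(id -un)}
  printf 'ansible-playbook -i %s -u %s -b -v --private-key=%s --limit=%s scale.yml\n' \
    "../inventory/mycluster/hosts.yaml" "$user" "~/.ssh/id_ed25519" "$1"
}

# One command per new worker node, in the same order as above.
cmds=$(for node in morpheus switch; do scale_cmd "$node"; done)
echo "$cmds"
```

Running the nodes one at a time, as done here, keeps a failure on one node from affecting the other.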

These two nodes added just fine, with the Kubernetes version v1.28.5, just like the control plane node I added before (my older nodes are still v1.28.2, but not sure how to update them currently).

January 21

Part X: OpenLENS

Ref: https://github.com/MuhammedKalkan/OpenLens

LENS gives you a way to look at numerous things in your cluster. It consists of the OpenLENS repository, with the core libraries developed by Team LENS and the community. There are other (some commercial) tools, like the IDE, which are built on top of OpenLENS. There are binaries of the free OpenLENS product, and the easiest way to install it on the Mac is with brew:

brew install --cask openlens

You can then run the app and connect to your Kubernetes cluster, by clicking on the “Browse Clusters In The Catalog” button on the home screen. It will show credentials from your ~/.kube directory, and since we installed a cluster and copied over the config to ~/.kube/config, you should see that listed.

You’ll be able to see a summary of the cluster (CPU, memory, pods), along with a list of resources that you can select on the left side of the window:

There are items to view the nodes, pods, secrets, network services, persistent volume claims, Helm charts, cluster roles, custom resource definitions (CRDs), etc. Clicking on an item will allow you to see all the details, and give you the ability to edit the item.

For example, here is part of the screen for the Loki service:

Showing you labels, annotations, IP info, and access info for the service. You can click on the ports link, to access the service.

Here is the Prometheus Helm chart:

It shows the version and a description. If you were to scroll down, you can see information about the Prometheus Helm repo, and how to install, uninstall, and upgrade the chart.

If you were to check on the Helm Releases, and pick an item, like Prometheus shown below, you can see all the settings:

In summary, LENS gives you a bunch of visibility into the cluster, from one point.

FYI, the Github page for OpenLENS mentions that after 6.3.0, some extensions are not included, but that you can go to the extensions menu and enter in “@alebcay/openlens-node-pod-menu” and install those extensions. I did that and the status of the extensions flipped between enable/disable for quite a while. I exited the app, restarted, and then went to extensions and Enabled this extension.

Afterward, when I viewed a node and selected a pod, the menu that allows you to edit and delete the pod also had buttons that allow you to attach to the pod (didn’t seem to work), shell into the pod, and view the logs for the containers in the pod. Pretty handy features.

January 19

Part IX: Load Balancer and Ingress

Ref: https://metallb.universe.tf/

In lieu of having a physical load balancer, this cluster will use MetalLB as a load balancer. In my network, I have a block of IP addresses reserved for DHCP, and picked a range of IPs to use for load balancer IPs in the cluster.

The first thing to do, is to get the latest release of MetalLB:

cd ~/workspace/picluster
poetry shell

mkdir -p ~/workspace/picluster/metallb
cd ~/workspace/picluster/metallb
MetalLB_RTAG=$(curl -s https://api.github.com/repos/metallb/metallb/releases/latest|grep tag_name|cut -d '"' -f 4|sed 's/v//')
echo $MetalLB_RTAG
0.13.12
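To see what the grep/cut/sed pipeline is doing without calling the GitHub API, here is a sketch that runs the same steps on a trimmed, made-up sample of the response. The extract_tag function name and the sample text are assumptions for illustration; the pipeline relies on the API returning one JSON key per line.

```shell
# Hypothetical helper: read releases/latest JSON on stdin, print the tag without the leading "v".
extract_tag() {
  grep '"tag_name"' | head -n 1 | cut -d '"' -f 4 | sed 's/^v//'
}

# Trimmed, made-up sample of the API response (the real one has many more fields).
sample='{
  "tag_name": "v0.13.12",
  "name": "v0.13.12"
}'

tag=$(printf '%s\n' "$sample" | extract_tag)
echo "$tag"
```

The only change from the command above is anchoring the sed expression with ^, so that just a leading “v” is stripped.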

Obtain the version, install it, and wait for everything to come up:

wget https://raw.githubusercontent.com/metallb/metallb/v${MetalLB_RTAG}/config/manifests/metallb-native.yaml -O metallb-native-${MetalLB_RTAG}.yaml

kubectl apply -f metallb-native-${MetalLB_RTAG}.yaml
kubectl get pods -n metallb-system --watch
kubectl get all -n metallb-system

Everything should be running, but needs to be configured for this cluster. Specifically, we need to set up and advertise the address pool(s), which can be specified as CIDRs or address ranges, using IPv4 and/or IPv6 addresses. For our case, I’m reserving 10.11.12.201 – 10.11.12.210 for load balancer IPs and using L2 advertisement (ipaddress_pools.yaml):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb-system
spec:
  addresses:
  - 10.11.12.201-10.11.12.210
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert
  namespace: metallb-system

Apply this configuration and examine the result:

kubectl apply -f ipaddress_pools.yaml
ipaddresspool.metallb.io/production created
l2advertisement.metallb.io/l2-advert created

kubectl get ipaddresspools.metallb.io -n metallb-system
NAME         AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
production   true          false             ["10.11.12.201-10.11.12.210"]

kubectl get l2advertisements.metallb.io -n metallb-system
NAME        IPADDRESSPOOLS   IPADDRESSPOOL SELECTORS   INTERFACES
l2-advert

kubectl describe ipaddresspools.metallb.io production -n metallb-system
Name:         production
Namespace:    metallb-system
Labels:       <none>
Annotations:  <none>
API Version:  metallb.io/v1beta1
Kind:         IPAddressPool
Metadata:
  Creation Timestamp:  2024-01-17T19:05:29Z
  Generation:          1
  Resource Version:    3648847
  UID:                 38491c8a-fdc1-47eb-9299-0f6626845e82
Spec:
  Addresses:
    10.11.12.201-10.11.12.210
  Auto Assign:       true
  Avoid Buggy I Ps:  false
Events:              <none>

Note: if you don’t want IP addresses auto-assigned, you can add the clause “autoAssign: false”, to the “spec:” section of the IPAddressPool.

To use the load balancer, you can change the type under the “spec:” section from ClusterIP or NodePort to LoadBalancer, by editing the configuration. For example, to change Grafana from NodePort to LoadBalancer, one would use the following to edit the configuration:

kubectl edit -n monitoring svc/prometheusstack-grafana

The type is located near the bottom of the file (shown here before the change, with the type still set to NodePort):

...
spec:
  clusterIP: 10.233.22.171
  clusterIPs:
  - 10.233.22.171
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-web
    nodePort: 32589
    port: 80
    protocol: TCP
    targetPort: 3000
  selector:
    app.kubernetes.io/instance: prometheusstack
    app.kubernetes.io/name: grafana
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

When you show the service, you’ll see the load balancer IP that was assigned:

kubectl get svc -n monitoring prometheusstack-grafana
NAME                      TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
prometheusstack-grafana   LoadBalancer   10.233.22.171   10.11.12.201   80:32589/TCP   5d23h

Here is a sample deployment (web-app-demo.yaml) to try. It has the LoadBalancer type specified:

apiVersion: v1
kind: Namespace
metadata:
  name: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
  namespace: web
spec:
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: httpd
        image: httpd:alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-server-service
  namespace: web
spec:
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer

Apply the configuration and check the IP address:

kubectl apply -f web-app-demo.yaml
kubectl get svc -n web

From the command line, you can do “curl http://IP_ADDRESS” to make sure it works. If you want a specific IP address, you can change the above web-app-demo.yaml to add the following line after the type (note the same indentation level):

  type: LoadBalancer
  loadBalancerIP: 10.11.12.205
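A requested address needs to come from a pool that MetalLB manages (10.11.12.201 to 10.11.12.210 in this setup), otherwise the service will not get that address. As a rough sanity check before editing the YAML, a sketch like the following compares an address against the range. The ip_to_int and in_pool names are hypothetical helpers, and this handles IPv4 only.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 lies within the inclusive range $2..$3.
in_pool() {
  ip=$(ip_to_int "$1")
  lo=$(ip_to_int "$2")
  hi=$(ip_to_int "$3")
  [ "$ip" -ge "$lo" ] && [ "$ip" -le "$hi" ]
}

if in_pool 10.11.12.205 10.11.12.201 10.11.12.210; then
  echo "10.11.12.205 is inside the pool"
else
  echo "10.11.12.205 is outside the pool"
fi
```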

Before removing MetalLB, you should change any services that are using it, to go back to NodePort or ClusterIP as the type. Then, delete the configuration:

kubectl delete -f metallb-native-${MetalLB_RTAG}.yaml

Ref: https://docs.nginx.com/nginx-ingress-controller/technical-specifications/

Ref: https://kubernetes.github.io/ingress-nginx/deploy/

With the load balancer set up and running, we’ll create an Ingress controller using NGINX. You can view the compatibility chart in the technical specifications (first reference above) to select the NGINX version desired. For our purposes, we’ll use a Helm chart install, so that we have sources and can delete/update CRDs. I’m currently running Kubernetes 1.28, so either 1.0.2 or 1.1.2 of the Helm chart would work. Let’s pull the chart for 1.1.2:

cd ~/workspace/picluster/
helm pull oci://ghcr.io/nginxinc/charts/nginx-ingress --untar --version 1.1.2
cd nginx-ingress

Install NGINX Ingress with:

helm install my-nginx --create-namespace -n nginx-ingress .
NAME: my-nginx
LAST DEPLOYED: Fri Jan 19 11:14:16 2024
NAMESPACE: nginx-ingress
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The NGINX Ingress Controller has been installed.

If you want to customize settings, you can add the “--values my-values.yaml” argument, after first getting the list of options using the following command, and then modifying them (note that the values come from the same nginxinc chart that was pulled above, not from the kubernetes/ingress-nginx chart):

helm show values oci://ghcr.io/nginxinc/charts/nginx-ingress --version 1.1.2 > my-values.yaml

The NGINX service will have an external IP address, as the type is LoadBalancer, and MetalLB will assign an address from the pool (note: you can specify an IP to use in values.yaml).

To test this out, we’ll create a web-based app:

kubectl create deployment demo --image=httpd --port=80
kubectl expose deployment demo

We can then create an ingress entry for a local (dummy) domain and forward port 8080 to the default port (80) for the app:

kubectl create ingress demo-localhost --class=nginx --rule="demo.localdev.me/*=demo:80"
kubectl port-forward --namespace=nginx-ingress service/my-nginx-nginx-ingress-controller 8080:80 &

To test this out, you can try accessing the URL:

curl http://demo.localdev.me:8080
Handling connection for 8080
<html><body><h1>It works!</h1></body></html>

If you have a publicly visible domain, you can forward that to the app. I have not tried it, but it looks like the ingress command would look like:

kubectl create ingress demo --class=nginx  --rule YOUR.DOMAIN.COM/=demo:80

Here is an example of doing path-based routing of requests. First, create two pods and services that will handle the requests:

In apple.yaml:

kind: Pod
apiVersion: v1
metadata:
  name: apple-app
  labels:
    app: apple
spec:
  containers:
    - name: apple-app
      image: hashicorp/http-echo
      args:
        - "-text=apple"

---

kind: Service
apiVersion: v1
metadata:
  name: apple-service
spec:
  selector:
    app: apple
  ports:
    - port: 5678 # Default port for image

In banana.yaml:

kind: Pod
apiVersion: v1
metadata:
  name: banana-app
  labels:
    app: banana
spec:
  containers:
    - name: banana-app
      image: hashicorp/http-echo
      args:
        - "-text=banana"

---

kind: Service
apiVersion: v1
metadata:
  name: banana-service
spec:
  selector:
    app: banana
  ports:
    - port: 5678 # Default port for image

Then, create an ingress-demo.yaml that will redirect requests:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
  - http:
      paths:
      - path: /apple
        pathType: Prefix
        backend:
          service:
            name: apple-service
            port:
              number: 5678
      - path: /banana
        pathType: Prefix
        backend:
          service:
            name: banana-service
            port:
              number: 5678

Apply the three YAML files. To test, you can access the Ingress service IP (10.11.12.201 in this example) or any node IP with the path prefix:

curl http://10.11.12.201/apple
apple
curl http://10.11.12.201/banana
banana
curl http://10.11.12.201/unknown
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

The first two requests are routed to the apple and banana services/pods, respectively, while a path with no matching rule returns a 404 from NGINX.
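As a way to reason about which paths the two Prefix rules cover, here is a small sketch that mimics the matching described in the Kubernetes Ingress documentation, where a Prefix path matches on whole path elements. The route function is purely illustrative and does not talk to the cluster; an actual controller, NGINX included, may treat edge cases such as /applesauce differently, so it is worth checking with curl.

```shell
# Illustrative only: mimic the two Prefix rules from ingress-demo.yaml.
route() {
  case "$1" in
    /apple|/apple/*)   echo "apple-service" ;;
    /banana|/banana/*) echo "banana-service" ;;
    *)                 echo "404" ;;
  esac
}

# Show where a few sample paths would land under element-wise Prefix matching.
for path in /apple /apple/pie /applesauce /banana /unknown; do
  printf '%-12s -> %s\n' "$path" "$(route "$path")"
done
```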

To remove NGINX Ingress, you can use “helm delete”.

You must manually update the CRDs, before upgrading NGINX. Pull the new release and then apply the updated CRDs:

cd ~/workspace/picluster
helm pull oci://ghcr.io/nginxinc/charts/nginx-ingress --untar --version VERSION_DESIRED
cd nginx-ingress
kubectl apply -f crds/

You may see this warning, but it can be ignored:

Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply

You should check the release notes, for any other specific actions needed for a new release. You can then upgrade NGINX:

helm upgrade my-nginx .

FYI: At the bottom of the NGINX install page, there are notes on how to upgrade without downtime.

To uninstall, remove the CRDs and then uninstall with Helm, using the release name specified when the chart was installed:

kubectl delete -f ~/workspace/picluster/nginx-ingress/crds/
helm uninstall my-nginx -n nginx-ingress
January 18

Part VIII: Prometheus/Grafana and Loki

For monitoring of the cluster and logs, we’ll set up several tools…

First, we’ll create a login account for accessing the Grafana UI. Create a working area and store the user name and password in a secret:

cd ~/workspace/picluster
poetry shell

mkdir -p ~/workspace/picluster/monitoring/kube-prometheus-stack
cd ~/workspace/picluster/monitoring/kube-prometheus-stack/
kubectl create namespace monitoring
echo -n ${USER} > ./admin-user
echo -n 'PASSWORD' > ./admin-password # Change to desired password
kubectl create secret generic grafana-admin-credentials --from-file=./admin-user --from-file=admin-password -n monitoring
rm admin-user admin-password

Add the helm repo:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install kube-prometheus-stack with the credentials we selected, and using Longhorn for persistent storage (allocating 50GB):

helm install prometheusstack prometheus-community/kube-prometheus-stack --namespace monitoring \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName="longhorn" \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.accessModes[0]="ReadWriteOnce" \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage="50Gi" \
  --set grafana.admin.existingSecret=grafana-admin-credentials

Wait for everything to come up with either of these:

kubectl --namespace monitoring get pods -l "release=prometheusstack"
kubectl get all -n monitoring

At this point, you can look at the Longhorn console, and see that there is a 50GB volume created for Prometheus/Grafana. As with any Helm install, you can get the values used for the chart with the following command, and then do a “helm upgrade” with the -f option and the updated YAML file:

helm show values prometheus-community/kube-prometheus-stack > values.yaml

Next, change the Grafana UI from ClusterIP to NodePort (or LoadBalancer, if you have set that up):

kubectl edit -n monitoring svc/prometheusstack-grafana

From a browser, you can use a node IP and the node port shown in the service output to access the UI, and log in with the credentials you created in the above steps:

NAME                                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-operated                      ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   5m34s
service/prometheus-operated                        ClusterIP   None            <none>        9090/TCP                     5m33s
service/prometheusstack-grafana                    NodePort    10.233.22.171   <none>        80:32589/TCP                 5m44s
...

For example, “http://10.11.12.190:32589” in this case. There will already be a data source set up for Prometheus, and you can use this to examine the cluster. Under Dashboards, there are some predefined dashboards, and you can also make your own, or obtain others and import them.

I found this one, from David Calvert (https://github.com/dotdc/grafana-dashboards-kubernetes.git), with some nice dashboards for nodes, pods, etc. I cloned the repo to my monitoring directory, and then from the Grafana UI, clicked on the plus sign at the top of the main Grafana page and selected “Import Dashboard”, clicked on the drag/drop pane, navigated to one of the dashboard JSON files (the k8s-views-global.json is nice), selected the predefined “Prometheus” data source, and clicked “Import”. This gives a screen with info on the nodes, network, etc.

TODO: Setting up Prometheus to use HTTPS only.

For log aggregation, we can install Loki, persisting information to longhorn. I must admit, I struggled getting this working and, although repeatable, there may be better ways to get this working:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=longhorn \
  --set loki.persistence.size=20Gi \
  --set 'promtail.tolerations[0].key=CriticalAddonsOnly' \
  --set 'promtail.tolerations[0].operator=Exists' \
  --set 'promtail.tolerations[0].effect=NoExecute' \
  --set 'promtail.tolerations[1].key=node-role.kubernetes.io/control-plane' \
  --set 'promtail.tolerations[1].operator=Exists' \
  --set 'promtail.tolerations[1].effect=NoSchedule'

You can check the monitoring namespace, to wait for the promtail pods to be running on each node. Once running, you can access the Grafana UI and create a new datasource, selecting the type “Loki”. For the URL, use “http://loki:3100” and click save and test. I’m not sure why the Helm install didn’t automatically create the source, and why this manual source creation fails on the “test” part of the save and test, but the source is now there and seems to work.

To use, you can go to the Explore section, and provide a query. With the “builder” (default query mode), you can select the label type (e.g. “pod”) and then the instance you are interested in. You can also change from “builder” to “code” and enter this as the query and run the query:

{stream="stderr"} |= `level=error`

This will show error logs from all nodes. For example (clicking on the “>” symbol at the left to expand an entry and show the fields):

Another query that reports errors over a period of five minutes is:

count_over_time({stream="stderr"} |= `level=error` [5m])

UPDATE 1/28/2025: With a recent reinstall attempt, I was seeing an error with the loki pod in a crash loop. I found out two things. First, it doesn’t look like the loki-stack install specifies an image version, and there can be a compatibility issue with Grafana. I looked at the tags on the loki-stack GitHub page, and tried the install with the latest version, by adding the argument:

--set 'image.tag=2.10.2'

This worked for me. The second is that I read that the loki-stack Helm chart is deprecated and no longer supported, and there were suggestions to use the newer Loki installer page, which has three versions – monolith for small scale testing, simple scalable for small to moderate sized clusters, and microservices for larger enterprise style clusters.

The issue is (to me) that it seems like Loki is designed to be used with cloud storage systems or, in the case of the small setups, with MinIO local filesystem S3 storage. It is also much more complex (and I guess flexible) with configurable replicas for various components.

I gave it a try and had issues with the gateway and chunks cache pods not coming up. There was some mention about Loki being designed for use with kube-dns and not the newer CoreDNS that Kubernetes uses. I gave up, and decided, for now, not to try to figure out how I could use it with Longhorn storage, like I have with loki-stack.

You can customize the promtail pod’s configuration file so that you can do queries on custom labels, instead of searching for specific text in log messages. To do that, we first obtain the promtail configuration:

cd ~/workspace/picluster/monitoring
kubectl get secret -n monitoring loki-promtail -o jsonpath="{.data.promtail\.yaml}" | base64 --decode > promtail.yaml
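The base64 step is needed because Kubernetes stores the values in a secret’s data section base64 encoded. Here is a minimal sketch of that round trip, using a made-up two-line stand-in for the promtail configuration (the content is hypothetical, not taken from the real secret):

```shell
# Hypothetical stand-in for the promtail.yaml content held in the secret.
plain='server:
  http_listen_port: 3101'

# Encode the way the value is stored, then decode the way the command above does.
encoded=$(printf '%s' "$plain" | base64)
decoded=$(printf '%s' "$encoded" | base64 --decode)

echo "$encoded"
echo "$decoded"
```

After editing, the file has to be encoded again (or the secret recreated from the file) before promtail will see the change.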

Edit this file, and under the “scrape_configs” section, you will see “pipeline_stages”:

scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
    kubernetes_sd_configs:

We will add a new stage called “- match”, under the “- cri:” line. To do the matching for an app called “api” we use TODO: Have an app with JSON and then describe the process. Use https://www.youtube.com/watch?v=O52dseg2bJo&list=TLPQMjkxMTIwMjOvWB8m2JEG4Q&index=7 for reference.

To remove Prometheus and Grafana, you must remove several CRDs, run helm uninstall, and remove the secret:

kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com

helm uninstall -n monitoring prometheusstack
kubectl delete secret -n monitoring grafana-admin-credentials

To remove Loki, you can helm uninstall:

helm uninstall -n monitoring loki

To clean up anything else that remains, you can remove the namespace:

kubectl delete ns monitoring
January 16

Part VII: Cluster Backup

We will use Velero to perform full or partial backup and restore of the cluster, and will use Minio to provide a local S3 storage area on another computer on the network (I just used my Mac laptop, but you could use a server or multiple servers instead), so that we can do scheduled backups with Velero.

On the Mac, you can use brew to install Minio and the Minio client (mc):

brew upgrade
brew install minio/stable/minio
brew install minio/stable/mc

Create an area for data storage and an area for Minio:

cd ~/workspace/picluster
poetry shell
mkdir -p ~/workspace/minio-data
mkdir -p ~/workspace/picluster/minio

Create the file ~/workspace/picluster/minio/minio.cfg with the user/password desired (default is minioadmin/minioadmin) and the host name (or IP) where the server will run (in this example, I use my laptop name):

# MINIO_ROOT_USER and MINIO_ROOT_PASSWORD sets the root account for the MinIO server.
# This user has unrestricted permissions to perform S3 and administrative API operations on any resource in the deployment.
# Omit to use the default values 'minioadmin:minioadmin'.
# MinIO recommends setting non-default values as a best practice, regardless of environment

MINIO_ROOT_USER=minime
MINIO_ROOT_PASSWORD=PASSWORD_YOU_WANT_HERE

# MINIO_VOLUMES sets the storage volume or path to use for the MinIO server.

MINIO_VOLUMES="~/workspace/minio-data"

# MINIO_SERVER_URL sets the hostname of the local machine for use with the MinIO Server
# MinIO assumes your network control plane can correctly resolve this hostname to the local machine

# Uncomment the following line and replace the value with the correct hostname for the local machine and port for the MinIO server (9000 by default).

MINIO_SERVER_URL="http://trinity.home:9000"

Note: There is no way to change the user/password, from the console later.

Create a minio-credentials file with the same user name and password as was done in the minio.cfg file:

[default]
aws_access_key_id = minime
aws_secret_access_key = SAME_PASSWORD_AS_ABOVE

I did “chmod 700” for both minio.cfg and minio-credentials.

Next, create a script (minio-server-start) to start up Minio with the desired settings:

export MINIO_CONFIG_ENV_FILE=./minio.cfg
minio server --console-address :9090 &

When you run this script, the output will include a warning that the local host has all the data and a failure will cause loss of data (duh). It will show the URL for the API (port 9000) and console (port 9090), along with the username and password to access them. Near the bottom, it will show you an alias command that you should copy and paste. It names the server and provides the credentials info. It looks like:

mc alias set 'myminio' 'http://trinity.home:9000' 'minime' 'THE_PASSWORD_FROM CONFIG'

Then do the following to make sure that the server is running the latest code:

mc admin update myminio

In your browser, go to the URL and log in with the username/password. Under Administrator -> Buckets menu on the left panel, create a bucket called “kubernetes”. I haven’t tried, but you can turn on versioning, object locking, and quota.

Ref: https://velero.io/docs/main/contributions/minio/

For the Mac, use brew to install Velero and either note the version or check with the list command (in my case it has 1.12.3):

brew install velero
brew list velero

You can check compatibility of the Velero version you have and the kubernetes version running (and adjust the version used by brew, if needed). The matrix is here. (Optionally) Pull the Velero sources from git, so that we can use examples and have documentation:

cd ~/workspace/picluster
git clone https://github.com/vmware-tanzu/velero.git
cd velero

In the README.md, it will have version compatibility info.

It indicates that velero 1.12.x works with Kubernetes 1.27.3 and 1.13.x with Kubernetes 1.28.3. We have 1.28 Kubernetes, but there is no brew version for Velero 1.13 right now, so we’ll hope 1.12.3 Velero works.

We need the Velero plugin for AWS. The plugins are shown here. For Velero 1.12.x, we need AWS plugin 1.8.x. The plugin tags show that v1.8.2 is the latest.

Next, start up Velero, specifying the plugin version to use, the bucket name you created in Minio (“kubernetes”), the credentials file, Minio as the S3 storage, and the Minio API URL (your host name with port 9000):

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.8.2 \
    --bucket kubernetes \
    --secret-file ~/workspace/picluster/minio/minio-credentials \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://trinity.home:9000

It will display that Velero is installed and that you can use “kubectl logs deployment/velero -n velero” to see the status.

Check that the backup location is available with:

velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        kubernetes      Available   2024-01-16 13:15:52 -0500 EST   ReadWrite     true

If you have the Velero git repo pulled, as mentioned above, you can start an example:

cd ~/workspace/picluster/velero
kubectl apply -f examples/nginx-app/base.yaml

kubectl get all -n nginx-example
NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-75b696dc55-25d6r   1/1     Running   0          66s
pod/nginx-deployment-75b696dc55-7h5zx   1/1     Running   0          66s

NAME               TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
service/my-nginx   LoadBalancer   10.233.46.15   <pending>     80:30270/TCP   66s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   2/2     2            2           66s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-75b696dc55   2         2         2       66s

You should see the deployment running (Note: there is no external IP, as I don’t have a load balancer running right now). If you want to just backup this application, you can do:

velero backup create nginx-backup --selector app=nginx
velero backup get
NAME           STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   0        0          2024-01-16 13:22:53 -0500 EST   29d       default            app=nginx

velero backup describe nginx-backup
velero backup logs nginx-backup

If you look at the “kubernetes” bucket from the Minio console, you’ll see the backup files there. Now, we can delete the application and then restore it…

kubectl delete namespace nginx-example
kubectl get all -n nginx-example
No resources found in nginx-example namespace.

velero restore create --from-backup nginx-backup
Restore request "nginx-backup-20240116132642" submitted successfully.
Run `velero restore describe nginx-backup-20240116132642` or `velero restore logs nginx-backup-20240116132642` for more details.
kubectl get all -n nginx-example
NAME                                    READY   STATUS    RESTARTS   AGE
pod/nginx-deployment-75b696dc55-25d6r   1/1     Running   0          3s
pod/nginx-deployment-75b696dc55-7h5zx   1/1     Running   0          3s

NAME               TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/my-nginx   LoadBalancer   10.233.16.165   <pending>     80:31834/TCP   3s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx-deployment   2/2     2            2           2s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/nginx-deployment-75b696dc55   2         2         2       3s

You can back up the full cluster with “velero backup create full-backup-2024-01-17”, using a name to denote the backup (backup names are Kubernetes object names, so stick to lowercase letters, digits, and dashes). You can use “velero backup get” to see a list of backups, and “velero restore get” to see a list of restores. You can even schedule backups with a command, like the following:

velero schedule create homek8s --schedule="@every 6h"

I haven’t tried this, because I’m using my laptop, which is not always on.
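For the one-off full backups, a small helper can build a dated name. This is only a sketch: the backup_name function and the full-backup prefix are my own choices for illustration, not anything Velero provides. It lowercases the prefix and swaps underscores for dashes, since Kubernetes object names need to be lowercase.

```shell
# Hypothetical helper: print a backup name such as full-backup-2024-01-17.
# $1 = prefix (default: full-backup), $2 = date (default: today).
backup_name() {
  prefix="${1:-full-backup}"
  day="${2:-$(date +%Y-%m-%d)}"
  # Lowercase everything and replace underscores with dashes.
  printf '%s-%s\n' "$prefix" "$day" | tr 'A-Z_' 'a-z-'
}

# Usage (needs the velero CLI and a working cluster):
#   velero backup create "$(backup_name)"
```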

First, we’ll delete the NGINX app that was installed, and the corresponding Velero backup.

kubectl delete -f examples/nginx-app/base.yaml
velero backup delete nginx-backup
velero restore delete nginx-backup

You can check in Minio to make sure the backups are gone. Next, Velero can be removed from the cluster, along with the CRDs used:

kubectl delete namespace/velero clusterrolebinding/velero
kubectl delete crds -l component=velero

Lastly, you can delete the “kubernetes” bucket from the Minio console, and kill the Minio process.

I did have a problem at one point (may have been due to an older Minio version), where I deleted a backup from Velero, and it later re-appeared when doing “velero backup get”. I looked at operations with “mc admin trace myminio” and saw that requests were coming into Minio to remove the backup from my “myminio” server, but the bucket was NOT being removed. Velero would later sync with Minio, see the backup and show that it was still there on a later “velero backup get” command.

I found that the following would remove the bucket and everything under:

mc rm --recursive kubernetes --force --dangerous

There is also a “mc rb” command to remove the bucket (have not tried), or an individual backup can be removed with “mc rm RELATIVE/PATH/FROM/BUCKET”, like “kubernetes/backups/nginx-backup” and “kubernetes/restores/nginx-backup-20231205150432”. I think I tried doing it from the Minio UI, but the files were not removed. My guess is that it does an API request to remove the file, just like what Velero does, whereas the command line seems to remove the file.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part VII: Cluster Backup
January 10

Part VI: Adding Shared Storage

There are many shared storage products available for Kubernetes. I had settled on Longhorn, as it provides block storage, is pretty easy to set up, has snapshots, is distributed, and allows backup to secondary storage (I plan on using NFS to back up to a NAS box that I have on my network). As of this writing, the latest is 1.5.3 (https://longhorn.io/).

With the 1TB SSD drives on each Raspberry PI, and the /dev/sda7 partition, mounted as /var/lib/longhorn, the RPIs can be prepared for Longhorn. There is a script that can be used to see if all the dependencies have been met on the nodes. For 1.5.3 run:

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash

If there are any errors, you need to address them, before continuing. For example, if it complains that iscsid is missing on a node, you can do:

sudo apt-get reinstall -f open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-iscsi-installation.yaml

You should make sure that the longhorn-iscsi-installation pods are running on all nodes. In my case, one was not, and the log for the iscsi-installation container was saying that module iscsi_tcp was not present. For that, I did the following:

sudo apt install linux-modules-extra-raspi
sudo reboot

I’ve added that package to tools setup in Part IV, so that it will be already present.

If multipathd is enabled, which the check script flags because it can interfere with the block devices that Longhorn creates, you can handle that with:

sudo systemctl stop multipathd
sudo systemctl disable multipathd

In my run, I had a node, apoc, with a missing package, so I did a “sudo apt install nfs-common -y” on that node. Since then, I’ve added that to the RPI tools setup in Part IV, so that it’ll be there. Re-run the script to make sure that all the nodes are ready for install.
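Between runs of the check script, a quick look for the expected binaries on a node can save some time. This is just a rough sketch and not a replacement for environment_check.sh. The missing_tools function is hypothetical, and I’m assuming that open-iscsi provides iscsiadm and nfs-common provides mount.nfs (they may live in /sbin, so PATH matters).

```shell
# Hypothetical helper: print each named command that is NOT found in PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage on a node (assumed binary names, see above):
#   missing_tools iscsiadm mount.nfs
```

No output means everything that was asked for is present.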

Helm has already been installed on my Mac, so we can obtain Longhorn with:

helm repo add longhorn https://charts.longhorn.io
helm repo update

Set up an area and get the current settings for Longhorn, so that we can customize them:

cd ~/workspace/picluster
poetry shell
mkdir -p ~/workspace/picluster/longhorn
cd ~/workspace/picluster/longhorn

Before installing, we’ll pull down the settings for v1.5.3:

curl -o values-1.5.3.yaml https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/chart/values.yaml

In values-1.5.3.yaml, under the section service:, for ui:, set type to NodePort. Under the section persistence:, set reclaimPolicy to Retain:

75c75
<     type: ClusterIP
---
>     type: NodePort
89c89
<   reclaimPolicy: Delete
---
>   reclaimPolicy: Retain

This will allow you to access the UI by using any node’s IP, and when Longhorn is brought down, the files in block storage are retained.
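If you’d rather script those two edits than do them by hand, something like the following should work. It is a sketch that assumes the line numbers from the diff above (75 and 89), so it only applies to the v1.5.3 values.yaml. The customize_values function name is made up. I pin the line numbers on purpose, so that other “type: ClusterIP” lines in the file are left alone.

```shell
# Hypothetical helper: apply the NodePort and Retain changes.
# $1 = pristine values file, $2 = customized output file.
customize_values() {
  sed -e '75s/type: ClusterIP/type: NodePort/' \
      -e '89s/reclaimPolicy: Delete/reclaimPolicy: Retain/' "$1" > "$2"
}

# Usage:
#   customize_values values-1.5.3.yaml values-1.5.3-custom.yaml
#   diff values-1.5.3.yaml values-1.5.3-custom.yaml
```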

We also need to set tolerations for the manager, UI, and driver. There are instructions in the values.yaml file where you remove the square brackets and un-comment the toleration settings. If you don’t do this, the longhorn-driver-deployment pod will never get out of Init state. Diffs for just the tolerations will look like:

--- a/longhorn/values-1.5.3.yaml
+++ b/longhorn/values-1.5.3.yaml
@@ -182,13 +182,13 @@ longhornManager:
     ## Allowed values are `plain` or `json`.
     format: plain
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above
   ## and uncomment this example block
@@ -202,13 +202,13 @@ longhornManager:

 longhornDriver:
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above
   ## and uncomment this example block
@@ -218,13 +218,13 @@ longhornDriver:
 longhornUI:
   replicas: 2
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above
   ## and uncomment this example block

Install Longhorn with the updated values and monitor the namespace until you see that everything is up:

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.5.3  --values values-1.5.3.yaml
kubectl get all -n longhorn-system

Use “kubectl get service -n longhorn-system” to find the port for the frontend service, and then with a browser you can access the UI using one of the node’s IPs and the port. For example, http://10.11.12.188:30191, on one run that I did.
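To avoid reading the port out of the table by eye, the NodePort can be pulled with a jsonpath query. A minimal sketch, assuming the UI service is named longhorn-frontend and that the first port listed is the one you want. The longhorn_ui_url function is just for illustration.

```shell
# Hypothetical helper: print the Longhorn UI URL for a given node IP.
# Assumes the service name longhorn-frontend in the longhorn-system namespace.
longhorn_ui_url() {
  node_ip="$1"
  port="$(kubectl get service longhorn-frontend -n longhorn-system \
            -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
  printf 'http://%s:%s\n' "$node_ip" "$port"
}

# Usage:
#   longhorn_ui_url 10.11.12.188
```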

You can see and manage volumes, view the total amount of disk space and what is scheduled, and see the nodes being used and their state.

As an example, we can create a PVC that uses Longhorn for storage:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2Gi
EOF

You can verify that the PVC is using longhorn for storage, by doing:

kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-89456753-e271-46a0-b8c0-9e53affc4c6b   2Gi        RWX            longhorn       3s

The storage class shows “longhorn”. From the Longhorn console, you can see that there is a detached volume for that PVC.

Next, you can create a pod that uses the PVC. Here is an example, using NGINX:

cat << EOF |kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/foo/"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
EOF

This specifies the PVC “myclaim”, and you can see that there is a PV created that uses the PVC, has reclaim policy of retain, and uses the longhorn storage class:

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-89456753-e271-46a0-b8c0-9e53affc4c6b   2Gi        RWX            Retain           Bound    default/myclaim   longhorn                4m35s

You can set up a backup for the Longhorn storage. In my case, I have a NAS box that is accessible via NFS.

The first step is to create a share area on the device. You can follow whatever instructions you have for creating an NFS share.

On my NAS (using the GUI console), I created a share at /longhorn, with R/W access for my account (I’m in the “administ” group, BTW) and “no squash users” set. I set the IP range to 10.11.12.0/24, so only nodes on my network can access this share. I made sure that the shared area exists, has 777 perms, and has user/group set to admin. NOTE: It is actually at /share/CACHEDEV1_DATA/longhorn and there is a symlink at /share/longhorn. I created a subdirectory called “backups” in this area (so there can be other sub-directories for other shares, if desired).

I checked that it appears under /etc/exports with the subnet called out and the settings desired:

"/share/CACHEDEV1_DATA/longhorn" 10.11.12.0/24(sec=sys,rw,async,wdelay,insecure,no_subtree_check,no_root_squash,fsid=...)

I checked that the share was set up correctly, by mounting it from a node and creating/changing a file:

sudo mount -t nfs <IP OF NAS>:/longhorn /mnt
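The create/change part of that check can be scripted once the share is mounted. This is a sketch only. The check_writable function and the probe file name are made up, and depending on ownership of the share you may need to run it with sudo.

```shell
# Hypothetical helper: create, modify, and remove a probe file in a directory.
# $1 = directory to test (e.g. the mount point).
check_writable() {
  dir="$1"
  probe="$dir/.write-test.$$"
  { echo "hello" > "$probe"; } 2>/dev/null || { echo "cannot create a file in $dir"; return 1; }
  echo "changed" >> "$probe" || { echo "cannot modify a file in $dir"; return 1; }
  rm -f "$probe"
  echo "$dir is writable"
}

# Usage, after the mount command above:
#   check_writable /mnt
#   sudo umount /mnt
```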

Now, Longhorn can be configured to use this NFS share for backups. Use helm to install the NFS provisioner for Longhorn.

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update

You can see the configuration for the provisioner with:

helm show values -n nfs-storage nfs-subdir-external-provisioner/nfs-subdir-external-provisioner

Install the provisioner, setup for your share, using the IP of your NFS server:

helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=<IP_OF_NFS_SERVER> --set nfs.path=/longhorn/backup -n nfs-storage --create-namespace

Under the Longhorn UI (accessible via NodePort), go to Settings, and in the Backup Target, set the path to the NFS share and click the SAVE button at the bottom of the page:

nfs://<IP_OF_NFS_SERVER>:/longhorn/backup/

Once you have created a volume and it is attached to a node, you can do a backup or take a snapshot. From the Volume section, click on the name of a volume to bring up details, and then you can click on “Take Snapshot” or “Create Backup”. You can go back to older versions of snapshots, by detaching the volume and attaching with maintenance checked. From the snapshot, you can check revert and then detach and re-attach w/o maintenance. Once healthy, you can see that the snapshot is there.

To remove Longhorn, you must set a flag to allow deletion, before removing:

kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm uninstall longhorn -n longhorn-system

If you ever update your kernel on the Raspberry PI, you’ll need to reinstall the extra modules. You can do this with:

sudo apt-get reinstall linux-modules-extra-$(uname -r)

Reboot afterwards. If you don’t do this install, features like Longhorn will be missing required modules and fail.

The Longhorn documentation has information on expanding a volume.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part VI: Adding Shared Storage
December 31

Part V: Bringing Up Cluster With Kubespray

Now that everything is ready, we can use ansible to bring up the cluster with kubespray. The cluster.yml playbook will check to make sure all the dependencies are present on the nodes, versions are correct, and will proceed to install kubernetes on the cluster, as defined by the hosts.yaml you’ve created. Move to the kubespray area, and run the cluster.yml playbook:

cd ~/workspace/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

It takes a long time to run, but has a lot to do! With the verbose flag, you can see each step performed and whether or not things were changed. At the end, you’ll get a summary, just like on all the other playbooks that were invoked. Here is the end of the output for a run, where I already had a cluster (so things were set up already) and just ran the cluster.yml playbook again.

PLAY RECAP ********************************************************************************************************************************************************************************
cypher                     : ok=658  changed=69   unreachable=0    failed=0    skipped=1123 rescued=0    ignored=0
localhost                  : ok=3    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
lock                       : ok=563  changed=44   unreachable=0    failed=0    skipped=1005 rescued=0    ignored=0
mouse                      : ok=483  changed=50   unreachable=0    failed=0    skipped=717  rescued=0    ignored=0
niobi                      : ok=415  changed=37   unreachable=0    failed=0    skipped=684  rescued=0    ignored=0

Sunday 31 December 2023  10:10:12 -0500 (0:00:00.173)       0:21:31.035 *******
===============================================================================
container-engine/validate-container-engine : Populate service facts --------------------------------------------------------------------------------------------------------------- 99.19s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ----------------------------------------------------------------------------------------------------------------------- 46.85s
etcd : Reload etcd ---------------------------------------------------------------------------------------------------------------------------------------------------------------- 35.10s
etcd : Gen_certs | Write etcd member/admin and kube_control_plane client certs to other etcd nodes -------------------------------------------------------------------------------- 34.06s
kubespray-defaults : Gather ansible_default_ipv4 from all hosts ------------------------------------------------------------------------------------------------------------------- 27.37s
network_plugin/calico : Start Calico resources ------------------------------------------------------------------------------------------------------------------------------------ 26.65s
download : Download_file | Download item ------------------------------------------------------------------------------------------------------------------------------------------ 25.50s
policy_controller/calico : Start of Calico kube controllers ----------------------------------------------------------------------------------------------------------------------- 17.93s
network_plugin/calico : Check if calico ready ------------------------------------------------------------------------------------------------------------------------------------- 17.34s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS templates ------------------------------------------------------------------------------------------------------------ 17.28s
etcd : Gen_certs | Gather etcd member/admin and kube_control_plane client certs from first etcd node ------------------------------------------------------------------------------ 16.50s
download : Download_file | Download item ------------------------------------------------------------------------------------------------------------------------------------------ 14.33s
download : Check_pull_required |  Generate a list of information about the images on a node --------------------------------------------------------------------------------------- 12.78s
container-engine/containerd : Containerd | restart containerd --------------------------------------------------------------------------------------------------------------------- 12.36s
download : Check_pull_required |  Generate a list of information about the images on a node --------------------------------------------------------------------------------------- 12.13s
etcd : Gen_certs | run cert generation script for etcd and kube control plane nodes ----------------------------------------------------------------------------------------------- 11.75s
download : Check_pull_required |  Generate a list of information about the images on a node --------------------------------------------------------------------------------------- 11.47s
download : Check_pull_required |  Generate a list of information about the images on a node --------------------------------------------------------------------------------------- 11.46s
network_plugin/calico : Calico | Create calico manifests -------------------------------------------------------------------------------------------------------------------------- 11.23s
download : Download_file | Download item ------------------------------------------------------------------------------------------------------------------------------------------ 10.81s

If things are broken, you’ll need to go back and fix them and try again. Once it is working, though, we can get the kube configuration file, so that we can run kubectl commands (we installed kubectl on the Mac in Part IV). I use a script (at ~/workspace/picluster) to make this easy to do:

../setup-kubectl.bash

The contents of the script are:

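# Name and IP of one control plane node (change these for your cluster)
CONTROL_PLANE_NODE=cypher
CONTROL_PLANE_NODE_IP=10.11.12.198
# Copy the admin kubeconfig into the user's home on that node and take ownership
ssh ${CONTROL_PLANE_NODE} sudo cp /etc/kubernetes/admin.conf /home/${USER}/.kube/config
ssh ${CONTROL_PLANE_NODE} sudo chown ${USER} /home/${USER}/.kube/config
# Pull the file down to the local machine
mkdir -p ~/.kube
scp ${CONTROL_PLANE_NODE}:.kube/config ~/.kube/config
# Point kubectl at the node's IP instead of localhost (macOS sed syntax; GNU sed wants -i.bak)
sed -i .bak -e "s/127\.0\.0\.1/${CONTROL_PLANE_NODE_IP}/" ~/.kube/config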

You’ll need to set CONTROL_PLANE_NODE to the name of one of the control plane nodes, and CONTROL_PLANE_NODE_IP to that node’s IP address. Once this script is run, the config file will be set up to allow the kubectl command to access the cluster.

Next up in the series will be adding shared storage, a load balancer, ingress, monitoring, etc. Below are some other operations that can be done for the cluster.

Upgrading the cluster is a two step process, depending on what version you want to get to with Kubernetes, and what release of kubespray you are running. Each release of kubespray will have a tag and will correspond to a kubernetes version. You can see the tags with:

git tag | sort -V --reverse
v2.23.1
v2.23.0
v2.22.1
v2.22.0
v2.21.0
...

Alternately, you can just use a specific commit or the latest on the master branch. Once you decide which tag/commit you want, you can do a checkout for that version:

git checkout v2.23.1
git checkout aea150e5d

For whichever tag/commit you use, you can find out the default kubernetes and calico plugin versions (calico is what I chose for networking), by doing grep commands from the repo area (you can look at specific files, but sometimes these are stored in different places):

grep -R "kube_version: "
grep -R "calico_version: "

Please note that, with kubespray, you have to upgrade one release series at a time (v2.21.x to v2.22.x, and so on), and cannot skip releases. So, if you want to go from tag v2.21.0 to v2.23.1, you would need to update to v2.22.0 or v2.22.1, and then to v2.23.1. If you are using a commit, just see what the previous tag was for that commit, update tag by tag until you reach that tag, and then you’ll be all set.
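To work out the stepping stones between two tags, the tag list can be fed through a small filter. This is a sketch under my own assumptions: tags look like vX.Y.Z, “sort -V” is available, and picking the last patch release of each intermediate series is an acceptable choice. The upgrade_path function is hypothetical.

```shell
# Hypothetical helper: print the tags to step through, one per line.
# $1 = current tag, $2 = target tag; the tag list is read from stdin.
upgrade_path() {
  sort -V | awk -v cur="$1" -v tgt="$2" '
    # Series of a tag, e.g. v2.22.1 -> v2.22
    function minor(t,  p) { split(t, p, "."); return p[1] "." p[2] }
    { tags[++n] = $1 }
    END {
      for (i = 1; i <= n; i++) {
        if (tags[i] == cur) { started = 1; continue }
        if (!started) continue
        if (tags[i] == tgt) { print tags[i]; exit }
        # Keep only the newest tag of each series after the current one
        last_of_minor = (i == n || minor(tags[i + 1]) != minor(tags[i]))
        if (last_of_minor && minor(tags[i]) != minor(cur)) print tags[i]
      }
    }'
}

# Usage, from the kubespray repo:
#   git tag | upgrade_path v2.21.0 v2.23.1
```

With the tags listed earlier, going from v2.21.0 to v2.23.1 should come out as v2.22.1 followed by v2.23.1.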

Initially, I ended up using a non-tag version of kubespray because I wanted kubernetes 1.27, and the nearest release tag at the time was v2.22.1, which used kubernetes 1.26.5. I ended up using a commit on master that gave me 1.27.3.

As of this writing, the newest tag is v2.23.1, which is from 9 weeks ago and uses kubernetes 1.27.7. I just grabbed the latest on master, which supports kubernetes 1.28.5 (you can see that in the commit message):

git show HEAD:inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml | grep kube_version
kube_version: v1.28.5

Granted, you may want to stick to tagged releases (it’s safer), or venture into newer versions, with newer kubernetes. However, you still need to update one release series at a time with kubespray.

To update kubespray, from ~/workspace/kubernetes/kubespray/ I did the following:

  • Saved my old inventory: mv ~/workspace/kubernetes/picluster/inventory/mycluster{,.save}
  • Did a “git pull origin master” for the kubespray repo and checked out the version I wanted (either a tag, latest, etc).
  • Copied the sample inventory: cp -r inventory/sample ../picluster/inventory/mycluster
  • Updated files in ../picluster/inventory/mycluster/* from the ones in mycluster.save to get the customizations made. This includes hosts.yaml, group_vars/k8s_cluster/k8s-cluster.yml, group_vars/k8s_cluster/addons.yml, other_servers.yaml, and any other files you customized.
  • I set kube_version in group_vars/k8s_cluster/k8s-cluster.yml to the version desired, as this was a customized item that was older.

In my case, the default calico version would be v3.26.4 (before I had v3.25.2 overridden), and kubernetes v1.28.5 (before I had v1.27.3).
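When carrying the customizations over, it helps to know which files actually differ between the fresh sample copy and the saved inventory. A rough sketch using a recursive diff; the list_customizations function is made up, and files that exist on only one side (like hosts.yaml in a fresh copy) are not listed by it.

```shell
# Hypothetical helper: list files that differ between two inventory trees.
# $1 = fresh copy of the sample inventory, $2 = saved (customized) inventory.
list_customizations() {
  LC_ALL=C diff -rq "$1" "$2" | sed -n 's/^Files \(.*\) and .* differ$/\1/p'
}

# Usage, from the kubespray directory:
#   list_customizations ../picluster/inventory/mycluster ../picluster/inventory/mycluster.save
```

Each file it prints can then be compared with a plain diff to pull over the settings you want to keep.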

Use the following command, to upgrade the cluster, using the new kubespray code and kubernetes version:

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 upgrade-cluster.yml

When I did this, I ended up with Kubernetes 1.28.2, instead of the default 1.28.5 (not sure why). I ran the upgrade again, only this time I specified “kube_version: v1.28.5” in the ../picluster/inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml as an override, but it still was using v1.28.2.

Ref: https://kubespray.io/#/docs/upgrades

I received another Raspberry PI 4 for Christmas and wanted to add it to the cluster. I followed all the steps in Part II to place the Ubuntu on the PI, Part III to repartition the SSD drive, Part IV to add the new host to hosts.yaml and then ran the ansible commands just for the node I was adding to setup the rest of the items needed.

To add a control plane node, update the inventory (adding the node definition, and adding the node name to the control plane list and list of nodes) and run the kubespray cluster.yml playbook:

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

Then, restart the nginx-proxy pod, which is the local proxy for the api server. Since I’m using containerd, run this on each worker node:

crictl ps | grep nginx-proxy | awk '{print $1}' | xargs crictl stop

To add a worker node, update the inventory (adding the node definition, and adding the node name to the node list) and run the kubespray scale.yml playbook:

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=${TARGET_NODE} scale.yml

Use the limit arg to not disturb the other nodes.

Ref: https://kubespray.io/#/docs/nodes

To tear down the cluster, you can use the reset.yml playbook provided:

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 reset.yml

On one attempt, after having updated the kubespray repo to the latest version, the cluster.yml playbook failed because the ansible version I was using was too old:

TASK [Check 2.15.5 <= Ansible version < 2.17.0] *******************************************************************************************************************************************
fatal: [localhost]: FAILED! => {
    "assertion": "ansible_version.string is version(minimal_ansible_version, \">=\")",
    "changed": false,
    "evaluated_to": false,
    "msg": "Ansible must be between 2.15.5 and 2.17.0 exclusive - you have 2.14.13"
}

Doing a “poetry show”, I could see what I had for ansible and one of the dependencies, ansible-core:

ansible          7.6.0   Radically simple IT automation
ansible-core     2.14.13 Radically simple IT automation

To update, I used the command “poetry add ansible@latest”, which would reinstall the latest version and update all the dependencies:

Using version ^9.1.0 for ansible

Updating dependencies
Resolving dependencies... (0.3s)

Package operations: 0 installs, 2 updates, 0 removals

  • Updating ansible-core (2.14.13 -> 2.16.2)
  • Updating ansible (7.6.0 -> 9.1.0)

Writing lock file

If desired, you can do a “poetry search ansible” or “poetry search ansible-core” to see what the latest version is, and you can always specify exactly which version you want to install. That’s the beauty of poetry, in that you can fix specific versions of a package, so that things are repeatable.
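The range from that failed assertion can be checked up front, before kicking off a long playbook run. A minimal sketch, assuming “sort -V” is available and using the 2.15.5 and 2.17.0 bounds from the error message above (they will differ for other kubespray commits). Both function names are made up.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: succeed when 2.15.5 <= $1 < 2.17.0.
ansible_core_ok() {
  version_ge "$1" 2.15.5 && ! version_ge "$1" 2.17.0
}

# Usage, with the ansible-core version reported by "poetry show":
#   ansible_core_ok 2.16.2 && echo "ansible-core is in range"
```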

I had a case where my cluster was at kubernetes v1.27.3 and v3.25.2 Calico. The kubespray repo had a tag of v2.23.1, which called out v1.27.7 kubernetes and v3.25.2 Calico. Things were great.

I tried to update kubespray to the latest on the master branch, which defaults to kubernetes v1.28.5 and Calico v3.26.4. However, I still had v3.25.2 Calico in my customizations (with kubernetes updated to call out v1.28.5). The cluster.yml playbook ran w/o issues, but the calico-node pods were not up and were in a crash loop. The install-cni container for a calico-node pod was showing an error saying:

Unable to create token for CNI kubeconfig error=serviceaccounts "calico-node" is forbidden: User "system:serviceaccount:kube-system:calico-node" cannot create resource "serviceaccounts/token" in API group "" in the namespace "kube-system"

Even though kubernetes v1.28.5 is supported by Calico v3.25.2, there was some incompatibility. I haven’t figured it out, but I saw this before as well, and the solution was to either use the versions called out in the commit being used for kubespray, or at least near that version for kubernetes. By using the default v3.26.4 Calico, it came up fine.

Also note that even though I specified kubernetes v1.28.5, in my customization (which happened to be the same as the default), I ended up with v1.28.2 (not sure why).

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part V: Bringing Up Cluster With Kubespray