January 10

Part VI: Adding Shared Storage

There are many shared storage products available for Kubernetes. I settled on Longhorn, as it provides block storage, is pretty easy to set up, has snapshots, is distributed, and allows backup to secondary storage (I plan on using NFS to back up to a NAS box that I have on my network). As of this writing, the latest release is 1.5.3 (https://longhorn.io/).

With the 1TB SSD drive on each Raspberry PI, and the /dev/sda7 partition mounted as /var/lib/longhorn, the RPIs can be prepared for Longhorn. There is a script that can be used to check whether all the dependencies have been met on the nodes. For 1.5.3, run:

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/scripts/environment_check.sh | bash

If there are any errors, you need to address them before continuing. For example, if it complains that iscsid is missing on a node, you can do:

sudo apt-get reinstall -f open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/deploy/prerequisite/longhorn-iscsi-installation.yaml
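
Make sure the longhorn-iscsi-installation pods end up Running on every node. A quick check (the grep spans all namespaces, so you don't need to know which namespace the manifest deploys into):

kubectl get pods -A -o wide | grep iscsi-installation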

In my case, one pod was not running, and the log for its iscsi-installation container reported that module iscsi_tcp was not present. For that, I did the following:

sudo apt install linux-modules-extra-raspi
sudo reboot
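
After the reboot, you can confirm that the module is now available:

sudo modprobe iscsi_tcp
lsmod | grep iscsi_tcp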

I’ve added that package to the tools setup in Part IV, so that it will already be present.

If multipathd is enabled, which can interfere with Longhorn block storage devices, you can handle that with:

sudo systemctl stop multipathd
sudo systemctl disable multipathd
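
If you need multipathd for other devices, an alternative I've seen in the Longhorn troubleshooting docs is to blacklist the device names Longhorn uses instead of disabling the service; something like this in /etc/multipath.conf (followed by a restart of multipathd) should do it:

blacklist {
    devnode "^sd[a-z0-9]+"
}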

In my run, one node, apoc, was missing the nfs-common package. I did a “sudo apt install nfs-common -y” on that node. Since then, I’ve added that package to the RPI tools setup in Part IV, so that it’ll be there. Re-run the script to make sure that all the nodes are ready for install.

Helm is already installed on my Mac, so we can add the Longhorn chart repository with:

helm repo add longhorn https://charts.longhorn.io
helm repo update

Set up a working area and get the current settings for Longhorn, so that we can customize them:

cd ~/workspace/picluster
poetry shell
mkdir ~/workspace/picluster/longhorn
cd ~/workspace/picluster/longhorn

Before installing, we’ll pull down the settings for v1.5.3:

curl -o values-1.5.3.yaml https://raw.githubusercontent.com/longhorn/longhorn/v1.5.3/chart/values.yaml

In values-1.5.3.yaml, under the service: section, set the ui: type to NodePort. Under the persistence: section, set reclaimPolicy to Retain:

75c75
<     type: ClusterIP
---
>     type: NodePort
89c89
<   reclaimPolicy: Delete
---
>   reclaimPolicy: Retain

This will allow you to access the UI by using any node’s IP, and when Longhorn is brought down, the files in block storage are retained.

We also need to set tolerations for the manager, UI, and driver. There are instructions in the values.yaml file: remove the square brackets and un-comment the toleration settings. If you don’t do this, the longhorn-driver-deployer pod will never get out of the Init state. The diffs for just the tolerations will look like:

--- a/longhorn/values-1.5.3.yaml
+++ b/longhorn/values-1.5.3.yaml
@@ -182,13 +182,13 @@ longhornManager:
     ## Allowed values are `plain` or `json`.
     format: plain
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn Manager DaemonSet, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn Manager DaemonSet, delete the `{}` in the line above
   ## and uncomment this example block
@@ -202,13 +202,13 @@ longhornManager:

 longhornDriver:
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn Driver Deployer Deployment, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn Driver Deployer Deployment, delete the `{}` in the line above
   ## and uncomment this example block
@@ -218,13 +218,13 @@ longhornDriver:
 longhornUI:
   replicas: 2
   priorityClass: ~
-  tolerations: []
+  tolerations:
   ## If you want to set tolerations for Longhorn UI Deployment, delete the `[]` in the line above
   ## and uncomment this example block
-  # - key: "key"
-  #   operator: "Equal"
-  #   value: "value"
-  #   effect: "NoSchedule"
+  - key: "key"
+    operator: "Equal"
+    value: "value"
+    effect: "NoSchedule"
   nodeSelector: {}
   ## If you want to set node selector for Longhorn UI Deployment, delete the `{}` in the line above
   ## and uncomment this example block

Install Longhorn with the updated values and monitor the namespace until you see that everything is up:

helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.5.3  --values values-1.5.3.yaml
kubectl get all -n longhorn-system

Use “kubectl get service -n longhorn-system” to find the port for the frontend service, and then, with a browser, you can access the UI using any node’s IP and that port. For example, on one run that I did, it was http://10.11.12.188:30191.
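
If you just want the port number, a jsonpath query against the frontend service works (longhorn-frontend is the service name in my install; adjust if yours differs):

kubectl get service longhorn-frontend -n longhorn-system -o jsonpath='{.spec.ports[0].nodePort}'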

You can see and manage volumes, view the total amount of disk space and what is scheduled, and see the nodes being used and their state.

As an example, we can create a PVC that uses Longhorn for storage:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 2Gi
EOF

You can verify that the PVC is using Longhorn for storage by doing:

kubectl get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Bound    pvc-89456753-e271-46a0-b8c0-9e53affc4c6b   2Gi        RWX            longhorn       3s

The storage class shows “longhorn”. From the Longhorn console, you can see that there is a detached volume for that PVC.

Next, you can create a pod that uses the PVC. Here is an example, using NGINX:

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/var/foo/"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
EOF

This pod specifies the PVC “myclaim”, and you can see that a PV was created for the claim, with a reclaim policy of Retain and the longhorn storage class:

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS   REASON   AGE
pvc-89456753-e271-46a0-b8c0-9e53affc4c6b   2Gi        RWX            Retain           Bound    default/myclaim   longhorn                4m35s
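
You can also confirm that the volume is mounted inside the pod:

kubectl exec mypod -- df -h /var/foo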

You can set up a backup target for the Longhorn storage. In my case, I have a NAS box that is accessible via NFS.

The first step is to create a share area on the device. You can follow whatever instructions you have for creating an NFS share.

On my NAS (using the GUI console), I created a share at /longhorn, with R/W access for my account (I’m in the “administ” group, BTW) and “no squash users” set. I set the allowed IP range to 10.11.12.0/24, so only nodes from my network can access this share. I made sure that the shared area exists, has 777 perms, and has user/group set to admin. NOTE: It is actually at /share/CACHEDEV1_DATA/longhorn and there is a symlink at /share/longhorn. I created a subdirectory called “backup” in this area (so there can be other sub-directories for other uses, if desired).

I checked that it appears under /etc/exports with the subnet called out and the settings desired:

"/share/CACHEDEV1_DATA/longhorn" 10.11.12.0/24(sec=sys,rw,async,wdelay,insecure,no_subtree_check,no_root_squash,fsid=...)

I checked that the share was set up correctly by mounting it from a node and creating/changing a file:

sudo mount -t nfs <IP OF NAS>:/longhorn /mnt
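
For example, you can create a test file, confirm it appears on the NAS, and then unmount:

sudo touch /mnt/testfile
ls -l /mnt
sudo rm /mnt/testfile
sudo umount /mnt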

Now, Longhorn can be configured to use this NFS share for backups. Use Helm to install the NFS subdir external provisioner. First, add the chart repository:

helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm repo update

You can see the configuration for the provisioner with:

helm show values -n nfs-storage nfs-subdir-external-provisioner/nfs-subdir-external-provisioner

Install the provisioner, set up for your share, using the IP of your NFS server:

helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner --set nfs.server=<IP_OF_NFS_SERVER> --set nfs.path=/longhorn/backup -n nfs-storage --create-namespace
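
Verify that the provisioner pod is running and that its storage class was created:

kubectl get pods -n nfs-storage
kubectl get storageclass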

Under the Longhorn UI (accessible via NodePort), go to Settings, and in the Backup Target, set the path to the NFS share and click the SAVE button at the bottom of the page:

nfs://<IP_OF_NFS_SERVER>:/longhorn/backup/
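
If you prefer the command line, the same value can be patched into the backup-target setting (the setting name here is my assumption; kubectl get settings.longhorn.io -n longhorn-system lists the exact names):

kubectl -n longhorn-system patch settings.longhorn.io backup-target --type=merge -p '{"value": "nfs://<IP_OF_NFS_SERVER>:/longhorn/backup/"}'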

Once you have created a volume and it is attached to a node, you can take a snapshot or do a backup. From the Volume section, click on the name of a volume to bring up its details, and then you can click on “Take Snapshot” or “Create Backup”. You can go back to an older snapshot by detaching the volume and re-attaching it with Maintenance checked. From the snapshot, choose Revert, and then detach and re-attach the volume without maintenance. Once the volume is healthy again, it will reflect the reverted snapshot.

To remove Longhorn, you must set a flag to allow deletion before uninstalling:

kubectl -n longhorn-system patch -p '{"value": "true"}' --type=merge lhs deleting-confirmation-flag
helm uninstall longhorn -n longhorn-system
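
If the uninstall seems to hang, double-check that the flag actually took effect:

kubectl -n longhorn-system get lhs deleting-confirmation-flag -o jsonpath='{.value}'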

If you ever update your kernel on the Raspberry PI, you’ll need to reinstall the extra modules. You can do this with:

sudo apt-get reinstall linux-modules-extra-$(uname -r)

Reboot afterwards. If you don’t reinstall the modules, components like Longhorn will be missing required kernel modules and will fail.

The Longhorn documentation has information on expanding a volume.
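
As a rough sketch (assuming your Longhorn version supports online expansion; older releases require detaching the volume first), you grow a volume by bumping the requested size on the PVC and letting Longhorn and the CSI driver handle the rest:

kubectl patch pvc myclaim -p '{"spec":{"resources":{"requests":{"storage":"4Gi"}}}}'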


