June 2024 – Off The Record

Media Server In Kubernetes

Another one of my apps running on a standalone Raspberry PI 4 (in a docker container), is the Emby media server. I ripped all my CDs to FLAC files and have been serving them up with Emby, so that I can play on my laptop, phone, Sonos speaker, and other DLNA devices. All the music is on a NAS box and I had it mounted on the Raspberry PI.

Now that I have a Kubernetes cluster of PIs, I wanted move the Emby server, and this looked like a good exercise on how to take a Docker container and run it on Kubernetes. There were some challenges, which made this harder than expected. Let’s go through the process though…

Migrating the Docker Container

After searching around, I found the a common way to migrate from Docker to Kubernetes was to use Kompose. I had this Dockerfile for Emby:

version: "2.3"
services:
emby:
image: emby/embyserver_arm64v8:latest
container_name: emby
environment:
- PUID=1000
- PGID=1003
- TZ=America/New_York
volumes:
- /var/lib/docker/volumes/emby/_data:/config
- /mnt/music:/Music
network_mode: host
# ports:
# - 8096:8096
# - 8920:8920
restart: unless-stopped

There are a couple of things of note here. First, I set the UID to the same ID used on the NAS box for the FLAC files, and a GID to the one used on the NAS box so that family members had access to the files as well. Second, I mapped the config location to the host, of which the music area was an NFS mount to the NAS box.

Lastly, I was using host mode networking, which was needed so that the Multicast DLNA packets (M-SEARCH and NOTIFY) would be seen from the container. This allowed Emby to “see” the DLNA devices on my local network. This proved to be a difficult thing to setup under Kubernetes.

I ran the Kompose convert command and it generated a deployment, service, and some PVC definitions. Of course, I ran this on my Mac, where the mount points and config area did not exist, so there were warnings and things were not setup as desired. But, it was useful, as it gave me an idea of how I wanted to define things.

I created a single manifest that incorporated some of what Kompose generated, sprinkled with settings I wanted, and setting up the volumes to use NFS, instead of PVC using Kubernetes storage. Here’s what I came up with (shown in parts) placed into ~/workspace/kubernetes/emby/k8s-emby.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: emby
---

I wanted all the music server stuff in a separate namespace.

apiVersion: v1
kind: Service
metadata:
  name: emby-service
  namespace: emby
spec:
  type: LoadBalancer
  selector:
    app: emby
  ports:
    - name: http
      port: 8096
      targetPort: 8096
      protocol: TCP
    - name: https
      port: 8920
      targetPort: 8920
      protocol: TCP
---

A service is defined, with type LoadBalancer, so that I can access the media service with a well-known IP. I used the defaults that Emby suggested for HTTP and HTTPS access to the server.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: emby
  namespace: emby
spec:
  replicas: 1
  selector:
    matchLabels:
      app: emby
  template:
    metadata:
      labels:
        app: emby
    spec:
      containers:
        - name: emby
          image: emby/embyserver_arm64v8:latest
          env:
            - name: UID
              value: "1000"
            - name: GID
              value: "1003"
            - name: GIDLIST
              value: "1003"
            - name: TZ
              value: "America/New_York"
          ports:
            - containerPort: 8096
              protocol: TCP
            - containerPort: 8920
              protocol: TCP
          volumeMounts:
            - name: config
              mountPath: /config
            - name: music
              mountPath: /Music
      restartPolicy: Always
      volumes:
        - name: config
          nfs:
            server: IP_OF_MY_NAS
            path: /music/config
        - name: music
          nfs:
            server: IP_OF_MY_NAS
            path: /music/music

For the deployment, I used the same namespace and defined to use a single pod with Emby. which will create a single pod with the latest ARM64 version of Emby (happens to be 4.8.8.0. Could have pinned to a specific version and then update, as desired, by looking at hub.docker.com). The environment settings for UID, GID, GIDLIST, and TZ are passed in to the pod. I looked at the Docker version of the latest Emby to see that it had some different settings than my (much older) version. The HTTP and HTTPS ports for Emby are defined and match the service.

Lastly, I defined a volume for config settings and another for the music repo and mapped those to the IP and share locations of my NAS. The NAS already had a share called “music”, with a directory called “music”, containing directories for all of the artists, which in turn had directories for the artist’s albums, and then FLAC files for the album songs. I created a directory in the music share, called “config” to hold the configuration settings

With this setup, we are ready to apply the manifest and configure the Emby server…

Emby Startup

After doing “kubectl apply -f k8s-emby.yaml”, I could see that there was one pod running and a service with an IP from my load balancer pool. From a browser, I navigated to http://<emby-service-ip>:8096/ and could see the Emby setup wizard. I picked the language (“English”), and created my Emby user and password.

On the “Setup Media Libraries” page, I clicked the button for “New Library”, selected the type “Music”, gave it a name “Music”, and then clicked on the folder “Add” button, selected the “/Music” directory that maps to the NFS share, and clicked “OK”. Lastly, under “Music Folder Structure”, I picked the item “Perfectly ordered into artist/album folders, with tracks directly in the album folders”, and pressed the “OK” button.

You can click on the Advanced selector at the top right of the page and then choose some other options, if desired.

On the next screens, I skipped the metadata language info (as it was fine), kept the default port mapping selection, accepted the terms of use, and clicked on the finished button.

At this point, I could click on the “Manual Login” and log in with the credentials I set up. Under the settings (gear at top right of screen), I did some more settings.

Under “Network”, I set the “LAN Networks” to the CIDR for my local network. Under “DNLA”, I checked the “Enable DNLA Server” box and chose my user under the “Default User” entry. Under “Plugins”, I clicked the “Catalog” button, and under General, installed the Sonos plugin.

With these changes, I clicked on the “Dashboard” button, clicked on the power button icon at the top, and selected to restart the Emby server to apply all the changes.

Partial success…

As-is, I can now access the URL (port 8096) from my web browser on my Mac or phone, and select and play music. However, The “Play On” menu (square box at the top right of each page), only has the selection of the web browser I’m using. I cannot see my Sonos speaker, receiver that has DLNA support, or any other DNLA devices.

I found out that the issue is with how DLNA works. From my basic understanding, the DLNA server will multicast M-SEARCH UDP packets to 239.255.255.250, using port 1900. DLNA devices will multicast NOTIFY UDP packets to the same address and port. When I was using a Docker container, the container was using Host networking, and thus was using the same IP as the host, which is on my local network.

With Kubernetes, the Emby pod is running on the pod network (10.233.0.0/18), whereas all the DLNA devices are on the local network, and these multicast packets will not traverse subnets.

I tried one solution, and that was to add to the deployment’s template spec “hostNetwork: true”. Now, the pod is on the local network and DLNA multicasts are seen and the Emby server can Play On devices like my Sonos. The problem here is that the pod has the same IP as the node that it was deployed on. This makes it hard to use, as the pod could be re-deployed on another node. Yeah, I could force it to one node, but if that node failed, I’d loose the Emby server.

Houston We Have Lift-Off!

I found that I can setup two interfaces on the pod, by using Multus. The plan is to create a second interface on the pod that is on the local network so that it can send/receive DLNA multicasts communicating with the DNLA devices. This requires several steps…

First, we need to install Multus. Fortunately, Multus works well with Calico, which I’m using on my network. Unfortunately, I could not use the “quick start” install methods for Multus on my arm64 Raspberry PI hardware. To get this installed, I first pulled the Multus repo:

git clone https://github.com/k8snetworkplumbingwg/multus-cni.git
cd multus-cni/deployments

I used the multus-daemonset.yml to create a daemonset that will install Multus on each node. However, the two image: lines need to be changed, as “ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot” is not for the arm64 platform. I think they have some multi-platform support, maybe with annotations, but I didn’t know how to set that up. So, I went to the Github Container Registry for Multus, clicked on the “OS/Arch” tab and then selected the image for arm64 and noted the version. In multus-daemonset.yml, I changed the image version:

diff --git a/deployments/multus-daemonset.yml b/deployments/multus-daemonset.yml
index 40fa5193..fa8bde5c 100644
--- a/deployments/multus-daemonset.yml
+++ b/deployments/multus-daemonset.yml
@@ -179,7 +179,7 @@ spec:
serviceAccountName: multus
containers:
- name: kube-multus
- image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot
+ image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-debug@sha256:351652b583600b0d0d704269882fd2fa53395c5ce4602a76a2799960b2c06dce
command: ["/thin_entrypoint"]
args:
- "--multus-conf-file=auto"
@@ -204,7 +204,7 @@ spec:
mountPath: /tmp/multus-conf
initContainers:
- name: install-multus-binary
- image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot
+ image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-debug@sha256:351652b583600b0d0d704269882fd2fa53395c5ce4602a76a2799960b2c06dce
command: ["/install_multus"]
args:
- "--type"

Now, I can “kubectl apply -f multus-daemonset.yml” to install the daemonset. Once done, I checked that the pods are running on each node:

kubectl get pods --all-namespaces | grep -i multus
kube-system kube-multus-ds-4btnp 1/1 Running 0 20h
kube-system kube-multus-ds-6p9vx 1/1 Running 0 20h
kube-system kube-multus-ds-mzb4b 1/1 Running 0 20h
kube-system kube-multus-ds-s7d8v 1/1 Running 0 20h
kube-system kube-multus-ds-twn6k 1/1 Running 0 20h
kube-system kube-multus-ds-vqxh8 1/1 Running 0 20h
kube-system kube-multus-ds-wwnbj 1/1 Running 0 20h

On a node, you can check that there is a /etc/cni/net.d/00-multus.conf file as the lexically first file. Now, a network attachment definition can be created (I added it to the k8s-emby.yaml file, after the namespace definition):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
  namespace: emby
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "10.11.12.0/24",
        "rangeStart": "10.11.12.211",
        "rangeEnd": "10.11.12.215",
        "routes": [
          { "dst": "10.11.12.0/24" }
        ],
        "gateway": "10.11.12.1"
      }
  }'

In the metadata, I specified the “emby” namespace, so that this is visible by the emby pod. The config section is a CNI configuration. Of note is that master attribute is the interface name for the pod’s main interface (with IP on pod network). For IPAM, I used the CIDR of my local network as the subnet, and used a range of IPs that is outside of any DHCP pool, LoadBalancer pool, and existing static IPs. I set the route destination as the local network (not a default route, so that it doesn’t interfere with pod traffic).

The final step is to modify the Emby deployment so that when the Emby pod is created, it uses the new network attachment definition and creates two interfaces. This is done as an annotation under the deployment’s template metadata (added lines in red) of the k8s-emby.yaml file:

...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: emby
  namespace: emby
  labels:
    app: emby
spec:
  replicas: 1
  selector:
    matchLabels:
      app: emby
  template:
    metadata:
      labels:
        app: emby
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-conf
    spec:
      ...

Now, when we apply the deployment, the pod will have two interfaces. The main interface, eth0, and the additional net1 interface:

3: eth0@if40: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue state UP qlen 1000
    link/ether be:0f:38:6b:bc:ea brd ff:ff:ff:ff:ff:ff
    inet 10.233.115.96/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc0f:38ff:fe6b:bcea/64 scope link
       valid_lft forever preferred_lft forever
4: net1@tunl0: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 22:76:a6:b2:34:9f brd ff:ff:ff:ff:ff:ff
    inet 10.11.12.213/24 brd 10.11.12.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::2076:a6ff:feb2:349f/64 scope link
       valid_lft forever preferred_lft forever

Now, I can access the UI at the service’s public address, and when I click on the “Play On” button, I see all the DLNA devices that receive streamed music. Yay!

Remote/Secure Access

Everything is working locally, but I wouldn’t mind being able to play music on my phone, when I’m away from home. I started planning this and decided on a few things:

- Use the domain name that I purchased and create subdomains for each app.
- Use the Dynamic DNS service that I purchased to map my domain to my home router.
- Configure the router to map HTTP and HTTPS requests to an Ingress controller, which will route the requests to the apps, based on the subdomain used.
- Use Let’s Encrypt so that all HTTPS requests have a valid certificate (from a Certificate Authority).

Prep Work

I have the domain registration (e.g. my-domain.com) and I have created subdomains for my apps (e.g. music.my-domain.com). I know my Dynamic DNS service domain name, so I created CNAME DNS records to point the domain and all the subdomains to that DDNS name.

With my router and the Dynamic DNS service, I have configured it so that the DDNS domain name is always pointing to my router’s WAN address (which is a DHCP address from my service provider and can change).

Kubernetes Work

With the external stuff out of the way (mostly), I could focus on connecting up the Kubernetes side. I installed Traefik for an ingress controller. I like this better than NGINX, because it works well with apps that are in namespaces:

helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik

This is running in the default namespace, but can be run in a specific namespace, if desired. I made sure the pod and service were running. On my router, I set port forwarding of HTTP and HTTPS requests to the IP of the Traefik service. That will cause all external requests to use the Traefik ingress for routing. Next, install the cert-manager:

helm repo add jetstack https://charts.jetstack.io --force-update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.0 \
  --set crds.enabled=true

Obviously, you can use the latest version of cert-manager that is compatible with the Kubernetes version you are running. You’ll see pods, services, deployments, and replica sets created (and running) for the cert-manager.

I created a work area to hold manifests for the resources that will be created:

mkdir -p ~/kubernetes/traefik
cd ~/kubernetes/traefik

For the Let’s Encrypt certificates, we’ll test everything out with staging certificates, and then once that is all working, we can switch to production certificates. This is done, because there is rate-limiting on production certificates and we don’t want to have multiple failures to hit the limit and block us.

Here is the staging certificate (emby-issuer.yaml):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: emby-issuer
  namespace: emby
spec:
  acme:
    email: your-email-address
  # We use the staging server here for testing to avoid hitting
  server: https://acme-staging-v02.api.letsencrypt.org/directory
  privateKeySecretRef:
    # if not existing, it will register a new account and stores it
    name: emby-issuer-account-key
  solvers:
    - http01:
        # The ingressClass used to create the necessary ingress routes
        ingress:
          class: traefik

Note that this is in the same namespace as the app, the staging Let’s Encrypt server is used, and you provide a contact email address. For the production Issuer, emby-prod-issuer.yaml, we have:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: emby-prod-issuer
  namespace: emby
spec:
  acme:
    email: your-email-address
    # We use the staging server here for testing to avoid hitting
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # if not existing, it will register a new account and stores it
      name: emby-issuer-account-key
    solvers:
      - http01:
          # The ingressClass used to create the necessary ingress routes
          ingress:
            class: traefik

Pretty much the same thing, only using the production Let’s Encrypt server, and a different name for the issuer. Do a “kubectl apply -f” for each of these and then do a “kubectl get issuer -A” to make sure they are ready. You can check “kubectl describe issuer -n emby emby-issuer” and under the Status section see that the staging issuer is registered and ready:

    Reason: ACMEAccountRegistered
    Status: True
    Type: Ready

Now, I create an Ingress for the app that will be used with the staging certificate:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-issuer"
spec:
  tls:
    - hosts:
        - music.my-domain.com
      secretName: tls-emby-ingress-http
  rules:
    - host: music.my-domain.com
      http:
        paths:
          - path: /emby
            pathType: Prefix
            backend:
              service:
                name: emby-service
                port:
                  name: http

Of note is that there is an annotation that refers to the cert-manager staging issuer and the subdomain name that will be used for this Emby app is specified both as the host and in the TLS. If desired you can leave out the annotation and the TLS section and test out accessing the Emby app by using HTTP (e.g. http://music.my-domain.com/emby). That is what I did to make sure the ingress alone was OK.

This ingress will take requests to music.my-domain.com/emby/… and pass them to the service “emby-service” (running in namespace “emby”) using the port defined in the service with the name “http” (e.g. 8096).

By applying emby-ingress.yaml, you will initiate the process of creating a staging certificate. A certificate will be created, but not ready. This will trigger a certificate request and then an order. The order will create a challenge that will verify the challenge URL is reachable and then will obtain the certificate from Let’s Encrypt. Here are the get commands you can use for resources, and then you can do describe commands for the specific resources:

kubectl get certificate -A
kubectl get certificaterequest -A
kubectl get order -A
kubectl get challenge -A

It will take some time for all this to happen, but you can “describe” the challenge and check the other resources to see when they are valid/ready/approved. Don’t worry if you see a 404 status on the challenge initially. It should clear after 30 seconds or so.

When the challenge is completed successfully, the challenge resource will be removed and there will be a new secret with the name of your certificate (e.g. tls-emby-ingress-http) in the namespace of the app. This secret would be used for the certificate that users would see when accessing your domain. Granted, it is from the staging server, so there would be a warning about the validity, but now you can repeat the process with the production certificate and then visitors would see a valid certificate.

Here is the production ingress (emby-prod-ingress.yaml) that can be used with the production issuer that was previously created:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-prod-issuer"
spec:
  tls:
    - hosts:
        - music.my-domain.com
      secretName: emby-prod-cert
  rules:
    - host: music.my-domain.com
      http:
        paths:
          - path: /emby
            pathType: Prefix
            backend:
              service:
                name: emby-service
                port:
                  name: http

I used different names for the issuer and secret, so that there was no conflict with the staging ones. You can delete the staging ingress, issuer, and secret, once this is working. Here is output of a successful production certificate:

kubectl get cert -n emby
NAME           READY SECRET         AGE
emby-prod-cert True  emby-prod-cert 122m

kubectl get certificaterequest -n emby
NAME              APPROVED DENIED READY ISSUER           REQUESTOR                                       AGE
emby-prod-cert-1  True            True  emby-prod-issuer system:serviceaccount:cert-manager:cert-manager 122m

kubectl get order -n emby
NAME                        STATE AGE
emby-prod-cert-1-2302305457 valid 122m

Forcing HTTPS

Right now, it is possible to use both http://music.my-domain.com/emby/ and https://music.my-domain.com/emby/. I would like to redirect all HTTP requests to HTTPS. To do that, I’ll use the Traefik redirect middleware by creating this manifest (redirect2https.yaml):

# Redirect to https
apiVersion: v1
kind: Namespace
metadata:
  name: secureapps
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect2https
  namespace: secureapps
spec:
  redirectScheme:
    scheme: https

As you can see, I have the namespace “secureapps”. If desired, you can use the namespace for a single app (e.g. “emby”), if you only want the redirection to apply there. You can alternatively modify the Traefik Helm chart (do a “helm show values traefik/traefik > values.yaml” and set ports.web.redirectTo: websecure” and then update the chart) to apply to all ingresses. I have not tried that method.

Now, we update the ingress manifest to add this annotation (this is the top of the emby-ingress.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-issuer"
    traefik.ingress.kubernetes.io/router.middlewares: secureapps-redirect2https@kubernetescrd
spec:

The new line is in red and has the namespace-middleware (secureapps-redirect2https). If you wanted this to apply only to Emby, you could change the namespace of the Middleware to “emby” and use “emby-redirect2https”.

I deleted the ingress I was currently using, deleted the secret for the cert that was generated, and then applied the Middleware manifest and the ingress that was updated. A new cert was created and now, HTTP requests are redirected to HTTPS!

Removing The Remote Access Setup

Delete the issuers and ingress that you have (e.g. use kubectl delete -f emby-prod-ingress.yaml). You can then remove Traefik:

helm delete traefik

And the cert-manager:

helm delete -n cert-manager cert-manager
kubectl delete crd virtualservers.k8s.nginx.org

kubectl delete crd virtualserverroutes.k8s.nginx.org

kubectl get crd | grep cert | cut -f1 -d" " | xargs kubectl delete crd

And finally any secrets that were created for certificates:

kubectl delete -n emby secret emby-issuer-account-key emby-prod-cert tls-emby-ingress-http

Things To Explore

Traefik also has an IngressRoute mechanism that seems to be quite flexible and a lot of their documentation (like for Let’s Encrypt setup) uses this instead of Ingress. It may be worthwhile using that as, at first blush, it seems like a newer way of doing things.

Consider removing the /emby prefix from the path for accessing remotely. This would only apply, if Emby was the only web service being provided. If you have multiple web services, then you need to keep the prefix to discern which service to use.

Consider using one certificate for all sub-domains.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 12

Ad-Blocking With PI-Hole

I had Pi-Hole running on a standalone Raspberry PI, but wanted to move this to my Kubernetes cluster. Digging around, I found a useful article on how add PI-Hole to Kubernetes, which not only talked about using PI-Hole, but having redundant instances with info on keeping them in-sync. It used MetalLB, ingress, and CertManager for Let’s Encrypt certifications – something I was interested in.

There was another article, based on using Helm and having some monitoring setup. I may try someday.

UPDATE: I did an update of Pi-Hole from 2024.07.0 to 2025.03.0, and ran into a bunch of issues. See the details below of this upgrade.

A Few Things

First, as expected, this article had an older version of pi-hole (2022.12.1). I tried the latest version (at this time 2024.05.0), but the pods were stuck in crash loops. What I found out, was that for liveness/readiness, the YAML specified to do an HTTP get at the root of the Lighttp web server. When using the 2023.02.1 pihole image it worked, but with 2023.02.2 it failed.

Trying curl http://127.0.0.1/ inside the pod showed a 403 Forbidden error. If I tried to access http://127.0.0.1/admin, I’d get a 301 Moved Permanently with a ‘/admin/’ path. If I did http://127.0.0.1/admin/, I’d get a 302 Found response with path ‘login.php’. When I did http://127.0.0.1/admin/login.php, I’d get a 200 OK result with content.

So, I changed the liveness and health probe configuration to add a path field with ‘/admin/login.php’ and then the pods would come up successfully.

Second, For the PI-Hole admin web pages, I chose to use a network type of LoadBalancer (instead of ClusterIP and then setting up an ingress IP). Accessing locally is fine, as I just use the IP assigned by the load balancer. The article talks about setting up a certificate using Let’s Encrypt to be able to access remotely.

I already have a domain name, and I’m using Dynamic DNS to redirect that domain to my router’s WAN IP. But, I’m currently port forwarding external HTTP/HTTPS traffic to my standalone Raspberry PI for a music server that uses Let’s Encrypt for certificates.

For now, I think I’ll just access my PI-Hole admin page locally. I will, however, have to figure out how to setup Let’s Encrypt, once I move my music server and other web apps to the Kubernetes cluster, so it will be useful to keep this info in mind.

Setting Up PI-Hole

I’m doing the same thing as the article, running three replicas of the PI-Hole pods, and I altered the liveness/readiness check. Here is my manifest.yaml in pieces:

apiVersion: v1
kind: Namespace
metadata:
  name: pihole
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pihole-configmap
  namespace: pihole
data:
  TZ: "America/New_York"
  PIHOLE_DNS_: "208.67.220.220;208.67.222.222"

This sets up a namespace for PI-Hole, defines the timezone I’m using, and the upstream DNS servers that I wanted to use (OpenDNS). You can customize, as desired.

---
apiVersion: v1
kind: Secret
metadata:
name: pihole-password
namespace: pihole
type: Opaque
data:
WEBPASSWORD: <PUT_BASE64_PASSWORD_HERE> # Base64 encoded

This is the password that will be used when logging into the PI-Hole admin page. You should encode this using “echo -n ‘MY PASSWORD’ | base64” and place the encoded string in the WEBPASSWORD attribute.

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pihole
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: pihole
  serviceName: pihole
  replicas: 3
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2024.05.0
          envFrom:
            - configMapRef:
                name: pihole-configmap
            - secretRef:
                name: pihole-password
          ports:
            - name: svc-80-tcp-web
              containerPort: 80
              protocol: TCP
            - name: svc-53-udp-dns
              containerPort: 53
              protocol: UDP
            - name: svc-53-tcp-dns
              containerPort: 53
              protocol: TCP
          livenessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - name: pihole-etc-pihole
              mountPath: /etc/pihole
            - name: pihole-etc-dnsmasq
              mountPath: /etc/dnsmasq.d
  volumeClaimTemplates:
    - metadata:
        name: pihole-etc-pihole
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
    - metadata:
        name: pihole-etc-dnsmasq
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi

This is the stateful set that will create three replicas of the PI-Hole pods. I’m using the latest version at this time (2024.05.0), have modified the liveness/readiness checks as mentioned above, and am using PVs (longhorn) for storing configuration.

---
apiVersion: v1
kind: Service
metadata:
  name: pihole
  namespace: pihole
  labels:
    app: pihole
spec:
  clusterIP: None
  selector:
    app: pihole
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-web-svc
  namespace: pihole
spec:
  selector:
    app: pihole
    statefulset.kubernetes.io/pod-name: pihole-0
  type: LoadBalancer
  ports:
    - name: svc-80-tcp-web
      port: 80
      targetPort: 80
      protocol: TCP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-udp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-udp-dns
      port: 53
      targetPort: 53
      protocol: UDP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-tcp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-tcp-dns
      port: 53
      targetPort: 53
      protocol: TCP

These are the services for the UI and for DNS. Of note, we are using the same laod balancer IP for the TCP and UDP DNS services. I used load balancer for the web UI as well (instead of using ClusterIP and setting up an ingress – maybe that will bite me later).

With this manifest, you can “kubectl apply -f manifest.yaml” and then look for all three of the pods to start up. You should be able to do nslookup/dig commands using the IP of the service as the server to verify that DNS is working, and you can use the IP for the pihole-web-svc service with a path of /admin/ (e.g. http://10.11.12.203/admin/). Use the password you defined in the manifest, to log in and see operation of the Ad Blocker.

Keeping The PI-Hole Pods In Sync

As mentioned in the article, we have three PI-Hole pods (one primary, two secondary), but need to keep the database in sync. To do this, Orbital Sync is used to backup the primary pod’s database, and then restore it to the secondary pods’ databases. Here is the orbital-sync.yaml manifest:

kind: ConfigMap
metadata:
name: orbital-sync-config
namespace: pihole
data:
PRIMARY_HOST_BASE_URL: “http://pihole-0.pihole.pihole.svc.cluster.local”
SECONDARY_HOST_1_BASE_URL: “http://pihole-1.pihole.pihole.svc.cluster.local”
SECONDARY_HOST_2_BASE_URL: “http://pihole-2.pihole.pihole.svc.cluster.local”
INTERVAL_MINUTES: “1”

—

apiVersion: apps/v1
kind: Deployment
metadata:
name: orbital-sync
namespace: pihole
spec:
selector:
matchLabels:
app: orbital-sync
template:
metadata:
labels:
app: orbital-sync
spec:
containers:
– name: orbital-sync
image: mattwebbio/orbital-sync:latest
envFrom:
– configMapRef:
name: orbital-sync-config
env:
– name: “PRIMARY_HOST_PASSWORD”
valueFrom:
secretKeyRef:
name: pihole-password
key: WEBPASSWORD
– name: “SECONDARY_HOST_1_PASSWORD”
valueFrom:
secretKeyRef:
name: pihole-password
key: WEBPASSWORD
– name: “SECONDARY_HOST_2_PASSWORD”
valueFrom:
secretKeyRef:
name: pihole-password
key: WEBPASSWORD

It runs every minute, and uses the secret that was created with the password to access PI-Hole. You can look at the orbital sync pod log to see that it is backing up and restoring the database among the PI-Holes.

Finishing Touches

Under the UI’s local DNS entries section, I manually entered the hostname (with a .home suffix) and IP address for each of my devices on the local network, so that I can access them by :”name.home”.

I did not setup DHCP on PI-Hole, as I used my router’s DHCP configuration.

To use the PI-Hole as the DNS server for all systems in your network, you can specify the IP of the PI-Hole on each host as the only DNS server. If you specify more than one DNS server, based on your OS, it may use the other server(s) at times and bypass the ad-blocking.

For me, I have all my hosts using the router as the primary DNS server. The router is configured to use the PI-Hole as the primary server, and then a public server as the secondary server. Normally, requests would always go to the Pi-Hole, unless for some reason it was down. This was advantageous for two reasons. First, when I had my standalone PI-Hole, if it crashed, there still was DNS resolution. Second, it made it easy to switch from the standalone PI-Hole to the Kubernetes one, by just changing the router configuration.

The only odd thing with this setup, is that when I use my laptop away from the network, my router’s IP is (obviously) not available. I’ve been getting around this, by using the “Location” feature of the MacOS, to setup the “Home” location to use my router’s IP for DNS, and to use a public DNS server for the “Roaming” location.

I guess I could setup so that the ports used for DNS on my domain name (which points to my router using Dynamic DNS), would port forward to the PI-Hole IP, but I didn’t want to expose that to the Internet.

Updating/Upgrading PI-Hole

From the initial version 2024.05.0, I had updated to 2024.07.0. For this update, I did the following…

First, I went to the PI-Hole release page to see what the latest version was. Then, I updated the manifest.yaml to call out the new version (e.g. 2024.07.0) and then deleted the pods, one at a time, so that they would load the newer image version. It worked rather well.

However, recently, I tried to update to the latest 2025.03.0, and found out on the Pi-Hole GitHub page, that Pi-Hole redesigned the app and there were many changes to config variables. Needless to say, I didn’t even look closely at the release notes, and just tried to do the same update method as before. It didn’t work, with pods in crash loops.

Needless to say, I lost all my local DNS settings, and custom blacklist/whitelist entries. It was a learning moment (disaster). I finally got it to work, and will detail the steps here, showing changes I did to the YAML files (all in ~/workspace/kubernetes/pi-hole).

Some steps may not be needed, but this is roughly what I did, taking baby steps to get things working (since everything was broken). I suspect, you may be able to update and then restart a pod at a time, letting it update the database (I think it may), and have not lost anything.

I also think, you probably could shut down everything, and then remove the UUID from the claimref for the PVs, so that when new pods start up, they will reuse the PVs. I ended up with multiple PVs and didn’t realize that I could have possibly reused them. In any case, backup your PVs, before doing this update (kicking myself for not doing that basic step).

I shutdown Orbital Sync, with a “kubectl delete -f orbital-sync.yaml” so there was no syncing going on, and the Pi-Hole with “kubectl delete -f manifest.yaml”.

Here is the new Pi-Hole manifest.yaml (shown in chunks), highlighting changed lines:

apiVersion: v1
kind: Namespace
metadata:
  name: pihole

---

No changes to the above.

apiVersion: v1
kind: ConfigMap
metadata:
name: pihole-configmap
namespace: pihole
data:
TZ: “America/New_York”
FTLCONF_dns_upstreams: “208.67.220.220;208.67.222.222”
FTLCONF_dns_listeningMode: “all”
—

The PIHOLE_DNS_ was renamed to FTLCONF_dns_upstreams. I also added the listening mode to all. The documentation mentions that if running in Docker’s bridge mode to set this to “all”. I’m guessing it is needed for Kubernetes as well (maybe not?).


apiVersion: v1
kind: Secret
metadata:
name: pihole-password
namespace: pihole
type: Opaque
data:
FTLCONF_webserver_api_password: <PUT_BASE64_PASSWORD_HERE> # Base64 encoded

---

The WEBPASSWORD config has been renamed to FTLCONF_webserver_api_password. You can use the same value (base64 encoded) as was done before.


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pihole
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: pihole
  serviceName: pihole
  replicas: 3
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2025.03.0
          securityContext:
            capabilities:
              add: ["SYS_TIME"]
          envFrom:
            - configMapRef:
                name: pihole-configmap
            - secretRef:
                name: pihole-password
          ports:
            - name: svc-80-tcp-web
              containerPort: 80
              protocol: TCP
            - name: svc-53-udp-dns
              containerPort: 53
              protocol: UDP
            - name: svc-53-tcp-dns
              containerPort: 53
              protocol: TCP
          livenessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - name: pihole-etc-pihole
              mountPath: /etc/pihole
            - name: pihole-etc-dnsmasq
              mountPath: /etc/dnsmasq.d
  volumeClaimTemplates:
    - metadata:
        name: pihole-etc-pihole
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
    - metadata:
        name: pihole-etc-dnsmasq
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi

---

Besides updating the version of Pi-Hole, I added the SYS_TIME capability to prevent a warning. There is another, SYS_NICE, but I didn’t touch that, as I figured that the pod is only running Pi-Hole, so we don’t have to reduce the CPU allowed.

Another important thing was the liveness and readiness checks where the old code would check that the web interface was accessible, by accessing the login page at /admin/login.php. This file has been changed to login.lp, so I changed the path to /admin/login and let the web server resolve to the correct file. Without this change, the pod would fail the liveness and readiness checks and fail to come up.

apiVersion: v1
kind: Service
metadata:
  name: pihole
  namespace: pihole
  labels:
    app: pihole
spec:
  clusterIP: None
  selector:
    app: pihole

---

kind: Service
apiVersion: v1
metadata:
  name: pihole-web-svc
  namespace: pihole
spec:
  selector:
    app: pihole
    statefulset.kubernetes.io/pod-name: pihole-0
  type: LoadBalancer
  ports:
    - name: svc-80-tcp-web
      port: 80
      targetPort: 80
      protocol: TCP

---

kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-udp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
type: LoadBalancer
ports:
  - name: svc-53-udp-dns
    port: 53
    targetPort: 53
    protocol: UDP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-tcp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-tcp-dns
      port: 53
      targetPort: 53
      protocol: TCP

There were no other changes to the rest of the manifest.yaml file.

In the orbital-sync.yaml file, the three instances of WEBPASSWORD were changed to FTLCONF_webserver_api_password:

apiVersion: v1
kind: ConfigMap
metadata:
  name: orbital-sync-config
  namespace: pihole
data:
  PRIMARY_HOST_BASE_URL: "http://pihole-0.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_1_BASE_URL: "http://pihole-1.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_2_BASE_URL: "http://pihole-2.pihole.pihole.svc.cluster.local"
  INTERVAL_MINUTES: "1"

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orbital-sync
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: orbital-sync
  template:
    metadata:
      labels:
        app: orbital-sync
  spec:
    containers:
    - name: orbital-sync
      image: mattwebbio/orbital-sync:latest
      envFrom:
        - configMapRef:
            name: orbital-sync-config
      env:
        - name: "PRIMARY_HOST_PASSWORD"
          valueFrom:
            secretKeyRef:
              name: pihole-password
              key: FTLCONF_webserver_api_password
        - name: "SECONDARY_HOST_1_PASSWORD"
          valueFrom:
            secretKeyRef:
              name: pihole-password
              key: FTLCONF_webserver_api_password
        - name: "SECONDARY_HOST_2_PASSWORD"
          valueFrom:
            secretKeyRef:
              name: pihole-password
              key: FTLCONF_webserver_api_password

With these changes, I applied manifest.yaml and checked the log to see that the pod is fully up. From a browser, I went to the public IP and verified that I could log into Pi-Hole. At this point you can add local DNS domains, and whitelist/blacklist entries.

I then modified manifest.yaml to increase the replicas to three, verified that they all are running, and then applied orbital sync and made sure it was running. Everything looked good. In my case, I did not reuse the PVs that had been created, so I deleted all detached PVs. There should be six PVs for the three running PI-Holes and their DNSMASQ volumes. Again, you could try attaching to the existing PVs vs creating new ones, by deleting the UUID in the claimref of the PVs that have been released, but remain, before starting up the Pi-Hole pods.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 3

Kubespray Add-Ons

In Part IV of the PI cluster series, I mention how to setup Kubespray to create a cluster. You can look there for how to setup your inventory, and the basic configuration settings for Kubespray. In that series, I mention about how to add more features, after the cluster is up. Some are pretty simple, and some require some manual steps to get everything set up.

However, you can also have Kubespray install some “add-on” components, as part of the cluster bring-up. In many cases, this makes the process more automated, and “easier”, but it does have some limitations.

First, you will be using the version and configuration that is defined in Kubespray’s Ansible templates and roles. Granted, you can always customize Kubespray, with the caveat of having to keep your changes up to date with upstream.

Second, removing the feature on a running cluster can be more difficult. You’ll have to manually delete all the resources (e.g. daemonsets, deployments, etc.), of which, some may be hard to identify (CRDs, RoleBindings, secrets, etc). Looking in the Kubespray templates may provide some insight into the resources that were created.

You may be able to find manifests for the feature and version from the feature’s repo, and pull them and use “kubectl delete” on the manifests to remove the feature. Just note, that there may be some differences, between what is in the repo manifests for a version, and what are in the manifests that Kubespray used. I haven’t tried it, but if there is a Helm based version of the feature that matches what Kubespray installed, you might be able to “helm install” the already installed feature, and then “helm delete”?

Kube VIP (Virtual IP and Service Load Balancing)

To add Kube-VIP as part of the Kubespray add-on, I did these steps, before creating the cluster.

First, I modified the inventory, so that etcd would run on each of my control-plane nodes (versus a mix of control-plane and worker nodes).

Second, in inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml, I enabled strict ARP, used IPVS (instead of iptables) for kube-proxy, and excluded my local network from kube-proxy (so that kube-proxy would not clear entries that were created by IPVS):

kube_proxy_strict_arp: true
kube_proxy_mode: ipvs
kube_proxy_exclude_cidrs: ["CIDR_FOR_MY_LOCAL_NETWORK",]

Third, I enabled kube-vip in inventory/mycluster/group_vars/k8s_cluster/addons.yml. I turned on ARP (vs BGP), and setup to do VIP for control plane and specified the API to use. I also selected to do load balancing of that VIP. I did not enable load-balancing for services, but that is an option too:

kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: VIP_ON_MY_NETWORK
loadbalancer_apiserver:
 address: "{{ kube_vip_address }}"
 port: 6443
kube_vip_lb_enable: true

# kube_vip_services_enabled: false
# kube_vip_enableServicesElection: true

I had tried this out, but found that the kube-vip container was showing connection refused and permission problems, so leader election was not working for the virtual IP chosen.

I finally found a bug report on the issue when using Kubernetes 1.29 with kube-vip. Essentially, when the first control plane node is starting up, the admin.conf file used for kubectl commands, does not have the permissions needed for kube-vip at that point in the process. The kube-vip team needs to create their own config file for kubectl. In the meantime, the bug report is trying a work-around fix in Kubespray, by switching to the super-admin.conf file, which will have the needed permissions at that point in time. However, the patch they have does not work. I did more hacking to it, and have this change, which works:

diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
- kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
- kube_vip_arp_enabled

+- name: Kube-vip | Check if first control plane
+ set_fact:
+ is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
+
- name: Kube-vip | Write static pod
template:
src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
hostNetwork: true
volumes:
- hostPath:
- path: /etc/kubernetes/admin.conf
+ path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
name: kubeconfig
status: {}

UPDATE: There is a fix that is in progress, which is a streamlined version of my change. Once that is merged, no patch will be needed.

With this change to Kubespray, I did a cluster create:

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -vvv --private-key=~/.ssh/id_ed25519 cluster.yml

Everything was up and running, but kubectl commands were failing on my Mac, because the ~/.kube/config file uses the FQDN https://lb-apiserver.kubernetes.local:6443 for the server, and there is no DNS info on my Mac for this host name (it does work on the nodes, however). The simple fix was to repace the FQDN with the IP address selected for the VIP.

Now, all requests to that IP are redirected to the node that is currently running the API server. If the node is not available, IPVS will redirect to another control plane node.

Update 01/07/2025: The fix has been released and no patching is needed. However, I did find one problem with kube-vip. If a node is rebooted (intentionally or after a power outage or crash), the node will fail to become “Ready”. The issue is that the node will try to register with the cluster, but is using the lb-apiserver.kubernetes.local name, and because the Kubernetes DNS server is not available on the node yet, it cannot resolve that name to the VIP that is configured.

The solution is to create a mapping of VIP to lb-apiserver.kubernetes.local in /etc/hosts. However, this file is automatically created (and recreated on reboot), so the mapping needs to be added to the template file /etc/cloud/templates/hosts.debian.tmpl. This should be done on each node, and ideally, one probably wants to create a playbook to automatically do this on every node in the inventory. I haven’t done it yet, so it is an exercise for the reader. 🙂

MetalLB Load Balancer

Instead of setting this up after the cluster was created, you can opt to let Kubespray do this as well. In the inventory/mycluster/group_vars/k8s_cluster/addons.yml, I did these changes:

metallb_enabled: true
metallb_speaker_enabled: "{{ metallb_enabled }}"
metallb_namespace: "metallb-system"


metallb_protocol: "layer2"
metallb_config:

 address_pools:
 primary:
 ip_range:
 - FIRST_IP_IN_RANGE-LAST_IP_IN_RANGE
 auto_assign: true
 layer2:
 - primary

Besides enabling the feature, I made sure that it was using layer two vs layer three, and under the config, setup an address pool with the range of IPs on my local network that I wanted to use for load balanced IPs. You can specify as a CIDR, if desired.

Now, when the cluster is created with Kubespray, MetalLB will be set up and you can change pods/services to use the networking type “LoadBalancer” and an IP from the pool will be assigned.

As mentioned in the disclaimer above, with the version of Kubespray I have, it installs MetalLB 0.13.9. I could have overridden the ‘metallb_version’ to a newer version, like ‘v0.14.5’, but the templates for MetalLB in Kubespray are using the older v0.11.0 kubebuilder image in several places. To get the same versioning as used when installing MetalLB via Helm, I would have to modify the templates to specify v0.14.0. I did see other configuration differences with the CRDs used in the Helm version, like setting the tls_min_version argument and not setting some priority nor priorityClassName configurations.

NGINX Ingress

This one is pretty easy to enable, by changing this setting in inventory/mycluster/group_vars/k8s_cluster/addons.yml:

ingress_nginx_enabled: true

When the cluster comes up, there will be an ingress daemonset, which created ingress controller pods on each node, and a NGINX ingress service with an IP from the MetalLB address pool.

There are example YAML files in the MetalLB/NGINX Ingress post, that will allow you to create pods and services, and an ingress resource that allows access via path prefixes.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 1

High Availability?

OK, so I have a cluster with three control plane nodes and four worker nodes (currently). However, if I shutdown the control plane node that is hosting the API server, I lose API access. 🙁

I’ve been digging around and it looks like kube-vip would be a good solution, as it allows me to create a virtual IP for the API server, and then does load balancing and leader election between the control plane nodes so that the failure of the node providing the API can switch to another control plane node. In addition, kube-vip can do load balancing between services (I’m not sure if that makes metalLB redundant).

Before installing kube-vip, I needed to change the cluster configuration. I changed the inventory, so that etcd is running ONLY on the control-plane nodes (and not a mix of control plane and worker nodes).

Next, I made these changes to inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml:

kube_proxy_mode: ipvs
kube_proxy_strict_arp: true
kube_proxy_exclude_cidrs: ["CIDR_OF_LOCAL_NETWORK",]

This had kube-proxy also using IPVS (versus iptables), and running in strict ARP mode (needed for kube-vip). Lastly, to prevent kube-proxy from clearing IPVS settings made by kube-vip, the local network IPs must be excluded. With those changes, I re-created a cluster, and was ready to install kube-vip…

There was Medium article by Chris Kirby to use a Helm install of kube-vip for HA. It used an older version of kube-vip (0.6.4) and used value.yaml settings for K3s. I added the Helm repo for kube-vip, and pulled the values.yaml file to be able to customize it:

mkdir ~/workspace/kubernetes/kube-vip
cd ~/workspace/kubernetes/kube-vip
helm repo add kube-vip https://kube-vip.github.io/helm-charts
helm repo update

wget https://raw.githubusercontent.com/kube-vip/helm-charts/main/charts/kube-vip/values.yaml

Here are the changes I made to the values.yaml, saving it as values-revised.yaml:

6c6
< pullPolicy: IfNotPresent
---
> pullPolicy: Always
8c8
< # tag: "v0.7.0"
---
> tag: "v0.8.0"
11c11
< address: ""
---
> address: "VIP_ON_LOCAL_NETWORK"
20c20
< cp_enable: "false"
---
> cp_enable: "true"
22,23c22,24
< svc_election: "false"
< vip_leaderelection: "false"
---
> svc_election: "true"
> vip_leaderelection: "true"
> vip_leaseduration: "5"
61c62
< name: ""
---
> name: "kube-vip"
86c87,88
< nodeSelector: {}
---
> nodeSelector:
> node-role.kubernetes.io/control-plane: ""
91a94,97
> - effect: NoExecute
> key: node-role.kubernetes.io/control-plane
> operator: Exists
>
93,101c99,104
< # nodeAffinity:
< # requiredDuringSchedulingIgnoredDuringExecution:
< # nodeSelectorTerms:
< # - matchExpressions:
< # - key: node-role.kubernetes.io/master
< # operator: Exists
< # - matchExpressions:
< # - key: node-role.kubernetes.io/control-plane
< # operator: Exists
---
> nodeAffinity:
> requiredDuringSchedulingIgnoredDuringExecution:
> nodeSelectorTerms:
> - matchExpressions:
> - key: node-role.kubernetes.io/control-plane
> operator: Exists

Besides using a newer kube-vip version, this enabled load balancing for control plane nodes and services, selects nodes that have the control-plane attribute (but not a value, like the article), and sets the node affinity.

With this custom values file, I could do the install:

helm install my-kube-vip kube-vip/kube-vip -n kube-system -f values-revised.yaml

With this, all the kube-vip pods were up, and the daemonset showed three desired, current, and ready. However, when I changed the server IP to my VIP in ~/.kube/config and tried kubectl commands, they failed saying that there was a x509 certificate for each of the control plane nodes, and a cluster IP, but not for the VIP I’m using.

This can be fixed by re-generating the certificates on every control plane node:

sudo su
cd
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' --insecure-skip-tls-verify > kubeadm.yaml

mv /etc/kubernetes/pki/apiserver.{crt,key} ~
kubeadm init phase certs apiserver --config kubeadm.yaml

In the output, I saw the IPs of the control plane nodes AND the VIP I defined. Next, the kube-apiserver container needs to be stopped and removed, so that a new one is started.

crictl ps | grep kube-apiserver
crictl stop <ID-of-apiserver>
crictl rm <ID-of-apiserver>

Now, kubectl commands using the VIP will be redirected to the control plane node running the API server, and if that node is unavailable, the requests will be redirected to another control plane node. You can see that by doing arping of the VIP and, when the leadership changes, the MAC displayed will change.

Kind of involved, but this works!

One Slight Complication…

I did have some problems, when playing with HA for the API. I had rebooted the control plane node that was actively providing the API. Kube-vip did its job, and IPVS redirected API requests to another control plane node that was “elected” as the new leader. All good so far.

However, when that control plane node came back up, it would appear in the “kubectl get node” output, but showed as “NotReady”, and it never seemed to become ready. It appeared that the network was not ready, and the calico-node pod was showing an error. I played around a bit, but couldn’t seem to clear the error.

One thing I did was a Kubespray upgrade-cluster.yml with the –limit argument, specifying the node and one of the other control plane nodes (so that control plane “facts” were specified). The kube-vip pod for the node was still failing with a connection refused error. On the node, I stopped/removed the kube-apiserver container and then kube-vip container, and then kube-vip no longer had any errors.

The only thing was that ipvsadm on the node, did not show a load balancing entry for the VIP, and the other two control plane nodes only had their IPs in the load balancing entry for the VIP. I didn’t try rebooting another control-plane node.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off