Kubernetes The Harder Way Explorations
I’m trying to bring up a test Kubernetes cluster (just for learning) by using the Kubernetes The Hard Way steps, with one big twist… I want to do this on my M2 MacBook (arm64 based).
From the GitHub page, this is what they say about the requirements, followed by the steps for doing this:
This tutorial requires four (4) ARM64 based virtual or physical machines connected to the same network. While ARM64 based machines are used for the tutorial, the lessons learned can be applied to other platforms.
- Prerequisites (1)
- Setting up the Jumpbox (2)
- Provisioning Compute Resources (3)
- Provisioning the CA and Generating TLS Certificates (4)
- Generating Kubernetes Configuration Files for Authentication (5)
- Generating the Data Encryption Config and Key (6)
- Bootstrapping the etcd Cluster (7)
- Bootstrapping the Kubernetes Control Plane (8)
- Bootstrapping the Kubernetes Worker Nodes (9)
- Configuring kubectl for Remote Access (10)
- Provisioning Pod Network Routes (11)
- Smoke Test (12)
- Cleaning Up (13)
Try #1: Docker containers (failed)
Initially, I thought I would just use Docker on the Mac to create the four nodes used for this. It started out pretty well: provisioning, creating certs, creating and copying config files, and starting etcd.
I even optimized the process a bit with:
- Dockerfile to build nodes with the needed packages.
- Script to generate SSH keys for all nodes.
- Script to run container for each node with IP addresses, and defining /etc/hosts for all nodes with FQDNs and IPs.
- Script to distribute SSH keys and known hosts info.
- Several scripts that just contain the commands needed for each step, and any ssh commands to move from node to node.
My first problem occurred when trying to bring up the control plane (step 8) and starting up services. The issue is that my nodes (I tried debian:bookworm and ubuntu:noble bases) did NOT have systemd running. My guess was that it was because the Docker containers are using the same kernel as my host, and that does not use systemd.
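A quick way to see what is running as PID 1 inside one of these containers (using the server node here) is to read /proc/1/comm:
docker exec server cat /proc/1/comm
On a host booted with systemd this prints “systemd”; in these containers it just shows the command the container was started with.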
Initially, I tackled this by using a SysV init script template and filling it out with info from the needed systemd .service files. I placed the arguments in an /etc/default/SERVICE_NAME file, sourced that in the script, and used the variable to add them to the command. Services were coming up for the control plane node, and it was looking good.
When I got to the next step (9), bringing up the first worker node, the systemd unit file for the first service had two directives for starting it:
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
This was the second problem. I found an old GitHub repo to convert systemd unit files to SysV init scripts. Unfortunately, it was very old and written for Python 2. I ran py2to3 on it so it would work with the Python 3.13.1 I’m using, and made a few changes (some import changes, print statement syntax, and some mixed tab/space indentation issues). The script ran and created a file that “looked” OK, so I ran it on the systemd unit files that I had for the workers.
However, there were two concerns with the results. One was that, for a systemd unit with arguments, lines were converted from this:
ExecStart=/usr/local/bin/kubelet \
--config=/var/lib/kubelet/kubelet-config.yaml \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--register-node=true \
--v=2
to this:
start_daemon -p $PIDFILE /usr/local/bin/kubelet \
start_daemon -p $PIDFILE -config=/var/lib/kubelet/kubelet-config.yaml \
start_daemon -p $PIDFILE -kubeconfig=/var/lib/kubelet/kubeconfig \
start_daemon -p $PIDFILE -register-node=true \
start_daemon -p $PIDFILE -v=2
Now, I don’t know much about SysV init scripts, but I’m wondering if this conversion is correct. With the ones I did manually, I had the service name and then ${OPTIONS} with all the args. I figured I’d just try it and see if it works correctly.
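For comparison, here is a minimal sketch of the start block the way I had been writing them by hand, with a single start_daemon call and all the arguments in ${OPTIONS} sourced from /etc/default (file names and paths here are illustrative, not output from the converter):
#!/bin/sh
# /etc/init.d/kubelet (start block only; illustrative sketch)
. /lib/lsb/init-functions
# OPTIONS holds all the kubelet arguments in one variable, e.g.
# OPTIONS="--config=/var/lib/kubelet/kubelet-config.yaml --kubeconfig=/var/lib/kubelet/kubeconfig --register-node=true --v=2"
[ -r /etc/default/kubelet ] && . /etc/default/kubelet
PIDFILE=/run/kubelet.pid
case "$1" in
  start)
    # one start_daemon call with the full argument list, not one call per argument line
    start_daemon -p $PIDFILE /usr/local/bin/kubelet $OPTIONS
    ;;
esac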
The other concern was that the first service I need to apply has both an ExecStartPre and an ExecStart line:
ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd
It was converted to this:
start_daemon /sbin/modprobe overlay
...
start_daemon -p $PIDFILE /bin/containerd
I was eager to see if that would work correctly, so I gave it a try. This is when I hit the third and fatal problem: there was no modprobe command. I installed kmod so that the command worked, but found that there was no overlay module; lsmod did not list it.
My thought here was that because these Docker containers share a kernel with the host environment (Docker Desktop’s VM on macOS) rather than booting their own, this module was not available. I may be wrong, but I think this method is sunk.
Try #2: Virtualization (failed)
It looks like there are a lot of choices here: Parallels (commercial), VirtualBox, QEMU, etc. I found a link about ways to run virtualization on an arm64-based Mac. I have VirtualBox on my Mac already, but I was intrigued by UTM, which is a virtualization/emulation app for iOS and macOS, and is based on QEMU.
I created a GitHub repo that has supporting scripts to make setting up the environment easier. For this attempt, I used the “initial-try” tag in the repo.
Prerequisites
This assumes you are on an arm64-based Mac (M1 or newer), have a DockerHub account for building/pushing images, and have downloaded the Ubuntu 24.04 server ISO (or whatever you want to use, though there may be some modifications needed to the steps).
Prep Work
I installed UTM following their instructions, and then created a new virtual machine (virtualization, not emulation) using the Ubuntu 24.04 ISO image I had lying around. I kept the default 4 GB RAM and 64 GB disk, and added a directory on my host to use as a shared area, in case I wanted to transfer files to/from the host. You can skip this, if desired.
Before starting the VM, I opened the settings, went to the network settings (which are set to shared network), and clicked the “Show Advanced Settings” checkbox.
I told it to use the network 10.0.0.0/24, with DHCP handing out IPs from 10.0.0.1 to 10.0.0.100. You could leave it as-is, if desired. I just wanted an easy-to-type IP.
I ran the Ubuntu install process (selecting to use the Ubuntu updates that are available). I named the host “utm” and set up the disk with LVM (the default). As part of the process, I selected to install OpenSSH and import my GitHub public key, so I can SSH in without a password, and to install Docker.
Upon completion, I stopped the VM, went to the UTM settings for the VM, and cleared the ISO image from the CD drive so that it would not boot to the installer again. I restarted the VM and verified that I could log in and SSH in to the IP 10.0.0.3. I ran “lsmod” and could see the “overlay” module, so that was a good sign.
To complete the setup of the shared area, I ran the following commands:
sudo mkdir /mnt/utm
As root, edit /etc/fstab and add:
# Share area
share /mnt/utm 9p trans=virtio,version=9p2000.L,rw,_netdev,nofail,auto 0 0
You can update the mount with:
sudo systemctl daemon-reload
sudo mount -a
You can now see the shared area files under /mnt/utm. Note the owner and group of the files, which will match those on the Mac, which is likely not what you want. To make them match this VM (but still be the same on the host), we’ll make another mount; first make sure you have bindfs installed:
mkdir ~/share
sudo apt-get install bindfs -y
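To get the numeric UID and GID values you will need below, you can list the 9p mount with numeric IDs (or run “id -u” and “id -g” on the Mac):
ls -ln /mnt/utm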
Add the following to /etc/fstab, substituting the macOS owner (UID) and group (GID) values that you noted above, and your username for the account you are on:
# bindfs mount to remap UID/GID
/mnt/utm /home/USERNAME/share fuse.bindfs map=UID/1000:@GID/@1000,x-systemd.requires=/mnt/utm,_netdev,nofail,auto 0 0
For me, my UID was already 1000 (I had changed it on my Mac so that it matches the Linux systems I have; normally it is something like 501 or 502), and my GID was 20. Update the mount again:
sudo systemctl daemon-reload
sudo mount -a
You should now see the files in ~/share with the user and group matching your VM. One of the files in this shared area is a file called “pcm” (the username I’m using), containing:
pcm ALL=(ALL) NOPASSWD: ALL
I did “sudo cp share/pcm /etc/sudoers.d/” so that from now on, I don’t need a password for sudo commands. If you want, you can just create a file named after your username (with your username as the first word of the line, as above) and place it in /etc/sudoers.d/.
Next, I want to set up Docker so that I can run it without sudo. This assumes you already have an account on DockerHub and have set up a passkey for logging in via the command line. I used these commands to set up the docker group:
sudo groupadd docker
sudo usermod -aG docker $USER
sudo gpasswd -a $USER docker
newgrp docker
I had to reboot (they say to log out and back in, but that did not work for me).
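To confirm that Docker works without sudo after the reboot, a quick check is to run the small hello-world image from DockerHub:
docker run --rm hello-world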
Lastly, I installed tools I wanted for development (emacs, ripgrep).
Trying The Hard Way (Again)…
I’m now ready to give this a go. To review, here are the steps they have…
- Prerequisites (1)
- Setting up the Jumpbox (2)
- Provisioning Compute Resources (3)
- Provisioning the CA and Generating TLS Certificates (4)
- Generating Kubernetes Configuration Files for Authentication (5)
- Generating the Data Encryption Config and Key (6)
- Bootstrapping the etcd Cluster (7)
- Bootstrapping the Kubernetes Control Plane (8)
- Bootstrapping the Kubernetes Worker Nodes (9)
- Configuring kubectl for Remote Access (10)
- Provisioning Pod Network Routes (11)
- Smoke Test (12)
- Cleaning Up (13)
On the VM, pull my repo:
git clone https://github.com/pmichali/k8s-the-hard-way-on-mac.git
cd k8s-the-hard-way-on-mac
Before starting, set your Docker user ID, so it can be referenced in the scripts:
export DOCKER_ID=YOUR_DOCKER_USERNAME
First off, I wanted to create all the SSH keys for each node, build a Docker image to use for the nodes, and create a 10.10.10.0/24 network:
./prepare.bash
./build.bash
All four of the nodes are created as docker containers with:
./run.bash jumpbox
./run.bash server
./run.bash node-0
./run.bash node-1
Now we have the four machines running as Docker containers for step 1 (Prerequisites) of the process. The architecture is aarch64 when checking with “uname -mov”. Before continuing, we’ll set up the known hosts and SSH keys on all the nodes so that we can SSH without passwords:
while read IP FQDN HOST SUBNET; do
  echo "Node ${HOST}"
  docker exec ${HOST} /bin/bash -c ./set-known-hosts.bash
done < machines.txt
We’ll also copy over scripts to various nodes, so that we can run them. These scripts are the commands mentioned in the various steps of “Kubernetes The Hard Way”:
docker cp CA-certs.bash jumpbox:/root/
docker cp distribute-certs.bash jumpbox:/root/
docker cp kubeconfig-create.bash jumpbox:/root/
docker cp distribute-kubeconfigs.bash jumpbox:/root/
docker cp encryption.bash jumpbox:/root/
docker cp etcd-files.bash jumpbox:/root/
docker cp etcd-config.bash server:/root/
For step 2 (Setting up the Jumpbox), we’ll access the jumpbox container, clone the kelseyhightower/kubernetes-the-hard-way repo, and install kubectl using the following command:
docker exec jumpbox /bin/bash -c ./jumpbox-install.bash
Because we already set up known_hosts and authorized_keys on each node and configured /etc/hosts, there is nothing to do for step 3 (Provisioning Compute Resources) of the process. You can verify that SSH works by accessing the jumpbox with the following command and then trying to SSH to the other nodes by name (e.g. ssh node-1):
docker exec -it jumpbox /bin/bash
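Once inside the jumpbox, a quick loop like this (host names taken from machines.txt) confirms passwordless SSH to each node:
for host in server node-0 node-1; do ssh ${host} hostname; done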
For step 4 (Provisioning the CA and Generating TLS Certificates), run these scripts from the jumpbox:
./CA-certs.bash
./distribute-certs.bash
For step 5 (Generating Kubernetes Configuration Files for Authentication), run these scripts from jumpbox:
./kubeconfig-create.bash
./distribute-kubeconfigs.bash
For step 6 (Generating the Data Encryption Config and Key), run this script on jumpbox:
./encryption.bash
For step 7 (Bootstrapping the etcd Cluster), run this script on jumpbox:
./etcd-files.bash
Then, from the jumpbox, ssh into the server and run the script to start up the etcd service:
ssh server
./etcd-config.bash
This FAILED because, for some reason, systemd is not running in any of the Docker containers, even though it is running on the host VM.
I did a little bit of research, and I see that Docker does not normally run systemd, as the expectation is that a container will be running one service (not multi-service, like on the host). I see some “potential” solutions…
One is to run the container with a “systemd” replacement as the running process (command). This would handle the systemctl start/stop operations, and it reads and processes the corresponding systemd unit files for the services being started. It’s detailed here, and seems like maybe the most straightforward option. I haven’t tried this, but I think it would have to be done from the UTM virtual machine, so that we have the overlay module that is also needed.
A second is to run systemd in the container. It looks like that requires a bunch of things: installing systemd, using /sbin/init as the command to run, volume mounting several paths from the host (so I think it would still require running from a VM), and running in privileged mode. Several posts describe different methods. I haven’t tried this either.
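I have not tried it, but one commonly described variant looks roughly like this (the image name is a placeholder, and the exact flags vary between posts):
docker run -d --name server --privileged \
  --cgroupns=host \
  --tmpfs /run --tmpfs /run/lock \
  -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
  NODE_IMAGE /sbin/init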
A third way would be a more heavyweight solution of running a VM for each of the nodes, so that systemd is running and the overlay module is present. Fortunately, I think I have found another way that may work…
Try #3: Podman
When looking for solutions for how to run systemd in a Docker container, I saw mention that Podman supports running systemd inside containers out of the box (its systemd integration activates when the container’s command is an init like /sbin/init), so I wanted to give it a try. Many of the commands are the same as Docker, so it would be easy to set up.
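As a rough sketch of what that looks like (the image name is a placeholder), the integration can also be forced with the --systemd flag:
podman run -d --name server --systemd=always NODE_IMAGE /sbin/init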
Before doing this in the UTM VM that I had, I decided to just try it from my Mac host. I installed the CLI version of Podman on my Mac from https://podman.io/. Next, I updated the files in my GitHub repo to refer to podman, and to alter the container image specification that I was using.
With these changes, I was ready to give another try…
Kubernetes The Hard Way (yet again)
In the GitHub repo, there is a machines.txt file with a list of all the nodes with IP, FQDN, node name, and pod subnet (if applicable). This is read by the scripts to configure nodes as needed through the process. Let’s get started with the steps for the Kubernetes The Hard Way tutorial (listed above). They will be very similar to try #2.
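For reference, the machines.txt mentioned above follows the tutorial’s layout of IP, FQDN, host name, and pod subnet; the values below are just an illustration of the shape, not necessarily what is in the repo:
10.10.10.10 jumpbox.kubernetes.local jumpbox
10.10.10.11 server.kubernetes.local server
10.10.10.12 node-0.kubernetes.local node-0 10.200.0.0/24
10.10.10.13 node-1.kubernetes.local node-1 10.200.1.0/24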
For step 1 (Prerequisites), store your DockerHub user id in an environment variable for use by some of the scripts:
export DOCKER_ID=YOUR_DOCKER_ID
Create and startup the podman machine with the following script:
./init.bash
Note: If you happen to be running Docker Desktop, the podman machine startup will indicate that you can set the DOCKER_HOST environment variable so that Docker API clients talk to the Podman machine. In that case, just copy and paste the export command shown.
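The printed command looks something like this (illustrative only; the exact socket path depends on your Podman version and machine name, so use the one from your own output):
export DOCKER_HOST='unix:///Users/YOUR_USER/.local/share/containers/podman/machine/podman.sock'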
Create the ssh keys for all the nodes, build the image to be used by nodes (along with all the SSH keys generated), and create the network:
./prepare.bash
./build.bash
There will now be a container image named localhost/${DOCKER_ID}/node:v0.1.0 in the local registry. The four containers can now be created, using the container image:
./run.bash jumpbox
./run.bash server
./run.bash node-0
./run.bash node-1
When started, the container will get an IP, name, and FQDN from the machines.txt file, and will rename the public and private keys for the specific node to id_rsa.pub and id_rsa.
There are podman commands to see what has been created so far:
podman network ls
podman network inspect k8snet
podman ps -a
At this point, we can copy the scripts I created to the various nodes:
podman cp CA-certs.bash jumpbox:/root/
podman cp distribute-certs.bash jumpbox:/root/
podman cp kubeconfig-create.bash jumpbox:/root/
podman cp distribute-kubeconfigs.bash jumpbox:/root/
podman cp encryption.bash jumpbox:/root/
podman cp etcd-files.bash jumpbox:/root/
podman cp etcd-config.bash server:/root/
These scripts are just the commands listed in the tutorial, so that you can run the script, instead of copy and pasting all the commands in the steps.
For step 2 (Setting up the Jumpbox), we’ll access the jumpbox container, clone the kelseyhightower/kubernetes-the-hard-way repo, and install kubectl using the following command:
podman exec jumpbox /bin/bash -c ./jumpbox-install.bash
Because we already set up known_hosts and authorized_keys on each node and configured /etc/hosts, there is nothing to do for step 3 (Provisioning Compute Resources) of the process. You can verify that SSH works by accessing the jumpbox with the following command and then trying to SSH to the other nodes by name (e.g. ssh node-1):
podman exec -it jumpbox /bin/bash
For step 4 (Provisioning the CA and Generating TLS Certificates), run these scripts from the jumpbox:
./CA-certs.bash
./distribute-certs.bash
For step 5 (Generating Kubernetes Configuration Files for Authentication), run these scripts from jumpbox:
./kubeconfig-create.bash
./distribute-kubeconfigs.bash
For step 6 (Generating the Data Encryption Config and Key), run this script on jumpbox:
./encryption.bash
For step 7 (Bootstrapping the etcd Cluster), run this script on jumpbox:
./etcd-files.bash
Then, from the jumpbox, ssh into the server and run the script to start up the etcd service:
ssh server
./etcd-config.bash
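If etcd starts cleanly this time, the tutorial’s verification for this step can be run while still on the server:
etcdctl member list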