December 30

Part IV: Preparing to Create Kubernetes Cluster

There are a bunch of ways to create a cluster (microk8s, k3s, KOps, kubeadm, kubespray,…) and after looking at a bunch of them, I decided to use kubespray (ref: https://kubespray.io/#/). I’m going to use my MacBook to drive all this, so I setup an environment on that machine with all the tools I needed (and more).

I created a directory ~/workspace/picluster to hold everything, and created a git repo so that I have a version controlled area to record all the changes and customizations I’m going to make. For the Mac, I used brew to install python3.11 (ref: https://www.python.org/downloads/) and poetry (ref: https://python-poetry.org/docs/) to create a virtual environment for the tools I use and to fix versions. Currently my poetry environment has:

[tool.poetry.dependencies]
python = "^3.11"
ansible = "7.6.0"
argcomplete = "^3.1.1"
"ruamel.yaml" = "^0.17.32"
pbr = "^5.11.1"
netaddr = "^0.8.0"
jmespath = "^1.0.1"

python is obvious and I’m fixing it to 3.11. ansible is the tool used to provision the nodes. argcomplete is optional, if you want to have command completion. ruamel-yaml is a YAM parser, pbr is used for python builds, netaddr is a network address manipulation lib for python, and jmespath is for JSON matching expression.

You can use any virtual environment tool you want to ensure that you have the desired dependencies. Under “poetry shell“, which creates a virtual environment I continues with the prep work for using kubespray. I installed helm, jq, and kubectl:

brew install helm
brew install jq
brew install wget
brew install kubectl

Note that, for kubectl, I really wanted v1.28, and specified by version (kubectl@1.28), however, when trying months later, that version appears to not be available, and now it will install 1.29).

I cloned the kubespray repo and then checked out the version I wanted.

git clone https://github.com/kubernetes-sigs/kubespray.git
git tag | sort -V --reverse
git checkout v2.23.1  # for example

The releases have tags, but you can chose to use any commit desired (or latest). Sometimes, there are newer versions used with commits after the release. For a specific commit, you can see what default Kubernetes version and Calico version are configured for the commit with:

grep -R "kube_version: "
grep -R "calico_version: "

You can override the values in inventory/sample/group_vars/k8s_cluster/k8s-cluster.yml, but make sure the kubernetes and calico versions are compatible. You can check at this Tigera link to see the kubernetes versions supported for a Calico release. I have run into some issues when trying a Calico/Kubernetes pair that was not called out in the kubespray configuration (see side bar in Part V).

With the kubespray repo, I copied the sample inventory to the current area (so that the inventory for my cluster is separate from the kubespray repo):

mkdir inventory
cp -r kubespray/inventory/sample/ inventory/mycluster/

They have a script that can be used to create the inventory file for the cluster. You can obtain it with::

wget https://raw.githubusercontent.com/kubernetes-sigs/kubespray/master/contrib/inventory_builder/inventory.py

Next, you can create a basic inventory by using the following command, using IPs that you have defined for each of your nodes. If you have four, like me, you could use something like this:

CONFIG_FILE=inventory/mycluster/hosts.yaml python3 inventory.py 10.11.12.198 10.11.12.196 10.11.12.194 10.11.12.192

This creates an inventory file, which is very generic. To customize it for my use, I changed each of the “node#” host names to the names I used for my cluster:

sed -i.bak 's/node1/mycoolname/g' inventory/mycluster/hosts.yaml

I kept the grouping of which nodes were in the control plane, however, later on, I want to have three control plane nodes set up. The last thing I did, was to add the following clause to the end of the file so that proxy_env was defined, but empty (note that it is indented two and four spaces):

  vars:
    proxy_env: []

Here is a sample inventory:

all:
  hosts:
    cypher:
      ansible_host: 10.11.12.198
      ip: 10.11.12.198
      access_ip: 10.11.12.198
    lock:
      ansible_host: 10.11.12.196
      ip: 10.11.12.196
      access_ip: 10.11.12.196
    mouse:
      ansible_host: 10.11.12.194
      ip: 10.11.12.194
      access_ip: 10.11.12.194
    niobi:
      ansible_host: 10.11.12.192
      ip: 10.11.12.192
      access_ip: 10.11.12.192
  children:
    kube_control_plane:
      hosts:
        cypher:
        lock:
    kube_node:
      hosts:
        cypher:
        lock:
        mouse:
        niobi:
    etcd:
      hosts:
        cypher:
        lock:
        mouse:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
    calico_rr:
      hosts: {}
  vars:
    proxy_env: []

There are some configurations in the inventory files that need to be changed. This may involve changing existing settings or adding new ones. In inventory/mycluster/group_vars/k8s_cluster/addons.yml we need to enable helm:

helm_enabled: true

In inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml, set kubernetes version, timezone, DNS servers, pin Calico version, use Calico IPIP mode, increase logging level (if desired), use iptables, disable node local DNS (otherwise get error as kernel does not have dummy module). and disable pod security policy:

kube_version: v1.27.3

# Set timezone
ntp_timezone: America/New_York

# DNS Servers (OpenDNS - use whatever you want here)
upstream_dns_servers: [YOUR_ROUTER_IP]
nameservers: [208.67.222.222, 208.67.220.220]

# Pinning Calico version
calico_version: v3.25.2

# Use IPIP mode
calico_ipip_mode: 'Always'
calico_vxlan_mode: 'Never'
calico_network_backend: 'bird'

# Added debug
kube_log_level: 5

# Using iptables
kube_proxy_mode: iptables

# Must disable, as kernel on RPI does not have dummy module
enable_nodelocaldns: false

# Pod security policy (RBAC must be enabled either by having 'RBAC' in authorization_modes or kubeadm enabled)
podsecuritypolicy_enabled: false

BTW, if you want to run ansible commands on other systems in your network, you can edit inventory/mycluster/other_servers.yaml and add the host information there:

all:
  hosts:
    neo:
      ansible_host: 10.11.12.180
      ip: 10.11.12.180
      access_ip: 10.11.12.180
      ansible_port: 7666
    rabbit:
      ansible_host: 10.11.12.200
      ip: 10.11.12.200
      access_ip: 10.11.12.200
  vars:
    proxy_env: []

In this example, my_other_server is accessed on SSH port 7777, versus the default 22.

Kubespray uses ansible to communicate and provision each node, and ansible uses SSH. At this time, from your provisioning host, make sure that you can SSH into each node without a password. Each node should have your public SSH key in their ~/.ssh/authorized_keys file. You can use the ssh-copy-id command to setup the authorized_keys files on each node and the host, or just copy the public key.

Ansible has a bunch of “playbooks” that you can run, so I looked around on the internet, found a bunch and placed them into a sub-directory called playbooks. Now is a good time to do some more node configuration, and make sure that ansible, the inventory file, and SSH are all setup correctly. It is way easier than logging into each node and making changes. And yes, I suspect one could put all this into one huge ansible playbook, but I like to do things one at a time and check that they work.

When we run a playbook, we’ll provide our private key for SSH access, turn on the verbose (-v) flag), and sometimes ask for privilege execution. I’ll show the command for both a single node (substitute the hostname for HOSTNAME) that is in the inventory, and for all hosts in the inventory. When the playbook runs, it will display the steps being performed, and with -v flag you can see if things are changed or not. At the end, it will show a summary of the playbook run. For example, here is the output of a “ping” to every node in the cluster:

cd ~/workspace/picluster
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/ping.yaml --private-key=~/.ssh/id_ed25519

PLAY [Test Ping] **************************************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************
ok: [lock]
ok: [cypher]
ok: [niobi]
ok: [mouse]

TASK [ping] *******************************************************************************************************************************************************************************
ok: [niobi]
ok: [mouse]
ok: [cypher]
ok: [lock]

PLAY RECAP ********************************************************************************************************************************************************************************
cypher                     : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
lock                       : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
mouse                      : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
niobi                      : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

You’ll want to check the recap to see if there are any failures, and if there are, check above for the step and item that failed.

For the following, you can set TARGET_HOST environment variable, with the single host name, and then run command for that system, or run on all hosts in inventory…

For the node preparation, the first item is a playbook to add your username to the sudo list, so that you don’t have to enter in a password, when running sudo commands:

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass

The passwordless_sudo.yaml contains (change USER to the username you are using):

- name: Make users passwordless for sudo in group sudo
  hosts: all
  become: yes
  vars:
    node_username: "{{ lookup('env','USER') }}"
  tasks:
    - name: Add user to sudoers
      copy:
        dest: "/etc/sudoers.d/{{ node_username }}"
        content: "{{ node_username }}  ALL=(ALL)  NOPASSWD: ALL"

You’ll have to provide your password, so that it can change the sudo permissions (hence the –ask-become-pass argument).

Next, you can setup for secure SSH by disabling root login and password based login:

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519

The ssh.yaml file has:

- name: Secure SSH
  hosts: all
  become: yes
  tasks:
    - name: Disable Password Authentication
      lineinfile:
        dest=/etc/ssh/sshd_config
        regexp='^PasswordAuthentication'
        line="PasswordAuthentication no"
        state=present
        backup=yes
      register: no_password
    - name: Disable Root Login
      lineinfile:
        dest=/etc/ssh/sshd_config
        regexp='^PermitRootLogin'
        line="PermitRootLogin no"
        state=present
        backup=yes
      register: no_root
    - name: Restart service
      ansible.builtin.systemd:
        state: restarted
        name: ssh
      when:
        - no_password.changed or no_root.changed

For the Raspberry PI, we want to configure the fully qualified domain name and hostname and update the hosts file. Note: I use <hostname>.home for the FQDN.

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/hostname.yaml -v --private-key=~/.ssh/id_ed25519
All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/hostname.yaml -v --private-key=~/.ssh/id_ed25519

The hostname.yaml file is:

- name: FQDN and hostname
  hosts: all
  become: true
  tasks:
    - name: Configure FQDN and hostname
      ansible.builtin.lineinfile:
        path: /boot/firmware/user-data
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^fqdn\: ', line: 'fqdn: {{ ansible_hostname }}.home' }
        - { regexp: '^hostname\:', line: 'hostname: {{ ansible_hostname }}' }
      register: hostname
    - name: Make sure hosts file updated
      ansible.builtin.lineinfile:
        path: /etc/hosts
        regexp: "^127.0.1.1"
        line: "127.0.1.1 {{ ansible_hostname }}.home {{ ansible_hostname }}"
      register: hosts
    - name: Reboot after change
      reboot:
      when:
        - hostname.changed or hosts.changed

I place a bunch of tools on the nodes, but before doing so, update the OS. I used a ansible role for doing reboots by pulling this down with the following command run from the ~/workspace/picluster/ area:

git clone https://github.com/ryandaniels/ansible-role-server-update-reboot.git roles/server-update-reboot

The syntax of this is a bit different for this command

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/os_update.yaml --extra-vars "inventory=all reboot_default=false proxy_env=[]" --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/os_update.yaml --extra-vars "inventory=all reboot_default=false" --private-key=~/.ssh/id_ed25519

Granted, I’ve had times where updates required some prompting and I don’t think the script handled it. You can always log in to each node and do it manually, if desired. The os_update.yaml will update each system once at a time:

- hosts: '{{inventory}}'
  max_fail_percentage: 0
  serial: 1
  become: yes
  roles:
    - server-update-reboot

Tools can now be installed on nodes by using:

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/tools.yaml -v --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/tools.yaml -v --private-key=~/.ssh/id_ed25519

Here is what the script installs. Feel free to add to the list, and remove emacs and ripgrep (tools I personally like):

- name: Install tools
hosts: all
vars:
host_username: "{{ lookup('env','USER') }}"
tasks:
- name: install tools
ansible.builtin.apt:
name: "{{item}}"
state: present
update_cache: yes
loop:
- emacs
- ripgrep
- build-essential
- make
- nfs-common
- open-iscsi
# - linux-modules-extra-raspi
become: yes
- name: Emacs config
copy:
src: "emacs-config.text"
dest: "/home/{{ host_username }}/.emacs"
- name: Git config
copy:
src: "git-config.text"
dest: "/home/{{ host_username }}/.gitconfig"

In the playbooks area, I have an emacs-config.text with:

(global-unset-key [(control z)])
(global-unset-key [(control x)(control z)])
(global-set-key [(control z)] 'undo)
(add-to-list 'load-path "~/.emacs.d/lisp")
(require 'package)
(add-to-list 'package-archives
  '("melpa" . "http://melpa.milkbox.net/packages/") t)
(setq frame-title-format (list '(buffer-file-name "%f" "%b")))
(setq frame-icon-title-format frame-title-format)
(setq inhibit-splash-screen t)
(global-set-key [f8] 'goto-line)
(global-set-key [?\C-v] 'yank)
(setq column-number-mode t)

And a git-config.text with:

[user]
	name = YOUR NAME
	email = YOUR_EMAIL@ADDRESS
[alias]
	qlog = log --graph --abbrev-commit --pretty=oneline
	flog = log --all --pretty=format:'%h %ad | %s%d' --graph --date=short
	clog = log --graph --pretty=\"tformat:%C(yellow)%h%Creset %Cgreen(%ar)%Creset %C(bold blue)<%an>%Creset %C(red)%d%Creset %s\"
	lg = log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %C(green)%an%Creset %Cgreen(%cr)%Creset' --abbrev-commit --date=relative
[gitreview]
	username = YOUR_USERNAME
[core]
	pager = "more"
	editor = "emacs"
[color]
  diff = auto
  status = auto
  branch = auto
  interactive = auto
  ui = true
  pager = true

UPDATE: With Ubuntu 22.04 it does not find the linux-modules-extra-raspi. If I try to install a specific instance, like linux-modules-extra-6.8.0-1018-raspi, it switches to linux-modules-6.8.0-1018-raspi and says it is already installed. Maybe this is no longer needed. Ignoring for now (worked without it).

After setting up the Kubernetes cluster, I realized that I wanted to include the kube-vip feature for the Kubespray created cluster. This allows us to have one (load balancer) IP for the cluster API and requests will get forwarded to the control plane node that is the active “leader”. If a control plane node fails, leadership will change to another control plane node, but the API IP will remain the same.

The issue I found, however, is that when Kubespray creates the cluster, the API will not use the IP I defined, but instead, will use a domain name. Specifically, lb-apiserver.kubernetes.local will be used. This works fine, but there are two problems.

This is used in the kube.config file that is used for kubectl commands. If I copy the config to my laptop under ~/.kube/config and then run kubectl commands, they fail, as that domain is unknown. I can work around that, by just replacing the domain name with the IP that I configured in setup for kube-vip.

The second problem occurs when a node is rebooted. It will attempt to re-connect to the cluster, but fails, because it cannot register with the API server when trying to use the domain name that was set up. Since the cluster is running locally, on my home network, there is no lb-apiserver.kubernetes.local outside of the cluster, and since this rebooted node does not have the cluster’s DNS running yet, it cannot resolve the name.

The solution I chose was to add a host name entry to /etc/hosts. Since this file is automatically generated, I had to add the mapping of IP address to lb-apiserver.kubernetes.local in the file /etc/cloud/templates/hosts.debian.tmpl on each node.

I suspect that a playbook could be created to do this for each file in the inventory. I haven’t done that, because my cluster is already up, so I just updated each node manually.

For the configuration of the UCTRONICS OLED display and enabling the power switch to do a controlled shutdown, we need to place the sources on the node(s) and build them for that architecture. Before starting, get the sources:

cd ~/workspace
git clone https://github.com/pmichali/SKU_RM0004.git
cd picluster

Note: I forked the manufacturer’s repo and just renamed the image for now. Later, the manufacturer added a deployment script, but I’m sticking with manual install and using Ansible to set things up.

Next, use this command:

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/uctronics.yaml -v --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/uctronics.yaml -v --private-key=~/.ssh/id_ed25519

The uctronics.yaml contains:

- name: UCTRONICS Hardware setup
  hosts: all
  become: true
  tasks:
    - name: Enable OLED display and power switch
      ansible.builtin.lineinfile:
        path: /boot/firmware/config.txt
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^dtparam=i2c_arm=on', line: 'dtparam=i2c_arm=on,i2c_arm_baudrate=400000' }
        - { regexp: '^dtoverlay=gpio-shutdown', line: 'dtoverlay=gpio-shutdown,gpio_pin=4,active_low=1,gpio_pull=up'}
      register: hardware
    - name: Get sources
      copy:
        src: "~/workspace/SKU_RM0004"
        dest: "/root/"
      register: files
    - name: Check built
      stat:
        path: /root/SKU_RM0004/display"
      register: executable
    - name: Build display code
      community.general.make:
        chdir: "/root/SKU_RM0004/"
        make: /usr/bin/make
        target: oled_display
      when:
        - files.changed or not executable.stat.exists
      register: built
    - name: Check installed
      stat:
        path: /usr/local/bin/oled_display
      register: binary
    - name: Install display code
      ansible.builtin.command:
        chdir: "/root/SKU_RM0004/"
        cmd: "/usr/bin/install -m 755 oled_display /usr/local/bin"
      when:
        - built.changed or not binary.stat.exists
      register: installed
    - name: Service script
      copy:
        src: "oled.sh"
        dest: "/usr/local/bin/oled.sh"
        mode: 0755
      register: oled_shell
    - name: Service
      copy:
        src: "oled.service"
        dest: "/etc/systemd/system/oled.service"
        mode: 0640
      register: oled_service
    - name: Reload daemon
      ansible.builtin.systemd_service:
        daemon_reload: true
        name: oled
        enabled: true
        state: restarted
      when:
        - installed.changed or oled_shell.changed or oled_service.changed
    - name: Reboot after change
      reboot:
      when:
        - hardware.changed

This uses oled.sh in the playbooks area:

#!/bin/bash
echo "oled.service: ## Starting ##" | systemd-cat -p info
/usr/local/bin/oled_display

And oled.service for the service:

[Unit]
Description=OLED Display
Wants=network.target
After=syslog.target network-online.target
[Service]
Type=simple
ExecStart=/usr/local/bin/oled.sh
Restart=on-failure
RestartSec=10
KillMode=process
[Install]
WantedBy=multi-user.target

Next up is to setup cgroups for the Raspberry PI with the command:

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/cgroups.yaml -v --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/cgroups.yaml -v --private-key=~/.ssh/id_ed25519

The cgroups.yaml has:

- name: Prepare cgroups on Ubuntu based Raspberry PI
  hosts: all
  become: yes
  tasks:
    - name: Enable cgroup via boot commandline if not already enabled
      ansible.builtin.lineinfile:
        path: /boot/firmware/cmdline.txt
        backrefs: yes
        regexp: '^((?!.*\bcgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1\b).*)$'
        line: '\1 cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory swapaccount=1'
      register: cgroup
    - name: Reboot after change
      reboot:
      when:
        - cgroup.changed

The following will setup load the overlay modules, setup iptables for bridged traffic, and will allow IPv4 forwarding.

Single node:
ansible-playbook -i "${TARGET_HOST}," playbooks/iptables.yaml -v --private-key=~/.ssh/id_ed25519

All nodes:
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/iptables.yaml -v --private-key=~/.ssh/id_ed25519

The iptables.yaml contains:

- name: Prepare iptables on Ubuntu based Raspberry PI
  hosts: all
  become: yes
  tasks:
    - name: Load overlay modules
      community.general.modprobe:
        name: overlay
        persistent: present
    - name: Load br_netfilter module
      community.general.modprobe:
        name: br_netfilter
        persistent: present
    - name: Allow iptables to see bridged traffic
      ansible.posix.sysctl:
        name: net.bridge.bridge-nf-call-iptables
        value: '1'
        sysctl_set: true
    - name: Allow iptables to see bridged IPv6 traffic
      ansible.posix.sysctl:
        name: net.bridge.bridge-nf-call-ip6tables
        value: '1'
        sysctl_set: true
    - name: Allow IPv4 forwarding
      ansible.posix.sysctl:
        name: net.ipv4.ip_forward
        value: '1'
        sysctl_set: true

With ansible, you can do a variety of operations on the nodes of the cluster. One is to ping nodes. You can do this for other systems in your network, if you set up the other_servers.yaml file:

To ping cluster...
ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/ping.yaml -v --private-key=~/.ssh/id_ed25519

To ping other servers...
ansible-playbook -i inventory/mycluster/other_servers.yaml playbooks/ping.yaml -v --private-key=~/.ssh/id_ed25519

The ping.yaml script is pretty simple:

- name: Test Ping
  hosts: all
  tasks:
  - action: ping

You can make other scripts, as needed, like the o_update.yaml shown earlier on this page. At this point, we are ready to cross our fingers and bring up the Kubernetes cluster in Part V.

If during some of the UCTRONICS setup steps, I2C is not enabled and OLED display does not work, you can do these steps (ref: https://dexterexplains.com/r/20211030-how-to-install-raspi-config-on-ubuntu):

In a browser, go to https://archive.raspberrypi.org/debian/pool/main/r/raspi-config/

Get the latest version with (for example):

wget https://archive.raspberrypi.org/debian/pool/main/r/raspi-config/raspi-config_20230214_all.deb 

Install supporting packages:

sudo apt -y install libnewt0.52 whiptail parted triggerhappy lua5.1 alsa-utils

Fix any broken packages (just in case):

sudo apt install -fy

And then install the config util:

sudo dpkg -i ./raspi-config_20230214_all.deb

Run it with “sudo raspi-config” and select interface options, and then I2C, and then enable. Finally, do a “sudo reboot”.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part IV: Preparing to Create Kubernetes Cluster
December 25

Part III: Repartitioning the SSD Drive?

UPDATE: It is better, not to partition the disk drive. Although that prevents the user area on disk from being consumed by log files, there can be situations where the /var dir will get full and Kubernetes will then evict pods from the node due to Disk Pressure. The logs are rotated weekly, with a maximum of five, but there can be instances where syslog gets huge on one day (I saw some 20-40 GB) and disk at 94%.

As a workaround, I edited /etc/logrotate.d/rsyslog and added “size 10G” AFTER the “weekly” line. I couldn’t force rotation of log file with “su root logrotate -f /etc/logrotate.d/rsyslog” due to permission errors (I think I could have changed the group for /var/log to root and after running the command, switch it back to syslog).

 

December 28, 2023

OK, I probably didn’t need to do this, but I thought it may be nice to have separate partitions for logging files, OS, user area, and for “data”, which was going to be where shared disk space would come from. In any case, here are the gruesome details of what I did, for those interested…

Here is the layout that I was interested in doing:

PartitionPurpose
/dev/sda1Unaltered original boot partition, about 256-512 MB.
/dev/sda2Root partition setup for 100GB.
/dev/sda3Will be /var and setup for 90GB.
/dev/sda5For /tmp, limited to 2GB.
/dev/sda6For /home dir, using 8GB.
/dev/sda7Mounted as /var/lib/longhorn to be used for cluster shared storage. Rest of disk space.

First, I used my SD card that had Ubuntu OS on it and inserted that into the powered off system. I disconnected the network interface, so the “host” on the SD card would not conflict with systems already deployed. I ensured that the keyboard and HDMI display were connected, because all the work would need to be done from the local console.

Next, I unplugged the USB jumper that connects the SSD drive to the Raspberry PI4. I booted the system, and it looked for USB drive and then eventually SD drive and booted. It took a while, because I had the network disconnected. If you have a unique hostname/IP setup on the SD card, you could likely leave the network connected (and could paste in commands from this blog using an SSH connection).

Once up and I logged in, I inserted the USB jumper, so that the SSD drive is seen as an external drive. The SD card appears as /dev/mmcblk0p1 and /dev/mmcblk0p2 for the “fd” command, but the SSD drive is not there. I did a “sudo su”, as there are several commands to run as root. When I do “fdisk -l”, I can see the SSD drive as /dev/sda.

The first thing to be done, is to check the file system(otherwise the mount will fail and fsck cannot be run to fix mismatched free blocks in partition table) and then resize /dev/sda2:

e2fsck -f /dev/sda2
resize2fs /dev/sda2 100G

Next, run “fdisk /dev/sda” and use the “p” command to see the current partitions. There should be /dev/sda1 as a 256-512MB partition and /dev/sda2 with the remainder of the space of the drive. Delete partition #2 using the “d” command and “2”.

Created a new primary partition with “n”, “p”, “2”(default). Accepted the default FIRST sector, and for the LAST sector, enter “+100G”. When asked if the ext4 signature should be removed, you can enter “N”.

Create another new primary partition, “n”, “p”, “3”, accepted the default FIRST sector, and used “+90GB” for the LAST sector. That’s it for primary partitions, so we’ll now create an extended partition #4 with “n”, “e”, and accept the defaults. That gives around a 741-743 GB partition.

Now, you can create logical partitions with “n”, accepting the FIRST sector, and specifying the size as the LAST sector. We’ll create partition #5 that is “+2G”, partition #6 that is “+8G”, and finally partition #7 that accepts the defaults for FIRST and LAST to use the rest of the disk (about 730GB).

If you make any mistakes, you can do a “p” command to see the partitions, use the “d” command to delete mistakes, and then recreate the partitions as needed. This is all in memory for now. I print out the partitions one more time to make sure they look OK:

Command (m for help): p
Disk /dev/sda: 931.51 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: 2115
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes
Disklabel type: dos
Disk identifier: 0x5036de4e

Device     Boot     Start        End    Sectors   Size Id Type
/dev/sda1  *         2048     526335     524288   256M  c W95 FAT32 (LBA)
/dev/sda2          526336  210241535  209715200   100G 83 Linux
/dev/sda3       210241536  398985215  188743680    90G 83 Linux
/dev/sda4       398985216 1953525167 1554539952 741.3G  5 Extended
/dev/sda5       398987264  403181567    4194304     2G 83 Linux
/dev/sda6       403183616  419960831   16777216     8G 83 Linux
/dev/sda7       419962880 1953525167 1533562288 731.3G 83 Linux

When things look OK, use the “w” command to write the partition table. Finally, use the “partprobe /dev/sda” command. I did a “shutdown -h 0”, removed the SD card, and reconnected the ethernet cable, and powered the Pi back on. You can ssh in and will see /dev/sda1 and /dev/sda2 with the desired sizes. Do “sudo su” so that you can run several commands as root.

Now it is time to set the partition types for all the other partitions:

mkfs -t ext4 /dev/sda3
mkfs -t ext4 /dev/sda4
mkfs -t ext4 /dev/sda5
mkfs -t ext4 /dev/sda6
mkfs -t ext4 /dev/sda7

Now we want to move some filesystems over to the new partitions. Create mount points and mount most of the new partitions:

mkdir -p /mnt/home /mnt/tmp /mnt/root /mnt/var
mount /dev/sda2 /mnt/root
mount /dev/sda3 /mnt/var
mount /dev/sda5 /mnt/tmp
mount /dev/sda6 /mnt/home

Use rsync to move things over from the current file system:

rsync -aqxP /mnt/root/var/* /mnt/var
rsync -aqxP  /mnt/root/tmp/* /mnt/tmp
rsync -aqxP  /mnt/root/home/* /mnt/home

The originals can now be removed:

cd /mnt/root
rm -rf var home tmp
cd

Using the “blkid” command, you can see the UUIDs for each of the new partitions. Edit /etc/fstab and add on the new mount points with their corresponding UUIDs:

UUID=W /var        ext4    defaults        0       2
UUID=X /tmp	      ext4    defaults	      0	      2
UUID=Y /home	      ext4    defaults	      0	      2
UUID=Z /var/lib/longhorn ext4    defaults	      0	      2

Where W is the UUID for /dev/sda3, X is the UUID for /dev/sda5, Y is the UUID for /dev/sda6, and Z is for /dev/sda7.

Finally, you can unmount all the partitions with the following and then reboot the system and check that the partitions are OK:

umount /dev/sda2 /dev/sda3 /dev/sda5 /dev/sda6

Not that you will have to give it time to shutdown and then restart. You can check the console with monitor attached. Mine took quite a while, I think because it was wrapping up upgrades from “sudo apt-get update -y && sudo apt-get upgrade -y” that I had done earlier. Just be patient.

After the system rebooted, I could see the partitions are mounted with “df -h”:

Filesystem      Size  Used Avail Use% Mounted on
tmpfs           781M  3.0M  778M   1% /run
/dev/sda2        99G  2.1G   93G   3% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda6       7.8G  1.4M  7.4G   1% /home
/dev/sda3        89G  1.7G   82G   3% /var
/dev/sda5       2.0G   96K  1.8G   1% /tmp
/dev/sda1       505M  125M  381M  25% /boot/firmware
tmpfs           781M  4.0K  781M   1% /run/user/1000

One thing to note with the UCTRONICS OLED status display and the repartitioning. The app that controls the display reports disk usage at 107%. I haven’t looked at it yet, but whatever it is using to determine disk space is not accounting for the change due to partitioning (it was fine before the partitioning). Fortunately, it is a simple Python app and the source is provided, so I can make some changes, maybe to report multiple “partitions”.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part III: Repartitioning the SSD Drive?
December 25

Part II: Preparing the Raspberry PI

For the very first PI4 that I bought, I got the CANA kit, so it had a plastic enclosure, power adapter with switch, and 126 GB SD card. With this system I connected a mouse, keyboard, and HDMI monitor, and used the Raspberry PI imager app on my MacBook and installed Ubuntu server (22.04) on the SD card. I booted up and made sure everything worked.

Since I’m using these UCTRONICS trays, I would follow these steps to partially assemble the unit. I’d connect the SSD drive’s SATA connector to the SATA shield card, and screw the SSD card to the side of the tray. Next, I installed the SD adapter into the SATA shield card. I aligned the Raspberry PI4 onto the posts and screwed it on using the threaded standoffs for the PoE+ hat. I inserted the SD adapter into the SD slot of the PI, connected the OLED and power switch cables from the SATA shield card to the front panel. Lastly, I would connect the USB jumper from the SATA shield card to the Raspberry PI4 card for SSD drive connections.

I reused the power supply from my CANA kit, and connected the display and keyboard to the Raspberry PI. I think I could connect power to either the USB-C connector on the PI or on the SATA shield card. An alternative would be to attach the PoE+ hat, but then I’d have to connect ethernet to the PoE+ powered hub, and that was down in the rack, where I didn’t have an HDMI monitor. So I skipped that part, as I was doing this in the study. I connected ethernet cable and powered on the unit. The RPI will display an install screen, where you can press SHIFT key to cause net boot.

This will download the installer image and then restart. You’ll eventually get an installer screen like what the PI imager has on the Mac/PC. It is easiest to use a mouse, but if you don’t have one, you can press the tab button to advance field, and enter to select.

You’ll want to select the model of Raspberry PI (4), select the OS (I used Ubuntu 23.10 64 bit – in the past it was 22.04), and then select the storage device.

You should see the SSD drive listed (Samsung 1TB drive in my case), and select it. Click the next button, and select to edit the configuration. Enter the host name, enabled SSH with password authentication (for now), and selected a username and simple password (for now). I set the time zone as well. Here is an (older installer) screen shot of some settings.

Lastly, click on SAVE, click on YES to use the changes, and then click on the WRITE button. Again, here is an older screen shot, where there was no PI model button, and the configuration (gear icon) was on the same screen.

While the installer was running, I checked on my router for the hostname, and made a DHCP reservation for the final desired IP I wanted for that system. I also created a HOSTNAME.home DNS entry for this node.

At one point, it rebooted and eventually displayed that cloud init was done. I pressed ENTER and was able to log in. I shut down the system, and then started it back up, so that it would pickup the desired IP address that I setup on my router.

To make setup easier, I created an SSH key on this system, so that I could SSH in from my MacBook and do all the rest from there, without needing the display and keyboard connected. I SSHed into the system, created a key with:

ssh-keygen -t ed25519

and then I copied the public key to all the other nodes and systems so that I can easily get into the system. I also added this new system to the ~/.ssh/config file that I use on other systems, so that I can ssh using the host name. I set the login password to what I really wanted. Make sure that you can ssh into each node, without using a password.

Rather than rely on a DHCP reservation, I changed the /etc/netplan/50-cloud-init.yaml to assign a static IP, set the router IP, and set DNS servers. For the first one, I used my router’s IP as I have it set up to use my Pi-Hole address blocker for primary DNS and a public DNS as a secondary one. This way, later, when I add Pi-Hole to Kubernetes, I can just change the IP on the router and every node will use that new DNS. Optionally, you can add a search clause. I tried “.home” thinking it would then resolve my hosts by hostname, but that isn’t working out. Here is an example 50-cloud-init.yaml:

network:
version: 2
renderer: networkd
ethernets:
eth0:
addresses:
- 10.11.12.198/24
routes:
- to: default
via: 10.11.12.1
nameservers:
addresses: [10.11.12.1, 208.67.220.220]
search: [.home]

This was done on each Raspberry PI, followed by “sudo netplan apply” to update the node. Just be sure to do this via the console, otherwise you’ll loose connection, if done from an ssh session.

Of course, there are several other things that need to be set up on a Raspberry PI, like:

  • Setting the domain name for the node.
  • Setting up the OLED display and power switch, since I’m using UCTRONICS tray.
  • Changing kernel settings.

However, since I’ll be using kubespray to provision the cluster, and kubespray uses ansible, there is the ability to use ansible playbooks to provision all the nodes in a more automated fashion on all nodes. This will be detailed in Part IV, but the next step (Part III) is to do some (optional) partitioning of the SSD drive.

On some older units, I went through all sorts of contortions, to install an OS on the SD card using the Raspberry PI imager on my MacBook, booting, updating the rpi-eeprom, setting things up to net boot. There is a bootconfig.txt file that can be used to update the bootload for enabling net boot. I had this entry in there:

BOOT_ORDER=0xf241          SD(1), USB(4), network(2), retry each (f).

Though I’ve seen 0x41 as well. It was really messy. This new loader is much easier.

I hit a case with some newer SSD drives (a Crucial 2TB BX500 drive), where the drive did not show up in the installer’s list of storage devices to select. To get around this I had to do the following…

I attached the SSD drive to my Mac, using a SATA III adapter, and used the Raspberry PI imager to image the disk (using the same settings as explained above).

Then, I attached the SSD drive to the Raspberry PI’s USB3 port (with the UCTRONICS I installed the drive in the tray, connected to SATA Shield card, and used the provided USB3 jumper. I made sure there was no SD card installed, connected an Ethernet cable and started up the Raspberry PI, making sure it booted from the SSD card.

I have had cases where I had to boot from the SD card running Raspberry PI OS (imaged on a Mac or via net booting the RPI), and run these commands, to update the OS, EEPROM, and bootloader…

sudo apt update
sudo apt full-upgrade
sudo rpi-update
sudo rpi-eeprom-update -d -a

I then rebooted, so that the new firmware was activated, and then tried to boot using the SD card and see if the SSD drive was visible (sudo fdisk -l), and then boot without the SD card, hoping it would boot from the SSD drive.

I also tried netbooting to the installer, and instead of choosing an OS, I chose Utility Apps, Bootloader, and picked the option to boot to USB. I wrote that to the SD storage device, it rebooted, and I checked to see if it booted to the SSD. If it did, I would shutdown and restart, without the SD card.

It was a bit of a mess trying to get it working. This happened recently, when I wanted to setup two more PI 4s. One I had to go through these hoops, and the other one worked, after I imaged the SSD on the Mac. I guess the PI4 bootloader should boot from USB, out of the box. For some reason, I hit one that did not.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part II: Preparing the Raspberry PI
December 25

Part I: Raspberry PI Kubernetes Cluster Goals

Currently, I have a few of old tower based Linux servers, running services (VPN, file server, Emby music server, a custom app for monitoring my photovoltaic system, etc). I had started to adapt several of these to run in containers, so that I could move them around, if a system failed, especially since the systems were getting quite old.

In addition, I started to buy some Raspberry PIs so that I had newer technology and hosts that used much less power than my old gear, and I could place these containers on the PIs.

Since I worked with Kubernetes development for several years, before retiring, I decided to build a cluster so I had a way to spread the workload, easily move pods around upon failures, monitor and manage the system, and scale it out as I get more Raspberry PIs.

For the initial design, I have five Raspberry PIs right now, though one is currently hosting a bunch of containers. Plan was to put four of them into service right now, and then once I have my containers migrated over to the cluster, I can add the fifth system. I just got a sixth one for Christmas, so I’ll be adding that in soon.

For the hardware, I’m using Raspberry PI 4s with 8 GB RAM (~$85 each), and I have the PoE+ Hats (~$28), so that I can power them off of the PoE based ethernet hub I have (LinkSys LGS116P 8 regular ports, 8 PoE ports ~$120). I purchased a bunch of Samsung 1 TB SSD drives (870 EVO ~$50). Probably should have gotten 2 TB or larger.

I found a really cool product from UCTRONICS (model B0B6TW81P6 ~$290), which consists of a 1U rack mounted enclosure that holds four Raspberry PI4s, each in a removable tray. There are two fans inside as well. Since I needed one or two more PIs, I also found just a face plate (model RM1U ~ $11) and individual tray units (RM1U-3 ~$76 each). Here is a picture of the enclosure on the bottom, and the face plate on the top.

Each of the tray units have a “SATA shield” logic board with SATA connector for the SSD drive on the bottom, a USB connector on the top that can be connected to the USB3 port of the PI using the provided connector and a jumper.

There is a front panel with a LCD that can display IP address, CPU temp, disk usage, and RAM usable (small python app that can be tweaked). There are SD and SSD activity LEDs on the shield card that show, and they provide a jumper cable for the SD card of the PI so that it is accessible from the front panel (via the shield card). Lastly, there is a power switch, so that you can do a clean shutdown.

There is room for the PI’s PoE Hat and a fan connector on the shield card, so that you can attach one of the fans in the enclosure to one of the PIs. The face plate is made out for a Model 4B PI. the enclosure is pricey, but a really great way to place these into a rack, have a SSD drive connected, and be able to cleanly shutdown the units.

The RM1U is not enclosed, and there are no fans, but I wasn’t concerned, as this unit would be in the basement, which is cool year round. I don’t know whether UCTRONICS will make something for the Raspberry PI 5s or how long they will make these rack mounts and enclosures, but it was a nice way for me to bundle things up.

For each of the Raspberry PIs, they will have a fixed IP address and a unique name (versus having node1, node2,…). I chose to have my router reserve IP addresses, outside of the range used for DHCP. Alternately, you could configure each PI with a static IP address.

Part II will discuss how to prepare the PIs for cluster use.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off on Part I: Raspberry PI Kubernetes Cluster Goals
November 29

Dual-Stack Kubernetes on bare-metal with LazyJack

v1.0

Preliminary support has been added to Lazyjack as of 1.3.5! Now, as of Kubernetes 1.13, the KEP for dual-stack is still under review, and only a few changes have been made to the code, but you can bring up a cluster in dual-stack mode.  You will only see one family of IPs for pods displayed via “kubectl get pod”, but if you look on the pods, you will see both IPv4 and IPv6 addresses.

I’ve already updated kubeadm-dind-cluster to support dual-stack for clusters brought up on a single node, using docker-in-docker, but now Lazyjack supports this too, on bare-metal nodes.

The config.yaml file for Lazyjack will have these changes:

  • A second CIDR can be specified for the management and pod networks, by using the “cidr2” field, under the respective sections. You can specify one family under “cidr” and one under “cidr2”.
  • The service network CIDR will specify which family is used for the service network. The KEP only supports a single IP family for service networks at this time.
  • Omit the DNS64 and NAT64 sections, which are not used in dual-stack mode.
  • The ‘dns64’ and ‘nat64’ operational modes are note specified under the opmodes field any nodes.

Here is an example config that is using IPv6 for the service network:

general:
    mode: dual-stack
    plugin: ptp
    insecure: true
    kubernetes-version: "v1.13.0-alpha.3"
    work-area: "/home/c2/bare-metal/work-area"
topology:
    minion1:
        interface: "enp10s0"
        opmodes: "minion"
        id: 2
    minion2:
        interface: "enp9s0"
        opmodes: "minion"
        id: 3
    my-master:
        interface: "enp10s0"
        opmodes: "master"
        id: 4
mgmt_net:
    cidr: "10.192.0.0/16"
    cidr2: "fd00:20::/64"
pod_net:
    cidr: "10.244.0.0/16"
    cidr2: "fd00:40::/72"
service_net:
    cidr: "fd00:30::/110"

 

 

 

 

 

 

 

Category: bare-metal, Kubernetes | Comments Off on Dual-Stack Kubernetes on bare-metal with LazyJack
November 8

Dual-stack Kubernetes with kubeadm-dind-cluster

V1.0

Overview

A coworker has pushed out a Kubernetes Enhancement Proposal (KEP) for dual-stack Kubernetes that is currently under review by the community. This capability is currently targeted for the 1.14 release.https://github.com/kubernetes/kubernetes/pull/70659

This proposal will provide IPv4 and IPv6 addresses for all containers (pod network) and nodes (management network), allowing communication with other pods and external resources with either protocol. To simplify this first release will use a single IP family for services, meaning the service network will either be IPv4 or IPv6.

What’s Up?

We’ve started implementing some changes to support dual-stack (as WIP, in some cases, because the KEP is not approved yet). To support that, I’ve modified the kubeadm-dind-cluster provisioning tool (a.k.a k-d-c) so that we can experiment with bringing up a cluster with dual-stack networking, during development.

The changes include setting the CNI configuration files for dual-stack, adding static routes for the Bridge or PTP plugin so that pods can communicate with either IP family across nodes, adjust the KubeAdm configuration file so that the API will use a specific IP family, and does not make use of the DNS64/NAT64 capabilities as both IP families are available on each container.

I’ve verified that we can bring up a cluster in dual-stack mode, with pod to pod (across nodes) and pod to external connectivity using both IPv4 and IPv6. I’ve used IPv4 for the service network, and with PR 70659 (under review as of today), I have verified a cluster with an IPv6 service network.

Granted, there are things that don’t work yet, as much of the KEP needs to be implemented (like service endpoints and pod status API), but it was very satisfying to see a PoC cluster come up.

How To…

To try this out, there are a few preparation steps. First, clone the kubeadm-dind-cluster repo.

cd
git clone https://github.com/kubernetes-sigs/kubeadm-dind-cluster.git dind

 

Next, clone Kubernetes in a subdirectory underneath k-d-c:

cd ~/dind
git clone https://github.com/kubernetes/kubernetes.git

Within the Kubernetes repo, grab my PR that is out for review (or wait until this is merged):

cd kubernetes
git fetch origin pull/70659/head:pr70659
git checkout pr70659

Now, you can bring up a cluster in dual-stack mode, using the desired service network IP family. You can set the dual stack mode:

export IP_MODE=dual-stack

And since we are customizing the Kubernetes code, we need to tell k-d-c to build a new image:

export DIND_IMAGE=mirantis/kubeadm-dind-cluster:local
export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y

If you are in a lab, and need to use a company DNS server, you can also set REMOTE_DNS64_V4SERVER.

Now, let’s build a new k-d-c image:

cd ..
build/build-local.sh
cd kubernetes

 

To use an IPv6 service network, you can just bring up the cluster using the default values:

../dind-cluster.sh up

 

To use IPv4, you’ll need to first set SERVICE_CIDR to an IPv4 CIDR, before bringing up the cluster. You can use the same value that k-d-c uses for IPv4 only networks, like:

export SERVICE_CIDR="10.96.0.0/12"

 

Then, just use the same “up” command to bring things up.

In each of these modes, you’ll see either IPv4 or IPv6 addresses, when doing a “kubectl get pods –all-namespaces -o wide” command. The pods will still have both IPv4 and IPv6 addresses, and from the pods, you’ll be able to ping and ping6 to external IPv4 and IPv6 sites, respectively.

 

Futures…

I haven’t played with external access to the cluster, and obviously there is work to do for the APIs and kube-proxy, along with changes to kubeadm (see the KEP for details).

I’m working on updating my Lazyjack tool that helps with provisioning Kubernetes on bare-metal nodes, so that it too can bring up dual-stack clusters. This will provide feature parity with k-d-c, only using separate physical nodes, instead of Kubernetes running on node containers (using Docker-in-docker) on a single host.

 

Category: Kubernetes | Comments Off on Dual-stack Kubernetes with kubeadm-dind-cluster
October 22

Lazyjack 1.3.2 New Features

Several new capabilities have been added to Lazyjack recently:

  • Supports IPv4 only mode, so clusters can be created with IPv4 addresses.
  • The kubeadm.conf file generated will use templates that are version specific. This allows easier customizing of the configuration easily. Supports Kuberenetes 1.10-1.13, although experiencing some issues using the alpha 1.13 setup.
  • Clusters can be configured for insecure mode, where init is not needed, and the config YAML file doesn’t have to be copied over to minions (as it is not updated with a token). This makes it easier to start up a cluster, by just running the prepare and up steps.

Time permitting, I hope to add dual stack capabilities.

 

Category: bare-metal, Kubernetes | Comments Off on Lazyjack 1.3.2 New Features
August 15

Lazyjack IPv6 Updates

v1.1

Since its introduction, and as of  V1.2.1 Lazyjack, several new capabilities have been added…

  • Support for PTP CNI plugin. User can specify “ptp” in config.yaml for “General: Plugin”, instead of the default “bridge” setting.
  • DNS64 configuration is stored in a volume, instead of host local file. This provides a more secure setup for the container.
  • Documentation updated to indicate how to use new capabilities, and how to customize cluster setup.
  • NAT64 dynamic IPv4 pool is configurable. The CIDR specified in “nat64: v4_cidr” of config.yaml can be adjusted to allow different subnets to be used, in case of conflicts.
  • Customizable MTU for pod/management network.  The “pod_net: mtu” setting in config.yaml can be used to set the MTU used.
  • Direct access to IPv6 external hosts without using DNS644 prefix. Setting `dns64: allow_aaaa_use` in config.yaml to “true” allows IPv6 capable external sites to be accessed directly.
  • Removed hard-coded Kubernetes version in kubeadm.conf template, so that user can specify version to be used.

Other features, like running kube-proxy in IPVS mode, or selecting CoreDNS as the DNS server, instead of kube-dns, can be enabled, by altering the kubeadm.conf file that is created by the “prepare” step, and then perform the “up” step. see the README.md file for more info.

Note: For security purposes, it is strongly recommended that you set “general: work-area” to an area that has access restricted. The default area, “/tmp”, could be prone to attacks, by users without the required permissions.

Category: Kubernetes | Comments Off on Lazyjack IPv6 Updates
July 20

How to recover (I think) from a botched Kubernetes update

v1.0

What Happened?

I was using KubeAdm v1.10 and wanted to give the latest Kubernetes from master a try. I (unfortunately) just updated the binaries for kubeadm, kubectl, and kubelet.  I restarted the kubelet daemon (“sudo sytemctl restart kubelet”), and then ran “kubeadm init” hoping to sit back and watch the new cluster come up.

First, I found that the config file needed a newer API version, so I changed that to use “kubeadm.k8s.io/v1alpha2”, instead of “kubeadm.k8s.io/v1alpha1”, and tried “kubeadm init” again.

Well, it failed to come up, and it looking at the issue, I found that the kubelet configuration file, /etc/systemd/system/kubelet.service.d/10-kubeadm.conf was referring to a kubelet config file that did not exist:

Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"

 

I had no clue how to create this file, nor why it wasn’t there.

 

What Should Have Been Done?

It looks like, going from v1.10 to a newer version, the upgrade procedures should be used. One needs to go from one minor release to minor release at at time (1.10 -> 1.11, 1.11 -> 1.12,…). In this process, one can use “kubeadm config migrate –old-config kubeadm.conf –new-config new-kubeadm.conf, to update the config file. You can then change the API version, and set the Kubernetes version, before using the config file in the update.

I’m not sure what you would do, if you didn’t have a running v1.10 cluster, as this method seems to imply that is needed. Maybe you’d end up in the same state as I was in.

 

What to do, if you didn’t do the upgrade?

From what I can tell, it appears that v1.11+ needs /var/lib/kubelet/config.yaml, which doesn’t exist in v1.10. It will get generated when kubeadm init is invoked, and removed when kubeadm reset is done. But, when I had a previous v1.10 install, it was not getting created during init, and with this file missing, init fails. That file was the key to try to recover from my mess.

On a fresh system install, I brought up Kubernetes v1.11, using KubeAdm and the same config file that I was using on the corrupt system. I took the config.yaml (this one is what I had, YMMV) that was created, and placed it on the system that was corrupted, and then brought up the cluster with “kubeadm init”.

The cluster came up OK with that change. Oddly enough, doing a “kubeadm reset” remokved the file, and it was recreated the next time I did “kubeadm init”. I’m not sure why, when I had v1.10 setup, and then switched to v1.11, the file was not created. In any case, I’m happy it is working now.

I did see another problem though, and I’m not sure if it is related. Once I brought up the master node, set the bridge CNI plugin config file, and untainted the node, I created some alpine pods. From the pod, I could not ping other pods on the node. Looking at the iptables rules, I was seeing this rule…

-P FORWARD DROP

instead of…

-P FORWARD ACCEPT

I flushed all the iptables rules, and then brought up the cluster with KubeAdm again, and after untainting and creating pods, everything seemed to work just peachy.

 

 

Category: Kubernetes | Comments Off on How to recover (I think) from a botched Kubernetes update
July 20

Using kube-router with kubeadm-dind-cluster

v1.0

What is kube-router?

One of the options for networking in Kubernetes, is to use kube-router. This plugin uses the Bridge CNI plugin and go BGP to provide networking for the cluster.

Why use it?

With kube-router, it uses the IPVS (IP Virtual Server) kernel module, instead of iptables rules. This gives much better performance (hash vs serial lookups) and scales much better. Kube-router also uses goBGP, to provide full mesh connectivity using iBGP, instead of requiring static routes, when using the bridge plugin.

What is kubeadm-dind-cluster?

The kubeadm-dind-cluster tool that I’ve mentioned here before, allows you to create a Kubernetes cluster on single host (VM or bare-metal), by using Docker-in-Docker (it creates docker containers, which will be nodes where KubeAdm is invoked to bring up a cluster).

This tool is nice, because it saves you from doing all the tedious steps of setting up a cluster using KubeAdm manually. There are instructions in the kubeadm-dind-cluster repo, on how to use the tool to bring up a cluster.  The tools supports the bridge, calico, flannel, and weave CNI plugins.

Peanut Butter and Chocolate…

I have a PR out in kubernetes-sig/kubeadm-dind-cluster repo to add support for using kube-router, instead of kube-proxy. To use this, you can perform the following steps (assuming you have a kubeadm-dind-cluster repo pulled):

  1. Patch in the PR changes
    1. git fetch origin pull/159/head:pr159
    2. git log –abbrev-commit pr159 –oneline -n 1 | cut -f 1 -d” “
    3. git cherry-pick <# from log output>
  2. build/build-local.sh
  3. export DIND_IMAGE=mirantis/kubeadm-dind-cluster:local
  4. export CNI_PLUGIN=kube-router
  5. Bring up the cluster “./dind-cluster.sh up”

This will skip the normal bridge CNI plugin setup and creation of static routes, run a YAML file to configure the bridge CNI plugin and startup kube-router pods on each node (which will start up BGP), and will then remove the kube-proxy daemonset.

Once the cluster is up, you can “kubectl exec” into one of the kube-router pods to see the BGP and IPVS setup.

After the PR (159) is upstreamed, you’ll only need to set the CNI_PLUGIN to kube-router, and then bring up the cluster.

 

Limitations

Currently, this only works with IPv4. Although ipset and goBGP support IPv6, kube-router is not set up to run in IPv6 mode. There is a PR to add IPv6 support.

Category: Kubernetes | Comments Off on Using kube-router with kubeadm-dind-cluster