bare-metal – Page 2 – Off The Record

Part II: Preparing the Raspberry PI

January 29, 2024

Assembly and OS Install

For the very first PI4 that I bought, I got the CanaKit, so it had a plastic enclosure, a power adapter with switch, and a 126 GB SD card. With this system, I connected a mouse, keyboard, and HDMI monitor, and used the Raspberry PI imager app on my MacBook to install Ubuntu server (22.04) on the SD card. I booted up and made sure everything worked.

Since I’m using these UCTRONICS trays, I followed these steps to partially assemble the unit. I connected the SSD drive’s SATA connector to the SATA shield card, and screwed the SSD drive to the side of the tray. Next, I installed the SD adapter into the SATA shield card. I aligned the Raspberry PI4 onto the posts and screwed it on using the threaded standoffs for the PoE+ hat. I inserted the SD adapter into the SD slot of the PI, and connected the OLED and power switch cables from the SATA shield card to the front panel. Lastly, I connected the USB jumper from the SATA shield card to the Raspberry PI4 card for the SSD drive connection.

I reused the power supply from my CanaKit, and connected the display and keyboard to the Raspberry PI. I think I could connect power to either the USB-C connector on the PI or on the SATA shield card. An alternative would be to attach the PoE+ hat, but then I’d have to connect ethernet to the PoE+ powered hub, and that was down in the rack, where I didn’t have an HDMI monitor. So I skipped that part, as I was doing this in the study. I connected an ethernet cable and powered on the unit. The RPI will display an install screen, where you can press the SHIFT key to start a net boot.

This will download the installer image and then restart. You’ll eventually get an installer screen like what the PI imager has on the Mac/PC. It is easiest to use a mouse, but if you don’t have one, you can press the TAB key to advance to the next field, and ENTER to select.

You’ll want to select the model of Raspberry PI (4), select the OS (I used Ubuntu 23.10 64 bit – in the past it was 22.04), and then select the storage device.

You should see the SSD drive listed (Samsung 1TB drive in my case), and select it. Click the next button, and select to edit the configuration. Enter the host name, enable SSH with password authentication (for now), and choose a username and simple password (for now). I set the time zone as well. Here is an (older installer) screen shot of some settings.

Lastly, click on SAVE, click on YES to use the changes, and then click on the WRITE button. Again, here is an older screen shot, where there was no PI model button, and the configuration (gear icon) was on the same screen.

While the installer was running, I checked on my router for the hostname, and made a DHCP reservation for the final desired IP I wanted for that system. I also created a HOSTNAME.home DNS entry for this node.

At one point, it rebooted and eventually displayed that cloud init was done. I pressed ENTER and was able to log in. I shut down the system, and then started it back up, so that it would pick up the desired IP address that I set up on my router.

Node Setup

To make setup easier, I created an SSH key on this system, so that I could SSH in from my MacBook and do all the rest from there, without needing the display and keyboard connected. I SSHed into the system, created a key with:

ssh-keygen -t ed25519

and then I copied the public key to all the other nodes and systems so that I can easily get into the system. I also added this new system to the ~/.ssh/config file that I use on other systems, so that I can ssh using the host name. I set the login password to what I really wanted. Make sure that you can ssh into each node, without using a password.
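As a rough sketch of the ~/.ssh/config step, a small helper like the one below could append a Host entry for a new node. The function name add_ssh_host, the node name, the user name, and the address are all made up for illustration; Host, HostName, User, and IdentityFile are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper: append a Host block to an ssh client config file.
# Usage: add_ssh_host NAME ADDRESS USER [CONFIG_FILE]
add_ssh_host() {
    name=$1; address=$2; user=$3
    config=${4:-$HOME/.ssh/config}
    # Do nothing if an entry for this host is already present.
    if [ -f "$config" ] && grep -q "^Host $name\$" "$config"; then
        echo "entry for $name already present in $config"
        return 0
    fi
    cat >> "$config" <<EOF
Host $name
    HostName $address
    User $user
    IdentityFile ~/.ssh/id_ed25519
EOF
}

# Example (illustrative values only, left commented out on purpose):
#   add_ssh_host apollo 10.11.12.198 me
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub apollo   # install the public key on the node
#   ssh apollo                                     # should now log in without a password
```

The helper only appends, so an existing config is left alone and running it twice for the same host is harmless.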

Rather than rely on a DHCP reservation, I changed /etc/netplan/50-cloud-init.yaml to assign a static IP, set the router IP, and set DNS servers and domain name (I purchased a domain name and used dynamic DNS to point it to my router’s external IP – using the router’s capability to keep the dynamic IP updated). Here is an example 50-cloud-init.yaml:

network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses:
        - 10.11.12.198/24
      routes:
        - to: default
          via: 10.11.12.1
      nameservers:
        addresses: [10.11.12.1, 208.67.220.220]
        search: [MY.DOMAINNAME]

This was done on each Raspberry PI, followed by “sudo netplan apply” to update the node. Just be sure to do this via the console, otherwise you’ll lose the connection if it is done from an ssh session.
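Since the same file is needed on every node with only the address changing, one way to avoid hand editing is to generate it. The sketch below prints a 50-cloud-init.yaml like the one above for a given address. render_netplan is a hypothetical helper, not part of netplan, and the gateway, second DNS server, and search domain are simply the example values from this post.

```shell
#!/bin/sh
# Hypothetical helper: print a netplan config with a static IP for eth0.
# Usage: render_netplan ADDRESS/PREFIX GATEWAY SEARCH_DOMAIN
render_netplan() {
    cat <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      addresses:
        - $1
      routes:
        - to: default
          via: $2
      nameservers:
        addresses: [$2, 208.67.220.220]
        search: [$3]
EOF
}

# Print the config for the example node; redirect it into place as root if it looks right.
render_netplan 10.11.12.198/24 10.11.12.1 MY.DOMAINNAME
```

If the change has to be made over SSH, “sudo netplan try” may be worth a look in place of “sudo netplan apply”, as it rolls the configuration back unless it is confirmed within a timeout.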

Of course, there are several other things that need to be set up on a Raspberry PI, like:

Setting the domain name for the node.
Setting up the OLED display and power switch, since I’m using UCTRONICS tray.
Changing kernel settings.

However, since I’ll be using kubespray to provision the cluster, and kubespray uses ansible, ansible playbooks can be used to provision all the nodes in a more automated fashion. This will be detailed in Part IV, but the next step (Part III) is to do some (optional) partitioning of the SSD drive.

Side Bar

Alternative to netbooting and imaging

On some older units, I went through all sorts of contortions to install an OS on the SD card using the Raspberry PI imager on my MacBook, boot, update the rpi-eeprom, and set things up to net boot. There is a bootconfig.txt file that can be used to update the bootloader for enabling net boot. I had this entry in there:

BOOT_ORDER=0xf241

The digits are read from right to left: SD (1), USB (4), network (2), and then start over and retry each (f).

Though I’ve seen 0x41 as well. It was really messy. This new loader is much easier.
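To make the BOOT_ORDER value easier to read, here is a small sketch that decodes it one hex digit at a time, starting from the right. The function name is made up, and it only knows the four digits mentioned in this post; anything else is printed as unknown rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: list the boot modes in a BOOT_ORDER value in the
# order they are tried (the rightmost hex digit is tried first).
decode_boot_order() {
    digits=${1#0x}
    out=""
    while [ -n "$digits" ]; do
        rest=${digits%?}        # everything except the last digit
        d=${digits#"$rest"}     # the last digit
        case "$d" in
            1) name="SD" ;;
            2) name="NETWORK" ;;
            4) name="USB" ;;
            f|F) name="RETRY" ;;
            *) name="unknown($d)" ;;
        esac
        out="${out:+$out }$name"
        digits=$rest
    done
    printf '%s\n' "$out"
}

decode_boot_order 0xf241   # prints: SD USB NETWORK RETRY
decode_boot_order 0x41     # prints: SD USB
```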

My SSD drive is not seen at netboot!

I hit a case with some newer SSD drives (a Crucial 2TB BX500 drive), where the drive did not show up in the installer’s list of storage devices to select. To get around this I had to do the following…

I attached the SSD drive to my Mac, using a SATA III adapter, and used the Raspberry PI imager to image the disk (using the same settings as explained above).

Then, I attached the SSD drive to the Raspberry PI’s USB3 port (with the UCTRONICS, I installed the drive in the tray, connected it to the SATA Shield card, and used the provided USB3 jumper). I made sure there was no SD card installed, connected an Ethernet cable, and started up the Raspberry PI, making sure it booted from the SSD drive.

I have had cases where I had to boot from the SD card running Raspberry PI OS (imaged on a Mac or via net booting the RPI), and run these commands, to update the OS, EEPROM, and bootloader…

sudo apt update
sudo apt full-upgrade
sudo rpi-update
sudo rpi-eeprom-update -d -a

I then rebooted, so that the new firmware was activated, and then tried to boot using the SD card and see if the SSD drive was visible (sudo fdisk -l), and then boot without the SD card, hoping it would boot from the SSD drive.

I also tried netbooting to the installer, and instead of choosing an OS, I chose Utility Apps, Bootloader, and picked the option to boot to USB. I wrote that to the SD storage device, it rebooted, and I checked to see if it booted to the SSD. If it did, I would shut down and restart, without the SD card.

It was a bit of a mess trying to get it working. This happened recently, when I wanted to set up two more PI 4s. For one, I had to jump through these hoops, and the other one worked after I imaged the SSD on the Mac. I guess the PI4 bootloader should boot from USB, out of the box. For some reason, I hit one that did not.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

December 25

Part I: Raspberry PI Kubernetes Cluster Goals

December 28, 2023

Goals

Currently, I have a few old tower-based Linux servers, running services (VPN, file server, Emby music server, a custom app for monitoring my photovoltaic system, etc). I had started to adapt several of these to run in containers, so that I could move them around if a system failed, especially since the systems were getting quite old.

In addition, I started to buy some Raspberry PIs so that I had newer technology and hosts that used much less power than my old gear, and I could place these containers on the PIs.

Since I worked with Kubernetes development for several years, before retiring, I decided to build a cluster so I had a way to spread the workload, easily move pods around upon failures, monitor and manage the system, and scale it out as I get more Raspberry PIs.

For the initial design, I have five Raspberry PIs right now, though one is currently hosting a bunch of containers. The plan is to put four of them into service right now, and then, once I have my containers migrated over to the cluster, add the fifth system. I just got a sixth one for Christmas, so I’ll be adding that in soon.

General Design

For the hardware, I’m using Raspberry PI 4s with 8 GB RAM (~$85 each), and I have the PoE+ Hats (~$28), so that I can power them off of the PoE based ethernet hub I have (LinkSys LGS116P 8 regular ports, 8 PoE ports ~$120). I purchased a bunch of Samsung 1 TB SSD drives (870 EVO ~$50). Probably should have gotten 2 TB or larger.

I found a really cool product from UCTRONICS (model B0B6TW81P6 ~$290), which consists of a 1U rack mounted enclosure that holds four Raspberry PI4s, each in a removable tray. There are two fans inside as well. Since I needed one or two more PIs, I also found just a face plate (model RM1U ~ $11) and individual tray units (RM1U-3 ~$76 each). Here is a picture of the enclosure on the bottom, and the face plate on the top.

Each of the tray units has a “SATA shield” logic board with a SATA connector for the SSD drive on the bottom, and a USB connector on the top that can be connected to the USB3 port of the PI using the provided connector and a jumper.

There is a front panel with an LCD that can display the IP address, CPU temp, disk usage, and usable RAM (driven by a small python app that can be tweaked). There are SD and SSD activity LEDs on the shield card that are visible, and they provide a jumper cable for the SD card of the PI so that it is accessible from the front panel (via the shield card). Lastly, there is a power switch, so that you can do a clean shutdown.

There is room for the PI’s PoE Hat and a fan connector on the shield card, so that you can attach one of the fans in the enclosure to one of the PIs. The face plate is made for a Model 4B PI. The enclosure is pricey, but a really great way to place these into a rack, have an SSD drive connected, and be able to cleanly shut down the units.

The RM1U is not enclosed, and there are no fans, but I wasn’t concerned, as this unit would be in the basement, which is cool year round. I don’t know whether UCTRONICS will make something for the Raspberry PI 5s or how long they will make these rack mounts and enclosures, but it was a nice way for me to bundle things up.

Each of the Raspberry PIs will have a fixed IP address and a unique name (versus having node1, node2,…). I chose to have my router reserve IP addresses, outside of the range used for DHCP. Alternatively, you could configure each PI with a static IP address.

Part II will discuss how to prepare the PIs for cluster use.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

November 29

Dual-Stack Kubernetes on bare-metal with LazyJack

v1.0

Preliminary support has been added to Lazyjack as of 1.3.5! Now, as of Kubernetes 1.13, the KEP for dual-stack is still under review, and only a few changes have been made to the code, but you can bring up a cluster in dual-stack mode. You will only see one family of IPs for pods displayed via “kubectl get pod”, but if you look on the pods, you will see both IPv4 and IPv6 addresses.

I’ve already updated kubeadm-dind-cluster to support dual-stack for clusters brought up on a single node, using docker-in-docker, but now Lazyjack supports this too, on bare-metal nodes.

The config.yaml file for Lazyjack will have these changes:

A second CIDR can be specified for the management and pod networks, by using the “cidr2” field, under the respective sections. You can specify one family under “cidr” and one under “cidr2”.
The service network CIDR will specify which family is used for the service network. The KEP only supports a single IP family for service networks at this time.
Omit the DNS64 and NAT64 sections, which are not used in dual-stack mode.
The ‘dns64’ and ‘nat64’ operational modes are not specified under the opmodes field of any node.

Here is an example config that is using IPv6 for the service network:

general:
    mode: dual-stack
    plugin: ptp
    insecure: true
    kubernetes-version: "v1.13.0-alpha.3"
    work-area: "/home/c2/bare-metal/work-area"
topology:
    minion1:
        interface: "enp10s0"
        opmodes: "minion"
        id: 2
    minion2:
        interface: "enp9s0"
        opmodes: "minion"
        id: 3
    my-master:
        interface: "enp10s0"
        opmodes: "master"
        id: 4
mgmt_net:
    cidr: "10.192.0.0/16"
    cidr2: "fd00:20::/64"
pod_net:
    cidr: "10.244.0.0/16"
    cidr2: "fd00:40::/72"
service_net:
    cidr: "fd00:30::/110"

Category: bare-metal, Kubernetes | Comments Off

October 22

Lazyjack 1.3.2 New Features

Several new capabilities have been added to Lazyjack recently:

Supports IPv4 only mode, so clusters can be created with IPv4 addresses.
The kubeadm.conf file generated will use templates that are version specific. This makes customizing the configuration easier. Supports Kubernetes 1.10-1.13, although I am experiencing some issues using the alpha 1.13 setup.
Clusters can be configured for insecure mode, where init is not needed, and the config YAML file doesn’t have to be copied over to minions (as it is not updated with a token). This makes it easier to start up a cluster, by just running the prepare and up steps.

Time permitting, I hope to add dual stack capabilities.

Category: bare-metal, Kubernetes | Comments Off

June 6

IPv6 Kubernetes – Improving External Access Performance

v1.2 – June 14th 2018

Summary

With current Kubernetes IPv6 only clusters (v1.9.0+), a brute force approach was taken, to deal with the outside world. Since there are some external sites that are IPv4 only, Kubernetes was set up with a NAT64 and DNS64 server to treat all external destinations as IPv4 only.

Here, we’ll talk about ways to more intelligently handle external sites, using IPv6 access, when possible. The result is an improvement in performance, both in space and time.

What We Have Today

Let’s use an example of a pod on a minion node of a three node, bare-metal, IPv6 only Kubernetes cluster, trying to ping google.com.

First, the pod requests a lookup of the destination name, to obtain the IP address. Since not all destinations support IPv6 (e.g. github.com), the DNS64 server in the cluster is configured to always use the A record (IPv4) and ignore any AAAA record (IPv6). The IPv4 address will be embedded into a synthesized IPv6 address, using the configured prefix. In this example, the address 216.58.217.78 is combined with the fd00:10:64:ff9b:: prefix to get fd00:10:64:ff9b::d83a:d94e.
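The address synthesis is just the four IPv4 octets written as hex after the prefix. As an illustration (the function name is made up, and it assumes the prefix is a /96 written with a trailing “::”, as in this example):

```shell
#!/bin/sh
# Hypothetical helper: embed an IPv4 address into a /96 DNS64 prefix.
# Usage: synthesize_dns64 PREFIX IPV4   (PREFIX must end in "::")
synthesize_dns64() {
    prefix=$1
    ipv4=$2
    # Split the dotted quad into the positional parameters $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $ipv4
    IFS=$old_ifs
    # Each pair of octets becomes one 16-bit group of the IPv6 address.
    printf '%s%02x%02x:%02x%02x\n' "$prefix" "$1" "$2" "$3" "$4"
}

synthesize_dns64 fd00:10:64:ff9b:: 216.58.217.78   # prints: fd00:10:64:ff9b::d83a:d94e
```

The output keeps leading zeros in each group (for example 0a01 for 10.1), which is still a valid way to write an IPv6 address.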

The pod (fd00:40::3:0:0:4e7) will then send a ping request out its interface (to fd00:10:64:ff9b::d83a:d94e), as shown at (A) in the diagram below.

The ping request will cross the local bridge, br0, and the routing table on the node will direct the packet, over the pod network, to the master node. The packet will be sent (B) from the minion node’s eth1 interface (fd00:20::3) to the master node’s pod network interface (eth1). The route on the master node, will direct the packet to the NAT64 server (a container), over the veth interface.

The NAT64 server (C) creates a mapping of the source IPv6 address (at this point, the minion node’s pod network interface, fd00:20::3) to a private IPv4 address (172.18.0.53) from a locally maintained pool. It will extract the destination IPv4 address (216.58.217.78) and send the ping to the master node (D), where iptables employs SNAT to map the private IPv4 address to the node’s IPv4 address (e.g. 10.1.1.2).
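The extraction step is the reverse of the synthesis: the last 32 bits of the destination address are the IPv4 address. A minimal sketch of that arithmetic follows (this is not how the NAT64 server is implemented; the function name is made up, and it assumes the last two groups of the address are both written out):

```shell
#!/bin/sh
# Hypothetical helper: recover the IPv4 address from the last two 16-bit
# groups of a synthesized IPv6 address.
extract_ipv4() {
    addr=$1
    low=${addr##*:}     # last group
    rest=${addr%:*}     # address without the last group
    high=${rest##*:}    # second to last group
    # Pad each group to four hex digits so it can be split into two octets.
    high=$(printf '%04x' "0x$high")
    low=$(printf '%04x' "0x$low")
    printf '%d.%d.%d.%d\n' "0x${high%??}" "0x${high#??}" "0x${low%??}" "0x${low#??}"
}

extract_ipv4 fd00:10:64:ff9b::d83a:d94e   # prints: 216.58.217.78
```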

Finally, the packet is sent out the main interface (E) to the next hop, which would also do SNAT for this local IPv4 address.

The ping response would follow the reverse route through the NAT64 server, to the minion node, and finally the pod.

Improvements For IPv6 External Sites

We can, however, configure the DNS64 to allow AAAA records to be used, for external destinations that support IPv6 addressing.

In this example, the DNS lookup would return the AAAA record for google.com (2607:f8b0:4004:801::200e) and the pod shown at (A) would send a ping to that address, as shown in the diagram below.

The ping request would traverse the local bridge, br0, and the routing table on the minion node would direct the packet out the main interface (eth0), and using SNAT, would use the IP of the node as the source address (2001:db8::100), as shown at (B). The packet would be sent to the next hop, where SNAT may occur, if the minion node’s IPv6 address is not public.

The ping response would follow the reverse route, into the minion node, and to the pod.

This avoids sending the packets to the master node’s NAT64 server, where translation and mapping is performed, both a time and space savings (no mapping table needed).

Bare Metal Implementation Details

The Lazyjack tool has been modified (in v1.1.0+) to allow the user to specify whether or not destinations that support IPv6 addressing can be directly accessed, without using NAT64.

Under the dns64 section in the config.yaml, there is a new entry titled “allow_aaaa_use”, which if set to “true”, will use the AAAA records from DNS64 and directly access external IPv6 addresses. If omitted, or set to “false”, the existing mechanism of using only the A DNS record and performing NAT64 on all packets for external destinations is used.

Before using Lazyjack, the nodes of the cluster must be provisioned for IPv6. On each node, this includes:

Enabling IPv6 and IPv6 forwarding on main interface.
Giving the main interface (with Internet access) an IPv6 address (we used SLAAC).
Having a default IPv6 route that sends traffic out the main interface (done via SLAAC).
To preserve the default route, set sysctl accept_ra with a value of two. For example:

sudo sysctl -w net.ipv6.conf.eth0.accept_ra=2

KubeAdm-dind-cluster (DinD) Implementation Details

As of PR 148 merging, the Kubeadm-dind-cluster tool (note the new repo location) for provisioning clusters has been updated to allow the user to enable the ability to use (IPv6) AAAA records for DNS lookups, so that unaltered IPv6 addresses can be used, rather than forcing the use of (IPv4) A records and requiring DNS64 to be used. This new capability can be enabled by setting the environment variable, DIND_ALLOW_AAAA_USE=true.

The k-d-c tool will then use a modified DNS64 configuration, and create the needed ip6tables entries on the host to allow forwarding of packets to the kubeadm-dind-net bridge, and perform SNAT for outgoing packets.

You can check the PR, and once merged, use the latest code on the master branch.

Category: bare-metal, Kubernetes | Comments Off

March 14

KubeAdm with Local Kubernetes Repo for IPv6

V1.4

Goal

In working with my lazyjack tool to create IPv6 based Kubernetes clusters on bare metal systems, I wanted to run the latest code on master (1.10.0-beta.2 at this time) to run E2E tests and possibly tweak things. So, I needed to be able to run KubeAdm with my own repo, instead of using something prebuilt from upstream.

In addition, I wanted to make sure that I could do some customizing with lazyjack.

Preparation

As described in my blog post on lazyjack, I have a three node, bare-metal setup, with a second interface connected to a physical switch, for the Kubernetes management/pod network. The lazyjack tool is installed on the nodes and I have a config.yaml with the network topology, including all IP addresses desired. Development tools (go, git, etc) are installed on the node used as the master node.

I’m using Ubuntu 16.04 on each of my systems.

Next, I pulled down the latest Kubernetes code:

mkdir -p ~/go/src/k8s.io
cd ~/go/src/k8s.io
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes

Now, we are ready to set things up to use this repo for the Kubernetes cluster.

Steps

Repo Prep

I checked out a branch that had the version I wanted. At the time of this writing, beta 2 of 1.10 was available:

cd ~/go/src/k8s.io/kubernetes
git checkout -b release-1.10 origin/release-1.10
git checkout v1.10.0-beta.2

Building/Installing

Next, I built everything and installed the kubectl, kubeadm, and kubelet binaries, so that we’re using the latest for everything. I then restarted kubelet to get the new version running:

make clean
make
make release

cd _output/bin/
sudo cp kubeadm kubectl kubelet /usr/bin/
sudo systemctl daemon-reload
sudo systemctl restart kubelet

I copied these three binaries over to the two minion systems, placed them in /usr/bin, and restarted kubelet to update them as well.

For the release images, they are in TAR files, which can be loaded into docker:

cd ~/go/src/k8s.io/kubernetes/_output/release-images/amd64
for f in *.tar; do docker load -i "$f"; done

Startup Local Registry

The docker daemon needs to know about an insecure registry that is being created on one of the nodes. Create the following file on each system:

sudo tee /etc/docker/daemon.json > /dev/null <<EOT
{
    "insecure-registries": ["10.86.7.77:5000"]
}
EOT
sudo systemctl restart docker

I guess I could have used the management IP for the master node ([fd00:20::2]), but I decided to use the admin interface IP of the machine on the lab network. In this case, it was 10.86.7.77. You would replace this, with your master node’s IP address.

On the master node, you want to start the local registry:

docker run -d -p 5000:5000 --restart always --name registry registry:2

Tagging images

Note: If there is an easier way than this (like some make target), please let me know…

After making the release and doing a docker load for each tar file, there are a bunch of images in the local registry. We need to tag these images with our master node’s IP and port 5000 (10.86.7.77:5000 in my case). This is a bit complicated, as most of the images built do not have the -amd64 suffix and the tag used has an underscore, which doesn’t play well in the sandbox. Hopefully there is an easier way, but this is what I did…

I first found out the list of images, by doing “docker images” to find out the tag that was created (e.g. v1.10.0-beta.2.17_3d19fe4010c246-dirty). Here’s the list of images:

k8s.gcr.io/hyperkube-amd64
gcr.io/google_containers/kube-apiserver
k8s.gcr.io/kube-apiserver
gcr.io/google_containers/kube-controller-manager
k8s.gcr.io/kube-controller-manager
gcr.io/google_containers/cloud-controller-manager
k8s.gcr.io/cloud-controller-manager
k8s.gcr.io/kube-aggregator
gcr.io/google_containers/kube-aggregator
gcr.io/google_containers/kube-scheduler
k8s.gcr.io/kube-scheduler
gcr.io/google_containers/kube-proxy
k8s.gcr.io/kube-proxy

With that, I could filter by the tag and build up the commands needed to tag these images for my local repo (10.86.7.77:5000). You can see that I had to strip out the registry name to just have the image name part for the local repo tag:

docker images \
    --format="docker tag {{.Repository}}:{{.Tag}} 10.86.7.77:5000/{{.Repository}}:v1.10.0-beta.2" | \
    grep v1.10.0-beta.2.17_3d19fe4010c246 | \
    sed -e "s?5000/gcr.io/google_containers?5000?" | \
    sed -e "s?5000/k8s.gcr.io?5000?" > x
chmod +x x

You can do the same as above, only replace the tag (e.g. v1.10.0-beta.2.17_3d19fe4010c246-dirty) with what you have for a tag. Also, since we need images with the -amd64 suffix, the same command can be tweaked to create those tags too:

docker images \
    --format="docker tag {{.Repository}}:{{.Tag}} 10.86.7.77:5000/{{.Repository}}-amd64:v1.10.0-beta.2" | \
    grep v1.10.0-beta.2.17_3d19fe4010c246 | \
    sed -e "s?5000/gcr.io/google_containers?5000?" | \
    sed -e "s?5000/k8s.gcr.io?5000?" > y
chmod +x y

Before invoking “y”, you need to remove the “hyperkube-amd64” line from it (probably the first one), as that image already has the right suffix. Now, you can invoke these two files and create all the tags needed. I don’t know if you need the tags in file “x” (other than the hyperkube-amd64 one), so you could try skipping that file and just run its first line, along with running file “y”. In my Kubernetes cluster, I think all the images had the -amd64 suffix.
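For what it is worth, the registry stripping and the -amd64 suffix handling can also be done in one pass with shell parameter expansion instead of sed, which avoids hand-editing the hyperkube-amd64 line. This is an alternative sketch, not what I ran at the time; make_tag_cmds is a made-up name, it only produces the -amd64 flavor of the tags, and it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Values from this post; replace with your own registry and version.
REGISTRY="10.86.7.77:5000"
VERSION="v1.10.0-beta.2"

# Hypothetical helper: read "repository:tag" lines on stdin and print one
# "docker tag" command per line, without running anything.
make_tag_cmds() {
    while IFS= read -r image; do
        repo=${image%:*}    # drop the ":tag" part
        name=${repo##*/}    # drop the registry and path, keep the image name
        case "$name" in
            *-amd64) ;;                  # already has the suffix
            *) name="$name-amd64" ;;
        esac
        printf 'docker tag %s %s/%s:%s\n' "$image" "$REGISTRY" "$name" "$VERSION"
    done
}

# Sample input; in practice the lines could come from something like:
#   docker images --format '{{.Repository}}:{{.Tag}}' | grep v1.10.0-beta.2.17_3d19fe4010c246
printf '%s\n' \
    'k8s.gcr.io/hyperkube-amd64:v1.10.0-beta.2.17_3d19fe4010c246-dirty' \
    'gcr.io/google_containers/kube-proxy:v1.10.0-beta.2.17_3d19fe4010c246-dirty' |
    make_tag_cmds
```

Once the printed commands look right, the output can be piped to sh to actually create the tags.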

Pushing Images

With everything tagged, you can then push these images to your local repo. I did this command, to first check that I have the right syntax for my tags:

docker images | grep 10.86.7.77:5000
10.86.7.77:5000/hyperkube-amd64 v1.10.0-beta.2 351430a5275d 3 days ago 633 MB
10.86.7.77:5000/kube-apiserver v1.10.0-beta.2 2e8a0bd89199 3 days ago 224 MB
10.86.7.77:5000/kube-apiserver-amd64 v1.10.0-beta.2 2e8a0bd89199 3 days ago 224 MB
10.86.7.77:5000/kube-controller-manager v1.10.0-beta.2 80ea4fb85ccb 3 days ago 147 MB
10.86.7.77:5000/kube-controller-manager-amd64 v1.10.0-beta.2 80ea4fb85ccb 3 days ago 1
...

If that looks good (registry, image name, and tag), create and invoke the following file in order to push them up:

docker images --format="docker push {{.Repository}}:{{.Tag}}" | \
    grep 10.86.7.77:5000 > z
chmod +x z
./z

Don’t Forget These Images…

Once I tried this all out, I hit a problem with missing images. It turns out that KubeAdm is using some older versions of etcd (3.2.16) and kube-dns (1.14.8) images. To be sure I had these in my local registry (because everything will come from there), I needed to handle these specially:

docker pull gcr.io/google_containers/etcd-amd64:3.2.16
docker tag gcr.io/google_containers/etcd-amd64:3.2.16 10.86.7.77:5000/etcd-amd64:3.2.16
docker push 10.86.7.77:5000/etcd-amd64:3.2.16
docker tag gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.8 10.86.7.77:5000/k8s-dns-kube-dns-amd64:1.14.8
docker push 10.86.7.77:5000/k8s-dns-kube-dns-amd64:1.14.8
docker tag gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.8 10.86.7.77:5000/k8s-dns-dnsmasq-nanny-amd64:1.14.8
docker push 10.86.7.77:5000/k8s-dns-dnsmasq-nanny-amd64:1.14.8
docker tag gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.8 10.86.7.77:5000/k8s-dns-sidecar-amd64:1.14.8
docker push 10.86.7.77:5000/k8s-dns-sidecar-amd64:1.14.8

As you can see, I didn’t have the etcd image, and had to pull it first. You can check with “docker images” to see if you have to pull any images, before tagging and pushing. Note: For some systems, I’ve had to pull from k8s.gcr.io instead of gcr.io/google_containers.

The versions of etcd and kube-dns that KubeAdm uses are hard coded in the code, so I had to find out by trial and error. Looking at logs, I noticed that these pods were not coming up, and saw what image versions were being tried in the pulls (e.g. 10.86.7.77:5000/etcd-amd64:3.2.16). I would then do a pull of that version from k8s.gcr.io or gcr.io/google_containers and push it up to my local repo.

Bringing Up The Cluster

To bring things up quickly, I’m using a current release (v1.0.6) of my lazyjack tool, but you can do the same manually, if you’re a masochist :). I installed it in /usr/local/bin so that it is in my path.

I created a config.yaml for my setup to represent the topology and addresses that I wanted to use (see lazyjack README.md for details on this file).

For this effort, we need to customize the generated kubeadm.conf file. As a convenience, I overrode the default work area to point to an area under the account I was using (though you can use the default /tmp/lazyjack/ area, if desired):

general:
   plugin: bridge
   work-area: "/home/c2/bare-metal/work-area"

I ran “sudo lazyjack init” to create the certificates needed and place them into the config.yaml file. I copied this updated YAML file to the minion nodes, which also have lazyjack installed.

Next, I ran “sudo lazyjack prepare” on the master, and each of the minion nodes. There should be bind9 and tayga containers running on the master, for the DNS64 and NAT64 servers, respectively. This step will also create a kubeadm.conf file in the work area.

To bring up the cluster with the local repo, we need to edit that file to change the kubernetesVersion line to the version tag we are using (instead of 1.9.0 that the tool created), and add the imageRepository line pointing to our repo. In this example, I used:

kubernetesVersion: v1.10.0-beta.2
imageRepository: 10.86.7.77:5000
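If you end up doing this more than once, the edit can be scripted. The sketch below is only an illustration: patch_kubeadm_conf is a made-up helper, and it assumes both settings are unindented, top-level lines in the file, as shown above.

```shell
#!/bin/sh
# Hypothetical helper: set kubernetesVersion and imageRepository in a kubeadm.conf.
# Usage: patch_kubeadm_conf FILE VERSION REPOSITORY
patch_kubeadm_conf() {
    conf=$1; version=$2; repo=$3
    tmp="$conf.tmp"
    # Rewrite the version line and drop any imageRepository line left from a previous run.
    sed -e "s|^kubernetesVersion:.*|kubernetesVersion: $version|" \
        -e '/^imageRepository:/d' "$conf" > "$tmp"
    # Add the repository line at the end of the file.
    printf 'imageRepository: %s\n' "$repo" >> "$tmp"
    mv "$tmp" "$conf"
}

# Demo on a scratch file, so nothing real is modified.
demo=$(mktemp)
printf 'kubernetesVersion: 1.9.0\n' > "$demo"
patch_kubeadm_conf "$demo" v1.10.0-beta.2 10.86.7.77:5000
cat "$demo"
rm -f "$demo"
```

For the real file, the first argument would be the kubeadm.conf in the work area, and the call would likely need to run as root, since the file was created with sudo.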

Now, on the master, I ran “sudo lazyjack up”. This takes a few minutes, and the output should indicate success and provide the lines to set up kubectl:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run kubectl and make sure that all the pods and services are up and running. Then, you can run the “up” command on the other nodes and check with kubectl that the nodes are “ready” and the additional proxy pods are running.

You should be able to then use your IPv6 based cluster, running code from your local repo!

Issues

Please be sure to use docker version 17.03, as versions 17.06, 17.09, 17.12, 18.03, and 18.04 are showing that, even with the host enabling IPv6, any containers created have IPv6 disabled. The effect seen is that the kube-dns pod is stuck in “creating” state, and logs show that the CNI plugin is failing to add an IPv6 address to the pod (permission denied error). The same is true for any user-created pods that use the pod network and need an IPv6 address from the CNI plugin. This is discussed in this CNI issue and in this docker issue.

Category: bare-metal, Kubernetes | Comments Off

March 6

Istio on IPv6 Kubernetes – Undiscovered Country

V1.2

Overview

Since I was able to get a Kubernetes cluster running with IPv6 only on bare metal, the next logical step was to give a go at trying to bring up Istio. Like the Star Trek movie, this was something untried, and my goal in this blog is to document my efforts to try Istio on IPv6 as a Proof of Concept (PoC). Spoiler: I have it working, but the road to make this a reality, will take quite a few code changes.

Update: I presented a summary of IPv6 readiness at the Istio Community Meeting on 3-22-18 (at the 15:55 mark).

Assumptions

This isn’t for the faint of heart, but I’ll try to make it as cookbook as possible (granted, I need to verify this on a fresh setup, in case my memory failed me on some steps). That said, I expect that you have bare metal systems available, network topology set up, needed tools installed (e.g. Go, GIT, docker, lazyjack), accounts set up on github.com and hub.docker.com, and have installed the Kubernetes that you want to use (1.9+).

For my setup, I have three Ubuntu 16.04 machines, each with IPv4 access to the outside, and a separate interface connected to a switch for Kubernetes management and pod networks. Go is at 1.9.2. I cloned the Kubernetes master branch on February 14th, 2018 (commit f33e0b3), and built all the needed apps. Kubectl, kubeadm, and kubelet are at v1.9.3 and placed in /usr/bin/. It’s not critical to have the latest and greatest here, as long as it is 1.9+ code.

For Istio, I tried to use the path of least resistance and decided to use NodePort, instead of LoadBalancer, and not to use authentication. I plan on trying MetalLB, which I previously tried on an IPv4 cluster.

Starting point

Since I had my LazyJack tool working, I used that to bring up my cluster with IPv6. It uses the reference bridge plugin, and has static routes so that nodes can communicate with each other. Here is the config.yaml that I used for my cluster:

plugin: bridge
topology:
    bxb-c2-77:
        interface: "enp10s0"
        opmodes: "master dns64 nat64"
        id: 2
    bxb-c2-78:
        interface: "enp9s0"
        opmodes: "minion"
        id: 3
    bxb-c2-79:
        interface: "enp10s0"
        opmodes: "minion"
        id: 4
support_net:
    cidr: "fd00:10::/64"
    v4cidr: "172.18.0.0/16"
mgmt_net:
    cidr: "fd00:20::/64"
pod_net:
    prefix: "fd00:40:0:0"
    size: 80
service_net:
    cidr: "fd00:30::/110"
nat64:
    v4_cidr: "172.18.0.128/25"
    v4_ip: "172.18.0.200"
    ip: "fd00:10::200"
dns64:
    remote_server: "64.102.6.247"
    cidr: "fd00:10:64:ff9b::/96"
    ip: "fd00:10::100"

Everything needed for this setup was done by LazyJack in about five minutes, and worked just fine. I have a Kubernetes cluster running IPv6. Now, let the hacking begin!

Istio Preparation

Using the Developer’s Guide as reference, and knowing I already had Go installed, I went right to cloning and setting up the Istio repo…

export ISTIO=$GOPATH/src/istio.io
export HUB="docker.io/pmichali"
export TAG=pmichali
export GITHUB_USER=pmichali
export KUBECONFIG=${HOME}/.kube/config

mkdir -p ~/go/src/istio.io
cd ~/go/src/istio.io
git clone https://github.com/istio/istio
cd istio

You would want to substitute “pmichali” with your github.com and hub.docker.com username (I did the same for the tag name). Do a “docker login” to your hub.docker.com account, so that pushes will work later on.

I checked out the latest from master and built everything. In this case, I’m using commit 18a20f9 from March 4th, 2018 and used a separate branch.

git checkout -b trial-20180305

Spam, Spam, Eggs, and Spam…

Here’s where the fun starts. Several changes are needed to support IPv6. There aren’t that many, but a few of them will require some larger effort to make permanent. In addition, changes are needed to the sample apps, like BookInfo.

As a starting point, I have a fork of Istio, where I’ve created an ipv6 branch that is based off of the March 4th, 2018 commit on master (18a20f9). You can take the latest commit (6299579) off of the ipv6 branch, or cherry pick the ones you want. I’ll make issues and submit PRs to Istio for the easy changes that I’ve made.

The first patch (commit a5451cd) modifies validation of proxy addresses in Pilot, to accept IPv6 addresses correctly.

The second patch (commit 0502713) changes the hostname to IP resolution in Pilot, to add the needed square brackets to IPv6 addresses (separating the port from IP part).

The third patch (commit e7e5d48) changes the bootstrap code, so that it can parse IPv6 for Pilot discovery, Zipkin, and statsd addresses that are stored in config.

The other bootstrap code patch (commit 21b82c4), changes a JSON template file, which has the side effect of altering the output of the test files. As a result, the patch also includes updated golden files, so that unit tests pass. For this to be upstreamed, we need to be able to test bootstrap with both IPv4 and IPv6, and have a way to allow deployment of the template file in either mode.

For the Envoy Pilot JSON file(commit 65dc3e9), the IP addresses are patched to use IPv6 addresses for localhost and any host. Like the previous patch, IPv6 is forced, and to upstream, this needs to be configurable, so that users can use either IPv4 or IPv6 mode.

There are two constants in Istio, which specify the wildcard and localhost addresses. This was patched (commit 814036f) so that IPv6 addresses are used. Like the one bootstrap change, this affects the output of the golden files, so they are included in this commit (quite a few of them). Again to make this upstreamable, this should be configurable, so that users could enable IPv6 mode and use IPv6 addresses everywhere.
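
To make the “configurable” idea concrete, here is a minimal shell sketch of choosing the two addresses by mode. It only illustrates the values involved (“::” and “::1” for IPv6, “0.0.0.0” and “127.0.0.1” for IPv4); the function names and the mode argument are hypothetical, and this is not how the Istio Go code is structured.

```shell
# Pick the wildcard ("any") and localhost addresses for an IP mode.
# The mode names (ipv4/ipv6) are made up for this sketch.
wildcard_addr() {
    case "$1" in
        ipv6) echo '::' ;;
        ipv4) echo '0.0.0.0' ;;
        *)    return 1 ;;
    esac
}

localhost_addr() {
    case "$1" in
        ipv6) echo '::1' ;;
        ipv4) echo '127.0.0.1' ;;
        *)    return 1 ;;
    esac
}
```

For example, “wildcard_addr ipv6” prints “::”, and an unknown mode returns a non-zero status.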

I forgot to run lint, before each commit, so I did another commit (commit b7302ab) to fix those warnings, although the actual upstream commits would have to fix these warnings, and do a cleaner fix than the quick and dirty changes I did.

That is it for the base Istio code. We’ll talk about some of the sample applications later in the blog.

Build Everything

Now that the code is changed, and we have the minimum unit test modifications so that things will pass, build everything (you’ll need to do “docker login”, before doing the push):

make
make docker
make push

install/updateVersion.sh -a ${HUB},${TAG}

I edited install/kubernetes/istio.yaml so that it uses NodePort, instead of LoadBalancer (search and replace), and uncommented the line selecting port (32000).
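
Since updateVersion.sh regenerates the file, the two edits are worth scripting. A minimal sketch, assuming the service type appears as “type: LoadBalancer” and the commented-out line looks like “#   nodePort: 32000” (both line formats are assumptions and may differ in your copy of istio.yaml); the function name is made up:

```shell
# Rewrite a manifest in place: LoadBalancer -> NodePort, and uncomment
# the nodePort line. The matched line formats are assumptions.
use_nodeport() {
    # $1 is the manifest path. Write to a temp file first, so a failed
    # sed run leaves the original untouched.
    sed -e 's/type: LoadBalancer/type: NodePort/' \
        -e 's/^#\( *nodePort: 32000\)/ \1/' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Run it as “use_nodeport install/kubernetes/istio.yaml”, and then grep the file for NodePort and nodePort to confirm that both changes landed.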

The Moment of Truth

Bring up the Istio components:

kubectl apply -f install/kubernetes/istio.yaml

You should see that all the services and pods are up and running, and most importantly, that pods are not restarting or in a crash loop. This is what I see on my setup:

$ kubectl get pods --all-namespaces -o wide
NAMESPACE      NAME                                READY     STATUS    RESTARTS   AGE       IP                   NODE
istio-system   istio-ca-5dfc8d9499-jlkdf           1/1       Running   0          10m       fd00:40::3:0:0:25f   bxb-c2-78
istio-system   istio-ingress-df5f9b947-rdn4g       1/1       Running   0          10m       fd00:40::4:0:0:12d   bxb-c2-79
istio-system   istio-mixer-7d95868d79-tmgf6        3/3       Running   0          10m       fd00:40::4:0:0:12c   bxb-c2-79
istio-system   istio-pilot-97d94c7f6-nr7nj         2/2       Running   0          10m       fd00:40::3:0:0:25e   bxb-c2-78
kube-system    etcd-bxb-c2-77                      1/1       Running   0          5h        fd00:20::2           bxb-c2-77
kube-system    kube-apiserver-bxb-c2-77            1/1       Running   0          5h        fd00:20::2           bxb-c2-77
kube-system    kube-controller-manager-bxb-c2-77   1/1       Running   0          5h        fd00:20::2           bxb-c2-77
kube-system    kube-dns-dcf744547-nzzr2            3/3       Running   0          5h        fd00:40::2:0:0:2d    bxb-c2-77
kube-system    kube-proxy-5vbjw                    1/1       Running   0          5h        fd00:20::3           bxb-c2-78
kube-system    kube-proxy-kf5cm                    1/1       Running   0          5h        fd00:20::4           bxb-c2-79
kube-system    kube-proxy-s479m                    1/1       Running   0          5h        fd00:20::2           bxb-c2-77
kube-system    kube-scheduler-bxb-c2-77            1/1       Running   0          5h        fd00:20::2           bxb-c2-77
$ kubectl get svc --all-namespaces -o wide
NAMESPACE      NAME            TYPE           CLUSTER-IP        EXTERNAL-IP   PORT(S)                                                            AGE       SELECTOR
default        kubernetes      ClusterIP      fd00:30::1        <none>        443/TCP                                                            5h        <none>
istio-system   istio-ingress   LoadBalancer   fd00:30::2:1e82   <pending>     80:30802/TCP,443:32379/TCP                                         10m       istio=ingress
istio-system   istio-mixer     ClusterIP      fd00:30::1:1fbc   <none>        9091/TCP,15004/TCP,9093/TCP,9094/TCP,9102/TCP,9125/UDP,42422/TCP   10m       istio=mixer
istio-system   istio-pilot     ClusterIP      fd00:30::3:ec89   <none>        15003/TCP,15005/TCP,15007/TCP,8080/TCP,9093/TCP,443/TCP            10m       istio=pilot
kube-system    kube-dns        ClusterIP      fd00:30::a        <none>        53/UDP,53/TCP                                                      5h        k8s-app=kube-dns

What About The Apps?

BookInfo

I’d be remiss if I didn’t spin up the BookInfo app. After monkeying with this for a while, I realized that this app needed some changes as well.

I did another patch (commit 4ea619d) that changes the bind address to “::” for BookInfo, so it is listening on the right IP/port. Also, since I was modifying the app, I needed a way to build images with my changes. I updated the build_push_update_images.sh script to push the images it creates to my repo (instead of docker.io/istio).

With this commit, I ran the script and provided a dummy version:

cd ~/go/src/istio.io/istio/samples/bookinfo
./build_push_update_images.sh 0.0.0
cd ../..
kubectl create -f install/kubernetes/istio-sidecar-injector-configmap-debug.yaml
kubectl apply -f <(istioctl kube-inject -f samples/bookinfo/kube/bookinfo.yaml --injectConfigMapName istio-inject)

Once everything is up, you can access the BookInfo productpage, by using the service IP or pod network IP and the port (9080). For example:

kubectl get svc --all-namespaces | grep productpage
default productpage ClusterIP fd00:30::2:8091 <none> 9080/TCP

curl [fd00:30::2:8091]:9080
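
The square brackets are required because the IPv6 address itself contains colons, so the port would otherwise be ambiguous. If you script these checks, a small helper can add the brackets only when needed. This is a hypothetical sketch (the function name is made up), and it simply treats any host containing a colon as an IPv6 literal:

```shell
# Join a host and port for use in a URL, bracketing IPv6 literals.
# Any host containing ":" is assumed to be an IPv6 address.
host_port() {
    case "$1" in
        *:*) printf '[%s]:%s\n' "$1" "$2" ;;
        *)   printf '%s:%s\n' "$1" "$2" ;;
    esac
}
```

With this, “curl "$(host_port fd00:30::2:8091 9080)"” expands to the same request as the curl command above.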

To upstream, this app needs to be modified so that the user can select between an IPv4 and IPv6 variant.

Helloworld

This app also needed to be modified to listen on the IPv6 any address, so another patch was committed (commit 6299579). New images are created:

cd ~/go/src/istio.io/istio/samples/helloworld/src/
./build_service.sh

Then, I would tag and push the two images to my docker hub area:

docker tag istio/examples-helloworld-v1:latest docker.io/pmichali/examples-helloworld-v1:pmichali
docker tag istio/examples-helloworld-v2:latest docker.io/pmichali/examples-helloworld-v2:pmichali

docker push pmichali/examples-helloworld-v1
docker push pmichali/examples-helloworld-v2

Prior to applying ~/go/src/istio.io/istio/samples/helloworld/helloworld.yaml, I modified it (in two places) to point to my images (e.g. docker.io/pmichali/examples-helloworld-v1:pmichali and docker.io/pmichali/examples-helloworld-v2:pmichali) and I changed the imagePullPolicy to Always. The final step is to apply this YAML file:

cd ~/go/src/istio.io/istio/samples/helloworld/
kubectl apply -f helloworld.yaml

With this app, there is a nodeport, so you can access it from the service or pod network IP and port 5000, or the node IP using the nodeport:

kubectl get svc | grep helloworld
helloworld    NodePort    fd00:30::76ae     <none>        5000:30780/TCP   5m

kubectl get pods --all-namespaces -o wide | grep helloworld
default        helloworld-v1-6759b98975-c6vft      1/1       Running   0          4m        fd00:40::4:0:0:131   bxb-c2-79
default        helloworld-v2-7c6c464dc-g2pcl       1/1       Running   0          4m        fd00:40::3:0:0:263   bxb-c2-78

$ curl [fd00:30::76ae]:5000/hello
Hello version: v1, instance: helloworld-v1-6759b98975-c6vft
$ curl [fd00:40::3:0:0:263]:5000/hello
Hello version: v2, instance: helloworld-v2-7c6c464dc-g2pcl
$ curl [fd00:20::3]:30780/hello
Hello version: v2, instance: helloworld-v2-7c6c464dc-g2pcl

This app should be modified so that the IP mode is configurable.

Cleanup

For the apps, you can do:

cd ~/go/src/istio.io/istio/samples/helloworld/
kubectl delete -f helloworld.yaml
cd ~/go/src/istio.io/istio/
kubectl delete -f <(istioctl kube-inject -f samples/bookinfo/kube/bookinfo.yaml --injectConfigMapName istio-inject)

For Istio, run:

cd ~/go/src/istio.io/istio/
kubectl delete -f install/kubernetes/istio.yaml

To bring down Kubernetes, you can use “sudo lazyjack down” on the minions and then on the master. Follow this with “sudo lazyjack clean” to remove everything related to the provisioning for Kubernetes.

Final Notes/Observations

I noticed that, with the BookInfo app, I could “curl” to port 9080 using the service IP and the pod network IP, but I was unable to curl to the app on port 9080 using the node IP address. Also, the service didn’t show a nodeport for BookInfo, and using 32000 did not work either. I didn’t see the NodePort type called out in any of the YAML files. I’m not sure if there should have been a nodeport defined or if that should work.

With the helloworld app, I could access it from port 5000 using the service and pod IPs, and from the nodeport shown for the service (30780 in the output above) using the node’s IP. This worked as expected.

The needed code changes will be easy, in fact, I plan on cleaning up what I have (and adding UTs for the changes). For the JSON and YAML file changes, some form of templating mechanism will be needed to allow operation in either IPv4 or IPv6 mode.

Keep in mind, that if you need to do some iterations on code changes, make sure that the deployment YAML files are set to “Always” pull images, or you need to ensure each node gets the updated version. I would do the following on my nodes (sometimes with the -f option):

docker rmi `docker images --format="{{.ID}} {{.Repository}} {{.Tag}}" | grep pmichali | cut -f 1 -d" "`
docker rmi `docker images --format="{{.ID}} {{.Repository}} {{.Tag}}" | grep istio| cut -f 1 -d" "`

docker rmi `docker images --format="{{.ID}} {{.Repository}} {{.Tag}}" | grep helloworld | cut -f 1 -d" "`
docker rmi `docker images --format="{{.ID}} {{.Repository}} {{.Tag}}" | grep bookinfo | cut -f 1 -d" "`

Also, if you run the updateVersion.sh script, you’ll need to make sure that istio.yaml has NodePort set, instead of LoadBalancer, and the port 32000 line uncommented.

Some thought will be needed on how to setup the samples for either IPv4 or IPv6 mode of operation.

Category: bare-metal, Istio, Kubernetes

February 19

Lazyjack – Provisioning bare-metal for IPv6 Kubernetes

v1.4

I’ve been experimenting with IPv6, Kubernetes, and Istio using Docker-In-Docker. One difficulty I’ve been having is accessing the cluster externally, as the whole cluster is running in docker containers on one VM.

I decided to try to get Kubernetes running on multiple bare-metal nodes. Well, this turned out to be quite challenging, as there are many configuration settings and tweaks needed to make this work.

Not wanting to endure that agony each time I set things up, or spend hours with others who want to do the same thing, I decided to write a small Go app to automate this setup. Lazyjack is the culmination of that effort.

You can find details on how to set up and use Lazyjack from the Github repo, but I’ll run through the steps here, using a two system setup I have in a lab.

Step 1: Get Everything Needed

Hardware: I already had two Ubuntu 16.04 systems, each with a pair of interfaces, one for SSH access to the box for provisioning, and one connected to an L2 switch, which would be used for the “management” network for Kubernetes. This second interface was new, and didn’t have any configuration on it.

Both boxes have access to the Internet (V4, using NAT in the lab), so that I can access repos and pull down stuff.

Update: If you want to be able to access remote IPv6 sites, without doing NAT64 (and using their IPv4 address), enable IPv6 and forwarding on each node, with an IPv6 address on the main interface. If using SLAAC, ensure accept_ra=2 for the main interface, using sysctl.
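
As a sketch of what those settings could look like, the helper below prints sysctl lines for one interface. It assumes the SLAAC setting meant here is the kernel’s accept_ra (a value of 2 keeps router advertisements accepted even with forwarding enabled); the function name and the file name in the usage note are made up.

```shell
# Print sysctl settings that enable IPv6 and forwarding, and keep
# accepting router advertisements on the given interface.
ipv6_sysctl_lines() {
    # $1 is the name of the main (outside-facing) interface
    printf 'net.ipv6.conf.all.disable_ipv6=0\n'
    printf 'net.ipv6.conf.all.forwarding=1\n'
    printf 'net.ipv6.conf.%s.accept_ra=2\n' "$1"
}
```

One way to use it would be “ipv6_sysctl_lines eth0 | sudo tee /etc/sysctl.d/60-ipv6.conf”, followed by “sudo sysctl --system”, substituting your own interface name for eth0.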

Software: Being development systems, docker 17.03.2-ce and Go 1.9.2 were installed. I think these systems already had openssl installed. Likewise, Kubernetes was installed (sudo apt-get install kubernetes kubelet kubeadm) on these systems.

Update: You should install CNI v0.7.1+ on the systems, otherwise, there may be issues with IPv6 support (e.g. ip6tables configuration).

Lazyjack: The easiest way is to download the latest release, untar, and place the executable in your system path on each system. For example, for the first release:

mkdir ~/bare-metal
cd ~/bare-metal
wget https://github.com/pmichali/lazyjack/releases/download/v1.0.0/lazyjack_1.0.0_linux_amd64.tar.gz
tar -xzf lazyjack_1.0.0_linux_amd64.tar.gz
sudo cp lazyjack /usr/local/bin

Note: The tar file name may be different, based on the version of lazyjack you use.

Alternately, you can get the repo:

go get github.com/pmichali/lazyjack

build it:

cd ~/go/src/github.com/pmichali/lazyjack
go build cmd/lazyjack.go

And then move the executable to your system path on each system. The sample-config.yaml can be used as a template for the configuration.

Step 2: Create a Configuration File

I’m lazy, so on the system I was going to use as the master node, I just took the sample-config.yaml and renamed it config.yaml. That file has the following network definitions already set up:

Management network – fd00:20::/64

Support network – fd00:10::/64

Pod network – fd00:40:0:0:X/80

Service network – fd00:30::/110

DNS64 network – fd00:64:ff9b::/96

The only thing I needed to do was identify the hostnames I was using, and the interface name for the interface that would be used for the management network. The definitions I used were:

topology:
    bxb-c2-77:
        interface: "enp10s0"
        opmodes: "master dns64 nat64"
        id: 2
    bxb-c2-79:
        interface: "enp10s0"
        opmodes: "minion"
        id: 3
support_net:

As you can see, bxb-c2-77 will be the master node, and it will have dns64 and nat64 containers running on it, to support IPv6 on the cluster. The sole minion is bxb-c2-79, but you can clearly list more nodes here. Likewise, you can use a separate node for the dns64 and nat64 services.

Each node has a unique (and arbitrary), ID from 2-65535 (but why use huge numbers?).
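
The ID matters because it ends up in the addresses. Judging from the sample output in this post (the pod address fd00:40::2:0:0:29 on the node with ID 2), the “X” in the pod network appears to be the node ID. A minimal sketch of that derivation, assuming the ID is written in hex; the function name is made up, and this is inferred from the output rather than taken from lazyjack’s source:

```shell
# Derive a node's pod subnet from the pod_net prefix, node ID and size.
# Assumption: subnet = <prefix>:<id in hex>::/<size>
pod_cidr() {
    # $1 = pod prefix (e.g. fd00:40:0:0), $2 = node id, $3 = prefix size
    printf '%s:%x::/%s\n' "$1" "$2" "$3"
}
```

For example, “pod_cidr fd00:40:0:0 3 80” prints fd00:40:0:0:3::/80.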

Update: You can configure DNS64 to allow use of IPv6 addresses, so that we can directly access external sites that support IPv6:

dns64:
    allow_ipv6_use: true

With that, we are ready to get things rolling…

Step 3: Initialize For Kubernetes

On the master (bxb-c2-77 in my case), run lazyjack (I’m assuming it is in your path) with the init command (from the area where the config.yaml file is, so that you don’t have to specify the location):

sudo lazyjack init

Yes, you need to run all lazyjack commands as root, because privileged access is needed to various resources. If you don’t run as root, you’ll see a permission denied error.

If you are curious as to what it does, you can add the “-v 4” option, before the “init” argument.

This command will create the certificates and keys needed for Kubernetes, and will place information into the configuration file (config.yaml), with a .bak file preserving the previous version (multiple runs of this command will overwrite that, BTW). Also, the file will be owned by root, but the permissions are changed to 0777, so that you can edit the file later, if needed.

You must copy the configuration file to all other nodes, now that it has the updated information.

Step 4: Prepare the Systems

Running lazyjack with the “prepare” command, will get a system ready for running Kubernetes. Run this command on each node.

Note: this command will generate a kubeadm.conf file in the work area (default /tmp/lazyjack) of the master node. If desired, you can customize this file to specify different settings desired for the cluster. For example, you can change the kubernetesVersion line, to pick a different version than 1.9.0 that was generated.

Step 5: Cluster Bring-up – Master First

On the master, run lazyjack with the “up” command. This will take a few minutes, as it starts up KubeAdm. Once completed, you can setup kubectl by doing:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

On subsequent runs, I usually do a “rm -rf ~/.kube”, prior to these commands.

Now, you can run “kubectl get nodes -o wide” to see that this node is up, and “kubectl get pods --all-namespaces -o wide”, to see when Kubernetes is fully up. You’ll see something like this:

NAMESPACE   NAME                              READY  STATUS   RESTARTS AGE IP                NODE
kube-system etcd-bxb-c2-77                    1/1    Running  0        2m  fd00:20::2        bxb-c2-77
kube-system kube-apiserver-bxb-c2-77          1/1    Running  0        2m  fd00:20::2        bxb-c2-77
kube-system kube-controller-manager-bxb-c2-77 1/1    Running  0        2m  fd00:20::2        bxb-c2-77
kube-system kube-dns-dcf744547-k56t2          3/3    Running  0        3m  fd00:40::2:0:0:29 bxb-c2-77
kube-system kube-proxy-m9z9m                  1/1    Running  0        3m  fd00:20::2        bxb-c2-77
kube-system kube-scheduler-bxb-c2-77          1/1    Running  0        2m  fd00:20::2        bxb-c2-77

You can untaint the master, if you want to be able to create pods on that node.

Step 6: Cluster Bring-up – Minions

After you are sure that the master is completely up (all pods and services running), go onto each of the minion nodes, and run the same “up” command. The command should complete quickly, and you can check the status of the node, using the “kubectl get nodes” command on the master. It does take a bit for the minions to become ready. Likewise, you can use the “kubectl get pod” output to see that a proxy is running for each minion.

Note: The reason we don’t do all of the steps on one node, is because lazyjack will setup static routes to other nodes, and the interfaces must be set up on those systems first.
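
To give a feel for what such a static route looks like, this sketch prints (it does not run) a route to one peer’s pod subnet via that peer’s management address. The prefixes are the ones from the sample config, and the layout is inferred from the addresses shown in this post; the function is hypothetical and is not what lazyjack literally executes.

```shell
# Print the static route needed to reach one peer's pod subnet.
# Assumes pod subnet fd00:40:0:0:<id>::/80 and mgmt address fd00:20::<id>.
peer_route() {
    # $1 = node id of the peer
    printf 'ip -6 route add fd00:40:0:0:%x::/80 via fd00:20::%x\n' "$1" "$1"
}
```

On the master of a three-node cluster, “for id in 3 4; do peer_route "$id"; done” would print one route per minion.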

Step 7: Enjoy!

That’s it. You can now play with Kubernetes, creating pods that will have IPv6 addresses, and that should be able to ping6 pods on other nodes and have external access to the Internet.

Step 8: Cleanup

You can run the “down” and then “clean” commands on each minion, and then on the master, to clean things up.

Troubleshooting

Problems Bringing Up a Minion

If the “up” command on a minion fails, you can retry it with “-v 4” to see verbose output. Then, you can manually perform some of the steps that are shown. In one case, I had kubeadm join failing and when running manually, I saw:

c2@bxb-c2-78:~/bare-metal$ sudo kubeadm join --token ...
[preflight] Running pre-flight checks.
 [WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Some fatal errors occurred:
 [ERROR Port-10250]: Port 10250 is in use

This occurs when the kubelet service is already running and using that port. You can stop the service, and then do the “lazyjack up” command or, just run the “down” and then “up” command and that should reload the daemon, and restart the service.

Category: bare-metal, Go, Istio, Kubernetes, Linux

January 23

Istio, Kubernetes with Load Balancer, on Bare Metal…Oh My!

V1.0 01/23/2018

I found a load balancer that works on bare-metal and decided to do a quick write-up of my findings. This blog assumes that you have a basic understanding of how to bring up Kubernetes and Istio, so I won’t go into the nitty gritty details on those steps.

Preparations

The following information indicates the versions used (others may work – this is just what I used), and the basic infrastructure.

For hardware, I used two Cisco UCS blades as the hosts for my cluster, with one acting as master and one acting as a minion. On each system, the following was installed/setup…

Ubuntu 16.04 64 bit server OS.
Go version 1.9.2.
KubeAdm, kubelet, and kubectl v1.9.2.
Docker version 17.03.2-ce.
Account set up on hub.docker.com for docker registry.
Using Istio master branch, cloned on January 22nd 2018 (commit 23306b5)
Hosts on lab network with access externally.
Four available IPs for external IP pool.

Step 1: Bring Up KubeAdm

For Kubernetes, I used the reference bridge plugin, which needs a CNI config file and static route on each host. On the minion, I did this:

cat >/etc/cni/net.d/cni2.conf<<EOT
{
    "cniVersion": "0.3.0",
    "name": "dindnet",
    "type": "bridge",
    "bridge": "dind0",
    "isDefaultGateway": true,
    "ipMasq": false,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [
            {
              "subnet": "10.193.0.0/16",
              "gateway": "10.193.0.1"
            }
          ]
        ]
    }
}
EOT

sudo ip route add 10.192.0.0/16 via <ip-of-master>

On the master, I did:

cat >/etc/cni/net.d/cni2.conf<<EOT
{
    "cniVersion": "0.3.0",
    "name": "dindnet",
    "type": "bridge",
    "bridge": "dind0",
    "isDefaultGateway": true,
    "ipMasq": false,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "ranges": [
          [
            {
              "subnet": "10.192.0.0/16",
              "gateway": "10.192.0.1"
            }
          ]
        ]
    }
}
EOT

sudo ip route add 10.193.0.0/16 via <ip-of-minion>

On the master, I created this kubeadm.conf file, which has configuration lines for Istio (and specifies the IP of the master for advertised address):

cat >kubeadm.conf<<EOT
apiVersion: kubeadm.k8s.io/v1alpha1
kind: MasterConfiguration
kubernetesVersion: v1.9.0
api:
 advertiseAddress: "<ip-of-master>"
networking:
 serviceSubnet: "10.96.0.0/12"
tokenTTL: 0s
apiServerExtraArgs:
 insecure-bind-address: "0.0.0.0"
 insecure-port: "8080"
 runtime-config: "admissionregistration.k8s.io/v1alpha1"
 feature-gates: AllAlpha=true
EOT

With all the pieces in place, the master node was brought up with:

sudo kubeadm init --config kubeadm.conf

Then, the minion was joined by using the command output from the init invocation on the master (using sudo). Back on the master, I did the obligatory commands to access the cluster with kubectl, and made sure everything was up OK:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Step 2: Start Up Load Balancer

I cloned the repo for MetalLB and then applied the metallb.yaml file, but you can do what the install page shows:

kubectl apply -f https://raw.githubusercontent.com/google/metallb/v0.3.1/manifests/metallb.yaml

I decided to use the ARP method, instead of BGP, as the setup is super easy. Using the example they provided, I created this config file:

apiVersion: v1
kind: ConfigMap
metadata:
 namespace: metallb-system
 name: config
data:
 config: |
   address-pools:
   - name: my-ip-space
     protocol: arp
     arp-network: <start-ip-of-subnet>/26
     cidr:
     - <start-ip-of-pool>/30

If my systems were on a /24 subnet, I wouldn’t have needed the arp-network line. Under “cidr” the start address of the pool and the prefix is specified.
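
The prefix length determines how many addresses the pool holds: a /30 covers the four IPs I had available. A quick way to check a prefix before putting it in the config (the function name is made up, and it simply counts every address in the block):

```shell
# Number of IPv4 addresses covered by a prefix length (e.g. /30 -> 4).
pool_size() {
    # $1 = prefix length, 0..32
    echo $(( 1 << (32 - $1) ))
}
```

So “pool_size 30” prints 4 and “pool_size 29” prints 8.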

I applied the yaml file using kubectl and made sure that the metallb controller and speaker pods were running. You can check the log of the speaker to ensure that things started up OK, and later to see if IPs are being assigned:

kubectl logs -l app=speaker -n metallb-system

Step 3: Start Up Istio

I followed the instructions in the Istio Dev Guide page, to build and start up Istio.

The repo was pulled and a branch created based on the latest from the master branch. I built the code and pushed to my docker repository, ran updateVersion.sh, and then started Istio with:

kubectl apply -f install/kubernetes/istio.yaml
kubectl apply -f install/kubernetes/istio-initializer.yaml

I verified that everything was running, and that the istio-ingress service was using the LoadBalancer type and had the first IP address from the pool defined for MetalLB as the external IP. The speaker log for MetalLB will show that an IP was assigned.

Step 4: BookInfo

We would be remiss, if we didn’t start up the book info application and then try to access the product page using the external address:

kubectl apply -f samples/bookinfo/kube/bookinfo.yaml

After this is running, I opened my browser window on my laptop, and went to http://<external-ip>/productpage/ to view the app!

Ramblings

The MetalLB setup was painless and worked well. I haven’t tried with /31 for the pool, but it does work with /29 and I suspect larger sizes. I also didn’t try using BGP, instead of ARP.

Category: bare-metal, Istio, Kubernetes