IPv6 Support for Docker In Docker
V1.16 – June 11th 2018
The goal is to use Docker-in-Docker (DinD) to bring up a cluster with IPv6-only networking, including in an environment where the cluster is connected to an IPv4-only network. This allows DinD to be used for running IPv6-based end-to-end (E2E) tests with Google Cloud Platform.
These instructions are for early adopters, who want to start playing with IPv6 clusters.
DinD allows you to create a multi-node cluster, using containers for the nodes instead of VMs. It builds on kubeadm and provides a quick and easy way to set up a cluster for testing and development.
My team wanted to build on this to set up a cluster with IPv6 support for the pods created (and, later, for the cluster infrastructure). We’re targeting DinD for E2E testing of IPv6 logic for Kubernetes. This works when running DinD with local resources or with Google Compute Engine. For this blog entry, we’ll use local resources to start things up, with a bare-metal server running Ubuntu as the host.
DinD supports several CNI plugins, but for the purposes of IPv6 clusters, we’ll use the default bridge and host-local v0.7.1+ CNI plugins that have been tested with IPv6 support.
The DinD modifications also include DNS64 and NAT64 containers. These are used so that an IPv6 only cluster can be connected to an IPv4 outside world. Every DNS lookup will go to the DNS64 server, which will use a remote DNS server (customizable) to obtain the IPv4 address for the host (A record). From that, an IPv4-embedded IPv6 address, with a specific prefix, will be generated and used. When NAT64 sees this prefix, it will handle V6 to V4 address translation to be able to access the external host.
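As a rough illustration of the address synthesis described above, the sketch below embeds an IPv4 address into the low 32 bits of a DNS64 prefix. It assumes the prefix is used as a /96, and embed_ipv4 is a hypothetical helper name, not something DinD provides. The address 192.0.2.33 is a documentation address used only as sample input.

```shell
#!/bin/sh
# Hypothetical helper (not part of DinD): embed an IPv4 address into the
# last 32 bits of a DNS64 prefix, assuming the prefix is a /96.
embed_ipv4() {
    prefix=$1
    old_ifs=$IFS
    IFS=.
    # Split the dotted-quad address into its four octets.
    set -- $2
    IFS=$old_ifs
    # Each pair of octets becomes one 16-bit hex group.
    printf '%s%02x%02x:%02x%02x\n' "$prefix" "$1" "$2" "$3" "$4"
}

embed_ipv4 "64:ff9b::" "192.0.2.33"           # default prefix
embed_ipv4 "fd00:77:64:ff9b::" "192.0.2.33"   # customized prefix
```

The first call prints 64:ff9b::c000:0221, which is the kind of address a pod would receive from DNS64 and that NAT64 would translate back to 192.0.2.33.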
Update: If your topology supports accessing the Internet via IPv6, you can now set the environment variable DIND_ALLOW_AAAA_USE to true, and when a DNS lookup occurs for an external site that supports IPv6, the AAAA record will be used. This allows pods in the cluster to directly access the site, without using an IPv4-embedded IPv6 address and NAT64.
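In practice this is just one more exported variable, set before the cluster is brought up:

```shell
# Allow AAAA records to be used for external sites that support IPv6.
export DIND_ALLOW_AAAA_USE=true
```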
For this exercise, I’ll describe how to do this using Ubuntu 16.04 on a bare metal system, but I have also played with this on both native Mac (doesn’t support IPv6) and inside an Ubuntu VM running under Vagrant/VirtualBox. You can adapt this for other operating systems as needed.
I’m using Go version 1.10 and docker version 17.03.2-ce (though have used 17.09.0-ce and 17.11.0-ce too). Note: I had used docker.io 1.13.1 and was seeing an error applying iptables rules, early in the process.
You’ll want git, liblz4-tool (“brew install lz4” for Mac), build-essential (for “make”), and sha1sum (should be there on Ubuntu, “brew install md5sha1sum” on Mac). Some of these are needed, if you plan on updating the DinD code.
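A quick way to see which of these tools are missing is sketched below. The binary names are assumptions (for instance, that liblz4-tool installs a command called lz4), and missing_tools is a hypothetical helper, so adjust the list for your system.

```shell
#!/bin/sh
# Print each command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumed binary names: liblz4-tool provides "lz4", build-essential provides "make".
missing=$(missing_tools git lz4 make sha1sum docker go)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All prerequisite tools found."
fi
```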
Obtain the DinD Code
The IPv6 changes are upstreamed now, so you can clone the latest DinD repo:
cd
git clone https://github.com/kubernetes-sigs/kubeadm-dind-cluster dind
Obtain the Kubernetes Code and Add IPv6 Patches
Since we want to use the latest Kubernetes code, instead of 1.6, 1.7, or 1.8, which are provided with DinD, we need to clone the repo:
cd ~/dind
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
Since we’ll want to use this Kubernetes repo with changes, we’ll set environment variables that tell DinD to build from the local repo:
export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y
On your host, make sure that both IPv6 and IPv6 forwarding are enabled. You can do this using:
sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.all.forwarding=1
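To confirm the settings took effect, something like the following sketch can be used. It reads the same two values back from procfs. check_ipv6_host is a hypothetical helper, and the optional directory argument exists only so the check can be pointed at a different location.

```shell
#!/bin/sh
# Hypothetical helper: report whether IPv6 and IPv6 forwarding are enabled.
# The optional argument overrides the directory holding the settings
# (defaults to the standard procfs location on Linux).
check_ipv6_host() {
    conf_dir=${1:-/proc/sys/net/ipv6/conf/all}
    disabled=$(cat "$conf_dir/disable_ipv6" 2>/dev/null)
    forwarding=$(cat "$conf_dir/forwarding" 2>/dev/null)
    if [ "$disabled" = "0" ] && [ "$forwarding" = "1" ]; then
        echo "ok"
    else
        echo "not ready (disable_ipv6=${disabled:-unknown}, forwarding=${forwarding:-unknown})"
        return 1
    fi
}

check_ipv6_host || echo "Run the sysctl commands above as root, then check again."
```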
For IPv6 only mode, all you need is to set the environment variable IP_MODE to “ipv6”. However, there are several environment variables that you can export to customize things:
|Environment Variable||Example value||Description|
|DIND_SUBNET||fd00:77::||For IPv6 only mode, to specify the subnet (default is fd00:10::).|
|REMOTE_DNS64_V4SERVER|| ||DNS server that will receive forwarded DNS64 requests for external systems (default is 188.8.131.52). Note: I was using a lab system, and needed to use a local DNS, as it was not forwarding requests to an external DNS.|
|DNS64_PREFIX||fd00:77:64:ff9b::||Prefix that will be used for all DNS resolutions by the built-in DNS64 server (default is 64:ff9b::).|
|SERVICE_CIDR||fd00:77:30::/110||Subnet used for service pods (default is fd00:30::/110).|
See dind-cluster.sh for other environment variables that can be customized, if you have special requirements.
Once the environment variables are set, a cluster can now be brought up, by using the following commands:
export IP_MODE=ipv6
cd ~/dind/kubernetes
../dind-cluster.sh up
You can then create pods, access them, and ping across nodes to other pods, and access external sites. The kubectl command can be run from the host, so you don’t have to access the master node.
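As a sketch of what that looks like from the host, the helpers below look up a pod's address and ping it from another pod. The function names and pod names are hypothetical, and the ping assumes the source pod's image ships ping6.

```shell
#!/bin/sh
# KUBECTL can be overridden (for example, to point at a specific binary).
KUBECTL=${KUBECTL:-kubectl}

# Print the IP address assigned to a pod (an IPv6 address in IPv6-only mode).
pod_ip() {
    "$KUBECTL" get pod "$1" -o jsonpath='{.status.podIP}'
}

# Ping one pod from another. Assumes the source pod's image ships ping6.
ping_pod() {
    src=$1
    dst_ip=$(pod_ip "$2") || return 1
    "$KUBECTL" exec "$src" -- ping6 -c 3 "$dst_ip"
}

# Example usage with hypothetical pod names:
#   ping_pod pod-a pod-b
```

If the two pods were scheduled on different nodes, a successful ping exercises the cross-node IPv6 path.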
When invoking other dind-cluster.sh commands (e.g. down, clean, routes), be sure to run them from the Kubernetes area (e.g. ~/dind/kubernetes/).
If you see a problem with the kube-proxy pod not coming up, take a look at the “Kube-proxy failures – The Saga of Conntrack Max” section below for how to resolve this.
Note: The default for IP_MODE is “ipv4”, which brings up an IPv4 only cluster.
Say you have modified some of the Kubernetes code and want to recreate the cluster (after you have brought it down and cleaned up). Here are the steps I used, starting from a setup that already has the DinD code and the Kubernetes area, but where additional changes have been made to the Kubernetes code.
First, I’ll set all the environment variables that I happen to use, just because I’m starting in a new window:
cd ~/dind
export IP_MODE=ipv6
export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y
In my lab system, I wanted to use different IPs and needed to use a local DNS server, so I applied these settings as well:
export REMOTE_DNS64_V4SERVER=184.108.40.206
export DNS64_PREFIX=fd00:77:64:ff9b::
export DIND_SUBNET=fd00:77::
export SERVICE_CIDR=fd00:77:30::/110
Finally, the cluster can be brought up:
cd kubernetes
../dind-cluster.sh up
If another change is needed, you can do “../dind-cluster.sh clean”, update the Kubernetes code, and then run “../dind-cluster.sh up”.
Using Google Compute Engine
If you want to use Google Cloud Platform, set up an account (see my recent blog post on this) and then set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the JSON file with credentials.
There is an example script, gce-setup.sh, that shows how to use DinD with GCE. For IPv6, you can set the following environment variables and then source the script (from the top of the kubernetes repo, since we are using a local repo for the latest Kubernetes code):
export IP_MODE=ipv6
export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y
# Set additional env variables as mentioned above, if desired.
cd ~/dind/kubernetes
source ../gce-setup.sh
If you are a developer, and want to use customized DinD code, you can set the following flag, to tell DinD to build and use the custom (local) DinD image:
Note: GCE currently only has IPv4 access to the outside world, so the DIND_ALLOW_AAAA_USE flag cannot be set, and all accesses to external sites will use NAT64.
IPv6 and MacOS
Currently, Docker on the Mac does not support IPv6, so you cannot run DinD directly on the Mac in IPv6 mode. You can, however, spin up an Ubuntu VM (e.g. VirtualBox) and run DinD inside that VM.
Kube-proxy failures – The Saga of Conntrack Max
Kube-proxy, upon startup, checks the conntrack max setting, and if it is more than four times larger than the hashsize, will attempt to increase the conntrack hashsize. For most cases, like GCE or a VM on a Mac, this will not occur. However, if you have a large system with many CPUs, then this adjustment will be attempted.
Unfortunately, there is a Docker bug where, in a nested Docker environment (like DinD), the update fails. Kube-proxy checks whether the file system is writeable (it reports that it is), then tries to update a file and fails with a read-only error. The kube-proxy log will show this message:
write /sys/module/nf_conntrack/parameters/hashsize: operation not supported
DinD attempted to work around this problem by telling kube-proxy, via command-line settings, to skip changing any settings related to conntrack. Unfortunately, kube-proxy was recently changed so that the config file, if present (it is with Kubernetes 1.9+), takes precedence over the CLI settings. The config file does not have any settings for conntrack, so defaults are used, which on some systems causes kube-proxy to attempt to update the hash size (and fail).
For example, I have a system with 32 CPUs and kube-proxy uses a default conntrack value of 32768 for max-per-core. We end up with a conntrack value of 1048576 (32768 * 32), which is more than four times the system’s hashsize of 65536.
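The arithmetic can be checked for any host with a small sketch like the one below. It only mirrors the comparison described here (conntrack max against four times the hashsize) and ignores any other limits kube-proxy may apply. needed_hashsize is a hypothetical helper, and the quarter-of-max target is inferred from the log values shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: given the CPU count, the conntrack max-per-core value,
# and the current hashsize, print the hashsize kube-proxy would try to set,
# or "unchanged" if the current value is already large enough.
needed_hashsize() {
    cpus=$1
    max_per_core=$2
    current_hashsize=$3
    conntrack_max=$((cpus * max_per_core))
    if [ "$conntrack_max" -gt $((current_hashsize * 4)) ]; then
        # Target is one quarter of the conntrack max.
        echo $((conntrack_max / 4))
    else
        echo "unchanged"
    fi
}

needed_hashsize 32 32768 65536   # values from the example above
```

For the 32-CPU system this prints 262144, which matches the hashsize kube-proxy reports trying to set.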
I created issue #50 in the kubeadm-dind-cluster project to address the problem. This issue has a patch for kubeadm, which can be used as a temporary workaround on systems that exhibit the kube-proxy crash when attempting to update hashsize. If you see the kube-proxy crash with docker log messages like those below, you can include the patch from the issue in the code:
I1130 18:53:44.150679 1 conntrack.go:52] Setting nf_conntrack_max to 1048576
I1130 18:53:44.152679 1 conntrack.go:83] Setting conntrack hashsize to 262144
error: write /sys/module/nf_conntrack/parameters/hashsize: operation not supported
An alternative to applying a patch is to look at the hashsize value that kube-proxy is trying to set (e.g. 262144) and set it manually on the host before running “dind-cluster.sh up”. For example:
sudo su
echo "262144" > /sys/module/nf_conntrack/parameters/hashsize
cat /sys/module/nf_conntrack/parameters/hashsize
For NAT64, a Tayga container built by Daneyon Hansen is used. It has a private IPv4 pool for mapping local IPv6 addresses to IPv4 addresses that can be NATed, and used for external site access. The pool is set to a /25 (126 hosts) subnet. If a larger pool is needed, the container would need modifications.
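The pool size follows directly from the prefix length. A minimal sketch of the calculation is below; usable_hosts is a hypothetical helper that subtracts the network and broadcast addresses.

```shell
#!/bin/sh
# Number of usable host addresses in an IPv4 subnet of the given prefix
# length (total addresses minus the network and broadcast addresses).
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 25   # the pool size mentioned above
```

This prints 126 for a /25. A /24 would give 254 if a larger pool were configured.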
Lab Systems and DNS
You may see a case where DNS64 is unable to forward the DNS requests to an outside system like 220.127.116.11. In those cases, I’ve had success with using a company DNS server instead. Hence the programmable setting REMOTE_DNS64_V4SERVER. 🙂
Warning on noswap
When you bring up/tear down a cluster, you may see this warning about swap:
WARNING: No swap limit support
I’ve seen the warning on Ubuntu 16.04, and it was benign; however, other people have seen issues with bringing up the cluster under CentOS 7.
To deal with this, you can check what devices are listed in the output of “cat /proc/swaps”. If there is an entry, you can turn off swap by running the following command (showing an example for /dev/dm-1):
swapoff -v /dev/dm-1
rm /dev/dm-1
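The check of /proc/swaps can also be scripted. The sketch below only lists the active swap devices and does not turn anything off. list_swap_devices is a hypothetical helper, and the optional argument is there only so a different file can be read.

```shell
#!/bin/sh
# List active swap devices. The optional argument overrides the file to read
# (defaults to /proc/swaps); the first line of that file is a header.
list_swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

devices=$(list_swap_devices 2>/dev/null) || devices=""
if [ -n "$devices" ]; then
    echo "Active swap devices:"
    echo "$devices"
else
    echo "No active swap devices found."
fi
```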