v1.2 – June 14th 2018
Current IPv6-only Kubernetes clusters (v1.9.0+) take a brute-force approach to dealing with the outside world. Because some external sites are IPv4 only, the cluster is set up with a NAT64 and DNS64 server that treats all external destinations as IPv4 only.
Here, we’ll talk about ways to handle external sites more intelligently, using IPv6 access when possible. The result is a performance improvement in both space and time.
What We Have Today
Let’s use the example of a pod on a minion node of a three-node, bare-metal, IPv6-only Kubernetes cluster, trying to ping google.com.
First, the pod requests a lookup of the destination name to obtain its IP address. Since not all destinations support IPv6 (e.g. github.com), the DNS64 server in the cluster is configured to always use the A record (IPv4) and ignore any AAAA record (IPv6). The IPv4 address is embedded into a synthesized IPv6 address, using the configured prefix. In this example, the address 216.58.217.78 (d83a:d94e in hex) is combined with the fd00:10:64:ff9b:: prefix to get fd00:10:64:ff9b::d83a:d94e.
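The address synthesis described above can be sketched in a few lines of Python. This is only an illustration, not the DNS64 server's actual code: the function name `synthesize_nat64` is hypothetical, and the /96 prefix length is an assumption (the text gives the prefix but not its length). The IPv4 address used is the one that the hex groups d83a:d94e decode to.

```python
import ipaddress


def synthesize_nat64(ipv4: str, prefix: str = "fd00:10:64:ff9b::/96") -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a NAT64 prefix.

    Assumption: the prefix is a /96, so the IPv4 address fills the
    last 32 bits exactly. Other prefix lengths are not handled here.
    """
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # OR the 32-bit IPv4 value into the (zeroed) host part of the prefix.
    return ipaddress.IPv6Address(int(net.network_address) | int(v4))


print(synthesize_nat64("216.58.217.78"))  # fd00:10:64:ff9b::d83a:d94e
```

Each dotted-decimal octet becomes two hex digits (216 is d8, 58 is 3a, 217 is d9, 78 is 4e), which is why the last two groups of the synthesized address read d83a:d94e.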
The pod (fd00:40::3:0:0:4e7) will then send a ping request out its interface (to fd00:10:64:ff9b::d83a:d94e), as shown at (A) in the diagram below.
The ping request will cross the local bridge, br0, and the routing table on the node will direct the packet over the pod network to the master node. The packet will be sent (B) from the minion node’s eth1 interface (fd00:20::3) to the master node’s pod network interface (eth1). The route on the master node will direct the packet to the NAT64 server (a container) over the veth interface.
The NAT64 server (C) creates a mapping from the source IPv6 address (at this point, the minion node’s pod network interface, fd00:20::3) to a private IPv4 address (172.18.0.53) from a locally maintained pool. It will extract the destination IPv4 address (216.58.217.78) from the synthesized address and send the ping to the master node (D), where iptables employs SNAT to map the private IPv4 address to the node’s IPv4 address (e.g. 10.1.1.2).
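To make the NAT64 step concrete, here is a toy Python model of the two things it does: recover the IPv4 destination from the synthesized address, and keep a table that maps IPv6 sources to private IPv4 addresses from a pool. The names `extract_ipv4` and `Nat64Table` are hypothetical, the /96 layout is assumed, and a real NAT64 implementation may track state differently (for example per flow rather than per source address).

```python
import ipaddress


def extract_ipv4(synthesized: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination from the low 32 bits of the address.

    Assumption: the address was synthesized with a /96 prefix.
    """
    return ipaddress.IPv4Address(int(ipaddress.IPv6Address(synthesized)) & 0xFFFFFFFF)


class Nat64Table:
    """Toy stateful mapping from IPv6 source to a pooled private IPv4 address."""

    def __init__(self, pool):
        self._free = list(pool)  # private IPv4 addresses not yet handed out
        self._map = {}           # IPv6 source -> private IPv4 address

    def lookup(self, src6: str) -> str:
        # Allocate on first use; later packets from the same source reuse the entry.
        if src6 not in self._map:
            self._map[src6] = self._free.pop(0)
        return self._map[src6]


table = Nat64Table(["172.18.0.53", "172.18.0.54"])
print(table.lookup("fd00:20::3"))                         # 172.18.0.53
print(extract_ipv4("fd00:10:64:ff9b::d83a:d94e"))         # 216.58.217.78
```

The table is the "space" cost of this path: every translated source holds an entry for as long as the mapping lives.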
Finally, the packet is sent out the main interface (E) to the next hop, which would also do SNAT for this local IPv4 address.
The ping response would follow the reverse route through the NAT64 server, to the minion node, and finally to the pod.
Improvements For IPv6 External Sites
We can, however, configure the DNS64 to allow AAAA records to be used, for external destinations that support IPv6 addressing.
In this example, the DNS lookup would return the AAAA record for google.com (2607:f8b0:4004:801::200e) and the pod shown at (A) would send a ping to that address, as shown in the diagram below.
The ping request would traverse the local bridge, br0, and the routing table on the minion node would direct the packet out the main interface (eth0). SNAT would rewrite the source address to the node’s own IP (2001:db8::100), as shown at (B). The packet would then be sent to the next hop, where SNAT may occur again if the minion node’s IPv6 address is not public.
The ping response would follow the reverse route, into the minion node, and to the pod.
This avoids sending the packets to the master node’s NAT64 server, where translation and mapping are performed. That saves both time (no extra hop or translation) and space (no mapping table needed).
Bare Metal Implementation Details
The Lazyjack tool has been modified (in v1.1.0+) to allow the user to specify whether or not destinations that support IPv6 addressing can be directly accessed, without using NAT64.
Under the dns64 section in the config.yaml, there is a new entry titled “allow_aaaa_use”. If set to “true”, the AAAA records from DNS64 will be used and external IPv6 addresses will be accessed directly. If omitted, or set to “false”, the existing mechanism applies: only the A DNS record is used, and NAT64 is performed on all packets for external destinations.
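The effect of the setting on name resolution can be modelled roughly as follows. This is a sketch of the decision logic only, not Lazyjack or DNS64 server code: the function `dns64_answer` is hypothetical, its `allow_aaaa_use` parameter simply mirrors the config entry (defaulting to false, as when the entry is omitted), and the /96 prefix length is assumed.

```python
import ipaddress
from typing import Optional

# Assumption: the cluster's NAT64 prefix is a /96.
NAT64_PREFIX = ipaddress.IPv6Network("fd00:10:64:ff9b::/96")


def dns64_answer(a_record: Optional[str],
                 aaaa_record: Optional[str],
                 allow_aaaa_use: bool = False) -> Optional[str]:
    """Pick the IPv6 address a pod would receive for a name.

    allow_aaaa_use=False: ignore any AAAA record and synthesize from the A record.
    allow_aaaa_use=True:  return a real AAAA record when one exists, and
                          fall back to synthesis for IPv4-only destinations.
    """
    if allow_aaaa_use and aaaa_record is not None:
        return aaaa_record
    if a_record is not None:
        v4 = ipaddress.IPv4Address(a_record)
        return str(ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4)))
    return None


# Dual-stack destination: the answer depends on the setting.
print(dns64_answer("216.58.217.78", "2607:f8b0:4004:801::200e"))
print(dns64_answer("216.58.217.78", "2607:f8b0:4004:801::200e", allow_aaaa_use=True))
```

With the setting enabled, only IPv4-only destinations still receive a synthesized address and take the NAT64 path.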
Before using Lazyjack, the nodes of the cluster must be provisioned for IPv6. On each node, this includes:
- Enabling IPv6 and IPv6 forwarding on the main interface.
- Giving the main interface (with Internet access) an IPv6 address (we used SLAAC).
- Having a default IPv6 route that sends traffic out the main interface (done via SLAAC).
- To preserve the default route, set sysctl accept_ra to a value of two (with forwarding enabled, the kernel otherwise ignores router advertisements). For example:
sudo sysctl -w net.ipv6.conf.eth0.accept_ra=2
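The provisioning checklist above could be verified with a small script along these lines. It is a sketch under assumptions: the helper name `check_ipv6_sysctls` is hypothetical, and the mapping of the bullets to the Linux settings `disable_ipv6`, `forwarding` and `accept_ra` under /proc/sys/net/ipv6/conf/<interface>/ is my reading of the list, not something Lazyjack itself checks. It does not test for an address or a default route.

```python
from pathlib import Path


def check_ipv6_sysctls(iface: str, root: str = "/proc/sys") -> dict:
    """Return {setting: (actual, wanted)} for each setting that does not match.

    An actual value of None means the setting file was not found.
    """
    wanted = {
        "disable_ipv6": "0",  # IPv6 enabled on the interface
        "forwarding": "1",    # IPv6 forwarding enabled
        "accept_ra": "2",     # accept router advertisements even when forwarding
    }
    base = Path(root) / "net" / "ipv6" / "conf" / iface
    problems = {}
    for name, want in wanted.items():
        path = base / name
        actual = path.read_text().strip() if path.exists() else None
        if actual != want:
            problems[name] = (actual, want)
    return problems
```

On a node, `check_ipv6_sysctls("eth0")` returning an empty dict would suggest the three settings are in place; the `root` parameter exists only so the check can be pointed at a different directory.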
KubeAdm-dind-cluster (DinD) Implementation Details
With PR 148, the Kubeadm-dind-cluster tool (note the new repo location) for provisioning clusters lets the user enable (IPv6) AAAA records for DNS lookups, so that unaltered IPv6 addresses can be used, rather than forcing the use of (IPv4) A records and DNS64-synthesized addresses. This new capability can be enabled by setting the environment variable DIND_ALLOW_AAAA_USE=true.
The k-d-c tool will then use a modified DNS64 configuration, and create the needed ip6tables entries on the host to allow forwarding of packets to the kubeadm-dind-net bridge, and perform SNAT for outgoing packets.
You can check the PR and, once it is merged, use the latest code on the master branch.