IPv6 Multi-node On Bare-Metal
In a previous blog entry, I was able to bring up a cluster on a three-node bare-metal setup (with the Calico plugin), and then switch to IPv6 and create pods with IPv6 addresses. At the time, I just did a cursory check and made sure I could ping a pod using its IPv6 address.
Well, the devil is in the details. When I checked multiple pods, I found that I could not ping a pod from a different node, or ping pod to pod when the pods were on different nodes.
Looking at the routing table, I saw a route for each local pod on a node, using the cali interface. But there were no routes to pods on other nodes (using the tunl0 interface), like I was seeing with IPv4:
IPv4:
192.168.0.0/26 via 10.87.49.79 dev tunl0 proto bird onlink
blackhole 192.168.0.128/26 proto bird
192.168.0.130 dev calie572c5d95aa scope link
192.168.0.192/26 via 10.87.49.77 dev tunl0 proto bird onlink
IPv6:
2001:2::a8ed:126:57ef:8680 dev calie9323554a97 metric 1024
2001:2::a8ed:126:57ef:8681 dev calid6195fe85f3 metric 1024
blackhole 2001:2::a8ed:126:57ef:8680/122 dev lo proto bird metric 1024 error -22
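A quick way to spot this symptom without eyeballing the table is to filter the route output for bird-installed routes that have a next hop. The sketch below is illustrative only: the function name is made up for this post, and the sample text is copied from the two listings above.

```python
# Sample route listings copied from the post.
IPV4_ROUTES = """\
192.168.0.0/26 via 10.87.49.79 dev tunl0 proto bird onlink
blackhole 192.168.0.128/26 proto bird
192.168.0.130 dev calie572c5d95aa scope link
192.168.0.192/26 via 10.87.49.77 dev tunl0 proto bird onlink
"""

IPV6_ROUTES = """\
2001:2::a8ed:126:57ef:8680 dev calie9323554a97 metric 1024
2001:2::a8ed:126:57ef:8681 dev calid6195fe85f3 metric 1024
blackhole 2001:2::a8ed:126:57ef:8680/122 dev lo proto bird metric 1024 error -22
"""


def remote_block_routes(route_output):
    """Return (prefix, next_hop) for each bird route that goes via another node.

    Hypothetical helper: it only looks at the text layout shown in the post.
    """
    routes = []
    for line in route_output.splitlines():
        fields = line.split()
        # Local pod routes and blackhole entries have no "via" next hop.
        if "via" not in fields or "bird" not in fields:
            continue
        hop_index = fields.index("via") + 1
        if hop_index >= len(fields):
            continue  # malformed line, nothing after "via"
        routes.append((fields[0], fields[hop_index]))
    return routes


if __name__ == "__main__":
    print("IPv4 remote blocks:", remote_block_routes(IPV4_ROUTES))
    print("IPv6 remote blocks:", remote_block_routes(IPV6_ROUTES))
```

An empty list for the IPv6 table is exactly the problem described here: bird installed the local blackhole route, but nothing learned from the other nodes.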
Running “calicoctl node status” showed IPv4 BGP peers, but no IPv6 BGP peers. I found that in calico.yaml, I needed to have this:
# Auto-detect the BGP IP address.
- name: IP
  value: ""
- name: IP6
  value: "autodetect"
- name: IP6_AUTODETECT_METHOD
  value: "first-found"
From what I understand, leaving the IP value empty means Calico will autodetect the IPv4 address and use it. For IPv6, though, if IP6 is set to an empty value or the key is missing, IPv6 BGP is disabled.
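To make that asymmetry concrete, here is a tiny sketch that applies the rule to an env list shaped like the one in calico.yaml. It models only my reading of the behavior described above, not Calico's actual startup logic, and the function name is hypothetical.

```python
def ipv6_bgp_enabled(env):
    """Return True if the env list would enable IPv6 BGP under the rule above.

    Assumption: IP6 missing or empty means IPv6 BGP is off; any other
    value (such as "autodetect") means it is on.
    """
    values = {entry["name"]: entry.get("value", "") for entry in env}
    return bool(values.get("IP6", "").strip())


# The env entries from calico.yaml, expressed as Python data.
CALICO_ENV = [
    {"name": "IP", "value": ""},
    {"name": "IP6", "value": "autodetect"},
    {"name": "IP6_AUTODETECT_METHOD", "value": "first-found"},
]

if __name__ == "__main__":
    print(ipv6_bgp_enabled(CALICO_ENV))                     # IP6 set
    print(ipv6_bgp_enabled([{"name": "IP", "value": ""}]))  # IP6 missing
```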
Also, I was using the :latest label for the CNI, calico-node, and calico-ctl images. I changed those to :master to pick up the recent changes.
Now, when nodes join, I see BGP peer entries for both IPv4 and IPv6:
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.87.49.79  | node-to-node mesh | up    | 21:07:06 | Established |
| 10.87.49.78  | node-to-node mesh | up    | 21:07:12 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 2001:2::79   | node-to-node mesh | up    | 21:07:06 | Established |
| 2001:2::78   | node-to-node mesh | up    | 21:07:12 | Established |
+--------------+-------------------+-------+----------+-------------+
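Since the missing IPv6 section was easy to overlook the first time, a small check on the status text can help. This is a rough sketch that assumes the table layout shown above; the function names are invented for illustration and the sample is a trimmed copy of that output.

```python
STATUS_SAMPLE = """\
Calico process is running.

IPv4 BGP status
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
| 10.87.49.79 | node-to-node mesh | up | 21:07:06 | Established |
| 10.87.49.78 | node-to-node mesh | up | 21:07:12 | Established |

IPv6 BGP status
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
| 2001:2::79 | node-to-node mesh | up | 21:07:06 | Established |
| 2001:2::78 | node-to-node mesh | up | 21:07:12 | Established |
"""


def parse_bgp_status(status_text):
    """Map each address family to a list of (peer, state, info) tuples."""
    peers = {}
    family = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.endswith("BGP status"):
            family = line.split()[0]  # "IPv4" or "IPv6"
            peers[family] = []
        elif line.startswith("|") and family is not None:
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            if len(cells) < 5 or cells[0] == "PEER ADDRESS":
                continue  # header row or unexpected layout
            peers[family].append((cells[0], cells[2], cells[4]))
    return peers


def both_families_established(status_text):
    """True only if IPv4 and IPv6 each have peers and all are Established."""
    peers = parse_bgp_status(status_text)
    return all(
        peers.get(family) and all(info == "Established" for _, _, info in peers[family])
        for family in ("IPv4", "IPv6")
    )


if __name__ == "__main__":
    print(parse_bgp_status(STATUS_SAMPLE))
    print(both_families_established(STATUS_SAMPLE))
```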
I proceeded to create three pods using IPv4. I could ping from one host to each pod, and from one pod to the other two pods on different nodes. Each host had routes like these:
192.168.0.2 dev cali812c8ee8317 scope link
192.168.0.64/26 via 10.87.49.79 dev tunl0 proto bird onlink
192.168.0.128/26 via 10.87.49.78 dev tunl0 proto bird onlink
Next, I switched to IPv6 (enabled in 10-calico.conf on each node, and added an IPv6 pool on the master node) and created three more pods. I hit an issue, as the master node had an old Docker image for CNI that didn’t have the latest fixes. I ended up deleting the image, redeploying CNI, and then deleting and recreating the pods. I now see routes like this:
2001:2::6d47:e62d:8139:d1e9 dev calicc4563e7a35 metric 1024
blackhole 2001:2::6d47:e62d:8139:d1c0/122 dev lo proto bird metric 1024 error -22
2001:2::8f3a:d659:6d15:1880/122 via 2001:2::79 dev br_api proto bird metric 1024
2001:2::a8ed:126:57ef:8680/122 via 2001:2::78 dev br_api proto bird metric 1024
Here, br_api is my main interface (a bridge for a bonded interface). I’m able to ping from host to pod and pod to pod across hosts.
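Each /122 prefix in that table is a block of 64 addresses (128 - 122 = 6 host bits), the same size as the /26 blocks on the IPv4 side. As a sanity check on how a pod address maps to a route, here is a short sketch using Python's standard ipaddress module. The route table is copied from the listing above; the "local" label and the function name are my own shorthand.

```python
import ipaddress

# Block prefix -> next hop, as seen on this node.
BLOCK_ROUTES = {
    "2001:2::6d47:e62d:8139:d1c0/122": "local",
    "2001:2::8f3a:d659:6d15:1880/122": "2001:2::79",
    "2001:2::a8ed:126:57ef:8680/122": "2001:2::78",
}


def next_hop_for(pod_ip):
    """Return the next hop for the block containing pod_ip, or None."""
    addr = ipaddress.ip_address(pod_ip)
    for block, hop in BLOCK_ROUTES.items():
        if addr in ipaddress.ip_network(block):
            return hop
    return None


if __name__ == "__main__":
    block = ipaddress.ip_network("2001:2::6d47:e62d:8139:d1c0/122")
    print(block.num_addresses)                          # addresses per block
    print(next_hop_for("2001:2::6d47:e62d:8139:d1e9"))  # pod on this node
    print(next_hop_for("2001:2::a8ed:126:57ef:8681"))   # pod on another node
```

The local pod address falls inside the blackholed block, but its more specific cali route wins, so only addresses in the block with no pod behind them are dropped.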
Note: this was not working for one of the pods, and the packet was not getting past the cali interface for that pod. I checked, and on that node IPv6 forwarding was disabled (not sure why). I did the following, and now pings work:
sysctl net.ipv6.conf.all.forwarding=1
Not sure how to persist this (don’t see it in /etc/sysctl.conf or /etc/sysctl.d/* on any system).
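One common approach on most Linux distributions is a drop-in file under /etc/sysctl.d/, which is read at boot and when running sysctl --system. I have not tried this on these nodes, so treat the following as a sketch under that assumption; the file name 99-ipv6-forwarding.conf and the helper names are hypothetical.

```python
from pathlib import Path

KEY = "net.ipv6.conf.all.forwarding"


def render_dropin(value=1):
    """Return the contents of a sysctl drop-in file that sets IPv6 forwarding."""
    return f"# Keep IPv6 forwarding on across reboots\n{KEY} = {value}\n"


def write_dropin(directory="/etc/sysctl.d", name="99-ipv6-forwarding.conf"):
    """Write the drop-in and return its path (root is needed for /etc/sysctl.d)."""
    target = Path(directory) / name
    target.write_text(render_dropin())
    return target


def forwarding_enabled(proc_file="/proc/sys/net/ipv6/conf/all/forwarding"):
    """Read the live kernel value; it is "1" when forwarding is on."""
    return Path(proc_file).read_text().strip() == "1"


# Example (as root): write_dropin(), then run "sysctl --system" to load it,
# and confirm with forwarding_enabled().
```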
Another curious thing: when I was using tcpdump to trace the ICMP packets, I was seeing messages like these:
13:42:15.649100 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) devstack-77.56087 > 2001:2::79.bgp: Flags [P.], cksum 0x412f (incorrect -> 0x382c), seq 342:361, ack 343, win 242, options [nop,nop,TS val 63676370 ecr 63075039], length 19: BGP, length: 19
    Keepalive Message (4), length: 19
13:42:15.649199 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) 2001:2::79.bgp > devstack-77.56087: Flags [.], cksum 0xa391 (correct), seq 343, ack 361, win 240, options [nop,nop,TS val 63114134 ecr 63676370], length 0
I’m wondering why the (BGP?) packet from the devstack-77 system shows an incorrect checksum, while the response does not. I see the same thing on other nodes, again only with responses from devstack-77:
13:44:08.682811 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) 2001:2::78.bgp > devstack-71.33001: Flags [P.], cksum 0xfef8 (correct), seq 343:362, ack 342, win 240, options [nop,nop,TS val 63182410 ecr 63183301], length 19: BGP, length: 19
    Keepalive Message (4), length: 19
13:44:08.682864 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) devstack-71.33001 > 2001:2::78.bgp: Flags [.], cksum 0x411d (incorrect -> 0x5ab3), seq 342, ack 362, win 242, options [nop,nop,TS val 63226403 ecr 63182410], length 0
In any case, it looks like IPv6 communication is working! For reference, here is the calico.yaml file used: calico.yaml