Kubernetes and Contiv on Bare-Metal with L3/BGP
Building on the previous blog post about running Kubernetes with the Contiv plugin on bare-metal (https://blog.michali.net/2017/03/07/kubernetes-with-contiv-plugin-on-bare-metal/), I’m trying to do this with L3/BGP. To do this, an upstream router will be used to act as a BGP route reflector. In my case, I’m using a Cisco Nexus 9000 (N9K) switch.
Preparing Hosts
From CIMC on each UCS box, I created another pair of VNICs, set up in access mode, with a VLAN (3290) that is within the allowed VLANs for the port-channel on the Top of Rack (ToR) switch.
From CentOS, I created another pair of interfaces (b0 and b1), and a bonded interface (b). I verified that the MACs on the slave interfaces matched the MACs on the VNICs created in CIMC.
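For reference, here is a minimal sketch of what the bond configuration under /etc/sysconfig/network-scripts might look like. The bonding options are an assumption (match whatever mode your ToR port-channel expects), and the HWADDR values must be the MACs of the CIMC VNICs:

# ifcfg-b (the bond itself)
DEVICE=b
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"   # assumption: LACP to match the ToR port-channel
ONBOOT=yes
BOOTPROTO=none
# add IPADDR/PREFIX here if you give "b" an address on the 30.30.30.x network

# ifcfg-b0 (one slave; ifcfg-b1 is the same apart from DEVICE and HWADDR)
DEVICE=b0
TYPE=Ethernet
MASTER=b
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
HWADDR=<MAC of the corresponding CIMC VNIC>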
Note: if you still have the “t” interface (with slaves t0 and t1, which are associated with trunk veths) from the blog entry on using Contiv with L2 interfaces, you need to disable that interface, as only one uplink is supported. I set “ONBOOT=no” in /etc/sysconfig/network-scripts/ifcfg-t.
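For example (assuming the old trunk bond is named “t”, as in the previous post):

sed -i 's/^ONBOOT=.*/ONBOOT=no/' /etc/sysconfig/network-scripts/ifcfg-t
ifdown t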
Preparing the ToR Switch
On the ToR switch, BGP should be set up. In my case, I have a pair of Cisco Nexus 9Ks, which have port-channels to each of the nodes (using the bonded interfaces on the nodes). There is an allowed VLAN on the port-channels (3290) that will be used for L3/BGP. First, the needed features were enabled and a VRF was created (using 30.30.30.2 on one N9K and 30.30.30.3 on the other):
feature interface-vlan
feature bgp
vrf context contiv
  rd 30.30.30.2:1
  address-family ipv4 unicast
Then, the BGP AS was created and the neighbors defined. My three nodes’ neighbor addresses will be 30.30.30.77, .78, and .79. The router ID on one N9K is 30.30.30.2 and on the other 30.30.30.3 (these two addresses are also used on the interface VLAN facing the bonded interfaces).
router bgp 65000
  router-id 30.30.30.2
  cluster-id 30.30.30.2
  log-neighbor-changes
  address-family ipv4 unicast
  vrf contiv
    neighbor 30.30.30.77
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.78
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.79
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
Lastly, an interface VLAN was defined on each N9K (again with different IP on each):
interface Vlan3290
  no shutdown
  vrf member contiv
  no ip redirects
  ip address 30.30.30.2/24
  no ipv6 redirects
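To sanity-check the switch side, something like the following can be used (the BGP neighbors will show as Idle until the Contiv BGP configuration is created later on):

show ip interface brief vrf contiv
show bgp ipv4 unicast summary vrf contiv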
Starting Up Kubernetes
Following the notes from the previous blog, on the master node, I started up Kubernetes with:
kubeadm init --api-advertise-addresses=10.87.49.77 --use-kubernetes-version v1.4.7 --service-cidr 10.254.0.0/24
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide
Be sure to save the join command, so that other nodes can be added later. All the pods, except for DNS, should be running.
Starting Up Contiv plugin
For this step, we use a newer version of the Contiv netplugin and tweak install.sh to fix a minor problem, until a newer release is pushed. Follow the normal process to obtain the plugin installer:
export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz
Then, modify install/k8s/contiv.yaml to change the image line for both the netplugin and netmaster containers from “contiv/netplugin:1.0.0-beta.3” to “contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC”. If you are tearing down a previous setup and rebuilding, you may also want to add “- -x” to the “args:” section of the “name: contiv-netplugin” container, so that any OVS bridges from previous runs are removed before starting the new install. Here are the diffs for both changes:
cd ~/contiv/contiv-$VERSION/install/k8s
*** contiv.yaml.orig	2017-03-13 12:26:53.397292278 +0000
--- contiv.yaml	2017-03-13 12:46:16.548371216 +0000
***************
*** 25,33 ****
        # container programs network policy and routes on each
        # host.
        - name: contiv-netplugin
!         image: contiv/netplugin:1.0.0-beta.3
          args:
            - -pkubernetes
          env:
            - name: VLAN_IF
              value: __VLAN_IF__
--- 25,34 ----
        # container programs network policy and routes on each
        # host.
        - name: contiv-netplugin
!         image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
          args:
            - -pkubernetes
+           - -x
          env:
            - name: VLAN_IF
              value: __VLAN_IF__
***************
*** 139,145 ****
      hostPID: true
      containers:
        - name: contiv-netmaster
!         image: contiv/netplugin:1.0.0-beta.3
          args:
            - -m
            - -pkubernetes
--- 140,146 ----
      hostPID: true
      containers:
        - name: contiv-netmaster
!         image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
          args:
            - -m
            - -pkubernetes
Then, modify install.sh in the same area, removing the “./” from the netctl command that sets the forwarding mode to routing (line 245), so it looks like this:
netctl --netmaster http://$netmaster:9999 global set --fwd-mode routing
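If you prefer to script that edit, a one-liner like this should do it (a sketch; it targets line 245 as mentioned above, so verify the line number in your copy of install.sh first):

cd ~/contiv/contiv-$VERSION/install/k8s
sed -i '245s|\./netctl|netctl|' install.sh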
Once all the changes are made, run the install.sh script with the same arguments as in the other blog, only now adding the “-w routing” argument so that L3 forwarding is used. This uses the IP of the main interface on the master node (this node) and specifies the “b” interface as the uplink:
cd ~/contiv/contiv-$VERSION
install/k8s/install.sh -n 10.87.49.77 -v b -w routing
Check that the new Contiv pods (contiv-api-proxy, contiv-etcd, contiv-netmaster, contiv-netplugin) are all running. You can also check that the forwarding mode is set to routing:
export NETMASTER=http://10.87.49.77:9999
netctl global info
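As a quick sanity check on the pods themselves, you can filter for the Contiv components directly:

kubectl get pods --all-namespaces -o wide | grep contiv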
Create A Network
For a network, I created a default network using VXLAN encapsulation:
netctl net create -t default --subnet=20.1.1.0/24 default-net
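To confirm the network was created, netctl can list the networks for the tenant (a quick check; the output format varies by version):

netctl net ls -t default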
Add Other Nodes
Use the join command, saved from the init command output, to add in the other worker nodes. You should see a contiv-netplugin and kube-proxy pod running for each worker node added. From what I can see, kube-dns will have three of its four containers running and will show liveness/readiness failures. It is not currently used (and will be removed at some point, I guess), so it can be ignored (or deleted).
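The saved join command looks something like the following (the token is a placeholder from your own kubeadm init output, and the exact form can vary with the kubeadm version). Afterwards, the master should list all three nodes:

# On each worker node
kubeadm join --token=<token> 10.87.49.77

# Back on the master
kubectl get nodes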
Create BGP Neighbors
Next, we need to create BGP connections to each of the nodes with:
netctl bgp create devstack-77 --router-ip="30.30.30.77/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-78 --router-ip="30.30.30.78/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-71 --router-ip="30.30.30.79/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
Yeah, I have a host named devstack-71 that has a main interface with an IP ending in .79; I chose to use the same numbering for the BGP interface (inb01) that gets created. I’m using the one ToR switch’s IP address as the neighbor for each of these connections; if it fails, things should fail over to the other ToR. For the host side, I’m picking an IP on the 30.30.30.x network that doesn’t conflict with the one created on the “b” interface.
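Once the neighbors are created, the sessions can be checked from both ends. A sketch (the inb01 interface comes from Contiv, as noted above, and the switch output format will vary):

# On a node: the Contiv-created BGP interface and routes learned over it
ip addr show inb01
ip route | grep 30.30.30

# On the N9K: the route reflector's view of the sessions
show bgp ipv4 unicast summary vrf contiv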
Trying It Out
I created pods (NGINX with 4 replicas) and verified that the pods were created and that I could ping from pod to pod (across nodes). I also created a network with VLAN encapsulation, using:
netctl net create orange -s 10.1.1.0/24 -g 10.1.1.1 -e vlan
Then, in the labels section of the metadata in the NGINX manifest, I added the following so the pods would use that network:
io.contiv.network: orange
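For example, here is a sketch of where that label could sit in a Deployment manifest (the names and image are placeholders; the important part is that io.contiv.network ends up in the pod’s labels, i.e., in the pod template for a Deployment):

apiVersion: extensions/v1beta1   # Deployments API group in Kubernetes 1.4
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx
        io.contiv.network: orange   # places these pods on the "orange" network
    spec:
      containers:
      - name: nginx
        image: nginx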
Note: for the pods created on this VLAN network, I could ping between pods on the same node, but not between pods on different nodes.
Update: I found out from the Contiv folks that the plugin doesn’t yet support virtual Port Channels (vPCs) for the uplink, which is what I’m using on my three-node setup. As a result, if a container’s traffic hashed to one ToR’s port-channel interface, it could not communicate with containers whose traffic went through the other ToR. I’ll need to retry once support is available for vPCs. In the meantime, I just shut down the node-facing interfaces on the other ToR switch.