March 20

Kubernetes and Contiv on Bare-Metal with L3/BGP

Building on the previous blog about running Kubernetes with Contiv on bare-metal (https://blog.michali.net/2017/03/07/kubernetes-with-contiv-plugin-on-bare-metal/), I’m trying to do this with L3/BGP. To do this, an upstream router will be used to act as a BGP route reflector. In my case, I’m using a Cisco Nexus 9K.

Preparing Hosts

From CIMC on each UCS box, I created another pair of VNICs, set up in access mode, with a VLAN (3290) that is within the allowed VLANs for the port-channel on the Top of Rack (ToR) switch.

From CentOS, I created another pair of interfaces (b0 and b1), and a bonded interface (b). I verified that the MACs on the slave interfaces matched the MACs on the VNICs created in CIMC.

Note: if you still have the “t” interface (with slaves t0 and t1, which are associated with the trunk VNICs) from the blog entry on using Contiv with L2 interfaces, you need to disable that interface, as only one uplink is supported. I set ONBOOT=no in /etc/sysconfig/network-scripts/ifcfg-t.
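For reference, the bond and slave ifcfg files look roughly like the sketch below (the bonding options and file layout are assumptions; adjust them to your environment and make sure each HWADDR matches the VNIC MAC from CIMC):

# /etc/sysconfig/network-scripts/ifcfg-b (bond master; addressing per your 30.30.30.0/24 plan)
DEVICE=b
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-b0 (slave VNIC; ifcfg-b1 is analogous)
DEVICE=b0
TYPE=Ethernet
MASTER=b
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
HWADDR=<MAC of the b0 VNIC from CIMC>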

 

Preparing the ToR Switch

On the ToR switch, BGP should be set up. In my case, I have a pair of Cisco Nexus 9Ks, which have port-channels for each of the nodes (using the bonded interfaces on the nodes). There is an allowed VLAN on the port-channels (3290) that will be used for L3/BGP. First, the needed features were enabled and a VRF was created (I used 30.30.30.2 on one N9K and 30.30.30.3 on the other):

feature interface-vlan
feature bgp
vrf context contiv
  rd 30.30.30.2:1
  address-family ipv4 unicast

 

Then, the BGP AS was created and neighbors defined. My three nodes’ neighbor addresses will be 30.30.30.77, .78, and .79. The router ID on one N9K is 30.30.30.2 and on the other 30.30.30.3 (these two addresses are assigned to the interface VLANs shown below).

router bgp 65000
  router-id 30.30.30.2
  cluster-id 30.30.30.2
  log-neighbor-changes
  address-family ipv4 unicast
  vrf contiv
    neighbor 30.30.30.77
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.78
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.79
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client

 

Lastly, an interface VLAN was defined on each N9K (again with different IP on each):

interface Vlan3290
  no shutdown
  vrf member contiv
  no ip redirects
  ip address 30.30.30.2/24
  no ipv6 redirects

 

Starting Up Kubernetes

Following the previous blog’s notes, on the master node, I started up Kubernetes with:

kubeadm init --api-advertise-addresses=10.87.49.77 --use-kubernetes-version v1.4.7 --service-cidr 10.254.0.0/24
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide

 

Be sure to save the join command, so that other nodes can be added later. All the pods, except for DNS, should be running.
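The join command printed by kubeadm init is roughly of this form (the token is a placeholder; use the one from your init output):

kubeadm join --token=<token> 10.87.49.77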

 

Starting Up Contiv plugin

For this step, we use a newer version of the Contiv netplugin and tweak install.sh to fix a minor problem, until a newer release is pushed. Follow the normal process to obtain the plugin installer:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz

 

Then, modify install/k8s/contiv.yaml to change the image line for both the netplugin and netmaster containers from “contiv/netplugin:1.0.0-beta.3” to “contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC”. If you are tearing down a previous setup and rebuilding, you may also want to add “- -x” to the “args:” section of the “name: contiv-netplugin” container, so that any OVS bridges from previous runs are removed before starting a new install. Here are the diffs for both changes:

cd ~/contiv/contiv-$VERSION/install/k8s
*** contiv.yaml.orig    2017-03-13 12:26:53.397292278 +0000
--- contiv.yaml 2017-03-13 12:46:16.548371216 +0000
***************
*** 25,33 ****
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3
            args:
              - -pkubernetes
            env:
              - name: VLAN_IF
                value: __VLAN_IF__
--- 25,34 ----
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
            args:
              - -pkubernetes
+             - -x
            env:
              - name: VLAN_IF
                value: __VLAN_IF__
***************
*** 139,145 ****
        hostPID: true
        containers:
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3
            args:
              - -m
              - -pkubernetes
--- 140,146 ----
        hostPID: true
        containers:
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
            args:
              - -m
              - -pkubernetes

 

 

Then, modify install.sh in the same directory to remove the “./” from the netctl command that sets the forwarding mode to routing (line 245), so that it looks like this:

    netctl --netmaster http://$netmaster:9999 global set --fwd-mode routing
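If you prefer, here is a one-liner sketch of the same edit (it assumes the line is exactly as shipped in the 1.0.0-beta.3 installer):

cd ~/contiv/contiv-$VERSION/install/k8s
sed -i 's|\./netctl --netmaster|netctl --netmaster|' install.sh
grep -n 'fwd-mode routing' install.sh    # confirm the change (around line 245)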

 

Once all the changes are made, run the install.sh script with the same args as in the other blog, adding the “-w routing” argument so that L3 forwarding is used. This uses the IP of the main interface on the master node (this node) and specifies the “b” interface.

cd ~/contiv/contiv-$VERSION
install/k8s/install.sh -n 10.87.49.77 -v b -w routing

 

Check that the new Contiv pods (contiv-api-proxy, contiv-etcd, contiv-netmaster, contiv-netplugin) are all running. You can check that the forwarding mode is set to routing:

export NETMASTER=http://10.87.49.77:9999
netctl global info

 

Create A Network

For a network, I created a default network using VXLAN:

netctl net create -t default --subnet=20.1.1.0/24 default-net
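To confirm the network was created (NETMASTER still exported as above):

netctl network ls    # default-net should be listed with vxlan encap and subnet 20.1.1.0/24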

 

Add Other Nodes

Use the join command, saved from the init command output, to add in the other worker nodes. You should see a contiv-netplugin and kube-proxy pod running for each worker node added. From what I can see, kube-dns will have three of its four containers running and will show liveness/readiness failures. It is not currently used (and will be removed at some point, I guess), so it can be ignored (or deleted).
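If you would rather delete it than leave it failing, removing the deployment works (a sketch, assuming kubeadm deployed kube-dns as a Deployment named kube-dns in kube-system):

kubectl delete deployment kube-dns --namespace=kube-system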

 

Create BGP Neighbors

Next, we need to create BGP connections to each of the nodes with:

netctl bgp create devstack-77 --router-ip="30.30.30.77/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-78 --router-ip="30.30.30.78/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-71 --router-ip="30.30.30.79/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"

 

Yeah, I have a host named devstack-71 that has a main interface with an IP ending in .79. I chose to use the same numbering for the BGP interface (inb01) that is created. I’m using one ToR switch’s IP address as the neighbor for each of these connections; if it fails, things should fail over to the other ToR. For the host side, I’m picking an IP on the 30.30.30.x net that doesn’t conflict with the one created on the “b” interface.
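To sanity-check the node side, you can look at the inb01 interface and the routes learned for the pod subnet (a sketch; 20.1.1.0/24 is the default-net subnet created above):

ip addr show inb01               # should carry the 30.30.30.x router-ip given to netctl bgp create
ip route | grep '20\.1\.1\.'     # routes to pods on other nodes should show up here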

 

Trying It Out

I created pods (NGINX with 4 replicas) and verified that the pods were created and that I could ping from pod to pod (across nodes). I also created a network with VLAN encapsulation, using:

netctl net create orange -s 10.1.1.0/24 -g 10.1.1.1 -e vlan

 

Then, in the labels section of the metadata section of the NGINX manifest, I added the following to be able to use that network:

    io.contiv.network: orange
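For reference, here is a minimal sketch of an NGINX manifest carrying that label (this assumes NGINX is run as a Deployment; the name and the app label are made up, and the label sits on the pod template so the pods themselves pick up the network):

cat > nginx-orange.yaml <<EOT
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-orange
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx-orange
        io.contiv.network: orange
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
EOT
kubectl apply -f nginx-orange.yaml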

 

Note: for the pods created, I could ping between pods on the same node, but not pods on other nodes.

Update: I found out from the Contiv folks that the plugin doesn’t yet support virtual Port Channels (vPC) for the uplink, which I’m using on the three-node setup I have. As a result, if a container’s traffic hashed to the port-channel interface on one ToR, it could not communicate with containers reached through the other ToR. I’ll need to retry once support is available for vPCs. In the meantime, I just shut down the interfaces to the nodes on the other ToR switch.

 

 

March 7

Kubernetes with Contiv plugin on bare-metal

Preparations

I used three Cisco UCS systems for the basis of the Kubernetes cluster and followed the preparation and proxy steps in the blog https://blog.michali.net/2017/02/14/kubernetes-on-a-lab-system-behind-firewall/ to get the systems ready for use.

With that setup, the systems had a pair of VNIC interfaces (a0, a1), joined into a bonded interface (a), and an associated bridge (br_api) on the UCS. There are two physical interfaces that go to a pair of Top of Rack (ToR) switches set up as a port-channel, for connectivity between systems. It could have been done with a single interface, but that’s what I already had in the lab.

For Contiv, we want a second interface to use for the tenant network, so I modified the configuration of each of the three systems to add another pair of interfaces (t0, t1) and a master interface to bond them together (t). In the CIMC console for the UCS systems, I added another pair of VNICs, t0 and t1, selected trunk mode, and made sure the MAC addresses matched the HWADDR in the /etc/sysconfig/network-scripts/ifcfg-t* files in CentOS. Again, a single interface could be used, instead of a bonded pair like I had.

Since this is a lab system that is behind a firewall, I modified the no_proxy entries in .bashrc on each node to use:

printf -v lan '%s,' "10.87.49.77,10.87.49.78,10.87.49.79"
printf -v service '%s,' 10.254.0.{2..253}
export no_proxy="cisco.com,${lan%,},${service%,},127.0.0.1";

 

Effectively, this covers all the node IPs (10.87.49.x) and the service subnet IPs (10.254.0.0/24 – note this is smaller than the default /16 subnet). In addition, on each system, I made sure there was an /etc/hosts entry for each of the three nodes I’m using.

Besides installing Kubernetes, I also installed “net-tools” on each node.
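For reference, the /etc/hosts entries look like this (the hostnames here are the ones referenced elsewhere in these posts; use your own):

10.87.49.77  devstack-77
10.87.49.78  devstack-78
10.87.49.79  devstack-71

net-tools can be installed from the standard repo with yum install -y net-tools.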

 

Kubernetes Startup

KubeAdm is used to start up the cluster with the IP of the main interface for this master node, forcing Kubernetes v1.4.7, and using the default service CIDR, but with a smaller range (so that fewer no_proxy entries are needed):

kubeadm init --api-advertise-addresses=10.87.49.77 --use-kubernetes-version v1.4.7 --service-cidr 10.254.0.0/24
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide

 

Save the join command output, so that the worker nodes can be joined later.

All of the services should be up, except for DNS, which will be removed, since this first trial will use L2. We’ve removed the taint on this master node, so it too can be a worker.

 

Contiv Preparation

We’ll pull down the version of Contiv that we want to work with, and will run the install.sh script:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz
cd contiv-$VERSION

./install/k8s/install.sh -n 10.87.49.77 -v t

 

This will use the 10.87.49.77 node (the one I’m on, which will be the netmaster) and will use interface t (the tenant interface created above) as the tenant interface. The script installs netctl in /usr/bin, so that it can be used for network management, and it builds a .contiv-yaml file in the directory and applies it to the cluster.

Note that there are now Contiv pods running, and the DNS pod is gone.
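A quick way to confirm the install before adding the workers:

kubectl get pods --all-namespaces -o wide    # contiv-netplugin, contiv-netmaster, contiv-etcd, contiv-api-proxy should be Running
which netctl                                 # install.sh places netctl in /usr/bin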

 

Trying It Out

On each of the worker nodes, run the join command. Verify on the master that the nodes are ready (kubectl get nodes) and that Contiv netplugin and kube-proxy pods are running for each of the workers (kubectl get pods --all-namespaces). On the master, there should be kubernetes and kube-dns services running (kubectl get svc --all-namespaces).

Using netctl, create a default network using VXLAN. First, set an environment variable so that netctl can communicate with the master:

export NETMASTER=http://10.87.49.77:9999
netctl net create -t default --subnet=20.1.1.0/24 default-net

 

Next, create a manifest for some pods and apply it. I used NGINX with four replicas and verified that the pods were all running, dispersed over the three nodes, and all had IP addresses. I could ping from pod to pod, but not from node to pod (expected, as this is not supported at this time).
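A minimal sketch of that check, with placeholder pod names and IPs:

kubectl get pods -o wide                                   # note each pod's IP and the node it landed on
kubectl exec <pod-on-node-A> -- ping -c 3 <IP-of-pod-on-node-B>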

If desired, you can create a network using VLANs and then add a label “io.contiv.network: network-name” to the manifest to create pods on that network. For example, I created a network with VLAN 3280 (which was an allowed VLAN on the ToR port-channel):

netctl network create --encap=vlan --pkt-tag=3280 --subnet=10.100.100.215-10.100.100.220/27 --gateway=10.100.100.193 vlan3280

 

Then, in the manifest, I added:

metadata:
  ...
  labels:
    app: demo-labels
    io.contiv.network: vlan3280

 

Once the manifest is applied, the pods should come up and have IP addresses. You can docker exec into the pods and ping from pod to pod. As with VXLAN, I cannot ping from node to pod.

Note: I did have a case where pods on one of the nodes were not getting an IP address and were showing this error, when doing a “kubectl describe pod”:

  6m        1s      105 {kubelet devstack-77}           Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "nginx-vlan-2501561640-f7vi1_default" with SetupNetworkError: "Failed to setup network for pod \"nginx-vlan-2501561640-f7vi1_default(68cd1fb3-0376-11e7-9c6d-003a7d69f73c)\" using network plugins \"cni\": Contiv:Post http://localhost/ContivCNI.AddPod: dial unix /run/contiv/contiv-cni.sock: connect: no such file or directory; Skipping pod"

 

It looks like there were OVS bridges hanging around from failed attempts. Contiv folks mentioned this pull request for the issue – https://github.com/contiv/install/pull/62/files#diff-c07ea516fee8c7edc505b662327701f4. Until this change is available, the contiv.yaml file can be modified to add the -x option. Just go to ~/contiv/contiv-$VERSION/install/k8s/contiv.yaml and add in the -x option for netplugin.

        - name: contiv-netplugin
          image: contiv/netplugin:__CONTIV_VERSION__
          args:
            - -pkubernetes
            - -x
          env:


Once this file is modified, you can redo the Contiv Preparation steps above and run the install.sh script with this change.

 

Update: I was using a service CIDR of 10.96.0.0/24, but the Contiv folks indicated that I should be using 10.254.0.0/24 (I guess this is useful for Kubernetes services using service type ClusterIP). I updated this page, but haven’t retested – yet.

 

March 6

Kubernetes with Contiv plugin in VM

Setup

An easy way to set up Contiv on a pair of nodes is to use the demo installer that is on GitHub (https://github.com/contiv/install/tree/master/cluster). I did this on a MacBook Pro with 16 GB of RAM, using these commands:

cd ~/workspace/k8s
git clone https://github.com/contiv/install.git contiv-install
cd contiv-install
BUILD_VERSION=1.0.0-beta.3 make demo-k8s

The make command will move to the cluster directory and invoke a Vagrantfile to bring up two nodes with Contiv. It uses KubeAdm to start up a cluster, builds and applies a YAML file, and creates a VXLAN-based network. Once that is completed, you only need to create pods.

Access

Once the make command has completed, you can access the master node with:

cd cluster
CONTIV_KUBEADM=1 vagrant ssh contiv-node1

From there, you can issue kubectl commands to view the nodes, pods, and apply YAML files for starting up pods. The worker node can be accessed the same way, by using “contiv-node2” as the host name. Use the netctl command to view/manipulate the networks. For example, commands like:

netctl network ls
netctl net create -t default --subnet=20.1.1.0/24 default-net
netctl group create -t default default-net default-epg
netctl net create vlan5 -s 192.168.5.0/24 -g 192.168.5.1 --pkt-tag 5 --encap vlan

Note: if you want to create a pod that uses a non-default network, you can use the following syntax in the pod spec:

cat > busybox.yaml <<EOT
apiVersion: v1
kind: Pod
metadata:
  name: busybox-harmony-net
  labels:
    app: demo-labels
    io.contiv.network: vlan100
spec:
  containers:
  - name: bbox
    image: contiv/nc-busybox
    command:
      - sleep
      - "7200"
EOT

 

This uses the vlan100 network that was previously created with:

netctl network create --encap=vlan --pkt-tag=100 --subnet=10.100.100.215-10.100.100.220/27 --gateway=10.100.100.193 vlan100

 

Tips

I found that this procedure did not work when my Mac was connected via VPN to the network. It appears that the VPN mechanism was preventing the VM from pinging the (Mac) host, and vice versa. I could not even ping the vboxnet interface’s IP from the Mac. Once disconnected from the VPN, everything worked fine.

With the default VXLAN network that is created by the makefile, you cannot (yet) ping from the node to a pod (or vice versa). Pod-to-pod pings work, even across nodes.

When done, you can use the cluster-destroy make target to destroy the VMs that are created.
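For example, from the same directory where make demo-k8s was run:

cd ~/workspace/k8s/contiv-install
make cluster-destroy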
