March 20

Kubernetes and Contiv on Bare-Metal with L3/BGP

Building on the previous blog post about running Kubernetes with Contiv on bare-metal (https://blog.michali.net/2017/03/07/kubernetes-with-contiv-plugin-on-bare-metal/), I’m trying to do the same with L3/BGP. To do this, an upstream router will act as a BGP route reflector. In my case, I’m using a Cisco Nexus 9K.

Preparing Hosts

From CIMC on each UCS box, I created another pair of VNICs, set up in access mode, with a VLAN (3290) that is within the allowed VLANs for the port-channel on the Top of Rack (ToR) switch.

From CentOS, I created another pair of interfaces (b0 and b1), and a bonded interface (b). I verified that the MACs on the slave interfaces matched the MACs on the VNICs created in CIMC.
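For reference, the ifcfg files for the new bond look roughly like this (a sketch – the bonding options, MAC, and IPADDR shown are placeholders; use MACs that match the VNICs created in CIMC, bonding options that suit your ToR configuration, and a 30.30.30.0/24 address that doesn’t collide with the switch SVIs (.2/.3) or the BGP router IPs (.77/.78/.79) used later):

/etc/sysconfig/network-scripts/ifcfg-b:

DEVICE=b
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=active-backup miimon=100"
BOOTPROTO=none
IPADDR=30.30.30.87
PREFIX=24
ONBOOT=yes

/etc/sysconfig/network-scripts/ifcfg-b0 (ifcfg-b1 is the same, with its own MAC):

DEVICE=b0
TYPE=Ethernet
BOOTPROTO=none
ONBOOT=yes
MASTER=b
SLAVE=yes
HWADDR=00:25:b5:00:00:0a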

Note: if you still have the “t” interface (with slaves t0 and t1, which are associated with the trunk VNICs) from the blog entry for using Contiv with L2 interfaces, you need to disable that interface, as only one uplink is supported. I set ONBOOT=no in /etc/sysconfig/network-scripts/ifcfg-t.

 

Preparing the ToR Switch

On the ToR switch, BGP should be set up. In my case, I have a pair of Cisco Nexus 9Ks, which have port-channels to each of the nodes (matching the bonded interfaces on the nodes). There is an allowed VLAN on the port-channels (3290) that will be used for L3/BGP. First, the needed features were enabled and a VRF was created (I used 30.30.30.2 on one N9K and 30.30.30.3 on the other):

feature interface-vlan
feature bgp
vrf context contiv
  rd 30.30.30.2:1
  address-family ipv4 unicast

 

Then, the BGP AS was created and the neighbors were defined. My three nodes’ neighbor addresses will be 30.30.30.77/78/79. The router ID on one N9K is 30.30.30.2 and on the other 30.30.30.3 (these two addresses are also used on the interface VLANs facing the nodes’ bonded interfaces).

router bgp 65000
  router-id 30.30.30.2
  cluster-id 30.30.30.2
  log-neighbor-changes
  address-family ipv4 unicast
  vrf contiv
    neighbor 30.30.30.77
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.78
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client
    neighbor 30.30.30.79
      remote-as 65000
      address-family ipv4 unicast
        route-reflector-client

 

Lastly, an interface VLAN was defined on each N9K (again with a different IP on each):

interface Vlan3290
  no shutdown
  vrf member contiv
  no ip redirects
  ip address 30.30.30.2/24
  no ipv6 redirects

 

Starting Up Kubernetes

Following the previous blog’s notes, on the master node, I started up Kubernetes with:

kubeadm init --api-advertise-addresses=10.87.49.77 --use-kubernetes-version v1.4.7 --service-cidr 10.254.0.0/24
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide

 

Be sure to save the join command, so that other nodes can be added later. All the pods, except for DNS, should be running.

 

Starting Up Contiv plugin

For this step, we use a newer version of the Contiv netplugin and we tweak the install.sh to fix a minor problem, until a newer release is pushed. Follow the normal process to obtain the plugin installer:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz

 

Then, modify install/k8s/contiv.yaml to change the netplugin and netmaster containers’ image lines from “contiv/netplugin:1.0.0-beta.3” to “contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC”. If you are tearing down a previous setup and rebuilding, you may also want to add “- -x” to the “args:” section of the “name: contiv-netplugin” container, so that any OVS bridges from previous runs are removed before starting a new install. Here is a diff showing both changes:

cd ~/contiv/contiv-$VERSION/install/k8s
*** contiv.yaml.orig    2017-03-13 12:26:53.397292278 +0000
--- contiv.yaml 2017-03-13 12:46:16.548371216 +0000
***************
*** 25,33 ****
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3
            args:
              - -pkubernetes
            env:
              - name: VLAN_IF
                value: __VLAN_IF__
--- 25,34 ----
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
            args:
              - -pkubernetes
+             - -x
            env:
              - name: VLAN_IF
                value: __VLAN_IF__
***************
*** 139,145 ****
        hostPID: true
        containers:
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3
            args:
              - -m
              - -pkubernetes
--- 140,146 ----
        hostPID: true
        containers:
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
            args:
              - -m
              - -pkubernetes

 

 

Then, modify install.sh (in the same install/k8s directory) to remove the “./” from the netctl command that sets the forwarding mode to routing, on line 245, so it looks like this:

    netctl --netmaster http://$netmaster:9999 global set --fwd-mode routing
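If you’d rather script that edit, a one-liner like this should do it (double-check afterwards that only the intended line was changed):

cd ~/contiv/contiv-$VERSION
sed -i 's|\./netctl --netmaster|netctl --netmaster|' install/k8s/install.sh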

 

Once all the changes are made, run the install.sh script with the same args as in the other blog, except that we add the “-w routing” argument so that L3 forwarding is used. This uses the IP of the main interface on the master node (this node), and specifies the “b” interface as the uplink.

cd ~/contiv/contiv-$VERSION
install/k8s/install.sh -n 10.87.49.77 -v b -w routing

 

Check that the new Contiv pods (contiv-api-proxy, contiv-etcd, contiv-netmaster, contiv-netplugin) are all running. You can also check that the forwarding mode is set to routing:

export NETMASTER=http://10.87.49.77:9999
netctl global info

 

Create A Network

Next, I created a default network using VXLAN:

netctl net create -t default --subnet=20.1.1.0/24 default-net
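You can confirm the network was created (it should show up with vxlan encapsulation):

netctl network ls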

 

Add Other Nodes

Use the join command, saved from the init command output, to add the other worker nodes. You should see a contiv-netplugin and kube-proxy pod running for each worker node added. From what I can see, the kube-dns pod will have three of four containers running and will show liveness/readiness failures. DNS is not currently used with this setup (and will be removed at some point, I guess), so it can be ignored (or deleted).
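The join command printed by kubeadm init looks roughly like the following (the token is a placeholder – use the one from your init output); run it on each worker, then check the nodes and pods from the master:

kubeadm join --token=<token> 10.87.49.77
kubectl get nodes
kubectl get pods --all-namespaces -o wide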

 

Create BGP Neighbors

Next, we need to create BGP connections to each of the nodes with:

netctl bgp create devstack-77 --router-ip="30.30.30.77/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-78 --router-ip="30.30.30.78/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"
netctl bgp create devstack-71 --router-ip="30.30.30.79/24" --as="65000" --neighbor-as="65000" --neighbor="30.30.30.2"

 

Yeah, I have a host named devstack-71 that has a main interface with an IP ending in .79. I chose to use the same last octet for the BGP interface (inb01) that is created. I’m using one ToR switch’s IP address (30.30.30.2) as the neighbor for each of these connections; if it fails, things should fail over to the other ToR. For the host side, I’m picking an IP on the 30.30.30.x net that doesn’t conflict with the one created on the “b” interface.
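On the N9K side, you can check that the peering sessions come up with something like this (the exact output varies by NX-OS release):

show bgp ipv4 unicast summary vrf contiv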

 

Trying It Out

I created pods (NGINX with 4 replicas) and verified that the pods were created and that I could ping from pod to pod (across nodes). I also created a network with VLAN encapsulation, using:

netctl net create orange -s 10.1.1.0/24 -g 10.1.1.1 -e vlan

 

Then, in the labels section of the metadata in the NGINX manifest, I added the following so that the pods use that network:

    io.contiv.network: orange
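In context, the metadata ends up looking something like this (a sketch with placeholder names; if the manifest is a Deployment or ReplicaSet, put the label on the pod template’s metadata so each pod picks up the network):

metadata:
  name: nginx-orange
  labels:
    app: nginx
    io.contiv.network: orange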

 

Note: for the pods created on this VLAN network, I could ping between pods on the same node, but not between pods on different nodes.

Update: I found out from the Contiv folks that the plugin doesn’t yet support virtual Port Channels (vPCs) for the uplink, which I’m using on my three-node setup. As a result, if a container’s traffic hashed to one ToR’s port-channel link, it could not communicate with containers whose traffic went through the other ToR. I’ll need to retry once support is available for vPCs. In the meantime, I just shut down the node-facing interfaces on the other ToR switch.

 

 

Category: Kubernetes
March 7

Kubernetes with Contiv plugin on bare-metal

Preparations

I used three Cisco UCS systems for the basis of the Kubernetes cluster and followed the preparation and proxy steps in the blog https://blog.michali.net/2017/02/14/kubernetes-on-a-lab-system-behind-firewall/ to get the systems ready for use.

With that setup, the systems had a pair of VNIC interfaces (a0, a1), joined into a bonded interface (a), with an associated bridge (br_api) on each UCS. The two physical interfaces go to a pair of Top of Rack (ToR) switches set up as a port-channel, for connectivity between systems. It could have been done with a single interface, but that’s what I already had in the lab.

For Contiv, we want a second interface to use for the tenant network, so I modified the configuration of each of the three systems to add another pair of interfaces (t0, t1) and a master interface to bond them together (t). In the CIMC console for the UCS systems, I added another pair of VNICs, t0 and t1, selected trunk mode, and made sure the MAC addresses matched the HWADDR in the /etc/sysconfig/network-scripts/ifcfg-t* files in CentOS. Again, a single interface could be used instead of a bonded pair like I have.

Since this is a lab system that is behind a firewall, I modified the no_proxy entries in .bashrc on each node to use:

printf -v lan '%s,' "10.87.49.77,10.87.49.78,10.87.49.79"
printf -v service '%s,' 10.254.0.{2..253}
export no_proxy="cisco.com,${lan%,},${service%,},127.0.0.1";

 

Effectively, this covers all the IPs for the nodes (10.87.49.x) and the service subnet IPs (10.254.0.0/24 – note this is smaller than the default subnet). In addition, on each system, I made sure there was an /etc/hosts entry for each of the three nodes I’m using.
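For example, the /etc/hosts entries look like this (these hostnames match my lab nodes; adjust for yours):

10.87.49.77  devstack-77
10.87.49.78  devstack-78
10.87.49.79  devstack-71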

Besides installing Kubernetes, I also installed “net-tools” on each node.

 

Kubernetes Startup

KubeAdm is used to start up the cluster with the IP of the main interface for this master node, forcing Kubernetes v1.4.7, and using a smaller service CIDR (so that fewer no_proxy entries are needed):

kubeadm init --api-advertise-addresses=10.87.49.77 --use-kubernetes-version v1.4.7 --service-cidr 10.254.0.0/24
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide

 

Save the join command output, so that the worker nodes can be joined later.

All of the pods should be up except for DNS, which will be removed anyway, since this first trial will use L2. We’ve removed the taint on this master node, so it too can be a worker.

 

Contiv Preparation

We’ll pull down the version of Contiv that we want to work with, and will run the install.sh script:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz
cd contiv-$VERSION

./install/k8s/install.sh -n 10.87.49.77 -v t

 

This will use the 10.87.49.77 node (the one I’m on, which will be the netmaster), and will use interface t (the tenant interface that I created above) as the tenant uplink. The script installs netctl in /usr/bin, so that it can be used for network management, and it builds a .contiv.yaml file in the directory and applies it to the cluster.

Note that there are now Contiv pods running, and the DNS pod is gone.

 

Trying It Out

On each of the worker nodes, run the join command. Verify on the master that the nodes are ready (kubectl get nodes) and that a contiv-netplugin and kube-proxy pod are running for each of the workers (kubectl get pods --all-namespaces). On the master, there should be kubernetes and kube-dns services running (kubectl get svc --all-namespaces).

Using netctl, create a default network using VXLAN. First, set an environment variable so that netctl can communicate with the netmaster:

export NETMASTER=http://10.87.49.77:9999
netctl net create -t default --subnet=20.1.1.0/24 default-net

 

Next, create a manifest for some pods and apply it. I used nginx with four replicas, and verified that the pods were all running, dispersed over the three nodes, and all had IP addresses. I could ping from pod to pod, but not from node to pod (expected, as that is not supported at this time).
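A minimal manifest along these lines should work (a sketch, not necessarily the exact one I used; on Kubernetes 1.4.x, Deployments live under extensions/v1beta1):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 4
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80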

If desired, you can create a network using VLANs and then add a label “io.contiv.network: network-name” to the manifest to create pods on that network. For example, I created a network with VLAN 3280 (which was an allowed VLAN on the ToR port-channel):

netctl network create --encap=vlan --pkt-tag=3280 --subnet=10.100.100.215-10.100.100.220/27 --gateway=10.100.100.193 vlan3280

 

Then, in the manifest, I added:

metadata:
...
  labels:
    app: demo-labels
    io.contiv.network: vlan3280

 

Once the manifest is applied, the pods should come up and have IP addresses. You can docker exec into the pods and ping from pod to pod. As with VXLAN, I cannot ping from node to pod.
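For example, from the node hosting one of the pods (the container ID and target pod IP are placeholders, and this assumes the image includes ping):

docker ps | grep nginx
docker exec -it <container-id> ping -c 3 10.100.100.216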

Note: I did have a case where pods on one of the nodes were not getting an IP address and were showing this error, when doing a “kubectl describe pod”:

  6m        1s      105 {kubelet devstack-77}           Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "nginx-vlan-2501561640-f7vi1_default" with SetupNetworkError: "Failed to setup network for pod \"nginx-vlan-2501561640-f7vi1_default(68cd1fb3-0376-11e7-9c6d-003a7d69f73c)\" using network plugins \"cni\": Contiv:Post http://localhost/ContivCNI.AddPod: dial unix /run/contiv/contiv-cni.sock: connect: no such file or directory; Skipping pod"

 

It looks like there were OVS bridges hanging around from failed attempts. Contiv folks mentioned this pull request for the issue – https://github.com/contiv/install/pull/62/files#diff-c07ea516fee8c7edc505b662327701f4. Until this change is available, the contiv.yaml file can be modified to add the -x option. Just go to ~/contiv/contiv-$VERSION/install/k8s/contiv.yaml and add in the -x option for netplugin.

        - name: contiv-netplugin
          image: contiv/netplugin:__CONTIV_VERSION__
          args:
            - -pkubernetes
            - -x
          env:


Once this file is modified, then you can do the Contiv Preparation steps above and run the install.sh script with this change.

 

Update: I was using service CIDR of 10.96.0.0/24, but Contiv folks indicated that I should be using 10.254.0.0/24 (I guess useful for Kubernetes services using service type ClusterIP). I updated this page, but haven’t retested – yet.

 

Category: Kubernetes
March 6

Kubernetes with Contiv plugin in VM

Setup

An easy way to setup Contiv on a pair of nodes, is to use the demo installer that is on Github (https://github.com/contiv/install/tree/master/cluster). I did this on a Macbook Pro, with 16 GB of RAM by using these commands:

cd ~/workspace/k8s
git clone https://github.com/contiv/install.git contiv-install
cd contiv-install
BUILD_VERSION=1.0.0-beta.3 make demo-k8s

The make command will move to the cluster directory and invoke a Vagrantfile to bring up two nodes with Contiv. It uses KubeAdm, starts up a cluster, builds and applies a YAML file, and creates a VXLAN-based network. You only need to create pods once that is completed.

Access

Once the make command has completed, you can access the master node with:

cd cluster
CONTIV_KUBEADM=1 vagrant ssh contiv-node1

From there, you can issue kubectl commands to view the nodes, pods, and apply YAML files for starting up pods. The worker node can be accessed the same way, by using “contiv-node2” as the host name. Use the netctl command to view/manipulate the networks. For example, commands like:

netctl network ls
netctl net create -t default --subnet=20.1.1.0/24 default-net
netctl group create -t default default-net default-epg
netctl net create vlan5 -s 192.168.5.0/24 -g 192.168.5.1 --pkt-tag 5 --encap vlan

Note: if you want to create a pod that uses a non-default network, you can use the following syntax in the pod spec:

cat > busybox.yaml <<EOT
apiVersion: v1
kind: Pod
metadata:
  name: busybox-harmony-net
  labels:
    app: demo-labels
    io.contiv.network: vlan100
spec:
  containers:
  - name: bbox
    image: contiv/nc-busybox
    command:
      - sleep
      - "7200"
EOT

 

This uses the vlan100 network that was previously created with:

netctl network create --encap=vlan --pkt-tag=100 --subnet=10.100.100.215-10.100.100.220/27 --gateway=10.100.100.193 vlan100

 

Tips

I found that this procedure did not work when my Mac was connected to the network via VPN. It appears that the VPN mechanism was preventing the VM from pinging the (Mac) host, and vice versa; I could not even ping the vboxnet interface’s IP from the Mac. Once disconnected from the VPN, everything worked fine.

With the default VXLAN network that is created by the makefile, you cannot (yet) ping from the node to a pod (or vice versa). Pod-to-pod pings work, even across nodes.

When done, you can use the cluster-destroy make target to destroy the VMs that are created.

Category: Kubernetes
February 23

Updates: IPv6 with KubeAdm and Calico

With some recent code changes (so this applies to using latest on master), I found that I needed to modify a few things…

Bare Metal

In calico.yaml, where I had IP6_AUTODETECT_METHOD set to “first-found”, the environment variable needs to be renamed to IP6_AUTODETECTION_METHOD.

 

Vagrant/VM

I started encountering a failure when joining the second node in this setup. I found that it was using the IP 10.0.2.15 for the IPv4 BGP address, which is a problem on this setup: VirtualBox creates a main interface (enp0s3) with the IP 10.0.2.15 for every VM. The Vagrantfile creates a second interface, enp0s8, that has a different IP for each node, on the subnet 10.96.0.0/16(?). To make Calico use the second interface, the calico.yaml file needs this clause added to the BGP section:

# Auto-detect the BGP IP address.
- name: IP
  value: ""
- name: IP_AUTODETECTION_METHOD
  value: "can-reach=10.96.0.101"
- name: IP6
  value: "autodetect"
- name: IP6_AUTODETECTION_METHOD
  value: "first-found"

 

I used the can-reach value, but I think I could have done “interface=enp0s8” as well.

For IPv6, I added an IPv6 address to enp0s8, for each node, using a line like (with different IPs on each node, of course):

ip addr add 2001:2::15/64 dev enp0s8

 

Trying With Changes

After bringing up the cluster, creating the IPv6 pool, and enabling IPv6 on each node (/etc/cni/net.d/10-calico.conf), I created some pods, using this clause in the manifest:

metadata:
  name: my-nginx6
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: my-nginx6
      annotations:
        "cni.projectcalico.org/ipv6pools": "[\"2001:2::/64\"]"
    spec:
      containers:
      - name: my-nginx6
        image: nginx
        ports:
        - containerPort: 8080

 

They all had IPv6 addresses, but there were two issues. First, the replicas were all created on node-02. I ended up creating eight replicas, so that there would be two on node-01. With bare metal, I see that pods are pretty much distributed evenly across all nodes, but I don’t see that in the VM case (utilization is higher on the master/worker node). One problem down, one to go…

Second, on each node, I don’t see a route to the other node’s pods. Looking at “calicoctl node status” (remember to set ETCD_ENDPOINTS as mentioned in other blogs), I see that the BGP connections are not working:

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE    | INFO                           |
+--------------+-------------------+-------+----------+--------------------------------+
| 10.96.0.102  | node-to-node mesh | start | 15:14:38 | Active Socket: Connection      |
|              |                   |       |          | refused                        |
+--------------+-------------------+-------+----------+--------------------------------+

IPv6 BGP status
+--------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE    | INFO                           |
+--------------+-------------------+-------+----------+--------------------------------+
| 2001:2::16   | node-to-node mesh | start | 15:14:38 | Active Socket: Connection      |
|              |                   |       |          | refused                        |
+--------------+-------------------+-------+----------+--------------------------------+

 

If I look in the calico-node container, I see that the bird and bird6 processes are not running and there are no config files in /etc/calico/confd/config/ on node-02 (it is OK on node-01).

I also found that forwarding was not set for all interfaces on both of the nodes, so I did this as well:

sysctl net.ipv6.conf.all.forwarding=1

 

Talking to Gunjan Patel, we looked at the calico-node docker log and saw:

2017-02-23T17:09:07Z node-02 confd[54]: ERROR 501: All the given peers are not reachable (failed to propose on members [http://10.0.2.15:6666] twice [last error: Get http://10.0.2.15:6666/v2/keys/calico/v1/ipam/v4?quorum=false&recursive=true&sorted=false: dial tcp 10.0.2.15:6666: connection refused]) [0]

 

Looks like it is trying to use 10.0.2.15 for peering and failing. Gunjan referred me to a commit he made (https://github.com/gunjan5/calico-tutorials/commit/eac67014f0509156278dc9396185e784fa7f1aec?diff=unified).

After updating my calico.yaml with these changes, I see that the BGP peering connection is established when checking node status. I continued on, created IPv6 pods, and verified that I could ping across nodes. Yay!

For reference, here is the calico.yaml file I’m using (today :)) – working.calico.yaml

That file, adding IPv6 addresses to each node’s enp0s8 interface, and (possibly) enabling forwarding on all IPv6 interfaces, should be enough to do the trick. Then, just add IPv6 pool, enable IPv6 on both nodes, and create pods.

On bare-metal, the calico.yaml specified the interface I wanted to use for the network, and I needed to enable forwarding on the one node (not sure how to persist that). I could then ping from node to container and container to container, across nodes.

 

Category: Kubernetes
February 22

IPv6 Multi-node On Bare-Metal

In a previous blog entry, I was able to bring up a cluster on a three-node bare-metal setup (with the Calico plugin), and then switch to IPv6 and create pods with IPv6 addresses. At the time, I just did a cursory check and made sure I could ping the pod using its IPv6 address.

Well, the devil is in the details. When I checked multiple pods, I found a problem where I could not ping a pod from a different node, or ping pod to pod, when they were on different nodes.

Looking at the routing table, I was seeing a route for each local pod on a node, using the cali interface. But there were no routes to pods on the other nodes (via the tunl0 interface), like I was seeing with IPv4:

IPv4:

192.168.0.0/26 via 10.87.49.79 dev tunl0  proto bird onlink
blackhole 192.168.0.128/26  proto bird
192.168.0.130 dev calie572c5d95aa  scope link
192.168.0.192/26 via 10.87.49.77 dev tunl0  proto bird onlink

IPv6:

2001:2::a8ed:126:57ef:8680 dev calie9323554a97  metric 1024
2001:2::a8ed:126:57ef:8681 dev calid6195fe85f3  metric 1024
blackhole 2001:2::a8ed:126:57ef:8680/122 dev lo  proto bird  metric 1024  error -22

 

When checking “calicoctl node status” it showed IPv4 BGP peers, but no IPv6 BGP peers. I found that in calico.yaml, I needed to have this:

# Auto-detect the BGP IP address.
- name: IP
  value: ""
- name: IP6
  value: "autodetect"
- name: IP6_AUTODETECT_METHOD
  value: "first-found"

 

From what I understand, leaving the IP value empty means it will autodetect and use that IP. For IPv6 though, if IP6 is set to an empty value or the key is missing, IPv6 BGP is disabled.

Also, I was using the :latest tag for the CNI, calico-node, and calico-ctl images. I changed those to :master to get the recent changes.

Now, when nodes join, I see BGP peer entries for both IPv4 and IPv6:

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE    | INFO        |
+--------------+-------------------+-------+----------+-------------+
| 10.87.49.79  | node-to-node mesh | up    | 21:07:06 | Established |
| 10.87.49.78  | node-to-node mesh | up    | 21:07:12 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE         | STATE | SINCE    | INFO        |
+--------------+-------------------+-------+----------+-------------+
| 2001:2::79   | node-to-node mesh | up    | 21:07:06 | Established |
| 2001:2::78   | node-to-node mesh | up    | 21:07:12 | Established |
+--------------+-------------------+-------+----------+-------------+

 

I proceeded to create three pods using IPv4. I could ping from one host to each pod, and from one pod, to the other two pods on different nodes. Each host had routes like these:

192.168.0.2 dev cali812c8ee8317 scope link
192.168.0.64/26 via 10.87.49.79 dev tunl0 proto bird onlink
192.168.0.128/26 via 10.87.49.78 dev tunl0 proto bird onlink

 

Next, I switched to IPv6 (enabled it in 10-calico.conf on each node, and added an IPv6 pool on the master node) and created three more pods. I had an issue, as the master node had an old docker image for CNI, which didn’t have the latest fixes. I ended up deleting the image, redeploying CNI, and then deleting and recreating the pods. I see routes like this now:

2001:2::6d47:e62d:8139:d1e9 dev calicc4563e7a35 metric 1024
blackhole 2001:2::6d47:e62d:8139:d1c0/122 dev lo proto bird metric 1024 error -22
2001:2::8f3a:d659:6d15:1880/122 via 2001:2::79 dev br_api proto bird metric 1024
2001:2::a8ed:126:57ef:8680/122 via 2001:2::78 dev br_api proto bird metric 1024

 

Where br_api is my main interface (a bridge for a bonded interface). I’m able to ping from host to pod and pod to pod across hosts.

Note: this was not working for one of the pods; the packets were not getting past the cali interface on that pod. I checked, and on that node, forwarding was disabled (not sure why). I did the following, and now pings work:

sysctl net.ipv6.conf.all.forwarding=1

 

Not sure how to persist this (I don’t see it in /etc/sysctl.conf or /etc/sysctl.d/* on any system).
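One way that should persist it (I haven’t verified this on these nodes) is to drop the setting into a sysctl.d file and reload:

cat << EOF > /etc/sysctl.d/99-ipv6-forwarding.conf
net.ipv6.conf.all.forwarding = 1
EOF
sysctl --system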

Another curious thing: when I was checking tcpdump to trace the ICMP packets, I was seeing these types of messages:

13:42:15.649100 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) devstack-77.56087 > 2001:2::79.bgp: Flags [P.], cksum 0x412f (incorrect -> 0x382c), seq 342:361, ack 343, win 242, options [nop,nop,TS val 63676370 ecr 63075039], length 19: BGP, length: 19
 Keepalive Message (4), length: 19
13:42:15.649199 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) 2001:2::79.bgp > devstack-77.56087: Flags [.], cksum 0xa391 (correct), seq 343, ack 361, win 240, options [nop,nop,TS val 63114134 ecr 63676370], length 0

 

I was wondering why the (BGP keepalive) packet sent from the devstack-77 system has an incorrect checksum, while the response does not. I see the same thing on the other nodes – in each capture it is the locally sent packet that shows the incorrect checksum. (This is most likely harmless TX checksum offload: tcpdump grabs outgoing packets before the NIC fills in the checksum.)

13:44:08.682811 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) 2001:2::78.bgp > devstack-71.33001: Flags [P.], cksum 0xfef8 (correct), seq 343:362, ack 342, win 240, options [nop,nop,TS val 63182410 ecr 63183301], length 19: BGP, length: 19
 Keepalive Message (4), length: 19
13:44:08.682864 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) devstack-71.33001 > 2001:2::78.bgp: Flags [.], cksum 0x411d (incorrect -> 0x5ab3), seq 342, ack 362, win 242, options [nop,nop,TS val 63226403 ecr 63182410], length 0

 

In any case, it looks like IPv6 communication is working! For reference, here is the calico.yaml file used: calico.yaml

 
Category: Kubernetes
February 17

Kubernetes/Calico plugin with IPv6 on bare-metal

Documenting a setup for investigating Kubernetes with IPv6 in a lab environment. This builds off of notes for using KubeAdm for Kubernetes with Calico plugin on a bare-metal system, which is behind a firewall in a lab (https://blog.michali.net/2017/02/14/kubernetes-on-a-…-behind-firewall).

These notes should work for Ubuntu 16.04, in addition to CentOS, which was what was used in that blog.

Preparation

In the prior blog, the no_proxy environment variable was set up and the cluster was initialized using an alternate subnet (10.20.30.x/24). Later, I found that it is easier to use the original subnet and just reduce its size. I used the alternative setup, added to that blog as an update.

When trying to switch to IPv6, you’ll need the calicoctl command. The easiest way is to install the calicoctl binary (as root):

curl -L --silent https://github.com/projectcalico/calico-containers/releases/download/v1.0.0/calicoctl -o /usr/local/bin/calicoctl
chmod +x /usr/local/bin/calicoctl

Otherwise, you can install go, pull the sources, build and install calicoctl (see end of blog for details).
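Either way, a quick sanity check that the binary is installed and on your PATH:

calicoctl version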

Starting Up the Cluster

When the cluster is initialized, you can use:

kubeadm init --api-advertise-addresses=10.87.49.77 --service-cidr=10.96.0.0/24

 

Before applying calico.yaml, there are additional changes needed. As mentioned in the other blog, the etcd_endpoints and ippool need to be modified. Beyond that, you need to make sure that you have the CNI code with the fix from commit b8fc5928 (merged 2/16/2017), which fixes issue #273. I did that by changing the CNI image line to:

     image: quay.io/calico/cni:latest

 

This fixes a problem where some kernels were not honoring the FlagUp option when creating the veth interfaces.

From this point on, you can apply calico.yaml, and then follow the steps in https://blog.michali.net/2017/02/11/using-kubeadm-and-calico-plugin-for-ipv6-addresses/ under “Reconfiguring for IPv6” to enable IPv6 for future pod creation. Remember to use “kubectl get svc --all-namespaces” to obtain the IP and port for etcd and set the ETCD_ENDPOINTS environment variable; the calicoctl command will work without this, but it will not be accessing the correct key-store entries.
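For example (the service IP and port come from the calico-etcd line of the kubectl get svc output; the values below are from my setup and will differ on yours):

kubectl get svc --all-namespaces | grep calico-etcd
export ETCD_ENDPOINTS=http://10.20.30.2:6666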

Notes

In the pod, I see these interfaces:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host
 valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
 link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
 link/ether 6a:76:fc:00:4b:cc brd ff:ff:ff:ff:ff:ff
 inet6 2001:2::6d47:e62d:8139:d1c0/128 scope global
 valid_lft forever preferred_lft forever
 inet6 fe80::6876:fcff:fe00:4bcc/64 scope link
 valid_lft forever preferred_lft forever

There are these routes:

2001:2::6d47:e62d:8139:d1c0 dev eth0 proto kernel metric 256
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::c8e9:11ff:fe2c:c809 dev eth0 metric 1024

On the host, there is this related IP address:

22: cali1500372f1da@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
 link/ether ca:e9:11:2c:c8:09 brd ff:ff:ff:ff:ff:ff link-netnsid 3
 inet6 fe80::c8e9:11ff:fe2c:c809/64 scope link
 valid_lft forever preferred_lft forever

With these related routes:

2001:2::6d47:e62d:8139:d1c0 dev cali1500372f1da metric 1024
blackhole 2001:2::6d47:e62d:8139:d1c0/122 dev lo proto bird metric 1024 error -22

I did see one system where I could not ping between pods, or between a pod and the host, with IPv6 addresses. What I noticed was that, on that system, the cali# interfaces created, although up, did not have a Link Local Address. The pod had a route to an LLA which, on another system (that worked), belonged to the cali# interface. I need to investigate what is wrong on this system.

Manually Building Calicoctl

If you want to do this the hard way, you can manually build and install the calicoctl tool. First, I installed Go on the system:

curl -O http://storage.googleapis.com/golang/go1.7.4.linux-amd64.tar.gz

tar -xvf go1.7.4.linux-amd64.tar.gz
sudo mv go /usr/local

In ~/.bashrc add:

export PATH=/usr/local/go/bin:$PATH

To use it, set GOPATH to the top of a work area for source and add it to the path in your .bashrc file (and re-source it so that your environment is up to date):

export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

Now, calicoctl can be built and installed (detailed instructions: https://github.com/projectcalico/calicoctl). Here is a summary of the steps:

mkdir -p ~/go/src/github.com/projectcalico
git clone https://github.com/projectcalico/calicoctl.git $GOPATH/src/github.com/projectcalico/calicoctl

Install glide:

mkdir $GOPATH/bin
curl https://glide.sh/get | sh
cd ~/go/src/github.com/projectcalico/calicoctl
glide install -strip-vendor
make binary
cd $GOPATH
go build src/github.com/projectcalico/calicoctl/calicoctl/calicoctl.go
mv calicoctl bin/
sudo cp bin/calicoctl /usr/local/bin
sudo chmod 755 /usr/local/bin/calicoctl

 

Category: Kubernetes
February 14

Installing Go

To install Go 1.7.4 on Linux, I did these steps…

curl -O http://storage.googleapis.com/golang/go1.7.4.linux-amd64.tar.gz
tar -xvf go1.7.4.linux-amd64.tar.gz
sudo mv go /usr/local

In ~/.bashrc add:

export PATH=/usr/local/go/bin:$PATH
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin

Re-source the .bashrc to obtain the settings. Packages can be installed like this…

go get -u github.com/tools/godep
go get -u github.com/jteeuwen/go-bindata/go-bindata
go get -u github.com/nsf/gocode
go get golang.org/x/tools/cmd/guru
go get github.com/rogpeppe/godef
Category: Go
February 14

Kubernetes on a lab system behind firewall

After numerous tries, I think I finally came across a setup that will allow me to run Kubernetes (via KubeAdm), using the Calico plugin, on a bare-metal system that is behind a firewall and needs a proxy to access the outside. This blog describes the process I used to get this to work.

Preparation for CentOS

On the bare metal system (a Cisco UCS) running CentOS 7.3, the needed packages have to be installed. The first step is to add the Kubernetes repo:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://yum.kubernetes.io/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
EOF

I ran “yum update -y” to update the system. Next, the packages need to be installed. Note: I had set up this system weeks before, so hopefully I’ve captured all the steps (if not, let me know):

setenforce 0
yum install -y docker kubelet kubeadm kubectl kubernetes-cni

I do recall hitting a conflict at one point between the docker install and what was already on the system (maybe from mucking around installing things on this system before). In any case, make sure docker is installed and working. On my system, “docker version” shows 1.13. You may want to check “docker version” first, and if docker is already installed, skip trying to reinstall it.

Preparation for Ubuntu 16.04

For Ubuntu, the Kubernetes repo needs to be added along with keys, and then everything installed.

sudo su
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
echo deb http://apt.kubernetes.io/ kubernetes-xenial main >> /etc/apt/sources.list.d/kubernetes.list
apt-get update -y
apt-get install -y kubelet kubeadm kubectl kubernetes-cni

Proxy Setup

With everything installed (I hope :)), I next set up the proxy with http_proxy and https_proxy (lower and uppercase environment variables) pointing to the proxy server, and no_proxy set to IPs that should not go through the proxy server. For this system, no_proxy had the host IP, 127.0.0.1, and then the IPs for the IPv4 pool and the service IPs. The defaults use large subnets, so I reduced these to help make the no_proxy setting more manageable.
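For example, the proxy variables look something like this (the proxy host and port are placeholders for your environment):

export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
export HTTP_PROXY=$http_proxy
export HTTPS_PROXY=$https_proxy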

For the IPv4 pool, I’m using 192.168.0.0/24 (a reduced size from the default), and for the service IP subnet, I’m using 10.20.30.0/24 (instead of 10.96.0.0/12). I used these lines in .bashrc to create the no_proxy setting:

printf -v lan '%s,' 10.86.7.206
printf -v pool '%s,' 192.168.0.{1..253}
printf -v service '%s,' 10.20.30.{1..253}
export no_proxy="cisco.com,${lan%,},${service%,},${pool%,},127.0.0.1";
export NO_PROXY=$no_proxy

Make sure you’ve got these environment variables sourced.

Update: Alternative Proxy Setup

You can keep the default 10.96.0.10 IP in 10-kubeadm.conf, and instead use “--service-cidr=10.96.0.0/24” on the kubeadm init line, to reduce the size of the subnet.

In the .bashrc file, use this for service pool:

printf -v lan '%s,' 10.87.49.77
printf -v pool '%s,' 192.168.0.{1..253}
printf -v service '%s,' 10.96.0.{1..253}
export no_proxy="cisco.com,${lan%,},${service%,},${pool%,},127.0.0.1";
export NO_PROXY=$no_proxy

Calico.yaml Configuration

Obtain the latest calico.yaml (I used this one from a tutorial – https://github.com/gunjan5/calico-tutorials/blob/master/kubeadm/calico.yaml – commit a10bfd1d, but you may have success with http://docs.projectcalico.org/master/getting-started/kubernetes/installation/hosted/, I just haven’t tried it, or sorted out the differences).

Two changes are needed to this file. The etcd_endpoints needs to specify the host IP, and the ippool cidr should be changed from /16 to /24, so that we have a manageable number of no_proxy entries.

Since we are changing the default subnet for services, I changed /etc/systemd/system/kubelet.service.d/10-kubeadm.conf to use 10.20.30.10 for the cluster-dns arg of the KUBELET_DNS_ARGS environment setting. Be sure to restart the systemd service (systemctl daemon-reexec) after making this change. Otherwise, when you start up the cluster, the services will show the new 10.20.30.x IP addresses, but the kubelet process will still have the default --cluster-dns value of 10.96.0.10. This threw me for a while, until Ghe Rivero mentioned it on the KubeAdm slack channel (thanks!).

Update: If you stick with 10.96.0.10 for cluster-dns, you don’t need to change 10-kubeadm.conf (skip the previous paragraph).

Are We There Yet?

Hopefully, I have everything prepared (I’ll know next time I try to set up from scratch). If so, here are the steps used to start things up (as root user!):

kubeadm init --api-advertise-addresses=10.86.7.206 --service-cidr=10.20.30.0/24

Update: If you use the alternative method for the service subnet, you’ll use --service-cidr=10.96.0.0/24, and the IPs will be different in the “kubectl get svc” command below.

This will display the kubeadm join command, for other nodes to be added to the cluster (I haven’t tried that yet for this setup).

kubectl taint nodes --all dedicated-
kubectl apply -f calico.yaml
kubectl get pods --all-namespaces -o wide

At this point (after some time), you should be able to see that all the pods are up and have the IP address of the host, except for the DNS pod, which will have an IP from the 192.168.0.0/24 pool:

[root@bxb-ds-52 calico]# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                        READY     STATUS    RESTARTS   AGE       IP              NODE
kube-system   calico-etcd-wk533                           1/1       Running   0          7m        10.86.7.206     bxb-ds-52
kube-system   calico-node-qxh84                           2/2       Running   0          7m        10.86.7.206     bxb-ds-52
kube-system   calico-policy-controller-2087702136-n19jf   1/1       Running   0          7m        10.86.7.206     bxb-ds-52
kube-system   dummy-2088944543-3sdlj                      1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   etcd-bxb-ds-52                              1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   kube-apiserver-bxb-ds-52                    1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   kube-controller-manager-bxb-ds-52           1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   kube-discovery-1769846148-lb51s             1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   kube-dns-2924299975-c95bg                   4/4       Running   0          31m       192.168.0.128   bxb-ds-52
kube-system   kube-proxy-n0pld                            1/1       Running   0          31m       10.86.7.206     bxb-ds-52
kube-system   kube-scheduler-bxb-ds-52                    1/1       Running   0          31m       10.86.7.206     bxb-ds-52

You can also check that the services are in the service pool defined:

[root@bxb-ds-52 calico]# kubectl get svc --all-namespaces -o wide

NAMESPACE     NAME          CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE       SELECTOR
default       kubernetes    10.20.30.1    <none>        443/TCP         32m       <none>
kube-system   calico-etcd   10.20.30.2    <nodes>       6666/TCP        8m        k8s-app=calico-etcd
kube-system   kube-dns      10.20.30.10   <none>        53/UDP,53/TCP   31m       name=kube-dns

Now, you should be able to use kubectl to apply manifests for containers (I did one with NGINX), and verify that the container can ping other containers, the host, and other nodes on the host’s network.

What’s Next

I want to try to…

  • Join a second node and see if containers are placed there correctly.
  • Retry this process from scratch, to make sure this blog captured all the steps.

 

Category: Kubernetes
February 13

Update on KubeAdm with Calico

After playing with this a bit, I made a few tweaks, based on some discussions with the Calico folks. First, the host IPs being used for the nodes are 10.96.0.101 and 10.96.0.102. These are within the same subnet that Kubernetes uses for the service IPs (10.96.0.0/12). To get around this, I modified the Vagrantfile to use different IPs for the host nodes that are created.

An alternative is to use the “--service-cidr” option on “kubeadm init” to pick a different range for the service subnet, and modify /etc/systemd/system/kubelet.service.d/10-kubeadm.conf to set the DNS IP to be within that range (restarting systemd to apply it). If you use a manifest from master, you may need additional settings (setting clusterIP – I haven’t tried that). This is a more manual method, though.

For my tests, I changed the Vagrantfile as follows:

primary_ip = "10.20.30."

...

      ip = "#{primary_ip}#{i * 10}"

The calico.yaml file needs to be modified too, to use 10.20.30.10 as the etcd_endpoints IP, instead of 10.96.0.101. From this point, you can do a “vagrant up” and then follow the rest of the steps to create the cluster and then switch to IPv6 and create containers.

Note: these same changes apply to the CentOS Vagrantfile in the blog.

Category: Kubernetes
February 11

Vagrantfile for KubeAdm/Calico using CentOS

This is an alternate Vagrantfile that uses CentOS 7, instead of Ubuntu 16.04 for creating a two node KubeAdm cluster for Kubernetes with Calico plugin. See https://blog.michali.net/2017/02/11/using-kubeadm-an…r-ipv6-addresses/ for info on how this is used. Besides the different image, the provision has some minor changes. Here’s the file contents, which I’ll eventually put into a github repo: Continue reading

Category: Kubernetes