July 20

Docker In Docker with GCE


My previous post covered setting up Docker in Docker (DinD). This one builds upon that, using Google Compute Engine (GCE) in the workflow, with the goal of running End-to-End (E2E) tests. A local host will be used for all the DinD development, and GCE will be used to bring up the cluster, running the nodes in a GCE VM.

I’m going to give it a try on native MacOS, a VM running on a Mac (using Vagrant and VirtualBox), and eventually a bare metal system. BTW, Google has some getting started information on GCP, GCE, and other tools.

This journey will involve incrementally working up to the desired result. There’s a lot to do, so let’s get started…


Google Compute Engine

Account Setup

The first step is to set up a Google Cloud Platform account. You can sign up for a free trial and get a $300 credit, usable for the first 12 months. They won’t auto-charge your credit card after the trial period ends.

As part of the “Getting Started” steps, I went to the API Credentials page and created an API key:

The next step is to create the “Service Account Key” and set roles for it. For now, I think you can skip these steps and proceed, as I have not identified the right role settings to create a VM instance. Instead, there are steps in the “Cloud SDK” section below, where an auth login is done, and that seems to set the correct roles to be able to create a VM.

For reference, the steps for the service key are to first select the “Service Account Key” from the pull-down menu:

Google has info on how to create the key. I selected a role as project owner for the service account. I suspect there are more roles that are needed here. Next, select to create a JSON file.

Take the downloaded JSON file and save it somewhere (I just put it in the same directory as the DinD repo). Specify this file in the environment variable setting for GCE, for example:

export GOOGLE_APPLICATION_CREDENTIALS=~/workspace/dind/<downloaded-file>.json


Just note that you either set GOOGLE_APPLICATION_CREDENTIALS, or you use the “auth login” step below. If this environment variable is set when doing the auth login, you’ll get an error.
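Since it’s easy to trip over this, here’s a small sketch (the helper name is my own, not part of any Google tool) of a guard that reports which credential mode is in play before you run the login:

```shell
# Hypothetical helper (not part of gcloud): report which credential mode is
# active, so you know whether 'gcloud auth application-default login' would
# conflict with an already-set key file.
check_auth_mode() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "service-account-key"   # env var set: skip the auth login step
  else
    echo "auth-login-needed"     # no key file configured: do the auth login
  fi
}

unset GOOGLE_APPLICATION_CREDENTIALS
check_auth_mode                          # prints: auth-login-needed
GOOGLE_APPLICATION_CREDENTIALS=/tmp/key.json
check_auth_mode                          # prints: service-account-key
```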


Host Setup/Installs

Before we can go very far there are some things that need to be setup…

Make sure that your OS is up-to-date (Ubuntu: “sudo apt-get update -y && sudo apt-get upgrade -y”). Beyond any development tools (e.g. go, git) you may want to have around, there are some specific tools that need to be installed for using DinD and GCE.


DinD and Kubernetes Repos

For DinD, you can do:

git clone https://github.com/Mirantis/kubeadm-dind-cluster.git dind


Ubuntu: I place this under my home directory, ~/dind.

Native Mac: I happened to place it at ~/workspace/dind.

For Kubernetes, go to your Go source area, create a k8s.io subdirectory, and clone the repo:

git clone https://github.com/kubernetes/kubernetes.git


Install docker and docker-machine

Docker should be the latest (17.06.0-ce) and docker-machine needs to be 0.12.1 or later, due to some bugs. Install docker…

Ubuntu: Use the steps from the KubeAdm Docker In Docker blog notes. Be sure to enable your user to run docker without sudo. Check the version with “docker version”.

Native Mac: Install Docker for Mac. This will install docker and docker-machine, but you need to check the versions. Most likely (as of this writing), you’ll need to update docker-machine.

For docker-machine, you can install with:

curl -L https://github.com/docker/machine/releases/download/v0.12.1/docker-machine-`uname -s`-`uname -m` >/tmp/docker-machine
chmod +x /tmp/docker-machine
sudo cp /tmp/docker-machine /usr/local/bin/docker-machine
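To confirm you actually ended up with a new enough docker-machine, a version comparison can be done with sort -V (a generic sketch; in practice the “have” value would come from parsing the “docker-machine version” output):

```shell
# Sketch: true when HAVE >= NEED, using sort -V for version-aware ordering.
version_ge() {  # usage: version_ge HAVE NEED
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

version_ge 0.12.1 0.12.1 && echo "docker-machine is new enough"
version_ge 0.10.0 0.12.1 || echo "docker-machine needs an update"
```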


Cloud SDK

Instructions on cloud.google.com state to install/update to python 2.7, if you don’t have it already. Next, download the SDK package and extract:

wget https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-162.0.0-darwin-x86_64.tar.gz
tar xzf google-cloud-sdk-162.0.0-darwin-x86_64.tar.gz

You can run google-cloud-sdk/install.sh to set up the path at login to include the tools. Log out and back in, so that the changes take effect.

Now, you can run “gcloud init” and follow the instructions to initialize everything…

I used the existing project that was defined for my Google Cloud account when the account was created. I chose to enable GCE. For zone/region, I picked one on the US east coast.

Ubuntu: As part of this process, the script asked me to go to a URL with my browser. Once I logged in using my Google account and gave Google Cloud access, a key was displayed to paste into the prompt to complete the init.

Now, “gcloud info” and “gcloud help” can be run to see the operations available. For authentication with the API, I did:

gcloud auth application-default login

Native Mac: This brings up a browser window pointed to a URL where you log in and give access to the SDK.

Ubuntu Server: Copy and paste the displayed URL into a browser, then copy the token and paste it into the prompt to continue.


Setting Project-wide Access Controls

To use GCE, an ssh key-pair needs to be set up. I used these commands (no passphrase entered):

ssh-keygen -t rsa -f ~/.ssh/google_compute_engine -C <username>
chmod 400 ~/.ssh/google_compute_engine


I think for the username I should have used my Google email address (the “account” name), but I wasn’t sure, and just used my login username “pcm”. On another machine, I used my email.

To add the key as a project wide key, and to check that it is set up, use:

gcloud compute project-info add-metadata --metadata-from-file sshKeys=~/.ssh/google_compute_engine.pub
gcloud compute project-info describe
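For reference, GCE stores each project-wide key in the sshKeys metadata as a “username:key” entry (the .pub file generated with -C <username> already carries the username as its comment). A sketch of that entry format, using fake key material:

```shell
# Sketch: build a GCE-style sshKeys metadata line (user:key) from a public
# key. The key value here is fake; normally it comes from the .pub file.
USERNAME=pcm
PUBKEY="ssh-rsa AAAAB3NzaFakeKey ${USERNAME}"
printf '%s:%s\n' "${USERNAME}" "${PUBKEY}" > /tmp/gce-ssh-keys
cat /tmp/gce-ssh-keys   # prints: pcm:ssh-rsa AAAAB3NzaFakeKey pcm
```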



If you haven’t already, clone a Kubernetes repo, which will be used by DinD and to run the tests. I pulled the latest on master; the commit was 12ba9bdc8c from July 17, 2017.


Final Check

You should have docker 17.06.0-ce, docker-machine 0.12.1 (or newer), and a recent Kubernetes repo.

Ubuntu: I had DinD repo at ~/dind/ and Kubernetes at ~/go/src/k8s.io/kubernetes/.

Native Mac: I had the DinD repo at ~/workspace/dind/ and Kubernetes at ~/workspace/eclipse/src/k8s.io/kubernetes/.



Running E2E Tests Using GCE and Docker In Docker

Before running the tests, the cluster needs to be started. From the DinD area, source the gce-setup.sh script:

. gce-setup.sh

Be sure to watch all the log output for errors, especially in the beginning, where it is starting up the GCE instance. I’ve seen errors with TLS certificates, after which it continued as if everything was working, but it was not using GCE and had actually created a local cluster. You can check the status of the cluster by doing:

export PATH="$HOME/.kubeadm-dind-cluster:$PATH"
kubectl get nodes
kubectl get pods --all-namespaces


From the Google Console Compute page, check that the VM instance is running. You can even SSH into the instance.

You can then move to the kubernetes repo area and run the E2E tests by using the dind-cluster.sh script in the DinD area. For example, with my Ubuntu setup (adjust the paths for your areas):

cd ~/go/src/k8s.io/kubernetes
~/dind/dind-cluster.sh e2e


This runs all the tests and you can examine the results at the end. For example:

Ran 144 of 651 Specs in 281.944 seconds
FAIL! -- 142 Passed | 2 Failed | 0 Pending | 507 Skipped

Ginkgo ran 1 suite in 4m42.880046761s


Cleaning Up

After you are done testing, you can tear down the cluster by using the dind-cluster.sh script. For example, in my Ubuntu setup (adjust the path for your setup):

~/dind/dind-cluster.sh down


You can then use the “clean” argument, if you want to delete images.

When you are done with your GCE instance, you can use the following command to delete the instance (assuming the default name of ‘k8s-dind’ for the instance, as created by the gce-setup.sh script), locally and remotely:

docker-machine rm -f k8s-dind


Running E2E Tests Using GCE (no DinD)

You can just run the E2E tests, using GCE, without using DinD. In these instructions, I did this in an Ubuntu 16.04 VM. I suspect the same will apply to native Mac.

After moving to the top of the Kubernetes repo, I ran the following to clear the Docker environment variables (testing had failed before and this was suggested, in addition to ensuring docker commands can be run by the user and the docker daemon is running):

unset ${!DOCKER_*}
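For the curious, ${!DOCKER_*} is bash’s prefix expansion: it expands to the names of all variables beginning with DOCKER_, which unset then clears in one shot. A quick demonstration:

```shell
# Set a couple of docker-machine style variables, then clear them all at once.
DOCKER_HOST=tcp://
DOCKER_TLS_VERIFY=1
unset ${!DOCKER_*}
echo "${DOCKER_HOST:-cleared} ${DOCKER_TLS_VERIFY:-cleared}"   # prints: cleared cleared
```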


I’m not sure where this is documented, but in order to bring up/tear down the cluster properly, you need to first (only once) have done:

gcloud components install alpha
gcloud components install beta


To build, bring up the cluster, test, and shut down the cluster, use the following, replacing the project and zone values, as needed:

go run hack/e2e.go -- -v --provider=gce \
    --gcp-project <my-default-proj-name> \
    --gcp-zone <my-zone> \
    --build --up --test --down


Now, before you do this, you may want to also filter the test run, so that it doesn’t run every test (which takes a long time).  You can also use a subset of the options shown, so you could run this command with just “--build --up”, then run it with “--test”, and finally, run it with “--down”.

When using the “--test” argument, you can add the filters. To run the conformance tests, you could add this to the command line:

--test_args="--ginkgo.focus=\[Conformance\]"


This takes about an hour to run the test part. You can skip serial tests with:

--test_args="--ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]"
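To see how the focus and skip regexes interact, here’s a sketch applying the same filtering to a few made-up spec names with grep:

```shell
# Made-up spec names for illustration; real ones come from It() clauses
# under test/e2e/ in the Kubernetes repo.
specs='[k8s.io] Kubectl expose should create services [Conformance]
[k8s.io] Scheduler predicates validate resource limits [Conformance] [Serial]
[k8s.io] Networking should provide unchanging DNS'

# Keep Conformance specs, then drop Serial ones -- mirroring
# --ginkgo.focus=\[Conformance\] --ginkgo.skip=\[Serial\]
echo "$specs" | grep '\[Conformance\]' | grep -v '\[Serial\]'
```

Only the first spec survives both filters, which is why the skip clause trims the run a little.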


That shaved off a few minutes, and gave a passing run when I tried it…

Ran 145 of 653 Specs in 3486.193 seconds
SUCCESS! -- 145 Passed | 0 Failed | 0 Pending | 508 Skipped PASS

Ginkgo ran 1 suite in 58m6.520458681s


To speed things up, you can add the following prefix to the “go run” line for the test, which runs the Ginkgo specs in parallel:

GINKGO_PARALLEL=y


With that, the same tests only took under six minutes, but had 5 failures. A re-run took under five minutes and had only one failure. I guess the tests aren’t too stable. 🙂

See the Kubernetes page on E2E testing for more examples of test options that you can do.

When you are done, be sure to run the command with the “--down” option so that the cluster is torn down (and all four of the instances in GCE are destroyed).


Building Sources

If you want to build images for the run (say you have code changes in the controller or kubectl), you can do these two environment settings:

export BUILD_KUBEADM=y
export BUILD_HYPERKUBE=y

Next, since it will be building binaries, you need to source the gce-setup.sh script from within your Kubernetes repo root. For example, on my setup, I did:

cd ~/go/src/k8s.io/kubernetes
. ~/dind/gce-setup.sh


Note: The updated binaries will be placed into the containers that are running in the VM instance on Google Cloud. You can do “gcloud compute ssh root@k8s-dind” to access the instance (assuming the default instance name), and from there use “docker exec -it kube-master /bin/bash” to access one of the containers.



  • When you run “gcloud info” it gives you lots of useful info about your project. In particular, there are lines that tell you the config file and log file locations:
User Config Directory: [/home/vagrant/.config/gcloud]
Active Configuration Name: [default]
Active Configuration Path: [/home/vagrant/.config/gcloud/configurations/config_default]
Logs Directory: [/home/vagrant/.config/gcloud/logs]
Last Log File: [/home/vagrant/.config/gcloud/logs/2017.07.19/]


  • In reading the docs, I found that the precedence for configuration settings are:
    1. Command line argument
    2. Default in metadata server
    3. Local client default
    4. Environment variable
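A tiny sketch of that precedence order (all names here are made up for illustration, not gcloud internals):

```shell
# Hypothetical resolver following the precedence list above: command-line
# argument, then metadata-server default, then local client default, then
# environment variable.
resolve_setting() {
  cli_arg=$1
  if [ -n "${cli_arg}" ]; then echo "${cli_arg}"; return; fi
  if [ -n "${METADATA_DEFAULT:-}" ]; then echo "${METADATA_DEFAULT}"; return; fi
  if [ -n "${CLIENT_DEFAULT:-}" ]; then echo "${CLIENT_DEFAULT}"; return; fi
  echo "${ENV_SETTING:-unset}"
}

CLIENT_DEFAULT=us-east1-b
resolve_setting ""             # prints: us-east1-b (local client default)
resolve_setting us-central1-a  # prints: us-central1-a (command line wins)
```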


Known Issues

As of this writing, here are the known issues (work-arounds are indicated in the blog):

  • Need docker-machine version 0.12.1 or newer
  • On native Mac, Docker for Mac does not support IPv6
  • Zone is hard-coded in gce-setup.sh. Fix upstreamed.
  • The dind-cluster.sh script (and some of the version-specific variants) pass the -check-version-skew argument to the e2e.go program with incorrect syntax. Fix upstreamed.
  • You have to confirm there are no errors, when running gce-setup.sh, and verify that the GCE instance is running.
July 13

KubeAdm Docker in Docker

In several of my blog posts, I’ve mentioned using KubeAdm to start up a cluster and then do some development work. Some of the Kubernetes instructions mention using local-up-cluster.sh to bring up a single local cluster.

An alternative is to use Docker in Docker (DinD), where master and two minion nodes are brought up as containers on the host. Inside these “node” containers, there are containers for the cluster components running. For example, in the kube-master container, the controller, API server, scheduler, etc. containers will be running.

DinD supports both local and remote workflows, as well.


Using a VM

To run this in a VM (I used Vagrant/VirtualBox on a Mac), you’ll need to set up Ubuntu 16.04 (server in my case). I tried this with CentOS 7, but DinD failed to come up (see below).

Once you have the OS installed and have logged in, you can start the process. First, make sure that everything is up-to-date, and install the “extras” package:

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install linux-image-extra-$(uname -r) linux-image-extra-virtual


Next, install Docker by first downloading the keys, and adding the repository:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update -y


Check that the install will be from the right place by running this command:

apt-cache policy docker-ce


Install, and check that it is running:

sudo apt-get install -y docker-ce
sudo systemctl status docker


To allow the normal user to run docker commands without using sudo (log out and back in for the change to take effect), do:

sudo usermod -aG docker ${USER}


I checked the “docker version” (17.06.0-ce), “docker info | grep Storage” (aufs), and “uname -a” (kernel 4.4.0-51). With everything looking OK, I installed DinD:

mkdir ~/dind
cd ~/dind
wget https://cdn.rawgit.com/Mirantis/kubeadm-dind-cluster/master/fixed/dind-cluster-v1.6.sh
chmod +x dind-cluster-v1.6.sh


The cluster can now be brought up with:

./dind-cluster-v1.6.sh up


Once this finishes, you have a three-node cluster, with each node running as a container. The output mentions a dashboard available via a browser, but since I was running Ubuntu server, I couldn’t check that out (I was unable to forward to my host either). You can access the cluster using kubectl with:

export PATH="$HOME/.kubeadm-dind-cluster:$PATH"
kubectl get nodes
kubectl get pods --all-namespaces


Using a Bare Metal System

The process is identical to the VM case. If the bare metal system is behind a firewall and a proxy is required, you’ll run into issues (see below).


Problems Seen (and some workarounds, but unresolved)

Running on native MacOS

If you have Docker for Mac installed, you can bring up DinD on native MacOS. However, IPv6 is not yet supported by Docker for Mac, so I didn’t try this method (but others have, for IPv4).

After installing Docker for Mac (to get docker command), you can wget DinD or clone the DinD repo (see below). Follow the same steps to run DinD, like with a VM.


Systems behind firewalls

First, docker will have problems talking to external servers to do pulls, etc. You can set up docker for a proxy server by creating a file /etc/systemd/system/docker.service.d/http-proxy.conf with lines like these:

[Service]
Environment="HTTP_PROXY=http://<proxy-host>:<port>/"
Environment="HTTPS_PROXY=http://<proxy-host>:<port>/"
Environment="NO_PROXY=localhost,127.0.0.1,<host-ip>,.<your-domain>"

Use your host and port number for the HTTP_PROXY/HTTPS_PROXY entries, and your host’s IP and your domain (preceded by a dot) for NO_PROXY. You can then reload the daemon and restart docker:

systemctl daemon-reload
systemctl start docker


I also set these three environment variables in my .bashrc file, so that they are added to the environment settings. For NO_PROXY, I also included, 10.192.0.{1..20}, 10.96.0.{1..20} (service network), and 10.244.0.{1..20} (some IPs on the pod network).
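As an aside, the {1..20} pieces rely on shell brace expansion to enumerate individual addresses, since NO_PROXY traditionally does not understand CIDR ranges. A short demonstration with a smaller range:

```shell
# Brace expansion turns one pattern into a list of individual IPs;
# a range of 3 is used here just to keep the output short.
ips=$(echo 10.96.0.{1..3})
echo "$ips"   # prints: 10.96.0.1 10.96.0.2 10.96.0.3
```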

With those environment variable settings, I modified the dind-cluster-v1.6.sh script to add the proxy environment variables to the docker run command in the dind:run portion of the script:

  # Start the new container.
  docker run \
         -d --privileged \
         -e HTTP_PROXY="${HTTP_PROXY:-}" -e HTTPS_PROXY="${HTTPS_PROXY:-}" \
         --net kubeadm-dind-net \


This passes the needed proxy information into the kube-master container, so that external sites can be accessed.

Unfortunately, there is still a problem. The kube-master container’s docker is not setup for proxy access, so pulls fail from inside the container. You can look at the docker logs and see the pulls failing.

A workaround (hack) for now, is to add the same http-proxy.conf file to the kube-master container, reload docker daemon, and restart docker. Eventually, the API server (which was previously exiting), would come up, along with the rest of the cluster.

I suspect that the same issue will occur for all the (inner) containers, so we need a solution that sets up docker correctly for a proxy environment.


Using CentOS 7

I have not been successful with this, trying a VM or bare-metal. As DinD is starting up, I see a docker failure. Inside the kube-master container, docker has exited, and displays a message saying “Error starting daemon: error initializing graphdriver: driver not supported”.

Doing some investigation, I see that on the (outer) host, CentOS is using the “devicemapper” storage driver (versus “aufs” for Ubuntu). As of this writing, this is the only driver supported. Inside the kube-master container, the storage driver is “vfs”, which via the scripts is using “overlay2” (the same as what Ubuntu uses). However, the OS is RHEL 4.8.5. It appears that this driver is not supported.

Update: As of commit 477c3e3, this should be working (I haven’t tested yet). They changed the driver from “overlay2” to “overlay”.


Building and Running DinD From Sources

Instead of using the prebuilt scripts, you can clone the DinD repo:

git clone https://github.com/Mirantis/kubeadm-dind-cluster.git ~/dind
cd ~/dind


The following environment variables should be set (and you’ll need a clone of the Kubernetes repo), to cause things to be built as part of bringing up a cluster:

BUILD_KUBEADM=y BUILD_HYPERKUBE=y ./dind-cluster.sh up


You’ll need to do some hacking (as of this writing), to make this work. First, there is an issue with docker 17.06 ce, where the “docker wait” command hangs, if the container doesn’t exist. The workaround for now is to fall back to docker 17.03, instead of 17.06. You can follow the instructions on the Docker site, based on your operating system.

For Ubuntu, you can do “sudo apt-get install docker-ce=<version>” (not sure if that will be just 17.03). I didn’t do that, and instead hacked (as a temp fix) the destroy_container() function in the Kubernetes build/common.sh file.

Second, the dind-cluster.sh script (and the fixed/dind-cluster-v1.5.sh and fixed/dind-cluster-v1.7.sh scripts called from this script), have a line:

go run hack/e2e.go --v --test -check_version_skew=false --test_args='${test_args}'"


Apparently, the -check_version_skew argument has been changed to -check-version-skew. You can alter the script(s) to fix this issue.
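One way to make that change is a sed one-liner, sketched here against a temp copy of the offending line rather than the real scripts:

```shell
# Reproduce the bad line from the script, then rewrite the underscored flag
# to its hyphenated form in place.
echo "go run hack/e2e.go --v --test -check_version_skew=false" > /tmp/dind-line
sed -i 's/-check_version_skew/-check-version-skew/' /tmp/dind-line
cat /tmp/dind-line
```

Run the same sed against dind-cluster.sh (and the fixed/ variants) to apply the actual fix.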

June 7

Making Use of Kubernetes Test Infra Tools

The test team has a great summary page on test infrastructure. This blog just summarizes some of the pages, and as I learn more, will have some notes on the tools.

When you submit a Pull Request, there are several tests run, with the results reported in the PR:


If you click on the “Details”, it will take you to the Gubernator page with the test results, failures (if any), and logs. You can go to the Gubernator home page to see the jobs, where you can click on a job to see the history for a specific test (e.g. ci-kubernetes-build-1.7).

Test Grid

From the job page, there is a link to a detail page for another tool, TestGrid. This tool shows test results over time for jobs. The top level page has links for groups of tests, like “release 1.6 blocking”. From there, you can look at the results for a specific job. For example, you can see the kubelet 1.6 test results for the week, under the release 1.6 blocking tests.

The Summary link is very useful, for a group, as it will show how many tests failed and how many ran, for each test in the group, over the past week.

PR Dashboard

At the Gubernator home page is a link to the Pull Request Dashboard. This will show PRs of interest to you (you’re referenced in some manner). You may see Needs Attention for PRs that need review/approval, Approvable for reviews you could approve (if you have that capability), Incoming for review, and Outgoing for reviews you authored.

You can change the user at the top to see someone’s dashboard, which can be useful when looking for reviewers, as you can see their workload.



The top level test infrastructure tool, Prow, shows PRs and jobs for several queues (?). The default is pre-commit, which is triggered to run when comments are made on unmerged PRs. Another is the post-submit queue, which is triggered on every merge and/or push to a branch. The periodic queue is one that runs based on a timer (e.g. every 24 hours). There is a batch queue that has several PRs being tested at once.

On the listing you can see the status of the job (check, X, or orange dot for in-progress), PR number(s), job name, start date/time, and duration. Clicking on the PR takes you there. Clicking on the job takes you to the test results.

You can do additional filtering (repo, author, job).


Submit Queue

The Submit Queue shows the PRs in queues. There are additional links to see PRs, merge history, and end-to-end test information with some health graphs. The info link shows the rules for how PRs are ordered in the merge queue, merge requirements, bot status, health, and a link to bot commands.

FYI: Erick Fejta gave a great presentation on the test infrastructure (a lot over my head :)). The slides are here.



For those interested in the big picture, there is Velodrome. This has a bunch of graphs with metrics, like merge rate, number of open pull requests, number of comments, number of commenters, etc.

At the top left, there is a pulldown with other metrics besides “Github Metrics”, including developer velocity and monitoring.


Triage Dashboard

If you are wondering about failures by code area, visit the Triage Dashboard. You can see a graph of failures over time, along with a snippet of the error seen, and the job(s).

There are a bunch of filters that can be applied, including text to search for in the failure messages. Afraid I don’t have the secret decoder ring to fully understand this dashboard (yet).



Erick Fejta did a great recorded presentation at the 6/6/2017 SIG testing meeting on how the test infra currently works (slides). A great explanation of a very complex setup. The tools above are mentioned there.

May 30

End-To-End Testing

Updated 6/8/2017

I had been trying to follow the community page on end-to-end testing, but striking out. I gave it a try on native Mac (specifying KUBERNETES_PROVIDER=vagrant), on bare-metal, and inside a Virtual box VM running on a Mac. Each gave me different problems, which I’ll spare elaborating on in this blog. Instead, I’ll cut to the chase and describe what works…

One of the Kubernetes Developers (@ncdc) was kind enough to give me info on a working method, which I’ll elaborate on here.


First, though, here is what I have for a setup:

  • Mac host (shouldn’t matter).
  • CentOS 7  Vagrant box with 40GB drive, running via VirtualBox.
  • VM has 8GB RAM and 2 CPUs configured.
  • Go 1.8.1 installed and $GOPATH setup. Created ~/go/src/k8s.io as a local work area.
  • Tools installed: git, docker, emacs (or your favorite editor).
  • Pull of Kubernetes repo from the work area (latest try used commit b77ed78):
    • git clone https://github.com/kubernetes/kubernetes.git

Started up the docker daemon with:

sudo systemctl enable docker && sudo systemctl start docker


After trying this whole E2E process on a fresh VM setup, I found that when I tried to run the tests, the ginkgo app was not found in the expected areas under _output/. To remedy this, I did “make”, which builds everything, including ginkgo, and places it in _output/local/go/bin/ginkgo. Maybe there is a way to just build ginkgo, but for now, this works.


Starting The Cluster

From my Kubernetes repo area, ~/go/src/k8s.io/kubernetes, I made sure that etcd was installed and PATH was updated, as suggested:

export PATH=$PATH:`pwd`/third_party/etcd


Next, build hyperkube and kubectl:

make WHAT='cmd/hyperkube cmd/kubectl'


You can then start up the cluster, using:

     ./hack/local-up-cluster.sh -o _output/bin/

The API_HOST is set to the node’s main interface. If you are running under the root account (I haven’t tried that, so YMMV), you won’t need sudo and the “-E PATH=$PATH” clause. Feel free to use a different LOG_LEVEL, if desired, too.


Running Tests

Once everything is up, you’ll get a message on how to use the cluster from another window. So, I opened another terminal window, did “vagrant ssh” to access my VM, and changed to the Kubernetes directory. I did these commands to prepare for a test run:

sudo chown -R vagrant /var/run/kubernetes $HOME/.kube
export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig


Where my user is “vagrant”. To prevent e2e failures, @ncdc told me to always do the chown command, after stopping and then restarting the cluster.

The cluster can be examined with the kubectl script, like “cluster/kubectl.sh get nodes”.

Now, you can run the end-to-end tests, with your desired ginkgo.focus. Here is an example:

go run ./hack/e2e.go -- -v -test -test_args '--ginkgo.v --ginkgo.focus Kubectl.expose'


At the end of the run, you’ll see this type of output:

Ran 1 of 631 Specs in 28.812 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 630 Skipped PASS

Ginkgo ran 1 suite in 29.156205396s
Test Suite Passed
2017/05/30 13:40:55 util.go:131: Step './hack/ginkgo-e2e.sh -ginkgo.v -ginkgo.focus Kubectl.expose' finished in 29.254525787s
2017/05/30 13:40:55 e2e.go:80: Done


I ran Conformance tests, with:

go run ./hack/e2e.go -- -v -test -test_args '--ginkgo.v --ginkgo.focus \[Conformance\]'


At the end of the output, I could see the test results:

Ran 148 of 646 Specs in 3493.031 seconds
FAIL! -- 127 Passed | 21 Failed | 0 Pending | 498 Skipped --- FAIL: TestE2E (3493.05s)


All of these failures were under framework/pods.go and related to Volumes. Not sure what is wrong, but it looks like some were failures to create pods due to security context.

Learning How To Focus

The ginkgo.focus argument is a regular expression that maps to the It() clauses in the code in test/e2e/*. You cannot use quotes or spaces (so, use \s). For example, if I see that there are test cases that use the word Selector:

git grep Selector | egrep "It[(]"
test/e2e/network_policy.go: It("should enforce policy based on PodSelector [Feature:NetworkPolicy]", func() {
test/e2e/network_policy.go: It("should enforce multiple, stacked policies with overlapping podSelectors [Feature:NetworkPolicy]", func() {
test/e2e/network_policy.go: It("should enforce policy based on NamespaceSelector [Feature:NetworkPolicy]", func() {
test/e2e/scheduling/predicates.go: It("validates that NodeSelector is respected if not matching [Conformance]", func() {
test/e2e/scheduling/predicates.go: It("validates that NodeSelector is respected if matching [Conformance]", func() {
test/e2e/scheduling/predicates.go: It("validates that a pod with an invalid podAffinity is rejected because of the LabelSelectorRequirement is invalid", func() {

I can run the test with:

go run ./hack/e2e.go -- -v -test -test_args '--ginkgo.v --ginkgo.focus Selector'



It shows this near the end of the output:
Summarizing 2 Failures:

[Fail] [k8s.io] NetworkPolicy [It] should enforce policy based on NamespaceSelector [Feature:NetworkPolicy]

[Fail] [k8s.io] NetworkPolicy [It] should enforce policy based on PodSelector [Feature:NetworkPolicy]

Ran 6 of 646 Specs in 399.804 seconds
FAIL! -- 4 Passed | 2 Failed | 0 Pending | 640 Skipped --- FAIL: TestE2E (399.83s)


Being a regular expression, I could refine this to only run the three tests in network_policy.go. First, I saw that the desired tests were under this section:

var _ = framework.KubeDescribe("NetworkPolicy", func() {
f := framework.NewDefaultFramework("network-policy")
    It("should enforce policy based on PodSelector [Feature:NetworkPolicy]", func() {

I then modified the test to run with this focus:

go run ./hack/e2e.go -- -v -test -test_args '--ginkgo.v --ginkgo.focus NetworkPolicy.*Selector'
Summarizing 2 Failures:

[Fail] [k8s.io] NetworkPolicy [It] should enforce policy based on NamespaceSelector [Feature:NetworkPolicy]

[Fail] [k8s.io] NetworkPolicy [It] should enforce policy based on PodSelector [Feature:NetworkPolicy]

Ran 3 of 646 Specs in 156.527 seconds
FAIL! -- 1 Passed | 2 Failed | 0 Pending | 643 Skipped --- FAIL: TestE2E (156.61s)


There are some example labels that can be used for the focus (and some examples that you could use are on that page as well). Hint: I wouldn’t run the test without any focus set…it takes a really long time.
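Tying back to the no-spaces rule: a spec name containing spaces can be matched by substituting \s in the regex. Here’s a sketch, with grep -P standing in for ginkgo’s regex matching:

```shell
# One of the spec names from the git grep output earlier.
name='validates that NodeSelector is respected if matching [Conformance]'
# Spaces replaced with \s, as required for --ginkgo.focus values.
echo "$name" | grep -P 'NodeSelector\sis\srespected' && echo matched
```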

When you are all done, in the first window, just press control-C to shutdown the cluster. Don’t forget to do the chown command above, if you restart the cluster.

Important Notes

A few things I found out…

Having a large disk drive is important, as it is not easily resizable with Vagrant. I found that 40 GB was more than enough. Some vagrant boxes are only 20GB, and I’ve run out of space after using them for a while.


Be sure when you run the test, that you have the -v option after the double dash, or specify it inside the test_args string.


If you are changing your code and then want to retest, you can run make for just cmd/hyperkube, and then re-run local-up-cluster.sh. Hyperkube is an all-in-one binary with kube-apiserver, kubelet, kube-scheduler, kube-controller-manager, and kube-proxy.


You can use this setup for development work, although you’ll likely want to include additional tools, and maybe even play with kubeadm.


As of 5/31/2017, there is a bug in the tests that is preventing kubelet from starting. The fix is being worked under PR 46709. In the meantime though, you can start up the cluster with this:

sudo FEATURE_GATES=AllAlpha=false LOG_LEVEL=4 API_HOST= ENABLE_RBAC=true -E PATH=$PATH ./hack/local-up-cluster.sh -o _output/bin/

UPDATE: On 6/8/2017, I pulled the latest kubernetes and didn’t have to use this temp fix, so the change is upstreamed now.


May 11

I’ve pushed up a Kubernetes change for review… now what?

Updated  V2 – 6/8/2017

I’m assuming you’ve already gone to the Community page on contributing, for information on how to find issues to work on, how to build and test Kubernetes, signing the Contributors License Agreement (CLA), and followed the link on how to do a pull request.

Your code is up there, ready for review. Now what?

How do you know who is reviewing the code, and what the exact steps are?

What if it is not getting reviewed?

Hopefully, the notes here will help…

By the way…

If it’s your first commit, I highly recommend doing something super simple, so that you can get the process down, and not have something technically challenging as well. Using labels, you can look for low-hanging-fruit issues or issues for new contributors.

Pull Request Posted…Now What?

Once you’ve forked the Kubernetes repo, pushed your changes up to your repo, clicked on the “Compare and Pull Request”, and filled out the pull request, you’ll see several things happen.

The k8s-robot will take some actions to make sure that you have a signed CLA…

And that the commit needs an OK to be tested…

The k8s-reviewer bot may leave a comment indicating that the PR is “reviewable” over at reviewable.kubernetes.io (at time of this posting, this seems to be enabled only occasionally):

Lastly, the bot will assign a reviewer and provide some useful info…

Note that it indicates the OWNERS file. You can go to that file and see the approvers, and depending on the file, reviewers that could review the commit.

What do you do, if the reviewer doesn’t respond to the review?

From the kubernetes-dev Google group, Erick Fejta gave three suggestions. Here’s some elaboration on them…


Un-assign the current reviewer

To do this, you can add a comment on the review that looks like this:

Now, I did this, but forgot the ‘@’ symbol. It didn’t do anything, and folks pointed out that going back and editing the comment to add the at-sign would not help, because the bot only looks at new comments. Whoops.
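For reference, the comment is the /unassign bot command followed by an at-prefixed user name (the names here are hypothetical placeholders):

```
/unassign @current-reviewer
/assign @preferred-reviewer
```

The /assign line is optional; it is the same command used for assigning a recommended reviewer.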


Use the slack channel

  • Sign up for the Kubernetes “Team”, by going to slack.kubernetes.io. Once done, you can login at kubernetes.slack.com.
  • Find the channel for the code you are changing. In my case, I had changed code in k8s.io/apimachinery/pkg/… so I used the sig-api-machinery channel.
  • Ask on the channel for a recommendation of reviewers
  • Use the /assign comment (again with an at-sign before the user name(s)) to assign the review to the people recommended.

You can also look for the reviewer in Slack, and when they are available, touch base with them to see if they have bandwidth to review. That’s what I did in one case, and the person indicated they were wicked busy.


Use the Owners file

Above I showed that the bot indicated the OWNERS file. For one of my commits, it had this OWNERS file for federation with a bunch of reviewers. Some, like the apimachinery one, may only have approvers shown (which I guess double as reviewers).

Now, you could randomly assign from that list, but a better method is to take into consideration the workload of the reviewers. To do that, go to the Pull Request Dashboard and specify the reviewer’s name. You’ll get a page that has this right below the page banner, along with a list of the reviews the person is working on:

As you can see, ‘lavalamp’ was pretty busy, at the time of this review. You can enter other people’s names from the OWNERS file, into the text field and see how busy they are to make a better decision. I suspect you could even try to ping them on Slack to see if they have time to review. Also, you can click on “Me” and see your outgoing commits/reviews.

How To House Train Your Bot

There is info on the various bot commands, what they do, and who can use them. For example, I had a case where I put NONE in the release note section of the PR form, instead of ```NONE``` (NONE wrapped in backticks), and the bot had labeled my PR as needing a release note. I did the following, and the bot then removed the incorrect label and added the correct one:

No CNCF-CLA label?

I had one commit, where the k8s-ci-robot didn’t add in the cncf-cla label, like it should have. Here’s what you’d normally see:

In this commit, I had forgotten to add the “fixed #12345” in the first commit comment to indicate the issue fixed, but from what I hear, this should not affect the labelling, which should use the author and email address. I checked against other commits, and the info was the same.

I’m not sure why the bot didn’t add the label (even with a “no” instead of “yes”). Erick Fejta tried closing and reopening the PR, but that didn’t seem to work.

I did a rebase with Kubernetes master and then pushed up the code again, and this time, the bot added the label for “cncf-cla: yes”. One thing I had done differently, was that I was using a new VM for local development and in that VM, I had the remote for my kubernetes repo set up using https: protocol. I had, however, set up github to use SSH keys (forgot about that, when setting up the VM).

I changed the remote to use the SSH form, like this:

git remote set-url origin git@github.com:pmichali/kubernetes.git

I then re-pushed the change to my github on the same branch (with the -f option to force). The PR got the new commit, and the bot added the label!
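The switch can be sketched end-to-end in a scratch repo (safe to run anywhere git is installed; the remote URLs are the ones from above):

```shell
# Demo in a throwaway repo, so nothing real is touched.
tmp=$(mktemp -d)
git -C "$tmp" init -q

# Start with an https remote (the form my VM had been set up with)...
git -C "$tmp" remote add origin https://github.com/pmichali/kubernetes.git

# ...and switch it to the SSH form, so the SSH keys registered with
# GitHub are used when pushing.
git -C "$tmp" remote set-url origin git@github.com:pmichali/kubernetes.git
url=$(git -C "$tmp" remote get-url origin)
echo "$url"
rm -rf "$tmp"
```

`git remote get-url origin` is a quick way to confirm which protocol a remote is using before pushing.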


Label Me Purple

There is a way to add labels to an issue. I saw the following done by a contributor (can be done by anyone):

As you can see, it added a label to the issue. There is also a “/area” bot command, which also creates a label. You can see the available labels.


Can I Kick Off Testing?

Well, it depends… If you are a Kubernetes member, you can do a “@k8s-bot ok to test” comment. For newbies, you won’t be a member, and will need to wait for a member to do this command to enable testing of the pull request.

Granted, if one of your tests fail, you can use the directions in the bot comment to resubmit that specific test. For example, I had a test failure reported and I just cut and pasted in the mentioned “@k8s-bot pull-kubernetes-federation-e2e-gce test this” to re-check this, once the failure (a flake) was fixed.

There is also a “/retest” comment that can be entered to have all the tests re-run.

See my blog entry on test infra tools for info on looking at test results, checking on the health of test jobs, etc.


What If My Code Is Not Ready For Review?

If you want to create a PR of some work-in-progress code, so people can see your changes, but are not ready for review, you can add the prefix “WIP” to the subject.

Looks Good To Me

Once the reviewer is happy with your changes, they’ll mark it as lgtm:

If an approver is not already assigned, you or the reviewer will want to assign one (you can refer to the OWNERS list):

Once they approve, you’ll see a bunch of bot activity to invoke the tests and complete the process, merging in the PR (assuming all goes well):

It’ll also provide a button to allow you to remove your branch, as it is no longer needed:


What’s Reviewable?

I noticed that some people go to the “Files changed” tab and then add comments to the code. Others click on the Reviewable button on the page:

This takes you to a page, where you can review the code, add comments, acknowledge changes, indicate done, see what files have been reviewed and by whom, and see all the revisions. I’m still trying to figure out all the ins and outs of this tool, and will try to add notes here. TODO

Note, however, not all PRs will have the “reviewable” button (I’m not sure it is enabled all the time – maybe its use is in beta?).

One thing I see at the bottom of the page at the reviewable.kubernetes.io page for my commit is:

The last link suggests opening an issue against the Kubernetes repo to connect it to Reviewable. I’m not sure if I should make that request. Maybe someone can comment.


Category: Kubernetes | Comments Off on I’ve pushed up a Kubernetes change for review… now what?
March 20

Kubernetes and Contiv on Bare-Metal with L3/BGP

Building on the previous blog post about running Kubernetes with Contiv on bare-metal (https://blog.michali.net/2017/03/07/kubernetes-with-contiv-plugin-on-bare-metal/), I’m trying to do this with L3/BGP. To do this, an upstream router will be used to act as a BGP route reflector. In my case, I’m using a Cisco Nexus 9K.

Preparing Hosts

From CIMC on each UCS box, I created another pair of VNICs, setup in access mode, with a VLAN (3290) that is within the allowed VLANs for the port-channel on the Top of Rack (ToR) switch.

From CentOS, I created another pair of interfaces (b0 and b1), and a bonded interface (b). I verified that the MACs on the slave interfaces matched the MACs on the VNICs created in CIMC.

Note: if you still have the “t” interface (with slaves t0 and t1, which are associated with trunk veths) from the blog entry on using Contiv with L2 interfaces, you need to disable that interface, as only one uplink is supported. I set ONBOOT=no in /etc/sysconfig/network-scripts/ifcfg-t.


Preparing the ToR Switch

On the ToR switch, BGP should be set up. In my case, I have a pair of Cisco Nexus 9Ks, which have port-channels for each of the nodes (using bonded interfaces on the nodes). There is an allowed VLAN on the port channels (3290) that will be used for L3/BGP. First the needed features were enabled and a VRF was created (used on one N9K and on the other):

feature interface-vlan
feature bgp
vrf context contiv
  address-family ipv4 unicast


Then, the BGP AS was created and neighbors defined. My three nodes neighbor addresses will be The router ID on one N9K is and on the other (these two are used for the bonded interfaces).

router bgp 65000
  address-family ipv4 unicast
  vrf contiv
      remote-as 65000
      address-family ipv4 unicast
      remote-as 65000
      address-family ipv4 unicast
      remote-as 65000
      address-family ipv4 unicast
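The neighbor addresses in the config above were scrubbed; as a sketch, with hypothetical node addresses on the 30.30.30.x tenant net (mirroring the .77/.78/.79 host numbering used later), the neighbor stanzas would look something like:

```
router bgp 65000
  address-family ipv4 unicast
  vrf contiv
    neighbor 30.30.30.77
      remote-as 65000
      address-family ipv4 unicast
    neighbor 30.30.30.78
      remote-as 65000
      address-family ipv4 unicast
    neighbor 30.30.30.79
      remote-as 65000
      address-family ipv4 unicast
```

If the N9K is to act as the route reflector for this iBGP mesh, each neighbor’s address-family would also carry a route-reflector-client statement.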


Lastly, an interface VLAN was defined on each N9K (again with different IP on each):

interface Vlan3290
  no shutdown
  vrf member contiv
  no ip redirects
  ip address
  no ipv6 redirects


Starting Up Kubernetes

Following the previous blog’s notes, on the master node, I started up Kubernetes with:

kubeadm init --api-advertise-addresses= --use-kubernetes-version v1.4.7 --service-cidr
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide


Be sure to save the join command, so that other nodes can be added later. All the pods, except for DNS, should be running.


Starting Up Contiv plugin

For this step, we use a newer version of the Contiv netplugin and we tweak the install.sh to fix a minor problem, until a newer release is pushed. Follow the normal process to obtain the plugin installer:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz


Then, modify install/k8s/contiv.yaml to change the netplugin and netmaster container’s image line from “contiv/netplugin:1.0.0-beta.3” to “contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC”. If you are tearing down a previous setup, and rebuilding, you may also want to add “- -x” to the “args:” section of the “name: contiv-netplugin” container section, so that any OVS bridges from previous runs are removed, before starting a new install. Here are diffs, for both changes:

cd ~/contiv/contiv-$VERSION/install/k8s
*** contiv.yaml.orig    2017-03-13 12:26:53.397292278 +0000
--- contiv.yaml 2017-03-13 12:46:16.548371216 +0000
*** 25,33 ****
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3
              - -pkubernetes
              - name: VLAN_IF
                value: __VLAN_IF__
--- 25,34 ----
          # container programs network policy and routes on each
          # host.
          - name: contiv-netplugin
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
              - -pkubernetes
+             - -x
              - name: VLAN_IF
                value: __VLAN_IF__
*** 139,145 ****
        hostPID: true
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3
              - -m
              - -pkubernetes
--- 140,146 ----
        hostPID: true
          - name: contiv-netmaster
!           image: contiv/netplugin:1.0.0-beta.3-03-08-2017.18-51-20.UTC
              - -m
              - -pkubernetes



Then, modify install.sh in the same area, to remove the “./” from the netctl command that is setting the forwarding mode to routing, on line 245, so it looks like this:

    netctl --netmaster http://$netmaster:9999 global set --fwd-mode routing


Once all the changes are made, run the install.sh script with the same args as in the other blog post, except that we add the “-w routing” argument so that L3 forwarding is used. This uses the IP of the main interface on the master node (this node), and specifies the “b” interface.

cd ~/contiv/contiv-$VERSION
install/k8s/install.sh -n -v b -w routing


Check that the new Contiv pods (contiv-api-proxy, contiv-etcd, contiv-netmaster, contiv-netplugin) are all running. You can check that the forwarding mode is set for routing:

netctl global info


Create A Network

For a network, I created a default network using a VXLAN:

netctl net create -t default --subnet= default-net


Add Other Nodes

Use the join command, saved from the init command output, to add in the other worker nodes. You should see a contiv-netplugin and kube-proxy pod running for each worker node added. From what I can see, kube-dns will have three of its four containers running and will show liveness/readiness failures. It is not currently used (and will be removed at some point, I guess), so it can be ignored (or deleted).


Create BGP Neighbors

Next, we need to create BGP connections to each of the nodes with:

netctl bgp create devstack-77 --router-ip="" --as="65000" --neighbor-as="65000" --neighbor=""
netctl bgp create devstack-78 --router-ip="" --as="65000" --neighbor-as="65000" --neighbor=""
netctl bgp create devstack-71 --router-ip="" --as="65000" --neighbor-as="65000" --neighbor=""


Yeah, I have a host named devstack-71 that has a main interface with an IP ending in .79. I chose to use the same numbering for the BGP interface (inb01) that is created. I’m using one ToR switch’s IP address as the neighbor for each of these connections. If it fails, things should fail over to the other ToR. For the host side, I’m picking an IP on the 30.30.30.x net, not conflicting with the one created on the “b” interface.


Trying It Out

I created pods (NGINX with 4 replicas) and verified that the pods were created and that I could ping from pod to pod (across nodes). I also created a network with VLAN encapsulation, using:

netctl net create orange -s -g -e vlan


And then, to the labels section of the metadata section of the manifest for NGINX, I added the following to be able to use that network:

    io.contiv.network: orange
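For context, the label ends up nested like this in the pod metadata (the surrounding names are hypothetical; only the io.contiv.network value comes from the network created above):

```
metadata:
  name: nginx-orange
  labels:
    app: nginx
    io.contiv.network: orange
```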


Note: for the pods created, I could ping between pods on the same node, but not pods on other nodes.

Update: I found out from the Contiv folks that the plugin doesn’t yet support virtual Port Channels (vPCs) for the uplink that I’m using on my three-node setup. As a result, if a container hashed to the other ToR’s port-channel interface, it could not communicate with containers connected to the other ToR. I’ll need to retry once support is available for vPCs. In the meantime, I just shut down the interfaces to the nodes on the other ToR switch.



Category: Kubernetes | Comments Off on Kubernetes and Contiv on Bare-Metal with L3/BGP
March 7

Kubernetes with Contiv plugin on bare-metal


I used three Cisco UCS systems for the basis of the Kubernetes cluster and followed the preparation and proxy steps in the blog https://blog.michali.net/2017/02/14/kubernetes-on-a-lab-system-behind-firewall/ to get the systems ready for use.

With that setup, the systems had a pair of VNIC interfaces (a0, a1), joined into a bonded interface (a), and an associated bridge (br_api) on the UCS. There are two physical interfaces that go to a pair of Top of Rack (ToR) set up as a port-channel, for connectivity between systems. It could have been done with a single interface, but that’s what I had already in the lab.

For Contiv, we want a second interface to use for the tenant network, so I modified the configuration of each of the three systems to add another pair of interfaces (t0, t1), and a master interface to bond them together (t). In the CIMC console for the UCS systems, I added another pair of VNICs, t0 and t1, selected trunk mode, and made sure the MAC addresses matched the HWADDR in the /etc/sysconfig/network-scripts/ifcfg-t* files in CentOS. Again, a single interface could be used, instead of a bonded pair like what I had.

Since this is a lab system that is behind a firewall, I modified the no_proxy entries in .bashrc on each node to use:

printf -v lan '%s,' ",,"
printf -v service '%s,' 10.254.0.{2..253}
export no_proxy="cisco.com,${lan%,},${service%,},";


Effectively, all the IPs for the nodes (10.87.49.x) and the service subnet IPs ( – note smaller than default /16 subnet). In addition, on each system, I made sure there was an /etc/hosts entry for each of the three nodes I’m using.
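The printf snippet above had its addresses scrubbed; here is a runnable sketch with hypothetical node IPs on the 10.87.49.x net and a shortened service range (bash is assumed, for brace expansion and printf -v):

```shell
# Build comma-separated lists via brace expansion; printf -v re-applies
# the format for every argument, accumulating into the variable.
printf -v lan '%s,' 10.87.49.{77,78,79}
printf -v service '%s,' 10.254.0.{2..5}

# Strip each trailing comma when composing the final value.
export no_proxy="cisco.com,${lan%,},${service%,}"
echo "$no_proxy"
```

A full /24 service range can be generated the same way with 10.254.0.{2..253}, which is why keeping the service CIDR small matters here.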

Besides installing Kubernetes, I also installed “net-tools” on each node.


Kubernetes Startup

KubeAdm is used to start up the cluster with the IP of the main interface on this master node, forcing Kubernetes v1.4.7, and using the default service CIDR, but with a smaller range (so that fewer no_proxy entries are needed):

kubeadm init --api-advertise-addresses= --use-kubernetes-version v1.4.7 --service-cidr
kubectl taint nodes --all dedicated-
kubectl get pods --all-namespaces -o wide


Save the join command output, so that the worker nodes can be joined later.

All of the services should be up, except for DNS, which will be removed since this first trial uses L2. We’ve removed the taint on this master node, so it too can be a worker.


Contiv Preparation

We’ll pull down the version of Contiv that we want to work with, and will run the install.sh script:

export VERSION=1.0.0-beta.3
curl -L -O https://github.com/contiv/install/releases/download/$VERSION/contiv-$VERSION.tgz
tar xf contiv-$VERSION.tgz
cd contiv-$VERSION

./install/k8s/install.sh -n -v t


This will use this node (the one I’m on) as the netmaster, and interface t (the tenant interface created above) as the tenant interface. The script installs netctl in /usr/bin, so that it can be used for network management, and it builds a .contiv-yaml file in the directory and applies it to the cluster.

Note that there are now Contiv pods running, and the DNS pod is gone.


Trying It Out

On each of the worker nodes, run the join command. Verify on the master that the nodes are ready (kubectl get nodes) and that Contiv netplugin and proxy pods are running for each of the workers (kubectl get pods --all-namespaces). On the master, there should be kubernetes and kube-dns services running (kubectl get svc --all-namespaces).

Using netctl, create a default network using VXLAN. First, you must set an environment variable so that netctl can communicate with the master:

netctl net create -t default --subnet= default-net


Next, create a manifest for some pods and apply them. I used nginx with four replicas, and verified that the pods were all running, dispersed over the three nodes, and all had IP addresses. I could ping from pod to pod, but not from node to pod (expected, as not supported at this time).

If desired, you can create a network using VLANs and then add a label “io.contiv.network: network-name” to the manifest to create pods on that network. For example, I created a network with VLAN 3280 (which was an allowed VLAN on the ToR port-channel):

netctl network create --encap=vlan --pkt-tag=3280 --subnet= --gateway= vlan3280


Then, in the manifest, I added:

app: demo-labels
io.contiv.network: vlan3280


Once the manifest is applied, the pods should come up and have IP addresses. You can docker exec into the pods and ping from pod to pod. As with VXLAN, I cannot ping from node to pod.

Note: I did have a case where pods on one of the nodes were not getting an IP address and were showing this error, when doing a “kubectl describe pod”:

  6m        1s      105 {kubelet devstack-77}           Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "nginx-vlan-2501561640-f7vi1_default" with SetupNetworkError: "Failed to setup network for pod \"nginx-vlan-2501561640-f7vi1_default(68cd1fb3-0376-11e7-9c6d-003a7d69f73c)\" using network plugins \"cni\": Contiv:Post http://localhost/ContivCNI.AddPod: dial unix /run/contiv/contiv-cni.sock: connect: no such file or directory; Skipping pod"


It looks like there were OVS bridges hanging around from failed attempts. Contiv folks mentioned this pull request for the issue – https://github.com/contiv/install/pull/62/files#diff-c07ea516fee8c7edc505b662327701f4. Until this change is available, the contiv.yaml file can be modified to add the -x option. Just go to ~/contiv/contiv-$VERSION/install/k8s/contiv.yaml and add in the -x option for netplugin.

        - name: contiv-netplugin
          image: contiv/netplugin:__CONTIV_VERSION__
            - -pkubernetes
            - -x

Once this file is modified, then you can do the Contiv Preparation steps above and run the install.sh script with this change.


Update: I was using service CIDR of, but Contiv folks indicated that I should be using (I guess useful for Kubernetes services using service type ClusterIP). I updated this page, but haven’t retested – yet.


Category: Kubernetes | Comments Off on Kubernetes with Contiv plugin on bare-metal
March 6

Kubernetes with Contiv plugin in VM


An easy way to set up Contiv on a pair of nodes is to use the demo installer on GitHub (https://github.com/contiv/install/tree/master/cluster). I did this on a MacBook Pro with 16 GB of RAM, using these commands:

cd ~/workspace/k8s
git clone https://github.com/contiv/install.git contiv-install
cd contiv-install
BUILD_VERSION=1.0.0-beta.3 make demo-k8s

The make command will move to the cluster directory and invoke a Vagrantfile to bring up two nodes with Contiv. It uses KubeAdm, starts up a cluster, builds and applies a YAML file, and creates a VXLAN-based network. Once that completes, you only need to create pods.


Once the make command has completed, you can access the master node with:

cd cluster
CONTIV_KUBEADM=1 vagrant ssh contiv-node1

From there, you can issue kubectl commands to view the nodes, pods, and apply YAML files for starting up pods. The worker node can be accessed the same way, by using “contiv-node2” as the host name. Use the netctl command to view/manipulate the networks. For example, commands like:

netctl network ls
netctl net create -t default --subnet= default-net
netctl group create -t default default-net default-epg
netctl net create vlan5 -s -g -pkt-tag 5 --encap vlan

Note: if you want to create a pod that uses a non-default network, you can use the following syntax in the pod spec:

cat > busybox.yaml <<EOT
apiVersion: v1
kind: Pod
metadata:
  name: busybox-harmony-net
  labels:
    app: demo-labels
    io.contiv.network: vlan100
spec:
  containers:
  - name: bbox
    image: contiv/nc-busybox
    command:
      - sleep
      - "7200"
EOT


This uses VLAN100 network that was previously created with:

netctl network create --encap=vlan --pkt-tag=100 --subnet= --gateway= vlan100



I found that this procedure did not work when my Mac was connected to the network via VPN. It appears that the VPN mechanism was preventing the VM from pinging the (Mac) host, and vice versa. I could not even ping the vboxnet interface’s IP from the Mac. Once disconnected from the VPN, everything worked fine.

With the default VXLAN network that is created by the makefile, you cannot (yet) ping from the node to a pod (or vice versa). Pod-to-pod pings work, even across nodes.

When done, you can use the cluster-destroy make target to destroy the VMs that are created.

Category: Kubernetes | Comments Off on Kubernetes with Contiv plugin in VM
February 23

Updates: IPv6 with KubeAdm and Calico

With some recent code changes (so this applies to using latest on master), I found that I needed to modify a few things…

Bare Metal

Where I have IP6_AUTODETECT_METHOD set to “first-found”, in calico.yaml, the environment variable needs to be renamed to IP6_AUTODETECTION_METHOD.



I started encountering a failure when joining the second node in this setup. I found that Calico was using the enp0s3 IP for the IPv4 BGP address, which is a problem here: VirtualBox creates the main interface (enp0s3) with the same IP for every VM. Now, the Vagrantfile creates a second interface, enp0s8, that has a different IP for each node, on a separate subnet. To make Calico use the second interface, the calico.yaml file needs to have this clause added to the BGP section:

 # Auto-detect the BGP IP address.
 - name: IP
   value: ""
 - name: IP_AUTODETECTION_METHOD
   value: "can-reach="
 - name: IP6
   value: "autodetect"
 - name: IP6_AUTODETECTION_METHOD
   value: "first-found"


I used the can-reach value, but I think I could have done “interface=enp0s8” as well.

For IPv6, I added an IPv6 address to enp0s8, for each node, using a line like (with different IPs on each node, of course):

ip addr add 2001:2::15/64 dev enp0s8
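Since ip addr settings are lost on reboot, the address can instead be persisted in the node’s ifcfg file (a sketch for the node using 2001:2::15; the other nodes would get their own addresses):

```
# /etc/sysconfig/network-scripts/ifcfg-enp0s8 (excerpt)
IPV6INIT=yes
IPV6ADDR=2001:2::15/64
```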


Trying With Changes

After bringing up the cluster, creating the IPv6 pool, and enabling IPv6 on each node (/etc/cni/net.d/10-calico.conf), I created some pods, using this clause in the manifest:

metadata:
  name: my-nginx6
spec:
  replicas: 3
  template:
    metadata:
      labels:
        run: my-nginx6
      annotations:
        "cni.projectcalico.org/ipv6pools": "[\"2001:2::/64\"]"
    spec:
      containers:
      - name: my-nginx6
        image: nginx
        ports:
        - containerPort: 8080


They all had IPv6 addresses, but there were two issues. First, the two replicas were both created on node-02. I ended up creating eight replicas, so that there were two on node-01. With bare metal, I see that pods are pretty much distributed evenly on all nodes, but I don’t see that in the VM cases (utilization is higher on the master/worker node). One problem down, one to go…

Second, on each node, I don’t see a route to the other node. Looking at “calicoctl node status” (remember to set ETCD_ENDPOINTS, as mentioned in other blog entries), I see that the BGP connections are not working:

Calico process is running.

IPv4 BGP status
| | node-to-node mesh | start | 15:14:38 | Active Socket: Connection |
| | | | | refused |

IPv6 BGP status
| 2001:2::16 | node-to-node mesh | start | 15:14:38 | Active Socket: Connection |
| | | | | refused |


If I look in the calico-node container, I see that the bird and bird6 processes are not running and there are no config files in /etc/calico/confd/config/ on node-02 (is OK on node-01).

I also found that forwarding was not set for all interfaces on both of the nodes, so I did this as well:

sysctl net.ipv6.conf.all.forwarding=1


Talking to Gunjan Patel, we looked at the calico-node docker log and saw:

2017-02-23T17:09:07Z node-02 confd[54]: ERROR 501: All the given peers are not reachable (failed to propose on members [] twice [last error: Get dial tcp connection refused]) [0]


Looks like it is trying to use for peering and failing. Gunjan referred me to a commit he made (https://github.com/gunjan5/calico-tutorials/commit/eac67014f0509156278dc9396185e784fa7f1aec?diff=unified).

After updating my calico.yaml with these changes, I see that the BGP peering connection is established, when checking node status. I continued on and created IPv6 pods and verified that could ping across nodes. Yay!

For reference, here is the calico.yaml file I’m using (today :)) – working.calico.yaml

That file, adding IPv6 addresses to each node’s enp0s8 interface, and (possibly) enabling forwarding on all IPv6 interfaces, should be enough to do the trick. Then, just add IPv6 pool, enable IPv6 on both nodes, and create pods.

On bare-metal, the calico.yaml specified the interface I wanted to use for the network, and I needed to enable forwarding on the one node (not sure how to persist that). I could then ping from node to container and container to container, across nodes.


Category: Kubernetes | Comments Off on Updates: IPv6 with KubeAdm and Calico
February 22

IPv6 Multi-node On Bare-Metal

In a previous blog entry, I was able to bring up a cluster on a three-node bare-metal setup (with the Calico plugin), and then switch to IPv6 and create pods with IPv6 addresses. At the time, I just did a cursory check and made sure I could ping the pod using its IPv6 address.

Well, the devil is in the details. When I checked multiple pods, I found a problem where I could not ping a pod from a different node, or ping pod to pod, when they are on different nodes.

Looking at the routing table, I was seeing that there was a route for each local pod on a node, using the cali interface. But there were no routes to pods on other nodes (using the tunl0 interface), like I was seeing with IPv4:

IPv4: via dev tunl0  proto bird onlink
blackhole  proto bird dev calie572c5d95aa  scope link via dev tunl0  proto bird onlink


2001:2::a8ed:126:57ef:8680 dev calie9323554a97  metric 1024
2001:2::a8ed:126:57ef:8681 dev calid6195fe85f3  metric 1024
blackhole 2001:2::a8ed:126:57ef:8680/122 dev lo  proto bird  metric 1024  error -22


When checking “calicoctl node status” it showed IPv4 BGP peers, but no IPv6 BGP peers. I found that in calico.yaml, I needed to have this:

# Auto-detect the BGP IP address.
 - name: IP
   value: ""
 - name: IP6
   value: "autodetect"
 - name: IP6_AUTODETECTION_METHOD
   value: "first-found"


From what I understand, leaving the IP value empty means it will autodetect and use that IP. For IPv6, though, if IP6 is set to an empty value, or the key is missing, IPv6 BGP is disabled.

Also, I was using the :latest label for CNI, calico-node, and calico-ctl images. Changed those to :master to get the recent changes.

Now, when nodes join, I see BGP peer entries for both IPv4 and IPv6:

Calico process is running.

IPv4 BGP status
| | node-to-node mesh | up | 21:07:06 | Established |
| | node-to-node mesh | up | 21:07:12 | Established |

IPv6 BGP status
| 2001:2::79 | node-to-node mesh | up | 21:07:06 | Established |
| 2001:2::78 | node-to-node mesh | up | 21:07:12 | Established |


I proceeded to create three pods using IPv4. I could ping from one host to each pod, and from one pod to the other two pods on different nodes. Each host had routes like these:

 dev cali812c8ee8317 scope link
 via  dev tunl0 proto bird onlink
 via  dev tunl0 proto bird onlink


Next, I switched to IPv6 (enabled in 10-calico.conf on each node, and added the IPv6 pool on the master node) and created three more pods. I had an issue, as the master node had an old Docker image for CNI, which didn’t have the latest fixes. I ended up deleting the image, redeploying CNI, and then deleting and recreating the pods. I see routes like this now:

2001:2::6d47:e62d:8139:d1e9 dev calicc4563e7a35 metric 1024
blackhole 2001:2::6d47:e62d:8139:d1c0/122 dev lo proto bird metric 1024 error -22
2001:2::8f3a:d659:6d15:1880/122 via 2001:2::79 dev br_api proto bird metric 1024
2001:2::a8ed:126:57ef:8680/122 via 2001:2::78 dev br_api proto bird metric 1024


Where br_api is my main interface (a bridge for a bonded interface). I’m able to ping from host to pod and pod to pod across hosts.

Note: this was not working for one of the pods, and the packet was not getting past the cali interface on that pod. I checked and on that node, forwarding was disabled (not sure why). I did the following, and now pings work:

sysctl net.ipv6.conf.all.forwarding=1


Not sure how to persist this (I don’t see it in /etc/sysctl.conf or /etc/sysctl.d/* on any system).
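For what it’s worth, on a systemd-based CentOS system a drop-in under /etc/sysctl.d/ should persist it across reboots (the file name here is arbitrary):

```
# /etc/sysctl.d/99-ipv6-forwarding.conf
net.ipv6.conf.all.forwarding = 1
```

Running “sysctl --system” re-reads all the configured locations, so the setting can be applied without a reboot.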

Another curious thing. When I was checking tcpdump to trace the ICMP packets, I was seeing these type messages:

13:42:15.649100 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) devstack-77.56087 > 2001:2::79.bgp: Flags [P.], cksum 0x412f (incorrect -> 0x382c), seq 342:361, ack 343, win 242, options [nop,nop,TS val 63676370 ecr 63075039], length 19: BGP, length: 19
 Keepalive Message (4), length: 19
13:42:15.649199 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) 2001:2::79.bgp > devstack-77.56087: Flags [.], cksum 0xa391 (correct), seq 343, ack 361, win 240, options [nop,nop,TS val 63114134 ecr 63676370], length 0


Wondering why the (BGP?) packet from the devstack-77 system has an incorrect checksum, but I don’t see that in the response. (This is likely TCP checksum offload: tcpdump sees outgoing packets before the NIC computes the checksum in hardware, so locally generated packets show as “incorrect”.) I see the same thing on other nodes, where the incorrect checksum appears on packets sent by the local capturing system:

13:44:08.682811 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 51) 2001:2::78.bgp > devstack-71.33001: Flags [P.], cksum 0xfef8 (correct), seq 343:362, ack 342, win 240, options [nop,nop,TS val 63182410 ecr 63183301], length 19: BGP, length: 19
 Keepalive Message (4), length: 19
13:44:08.682864 IP6 (class 0xc0, hlim 64, next-header TCP (6) payload length: 32) devstack-71.33001 > 2001:2::78.bgp: Flags [.], cksum 0x411d (incorrect -> 0x5ab3), seq 342, ack 362, win 242, options [nop,nop,TS val 63226403 ecr 63182410], length 0


In any case, it looks like IPv6 communication is working! For reference, here is the calico.yaml file used: calico.yaml





Category: Kubernetes | Comments Off on IPv6 Multi-node On Bare-Metal