Major Viewmaster Refactoring

V3 on 05-15-2025

The Viewmaster movie library has over 600 titles, and when displaying in “detail” mode, the page was loading every one of those covers from its remote URL. My thought was that I could save the cover images locally when movies are entered into the database, and download the covers for existing movies via a migration.

I created a new table for the cover URLs and files, and realized that there was quite a bit of duplication in the database, and thought that I should try to normalize the data. Through the process of making new tables and trying to migrate, I uncovered several other issues.

Long story short, I ended up undoing everything, and then designing a new table with migration to address the issues and to try to minimize duplicate information in the database. In summary, these are the concerns/issues I found out in the process:

IMDB information was duplicated for TV series sets, multi-disk sets that shared the same IMDB info, and single movies that I had in multiple formats (DVD, BR, 4K). I wanted to separate out the “shared” IMDB info to reduce duplication.
For the TV series and multi-disk sets, some of the IMDB values (e.g. duration, release date) were wrong with respect to each disk, and often the titles needed to be different (indicating the season number or specific disk). This meant that some IMDB info needed to be “overridden”.
I found a few movies where IMDB did not have a cover URL, so I needed a placeholder cover that could be used instead. I also hit one specialty disk that had no IMDB entry at all.
IMDB provides multiple genres for most movies, but my database was set up for just one. In addition, some of the genres were spelled differently (e.g. SCIENCE FICTION vs SCI-FI, WAR vs MILITARY) and some were not in my list of genres (e.g. SPORTS). Regardless of which genre I chose, I wanted to know which ones IMDB recommended.
Since IMDB was shared among movies, I needed to remove the database entry and the cover file, when the IMDB info was no longer used.
By the end of the process, I did notice a few problems with performance, so I’ll have some follow-up work to try to fix that.

The Plan

Since there were lots of changes, I planned on bumping the version from 0.2.3 to 0.3.0. Here is the overall plan of attack that I had.

Define a new table (ImdbInfo) containing just the IMDB information that I did not expect to change, even on shared disks: the IMDB number, plot, actors, directors, cover URL, and cover file.
Because movies could override some IMDB info, I wanted to have the “original” stored in the new table, but keep modifiable copies in the Movie table. This includes duration, release date, rating, and title.
The Movie table would still have disk specific info, like format, aspect ratio, audio, cost, good/bad indication, collection name (if applicable), and paid/gift indication.
The Movie table would still have the selected genre (I had called it ‘category’), but the ImdbInfo table would contain the recommended genres for the IMDB title.
As part of the migration process to add the new table, I would also download and save any cover files to the Persistent Volume used for static files.

You can clone the v0.3.0 code from GitHub to see all the changes from v0.2.3, and latest content.

Table Definition

Here is the new table definition for the IMDB information:

from django.core.files.storage import FileSystemStorage
from django.db import models

# RATING_CHOICES is defined elsewhere in the app.


class ImdbInfo(models.Model):
    """IMDB information for a movie (or series of movies)."""

    # Original IMDB values; each Movie keeps its own (possibly overridden) copy.
    title_name = models.CharField(
        max_length=60, help_text="Up to 60 characters for title. May be overridden."
    )
    release_date = models.IntegerField(
        help_text="Four digit year of release. May be overridden."
    )
    # No max_length: allowed for CharField on PostgreSQL with Django 4.2+.
    genres = models.CharField(help_text="List of genres applicable to the movie.")
    mpaa_rating = models.CharField(
        max_length=5,
        default="?",
        choices=RATING_CHOICES,
        help_text="Select the MPAA rating. May be overridden.",
    )
    run_time = models.TimeField(
        help_text="Duration in hh:mm format. May be overridden."
    )
    # These will be common to every movie with this IMDB #
    identifier = models.CharField(
        max_length=20, unique=True, help_text="IMDB movie ID."
    )
    plot = models.CharField(blank=True, default="", help_text="Plot summary.")
    actors = models.CharField(blank=True, default="", help_text="Top cast.")
    directors = models.CharField(blank=True, default="", help_text="Director(s).")
    cover_url = models.URLField(
        blank=True, default="", help_text="URL where poster image is located."
    )
    cover_file = models.ImageField(
        blank=True,
        null=True,
        upload_to="covers",
        # allow_overwrite (Django 5.1+) replaces a same-named file instead of
        # saving a copy with a random suffix.
        storage=FileSystemStorage(allow_overwrite=True),
    )

Some important things to note here. First, the overridable fields, which also remain in the Movie table, use different names here. This is because multiple forms are used at movie creation/edit, and we need to be able to uniquely address the fields (there is no discriminator based on the form). The sizes and types of the fields are the same.

Second, for the cover file, I selected file system storage that allows overwriting. Otherwise, if you save a cover file with the same name, it creates a new file with a suffix. I saw that when first dealing with multi-disk sets, before I worked out how to ensure only one IMDB entry for them.

Migration

In the several failed attempts at this change, I realized that there were several IMDB genres that I needed to translate to the ones I had, and I wanted to add the “SPORTS” genre. Before creating the new table definition, I modified the CATEGORY_CHOICES dict and then ran the ‘python manage.py makemigrations’ command to create the migration file to update the table. I ran “migrate” to effect the change to the database (no change to existing entries).

With that out of the way, I then built the new table definition and ran “makemigrations” again to create a file that adds that table and alters the Movie table to have a ForeignKey to the new ImdbInfo table. Then, I added custom code to the migration file to do the data migration. This was added as another operation, after the table changes:

 migrations.RunPython(copy_imdb_info, migrations.RunPython.noop),

On a “migrate” it would run the code in copy_imdb_info(), and in a reverse migration, it wouldn’t do anything, and would proceed to remove the new table and the added ForeignKey in the Movie table.

For copy_imdb_info(), the code cycles through all the entries in the current Movie table.

For each movie, the IMDB ID is identified, and if this is the first time it is encountered, a new ImdbInfo entry is created. All the data for the entry is obtained via an API request (rather than copied from the Movie, which may have customized info). Existing extraction functions are used to obtain the info, and the cover file is stored locally. Lastly, the ID of the new or existing ImdbInfo entry is stored in the Movie table’s new ForeignKey field, imdb_info.
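Here is a minimal sketch of what copy_imdb_info() could look like (not the actual Viewmaster code). It assumes the Movie model keeps the IMDB ID in its movie_id field (the field later dropped by migration 0011), uses a hypothetical app label "viewmaster", and calls a hypothetical fetch_imdb_details() helper that stands in for the existing API/extraction functions. Downloading the cover file is left out.

```python
def fetch_imdb_details(identifier):
    """Hypothetical stand-in for the existing OMDb request/extraction helpers.

    Should return a dict of ImdbInfo field values (title_name, release_date,
    genres, mpaa_rating, run_time, plot, actors, directors, cover_url).
    """
    raise NotImplementedError


def copy_imdb_info(apps, schema_editor, fetch=None):
    """Forward data migration: one ImdbInfo per IMDB ID, linked from each Movie.

    RunPython passes only (apps, schema_editor); `fetch` is injectable for tests.
    """
    fetch = fetch or fetch_imdb_details
    # Use the historical models from `apps`, not direct imports.
    Movie = apps.get_model("viewmaster", "Movie")  # app label is an assumption
    ImdbInfo = apps.get_model("viewmaster", "ImdbInfo")

    for movie in Movie.objects.all():
        if not movie.movie_id:
            continue  # no IMDB info for this movie; imdb_info stays None
        info = ImdbInfo.objects.filter(identifier=movie.movie_id).first()
        if info is None:
            # First time this IMDB ID is seen: use fresh API data rather than
            # the Movie's (possibly overridden) values.
            details = fetch(movie.movie_id)
            info = ImdbInfo.objects.create(identifier=movie.movie_id, **details)
        movie.imdb_info = info
        movie.save(update_fields=["imdb_info"])
```

Only the first movie with a given IMDB ID triggers an API call. Every later disk in the same set reuses the existing row.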

For the movie genres, I had updated the list of genres, based on some new ones that I thought I’d want. Then, for the list of genres provided by IMDB (a comma separated string), I used a new filter_genres() function to extract each one, convert them to ones that matched the list that I had (renaming/substituting as needed), and then would re-build a comma separated string of the genres for storage in the new table.
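A sketch of a filter_genres() along these lines follows. The KNOWN_GENRES set and SUBSTITUTIONS map are hypothetical examples based on the spellings mentioned earlier, and dropping genres that have no local equivalent is an assumption. The real function may keep or handle them differently.

```python
# Hypothetical local genre list and spelling substitutions; the app's real
# CATEGORY_CHOICES and mapping may differ.
KNOWN_GENRES = {"ACTION", "COMEDY", "DRAMA", "MILITARY", "SCIENCE FICTION", "SPORTS"}
SUBSTITUTIONS = {"SCI-FI": "SCIENCE FICTION", "WAR": "MILITARY", "SPORT": "SPORTS"}


def filter_genres(imdb_genres: str) -> str:
    """Map an IMDB genre string (e.g. "Action, Sci-Fi") onto local genres.

    Each genre is upper-cased and renamed via SUBSTITUTIONS. Duplicates are
    removed (keeping IMDB's order), and anything not in KNOWN_GENRES is dropped.
    """
    result = []
    for raw in imdb_genres.split(","):
        genre = raw.strip().upper()
        genre = SUBSTITUTIONS.get(genre, genre)
        if genre in KNOWN_GENRES and genre not in result:
            result.append(genre)
    return ", ".join(result)
```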

Other Changes

With the database migrated (and we still have the movies with original data), I proceeded to change the code to use the new tables.

For the movie listing, the “show details” option now would show some information from the Movie table and some from the ImdbInfo table. I created a custom template tag that would return the URI for the cover. This would use one of the following, in order… the cover file, if available, the cover URL, if available, a static image indicating there was no cover, if the cover was not valid, or a static image indicating there was no IMDB info for the movie. It would display a red line around the cover, if it was using the cover via a URL and not local file.
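The cover fallback order can be sketched as a plain function that returns both the URI and whether a remote URL is being used, so the template knows when to draw the red border. In the app this would be registered as a custom template tag (e.g. with @register.simple_tag). The placeholder image paths are hypothetical. Treating OMDb's "N/A" poster value as "no valid cover" is an assumption about what counts as invalid.

```python
from typing import Tuple

# Hypothetical static placeholder images; the real file names may differ.
NO_COVER_IMAGE = "/static/images/no-cover.png"
NO_IMDB_IMAGE = "/static/images/no-imdb.png"


def cover_source(imdb_info) -> Tuple[str, bool]:
    """Return (uri, is_remote) for a movie's cover.

    Order: local cover file, cover URL, "no cover" image, "no IMDB" image.
    is_remote is True only when the cover URL is used (red border case).
    """
    if imdb_info is None:
        return NO_IMDB_IMAGE, False
    if imdb_info.cover_file:  # an empty Django FieldFile is falsy
        return imdb_info.cover_file.url, False
    url = imdb_info.cover_url
    # OMDb reports a missing poster as "N/A"; treat that (or blank) as no cover.
    if url and url != "N/A":
        return url, True
    return NO_COVER_IMAGE, False
```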

For create/edit of movies, we now have two forms: one for the movie, and one for the IMDB info. Several of the “shared” fields of the IMDB info are hidden fields, so that they are passed along.

On “GET” portion of create/edit, if there was an IMDB ID selected (via find, for create, or from the movie for edit), we would query the IMDB and populate the ImdbInfo form with the data. For the genres, if there is IMDB info, we build a list of those entries, followed by all the rest of the possible genres, separated by a dashed line. If there is no IMDB info, the full list of possible genres is displayed. If this was an edit of a movie, the previously selected value would be the default genre.
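The genre pull-down ordering can be sketched as a function that builds (value, label) choices. The divider entry and its empty value are assumptions. With a required Django ChoiceField, submitting the divider's empty value fails validation, so it cannot be saved as a genre.

```python
SEPARATOR = ("", "-" * 20)  # hypothetical divider entry between the two groups


def genre_choices(imdb_genres, all_genres):
    """Build choices: IMDB-suggested genres first, a divider, then the rest.

    With no (recognized) IMDB genres, the full list is returned as-is.
    """
    suggested = [g for g in imdb_genres if g in all_genres]
    if not suggested:
        return [(g, g) for g in all_genres]
    rest = [g for g in all_genres if g not in suggested]
    return [(g, g) for g in suggested] + [SEPARATOR] + [(g, g) for g in rest]
```

The stored comma-separated string would be split first, e.g. genre_choices(info.genres.split(", "), all_genres). On edit, the movie's existing genre would be passed as the field's initial value.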

On “POST” portion of create/edit, there are four phases. First, we get the movie form and movie identifier (or “0” if new). Second, we check to see if the user has requested to “clear” the IMDB info; otherwise we try to get the IMDB identifier (or “unknown” if there is none). Third, we get the ImdbInfo form data (could be an existing entry, new entry, or none).

Lastly, we validate and save. There are several cases that can occur, and different actions required:

Case	Action
There is no IMDB identifier specified.	If there was an existing movie with IMDB info, note that entry so it can be checked for removal
New IMDB info and info is valid	Save new ImdbInfo entry.
New IMDB info and info is invalid	Show form again, along with field errors
Existing IMDB info, but no cover file	Save the cover file locally
Existing IMDB info with cover file	No additional actions
Movie info is not valid	Show form again, along with field errors

The last step will be to save the movie, associating it with the corresponding IMDB info (or None), and then, if there was a previous IMDB info, check to see if it is referenced any more, and if not, delete the entry and remove the cover file.
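The clean-up in that last step could look roughly like this. It assumes the Movie.imdb_info ForeignKey uses Django's default reverse accessor, movie_set. Adjust it if a related_name was set.

```python
def remove_if_unused(imdb_info) -> bool:
    """Delete an ImdbInfo entry and its cover file once no Movie references it.

    Returns True if the entry was deleted.
    """
    if imdb_info is None or imdb_info.movie_set.exists():
        return False
    if imdb_info.cover_file:
        # Remove the file from storage without re-saving the model instance.
        imdb_info.cover_file.delete(save=False)
    imdb_info.delete()
    return True
```

Calling this after the movie has been saved with its new association ensures the reference check sees the updated state.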

In the GET processing and the form template, any differences from the shared IMDB settings that a movie has overridden are noted and displayed in the form, so that one can see when a movie has modified what was in the IMDB info. The field gets a red box border to show that it was overridden. This applies to the release date, rating, and duration. The title is not highlighted.
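A sketch of the override detection follows. The Movie field names on the right side of the mapping are hypothetical, and only the ImdbInfo names come from the table definition above. The title is deliberately left out.

```python
# ImdbInfo field -> Movie field. The Movie names here are hypothetical.
OVERRIDABLE = {
    "release_date": "release",
    "mpaa_rating": "rating",
    "run_time": "duration",
}


def overridden_fields(movie, imdb_info):
    """Return the ImdbInfo field names whose Movie copy differs (title excluded)."""
    if imdb_info is None:
        return set()
    return {
        imdb_name
        for imdb_name, movie_name in OVERRIDABLE.items()
        if getattr(movie, movie_name) != getattr(imdb_info, imdb_name)
    }
```

The view could pass this set to the template, which adds the red-border CSS class to any field whose name is in it.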

There were also issues with displaying movies by genre/date/title and with case-insensitive filtering, so those were corrected as well.

One thing to keep in mind: when switching between running in development mode and production mode, both share the same database, so the data is always up to date. However, cover files are stored in the PVC for production and on my development machine for development, so the two sets of files can get out of sync.

I did the migration on my development machine, and did most testing there, and then copied the cover files to the running pod. For example, with a current app pod viewmaster-579bd8d869-ph4nq, I would do:

kubectl cp public/media/covers/* \
    viewmaster/viewmaster-579bd8d869-ph4nq:/vol/web/media/covers/

The “viewmaster/” prefix on the pod name is the namespace.

Deployment Changes

During this process, I was having problems serving up the cover files, which were stored in the MEDIA_ROOT (an area in the project tree for development, and in /vol/web/ for the production pod).

I had done some AI queries, and found out a few things. First, I had MEDIA_URL set to “media/” and it should have been “/media/” so that it is an absolute path. There was the same issue with the current STATIC_URL.

Second, when running in production (DEBUG=False), Django does not serve files through the static() helper added to the URL patterns, so that mapping of URIs to the media area only works in development. It also sounded like Django and gunicorn (used in the production pod) are not well suited to serving static files, and that one should use NGINX (or equivalent) instead.

I used AI prompting to guide me through modifying my deployment so that the pod would now have both my Django container AND an NGINX container. Incoming requests for page content are dispatched to gunicorn and the Django app, and requests for static/media files are handled directly by NGINX.

This required modifying the deployment YAML to include NGINX and reference the same PV for config and static files, changing the service to use port 80 and forward to port 8080 for gunicorn, and providing an NGINX config YAML. I ended up breaking django.yaml into parts, so to update my deployment, I now run:

kubectl apply -f nginx-config.yaml

kubectl apply -f deployment.yaml

kubectl apply -f service.yaml

This assumes that the “viewmaster” namespace was already created and is the one used for the deployment. Obviously, I had rebuilt the app with the version changed to 0.3.0 (and updated the hard-coded version in movie-list.html), and pushed up the change.

A side benefit is that the app is now accessible in production by the LoadBalancer IP, without specifying a port.

There was a pvc.yaml created as well, since the PVC creation was removed from the deployment.yaml. I already have the PVC, so it didn’t need to be used.

Of note, the deployment.yaml sets workingDir for the container and then invokes gunicorn with the desired arguments. The Dockerfile still does the same operation, for the case when we are in development mode and running locally (using Django’s server).

In urls.py, the following was added to the end of urlpatterns…

] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

This allows the media area to be available, when running in development mode with debug=True.
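For reference, the related settings might look like the sketch below. The absolute URL prefixes follow the fix described above. The environment variable names for choosing the storage roots are hypothetical and shown only to illustrate using the project tree locally and /vol/web in the pod. Django requires MEDIA_URL and STATIC_URL (and MEDIA_ROOT and STATIC_ROOT) to be different.

```python
import os

# Absolute URL prefixes (leading and trailing slash), per the fix above.
STATIC_URL = "/static/"
MEDIA_URL = "/media/"

# Hypothetical environment variables selecting where files live: the project
# tree during local development, or e.g. VIEWMASTER_MEDIA_ROOT=/vol/web/media
# in the production pod.
MEDIA_ROOT = os.path.abspath(os.environ.get("VIEWMASTER_MEDIA_ROOT", "public/media"))
STATIC_ROOT = os.path.abspath(os.environ.get("VIEWMASTER_STATIC_ROOT", "public/static"))
```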

I also tweaked the helper scripts as I needed to specify to use the “app” container, and not the “nginx” container.

Testing

I’ve verified this working both in development and in production. I made sure that cover files are generated on create (or on edit, if missing), and deleted when no longer used. In both the list and create/edit forms, I made sure that a red border was drawn around covers that were using the URL and not the file.

Follow-Up Changes

I did a few more changes and tagged v0.3.1. For the create/edit form, if there is a cover file, it is used, instead of always using a URL there. Like the movie list, when a cover URL is used, a red border is drawn around the cover. The create/edit also will display the “no IMDB” and “no image” covers for movies that do not have IMDB info, and movies that do not have a valid cover URL, respectively.

For clean-up, I removed any references to the fields in the Movie model that are now only used in the ImdbInfo model. Then, another migration (0011) was added that removes the plot, actors, directors, cover_ref, and movie_id fields from the Movie model. A reverse migration was included to restore the contents of these fields when going back to migration 0010. I tested migrating to 0011 and verified that the app worked, and then reverse migrated to 0010 and made sure the fields were repopulated in the Movie table.

Are We Done Yet?

Nope! Ever since I split the tables and started using cover files, the movie listing page was loading VERY slowly, taking 4-6 seconds. I tried some performance mods, like lazy loading of images and setting up NGINX to cache static and media files, and it was better, but still horrible. I temporarily altered the code that checked for a cover file to unconditionally use the cover URL, and there was no change.

Using the Chrome Lighthouse performance measuring, I finally realized that it is not the cover images being served locally (or the use of CDN for bootstrap CSS and JS), but the actual page processing. I saw these stats:

Duration	4.4 s
Queuing and connecting	1.42 ms
Request sent and waiting	4.36 s
Content downloading	31.36 ms
Waiting on main thread	48.17 ms

AI suggested looking at SQL queries and doing view and template profiling to see why the server took so long to respond, and recommended installing Django Debug Toolbar. I did that, and found that a HUGE amount of time was spent doing SQL queries, some of which were repeated during the movie listing.

I looked at the queries done, along with their reference to the code location, and, after doing queries for the count of movies, cost per format, and count of paid movies, it did a query for all movies, ordered by title, as was selected. But, then I was seeing queries for ImdbInfo entries, by ID, and that was repeated by the number of movies (659 times), and was done numerous times. It looked like this was in the template code for the movie listing.

I asked AI about what I suspected, and sure enough, the view does a query of the movies for the listing, for example:

from django.db.models.functions import Lower

movies = Movie.objects
if mode == "alpha":
    movies = movies.order_by(Lower("title"))
elif mode == "cat_alpha":
    ...  # remaining display modes elided

And then in the template, I use:

<div class="col-lg">{{ movie.rating }}</div>
<div class="col-lg-5">STARS: {% actors movie.imdb_info %}</div>

Here, the actors custom template tag just references the field or provides a default value. Because the query did not load the related ImdbInfo rows, every reference to movie.imdb_info triggers another query, one per movie. This is the classic N+1 query problem.

The solution is to use select_related() in the query, so I now have:

movies = Movie.objects.select_related('imdb_info')

Now, instead of 665 queries in 6909.86 ms, we have 6 queries in 177.14 ms! Working well now.

To summarize, the changes made (tagged v0.3.2) are:

Fixed database query, when using new ImdbInfo table
Added NGINX config to improve performance of static and media file access
Added Django Debug Toolbar for development analysis.
Provided bootstrap CSS & JS 5.3.0 locally for performance.
Corrected Javascript handling of select pull-down to work in Chrome.

Whew!

Future?

Here are the things I may want to work on…

Turn on HTTPS, certs, and add OTP to be able to access remotely? (Could set up HTTP/2 as well)
Consider pagination for the movie listing, for better responsiveness. Though not bad currently.
Use some of these performance improvement techniques, Django Debug Toolbar, and GitLeaks (a tool to ensure confidential info is not stored in Git) for other web apps that I have made.

Category: Kubernetes, Raspberry PI | Comments Off

March 31

Wireguard in Kubernetes – V2

One of the first things I added to Kubernetes was OpenVPN, so that I can use devices away from home in a secure manner. It worked well, but had some shortcomings…

First, it resulted in a slower connection. Granted, using it meant two passes across my home Internet connection (device to home via tunnel, home to destination, destination to home, home to device via tunnel), but it still wasn’t very fast.

Second, when using OpenVPN, I couldn’t (or didn’t know how to) make use of the PI-Hole advertisement blocker running in my cluster. That meant even more content was downloaded and processed when using OpenVPN.

Third, it was a little tedious setting up clients: creating client configurations on the OpenVPN pod, extracting them, and transferring them to devices, like phones, so I could load them into the OpenVPN client.

Fourth, there were some places, where I could not seem to establish a connection via the tunnel, using OpenVPN. I never really figured out why, as I was usually using my phone, versus laptop, so I couldn’t really diagnose what the issue was.

Enter Wireguard

Wireguard is touted as newer, faster, and more secure (using the latest cryptographic algorithms) than OpenVPN. I was anxious to get it running under Kubernetes. Through the process, I did learn a few things…

Speed! Running https://www.speedtest.net/ from my laptop at home, I would see these numbers:

Method	Download	Upload
Normal Wi-Fi connection	513.81 Mbps	41.68 Mbps
Proton (Free) VPN	278.05 Mbps	32.53 Mbps
OpenVPN at home	11.77 Mbps	9.94 Mbps
Wireguard at home	77.31 Mbps	40.72 Mbps

Obviously, Proton is faster, as there is only one pass across my Internet connection, whereas OpenVPN and Wireguard require two passes across my connection. Wireguard is roughly 4x (upload) to 6.5x (download) faster than OpenVPN.

As part of the configuration with Wireguard, you specify the IP address of the DNS server to be used. In my case, I provided the LoadBalancer IP of my Pi-Hole server in my Kubernetes cluster, so that I had advertisement blocking available. This is a really great addition.

Client setup with Wireguard involves a similar process of creating a configuration file, and getting it onto the client device, but it is a bit easier, as you can create keys and config files on your development machine. You still need to place a server config file on the Wireguard pod, but it is an easier process. I made scripts to simplify the process more. There is one nice feature with Wireguard, where you can convert the configuration file to a QR code, and then scan it from the device. This worked very well for phones and was easy to do.

One thing that OpenVPN had that I liked was the display of connection information, along with a graph of the current transfer rate.

Wireguard just shows the cumulative amount of data received/sent. Not a big deal, but the graph was kind of nice.

I used information from https://blog.jamesclonk.io/posts/wireguard-on-kubernetes/, which had instructions on setting up Wireguard under Kubernetes and using Ad-Blocker to provide advertisement blocking. I created several scripts to help automate the process.

Prerequisites

To get Wireguard working, you’ll need the following…

A Kubernetes cluster running (obviously).
PI-Hole server running, if you want advertisement blocking.
Poetry installed on your development machine.
Python installed (I’m using 3.13, currently).
Install Wireguard clients on the devices you intend to use.
A registered domain name, so that your Wireguard server can be accessed remotely. If you had OpenVPN running, you’d already have this.
On your router, you need to forward packets for UDP port 51820 to the LoadBalancer IP that you set up for the Wireguard service.

On your development machine, clone my GitHub repo…

cd ~/workspace/kubernetes
git clone https://github.com/pmichali/wireguard.git
cd wireguard

Run “poetry install” and then “poetry shell” to set up the needed packages. We also want wireguard-tools installed on the development machine, so that public/private keys can be created. On MacOS, I invoked “brew install wireguard-tools”.

General Configuration

We’ll use a wireguard.ini file to hold some of the general configuration settings. There is a wireguard.ini.sample that you can copy and then fill out the values. The file has:

[DEFAULT]
subnet = #.#.#
external_ip = A_DOMAIN_NAME_OR_IP
dns = LOADBALANCER_IP_OF_PIHOLE_SERVICE
clients = NAMES,OF,CLIENT,DEVICES

We’re assuming that the tunnel will be a class C (/24) network, so the first three octets of the IP address are provided here, for example, “10.10.10”. For each device, we will specify the last octet of the IP address when doing the device configurations.

Your registered domain name would be provided for the external_ip value. This allows devices to find your Wireguard service, when out on the Internet.

The IP of the DNS server to use is specified for the “dns” value. You can use a public DNS, but to get the benefit of advertisement blocking, I used the LoadBalancer IP of my PI-Hole’s service.

Finally, provide a list of (arbitrary) client names that you want to use. No spaces, after the commas.
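To show how these values fit together, here is a sketch that reads a wireguard.ini with Python's configparser and forms a device's tunnel IP from the subnet plus its last octet. The sample values are placeholders, and this is not the repo's own code.

```python
import configparser

SAMPLE_INI = """\
[DEFAULT]
subnet = 10.10.10
external_ip = vpn.example.com
dns = 192.168.1.53
clients = foo,bar
"""


def load_settings(text: str) -> dict:
    """Parse wireguard.ini content (all keys live in [DEFAULT])."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    defaults = parser.defaults()
    return {
        "subnet": defaults["subnet"],
        "external_ip": defaults["external_ip"],
        "dns": defaults["dns"],
        # Names are comma separated; strip() is just a safety net.
        "clients": [c.strip() for c in defaults["clients"].split(",") if c.strip()],
    }


def tunnel_ip(subnet: str, index: int) -> str:
    """Append a device's last octet to the /24 tunnel subnet."""
    if not 1 <= index <= 254:
        raise ValueError("index must be 1 (server) or 2-254 (clients)")
    return f"{subnet}.{index}"
```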

Create Device Keys

Public and private keys are needed for the server and each of the clients. There is a Python script that uses the wg command from wireguard-tools to create the keys. You’ll run it for each device, using the syntax:

python create-device.py NAME INDEX#

Where NAME is an arbitrary name for the device (except “server” is used for the server keys). For the clients, these unique names will be what you listed under the clients key in the wireguard.ini file above.

The INDEX# is an octet used for the device’s tunnel IP address. This will be 1 for the server, and 2-254 for clients. It gets appended to the “subnet” specified in the wireguard.ini to form the IP for each device.

This script will create a NAME.ini file in the data sub-directory (created, if needed). The file will contain the generated public and private keys, and the INDEX#.
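A rough equivalent of that step is sketched below. It uses the real wg genkey / wg pubkey commands (the private key is piped to wg pubkey on stdin) and writes the keys and index to data/NAME.ini. The INI key names are hypothetical and may not match what create-device.py writes.

```python
import configparser
import subprocess
from pathlib import Path


def generate_keypair() -> tuple[str, str]:
    """Create a WireGuard key pair using the wg CLI from wireguard-tools."""
    private = subprocess.run(
        ["wg", "genkey"], capture_output=True, text=True, check=True
    ).stdout.strip()
    public = subprocess.run(
        ["wg", "pubkey"], input=private + "\n",
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return private, public


def write_device_ini(name, index, private_key, public_key, data_dir="data") -> Path:
    """Write data_dir/NAME.ini with the keys and tunnel index (hypothetical layout)."""
    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parser = configparser.ConfigParser()
    parser["DEFAULT"] = {
        "private_key": private_key,
        "public_key": public_key,
        "index": str(index),
    }
    path = out_dir / f"{name}.ini"
    with path.open("w") as handle:
        parser.write(handle)
    return path
```

Usage would be something like write_device_ini("foo", 4, *generate_keypair()).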

So, at a minimum, this is run for the server, and then again for each client:

python create-device.py server 1
python create-device.py foo 4
python create-device.py bar 5

Create Configurations and QR Codes

Now that we have generated all the keys for the server and clients, and have identified the final IP address octet for each of the devices, the server configuration file, client configuration files, and the QR images that represent the client configuration files can be created.

Run the following command to create everything:

python build.py

This will create a data/wireguard-secrets.yaml file that is used to create a Kubernetes secret holding the server configuration. It has the server private key and IP address, IPTABLES rules for the tunnel, all the client public keys and IP addresses, and sets the tunnel MTU. I found that, by default, the tunnel MTU was 1400 and that was causing a problem with my MacOS laptop client. It is set to 1420 to resolve the problem (see notes later on how this was determined).

The server.tmpl file provides a template for the YAML file. You can customize the configuration, if needed.

For each client, it will create a data/client-NAME.conf file with the client private key, client tunnel IP address, DNS IP (from the wireguard.ini), server public key, domain name (or IP) and port used to access the Wireguard server from the Internet, and allowed IPs (0.0.0.0/0 and ::/0, which specify that all IPv4 and IPv6 traffic will use the tunnel).

You could instead list specific IP ranges for the allowed IPs, if you wanted some traffic to use the tunnel and the rest to bypass it.

The client.tmpl file is used to control the content in each of the client configuration files. You can customize the configuration, if needed.
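As an illustration of what such a template produces, here is a sketch that renders a wg-quick style client config with the fields listed above. It uses the standard [Interface] and [Peer] keys. The /32 address prefix and the example values are assumptions, and the repo's client.tmpl may differ.

```python
def render_client_conf(
    client_private_key: str,
    client_ip: str,
    dns: str,
    server_public_key: str,
    endpoint: str,
    port: int = 51820,
    allowed_ips: str = "0.0.0.0/0, ::/0",
) -> str:
    """Render a wg-quick style client config (sketch, not the repo's template)."""
    return (
        "[Interface]\n"
        f"PrivateKey = {client_private_key}\n"
        f"Address = {client_ip}/32\n"
        f"DNS = {dns}\n"
        "\n"
        "[Peer]\n"
        f"PublicKey = {server_public_key}\n"
        f"Endpoint = {endpoint}:{port}\n"
        # 0.0.0.0/0 and ::/0 send all traffic through the tunnel; list narrower
        # ranges here for split tunneling.
        f"AllowedIPs = {allowed_ips}\n"
    )
```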

The last step of this script is to use the generated client configuration files to create a PNG file with a QR code containing each configuration. You can either load the .conf file into the client Wireguard app, or have the app scan the QR code.

Start Up The Wireguard Server

Now that we have the secret containing the server config, we can start the Wireguard server. Apply the secret YAML, and then the deployment:

kubectl apply -f data/wireguard-secrets.yaml
kubectl apply -f wireguard.yaml

You should see a deployment, service, and pod running. There is an exec-into-pod script that can be invoked to access the wireguard pod. From there, you can run “wg show” to see the list of peers and connection statuses, or “wg showconf wg0” to see the configuration settings. The show command has several options available. Use “-h” to see what can be performed.

Connect!

With a client, load its configuration file, or scan the QR code. Then, activate the link, and confirm that all traffic is now going through the tunnel. You can look at the server’s “wg show” output, use Wireshark, etc. Be sure to try going to several web sites to ensure that the MTU is correct for your client.

Final Notes

MTU

When I was trying my Mac laptop, most HTTPS websites would not load; they would hang at the authorization stage. That is when I found there was an issue with the MTU. I found a link on optimizing MTU which said that to calculate the right MTU, you do pings without fragmentation, specifying a payload size. Once you determine the maximum size that works, add 28 (20 bytes for the IPv4 header plus 8 bytes for the ICMP header), and that is the tunnel MTU to use.

For MacOS, for example, I found that this command, with a size 1392, was the highest I could go:

ping -D -s 1392 cisco.com

Adding 28 gives 1420, so that is the MTU I used. The default of 1400 was causing problems.
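The manual search can be automated with a binary search over the ping payload size. In the sketch below, ping_fits() uses macOS's -D (Don't Fragment) flag or Linux iputils' -M do. The search range and the assumption that success is monotonic in size are choices made for this sketch, not part of the original procedure.

```python
import platform
import subprocess

IP_AND_ICMP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP header


def ping_fits(host: str, size: int) -> bool:
    """Send one ping with `size` payload bytes and fragmentation disabled."""
    if platform.system() == "Darwin":
        cmd = ["ping", "-c", "1", "-D", "-s", str(size), host]
    else:
        cmd = ["ping", "-c", "1", "-M", "do", "-s", str(size), host]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def find_mtu(probe, low: int = 1200, high: int = 1472) -> int:
    """Binary-search the largest payload `probe(size)` accepts, then add 28.

    Assumes probe(low) succeeds and larger sizes never succeed after a failure.
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low + IP_AND_ICMP_HEADERS
```

For example, find_mtu(lambda size: ping_fits("cisco.com", size)). If 1392 is the largest payload that gets through, this returns 1420.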

Adding A Client

If you need to add a client, do these steps:

Run the create-device.py script to generate the keys.
Add the client name to the list of clients in the wireguard.ini file.
Re-run the build.py script to update the secret with server config and to create the client config.
Apply the data/wireguard-secrets.yaml file so the new server config is stored.
Run the delete-pod helper script to cause a new pod to be created with the updated server configuration.
Load the data/client-NAME.conf or read the data/NAME.png QR file in the Wireguard client.

Deleting A Client

Deleting a client is similar: remove the client name from wireguard.ini, build a new secret, apply the YAML, and delete the pod so that the new config is loaded.

Category: Kubernetes, Raspberry PI | Comments Off

March 6

Viewmaster Updates

I finally had a chance to improve the Viewmaster app, adding the following features (version v0.2.1, unless marked):

Movies include (optional) plot, actor(s), director(s), and poster with the help of IMDB info.
Persisting the display mode, whether details are seen, and whether Laser Discs are shown.
Improved the display of movie information (and show more info, when looking at detailed info).
Command line utility to update movies quickly, with IMDB information.
Updated to newer Poetry version for dependency management.
Local Development of Django app.
v0.2.2: Ability to manually enter IMDB ID during movie lookup for adding/editing movies.
v0.2.2: Python script to create bash script for local development use.
v0.2.2: Miscellaneous tweaks.
v0.2.3: Use a pull-down for display mode instead of buttons.
v0.2.3: Allow search by title, actors, directors, or plot.

Details On Changes

Here are the details of all the things that were changed in the app.

IMDB Info

By far, the biggest change was using the OMDb API (omdbapi.com) to obtain info on the movie. You can search by full/partial title and get a list of media (movies, games, TV series, etc). Then, using the movie ID provided, you can get the plot, actor(s), director(s), release date, genres, MPAA rating, duration, URL link to the cover (poster), review ratings, and more.

The database was modified to add additional, optional fields for the plot, actor(s), director(s), movie ID, and poster URL link.

When adding a new movie, you will first see a form to enter a partial title for the movie. A search is done, and the candidate movies are shown. Clicking on an entry will select it and load the IMDB details into the form for completing the addition. If there is no match in IMDB, or the choices are not correct, you can click the “Skip” button to continue without any IMDB info filled out.
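A sketch of the two OMDb requests follows: s= searches by title, and i= with an IMDB ID fetches details. It assumes you have an OMDb API key and uses OMDb's JSON field names (Title, Year, Rated, Runtime, Genre, Plot, Actors, Director, Poster, Response, Error). The summarize() helper and its output keys are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "https://www.omdbapi.com/"


def omdb_url(api_key: str, **params: str) -> str:
    """Build a request URL: s=<title> to search, i=<tt ID> for details."""
    return OMDB_URL + "?" + urlencode({"apikey": api_key, **params})


def omdb_get(api_key: str, **params: str) -> dict:
    """Fetch and decode one OMDb JSON response (makes a network call)."""
    with urlopen(omdb_url(api_key, **params), timeout=10) as response:
        return json.load(response)


def summarize(details: dict) -> dict:
    """Pick out the fields used by the app from a detail (i=...) response."""
    if details.get("Response") != "True":
        raise ValueError(details.get("Error", "OMDb lookup failed"))
    return {
        "title": details["Title"],
        "year": details["Year"],
        "rating": details.get("Rated"),
        "runtime": details.get("Runtime"),
        "genres": details.get("Genre", ""),
        "plot": details.get("Plot", ""),
        "actors": details.get("Actors", ""),
        "directors": details.get("Director", ""),
        "poster": details.get("Poster", ""),
    }
```

Typical use would be omdb_get(key, s="partial title") for candidates, then summarize(omdb_get(key, i=chosen_id, plot="full")).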

With IMDB info, besides plot, actors, and directors, the release date, MPAA rating, and duration will be filled out. For the genre selection, the pull down list will show the entries that the IMDB had first, and then all the rest of the available genres.

When editing a movie, if there already is IMDB info, it will go directly to the edit form. If there is no IMDB info, the title will be used to do a search and the candidate movies will be shown. Like adding, you can select one, or click “Skip” to continue without IMDB info.

If the release year, rating, or duration in the IMDB database differ from what you had already, the values will be shown in RED (you can look at the log to see what the differences are). In v0.2.2, the old values are shown below the entry field, so that you can see the change. You can keep the change, or enter the original value into the field. For the genre selection, the pull down list will show the entries that the IMDB had first, and then all the rest, with your existing genre selection as the chosen one.

v0.2.2: When viewing the results of an IMDB lookup on movie title during add/edit operations, besides selecting one of the candidates, you can enter the IMDB ID (available in the URL, when looking at movie info at https://www.imdb.com/). This ID starts with the letters “tt” followed by a number. Once selected, the add/edit movie form will be populated with that information.
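A small validation sketch for manually entered IDs is below. It accepts a bare ID and, as a convenience not described above, pulls the ID out of a pasted imdb.com title URL. The seven-or-more digit rule reflects IMDB's current title IDs (seven or eight digits).

```python
import re

# "tt" followed by digits; IMDB title IDs currently have 7 or 8 digits.
IMDB_ID_PATTERN = re.compile(r"tt\d{7,}")


def normalize_imdb_id(text: str) -> str:
    """Return the tt ID from a bare ID or a pasted IMDB title URL."""
    match = IMDB_ID_PATTERN.search(text.strip())
    if match is None:
        raise ValueError(f"not an IMDB title ID: {text!r}")
    return match.group(0)
```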

Persisting Display Info

Now, when a display mode is selected (e.g. alpha, genre, date,…), this setting will persist, even after doing adds, edits, or searching.

Likewise, if you select to show Laser Disc info, and/or details for movies, this setting will persist, until you change the setting and select a mode.

Improved Movie List

For the list of movies, when you hover over a movie, it will be highlighted, and you can click anywhere on the row to select the item.

The summary view is similar to what was there before, but the detail view now has the cover displayed along with multi-line output showing the plot, actors, and directors info.

Importing IMDB Info

It’s a slow process editing all the movies to select IMDB info to be imported. So, a command line utility was created that will go through each movie that does not have IMDB info, show the movie title, release date, rating and duration, and then a list of candidates.

You can skip the candidates, or select one by number, for which it will show you the title, release date, rating, duration, IMDB ID, plot, actors, and directors. For the release date, rating, and duration it provides an indication of whether the info matches (and if not, what was different). You can confirm the choice or go back to select another candidate.
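The match indicators come down to normalizing both values and comparing them. The minimal sketch below assumes OMDB reports the runtime as text, such as "134 min" (or "N/A" when unknown), while the database stores a duration. Both helper names are hypothetical.

```python
def runtime_to_hhmm(runtime: str) -> str:
    """Convert OMDB's runtime text (e.g. "134 min") to "HH:MM"; unparsable -> "00:00"."""
    try:
        minutes = int(runtime.split()[0])
    except (IndexError, ValueError):
        return "00:00"
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def compare(label: str, imdb_value: str, db_value: str) -> str:
    """Format one confirmation line with a match / mismatch indicator."""
    if imdb_value == db_value:
        return f"{label}: {imdb_value} ✅"
    return f"{label}: {imdb_value} ❌ (database had {db_value})"
```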

On each prompt, the default choice is shown in square brackets. For candidate selection, that is usually “0” (skip), and for the confirmation it is “S” (save). You can exit the process at several places, and each time this tool is run, it will continue with any movies missing IMDB info.
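The bracketed-default prompt pattern can be sketched as follows. prompt_choice is a hypothetical helper. Passing in the read function (input by default) is a design choice that makes the helper easy to test.

```python
from typing import Callable


def prompt_choice(prompt: str, valid: set[str], default: str,
                  read: Callable[[str], str] = input) -> str:
    """Ask until a valid answer is given; an empty answer selects the default.

    Valid answers are expected in lower case; the answer is lower-cased too.
    """
    while True:
        answer = read(f"{prompt} [{default}]: ").strip().lower()
        if not answer:
            return default.lower()
        if answer in valid:
            return answer
        print(f"Please enter one of: {', '.join(sorted(valid))}")
```

For example, prompt_choice("Enter choice Save, Back, Ignore, eXit", {"s", "b", "i", "x"}, "S") returns "s" when Enter is pressed.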

Poetry Update

As some time has passed, Poetry has gone through updates, some of which made significant changes to the config files used, so I changed things up and used this time to update the versions of the packages used.

Local Development

The whole point of the Viewmaster project work I initially did was to port it from a Docker container to run under Kubernetes. However, during development work, it is nice to just tweak the app and see the effects locally on your development machine, rather than re-building, pushing, and restarting the pod.

I decided that I would have the locally run app use the “production” database, rather than creating a Postgres server on my Mac to provide a test database. It is riskier, but I also set up a script to make easy backups of the database.

Miscellaneous Tweaks

For v0.2.2, I added a “show-pod-log” script to allow easy viewing of the Viewmaster pod log in Kubernetes. There already are scripts to exec into the app and database pods. As mentioned below, there is a Python script that will create the dev-setup.bash script used for local development.

Display Mode

For v0.2.3, the display mode is a single pull-down menu, instead of a set of buttons (that are mutually exclusive). You can also change the check boxes, and then re-select the same mode, and the display will be refreshed.

Enhanced Search

For v0.2.3, a select pull-down was added to the search field so that the search can be relative to the title, actors, directors, or plot. When the search input is entered or the magnifying glass is clicked, the display will show the (case-insensitive) results.
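Viewmaster is a Django app, so a case-insensitive search on the chosen field can be expressed with Django's icontains lookup. The model and field names here are assumptions. This sketch only builds the filter arguments. It uses a whitelist so that arbitrary field names can't be injected from the form.

```python
# Map the pull-down choices to (hypothetical) model field names.
SEARCH_FIELDS = {
    "title": "title",
    "actors": "actors",
    "directors": "directors",
    "plot": "plot",
}


def search_filter(choice: str, term: str) -> dict[str, str]:
    """Build keyword arguments for a case-insensitive Django queryset filter."""
    field = SEARCH_FIELDS.get(choice)
    if field is None:
        raise ValueError(f"unsupported search field: {choice!r}")
    return {f"{field}__icontains": term.strip()}
```

With a hypothetical Movie model, that would be used as Movie.objects.filter(**search_filter("actors", "pitt")).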

How To Use New Version

I (finally) created a tag on the Github repo, called V0.1.2, for the original version, and then created a new tag v0.2.1 for the latest version (I should have done more tags along the way). In any case, this section will describe how to update the existing app under Kubernetes, how to run the app locally, and how to set up the command line IMDB importing tool.

Backup First!

Before starting this venture, it’s a good idea to back up the database for Viewmaster. The new version has a script to help access the database pod, and you can then run the backup command, provide the password, exit, and then move the database export to your computer. Here are the steps:

cd ~/workspace/kubernetes/viewmaster
./exec-into-db-pod
pg_dump -U viewmasterer -W -F t viewmasterdb > viewmasterdb.tar
[enter password]
exit

./move-backup

This will rename the file with the date/time, so you have a copy of the database, in case things go awry.

Updating to v0.2.3

This assumes you have an installation of the original Viewmaster app running. If you’re starting from scratch, you need to follow the instructions from the original article, and make a few changes, as described here, to work with the latest version. Most of the differences are additional secret values used by the newer version.

The first step to update is to move to the Git repo for the app (cloned from https://github.com/pmichali/viewmaster.git) and pull the latest code by checking out the tag v0.2.3.

You should already have a deploy/viewmaster-secrets.env file from the previous version. Use deploy/viewmaster-secrets.env.template to see the new fields. Currently, that is just the OMDB_API_KEY. To set that value, go to https://www.omdbapi.com/ and click on the API tab. Enter your email address and a free API key will be emailed to you. This allows 1,000 requests per day. If you want, you can become a patron member for unlimited access. Place this value in the viewmaster-secrets.env file. You need to replace the existing secret with this new one:

cd ~/workspace/kubernetes/viewmaster/deploy
kubectl delete secret -n viewmaster viewmaster-secrets

kubectl create secret generic viewmaster-secrets -n viewmaster \
        --from-env-file=viewmaster-secrets.env
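The template lists every field the new version expects. A quick check of your env file, ideally before re-creating the secret, is to compare the keys in the two files. The small sketch below assumes simple KEY=VALUE lines with # comments. The function names are hypothetical.

```python
def env_keys(text: str) -> set[str]:
    """Collect KEY names from KEY=VALUE lines, skipping blanks and # comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(template_text: str, secrets_text: str) -> set[str]:
    """Keys present in the template but absent from the real secrets file."""
    return env_keys(template_text) - env_keys(secrets_text)
```

From the deploy directory, that could be called as missing_keys(Path("viewmaster-secrets.env.template").read_text(), Path("viewmaster-secrets.env").read_text()), with Path imported from pathlib.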

Next, you need to build and push this version of the app for Kubernetes (Note: on a Mac, you need to have Docker desktop running):

cd ~/workspace/kubernetes/viewmaster
docker buildx build . -t YOUR_DOCKER_ID/viewmaster-app:v0.2.3
docker push YOUR_DOCKER_ID/viewmaster-app:v0.2.3

Deploy the app again, so that it deletes the old pod, downloads the v0.2.3 pod, and runs it:

kubectl apply -f deploy/django.yaml

Check to make sure the pod comes up and check the log to make sure that Django collected the static files, and that the database migrations occurred to create all the new fields. You can use the show-pod-log script to check the log.

From the web site, you should see the new version number at the top of the movie list page, and a different look. Verify that you can add movies, edit movies, persist mode settings, and that there are searches (for add) and lookups (for edit) of movies in the IMDB.

Running Locally

Instead of going through long build/push/delete cycles to modify the app, it is nice to be able to run Django locally on your computer. Note that I chose to use the production database, so I have live data to use. One could choose to spin up a Postgres server and have a dummy database, if desired. I’ll leave that as an exercise for the reader.

The Postgres database running in the Kubernetes cluster does not have an externally visible IP, and the viewmaster-postgres.viewmaster.svc.cluster.local domain name cannot be resolved from my laptop. To remedy this, we must expose an IP for the database pod. This can be done with:

kubectl expose service viewmaster-postgres -n viewmaster --port=5432 --target-port=5432 --name=viewmaster-postgres-dev --type=LoadBalancer

Do a “kubectl get svc -n viewmaster viewmaster-postgres-dev” and note the external IP for the service. Then, from the top of the tree, run:

python3 make-dev-setup-script.py EXTERNAL-IP-FOR-DBASE

This will create the file movie_library/dev-setup.bash, which can be sourced to set up an environment like what the Django app would see in a Kubernetes pod. Note: it does some special tweaks to the SECRET_KEY and POSTGRES_PASSWORD so that they can be used in bash export commands.
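The exact tweaks in make-dev-setup-script.py aren't shown here. One standard way to make arbitrary values safe for a bash export line is Python's shlex.quote. It single-quotes the value, so characters like $, backquotes, and spaces are not interpreted by bash. This is a sketch of that technique, not necessarily what the script does. export_line is a hypothetical name.

```python
import shlex


def export_line(name: str, value: str) -> str:
    """Return a bash export statement that preserves the value exactly.

    shlex.quote wraps the value in single quotes (escaping any embedded
    single quotes), so bash performs no expansion on it when sourced.
    """
    return f"export {name}={shlex.quote(value)}"
```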

The Django server can now be run locally:

cd ~/workspace/kubernetes/viewmaster/movie_library
source dev-setup.bash
python3 manage.py runserver

You should be able to access the app from your browser at http://127.0.0.1:8000. You can also do “collectstatic”, “makemigrations”, “migrate”, “dbshell”, and other commands.

Using the IMDB importer

I created a command line app that will search the viewmaster database, collecting movies without an IMDB ID, and show a count of movies that need processing. For each movie, the IMDB database is searched with the title, and matching candidates are shown, allowing one to be selected for updating the viewmaster info. When done (or exiting) it will tell you how many movies have been processed.

To run the script, the same dev-setup.bash script must be sourced, so that environment variables are obtained, and then the python script can be run:

cd ~/workspace/kubernetes/viewmaster/movie_library
source dev-setup.bash
python3 viewmaster/imdb_import.py

Here is some sample execution:

Have 413 movies to process

Movie: 'Fallen ' (175)
Release: 1998 Rating: R Duration: 02:05:00


0) SKIP SELECTION FOR THIS MOVIE

Enter choice 0,x [0]:
Skipping movie

Movie: 'Fury' (200)
Release: 2014 Rating: R Duration: 02:15:00


0) SKIP SELECTION FOR THIS MOVIE
1) 2015 Mad Max: Fury Road (movie)
2) 2014 Fury (movie)
3) 2023 Shazam! Fury of the Gods (movie)
4) 2015 Kung Fury (movie)
5) 2007 Balls of Fury (movie)
6) 1972 Fist of Fury (movie)
7) 2014 Cuban Fury (movie)
8) 1989 Blind Fury (movie)
9) 1978 The Fury (movie)
10) 1936 Fury (movie)

Enter choice 0-10,x [0]: 2

TITLE: Fury
RELEASED: 2014 ✅
RATING: R ✅
DURATION: 02:14 ❌ (database had 02:15)
IMDB ID: tt2713180
A grizzled tank commander makes tough decisions as he and his crew fight their way across Germany in April, 1945.
ACTORS: Brad Pitt, Shia LaBeouf, Logan Lerman
DIRECTORS: David Ayer

Enter choice Save, Back, Ignore, eXit[S]:

Saved updates to movie

On the first movie, “Fallen”, the viewmaster database had “Fallen ” (with a trailing space), and because of the extra space, it could not find the movie. The solution there is to use the web version, edit the movie, and correct the movie title so that a subsequent edit will allow the lookup.
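A title normalization step before searching would avoid this class of problem. This is a hypothetical sketch, not something the importer does as described above.

```python
import re


def normalize_title(title: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", title).strip()
```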

On the second movie, there were numerous matches to “Fury”. I picked the second one (it usually is the first one), and then selected the default value in square brackets (save). You can see that IMDB had a slightly different duration.

v0.2.2: With this version, in addition to the candidates being listed, there is an additional selection that allows you to manually enter the IMDB ID. If done, it will then show details of that selection, allowing you to save, ignore, go back, or exit. Here is an example:

Movie: 'MGM Sampler Disk' (365)
Release: 1993 Rating: NR Duration: 00:00:00


0) SKIP SELECTION FOR THIS MOVIE
1) Manually enter movie ID

Enter choice 0-1,x [0]: 1
Enter IMDB ID: tt0211493

TITLE: MGM/UA Home Video Laserdisc Sampler
RELEASED: 1992 ❌ (database had 1993)
RATING: NR ✅
2025-03-08T09:35:43 [WARNING ] Unable to parse time string 'N/A'
DURATION: 00:00 ✅
IMDB ID: tt0211493
N/A
ACTORS: N/A
DIRECTORS: N/A

Things To Be Aware Of

If you already have the original app installed, you’ll need to delete and re-add the K8s secrets, as there are more values being stored.

If you have existing movies in the database, there can be some challenges finding them in IMDB. I had a few titles misspelled, so they could not be matched. I would typically check the title and edit the movie to match what is shown in IMDB, and then edit the movie again so that the lookup would succeed.

With free access to OMDB, you are limited to 1,000 requests a day. Not a problem for this app, where we just need to go through the movies once, to obtain the info.

There were quite a few cases where the duration in the IMDB database was slightly different from what I had entered for the movie (from the back cover of the movie box).

There were some funky titles, like “Alien³”, which has a superscript three character (I cut and pasted it from a test run of an add movie operation, searching for “Alien”).

I had one case of an old movie that was not in IMDB.

I had another case of a title that I could not find a match for that year, but when visiting https://www.imdb.com, I could find the movie version of interest. For v0.2.2, I used the manual ID entry to resolve.

Lastly, I hit one case where a six-disk TV series resolved to a single IMDB entry. To integrate the IMDB info, I had to take these steps for each of the “season” disks:

Note the release year and duration.
Rename the title to just have the show name, and not any season info.
Edit again, so that the IMDB data can be found.
1. Change the title back to include the season indication.
2. Change the duration to what was there before (from the box cover).
3. Set the release date for that season.
Save the movie.

IMDB listed the release as 2010-2015, but the app can only extract a single year for a movie, so it would always show 2010. The duration in IMDB was totally wrong.
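The single year comes from OMDB's Year field, which is a range for a series. The sketch below shows the kind of extraction that yields 2010. It assumes the range may be written with an en dash or a hyphen. first_year is a hypothetical name.

```python
import re
from typing import Optional


def first_year(year_field: str) -> Optional[int]:
    """Return the first four-digit year in an OMDB Year value.

    A series range such as "2010–2015" keeps only its first year;
    a value without any year (e.g. "N/A") yields None.
    """
    match = re.search(r"\d{4}", year_field)
    return int(match.group(0)) if match else None
```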

What’s Left To Do

There are a few more things that I’d like to do (maybe)…

Maybe save the movie cover (poster), instead of having to access the URL for image.
Decide if I want to enable remote access, and if so, maybe add OTP 2FA.
Show admin link, if user has admin privileges.

Category: Uncategorized

February 2

Rebuilding a Kubernetes Node

When I initially created my Kubernetes cluster, I had (naively) partitioned the 1TB disk on each node into several areas for root, home, and logs. What I found out later was that the log area wasn’t necessarily large enough, so I was running into some issues with disk space.

I decided that I would, at a later time, re-image these nodes. Well, that time is now. I have eight nodes, of which five have these multiple partition drives. Three are worker nodes and two are control plane nodes.

Prep Work

Before doing anything, I made backups of the Postgres databases I had in the cluster for my apps by exporting the databases. I have a script for each app that does this. For example:

cat > exec-init-db-pod <<'EOT'
kubectl exec -it -n viewmaster `kubectl get pod -n viewmaster -l tier=postgres | cut -f1 -d" " | tail -1` -- /bin/bash
EOT
chmod +x exec-init-db-pod

This gets me into the database pod for an app I have in the namespace ‘viewmaster’. I then create a backup:

pg_dump -U <DB_USERNAME> -W -F t <DB_NAME> > viewmasterdb.tar

I enter the database password when prompted, exit, and then run this command from my Mac to pull down the backup:

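cat > move-backup <<'EOT'
# Date/time suffix for the local copy (format is an assumption; adjust as desired)
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
kubectl cp -n viewmaster `kubectl get pod -n viewmaster -l tier=postgres | cut -f1 -d" " | tail -1`:viewmasterdb.tar "viewmasterdb-${TIMESTAMP}.tar"
EOT
chmod +x move-backup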

With Longhorn running in my cluster, I had also set it up to do periodic snapshots of the volumes, so hopefully I’ve got everything I need, in case things go south (one never really knows, until something bad happens and the cluster has to be rebuilt).

Worker Nodes

Figuring I would tackle these first as they would be easier, I started with the node ‘cypher’, and then did ‘niobe’ and ‘mouse’ together, as most operations are using kubespray commands, and I can specify more than one node.

Node Removal

I read that one would typically cordon the node (to prevent pods from being scheduled on it), and then drain the node (to remove pods that are running on the node and have them run elsewhere). Kubespray has a playbook to remove a node, and it appears to do the draining, so with ‘cypher’, I did a ‘kubectl cordon cypher’, before running the playbook. With the other two nodes, I just ran the playbook and it was fine. Here are the steps I did…

Before removal, I checked into what pods (especially ones I created) were running on the nodes with:

kubectl get pods -A -o wide --field-selector spec.nodeName=cypher

To remove the node, I did:

export TARGET_NODE=cypher

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 remove-node.yml -e node=${TARGET_NODE}

For the other two nodes, I set TARGET_NODE="niobi,mouse" before running the remove-node.yml playbook.

I ran “kubectl get nodes -o wide” to make sure that the nodes were removed from the cluster. I kept them in the inventory, as I was going to re-add the nodes, after re-imaging the drive.

Re-imaging The SSD Drive

Since I do not have a monitor near the cluster, I pulled out the nodes and brought them to the study (one at a time), connected a keyboard, mouse, monitor, ethernet cable, and power module. Here are the steps done for each node…

Holding the SHIFT key down, I powered on the RPI and watched it enter the net boot mode. It downloads the image and then reboots into the RPI imager. I followed Part II in my cluster bring-up procedure to specify RPI4, select the 64-bit Ubuntu 24.04.1 server image, and choose the SSD drive for storage. I then chose to edit the custom settings to set the node name the same as it was before, and set the user and password to my username (and a simple password). I selected the America/New_York time zone, saved the changes, and then confirmed to image the whole drive.

Note that my router has a reserved DHCP entry with the desired IP for each node, based on the MAC of the Ethernet interface, so the node will retain the same IP address.

When done, and the node had booted, I logged in from the console and created an SSH key using ‘ssh-keygen -t ed25519’. I then did a ‘ssh-copy-id <IP-OF-MY-MAC-HOST>’ and entered my password to copy the key. On my Mac, I had to remove the known_hosts entry for the IP of this node. I did a “ssh-copy-id <IP-OF-THE-REIMAGED-NODE>” and used the password to copy the Mac’s public key to the node. I verified that I can SSH to the node and the node can SSH to my Mac.

Continuing with Part II of the process, I SSH’ed to the node and replaced /etc/netplan/50-cloud-init.yaml with the static IP, DNS IPs, and search domain (use the template in Part II and replace the IP for the node). From the console of the node, I did a “sudo netplan apply”.

This is enough of a basic configuration, so that I can now use ansible playbooks to do the rest of the work. I did a “sudo shutdown -h 0” for the node, unplugged everything, and then reinstalled it back into the rack of my cluster.

Preparing The Nodes

Following Part IV of the Kubernetes cluster setup, I set TARGET_HOST to “cypher” (and later to “niobi,mouse”), and, while still under the same Poetry shell, did a ping check of the cluster to make sure I could communicate with the node being updated:

cd ~/workspace/kubernetes/picluster
poetry shell

ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/ping.yaml --private-key=~/.ssh/id_ed25519

With that working, I ran through each of the commands to set things up (entering the node password for the first command, when prompted):

ansible-playbook -i "${TARGET_HOST}," playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass

ansible-playbook -i "${TARGET_HOST}," playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/hostname.yaml -v --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/os_update.yaml --extra-vars "inventory=all reboot_default=false proxy_env=[]" --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/tools.yaml -v --private-key=~/.ssh/id_ed25519

This sets up password-less sudo, requires passphrase-only SSH, sets the FQDN, updates the OS, and installs the desired tools. Since the cluster uses kube-vip, we need to define the hostname for the kube-apiserver, so that when the node boots, it can contact the API server before CoreDNS is running. This involves adding the following line to /etc/cloud/templates/hosts.debian.tmpl:

<LOAD-BALANCER-IP-FOR-API-SERVER> lb-apiserver.kubernetes.local
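If you'd rather script that edit, an idempotent append avoids duplicate entries when the step is re-run. The minimal sketch below operates on the file's text. Actually reading and writing /etc/cloud/templates/hosts.debian.tmpl needs root. add_host_entry is a hypothetical helper. In the tests, 192.0.2.10 is a documentation placeholder for the kube-vip address.

```python
def add_host_entry(template: str, entry: str) -> str:
    """Append an /etc/hosts-style entry to the template text, only once.

    If a non-comment line already maps the same hostname, the text is
    returned unchanged, so running this repeatedly is harmless.
    """
    hostname = entry.split()[-1]
    for line in template.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and hostname in fields[1:]:
            return template
    if template and not template.endswith("\n"):
        template += "\n"
    return template + entry + "\n"
```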

There are some more RPI setup steps to do (this assumes you have set up ~/workspace/SKU_RM0004/ as specified in Part IV)…

ansible-playbook -i "${TARGET_HOST}," playbooks/uctronics.yaml -v --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/cgroups.yaml -v --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/iptables.yaml -v --private-key=~/.ssh/id_ed25519

This sets up the LCD display used on the UCTRONICS front panel, configures cgroups, loads the overlay modules, sets up iptables for bridged traffic, and enables IPv4 forwarding.

You can check the UCTRONICS display, the IP address for the node (ip addr), the FQDN (hostname --fqdn), and that the kernel is what you expected (uname -a). We are ready to add the node back to the cluster…

Re-Adding The Node

Moving over to the kubespray area, we will run the cluster.yml playbook, which will add the node to the cluster, since the node is still in the inventory:

cd ~/workspace/kubernetes/kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 cluster.yml

It takes a long time, but after completion, you can check that the node is present (kubectl get nodes -o wide), and check that all resources are ready/running (kubectl get all -A). The cluster should be good to go now.

Control Plane/ETCD Node

This is a bit trickier. First, if one of the control plane nodes being re-imaged is the FIRST entry in the inventory, you need to change the ordering so that it is not the first entry. The file is ~/workspace/kubernetes/picluster/inventory/mycluster/hosts.yaml in my case. From the Kubespray documentation, it looks like the reordering applies to the control plane, etcd, and nodes sections. I’m not sure if I have to do it in all three places, but I will, so as not to anger the Kubespray gods :).

This is the case for me today, as I want to re-image my nodes ‘apoc’ and ‘lock’, and the former is first in the inventory list.
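The reordering itself is simple once the inventory is loaded. The sketch below makes three assumptions. First, hosts.yaml has been read into nested dicts with a YAML library such as PyYAML (loading and saving are not shown; PyYAML's dump needs sort_keys=False to keep the order). Second, host order within each group follows dict insertion order. Third, the group names match the Kubespray sample inventory. demote_node and the host name "node3" are hypothetical.

```python
GROUPS = ("kube_control_plane", "etcd", "kube_node")


def demote_node(inventory: dict, node: str) -> dict:
    """Move `node` to the end of each group's host list if it is listed first."""
    children = inventory["all"]["children"]
    for group in GROUPS:
        hosts = (children.get(group) or {}).get("hosts")
        if hosts and next(iter(hosts)) == node:
            value = hosts.pop(node)
            hosts[node] = value  # re-inserting a dict key places it last
    return inventory
```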

Second, there is supposed to be an odd number of etcd nodes. I have three etcd nodes, but two of them are nodes I need to re-image. It “looks” like I can temporarily run with an even number of nodes during the re-imaging process.

Third, the Kubespray example node configuration shows etcd running on the control plane nodes. I really don’t know if this is a requirement or just a simple convention used by Kubespray. I asked on Slack, and it sounds like etcd can run on any node, but it is not recommended to run etcd on a worker node (if you do, the node should be tainted to not allow workloads to be scheduled on it).

The assumption here is that the control plane nodes are considered part of the cluster admin, and do not run workloads. If etcd is run on a worker node, and some workload is compromised, it can compromise the cluster.

So, it sounds like the practical options are A) run for a while with an even number of etcd nodes, B) place etcd on a worker node, but taint it to prevent scheduling of normal workloads, or C) run etcd cluster external to the kubernetes cluster.

I’m going to try option (A), and I think that Kubespray will do the right thing w.r.t. updating the etcd nodes, as needed.
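The odd-number advice comes from etcd's majority (Raft) quorum. A cluster of n members needs floor(n/2) + 1 of them up to make progress. A small sketch of that arithmetic shows what option (A) costs while a node is out.

```python
def quorum(members: int) -> int:
    """Votes an etcd (Raft) cluster needs to make progress: a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be down while a majority is still available."""
    return members - quorum(members)


for n in range(1, 6):
    print(f"{n} members: quorum {quorum(n)}, tolerates {failures_tolerated(n)}")
```

With apoc removed, the etcd cluster is down to 2 members, and both are needed for quorum. It therefore tolerates no further failures until the third member is back. Likewise, a 4-member cluster tolerates no more failures than a 3-member one, which is why odd sizes are preferred.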

The first step is to remove one of the control plane nodes, apoc:

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 remove-node.yml -e node="apoc"

I did “sudo shutdown -h 0” on the node and moved it to the study, where I could follow the same steps as above (and in Part II of my series) to re-image the node. After that, I again followed the steps in Part IV to check that I can ping the node, and then provisioned the node to prepare for cluster addition. A quick rundown of the commands done:

cd ../picluster
export TARGET_HOST=apoc

ansible-playbook -i inventory/mycluster/hosts.yaml playbooks/ping.yaml --private-key=~/.ssh/id_ed25519

ansible-playbook -i "${TARGET_HOST}," playbooks/passwordless_sudo.yaml -v --private-key=~/.ssh/id_ed25519 --ask-become-pass
ansible-playbook -i "${TARGET_HOST}," playbooks/ssh.yaml -v --private-key=~/.ssh/id_ed25519
ansible-playbook -i "${TARGET_HOST}," playbooks/hostname.yaml -v --private-key=~/.ssh/id_ed25519
ansible-playbook -i "${TARGET_HOST}," playbooks/os_update.yaml --extra-vars "inventory=all reboot_default=false proxy_env=[]" --private-key=~/.ssh/id_ed25519
ansible-playbook -i "${TARGET_HOST}," playbooks/tools.yaml -v --private-key=~/.ssh/id_ed25519

On apoc, I added this line to /etc/cloud/templates/hosts.debian.tmpl so that the API server’s IP can be resolved before DNS is up and running on this node:

<KUBE_VIP_IP_FOR_API_SERVER> lb-apiserver.kubernetes.local

A few more RPI specific configurations…

ansible-playbook -i "${TARGET_HOST}," playbooks/uctronics.yaml -v --private-key=~/.ssh/id_ed25519
ansible-playbook -i "${TARGET_HOST}," playbooks/cgroups.yaml -v --private-key=~/.ssh/id_ed25519
ansible-playbook -i "${TARGET_HOST}," playbooks/iptables.yaml -v --private-key=~/.ssh/id_ed25519

With the node provisioned, it is time to try to add it back into the cluster. The Kubespray docs say that adding an etcd node requires some additional arguments, so I tried that:

cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=etcd,kube_control_plane -e ignore_assert_errors=yes cluster.yml

After a very long run time (just over 50 minutes on my cluster), it completed successfully. Next, the docs suggest running the upgrade playbook, again with extra args, to update the etcd cluster:

ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -v --private-key=~/.ssh/id_ed25519 --limit=etcd,kube_control_plane -e ignore_assert_errors=yes upgrade-cluster.yml

After this completes (almost an hour on my cluster), they suggest editing /etc/kubernetes/manifests/kube-apiserver.yaml on each control plane node to make sure that the --etcd-servers parameter has the correct IPs for all of the etcd nodes. I just checked the contents of mine, and all three etcd servers were listed.
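That check can also be scripted. The minimal sketch below assumes the manifest passes the flag as a single --etcd-servers=URL,URL,... argument, as kube-apiserver manifests typically do. It also assumes you supply the expected etcd IPs. The manifest is only readable by root. The helper names are hypothetical.

```python
import re


def etcd_servers(manifest_text: str) -> list[str]:
    """Extract the URLs passed to --etcd-servers in a kube-apiserver manifest."""
    match = re.search(r"--etcd-servers=(\S+)", manifest_text)
    return match.group(1).split(",") if match else []


def missing_etcd_ips(manifest_text: str, expected_ips: list[str]) -> list[str]:
    """Return expected etcd IPs that do not appear in any --etcd-servers URL."""
    urls = etcd_servers(manifest_text)
    # The "//" and ":" anchors keep 192.0.2.1 from matching 192.0.2.11.
    return [ip for ip in expected_ips if not any(f"//{ip}:" in u for u in urls)]
```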

I did a quick check to make sure all the pods and resources were ready/running in the cluster. There were two older coredns replica sets that were not running (but a current one running). I deleted them. There was one longhorn engine pod showing 0/1 READY with status Running, but a few minutes later, it was running. Looks like everything is fine.

Finally, I repeated this for the other control plane/etcd node. It completed without any errors, but there were some issues with Longhorn resources: a pod in a crash loop, and several daemonsets and deployments not fully running. I deleted the pod, so it restarted. I still had some difficulty getting all of the daemonsets for the Longhorn manager running. I had to delete several Longhorn manager pods that were in the 1/2 READY state, and finally everything was running.

Category: bare-metal, Kubernetes, Raspberry PI

January 16

Kubernetes The Harder Way Explorations

I’m trying to bring up a test Kubernetes cluster (just for learning), by using Kubernetes The Hard Way steps, with one big twist… I want to do this on my M2 MacBook (arm64 based).

From the GitHub page, this is what they say about the steps for doing this:

This tutorial requires four (4) ARM64 based virtual or physical machines connected to the same network. While ARM64 based machines are used for the tutorial, the lessons learned can be applied to other platforms.

Prerequisites (1)
Setting up the Jumpbox (2)
Provisioning Compute Resources (3)
Provisioning the CA and Generating TLS Certificates (4)
Generating Kubernetes Configuration Files for Authentication (5)
Generating the Data Encryption Config and Key (6)
Bootstrapping the etcd Cluster (7)
Bootstrapping the Kubernetes Control Plane (8)
Bootstrapping the Kubernetes Worker Nodes (9)
Configuring kubectl for Remote Access (10)
Provisioning Pod Network Routes (11)
Smoke Test (12)
Cleaning Up (13)

Try #1: Docker containers (failed)

Initially, I thought I would just use Docker on the Mac, to create the four nodes used for this. It started out pretty well, provisioning, creating certs, creating config files and copying, and starting etcd.

I even optimized the process a bit with:

Dockerfile to build nodes with the needed packages.
Script to generate SSH keys for all nodes.
Script to run container for each node with IP addresses, and defining /etc/hosts for all nodes with FQDNs and IPs.
Script to distribute SSH keys and known hosts info
Several scripts that just contain the commands needed for each step, and any ssh commands to move from node to node.

My first problem occurred when trying to bring up the control plane (step 8), starting up services. The issue is that my nodes (I tried debian:bookworm and ubuntu:noble bases) did NOT have systemd running. My guess is that it was because the Docker containers are using the same kernel as my host, and that does not use systemd.

Initially, I tackled this by using a systemV init script template and then filled it out with info from the systemd .service files needed. I placed arguments in a /etc/default/SERVICE_NAME file and would source that and use a variable to add it to the script. Services were coming up for the control plane node, and it was looking good.

When I got to the next step (9), bringing up the first worker node, I found that the systemd unit file had two steps for starting the service:

ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd

This was the second problem. I found an old GitHub repo with a tool to convert systemd unit files to systemV init scripts. Unfortunately, it was very old and written for Python 2. I ran py2to3 on it, so I could use it with the Python 3.13.1 that I’m using, and made a few changes (some import changes, print syntax, and some mixed tab/space and indenting issues). The script ran and created a file that “looked” OK, so I ran it on the systemd unit files that I had for the workers.

However, there were two concerns with the results. One was that, for a systemd unit with arguments on continuation lines, the lines were converted from this:

ExecStart=/usr/local/bin/kubelet \
--config=/var/lib/kubelet/kubelet-config.yaml \
--kubeconfig=/var/lib/kubelet/kubeconfig \
--register-node=true \
--v=2

to this:

start_daemon -p $PIDFILE /usr/local/bin/kubelet \
start_daemon -p $PIDFILE -config=/var/lib/kubelet/kubelet-config.yaml \
start_daemon -p $PIDFILE -kubeconfig=/var/lib/kubelet/kubeconfig \
start_daemon -p $PIDFILE -register-node=true \
start_daemon -p $PIDFILE -v=2

Now, I don’t know much about systemV init scripts, but I’m wondering if this conversion is correct. With the ones I did manually, I had the service name and then ${OPTIONS} with all the args. I figured I’d just try it and see if it works correctly.
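Each converted line still ends in a backslash, so the shell would join them into one start_daemon invocation whose arguments include the repeated “start_daemon -p $PIDFILE” words. That conversion therefore does not look right. A converter needs to join the systemd continuation lines first and then treat the result as one command. The sketch below covers just that step. exec_start_command is hypothetical and ignores other systemd syntax, such as quoting and prefixes like “-”.

```python
def exec_start_command(unit_text: str) -> str:
    """Return the ExecStart= command from a systemd unit as one line.

    Lines ending in a backslash are continuations: they are joined (with the
    backslash dropped) into one logical line before ExecStart= is located.
    """
    logical, current = [], ""
    for raw in unit_text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            current += line[:-1].strip() + " "
            continue
        logical.append(current + line.strip())
        current = ""
    if current:
        logical.append(current.strip())
    for line in logical:
        if line.startswith("ExecStart="):  # does not match ExecStartPre=
            return line[len("ExecStart="):].strip()
    raise ValueError("no ExecStart= line found")
```

The single-line result could then be passed to one start_daemon -p $PIDFILE call in the init script.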

The other concern was that the first service I need to apply has an ExecStartPre and an ExecStart line:

ExecStartPre=/sbin/modprobe overlay
ExecStart=/bin/containerd

It was converted to this:

start_daemon /sbin/modprobe overlay
...
start_daemon -p $PIDFILE /bin/containerd

I was eager to see if that would work correctly, so I gave it a try. This is when I hit the third and fatal problem. There was no modprobe command. I installed kmod so that the command worked, but found that there was no overlay module; lsmod did not list it.

My thought here was that because these Docker containers are using the MacOS kernel, this module was not available. I may be wrong, but I think this method is sunk.

Try #2: Virtualization (failed)

It looks like there are a lot of choices here: Parallels (commercial), VirtualBox, QEMU, etc. I found a link about ways to run virtualization on an arm64-based Mac. I have VirtualBox on my Mac already, but I was intrigued by UTM, which is a virtualization/emulation app for iOS and macOS, based on QEMU.

I created a DockerHub repo that has supporting scripts to make setting up the environment easier. For this attempt, I used tag “initial-try” in the repo.

Prerequisites

This assumes you are on an arm64-based Mac (M1+), have a DockerHub account for building/pushing images, and have downloaded the Ubuntu 24.04 server ISO (or whatever you want to use, although some modifications to the steps may be needed).

Prep Work

I installed UTM following their instructions, and then created a new virtual machine, using the Ubuntu 24.04 ISO image I had lying around. I used the default 4 GB RAM and 64 GB disk, and added a directory on my host to use as a shared area, in case I wanted to transfer files to/from the host. You can skip the shared area, if desired.

Before starting the VM, I opened the settings, went to the network settings (which are set to shared network), and clicked on the “Show Advanced Settings” checkbox.

I told it to use the network 10.0.0.0/24, with DHCP handing out IPs from 10.0.0.1 to 10.0.0.100. You could leave it as-is, if desired; I just wanted an easy-to-type IP.

I ran the Ubuntu install process (selecting to use the Ubuntu updates that are available). I named the host “utm” and set up the disk with LVM (the default). As part of the process, I chose to install OpenSSH and import my GitHub public key (so I can SSH in without a password), and to install Docker.

Upon completion, I stopped the VM, and went to UTM settings for the VM and cleared the ISO image from the CD drive, so that it would not boot again to the installer CD. Restarted the VM and verified that I could log in and SSH in to the IP 10.0.0.3. I ran “lsmod” and could see the “overlay” module, so that was a good sign.

To complete the setup of the shared area, I did the following:

sudo mkdir /mnt/utm

As root, edit /etc/fstab and add:

# Share area
share /mnt/utm 9p trans=virtio,version=9p2000.L,rw,_netdev,nofail,auto 0 0

You can update the mount with:

sudo systemctl daemon-reload
sudo mount -a

You can now see the shared area files under /mnt/utm. Note the owner and group of the files, which will match those on the Mac, and which is likely not what you want. To make them match this VM (while still being the same on the host), we’ll make another mount using bindfs, so create a mount point and make sure bindfs is installed:

mkdir ~/share
sudo apt-get install bindfs -y

Add the following to /etc/fstab, substituting the macOS owner (UID) and group (GID) values that you noted above, and the username of the account you are using:

# bindfs mount to remap UID/GID
/mnt/utm /home/USERNAME/share fuse.bindfs map=UID/1000:@GID/@1000,x-systemd.requires=/mnt/utm,_netdev,nofail,auto 0 0

For me, my UID was already 1000 (I had changed it on my Mac so that it was the same as on the Linux systems I have; normally it is something like 501 or 502), and my GID was 20. Update the mount again:

sudo systemctl daemon-reload
sudo mount -a
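If you would rather not hand-edit the UID/GID values, the bindfs fstab entry can be generated. This is a minimal Python sketch, not part of my repo; the function names are hypothetical, and owner_of() assumes the 9p share is already mounted so it can read the macOS UID/GID of a file as seen from the VM.

```python
import os


def bindfs_fstab_line(src: str, dest: str, host_uid: int, host_gid: int,
                      vm_uid: int = 1000, vm_gid: int = 1000) -> str:
    """Build an /etc/fstab entry that remaps the Mac's UID/GID to the VM's."""
    # bindfs map syntax: hostuser/vmuser for users, @hostgroup/@vmgroup for groups
    mapping = f"map={host_uid}/{vm_uid}:@{host_gid}/@{vm_gid}"
    options = ",".join([
        mapping,
        f"x-systemd.requires={src}",  # mount only after the 9p share is up
        "_netdev",
        "nofail",
        "auto",
    ])
    return f"{src} {dest} fuse.bindfs {options} 0 0"


def owner_of(path: str) -> tuple[int, int]:
    """Return the (UID, GID) a file shows, e.g. a file on the 9p share."""
    st = os.stat(path)
    return st.st_uid, st.st_gid


if __name__ == "__main__":
    # Values from my setup: macOS UID 1000, GID 20, VM user "pcm"
    print(bindfs_fstab_line("/mnt/utm", "/home/pcm/share", 1000, 20))
```

On the VM, owner_of("/mnt/utm/some-file") would give the host values to pass in, instead of the hard-coded 1000 and 20.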

You should now see the files in ~/share with the user and group matching your VM. One of the files in this shared area is named “pcm” (the username I’m using), and it contains:

pcm ALL=(ALL) NOPASSWD: ALL

I did “sudo cp share/pcm /etc/sudoers.d/” so that from now on, I don’t need a password for sudo commands. If you want, you can just create a file named after your username (containing the line above, with your username as the first word) and place it in /etc/sudoers.d/.

Next, I wanted to set up Docker so that I can run it without sudo. This assumes you already have an account on DockerHub, and have set up a passkey for logging in via the command line. I used these commands to add my user to the docker group:

sudo groupadd docker
sudo usermod -aG docker $USER
sudo gpasswd -a $USER docker
newgrp docker

I had to reboot (they say to log out and back in, but that did not work for me).

Lastly, I installed tools I wanted for development (emacs, ripgrep).

Trying The Hard Way (Again)…

I’m now ready to give this a go. To review, here are the steps in the tutorial:

Prerequisites (1)
Setting up the Jumpbox (2)
Provisioning Compute Resources (3)
Provisioning the CA and Generating TLS Certificates (4)
Generating Kubernetes Configuration Files for Authentication (5)
Generating the Data Encryption Config and Key (6)
Bootstrapping the etcd Cluster (7)
Bootstrapping the Kubernetes Control Plane (8)
Bootstrapping the Kubernetes Worker Nodes (9)
Configuring kubectl for Remote Access (10)
Provisioning Pod Network Routes (11)
Smoke Test (12)
Cleaning Up (13)

On the VM, clone my repo:

git clone https://github.com/pmichali/k8s-the-hard-way-on-mac.git
cd k8s-the-hard-way-on-mac

Before starting, set your Docker user ID, so it can be referenced in the scripts:

export DOCKER_ID=YOUR_DOCKER_USERNAME

First off, I wanted to create all the SSH keys for each node, build a Docker image to use for the nodes, and create a network 10.10.10.0/24:

./prepare.bash
./build.bash

All four of the nodes are created as docker containers with:

./run.bash jumpbox
./run.bash server
./run.bash node-0
./run.bash node-1

Now we have the four machines running as Docker containers for step 1 (Prerequisites) of the process. The architecture is aarch64, when checking with “uname -mov”. Before continuing on, we’ll set the known hosts and SSH keys on all the nodes so that we can SSH without passwords:

./known-hosts.bash

We’ll also copy over scripts to various nodes, so that we can run them. These scripts are the commands mentioned in the various steps of “Kubernetes The Hard Way”:

./copy-scripts.bash

For step 2 (Setting up the Jumpbox), we’ll access the jumpbox container, clone the kelseyhightower/kubernetes-the-hard-way repo, and install kubectl using the following command:

docker exec jumpbox /bin/bash -c ./jumpbox-install.bash

Because of the earlier steps we did to set up known_hosts and authorized_keys on each node, and to configure /etc/hosts, there is nothing to do for step 3 (Provisioning Compute Resources) of the process. You can verify that you can SSH into any node from the jumpbox by opening a shell on the jumpbox with the following command, and then trying to SSH to other nodes by name (e.g. ssh node-1):

docker exec -it jumpbox /bin/bash

For step 4 (Provisioning the CA and Generating TLS Certificates), run these scripts from the jumpbox:

./CA-certs.bash
./distribute-certs.bash

For step 5 (Generating Kubernetes Configuration Files for Authentication), run these scripts from the jumpbox:

./kubeconfig-create.bash
./distribute-kubeconfigs.bash

For step 6 (Generating the Data Encryption Config and Key), run this script on jumpbox:

./encryption.bash

For step 7 (Bootstrapping the etcd Cluster), run this script on jumpbox:

./etcd-files.bash

Then, from the jumpbox, SSH into the server and run the script to start up the etcd service:

ssh server
./etcd-config.bash

This FAILED because systemd is not running in any of the Docker containers, even though it is running on the host VM.

I did a little bit of research, and I see that Docker does not normally run systemd, as the expectation is that a container will be running one service (not multi-service, like on the host). I see some “potential” solutions…

One is to run the container with a “systemd” replacement as the running process (command). This would handle the systemctl start/stop operations, and it reads and processes the corresponding systemd unit files for services that are started. It’s detailed here, and seems like maybe the most straightforward option. I haven’t tried this, but I think it would have to be done from the UTM virtual machine, so that we also have the overlay module that is needed.

A second is to run systemd in the container. It looks like that requires a bunch of things: installing systemd, using /sbin/init as the command to run, volume mounting several paths from the host (which I think would still require running from a VM), and running in privileged mode. Several posts describe different methods. I haven’t tried this either.

A third way would be a more heavyweight solution of running VMs for each of the nodes, so that systemd is running and the overlay module is present. Fortunately, I think I have found another way that may work…

Try #3: Podman (failed)

When looking for solutions for how to run systemd in a Docker container, I saw mention that podman supports running systemd in containers out of the box, so I wanted to give it a try. Many of the commands are the same as Docker, so it would be easy to set up.

Before doing this in the UTM VM that I had, I decided to just try it from my Mac host. I installed the CLI version of podman on my Mac from https://podman.io/. Next, I updated the files in my GitHub repo to refer to podman, and to alter the container image specification that I was using.

With these changes, I was ready to give it another try…

Kubernetes The Hard Way (yet again)

In the GitHub repo, there is a machines.txt file with a list of all the nodes with IP, FQDN, node name, and pod subnet (if applicable). This is read by the scripts to configure nodes as needed through the process. Let’s get started with the steps for the Kubernetes The Hard Way tutorial (listed above). They will be very similar to try #2.
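To illustrate the machines.txt format described above, here is a minimal Python sketch that parses it and renders /etc/hosts-style lines. The field order (IP, FQDN, node name, optional pod subnet) follows the description above; the function names and the sample addresses are hypothetical, and the scripts in the repo are bash, so this is only an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    ip: str
    fqdn: str
    name: str
    pod_subnet: Optional[str] = None  # only worker nodes have a pod subnet


def parse_machines(text: str) -> list[Machine]:
    """Parse lines of the form: IP FQDN NAME [POD_SUBNET]."""
    machines = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) not in (3, 4):
            raise ValueError(f"unexpected machines.txt line: {line!r}")
        machines.append(Machine(*fields))
    return machines


def hosts_entries(machines: list[Machine]) -> list[str]:
    """Render /etc/hosts lines so nodes resolve each other by FQDN or short name."""
    return [f"{m.ip} {m.fqdn} {m.name}" for m in machines]


if __name__ == "__main__":
    # Hypothetical sample; the real values live in the repo's machines.txt
    sample = """\
10.10.10.10 server.kubernetes.local server
10.10.10.20 node-0.kubernetes.local node-0 10.200.0.0/24
10.10.10.21 node-1.kubernetes.local node-1 10.200.1.0/24
"""
    for entry in hosts_entries(parse_machines(sample)):
        print(entry)
```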

For step 1 (Prerequisites), store your DockerHub user id in an environment variable for use by some of the scripts:

export DOCKER_ID=YOUR_DOCKER_ID

Create and startup the podman machine with the following script:

./init.bash

Note: If you happen to be running Docker Desktop, podman will indicate that you can set the DOCKER_HOST environment variable, so that Docker API clients can reach the podman machine. In that case, just copy and paste the export command shown.

Create the ssh keys for all the nodes, build the image to be used by nodes (along with all the SSH keys generated), and create the network:

./prepare.bash
./build.bash

There will now be a container image named localhost/${DOCKER_ID}/node:v0.1.0 in the local registry. The four containers can now be created, using the container image:

./run.bash jumpbox
./run.bash server
./run.bash node-0
./run.bash node-1

When started, the container will get an IP, name, and FQDN from the machines.txt file, and will rename the public and private keys for the specific node to id_rsa.pub and id_rsa.

One issue is that because openssh-server is installed as part of the container image, every container created will have the same host keys. To make unique keys, we’ll delete the existing host keys and run “ssh-keygen -A” on each node to generate new ones. Then, the host keys will be collected and placed into the ~/.ssh/known_hosts file on each node, so that we can SSH from node to node easily. Run the following to make those changes:

./known-hosts.bash

Note: You could probably remove the sshd install from the Containerfile that builds the container image, and instead install sshd after starting each node, which would create unique host keys. Then, the above script could just collect the host keys and build the known_hosts file, without having to delete and regenerate host keys.
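Collecting host keys into known_hosts is mostly string assembly. Here is a minimal Python sketch of that step, assuming you already have each node's host public key text (e.g. the contents of /etc/ssh/ssh_host_ed25519_key.pub); the function names are hypothetical and this is not what known-hosts.bash literally does.

```python
def known_hosts_line(names: list[str], host_pub_key: str) -> str:
    """Turn one node's host public key into a known_hosts entry.

    host_pub_key is the text of a file such as /etc/ssh/ssh_host_ed25519_key.pub,
    i.e. "<key type> <base64 key> [comment]".
    """
    fields = host_pub_key.split()
    if len(fields) < 2:
        raise ValueError("expected '<key type> <base64 key> [comment]'")
    key_type, key_b64 = fields[0], fields[1]  # the trailing comment is dropped
    # known_hosts format: comma-separated host names/IPs, then the key
    return f"{','.join(names)} {key_type} {key_b64}"


def build_known_hosts(nodes: dict[str, tuple[list[str], str]]) -> str:
    """nodes maps a node name to (names/IPs to match, host public key text)."""
    return "".join(known_hosts_line(names, key) + "\n" for names, key in nodes.values())
```

Listing the short name, the FQDN, and the IP for each node lets "ssh node-1" and "ssh node-1.kubernetes.local" both match without a prompt. The standard ssh-keyscan tool can also gather keys from running nodes, which may be simpler in a script.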

There are podman commands to see what has been created so far:

podman network ls
podman network inspect k8snet
podman ps -a

At this point, we can copy the scripts I created to the various nodes:

podman cp CA-certs.bash jumpbox:/root/
podman cp distribute-certs.bash jumpbox:/root/
podman cp kubeconfig-create.bash jumpbox:/root/
podman cp distribute-kubeconfigs.bash jumpbox:/root/
podman cp encryption.bash jumpbox:/root/
podman cp etcd-files.bash jumpbox:/root/

podman cp etcd-config.bash server:/root/

These scripts are just the commands listed in the tutorial, so that you can run the script, instead of copy and pasting all the commands in the steps.

For step 2 (Setting up the Jumpbox), we’ll access the jumpbox container, clone the kelseyhightower/kubernetes-the-hard-way repo, and install kubectl using the following command:

podman exec jumpbox /bin/bash -c ./jumpbox-install.bash

Because of the earlier steps we did to set up known_hosts and authorized_keys on each node, and to configure /etc/hosts, there is nothing to do for step 3 (Provisioning Compute Resources) of the process. You can access a node, like jumpbox, with the following command, and try to SSH to other nodes by name (e.g. ssh node-1):

podman exec -it jumpbox /bin/bash

For step 4 (Provisioning the CA and Generating TLS Certificates), run these scripts from the jumpbox:

./CA-certs.bash
./distribute-certs.bash

For step 5 (Generating Kubernetes Configuration Files for Authentication), run these scripts from the jumpbox:

./kubeconfig-create.bash
./distribute-kubeconfigs.bash

For step 6 (Generating the Data Encryption Config and Key), run this script on jumpbox:

./encryption.bash

For step 7 (Bootstrapping the etcd Cluster), run this script on jumpbox:

./etcd-files.bash

Then, from the jumpbox, SSH into the server and run the script to start up the etcd service:

ssh server
./etcd-config.bash

For step 8 (Bootstrapping the Kubernetes Control Plane), return to the jumpbox and run the script to place the API Server, Controller Manager, and Scheduler onto the server node:

exit
./push-controller-settings.bash

Then, from the jumpbox, ssh into the server and start up these three services:

ssh server
./bootstrap-controllers.bash

Return to the jumpbox and verify the controller is working by requesting the version:

exit
curl -k --cacert ca.crt https://server.kubernetes.local:6443/version
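The same version check can be done from Python. Unlike curl -k, this minimal sketch verifies TLS against ca.crt, so it only succeeds if the API server certificate is signed by that CA and valid for server.kubernetes.local. The function names are hypothetical; gitVersion and platform are fields of the /version response.

```python
import json
import ssl
import urllib.request


def fetch_version(url: str, cafile: str) -> dict:
    """GET the API server's /version endpoint, verifying TLS against the cluster CA."""
    ctx = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return json.load(resp)


def summarize_version(info: dict) -> str:
    """Condense the /version JSON into one line for a quick health check."""
    return f"{info['gitVersion']} ({info['platform']})"
```

On the jumpbox, summarize_version(fetch_version("https://server.kubernetes.local:6443/version", "ca.crt")) would print the server version and platform.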

For step 9 (Bootstrapping the Kubernetes Worker Nodes), from the jumpbox node, run this script to place configs, kubelet, kube-proxy, and kubectl on worker nodes:

./push-worker-settings.bash

Then, on each of the worker nodes, node-0 and node-1, append the following to kube-proxy-config.yaml to fix an issue with containerd creating the shim task:

conntrack:
  maxPerCore: 0

And then, invoke:

./bootstrap-workers.bash

Go back to the jumpbox (exit from node-*, or use podman exec if you have exited the containers entirely). Check that the nodes are running with:

ssh server kubectl get nodes -o wide --kubeconfig admin.kubeconfig

Note: this showed the nodes ready, but I was seeing the kube-proxy service not starting on the worker nodes (systemctl status kube-proxy).

For step 10 (Configuring kubectl for Remote Access), on the jumpbox, invoke this script, which creates an admin config and runs kubectl commands to show the version and nodes:

./remote-access.bash

For step 11 (Provisioning Pod Network Routes), run the following script on jumpbox to create routes for pods to communicate:

./set-routes.bash

For step 12 (Smoke Test), you’ll perform a sequence of steps on the jumpbox to verify the installation. First, create a secret:

kubectl create secret generic kubernetes-the-hard-way \
--from-literal="mykey=mydata"

Check the secret:

ssh root@server 'etcdctl get /registry/secrets/default/kubernetes-the-hard-way | hexdump -C'

The key should be prefixed by ‘k8s:enc:aescbc:v1:key1‘.
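Rather than eyeballing the hexdump, the prefix check can be scripted. A minimal Python sketch, assuming it runs where etcdctl is on PATH and can reach etcd (as on the server node); the function names are hypothetical, and --print-value-only is used so the output starts with the stored value rather than the key path.

```python
import subprocess

# Prefix the smoke test looks for: aescbc provider, v1, key named key1
ENCRYPTION_PREFIX = b"k8s:enc:aescbc:v1:key1"


def etcd_value(key: str) -> bytes:
    """Read a raw value with etcdctl (assumes etcdctl is on PATH)."""
    result = subprocess.run(
        ["etcdctl", "get", key, "--print-value-only"],
        capture_output=True, check=True,
    )
    return result.stdout


def is_encrypted_at_rest(raw_value: bytes) -> bool:
    """True if the stored value starts with the aescbc envelope prefix."""
    return raw_value.startswith(ENCRYPTION_PREFIX)
```

On the server, is_encrypted_at_rest(etcd_value("/registry/secrets/default/kubernetes-the-hard-way")) should return True.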

Second, create a deployment:

kubectl create deployment nginx \
--image=nginx:latest

Verify the pod is running:

kubectl get all

I hit a problem where the pod does not come up and shows this error:

 Warning FailedCreatePodSandBox 7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox "73794753461123b907388be4a0b9dcb6b4cf304ceae312e7d2cedfbc0776ed69": failed to create containerd task: failed to create shim task: failed to mount rootfs component: invalid argument

It looks like containerd has some issues, and there was also mention in the log of an undefined “SUBNET”. Looking at the tutorial’s GitHub issues, I saw two things. One mentions kubelet-config.yaml getting modified and copied to node-0/1, and then the original getting copied over it (overwriting the modified version). The modification replaces SUBNET with the actual subnet, so I’m wondering if that was causing a problem with containerd.
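One way to avoid that kind of overwrite is to render a separate copy per node and never write back to the template. A minimal Python sketch, assuming the template uses the literal placeholder SUBNET; the one-directory-per-node layout and the function names are hypothetical, not what the tutorial or my scripts do.

```python
from pathlib import Path


def render_for_node(template: str, subnet: str) -> str:
    """Return a copy of the template with the SUBNET placeholder filled in."""
    if "SUBNET" not in template:
        raise ValueError("template has no SUBNET placeholder")
    return template.replace("SUBNET", subnet)


def render_all(template_path: Path, out_dir: Path, subnets: dict[str, str]) -> list[Path]:
    """Write one rendered file per node; the template itself is never modified."""
    template = template_path.read_text()
    written = []
    for node, subnet in subnets.items():
        target = out_dir / node / template_path.name  # e.g. out/node-0/kubelet-config.yaml
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_for_node(template, subnet))
        written.append(target)
    return written
```

Copying out/node-0/kubelet-config.yaml to node-0 (and likewise for node-1) means the unrendered original is never what gets pushed.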

Another issue mentioned that someone had created “Kubernetes The Harder Way”, which deploys locally on a Mac with QEMU. That may be worth trying.

Next, I’ll try again from the start, with the kubelet-config.yaml overwrite resolved.

Try #4: Harder Way

When looking at the GitHub issues for Kubernetes The Hard Way, I saw a post from someone who made a repo doing the same thing, only geared towards local development, with guides for macOS/ARM64 and Linux/AMD64. It is called Kubernetes The Harder Way, and I’ll give that a try next.

Starting out, I cloned the repo:

git clone git@github.com:pmichali/k8s-the-hard-way-on-mac.git

I followed the instructions there, which are very clear. Some observations/comments:

They create 7 nodes, some with 2GB RAM, some with 4GB RAM. I used 1GB and 1.5GB, as my Mac only has 16GB RAM.
One node is used as a load balancer, so that API requests go to one IP and are distributed to the control plane nodes. What I’ve done in the past is to use kube-vip for a common API IP.
The tmux tool is used, with a script that creates panes for each node, and they make use of the synchronize-panes option, so that one command can be applied to multiple nodes (e.g. all control plane nodes) at the same time. A great idea.
A cloud-init image for Ubuntu is used, and they use the cloud-init files to set SSH key for remote access, to install tools, and to update the OS on startup.


July 10

Django App on Kubernetes

Viewmaster

For all the movies I own (500+), I had a spreadsheet listing them, so that when people visited, they could pick out a movie for us to watch. It was tedious, as I’d have to print it, or bring up the spreadsheet, and then, if they wanted to see a comedy, for example, I would sort by the “genre” column.

Wanting a better way to use this list of movies, I decided to make a web site that shows the title, genre, release date, rating, duration, and format (4K, Blu-ray, DVD) of each movie, with buttons to display the movies in multiple ways:

Alphabetical (e.g. do I have “The Matrix”?)
Genre, then alphabetical (e.g. what comedy movies do I have?)
Genre, date, then alphabetical (e.g. what are the newest “SCI-FI” movies?)
Date, then alphabetical (e.g. what are the new releases that I have?)
Collection, then date (e.g. Die Hard movies in order)
Format, then alphabetical (e.g. what 4K movies do I have?)
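Each of these display modes is essentially a multi-key sort. A minimal Python sketch of the idea follows; the Movie fields, mode names, and the newest-first choice for the date-based modes are assumptions for illustration, not the app's actual code (in Django itself this would more naturally be a QuerySet.order_by() call with several fields).

```python
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    genre: str
    year: int
    fmt: str              # "4K", "Blu-ray", "DVD", ...
    collection: str = ""  # e.g. "DIE HARD"; empty if not part of a collection


# One sort key per display mode; negated years put the newest first
ORDERINGS = {
    "alpha": lambda m: m.title.lower(),
    "genre": lambda m: (m.genre, m.title.lower()),
    "genre_date": lambda m: (m.genre, -m.year, m.title.lower()),
    "date": lambda m: (-m.year, m.title.lower()),
    "collection": lambda m: (m.collection, m.year),  # oldest first within a collection
    "format": lambda m: (m.fmt, m.title.lower()),
}


def sort_movies(movies: list[Movie], mode: str) -> list[Movie]:
    return sorted(movies, key=ORDERINGS[mode])
```

Tuple keys compare element by element, so the second field only breaks ties in the first, which is exactly the "genre, then alphabetical" behavior.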

There is a search box to look for a specific title, an option to see more details on each movie (aspect ratio, audio, cost, and collection keyword), and an option to include Laser Discs. I don’t have an LD player anymore, but I use the movie covers as wall hangings and still have about 60 discs.

I created a Django app for the web site, set it up to run in a Docker container, and made a script to import the spreadsheet info I had into the movie database. This ran on a Raspberry Pi 4 and was accessible locally on my network.

Now that I have a Kubernetes cluster, I want to port this web-based Docker app into my cluster.

The Plan…

Here are the goals for this effort:

Use a deployment with one instance of the app running on a pod.
Instead of having a SQLite database in a file on the host, use a database like Postgres.
Have the database of movie information in Longhorn storage, so I can back it up.
Put confidential info into Secrets. Don’t have anything confidential in the app.
(Optionally) Make this web app accessible from outside my home, using HTTPS (make use of the NGINX Virtual Server I’ve already set up for my Emby music server).
Use a separate namespace for this app, rather than the “default”, to isolate things.

I found some videos on how to port Django apps to Kubernetes, and each were doing things slightly differently. So I used one method, sprinkled in some ideas from other methods, and added some more things that I wanted. Let’s get started on the journey…

Collect Together The Needed Items

First, I cloned the Docker implementation of my app into my work area for Kubernetes. This has the typical Django development tree structure, plus a Dockerfile I used to package things up, and the SQLite3 database file that was used by that implementation (the ./DBase/movies.db file from the Git repo on the host was mapped to a mount point in the container, so that I could back up the database periodically).

You can take whatever Django app you have to do the same porting effort, whether it has a Docker setup or not. Here is my viewmaster app as an example Django app:

cd ~/workspace/kubernetes/
git clone https://github.com/pmichali/viewmaster.git
cd viewmaster
mkdir deploy

The master branch has the code right before I started the porting effort. The k8s-port branch has any app code changes, and the manifests and supporting files that I used to port to Kubernetes.

Prepare Settings

Create an environment file with the values you want for secrets (viewmaster-secrets.env):

cd deploy

SECRET_KEY='a unique string that django will use'
DB_HOST=viewmaster-postgres
POSTGRES_DB=name-of-your-database
POSTGRES_USER=name-for-your-db-user
POSTGRES_PASSWORD='pass-you-want-for-database'
PUBLIC_DOMAIN=movies.my-domain.com

The first is a secret key used for cryptographic signing in Django. The last one is for app use, and the others are for the database (substitute your own values for the placeholders). Create the namespace and the secret, and then remove the file:

kubectl create namespace viewmaster
kubectl create secret generic viewmaster-secrets -n viewmaster --from-env-file=viewmaster-secrets.env
rm viewmaster-secrets.env
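If you need a value for SECRET_KEY, Python's secrets module can generate one. This minimal sketch also renders the env file; the helper names are hypothetical, and the character set only mimics the style of Django's get_random_secret_key(), so treat it as an assumption. As far as I can tell, kubectl's --from-env-file does not strip quotes, so quotes like those in the example file become part of the values (harmless here, since both pods read the same secret); this sketch writes values unquoted.

```python
import secrets
import string

# Lowercase letters, digits, and a few symbols (assumed; similar in style to Django's)
KEY_CHARS = string.ascii_lowercase + string.digits + "!@#$%^&*(-_=+)"


def make_secret_key(length: int = 50) -> str:
    """Generate a random SECRET_KEY using a cryptographically secure RNG."""
    return "".join(secrets.choice(KEY_CHARS) for _ in range(length))


def env_file(values: dict[str, str]) -> str:
    """Render KEY=VALUE lines for kubectl create secret --from-env-file."""
    lines = []
    for key, value in values.items():
        if "\n" in value:
            raise ValueError(f"{key}: values must fit on one line")
        lines.append(f"{key}={value}")  # unquoted on purpose
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(env_file({
        "SECRET_KEY": make_secret_key(),
        "DB_HOST": "viewmaster-postgres",
        "POSTGRES_DB": "name-of-your-database",
        "POSTGRES_USER": "name-for-your-db-user",
        "POSTGRES_PASSWORD": "pass-you-want-for-database",
        "PUBLIC_DOMAIN": "movies.my-domain.com",
    }), end="")
```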

Create a config map, which has settings for both Django and a Postgres database (viewmaster-configmap.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: viewmaster-cm
  namespace: viewmaster
data:
  ALLOWED_HOSTS: "*"
  LOGLEVEL: "info"
  DEBUG: "0"
  PGDATA: "/var/lib/postgresql/data/db-files/"

Of note is PGDATA, which tells Postgres to use a directory below the mount point that we will create, so that Postgres will not complain about a non-empty directory (the volume will have a lost+found directory). Do a “kubectl apply -f viewmaster-configmap.yaml” to create the config map.

Deploy The Database

I created a manifest (postgres.yaml) with everything needed to deploy the Postgres database that I want to use:

apiVersion: v1
kind: Service
metadata:
  name: viewmaster-postgres
  namespace: viewmaster
  labels:
    app: viewmaster
spec:
  ports:
    - port: 5432
  selector:
    app: viewmaster
    tier: postgres
  clusterIP: None
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: viewmaster-postgres-pvc
  namespace: viewmaster
  labels:
    app: viewmaster
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
  namespace: viewmaster
  labels:
    app: viewmaster-postgres
spec:
  selector:
    matchLabels:
      app: viewmaster
      tier: postgres
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: viewmaster
        tier: postgres
    spec:
      volumes:
        - name: viewmaster-data
          persistentVolumeClaim:
            claimName: viewmaster-postgres-pvc
      containers:
        - image: postgres:16.3-alpine
          name: postgres
          ports:
            - containerPort: 5432
              name: postgres
          volumeMounts:
            - name: viewmaster-data
              mountPath: /var/lib/postgresql/data
          envFrom:
          - secretRef:
              name: viewmaster-secrets
          - configMapRef:
              name: viewmaster-cm

First, we have a headless service (clusterIP: None) on port 5432. Second is the 10 GB persistent volume claim, using our default Longhorn storage. Finally, we have the deployment with a container using a current version of Postgres, exposing port 5432, and mounting the PVC at the data area Postgres uses. The environment settings used by Postgres come from the secret and config map created earlier.

Do a “kubectl apply -f postgres.yaml”. There should be a deployment, replicaset, service and pod running for Postgres. In addition, there will be a 10 GB PV created and a claim.

Modify App To Use Environment Variables

In preparation for running things under Kubernetes, we want to remove hard-coded secrets and other confidential information from the Django application, and instead obtain the values from environment variables that will be passed in. For the Viewmaster app, I moved to the movie_library/movie_library/ area in the repo and edited settings.py to change/add these lines:

import os

SECRET_KEY = os.environ.get('SECRET_KEY', 'changeme')

DEBUG = bool(int(os.environ.get('DEBUG', 0)))

ALLOWED_HOSTS = []
ALLOWED_HOSTS.extend(
    filter(
        None,
        os.environ.get('ALLOWED_HOSTS', '').split(','),
    )
)

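MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    # WhiteNoise goes directly after SecurityMiddleware, before all others
    'whitenoise.middleware.WhiteNoiseMiddleware',
    ...
]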

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': os.environ.get('DB_HOST'),
        'NAME': os.environ.get('POSTGRES_DB'),
        'USER': os.environ.get('POSTGRES_USER'),
        'PASSWORD': os.environ.get('POSTGRES_PASSWORD'),
    }
}

STATIC_URL = 'static/'
STATIC_ROOT = '/vol/web/static'
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'

We get the secret key, debug flag, and allowed hosts from environment variables passed to the app at startup. The database engine is set to Postgres, and environment variables are used for the host, database name, username, and password (replacing what was there for SQLite). I could have used database-agnostic names for these, but since they are shared with the Postgres pod, I used the same names (rather than duplicating entries).
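The DEBUG and ALLOWED_HOSTS expressions above are easy to get subtly wrong, so it can help to check them in isolation. This minimal sketch wraps the same expressions as settings.py in functions (the function names are hypothetical) that take the environment as a parameter, so they can be tried without starting Django.

```python
import os
from collections.abc import Mapping


def read_debug(env: Mapping[str, str] = os.environ) -> bool:
    # Same expression as settings.py: expects "0" or "1" (int() rejects "false")
    return bool(int(env.get("DEBUG", 0)))


def read_allowed_hosts(env: Mapping[str, str] = os.environ) -> list[str]:
    # Same expression as settings.py: split on commas, drop empty entries
    return list(filter(None, env.get("ALLOWED_HOSTS", "").split(",")))
```

With the config map values (DEBUG "0", ALLOWED_HOSTS "*"), these give False and ["*"]. Note that spaces are not stripped, so "a.example, b.example" would produce " b.example" with a leading space.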

Because I switched from Django’s “runserver” to “gunicorn” and I’m not running in debug mode, I had to add the WhiteNoise middleware, and specify STATIC_ROOT and STATICFILES_STORAGE, so that static files could be located.

Since I didn’t want the movie listing to require the path /viewmaster/, I changed the URL pattern in urls.py in the ./movie_library/movie_library/ area of the repo to use the root of the HTML tree:

 urlpatterns = [
- path('viewmaster/', include('viewmaster.urls')),
+ path('', include('viewmaster.urls')),

Another cleanup item in the Viewmaster project is an unused sqlalchemy import in ./movie_library/viewmaster/views.py (my bad). When converting over to Kubernetes, we won’t be including that package, so delete the import.

The latest code in the k8s-port branch of the repo has all these changes.

Build Image For Kubernetes

UPDATE: Newer Python and Poetry versions used…

The next goal is to create a docker image for the Django app. I already have a Dockerfile at the top of the repo (~/workspace/kubernetes/viewmaster/), so I’ll just modify it to look like this:

FROM python:3.13.1

# OS packages, including pip and the Postgres client (psql)
RUN apt-get update -y && apt-get install -y software-properties-common python3-pip postgresql-client

# Fault handler dumps traceback on seg faults
# Unbuffered sends stdout/stderr to log vs buffering
ENV CODEBASE=/code \
PYTHONENV=/code \
PYTHONPATH=/code \
EDITOR=vim \
PYTHONFAULTHANDLER=1 \
PYTHONUNBUFFERED=1 \
PYTHONHASHSEED=random \
PIP_NO_CACHE_DIR=off \
PIP_DISABLE_PIP_VERSION_CHECK=on \
PIP_DEFAULT_TIMEOUT=100 \
POETRY_VERSION=2.0.0

# System dependencies
RUN pip3 install "poetry==$POETRY_VERSION"

# Copy over all needed files
WORKDIR /code
COPY README.txt poetry.lock pyproject.toml runserver.bash /code/
COPY movie_library/ /code/movie_library/

# setup tools for environment, using pyproject.toml file
RUN poetry config virtualenvs.create false && \
poetry install

# gunicorn listens on 8080 (see runserver.bash)
EXPOSE 8080

# CMD sleep infinity
CMD ["/code/runserver.bash"]

I included the Postgres client package, in case I wanted to access the database from this pod (it is also included in the Postgres pod we already created). I removed the user account setup lines, and added an EXPOSE line for port 8080, the port gunicorn listens on (EXPOSE only documents the port; the Kubernetes manifests decide what is reachable). Other things to consider are whether you want to update the Python base image version and the Poetry version.

There are two other related changes. The runserver.bash file, in the same area, was changed to this:

#!/bin/bash
cd /code/movie_library
python manage.py collectstatic --noinput
python manage.py migrate
gunicorn -b :8080 movie_library.wsgi:application

Instead of running the built-in Django server, the script now does collectstatic, migration, and then runs the gunicorn server for our Python app using port 8080 (instead of 8642).

UPDATE: New pyproject.toml syntax for Poetry 2.0.0…

The pyproject.toml file, which contains the package definitions used, is changed to contain:

[project]
name = "viewmaster"
version = "0.1.3"
description = "My movies"
# PEP 621 authors are tables, not "NAME <EMAIL>" strings
authors = [
    {name = "YOUR NAME", email = "YOUR_EMAIL_ADDRESS"}
]
readme = "README.txt"
requires-python = ">=3.13,<4.0"
dependencies = [
    "django (==4.2.14)",
    "django-auditlog (==2.3.0)",
    "psycopg (==3.2.1)",
    "gunicorn (==22.0.0)",
    "whitenoise (==6.7.0)"
]

[tool.poetry]
# Poetry-specific setting (not valid under [project]): this is an app, not a package
package-mode = false

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

I bumped the version number. The xlrd, openpyxl, sqlalchemy, and pandas packages are removed, and the psycopg, gunicorn, and whitenoise packages are added. On your host, you can run ‘poetry update’ and, if needed, update the versions in the pyproject.toml file to match the ones you are using. When the Docker image is built, Poetry installs these packages directly into the container’s Python environment (virtualenv creation is disabled in the Dockerfile), so commands like gunicorn are already on the PATH.

Now, from the top of the repo, we can build the docker image locally with:

docker buildx build . -t YOUR_DOCKER_ID/viewmaster-app:v0.1.3

With that completed, and assuming you have an account setup on Docker Hub, you can push the image up to your account:

docker push YOUR_DOCKER_ID/viewmaster-app:v0.1.3

It’s a good idea to use a different version, each time you update your app, so that when you deploy into Kubernetes it will download the updated image (assuming you update the deployment version, of course). Initially, I was using “latest”, but I had to set the image pull policy for the container to “Always”, instead of “IfNotPresent”.
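If you bump the tag on every iteration, a tiny helper keeps the build, push, and manifest in step. This minimal Python sketch is hypothetical (not in the repo) and only handles plain MAJOR.MINOR.PATCH tags with an optional leading "v".

```python
import re

_TAG = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def bump_patch(tag: str) -> str:
    """Return the next patch version, e.g. for a new image tag."""
    m = _TAG.match(tag)
    if not m:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    prefix, major, minor, patch = m.groups()
    return f"{prefix}{major}.{minor}.{int(patch) + 1}"


def image_ref(docker_id: str, tag: str, repo: str = "viewmaster-app") -> str:
    """Full image reference to use in docker build/push and in django.yaml."""
    return f"{docker_id}/{repo}:{tag}"
```

For example, image_ref("YOUR_DOCKER_ID", bump_patch("v0.1.3")) gives "YOUR_DOCKER_ID/viewmaster-app:v0.1.4", which would then go in both the docker commands and the image: line of the deployment.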

Deploy The Django App

In the ./deploy/ area, create a manifest (django.yaml), to deploy the Viewmaster app:

apiVersion: v1
kind: Service
metadata:
  name: viewmaster-service
  namespace: viewmaster
  labels:
    app: viewmaster
spec:
  ports:
    - port: 8000
      targetPort: 8080
      name: http
  selector:
    app: viewmaster
    tier: app
  type: LoadBalancer

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: viewmaster-app-pvc
  namespace: viewmaster
  labels:
    app: viewmaster
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: viewmaster
  namespace: viewmaster
  labels:
    app: viewmaster
spec:
  selector:
    matchLabels:
      app: viewmaster
      tier: app
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: viewmaster
        tier: app
    spec:
      volumes:
        - name: viewmaster-app-data
          persistentVolumeClaim:
            claimName: viewmaster-app-pvc
      containers:

        - image: YOUR_DOCKER_ID/viewmaster-app:v0.1.3
          imagePullPolicy: Always  # IfNotPresent
          name: app
          ports:
            - containerPort: 8080
              name: app
          volumeMounts:
            - name: viewmaster-app-data
              mountPath: /vol/web
          envFrom:
          - secretRef:
              name: viewmaster-secrets
          - configMapRef:
              name: viewmaster-cm

We create a service listening on port 8000 (forwarding to the app’s port 8080), of type LoadBalancer to get a “public” IP. A 10 GB persistent volume will be used for the app. Finally, there is a deployment with the container image that was built, a volume mapping for data, and environment information from the config map and secrets defined.

Note that I’m setting it to pull the image “Always”, because I’m going through iterations. Once done, you can set this to IfNotPresent. Otherwise, you are forced to update the version tag, and build/push with the new tag, for each iteration.

Do a “kubectl apply -f django.yaml” and make sure the pod is running. You can set up the superuser account by exec-ing into the viewmaster pod and running the createsuperuser command. For example:

kubectl exec -it -n viewmaster viewmaster-6c956ddb66-sxq4f -- /bin/bash
cd movie_library
python manage.py createsuperuser

Enter in a username, email address, and password. While in the pod, you can access the database with the database shell command:

python manage.py dbshell

From here, you can view all the tables that were created when the viewmaster app was started, by doing “\dt”:

viewmasterdb=# \dt
List of relations
Schema | Name                        | Type  | Owner
--------+----------------------------+-------+--------------
public | auditlog_logentry           | table | viewmasterer
public | auth_group                  | table | viewmasterer
public | auth_group_permissions      | table | viewmasterer
public | auth_permission             | table | viewmasterer
public | auth_user                   | table | viewmasterer
public | auth_user_groups            | table | viewmasterer
public | auth_user_user_permissions  | table | viewmasterer
public | django_admin_log            | table | viewmasterer
public | django_content_type         | table | viewmasterer
public | django_migrations           | table | viewmasterer
public | django_session              | table | viewmasterer
public | viewmaster_movie            | table | viewmasterer
(12 rows)

You can verify that the superuser account is correct with the “select * from auth_user;” command. This shell can also be used to import existing movie data…

Import Existing Data

Rather than re-enter all the movie information into this new Kubernetes based implementation, I wanted to export/import what I already have. In the repo I provided, there is a ./DBase/importVM.sql file with the data to import for my app, but I want to detail how this was created, as it wasn’t exactly trivial.

The Docker implementation had a SQLite database in ./DBase/movies.db. The first step was to export the database as a .sql file. I did the following:

cd DBase
sqlite3
.open movies.db
.once export.sql
.dump
.quit

From the export.sql file, I want the “viewmaster_movie” table. I created a file (importVM.sql) containing the INSERT lines for that table from export.sql, all wrapped inside “BEGIN TRANSACTION;” and “COMMIT;” lines, so that the Postgres database would only be updated if all the lines could be processed:

BEGIN TRANSACTION;
INSERT INTO viewmaster_movie VALUES(1,'12 Monkeys',1995,'SCI-FI','02:10:00.000000','LD','LB','D-SURR','',25,1,0,'R');
...
INSERT INTO viewmaster_movie VALUES(656,'Shawshank Redemption',1994,'DRAMA','02:22:00','4K','1.85:1','DTS-HD','',20.39000000000000056,1,0,'R');
COMMIT;
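If you would rather not cut and paste from the dump, you can script the extraction. Below is a minimal sketch that assumes Python 3 with its built-in sqlite3 module. The function name extract_table_inserts is hypothetical. The sketch uses Connection.iterdump() instead of the sqlite3 CLI’s .dump, so the table name comes out double-quoted.

```python
import sqlite3


def extract_table_inserts(conn: sqlite3.Connection, table: str) -> str:
    """Dump one table's INSERT statements, wrapped in a single transaction."""
    # iterdump() writes `INSERT INTO "name" VALUES(...)`; the unquoted form is
    # also accepted in case lines come from the sqlite3 CLI's .dump instead.
    prefixes = (f'INSERT INTO "{table}" ', f"INSERT INTO {table} ")
    inserts = [line for line in conn.iterdump() if line.startswith(prefixes)]
    # BEGIN/COMMIT makes Postgres apply every row or none of them.
    return "\n".join(["BEGIN TRANSACTION;", *inserts, "COMMIT;"]) + "\n"
```

For example, extract_table_inserts(sqlite3.connect("movies.db"), "viewmaster_movie") returns text that could be written to importVM.sql. The column-order and boolean fixes described below would still be needed.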

Unfortunately, there are differences between SQLite and Postgres. If we look at the field layout in the Postgres database, we see (trimmed):

viewmasterdb=# \d viewmaster_movie
Table "public.viewmaster_movie"
Column      | Type                   | Nullable |
------------+------------------------+----------+
id          | bigint                 | not null |
title       | character varying(60)  | not null |
release     | integer                | not null |
category    | character varying(20)  | not null |
rating      | character varying(5)   | not null |
duration    | time without time zone | not null |
format      | character varying(3)   | not null |
aspect      | character varying(10)  | not null |
audio       | character varying(10)  | not null |
collection  | character varying(10)  | not null |
cost        | numeric(6,2)           | not null |
paid        | boolean                | not null |
bad         | boolean                | not null |

When I look at the table definition (reformatted for readability) in the export.sql file, I see:

CREATE TABLE IF NOT EXISTS "viewmaster_movie" (
  "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
  "title" varchar(60) NOT NULL,
  "release" integer NULL,
  "category" varchar(20) NOT NULL,
  "duration" varchar(5) NULL,
  "format" varchar(3) NULL,
  "aspect" varchar(10) NULL,
  "audio" varchar(10) NULL,
  "collection" varchar(10) NULL,
  "cost" decimal NULL,
  "paid" bool NULL,
  "bad" bool NULL,
  "rating" varchar(5) NULL
);

As you can see, the rating field is in a different position. This means that it will be in the wrong place in the existing INSERT lines, as the Postgres database is expecting the rating to be the fifth field and not the last field:

INSERT INTO viewmaster_movie VALUES(1,'12 Monkeys',1995,'SCI-FI','02:10:00.000000','LD','LB','D-SURR','',25,1,0,'R');

I decided that the easiest way to deal with this is to add the column list to the INSERT lines (the added text is the first line below), so they each look like this:

INSERT INTO viewmaster_movie ("id", "title", "release", "category", "duration", "format", "aspect", "audio", "collection", "cost", "paid", "bad", "rating")
VALUES(1,'12 Monkeys',1995,'SCI-FI','02:10:00.000000','LD','LB','D-SURR','',25,1,0,'R');

Essentially, we’re telling the INSERT command the order of the fields, rather than assuming they are in the same order as defined in the database. This explicit list also helps when you named fields differently in your new database, since each value is mapped to a named column.
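Adding the column list by hand to over 600 lines is tedious, so here is a minimal sketch that does it with a script. It assumes Python 3 and that each INSERT statement sits on a single line, as in the dump. The names SQLITE_COLUMNS and add_column_list are hypothetical, and the column order is copied from the SQLite CREATE TABLE shown above.

```python
import re

# Column order as it appears in the SQLite export's CREATE TABLE.
SQLITE_COLUMNS = ("id", "title", "release", "category", "duration", "format",
                  "aspect", "audio", "collection", "cost", "paid", "bad", "rating")

# Matches the start of an INSERT for this table, with or without quotes on the name.
_INSERT_RE = re.compile(r'^INSERT INTO "?viewmaster_movie"? VALUES\(')


def add_column_list(line: str) -> str:
    """Rewrite one INSERT line so that it names its columns explicitly."""
    cols = ", ".join(f'"{c}"' for c in SQLITE_COLUMNS)
    # Lines that do not match (BEGIN, COMMIT, blank lines) pass through unchanged.
    return _INSERT_RE.sub(f"INSERT INTO viewmaster_movie ({cols}) VALUES(", line, count=1)
```

Each line of importVM.sql would be passed through add_column_list() and written back out.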

Another issue is that the SQLite export writes boolean values as the numbers 0 and 1, which Postgres treats as integers rather than booleans. I used my editor to wrap these values in single quotes (‘0’ and ‘1’), so that they are evaluated as boolean values, using Emacs macros to quote the second and third from last values on each line. I read later that one can instead change 0 to 0::boolean and 1 to 1::boolean.
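You can also do the same quoting with a script instead of editor macros. This is a minimal sketch that assumes Python 3, one INSERT per line, and that paid and bad are the third and second from last values (as in the SQLite column order above). The helper names are hypothetical. A small splitter ignores commas inside quoted titles. Values other than 0 or 1 are left alone, so a stray value such as 2 still fails the import and gets noticed.

```python
def split_sql_values(values: str) -> list[str]:
    """Split the text inside VALUES(...) on commas that are outside single quotes."""
    fields, current, in_quote, i = [], [], False, 0
    while i < len(values):
        ch = values[i]
        if ch == "'":
            # Inside a string, '' is an escaped quote, not the end of the string.
            if in_quote and values[i + 1 : i + 2] == "'":
                current.append("''")
                i += 2
                continue
            in_quote = not in_quote
        if ch == "," and not in_quote:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


def quote_booleans(line: str, positions: tuple[int, ...] = (-3, -2)) -> str:
    """Quote 0/1 at the given positions so that Postgres reads them as booleans."""
    start = line.find("VALUES(")
    end = line.rfind(");")
    if start == -1 or end < start:
        return line  # BEGIN, COMMIT, blank lines, etc.
    body = start + len("VALUES(")
    fields = split_sql_values(line[body:end])
    if len(fields) < 3:
        return line
    for pos in positions:
        # Leave anything other than 0/1 (e.g. a stray 2) so the import flags it.
        if fields[pos] in ("0", "1"):
            fields[pos] = f"'{fields[pos]}'"
    return line[:body] + ",".join(fields) + line[end:]
```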

With the importVM.sql file hopefully ready, I copied it to the viewmaster pod:

kubectl cp importVM.sql viewmaster/viewmaster-6c956ddb66-sxq4f:movie_library/importVM.sql

From the database shell that I have open on the viewmaster pod, I can import the table contents:

viewmasterdb=# \i importVM.sql

There is a good chance that this may fail, so you’ll have to scroll through the output and find any problems and correct them. In my case, I saw:

One entry had a value of ‘2’ for a boolean; I changed it to ‘1’.
A few entries had an “audio” field longer than the defined maximum of 10 characters; I shortened them.
Some entries had an aspect ratio of 16:9, which was treated as a time value (with extra characters added for seconds/microseconds) and exceeded the field width; I changed them to “16×9”.
Another had an aspect ratio of “02:40:01.000000”, which again was treated as a time value; I changed it to “2.40:1”.
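Rather than finding these problems one failed import at a time, you could check the rows up front. Here is a minimal sketch that assumes Python 3 and rows read from the SQLite database as mappings (for example via sqlite3.Row). The function find_problems is hypothetical, and the 10-character limits come from the audio and aspect columns in the Postgres table above.

```python
import re

MAX_AUDIO = 10   # audio is character varying(10) in Postgres
MAX_ASPECT = 10  # aspect is character varying(10) in Postgres
# Values such as 16:9 or 02:40:01.000000 that look like a time of day.
TIME_LIKE = re.compile(r"^\d{1,2}:\d{1,2}(:\d{2}(\.\d+)?)?$")


def find_problems(row) -> list[str]:
    """List likely import problems for one movie row (a mapping of column -> value)."""
    problems = []
    for col in ("paid", "bad"):
        if row[col] not in (0, 1):
            problems.append(f"{col}={row[col]!r} is not 0 or 1")
    audio = row["audio"] or ""  # SQLite allowed NULL here
    if len(audio) > MAX_AUDIO:
        problems.append(f"audio {audio!r} is longer than {MAX_AUDIO} chars")
    aspect = row["aspect"] or ""
    if len(aspect) > MAX_ASPECT:
        problems.append(f"aspect {aspect!r} is longer than {MAX_ASPECT} chars")
    if TIME_LIKE.match(aspect):
        problems.append(f"aspect {aspect!r} looks like a time value")
    return problems
```

With conn.row_factory = sqlite3.Row, you could run each row from “SELECT * FROM viewmaster_movie” through find_problems() and print any non-empty results along with the row id.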

Finally, the import was successful and I could do a “select * from viewmaster_movie;” from the database shell to see the entries. I’ve included the final ./DBase/importVM.sql file in the repo, so that if you are following along, you can just import it.

Now, with some real data and a user account, we can get the IP of the service:

kubectl get svc -n viewmaster
NAME                  TYPE         CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
viewmaster            LoadBalancer 10.233.1.98  10.11.12.207  8000:30761/TCP  168m
viewmaster-postgres   ClusterIP    None         <none>        5432/TCP        17h

With a browser, I can navigate to the app at http://10.11.12.207:8000/viewmaster/ and see all the existing movies.

UPDATE: See “Create Movie Issue” below for another problem that I found after importing and using the system.

Secure Remote Access

Just like I did with the Emby music server I set up under Kubernetes, I want to do the same thing for this Django app. There are already some pieces in place: the Traefik ingress is running in the cluster to route external requests to the app and redirect HTTP requests to HTTPS, cert-manager is running to create and manage Let’s Encrypt certificates, and the router is directing external HTTP/HTTPS requests to the ingress controller.

Prep Work

Specific to this Django app, there are some things that need to be set up. As in the Emby post, I need to create another subdomain for this app (e.g. movies.my-domain.com) and create a CNAME record that points to the Dynamic DNS service I use, so that HTTP/HTTPS requests to that subdomain will also make it to Kubernetes.

For my Django app, I had already installed the recommended security middleware. However, at a minimum, one also needs to define the “trusted origin” domains so as not to trigger Cross Site Request Forgery (CSRF) failures. I had to add the following line to ./movie_library/movie_library/settings.py:

CSRF_TRUSTED_ORIGINS = ['https://' + os.environ.get('PUBLIC_DOMAIN', 'missing-domain-name')]
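Django 4.0 and later expect each CSRF_TRUSTED_ORIGINS entry to include the scheme, which is why “https://” is prepended. Below is a sketch of the same setting written as a small function, assuming the same PUBLIC_DOMAIN environment variable. The function name and the support for several comma-separated domains are my own assumptions, not something the app requires.

```python
import os
from collections.abc import Mapping


def trusted_origins(env: Mapping[str, str] = os.environ) -> list[str]:
    """Build CSRF_TRUSTED_ORIGINS entries (scheme included) from PUBLIC_DOMAIN."""
    # Assumption: PUBLIC_DOMAIN may hold one host name or several, comma-separated.
    raw = env.get("PUBLIC_DOMAIN", "missing-domain-name")
    return [f"https://{host.strip()}" for host in raw.split(",") if host.strip()]


# In settings.py, this would replace the one-line version.
CSRF_TRUSTED_ORIGINS = trusted_origins()
```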

Now, depending on how you wrote your Django app and what external resources you use, you may need to configure other security settings, such as additional CSRF settings or a Content Security Policy (CSP). The easiest (?) way to figure out what you need is to exercise your site via HTTPS with Django running in debug mode. It will then show any CSRF errors, along with a link to more information on the problem and how to fix it. Here is an example of CSP settings from one (non-Django) site I had:


CSP_IMG_SRC = ("'self'",)
CSP_DEFAULT_SRC = ("'self'",)
CSP_STYLE_SRC = ("'self'", 'https://fonts.googleapis.com')
CSP_SCRIPT_SRC = ("'self'",)
CSP_FONT_SRC = ("'self'", 'https://fonts.gstatic.com')
CSP_FRAME_ANCESTORS = ("'none'",)
CSP_FORM_ACTION = ("'self'",)

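These indicate the allowed sources for the various resources the pages load (images, styles, scripts, and fonts). They also restrict which sites may frame the pages and where forms may be submitted.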
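To illustrate how such settings end up in the Content-Security-Policy header, here is a rough sketch in Python. It is not django-csp’s actual code. build_csp is a hypothetical helper that maps each CSP_* name to a directive (CSP_STYLE_SRC becomes style-src) and joins the sources. Note that ("'self'") without a trailing comma is a plain string, not a tuple. Joining a string character by character would produce "' s e l f '", so the sketch wraps bare strings in a tuple.

```python
def build_csp(settings: dict) -> str:
    """Turn CSP_* settings into a Content-Security-Policy header value (sketch)."""
    directives = []
    for name, sources in settings.items():
        if not name.startswith("CSP_"):
            continue  # ignore unrelated settings such as DEBUG
        # ("'self'") is a plain string; ("'self'",) is a one-element tuple.
        if isinstance(sources, str):
            sources = (sources,)
        # CSP_FRAME_ANCESTORS -> frame-ancestors
        directive = name[len("CSP_"):].lower().replace("_", "-")
        directives.append(f"{directive} {' '.join(sources)}")
    return "; ".join(directives)
```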

Obviously, you’ll need to do this AFTER you have HTTPS remote access running, and it may take several iterations to resolve all the issues. That is why I set the image pull policy to “Always”, instead of “IfNotPresent” in the Deployment manifest for my app. This way, I can change the app, re-build, re-push to hub.docker.com, and then delete my viewmaster pod and it will pull the new image and use it.

Otherwise, you need to update the minor version in the ./pyproject.toml, build/push the app with a new tag, and change the deployment to reference the newer tag.

Ready, Set, Go…

Now, I need to perform the steps to create a certificate and to hook up ingress to my app. The explanation is brief, but you can see a more detailed description in the Emby post.

I’ll again use a Let’s Encrypt staging certificate, and once things are working, will use the production certificate. There is a rate limit on production certificates, so if you mess things up and try too many times, you’ll get locked out for a week!

Here is the staging issuer that I created and applied (./deploy/viewmaster-issuer.yaml):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: viewmaster-issuer
  namespace: viewmaster
spec:
  acme:
    email: your-email-address
    # We use the staging server here for testing to avoid hitting rate limits
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # if not existing, it will register a new account and store it
      name: viewmaster-issuer-account-key
    solvers:
      - http01:
          # The ingressClass used to create the necessary ingress routes
          ingress:
            class: traefik

This is in the same namespace as the app, requires an email address, and is using the staging certificate. With that applied, we can create the ingress for the app (./deploy/viewmaster-ingress.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: viewmaster
  namespace: viewmaster
  annotations:
    cert-manager.io/issuer: "viewmaster-issuer"
    traefik.ingress.kubernetes.io/router.middlewares: secureapps-redirect2https@kubernetescrd
spec:
  tls:
    - hosts:
        - movies.my-domain.com
      secretName: tls-viewmaster-ingress-http
  rules:
    - host: movies.my-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: viewmaster-service
                port:
                  name: http

This references the issuer, uses the middleware to force an HTTP to HTTPS redirect, has the subdomain name that I’ll use, and gives a name for the secret used to hold the staging certificate. It routes all paths (/) to the viewmaster service, and the app itself lives under the /viewmaster path. Once it is applied, you can check that the tls-viewmaster-ingress-http certificate in the viewmaster namespace becomes ready. See the Emby post for details on the certificate creation process. It’ll take a minute or so to complete.

Now you can go to https://movies.my-domain.com/viewmaster/ and see the site. If you use HTTP, it should redirect. Your browser will warn that the site is insecure, but you can continue and look at the certificate info to see that it is a Let’s Encrypt staging certificate.

With it working, you can delete the ingress, secret, and issuer (if desired) and then apply the production issuer (./deploy/viewmaster-prod-issuer.yaml):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: viewmaster-prod-issuer
  namespace: viewmaster
spec:
  acme:
    email: your-email-address
    # This is the production server, which is rate limited
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # if not existing, it will register a new account and store it
      name: viewmaster-issuer-account-key
    solvers:
      - http01:
          # The ingressClass used to create the necessary ingress routes
          ingress:
            class: traefik

I used a different name, so that both issuers can be present at the same time. You provide an email address, and it is using the production Let’s Encrypt URL.

The production ingress (./deploy/viewmaster-prod-ingress.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: viewmaster
  namespace: viewmaster
  annotations:
    cert-manager.io/issuer: "viewmaster-prod-issuer"
    traefik.ingress.kubernetes.io/router.middlewares: secureapps-redirect2https@kubernetescrd
spec:
  tls:
    - hosts:
        - movies.my-domain.com
      secretName: viewmaster-prod-cert
  rules:
    - host: movies.my-domain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: viewmaster-service
                port:
                  name: http

This is the same, only using the viewmaster-prod-issuer, and viewmaster-prod-cert certificate. Once applied and the certificate is created, you can access with HTTPS, without any insecure warning. The cert-manager will renew the certificate automatically, as needed.

With all this done, you can access the site via https://movies.my-domain.com, and if you use HTTP, it will automatically redirect to HTTPS. If you want to access it from within the local network, you can use HTTP with the IP of the viewmaster service and port 8000. I didn’t explore how to access it securely from inside the local network.

Create Movie Issue

While playing with this ported app, I tried to add a movie. When I did so (under debug mode), I got this error:

duplicate key value violates unique constraint "viewmaster_movie_pkey"
DETAIL:  Key (id)=(1) already exists.

It looks like the database insert is not using the next ID. I did a “kubectl exec” into the viewmaster app pod, moved down to the movie_library/ directory, and ran “python manage.py dbshell” to look at the database. First, I checked that there was a primary key for the viewmaster_movie table:

# \d viewmaster_movie;
      Table "public.viewmaster_movie"
Column      | Type                   | Collation | Nullable |...
------------+------------------------+-----------+-----------...
id          | bigint                 |           | not null |...
title       | character varying(60)  |           | not null |
release     | integer                |           | not null |
category    | character varying(20)  |           | not null |
rating      | character varying(5)   |           | not null |
duration    | time without time zone |           | not null |
format      | character varying(3)   |           | not null |
aspect      | character varying(10)  |           | not null |
audio       | character varying(10)  |           | not null |
collection  | character varying(10)  |           | not null |
cost        | numeric(6,2)           |           | not null |
paid        | boolean                |           | not null |
bad         | boolean                |           | not null |
Indexes:
    "viewmaster_movie_pkey" PRIMARY KEY, btree (id)

That looked good, so I tried to figure out how Postgres picks the next ID to use. I saw that there is a “sequence”, so I listed the sequences:

# SELECT relname sequence_name FROM pg_class WHERE relkind = 'S';
           sequence_name
-----------------------------------
django_migrations_id_seq
...
viewmaster_movie_id_seq

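Looking at the sequence for the viewmaster_movie table, I see that the last_value is “1”, instead of the highest ID already in use. This happens because the INSERT statements in importVM.sql supplied explicit id values, and Postgres does not advance a sequence when a column value is given explicitly, so the sequence never moved past its starting value: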

# select * from viewmaster_movie_id_seq;
 last_value | log_cnt | is_called
------------+---------+-----------
          1 |      32 | t

I determined the maximum value in use, and changed the last value to that:

# select max(id) from viewmaster_movie;
max
-----
656

# select setval('viewmaster_movie_id_seq', 656);
setval
--------
656

Now, when I create a movie, it works! Whew! I found out later that, with Postgres, you can set the id field type to “SERIAL” instead of “BIGINT”, and that should create the correct sequencing. I haven’t tried it here, but it worked on a database for another Django app I was porting.
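Django can also generate this fix: “python manage.py sqlsequencereset viewmaster” prints the SQL to reset the sequences for that app, which you can then run in the dbshell. The sketch below builds a statement of the same general shape and assumes Python 3. sequence_reset_sql is a hypothetical helper. It uses pg_get_serial_sequence() so that the sequence name does not have to be known in advance. It also uses setval()’s third argument so that an empty table restarts at 1.

```python
def sequence_reset_sql(table: str, pk: str = "id") -> str:
    """SQL that moves a table's id sequence past the largest id in use."""
    # pg_get_serial_sequence() finds the sequence backing the column
    # (e.g. viewmaster_movie_id_seq), so it need not be hard-coded.
    # If the table has rows: setval(seq, max, true), so the next id is max + 1.
    # If it is empty: setval(seq, 1, false), so the next id is 1.
    return (
        f"SELECT setval(pg_get_serial_sequence('\"{table}\"', '{pk}'), "
        f"COALESCE(MAX(\"{pk}\"), 1), MAX(\"{pk}\") IS NOT NULL) FROM \"{table}\";"
    )
```

Printing sequence_reset_sql("viewmaster_movie") and pasting the result into the dbshell would have the same effect as the manual setval() call above.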

TODOs…

Future Items to consider:

Add a non-admin login and modify the app so that everyone has to log in to see the pages (to limit viewing)?
Decide whether to use a single cert for all subdomains running under Kubernetes, instead of one per app.
App enhancements:
- See if I can access public information for artwork and maybe description information for movies. Can we get technical specs too (run time, sound, aspect ratio)?
- Persist checkbox settings for “Show details” and “Show LDs”.
- Allow starting a search by pressing Enter after entering a search phrase.
- Add an index (alphabet, category, date, collection, disk format) at the top to allow jumping down to a section.

Category: bare-metal, Kubernetes, Raspberry PI

June 24

Media Server In Kubernetes

Another one of my apps running on a standalone Raspberry PI 4 (in a Docker container) is the Emby media server. I ripped all my CDs to FLAC files and have been serving them up with Emby, so that I can play them on my laptop, phone, Sonos speaker, and other DLNA devices. All the music is on a NAS box, which I had mounted on the Raspberry PI.

Now that I have a Kubernetes cluster of PIs, I wanted to move the Emby server, and this looked like a good exercise in how to take a Docker container and run it on Kubernetes. There were some challenges, which made this harder than expected. Let’s go through the process though…

Migrating the Docker Container

After searching around, I found that a common way to migrate from Docker to Kubernetes was to use Kompose. I had this docker-compose file for Emby:

version: "2.3"
services:
  emby:
    image: emby/embyserver_arm64v8:latest
    container_name: emby
    environment:
      - PUID=1000
      - PGID=1003
      - TZ=America/New_York
    volumes:
      - /var/lib/docker/volumes/emby/_data:/config
      - /mnt/music:/Music
    network_mode: host
    # ports:
    #   - 8096:8096
    #   - 8920:8920
    restart: unless-stopped

There are a couple of things of note here. First, I set the UID to the same ID used on the NAS box for the FLAC files, and the GID to the one used on the NAS box, so that family members had access to the files as well. Second, I mapped the config location to a Docker volume on the host, and the music location to /mnt/music on the host, which was an NFS mount of the NAS box.

Lastly, I was using host mode networking, which was needed so that the multicast DLNA packets (M-SEARCH and NOTIFY) would be seen from the container. This allowed Emby to “see” the DLNA devices on my local network. This proved to be difficult to set up under Kubernetes.

I ran the Kompose convert command and it generated a deployment, service, and some PVC definitions. Of course, I ran this on my Mac, where the mount points and config area did not exist, so there were warnings and things were not setup as desired. But, it was useful, as it gave me an idea of how I wanted to define things.

I created a single manifest that incorporated some of what Kompose generated, sprinkled with settings I wanted, and setting up the volumes to use NFS, instead of PVC using Kubernetes storage. Here’s what I came up with (shown in parts) placed into ~/workspace/kubernetes/emby/k8s-emby.yaml:

apiVersion: v1
kind: Namespace
metadata:
  name: emby
---

I wanted all the music server stuff in a separate namespace.

apiVersion: v1
kind: Service
metadata:
  name: emby-service
  namespace: emby
spec:
  type: LoadBalancer
  selector:
    app: emby
  ports:
    - name: http
      port: 8096
      targetPort: 8096
      protocol: TCP
    - name: https
      port: 8920
      targetPort: 8920
      protocol: TCP
---

A service is defined, with type LoadBalancer, so that I can access the media service with a well-known IP. I used the defaults that Emby suggested for HTTP and HTTPS access to the server.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: emby
  namespace: emby
spec:
  replicas: 1
  selector:
    matchLabels:
      app: emby
  template:
    metadata:
      labels:
        app: emby
    spec:
      containers:
        - name: emby
          image: emby/embyserver_arm64v8:latest
          env:
            - name: UID
              value: "1000"
            - name: GID
              value: "1003"
            - name: GIDLIST
              value: "1003"
            - name: TZ
              value: "America/New_York"
          ports:
            - containerPort: 8096
              protocol: TCP
            - containerPort: 8920
              protocol: TCP
          volumeMounts:
            - name: config
              mountPath: /config
            - name: music
              mountPath: /Music
      restartPolicy: Always
      volumes:
        - name: config
          nfs:
            server: IP_OF_MY_NAS
            path: /music/config
        - name: music
          nfs:
            server: IP_OF_MY_NAS
            path: /music/music

For the deployment, I used the same namespace and a single replica, which creates one pod running the latest ARM64 version of Emby. That happened to be 4.8.8.0; I could have pinned a specific version and updated it as desired by checking hub.docker.com. The environment settings for UID, GID, GIDLIST, and TZ are passed in to the pod. I looked at the Docker version of the latest Emby and saw that it used some different settings than my (much older) version. The HTTP and HTTPS ports for Emby are defined and match the service.

Lastly, I defined a volume for config settings and another for the music repo, and mapped those to the IP and share locations of my NAS. The NAS already had a share called “music”, with a directory called “music”, containing directories for all of the artists, which in turn had directories for the artist’s albums, and then FLAC files for the album songs. I created a directory in the music share called “config” to hold the configuration settings.

With this setup, we are ready to apply the manifest and configure the Emby server…

Emby Startup

After doing “kubectl apply -f k8s-emby.yaml”, I could see that there was one pod running and a service with an IP from my load balancer pool. From a browser, I navigated to http://<emby-service-ip>:8096/ and could see the Emby setup wizard. I picked the language (“English”), and created my Emby user and password.

On the “Setup Media Libraries” page, I clicked the button for “New Library”, selected the type “Music”, gave it a name “Music”, and then clicked on the folder “Add” button, selected the “/Music” directory that maps to the NFS share, and clicked “OK”. Lastly, under “Music Folder Structure”, I picked the item “Perfectly ordered into artist/album folders, with tracks directly in the album folders”, and pressed the “OK” button.

You can click on the Advanced selector at the top right of the page and then choose some other options, if desired.

On the next screens, I skipped the metadata language info (as it was fine), kept the default port mapping selection, accepted the terms of use, and clicked on the finished button.

At this point, I could click on “Manual Login” and log in with the credentials I set up. Under the settings (gear at the top right of the screen), I changed a few more settings.

Under “Network”, I set the “LAN Networks” to the CIDR for my local network. Under “DLNA”, I checked the “Enable DLNA Server” box and chose my user under the “Default User” entry. Under “Plugins”, I clicked the “Catalog” button and, under General, installed the Sonos plugin.

With these changes, I clicked on the “Dashboard” button, clicked on the power button icon at the top, and selected to restart the Emby server to apply all the changes.

Partial success…

As-is, I can now access the URL (port 8096) from the web browser on my Mac or phone, and select and play music. However, the “Play On” menu (square box at the top right of each page) only has the selection of the web browser I’m using. I cannot see my Sonos speaker, my receiver that has DLNA support, or any other DLNA devices.

I found out that the issue is with how DLNA works. From my basic understanding, the DLNA server will multicast M-SEARCH UDP packets to 239.255.255.250, using port 1900. DLNA devices will multicast NOTIFY UDP packets to the same address and port. When I was using a Docker container, the container was using Host networking, and thus was using the same IP as the host, which is on my local network.

With Kubernetes, the Emby pod is running on the pod network (10.233.64.0/18), whereas all the DLNA devices are on the local network, and these multicast packets will not traverse subnets.
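To see the multicast problem directly, you can send a discovery request yourself and watch who answers. This is a minimal SSDP sketch in Python that assumes the standard M-SEARCH format over UDP to 239.255.255.250:1900. build_msearch and discover are hypothetical helpers, not part of Emby. Run from a machine on the local network, discover() should list the DLNA devices that reply. Run from inside a pod on the pod network, it would be expected to come back empty.

```python
import socket

SSDP_GROUP = ("239.255.255.250", 1900)


def build_msearch(search_target: str = "ssdp:all", mx: int = 2) -> bytes:
    """Build an SSDP M-SEARCH discovery request."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_GROUP[0]}:{SSDP_GROUP[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",  # devices wait a random 0..MX seconds before replying
        f"ST: {search_target}",
        "",  # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def discover(timeout: float = 3.0) -> list[tuple[str, str]]:
    """Multicast one M-SEARCH and return (sender IP, status line) for each reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Keep the hop count small; the request is meant for the local network.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    sock.settimeout(timeout)
    replies = []
    try:
        sock.sendto(build_msearch(), SSDP_GROUP)
        while True:
            # Replies come back as unicast UDP to this socket's port.
            data, (ip, _port) = sock.recvfrom(65507)
            replies.append((ip, data.split(b"\r\n", 1)[0].decode(errors="replace")))
    except socket.timeout:
        pass  # no more replies within the timeout
    finally:
        sock.close()
    return replies
```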

I tried one solution: adding “hostNetwork: true” to the deployment’s template spec. Now the pod is on the local network, the DLNA multicasts are seen, and the Emby server can Play On devices like my Sonos. The problem is that the pod has the same IP as the node it was deployed on. This makes it hard to use, as the pod could be re-deployed on another node. Yeah, I could force it to one node, but if that node failed, I’d lose the Emby server.

Houston We Have Lift-Off!

I found that I can set up two interfaces on the pod by using Multus. The plan is to create a second interface on the pod that is on the local network, so that it can send/receive DLNA multicasts to communicate with the DLNA devices. This requires several steps…

First, we need to install Multus. Fortunately, Multus works well with Calico, which I’m using on my network. Unfortunately, I could not use the “quick start” install methods for Multus on my arm64 Raspberry PI hardware. To get this installed, I first pulled the Multus repo:

git clone https://github.com/k8snetworkplumbingwg/multus-cni.git
cd multus-cni/deployments

I used the multus-daemonset.yml to create a daemonset that will install Multus on each node. However, the two image: lines need to be changed, as “ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot” is not for the arm64 platform. I think they have some multi-platform support, maybe with annotations, but I didn’t know how to set that up. So, I went to the Github Container Registry for Multus, clicked on the “OS/Arch” tab and then selected the image for arm64 and noted the version. In multus-daemonset.yml, I changed the image version:

diff --git a/deployments/multus-daemonset.yml b/deployments/multus-daemonset.yml
index 40fa5193..fa8bde5c 100644
--- a/deployments/multus-daemonset.yml
+++ b/deployments/multus-daemonset.yml
@@ -179,7 +179,7 @@ spec:
serviceAccountName: multus
containers:
- name: kube-multus
- image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot
+ image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-debug@sha256:351652b583600b0d0d704269882fd2fa53395c5ce4602a76a2799960b2c06dce
command: ["/thin_entrypoint"]
args:
- "--multus-conf-file=auto"
@@ -204,7 +204,7 @@ spec:
mountPath: /tmp/multus-conf
initContainers:
- name: install-multus-binary
- image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot
+ image: ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-debug@sha256:351652b583600b0d0d704269882fd2fa53395c5ce4602a76a2799960b2c06dce
command: ["/install_multus"]
args:
- "--type"

Now, I can “kubectl apply -f multus-daemonset.yml” to install the daemonset. Once done, I checked that the pods are running on each node:

kubectl get pods --all-namespaces | grep -i multus
kube-system kube-multus-ds-4btnp 1/1 Running 0 20h
kube-system kube-multus-ds-6p9vx 1/1 Running 0 20h
kube-system kube-multus-ds-mzb4b 1/1 Running 0 20h
kube-system kube-multus-ds-s7d8v 1/1 Running 0 20h
kube-system kube-multus-ds-twn6k 1/1 Running 0 20h
kube-system kube-multus-ds-vqxh8 1/1 Running 0 20h
kube-system kube-multus-ds-wwnbj 1/1 Running 0 20h

On a node, you can check that there is a /etc/cni/net.d/00-multus.conf file as the lexically first file. Now, a network attachment definition can be created (I added it to the k8s-emby.yaml file, after the namespace definition):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf
  namespace: emby
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": {
        "type": "host-local",
        "subnet": "10.11.12.0/24",
        "rangeStart": "10.11.12.211",
        "rangeEnd": "10.11.12.215",
        "routes": [
          { "dst": "10.11.12.0/24" }
        ],
        "gateway": "10.11.12.1"
      }
  }'

In the metadata, I specified the “emby” namespace, so that this is visible to the emby pod. The config section is a CNI configuration. Of note is that the master attribute is the name of the node’s (host’s) interface on the local network, which the macvlan interface in the pod is attached to; here that is eth0 on each Raspberry PI. For IPAM, I used the CIDR of my local network as the subnet, and used a range of IPs that is outside of any DHCP pool, LoadBalancer pool, and existing static IPs. I set the route destination to the local network (not a default route), so that it doesn’t interfere with pod traffic.
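A clash between the macvlan range and other addresses would be easy to miss, so you can sanity-check the range before applying the manifest. Here is a minimal sketch using Python’s ipaddress module. check_ipam_range is hypothetical, and any DHCP or LoadBalancer pools you pass in are your own values, not taken from this setup.

```python
import ipaddress


def check_ipam_range(subnet: str, start: str, end: str,
                     reserved: list[tuple[str, str]]) -> list[str]:
    """Return problems with a host-local IPAM range; an empty list means OK."""
    net = ipaddress.ip_network(subnet)
    lo, hi = ipaddress.ip_address(start), ipaddress.ip_address(end)
    problems = []
    if lo not in net or hi not in net:
        problems.append("range is not inside the subnet")
    if lo > hi:
        problems.append("rangeStart is after rangeEnd")
    for r_start, r_end in reserved:
        a, b = ipaddress.ip_address(r_start), ipaddress.ip_address(r_end)
        # Two closed ranges overlap unless one ends before the other starts.
        if not (hi < a or b < lo):
            problems.append(f"overlaps reserved range {r_start}-{r_end}")
    return problems
```

For instance, check_ipam_range("10.11.12.0/24", "10.11.12.211", "10.11.12.215", [("10.11.12.100", "10.11.12.199")]) returns an empty list. You can pass a single static IP as a range whose start and end are the same.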

The final step is to modify the Emby deployment so that when the Emby pod is created, it uses the new network attachment definition and creates two interfaces. This is done with an annotation under the deployment’s template metadata (the added “annotations:” lines) in the k8s-emby.yaml file:

...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: emby
  namespace: emby
  labels:
    app: emby
spec:
  replicas: 1
  selector:
    matchLabels:
      app: emby
  template:
    metadata:
      labels:
        app: emby
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-conf
    spec:
      ...

Now, when we apply the deployment, the pod will have two interfaces. The main interface, eth0, and the additional net1 interface:

3: eth0@if40: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue state UP qlen 1000
    link/ether be:0f:38:6b:bc:ea brd ff:ff:ff:ff:ff:ff
    inet 10.233.115.96/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc0f:38ff:fe6b:bcea/64 scope link
       valid_lft forever preferred_lft forever
4: net1@tunl0: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    link/ether 22:76:a6:b2:34:9f brd ff:ff:ff:ff:ff:ff
    inet 10.11.12.213/24 brd 10.11.12.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::2076:a6ff:feb2:349f/64 scope link
       valid_lft forever preferred_lft forever

Now, I can access the UI at the service’s public address, and when I click on the “Play On” button, I see all the DLNA devices that receive streamed music. Yay!

Remote/Secure Access

Everything is working locally, but I wouldn’t mind being able to play music on my phone, when I’m away from home. I started planning this and decided on a few things:

- Use the domain name that I purchased and create subdomains for each app.
- Use the Dynamic DNS service that I purchased to map my domain to my home router.
- Configure the router to map HTTP and HTTPS requests to an Ingress controller, which will route the requests to the apps, based on the subdomain used.
- Use Let’s Encrypt so that all HTTPS requests have a valid certificate (from a Certificate Authority).

Prep Work

I have the domain registration (e.g. my-domain.com) and I have created subdomains for my apps (e.g. music.my-domain.com). I know my Dynamic DNS service domain name, so I created CNAME DNS records to point the domain and all the subdomains to that DDNS name.

Using my router and the Dynamic DNS service, I configured things so that the DDNS domain name always points to my router’s WAN address (a DHCP address from my service provider, which can change).

Kubernetes Work

With the external stuff out of the way (mostly), I could focus on connecting up the Kubernetes side. I installed Traefik as the ingress controller. I like it better than NGINX, because it works well with apps that are in namespaces:

helm repo add traefik https://traefik.github.io/charts
helm repo update
helm install traefik traefik/traefik

This is running in the default namespace, but can be run in a specific namespace, if desired. I made sure the pod and service were running. On my router, I set port forwarding of HTTP and HTTPS requests to the IP of the Traefik service. That will cause all external requests to use the Traefik ingress for routing. Next, install the cert-manager:

helm repo add jetstack https://charts.jetstack.io --force-update
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.0 \
  --set crds.enabled=true

Obviously, you can use the latest version of cert-manager that is compatible with the Kubernetes version you are running. You’ll see pods, services, deployments, and replica sets created (and running) for the cert-manager.

I created a work area to hold manifests for the resources that will be created:

mkdir -p ~/kubernetes/traefik
cd ~/kubernetes/traefik

For the Let’s Encrypt certificates, we’ll test everything with staging certificates first and then, once that is all working, switch to production certificates. This is because Let’s Encrypt rate-limits production certificates, and we don’t want repeated failures to hit the limit and get us blocked.

Here is the staging issuer (emby-issuer.yaml):

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: emby-issuer
  namespace: emby
spec:
  acme:
    email: your-email-address
    # We use the staging server here for testing to avoid hitting the production rate limits
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # if not existing, it will register a new account and store it
      name: emby-issuer-account-key
    solvers:
      - http01:
          # The ingressClass used to create the necessary ingress routes
          ingress:
            class: traefik

Note that this is in the same namespace as the app, the staging Let’s Encrypt server is used, and you provide a contact email address. For the production Issuer, emby-prod-issuer.yaml, we have:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: emby-prod-issuer
  namespace: emby
spec:
  acme:
    email: your-email-address
    # Production Let's Encrypt server (rate-limited, so verify with staging first)
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      # if not existing, it will register a new account and store it
      name: emby-issuer-account-key
    solvers:
      - http01:
          # The ingressClass used to create the necessary ingress routes
          ingress:
            class: traefik

Pretty much the same thing, only using the production Let’s Encrypt server, and a different name for the issuer. Do a “kubectl apply -f” for each of these and then do a “kubectl get issuer -A” to make sure they are ready. You can check “kubectl describe issuer -n emby emby-issuer” and under the Status section see that the staging issuer is registered and ready:

    Reason: ACMEAccountRegistered
    Status: True
    Type: Ready
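Since the staging and production issuers differ only in their name and ACME server URL, one way to keep them from drifting apart is to generate both from a single template. This is a hypothetical Python sketch (make_issuer and issuers_as_json are my own names) that builds the two Issuer manifests as JSON, which kubectl also accepts; the email is the same placeholder used above.

```python
import json

STAGING_SERVER = "https://acme-staging-v02.api.letsencrypt.org/directory"
PRODUCTION_SERVER = "https://acme-v02.api.letsencrypt.org/directory"


def make_issuer(name: str, server: str, email: str, namespace: str = "emby") -> dict:
    """Build a cert-manager ACME Issuer with the same layout as the YAML above."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Issuer",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "acme": {
                "email": email,
                "server": server,
                # Same account key secret name as the manifests above.
                "privateKeySecretRef": {"name": "emby-issuer-account-key"},
                "solvers": [{"http01": {"ingress": {"class": "traefik"}}}],
            }
        },
    }


def issuers_as_json(email: str) -> str:
    """Both issuers wrapped in a v1 List, suitable for "kubectl apply -f -"."""
    items = [
        make_issuer("emby-issuer", STAGING_SERVER, email),
        make_issuer("emby-prod-issuer", PRODUCTION_SERVER, email),
    ]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)
```

Printing issuers_as_json("your-email-address") and piping it into “kubectl apply -f -” should create the same two issuers as the YAML files.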

Now, I create an Ingress for the app (emby-ingress.yaml) that will be used with the staging certificate:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-issuer"
spec:
  tls:
    - hosts:
        - music.my-domain.com
      secretName: tls-emby-ingress-http
  rules:
    - host: music.my-domain.com
      http:
        paths:
          - path: /emby
            pathType: Prefix
            backend:
              service:
                name: emby-service
                port:
                  name: http

Note that an annotation refers to the cert-manager staging issuer, and the subdomain used for this Emby app is specified both as the host and in the TLS section. If desired, you can leave out the annotation and the TLS section and test accessing the Emby app over HTTP (e.g. http://music.my-domain.com/emby). That is what I did to make sure the ingress alone was OK.

This ingress will take requests to music.my-domain.com/emby/… and pass them to the service “emby-service” (running in namespace “emby”) using the port defined in the service with the name “http” (e.g. 8096).
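The Prefix path type is matched one path element at a time, not as a raw string prefix. The Python sketch below approximates that rule as described in the Kubernetes Ingress documentation; it is only an illustration, since the actual matching is done by Traefik, and edge cases such as repeated slashes may be handled differently.

```python
def prefix_matches(prefix: str, request_path: str) -> bool:
    """Approximate Ingress pathType: Prefix matching.

    Both paths are split on "/" and compared element by element, so
    "/emby" matches "/emby" and "/emby/web/index.html" but not "/embyfoo".
    Empty elements (leading, trailing, or doubled slashes) are ignored.
    """
    prefix_parts = [part for part in prefix.split("/") if part]
    path_parts = [part for part in request_path.split("/") if part]
    return path_parts[: len(prefix_parts)] == prefix_parts
```

A prefix of “/” has no elements, so it matches every path.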

By applying emby-ingress.yaml, you will initiate the process of creating a staging certificate. A certificate will be created, but not ready. This will trigger a certificate request and then an order. The order will create a challenge that will verify the challenge URL is reachable and then will obtain the certificate from Let’s Encrypt. Here are the get commands you can use for resources, and then you can do describe commands for the specific resources:

kubectl get certificate -A
kubectl get certificaterequest -A
kubectl get order -A
kubectl get challenge -A

It will take some time for all this to happen, but you can “describe” the challenge and check the other resources to see when they are valid/ready/approved. Don’t worry if you see a 404 status on the challenge initially. It should clear after 30 seconds or so.
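For the HTTP-01 challenge, Let’s Encrypt fetches a token over plain HTTP on port 80, which is one reason the router must forward HTTP as well as HTTPS to Traefik. The path format below comes from the ACME specification (RFC 8555); the function name is my own, and the token in the usage note is a made-up placeholder, since cert-manager generates the real one.

```python
from urllib.parse import urlunsplit

ACME_CHALLENGE_PATH = "/.well-known/acme-challenge/"


def http01_challenge_url(domain: str, token: str) -> str:
    """URL that must be reachable from the Internet for an HTTP-01 challenge.

    Validation always uses plain HTTP on port 80, so no port appears here.
    """
    return urlunsplit(("http", domain, ACME_CHALLENGE_PATH + token, "", ""))
```

For example, http01_challenge_url("music.my-domain.com", "example-token") gives http://music.my-domain.com/.well-known/acme-challenge/example-token. With the real token from the pending challenge, trying that URL with curl from outside your network is a quick way to see whether the port forwarding and ingress are in place.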

When the challenge is completed successfully, the challenge resource will be removed and there will be a new secret with the name of your certificate (e.g. tls-emby-ingress-http) in the namespace of the app. This secret would be used for the certificate that users would see when accessing your domain. Granted, it is from the staging server, so there would be a warning about the validity, but now you can repeat the process with the production certificate and then visitors would see a valid certificate.
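Rather than repeatedly running the get commands, you could script the readiness check. This minimal Python sketch assumes you feed it the JSON from “kubectl get certificate -A -o json” and relies on cert-manager reporting readiness as a status condition of type Ready; the function names are hypothetical.

```python
import json


def is_certificate_ready(cert: dict) -> bool:
    """True if a cert-manager Certificate has a Ready condition with status "True"."""
    for condition in cert.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    return False


def pending_certificates(kubectl_json: str) -> list[str]:
    """Return namespace/name for every Certificate in the list that is not ready yet."""
    items = json.loads(kubectl_json).get("items", [])
    return [
        f'{cert["metadata"]["namespace"]}/{cert["metadata"]["name"]}'
        for cert in items
        if not is_certificate_ready(cert)
    ]
```

An empty result means every certificate reports Ready; a certificate with no status yet is treated as pending.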

Here is the production ingress (emby-prod-ingress.yaml) that can be used with the production issuer that was previously created:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-prod-issuer"
spec:
  tls:
    - hosts:
        - music.my-domain.com
      secretName: emby-prod-cert
  rules:
    - host: music.my-domain.com
      http:
        paths:
          - path: /emby
            pathType: Prefix
            backend:
              service:
                name: emby-service
                port:
                  name: http

I used different names for the issuer and secret, so that there was no conflict with the staging ones. You can delete the staging ingress, issuer, and secret, once this is working. Here is output of a successful production certificate:

kubectl get cert -n emby
NAME           READY SECRET         AGE
emby-prod-cert True  emby-prod-cert 122m

kubectl get certificaterequest -n emby
NAME              APPROVED DENIED READY ISSUER           REQUESTOR                                       AGE
emby-prod-cert-1  True            True  emby-prod-issuer system:serviceaccount:cert-manager:cert-manager 122m

kubectl get order -n emby
NAME                        STATE AGE
emby-prod-cert-1-2302305457 valid 122m

Forcing HTTPS

Right now, it is possible to use both http://music.my-domain.com/emby/ and https://music.my-domain.com/emby/. I would like to redirect all HTTP requests to HTTPS. To do that, I’ll use the Traefik redirect middleware by creating this manifest (redirect2https.yaml):

# Redirect to https
apiVersion: v1
kind: Namespace
metadata:
  name: secureapps
---
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: redirect2https
  namespace: secureapps
spec:
  redirectScheme:
    scheme: https

As you can see, I used the namespace “secureapps”. If desired, you can use the namespace of a single app (e.g. “emby”), if you only want the redirection to apply there. Alternatively, you can modify the Traefik Helm chart values to apply the redirect to all ingresses (do a “helm show values traefik/traefik > values.yaml”, set “ports.web.redirectTo: websecure”, and then upgrade the release with those values). I have not tried that method.

Now, we update the ingress manifest to add this annotation (this is the top of the emby-ingress.yaml):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: emby
  namespace: emby
  annotations:
    cert-manager.io/issuer: "emby-issuer"
    traefik.ingress.kubernetes.io/router.middlewares: secureapps-redirect2https@kubernetescrd
spec:

The new line is the traefik.ingress.kubernetes.io/router.middlewares annotation, which references the middleware by namespace and name (secureapps-redirect2https). If you wanted this to apply only to Emby, you could change the namespace of the Middleware to “emby” and use “emby-redirect2https”.
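The annotation value follows Traefik’s <namespace>-<name>@kubernetescrd convention for Middleware resources, and several middlewares can be chained as a comma-separated list. This small Python sketch builds that value; the helper names are my own.

```python
def middleware_ref(namespace: str, name: str) -> str:
    """Traefik's reference to a Middleware CRD: <namespace>-<name>@kubernetescrd."""
    return f"{namespace}-{name}@kubernetescrd"


def middlewares_annotation(refs: list[tuple[str, str]]) -> dict[str, str]:
    """Ingress annotation that chains several middlewares as a comma-separated list."""
    value = ",".join(middleware_ref(namespace, name) for namespace, name in refs)
    return {"traefik.ingress.kubernetes.io/router.middlewares": value}
```

For a single middleware, middleware_ref("secureapps", "redirect2https") produces the same string used in the ingress above.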

I deleted the ingress I was currently using, deleted the secret for the cert that was generated, and then applied the Middleware manifest and the ingress that was updated. A new cert was created and now, HTTP requests are redirected to HTTPS!

Removing The Remote Access Setup

Delete the issuers and ingress that you have (e.g. use kubectl delete -f emby-prod-ingress.yaml). You can then remove Traefik:

helm delete traefik

And the cert-manager:

helm delete -n cert-manager cert-manager
kubectl delete crd virtualservers.k8s.nginx.org

kubectl delete crd virtualserverroutes.k8s.nginx.org

kubectl get crd | grep cert-manager.io | cut -f1 -d" " | xargs kubectl delete crd

And finally any secrets that were created for certificates:

kubectl delete -n emby secret emby-issuer-account-key emby-prod-cert tls-emby-ingress-http

Things To Explore

Traefik also has an IngressRoute mechanism that seems to be quite flexible and a lot of their documentation (like for Let’s Encrypt setup) uses this instead of Ingress. It may be worthwhile using that as, at first blush, it seems like a newer way of doing things.

Consider removing the /emby prefix from the path for remote access. This would only work if Emby were the only web service served on that host name. If multiple web services share the same host name, then you need to keep the prefix to tell which service to use.

Consider using one certificate for all sub-domains.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 12

Ad-Blocking With PI-Hole

I had Pi-Hole running on a standalone Raspberry PI, but wanted to move it to my Kubernetes cluster. Digging around, I found a useful article on how to add PI-Hole to Kubernetes, which not only covered using PI-Hole, but also running redundant instances and keeping them in sync. It used MetalLB, ingress, and cert-manager for Let’s Encrypt certificates, all things I was interested in.

There was another article, based on using Helm and having some monitoring set up. I may try it someday.

UPDATE: I updated Pi-Hole from 2024.07.0 to 2025.03.0 and ran into a bunch of issues. See the details of this upgrade below.

A Few Things

First, as expected, the article used an older version of Pi-Hole (2022.12.1). I tried the latest version (at this time 2024.05.0), but the pods were stuck in crash loops. What I found out was that the liveness/readiness probes in the YAML did an HTTP GET at the root of the lighttpd web server. With the 2023.02.1 pihole image this worked, but with 2023.02.2 it failed.

Trying curl http://127.0.0.1/ inside the pod showed a 403 Forbidden error. If I tried to access http://127.0.0.1/admin, I’d get a 301 Moved Permanently with a ‘/admin/’ path. If I did http://127.0.0.1/admin/, I’d get a 302 Found response with path ‘login.php’. When I did http://127.0.0.1/admin/login.php, I’d get a 200 OK result with content.

So, I changed the liveness and readiness probe configuration to add a path field with ‘/admin/login.php’, and then the pods came up successfully.
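The root path failed because Kubernetes treats an httpGet probe response code from 200 up to, but not including, 400 as success, so the 403 at the root counts as a failure. This Python sketch applies that rule to the responses described above (the function names are my own); pointing the probe straight at the page that returns 200 also avoids depending on how redirects are handled.

```python
def probe_succeeds(status_code: int) -> bool:
    """Kubernetes httpGet probes pass for status codes 200-399 and fail otherwise."""
    return 200 <= status_code < 400


# Responses seen with curl inside the pod, as described above.
OBSERVED = {
    "/": 403,
    "/admin": 301,
    "/admin/": 302,
    "/admin/login.php": 200,
}


def probe_report(observed: dict[str, int]) -> list[str]:
    """One line per path showing whether a probe on that response would pass."""
    return [
        f"{path}: {code} -> {'pass' if probe_succeeds(code) else 'fail'}"
        for path, code in observed.items()
    ]
```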

Second, for the PI-Hole admin web pages, I chose a service type of LoadBalancer (instead of ClusterIP plus an ingress). Accessing locally is fine, as I just use the IP assigned by the load balancer. The article talks about setting up a certificate using Let’s Encrypt to be able to access it remotely.

I already have a domain name, and I’m using Dynamic DNS to redirect that domain to my router’s WAN IP. But, I’m currently port forwarding external HTTP/HTTPS traffic to my standalone Raspberry PI for a music server that uses Let’s Encrypt for certificates.

For now, I think I’ll just access my PI-Hole admin page locally. I will, however, have to figure out how to set up Let’s Encrypt once I move my music server and other web apps to the Kubernetes cluster, so it will be useful to keep this info in mind.

Setting Up PI-Hole

I’m doing the same thing as the article, running three replicas of the PI-Hole pods, and I altered the liveness/readiness check. Here is my manifest.yaml in pieces:

apiVersion: v1
kind: Namespace
metadata:
  name: pihole
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: pihole-configmap
  namespace: pihole
data:
  TZ: "America/New_York"
  PIHOLE_DNS_: "208.67.220.220;208.67.222.222"

This sets up a namespace for PI-Hole, defines the timezone I’m using, and the upstream DNS servers that I wanted to use (OpenDNS). You can customize, as desired.

---
apiVersion: v1
kind: Secret
metadata:
  name: pihole-password
  namespace: pihole
type: Opaque
data:
  WEBPASSWORD: <PUT_BASE64_PASSWORD_HERE> # Base64 encoded

This is the password that will be used when logging into the PI-Hole admin page. You should encode it using “echo -n 'MY PASSWORD' | base64” and place the encoded string in the WEBPASSWORD attribute.
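The same encoding can be done in Python, which sidesteps the classic mistake of leaving off the -n (without it, echo appends a newline that becomes part of the password). This is a minimal sketch with helper names of my own. Kubernetes Secrets also accept plain text under a stringData field instead of data, if you would rather not encode by hand.

```python
import base64


def secret_data_value(password: str) -> str:
    """Base64-encode a value for a Secret's data field (like: echo -n 'MY PASSWORD' | base64)."""
    return base64.b64encode(password.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding, handy for checking what a Secret really contains."""
    return base64.b64decode(encoded).decode("utf-8")
```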

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pihole
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: pihole
  serviceName: pihole
  replicas: 3
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2024.05.0
          envFrom:
            - configMapRef:
                name: pihole-configmap
            - secretRef:
                name: pihole-password
          ports:
            - name: svc-80-tcp-web
              containerPort: 80
              protocol: TCP
            - name: svc-53-udp-dns
              containerPort: 53
              protocol: UDP
            - name: svc-53-tcp-dns
              containerPort: 53
              protocol: TCP
          livenessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login.php
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - name: pihole-etc-pihole
              mountPath: /etc/pihole
            - name: pihole-etc-dnsmasq
              mountPath: /etc/dnsmasq.d
  volumeClaimTemplates:
    - metadata:
        name: pihole-etc-pihole
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
    - metadata:
        name: pihole-etc-dnsmasq
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi

This is the stateful set that will create three replicas of the PI-Hole pods. I’m using the latest version at this time (2024.05.0), have modified the liveness/readiness checks as mentioned above, and am using PVs (longhorn) for storing configuration.
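Each replica gets its own copy of every volumeClaimTemplate, named <template>-<pod>, and StatefulSet pods are named <statefulset>-<ordinal>, starting at 0. This Python sketch (the function name is hypothetical) lists the claim names this manifest should produce, which is handy when matching PVCs to Longhorn volumes later.

```python
def statefulset_pvc_names(statefulset: str, templates: list[str], replicas: int) -> list[str]:
    """Claim names a StatefulSet creates: <template>-<statefulset>-<ordinal>."""
    return [
        f"{template}-{statefulset}-{ordinal}"
        for ordinal in range(replicas)
        for template in templates
    ]


PIHOLE_CLAIMS = statefulset_pvc_names(
    "pihole", ["pihole-etc-pihole", "pihole-etc-dnsmasq"], replicas=3
)
```

With two templates and three replicas, that works out to six claims, and so six PVs.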

---
apiVersion: v1
kind: Service
metadata:
  name: pihole
  namespace: pihole
  labels:
    app: pihole
spec:
  clusterIP: None
  selector:
    app: pihole
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-web-svc
  namespace: pihole
spec:
  selector:
    app: pihole
    statefulset.kubernetes.io/pod-name: pihole-0
  type: LoadBalancer
  ports:
    - name: svc-80-tcp-web
      port: 80
      targetPort: 80
      protocol: TCP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-udp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-udp-dns
      port: 53
      targetPort: 53
      protocol: UDP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-tcp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-tcp-dns
      port: 53
      targetPort: 53
      protocol: TCP

These are the services for the UI and for DNS. Of note, the TCP and UDP DNS services share the same load balancer IP. I used a LoadBalancer service for the web UI as well (instead of using ClusterIP and setting up an ingress; maybe that will bite me later).

With this manifest, you can “kubectl apply -f manifest.yaml” and then look for all three pods to start up. You should be able to run nslookup/dig commands using the IP of the DNS service as the server to verify that DNS is working, and you can use the IP of the pihole-web-svc service with a path of /admin/ (e.g. http://10.11.12.203/admin/). Use the password you defined in the manifest to log in and see the Ad Blocker in operation.

Keeping The PI-Hole Pods In Sync

As mentioned in the article, we have three PI-Hole pods (one primary, two secondary), but need to keep the database in sync. To do this, Orbital Sync is used to backup the primary pod’s database, and then restore it to the secondary pods’ databases. Here is the orbital-sync.yaml manifest:

apiVersion: v1
kind: ConfigMap
metadata:
  name: orbital-sync-config
  namespace: pihole
data:
  PRIMARY_HOST_BASE_URL: "http://pihole-0.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_1_BASE_URL: "http://pihole-1.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_2_BASE_URL: "http://pihole-2.pihole.pihole.svc.cluster.local"
  INTERVAL_MINUTES: "1"

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orbital-sync
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: orbital-sync
  template:
    metadata:
      labels:
        app: orbital-sync
    spec:
      containers:
        - name: orbital-sync
          image: mattwebbio/orbital-sync:latest
          envFrom:
            - configMapRef:
                name: orbital-sync-config
          env:
            - name: "PRIMARY_HOST_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: WEBPASSWORD
            - name: "SECONDARY_HOST_1_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: WEBPASSWORD
            - name: "SECONDARY_HOST_2_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: WEBPASSWORD

It runs every minute, and uses the secret that was created with the password to access PI-Hole. You can look at the orbital sync pod log to see that it is backing up and restoring the database among the PI-Holes.
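The base URLs in the ConfigMap rely on the headless pihole service (clusterIP: None), which gives each StatefulSet pod a stable DNS name of the form <pod>.<service>.<namespace>.svc.cluster.local. If you change the replica count, a sketch like this one could regenerate the entries; it assumes the default cluster.local domain, a service named the same as the StatefulSet (pihole, as above), and pod 0 as the primary. The function names are my own.

```python
def headless_pod_url(pod: str, service: str, namespace: str, domain: str = "cluster.local") -> str:
    """HTTP base URL for a pod behind a headless Service."""
    return f"http://{pod}.{service}.{namespace}.svc.{domain}"


def orbital_sync_hosts(statefulset: str, replicas: int, namespace: str) -> dict[str, str]:
    """Rebuild the *_BASE_URL entries, treating pod 0 as the primary."""
    hosts = {
        "PRIMARY_HOST_BASE_URL": headless_pod_url(f"{statefulset}-0", statefulset, namespace)
    }
    for ordinal in range(1, replicas):
        key = f"SECONDARY_HOST_{ordinal}_BASE_URL"
        hosts[key] = headless_pod_url(f"{statefulset}-{ordinal}", statefulset, namespace)
    return hosts
```

Each new secondary would also need a matching SECONDARY_HOST_N_PASSWORD entry in the deployment.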

Finishing Touches

Under the UI’s local DNS entries section, I manually entered the hostname (with a .home suffix) and IP address for each of my devices on the local network, so that I can access them as “name.home”.

I did not set up DHCP on PI-Hole, as I use my router’s DHCP configuration.

To use the PI-Hole as the DNS server for all systems in your network, you can specify the IP of the PI-Hole on each host as the only DNS server. If you specify more than one DNS server, depending on your OS, it may use the other server(s) at times and bypass the ad-blocking.

For me, I have all my hosts using the router as the primary DNS server. The router is configured to use the PI-Hole as the primary server, and then a public server as the secondary server. Normally, requests would always go to the Pi-Hole, unless for some reason it was down. This was advantageous for two reasons. First, when I had my standalone PI-Hole, if it crashed, there still was DNS resolution. Second, it made it easy to switch from the standalone PI-Hole to the Kubernetes one, by just changing the router configuration.

The only odd thing with this setup is that when I use my laptop away from home, my router’s IP is (obviously) not available. I’ve been getting around this by using the “Location” feature of macOS to set up a “Home” location that uses my router’s IP for DNS, and a “Roaming” location that uses a public DNS server.

I guess I could set up port forwarding so that DNS requests to my domain name (which points to my router using Dynamic DNS) go to the PI-Hole IP, but I didn’t want to expose that to the Internet.

Updating/Upgrading PI-Hole

From the initial version 2024.05.0, I had updated to 2024.07.0. For this update, I did the following…

First, I went to the PI-Hole release page to see what the latest version was. Then, I updated the manifest.yaml to call out the new version (e.g. 2024.07.0) and then deleted the pods, one at a time, so that they would load the newer image version. It worked rather well.

However, when I recently tried to update to the latest 2025.03.0, I found out from the Pi-Hole GitHub page that Pi-Hole had been redesigned, with many changes to config variables. I hadn’t looked closely at the release notes and just tried the same update method as before. It didn’t work, and the pods were stuck in crash loops.

Needless to say, I lost all my local DNS settings, and custom blacklist/whitelist entries. It was a learning moment (disaster). I finally got it to work, and will detail the steps here, showing changes I did to the YAML files (all in ~/workspace/kubernetes/pi-hole).

Some steps may not be needed, but this is roughly what I did, taking baby steps to get things working (since everything was broken). I suspect you may be able to update and then restart one pod at a time, letting it update the database (I think it may), without losing anything.

I also think you could probably shut everything down and then remove the UUID from the claimRef of the PVs, so that when new pods start up, they reuse the PVs. I ended up with multiple PVs and didn’t realize that I could have possibly reused them. In any case, back up your PVs before doing this update (I’m kicking myself for skipping that basic step).
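Here is a sketch of that PV reuse idea, untested against a real cluster. It assumes you saved the output of “kubectl get pv -o json”, that the old volumes still exist in the Released phase, and that removing spec.claimRef.uid is enough for a new PVC with the same name and namespace to bind again. It only builds kubectl patch commands for you to review rather than running anything, and the helper names are hypothetical.

```python
import json


def released_pvs(pv_list: dict, namespace: str) -> list[str]:
    """Names of Released PVs whose previous claim was in the given namespace.

    pv_list is the parsed output of "kubectl get pv -o json".
    """
    names = []
    for pv in pv_list.get("items", []):
        claim = pv.get("spec", {}).get("claimRef", {})
        if pv.get("status", {}).get("phase") == "Released" and claim.get("namespace") == namespace:
            names.append(pv["metadata"]["name"])
    return names


# JSON patch removing the old claim's UID, leaving its name and namespace in place.
UNBIND_PATCH = json.dumps([{"op": "remove", "path": "/spec/claimRef/uid"}])


def patch_commands(pv_list: dict, namespace: str) -> list[str]:
    """kubectl commands (returned, not run) that apply UNBIND_PATCH to each released PV."""
    return [
        f"kubectl patch pv {name} --type json -p '{UNBIND_PATCH}'"
        for name in released_pvs(pv_list, namespace)
    ]
```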

I shut down Orbital Sync with “kubectl delete -f orbital-sync.yaml”, so there was no syncing going on, and Pi-Hole with “kubectl delete -f manifest.yaml”.

Here is the new Pi-Hole manifest.yaml (shown in chunks), with the changed lines described after each chunk:

apiVersion: v1
kind: Namespace
metadata:
  name: pihole

---

No changes to the above.

apiVersion: v1
kind: ConfigMap
metadata:
  name: pihole-configmap
  namespace: pihole
data:
  TZ: "America/New_York"
  FTLCONF_dns_upstreams: "208.67.220.220;208.67.222.222"
  FTLCONF_dns_listeningMode: "all"
---

PIHOLE_DNS_ was renamed to FTLCONF_dns_upstreams. I also set the listening mode to “all”. The documentation says to set this to “all” when running in Docker’s bridge mode; I’m guessing it is needed for Kubernetes as well (maybe not?).


apiVersion: v1
kind: Secret
metadata:
  name: pihole-password
  namespace: pihole
type: Opaque
data:
  FTLCONF_webserver_api_password: <PUT_BASE64_PASSWORD_HERE> # Base64 encoded

---

The WEBPASSWORD config has been renamed to FTLCONF_webserver_api_password. You can use the same value (base64 encoded) as was done before.
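Since the upgrade mostly came down to renamed variables, a small helper could catch any old names left behind (for example, the secret key referenced by orbital-sync.yaml). This Python sketch only knows the two renames covered in this post, PIHOLE_DNS_ and WEBPASSWORD; it is not a complete list of the 2025.03.0 changes, and the names are my own.

```python
# Old name -> new name for the two variables renamed in the 2025.03.0 upgrade.
RENAMED_VARS = {
    "PIHOLE_DNS_": "FTLCONF_dns_upstreams",
    "WEBPASSWORD": "FTLCONF_webserver_api_password",
}


def migrate_env(old_env: dict[str, str]) -> dict[str, str]:
    """Copy a ConfigMap/Secret data map, renaming the old Pi-Hole keys.

    Anything not in RENAMED_VARS is passed through unchanged, so other
    settings still need to be checked against the release notes.
    """
    return {RENAMED_VARS.get(key, key): value for key, value in old_env.items()}


def leftover_old_vars(env: dict[str, str]) -> list[str]:
    """Old-style names still present, e.g. in a file that was missed."""
    return [key for key in env if key in RENAMED_VARS]
```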


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pihole
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: pihole
  serviceName: pihole
  replicas: 3
  template:
    metadata:
      labels:
        app: pihole
    spec:
      containers:
        - name: pihole
          image: pihole/pihole:2025.03.0
          securityContext:
            capabilities:
              add: ["SYS_TIME"]
          envFrom:
            - configMapRef:
                name: pihole-configmap
            - secretRef:
                name: pihole-password
          ports:
            - name: svc-80-tcp-web
              containerPort: 80
              protocol: TCP
            - name: svc-53-udp-dns
              containerPort: 53
              protocol: UDP
            - name: svc-53-tcp-dns
              containerPort: 53
              protocol: TCP
          livenessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              port: svc-80-tcp-web
              path: /admin/login
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
          volumeMounts:
            - name: pihole-etc-pihole
              mountPath: /etc/pihole
            - name: pihole-etc-dnsmasq
              mountPath: /etc/dnsmasq.d
  volumeClaimTemplates:
    - metadata:
        name: pihole-etc-pihole
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi
    - metadata:
        name: pihole-etc-dnsmasq
        namespace: pihole
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: 3Gi

---

Besides updating the version of Pi-Hole, I added the SYS_TIME capability to prevent a warning. There is another optional one, SYS_NICE, which lets Pi-Hole raise its own process priority, but I didn’t add it, since the pod only runs Pi-Hole and doesn’t need to compete for CPU.

Another important change was to the liveness and readiness checks. The old manifest checked that the web interface was accessible by fetching the login page at /admin/login.php, but that file is now login.lp, so I changed the path to /admin/login and let the web server resolve it to the correct file. Without this change, the pods fail the liveness and readiness checks and never come up.

apiVersion: v1
kind: Service
metadata:
  name: pihole
  namespace: pihole
  labels:
    app: pihole
spec:
  clusterIP: None
  selector:
    app: pihole

---

kind: Service
apiVersion: v1
metadata:
  name: pihole-web-svc
  namespace: pihole
spec:
  selector:
    app: pihole
    statefulset.kubernetes.io/pod-name: pihole-0
  type: LoadBalancer
  ports:
    - name: svc-80-tcp-web
      port: 80
      targetPort: 80
      protocol: TCP

---

kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-udp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-udp-dns
      port: 53
      targetPort: 53
      protocol: UDP
---
kind: Service
apiVersion: v1
metadata:
  name: pihole-dns-tcp-svc
  namespace: pihole
  annotations:
    metallb.universe.tf/allow-shared-ip: "pihole"
spec:
  selector:
    app: pihole
  type: LoadBalancer
  ports:
    - name: svc-53-tcp-dns
      port: 53
      targetPort: 53
      protocol: TCP

There were no other changes to the rest of the manifest.yaml file.

In the orbital-sync.yaml file, the three instances of WEBPASSWORD were changed to FTLCONF_webserver_api_password:

apiVersion: v1
kind: ConfigMap
metadata:
  name: orbital-sync-config
  namespace: pihole
data:
  PRIMARY_HOST_BASE_URL: "http://pihole-0.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_1_BASE_URL: "http://pihole-1.pihole.pihole.svc.cluster.local"
  SECONDARY_HOST_2_BASE_URL: "http://pihole-2.pihole.pihole.svc.cluster.local"
  INTERVAL_MINUTES: "1"

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orbital-sync
  namespace: pihole
spec:
  selector:
    matchLabels:
      app: orbital-sync
  template:
    metadata:
      labels:
        app: orbital-sync
    spec:
      containers:
        - name: orbital-sync
          image: mattwebbio/orbital-sync:latest
          envFrom:
            - configMapRef:
                name: orbital-sync-config
          env:
            - name: "PRIMARY_HOST_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: FTLCONF_webserver_api_password
            - name: "SECONDARY_HOST_1_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: FTLCONF_webserver_api_password
            - name: "SECONDARY_HOST_2_PASSWORD"
              valueFrom:
                secretKeyRef:
                  name: pihole-password
                  key: FTLCONF_webserver_api_password

With these changes (and replicas temporarily set to 1), I applied manifest.yaml and checked the log to see that the pod was fully up. From a browser, I went to the web service’s load balancer IP and verified that I could log into Pi-Hole. At this point, you can add local DNS domains and whitelist/blacklist entries.

I then modified manifest.yaml to increase the replicas to three, verified that they were all running, and then applied orbital-sync.yaml and made sure it was running. Everything looked good. In my case, I did not reuse the PVs that had been created, so I deleted all detached PVs. There should be six PVs: an /etc/pihole volume and a dnsmasq volume for each of the three PI-Holes. Again, you could try reusing the existing PVs instead of creating new ones, by deleting the UUID in the claimRef of the released (but still present) PVs before starting up the Pi-Hole pods.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 3

Kubespray Add-Ons

In Part IV of the PI cluster series, I describe how to set up Kubespray to create a cluster. You can look there for how to set up your inventory and the basic configuration settings for Kubespray. In that series, I also describe how to add more features after the cluster is up. Some are pretty simple, and some require manual steps to get everything set up.

However, you can also have Kubespray install some “add-on” components, as part of the cluster bring-up. In many cases, this makes the process more automated, and “easier”, but it does have some limitations.

First, you will be using the version and configuration that is defined in Kubespray’s Ansible templates and roles. Granted, you can always customize Kubespray, with the caveat of having to keep your changes up to date with upstream.

Second, removing the feature from a running cluster can be more difficult. You’ll have to manually delete all the resources (e.g. daemonsets, deployments, etc.), some of which may be hard to identify (CRDs, RoleBindings, secrets, etc.). Looking in the Kubespray templates may provide some insight into the resources that were created.

You may be able to find manifests for the feature and version in the feature’s repo, pull them, and use “kubectl delete” on them to remove the feature. Just note that there may be some differences between the repo manifests for a version and the manifests that Kubespray used. I haven’t tried it, but if there is a Helm-based version of the feature that matches what Kubespray installed, you might be able to “helm install” over the already installed feature, and then “helm delete” it?

Kube VIP (Virtual IP and Service Load Balancing)

To add Kube-VIP as a Kubespray add-on, I did these steps before creating the cluster.

First, I modified the inventory, so that etcd would run on each of my control-plane nodes (versus a mix of control-plane and worker nodes).

Second, in inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml, I enabled strict ARP, used IPVS (instead of iptables) for kube-proxy, and excluded my local network from kube-proxy (so that kube-proxy would not clear the IPVS entries created by kube-vip):

kube_proxy_strict_arp: true
kube_proxy_mode: ipvs
kube_proxy_exclude_cidrs: ["CIDR_FOR_MY_LOCAL_NETWORK",]

Third, I enabled kube-vip in inventory/mycluster/group_vars/k8s_cluster/addons.yml. I turned on ARP (vs BGP), set up a VIP for the control plane, and specified the API server address and port to use. I also enabled load balancing for that VIP. I did not enable load balancing for services, but that is an option too:

kube_vip_enabled: true
kube_vip_arp_enabled: true
kube_vip_controlplane_enabled: true
kube_vip_address: VIP_ON_MY_NETWORK
loadbalancer_apiserver:
 address: "{{ kube_vip_address }}"
 port: 6443
kube_vip_lb_enable: true

# kube_vip_services_enabled: false
# kube_vip_enableServicesElection: true

When I tried this out, the kube-vip container was showing “connection refused” and permission errors, so leader election was not working for the chosen virtual IP.

I finally found a bug report on this issue when using Kubernetes 1.29 with kube-vip. Essentially, when the first control plane node is starting up, the admin.conf file used for kubectl commands does not yet have the permissions that kube-vip needs at that point in the process. The proper fix is for the kube-vip team to create their own kubeconfig file. In the meantime, the bug report is trying a workaround in Kubespray that switches to the super-admin.conf file, which does have the needed permissions at that point. However, their patch does not work. I did more hacking on it and came up with this change, which works:

diff --git a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
index f7b04a624..b5acdac8c 100644
--- a/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
+++ b/roles/kubernetes/node/tasks/loadbalancer/kube-vip.yml
@@ -6,6 +6,10 @@
     - kube_proxy_mode == 'ipvs' and not kube_proxy_strict_arp
     - kube_vip_arp_enabled
 
+- name: Kube-vip | Check if first control plane
+  set_fact:
+    is_first_control_plane: "{{ inventory_hostname == groups['kube_control_plane'] | first }}"
+
 - name: Kube-vip | Write static pod
   template:
     src: manifests/kube-vip.manifest.j2
diff --git a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2 b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
index 11a971e93..7b59bca4c 100644
--- a/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
+++ b/roles/kubernetes/node/templates/manifests/kube-vip.manifest.j2
@@ -119,6 +119,6 @@ spec:
   hostNetwork: true
   volumes:
   - hostPath:
-      path: /etc/kubernetes/admin.conf
+      path: /etc/kubernetes/{% if is_first_control_plane %}super-{% endif %}admin.conf
     name: kubeconfig
 status: {}
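To spell out what the patch does: only the first host in the kube_control_plane group mounts super-admin.conf, and every other control plane node keeps admin.conf. Here is the same decision written as a small Python function, purely for illustration (the function and host names are hypothetical; Kubespray evaluates the Jinja expression itself):

```python
def kubeconfig_for(hostname: str, control_plane_hosts: list[str]) -> str:
    """Pick the kubeconfig the patched kube-vip manifest would mount on this host."""
    # Same test as the set_fact: inventory_hostname == groups['kube_control_plane'] | first
    is_first_control_plane = bool(control_plane_hosts) and hostname == control_plane_hosts[0]
    prefix = "super-" if is_first_control_plane else ""
    return f"/etc/kubernetes/{prefix}admin.conf"


# Hypothetical host names for a three-node control plane
hosts = ["cp1", "cp2", "cp3"]
print(kubeconfig_for("cp1", hosts))  # /etc/kubernetes/super-admin.conf
print(kubeconfig_for("cp2", hosts))  # /etc/kubernetes/admin.conf
```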

UPDATE: There is a fix that is in progress, which is a streamlined version of my change. Once that is merged, no patch will be needed.

With this change to Kubespray, I created the cluster:

cd ~/workspace/kubernetes/picluster
poetry shell
cd ../kubespray
ansible-playbook -i ../picluster/inventory/mycluster/hosts.yaml -u ${USER} -b -vvv --private-key=~/.ssh/id_ed25519 cluster.yml

Everything was up and running, but kubectl commands were failing on my Mac, because the ~/.kube/config file uses the URL https://lb-apiserver.kubernetes.local:6443 for the server, and my Mac has no DNS entry for this host name (it does resolve on the nodes). The simple fix was to replace the FQDN with the IP address selected for the VIP.
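If you would rather script the kubeconfig edit than do it by hand, here is a minimal Python sketch that swaps the load balancer host name for the VIP in every server: URL. It assumes the default lb-apiserver.kubernetes.local name and treats the kubeconfig as plain text; the function name and VIP are hypothetical. (kubectl config set-cluster with --server can make the same change from the command line.)

```python
import re

DEFAULT_LB_HOST = "lb-apiserver.kubernetes.local"


def point_kubeconfig_at_vip(kubeconfig_text: str, vip: str, lb_host: str = DEFAULT_LB_HOST) -> str:
    """Replace lb_host with the VIP in every 'server: https://...' line, keeping any port."""
    pattern = re.compile(r"(server:\s*https://)" + re.escape(lb_host) + r"(:\d+)?")
    # A function replacement avoids the VIP's digits being read as a group backreference
    return pattern.sub(lambda m: m.group(1) + vip + (m.group(2) or ""), kubeconfig_text)


sample = """clusters:
- cluster:
    certificate-authority-data: REDACTED
    server: https://lb-apiserver.kubernetes.local:6443
  name: cluster.local
"""
print(point_kubeconfig_at_vip(sample, "192.168.1.100"))
```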

Now, all requests to that IP are redirected to the node that is currently running the API server. If the node is not available, IPVS will redirect to another control plane node.

Update 01/07/2025: The fix has been released, and no patching is needed. However, I did find one problem with kube-vip. If a node is rebooted (intentionally, or after a power outage or crash), it fails to become “Ready”. The issue is that the node tries to register with the cluster using the lb-apiserver.kubernetes.local name, and because the Kubernetes DNS server is not yet available to the node, it cannot resolve that name to the configured VIP.

The solution is to map the VIP to lb-apiserver.kubernetes.local in /etc/hosts. However, that file is generated automatically (and regenerated on reboot), so the mapping needs to be added to the template file /etc/cloud/templates/hosts.debian.tmpl instead. This should be done on every node, ideally with a playbook that applies it to every node in the inventory. I haven’t done that yet, so it is an exercise for the reader. 🙂
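For anyone taking up that exercise, the core of it is an idempotent edit: add the mapping only if the name is not already present. In a playbook, Ansible’s lineinfile module would typically handle this; the Python below is only a sketch of the same logic, with a hypothetical function name, template excerpt, and VIP:

```python
LB_NAME = "lb-apiserver.kubernetes.local"


def add_vip_mapping(template_text: str, vip: str, name: str = LB_NAME) -> str:
    """Append '<vip> <name>' to a hosts template unless some line already maps the name."""
    for line in template_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then split on whitespace
        # fields[0] is the address; the remaining fields are host names and aliases
        if len(fields) >= 2 and name in fields[1:]:
            return template_text  # already mapped, so leave the template untouched
    if template_text and not template_text.endswith("\n"):
        template_text += "\n"
    return template_text + f"{vip} {name}\n"


# Hypothetical excerpt of /etc/cloud/templates/hosts.debian.tmpl and a made-up VIP
template = "## template:jinja\n127.0.0.1 localhost\n"
updated = add_vip_mapping(template, "192.168.1.100")
print(updated)
```

Running it a second time leaves the template unchanged, which is what you want when re-applying it across the whole inventory.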

MetalLB Load Balancer

Instead of setting this up after the cluster is created, you can let Kubespray do it as well. In inventory/mycluster/group_vars/k8s_cluster/addons.yml, I made these changes:

metallb_enabled: true
metallb_speaker_enabled: "{{ metallb_enabled }}"
metallb_namespace: "metallb-system"

metallb_protocol: "layer2"
metallb_config:
  address_pools:
    primary:
      ip_range:
        - FIRST_IP_IN_RANGE-LAST_IP_IN_RANGE
      auto_assign: true
  layer2:
    - primary

Besides enabling the feature, I made sure that it was using layer 2 (vs. layer 3), and under the config, set up an address pool with the range of IPs on my local network that I wanted to use for load-balanced IPs. You can specify the range as a CIDR instead, if desired.
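If you go the CIDR route, note that a first-last range does not always map to a single CIDR block. Python’s ipaddress.summarize_address_range() splits a range into the smallest set of blocks; here is a sketch with a hypothetical helper name and example ranges:

```python
import ipaddress


def range_to_cidrs(ip_range: str) -> list[str]:
    """Convert a MetalLB-style 'FIRST-LAST' range into the equivalent CIDR blocks."""
    first, last = (ipaddress.ip_address(part.strip()) for part in ip_range.split("-"))
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


print(range_to_cidrs("192.168.1.200-192.168.1.215"))  # ['192.168.1.200/29', '192.168.1.208/29']
print(range_to_cidrs("192.168.1.0-192.168.1.255"))    # ['192.168.1.0/24']
```

A range that lines up on a power-of-two boundary collapses to one CIDR; anything else needs several entries under ip_range.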

Now, when the cluster is created with Kubespray, MetalLB will be set up, and you can set a service’s type to “LoadBalancer” to have an IP from the pool assigned to it.

As mentioned in the disclaimer above, the version of Kubespray I have installs MetalLB 0.13.9. I could have overridden ‘metallb_version’ with a newer version, like ‘v0.14.5’, but the MetalLB templates in Kubespray use the older v0.11.0 kubebuilder image in several places. To get the same versions as a Helm install of MetalLB, I would have to modify the templates to specify v0.14.0. I also saw other configuration differences from the CRDs used in the Helm version, such as setting the tls_min_version argument and not setting some priority or priorityClassName configurations.

NGINX Ingress

This one is pretty easy to enable, by changing this setting in inventory/mycluster/group_vars/k8s_cluster/addons.yml:

ingress_nginx_enabled: true

When the cluster comes up, there will be an ingress DaemonSet, which creates ingress controller pods on each node, and an NGINX ingress service with an IP from the MetalLB address pool.

There are example YAML files in the MetalLB/NGINX Ingress post that let you create pods and services, along with an ingress resource that allows access via path prefixes.

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off

June 1

High Availability?

OK, so I have a cluster with three control plane nodes and four worker nodes (currently). However, if I shut down the control plane node that is hosting the API server, I lose API access. 🙁

I’ve been digging around, and it looks like kube-vip would be a good solution. It lets me create a virtual IP for the API server, and then does load balancing and leader election between the control plane nodes, so that if the node providing the API fails, another control plane node takes over. In addition, kube-vip can do load balancing for services (I’m not sure if that makes MetalLB redundant).

Before installing kube-vip, I needed to change the cluster configuration. I changed the inventory, so that etcd is running ONLY on the control-plane nodes (and not a mix of control plane and worker nodes).

Next, I made these changes to inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml:

kube_proxy_mode: ipvs
kube_proxy_strict_arp: true
kube_proxy_exclude_cidrs: ["CIDR_OF_LOCAL_NETWORK",]

This has kube-proxy use IPVS (versus iptables) and run in strict ARP mode (needed for kube-vip). Lastly, to prevent kube-proxy from clearing the IPVS settings made by kube-vip, the local network IPs must be excluded. With those changes, I re-created the cluster and was ready to install kube-vip…

There was a Medium article by Chris Kirby on using a Helm install of kube-vip for HA. It used an older version of kube-vip (0.6.4) and values.yaml settings for K3s. I added the Helm repo for kube-vip and pulled the values.yaml file so I could customize it:

mkdir ~/workspace/kubernetes/kube-vip
cd ~/workspace/kubernetes/kube-vip
helm repo add kube-vip https://kube-vip.github.io/helm-charts
helm repo update

wget https://raw.githubusercontent.com/kube-vip/helm-charts/main/charts/kube-vip/values.yaml

Here are the changes I made to the values.yaml, saving it as values-revised.yaml:

6c6
<   pullPolicy: IfNotPresent
---
>   pullPolicy: Always
8c8
<   # tag: "v0.7.0"
---
>   tag: "v0.8.0"
11c11
<   address: ""
---
>   address: "VIP_ON_LOCAL_NETWORK"
20c20
<   cp_enable: "false"
---
>   cp_enable: "true"
22,23c22,24
<   svc_election: "false"
<   vip_leaderelection: "false"
---
>   svc_election: "true"
>   vip_leaderelection: "true"
>   vip_leaseduration: "5"
61c62
<   name: ""
---
>   name: "kube-vip"
86c87,88
< nodeSelector: {}
---
> nodeSelector:
>   node-role.kubernetes.io/control-plane: ""
91a94,97
>   - effect: NoExecute
>     key: node-role.kubernetes.io/control-plane
>     operator: Exists
>
93,101c99,104
<   # nodeAffinity:
<   #   requiredDuringSchedulingIgnoredDuringExecution:
<   #     nodeSelectorTerms:
<   #     - matchExpressions:
<   #       - key: node-role.kubernetes.io/master
<   #         operator: Exists
<   #     - matchExpressions:
<   #       - key: node-role.kubernetes.io/control-plane
<   #         operator: Exists
---
>   nodeAffinity:
>     requiredDuringSchedulingIgnoredDuringExecution:
>       nodeSelectorTerms:
>       - matchExpressions:
>         - key: node-role.kubernetes.io/control-plane
>           operator: Exists

Besides using a newer kube-vip version, this enables load balancing for the control plane nodes and services, selects nodes that have the control-plane label (with no value, unlike the article), and sets the node affinity.

With this custom values file, I could do the install:

helm install my-kube-vip kube-vip/kube-vip -n kube-system -f values-revised.yaml

With this, all the kube-vip pods were up, and the daemonset showed three desired, current, and ready. However, when I changed the server IP in ~/.kube/config to my VIP and tried kubectl commands, they failed with an x509 error: the certificate was valid for each of the control plane node IPs and a cluster IP, but not for the VIP I was using.
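To confirm which addresses a certificate actually covers, you can dump its Subject Alternative Names with openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -ext subjectAltName (the -ext option needs OpenSSL 1.1.1 or newer) and check the output for the VIP. A minimal Python sketch of that check, assuming the usual comma-separated “IP Address:” format; the function name and sample addresses are hypothetical:

```python
import ipaddress


def san_ip_addresses(san_text: str) -> set[str]:
    """Collect the 'IP Address:' entries from openssl's subjectAltName output."""
    ips = set()
    for entry in san_text.replace("\n", ",").split(","):
        entry = entry.strip()
        if entry.startswith("IP Address:"):
            # Normalize through ipaddress so IPv6 spellings compare consistently
            ips.add(str(ipaddress.ip_address(entry.split(":", 1)[1].strip())))
    return ips


# Illustrative output shape; the addresses are made up
sample = """X509v3 Subject Alternative Name:
    DNS:kubernetes, DNS:kubernetes.default, IP Address:10.233.0.1, IP Address:192.168.1.11
"""
vip = "192.168.1.100"
print(vip in san_ip_addresses(sample))  # False: the VIP is not covered yet
```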

This can be fixed by regenerating the API server certificate on every control plane node:

sudo su
cd
kubectl -n kube-system get configmap kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' --insecure-skip-tls-verify > kubeadm.yaml

mv /etc/kubernetes/pki/apiserver.{crt,key} ~
kubeadm init phase certs apiserver --config kubeadm.yaml

In the output, I saw the IPs of the control plane nodes AND the VIP I defined. Next, the kube-apiserver container needs to be stopped and removed, so that the kubelet starts a new one that uses the new certificate.

crictl ps | grep kube-apiserver
crictl stop <ID-of-apiserver>
crictl rm <ID-of-apiserver>

Now, kubectl commands using the VIP will be redirected to the control plane node running the API server, and if that node is unavailable, the requests will be redirected to another control plane node. You can see this by running arping against the VIP: when leadership changes, the MAC address displayed changes.
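Watching for that MAC change can be automated by parsing the arping replies. A sketch, assuming the iputils arping output format (“Unicast reply from IP [MAC] time”); other arping implementations print differently, and the function name and capture below are hypothetical:

```python
import re

# iputils arping reply lines look like: "Unicast reply from 192.168.1.100 [AA:BB:CC:DD:EE:FF]  0.789ms"
REPLY = re.compile(r"reply from \S+ \[([0-9A-Fa-f:]{17})\]")


def mac_history(arping_output: str) -> list[str]:
    """Return the MACs that answered, collapsing consecutive repeats; >1 entry means the VIP moved."""
    macs: list[str] = []
    for line in arping_output.splitlines():
        match = REPLY.search(line)
        if match:
            mac = match.group(1).upper()
            if not macs or macs[-1] != mac:
                macs.append(mac)
    return macs


# Made-up capture spanning a leadership change
capture = """ARPING 192.168.1.100 from 192.168.1.5 eth0
Unicast reply from 192.168.1.100 [AA:AA:AA:AA:AA:01]  0.712ms
Unicast reply from 192.168.1.100 [AA:AA:AA:AA:AA:01]  0.655ms
Unicast reply from 192.168.1.100 [AA:AA:AA:AA:AA:02]  0.801ms
"""
print(mac_history(capture))  # ['AA:AA:AA:AA:AA:01', 'AA:AA:AA:AA:AA:02']
```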

Kind of involved, but this works!

One Slight Complication…

I did have some problems when playing with HA for the API. I rebooted the control plane node that was actively providing the API. Kube-vip did its job, and IPVS redirected API requests to another control plane node that was “elected” as the new leader. All good so far.

However, when that control plane node came back up, it would appear in the “kubectl get node” output, but showed as “NotReady”, and it never seemed to become ready. It appeared that the network was not ready, and the calico-node pod was showing an error. I played around a bit, but couldn’t seem to clear the error.

One thing I tried was running Kubespray’s upgrade-cluster.yml with the --limit argument, specifying the node and one of the other control plane nodes (so that the control plane “facts” were gathered). The kube-vip pod for the node was still failing with a connection refused error. On the node, I stopped and removed the kube-apiserver container and then the kube-vip container, after which kube-vip no longer had any errors.

The only remaining issue was that ipvsadm on the node did not show a load-balancing entry for the VIP, and on the other two control plane nodes, the load-balancing entry for the VIP listed only their own two IPs. I didn’t try rebooting another control-plane node.
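When comparing nodes in a situation like this, it helps to pull the real servers behind each virtual service out of ipvsadm -Ln on each node. A minimal Python sketch, assuming the standard ipvsadm -Ln layout; the function name and sample addresses are hypothetical:

```python
def ipvs_real_servers(ipvsadm_output: str) -> dict[str, list[str]]:
    """Map each virtual service ('PROTO addr:port') to its real servers from `ipvsadm -Ln` output."""
    services: dict[str, list[str]] = {}
    current = None
    for line in ipvsadm_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP", "SCTP") and len(fields) >= 2:
            # A virtual service line, e.g. "TCP  192.168.1.100:6443 rr"
            current = f"{fields[0]} {fields[1]}"
            services[current] = []
        elif fields[0] == "->" and current is not None and len(fields) >= 2:
            # A real server line under the current service (the header row is skipped)
            services[current].append(fields[1])
    return services


# Made-up output from one control plane node
sample = """IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.1.100:6443 rr
  -> 192.168.1.11:6443            Local   1      0          0
  -> 192.168.1.12:6443            Masq    1      0          0
"""
print(ipvs_real_servers(sample))
```

Comparing the VIP’s list across the three control plane nodes would show which node was missing from the load-balancing entry.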

Category: bare-metal, Kubernetes, Raspberry PI | Comments Off