Major Viewmaster Refactoring
With the Viewmaster movie library, I had noticed that there are over 600 titles, and when displaying in “detail” mode, it was loading all those cover URLs. The thought was that maybe I could save the cover images locally, when movies are entered into the database, and I could load the existing ones via a migration.
I created a new table for the cover URLs and files, and realized that there was quite a bit of duplication in the database, and thought that I should try to normalize the data. Through the process of making new tables and trying to migrate, I uncovered several other issues.
Long story short, I ended up undoing everything, and then designing a new table with migration to address the issues and to try to minimize duplicate information in the database. In summary, these are the concerns/issues I found out in the process:
- IMDB information was duplicated across TV and movie series, multi-disk sets that happened to have the same IMDB info, and single movies where I had multiple formats (DVD, BR, 4K). I wanted to separate out the “shared” IMDB info to reduce duplication.
- For the TV series and multi-disk sets, the information recorded in IMDB (e.g. duration, release date) was wrong w.r.t. each disk, and often the titles needed to be different (indicating the season number or specific disk). This meant that some IMDB info needed to be “overridden”.
- I found a few movies where the IMDB did not have a cover URL, so I needed to create a dummy cover that could be used. Likewise, I hit one specialty disk, where there was no IMDB entry at all.
- The IMDB would provide multiple genres for most movies, and my database was set up for just having one. In addition, some of the genres were spelled differently (e.g. SCIENCE FICTION vs SCI-FI, WAR vs MILITARY) and some were not in my list of genres (e.g. SPORTS). Regardless of what I chose, I would like to know what choices were recommended by IMDB.
- Since IMDB was shared among movies, I needed to remove the database entry and the cover file, when the IMDB info was no longer used.
- By the end of the process, I did notice a few problems with performance, so I’ll have some follow-up work to try to fix that.
The Plan
Since there were lots of changes, I planned on bumping the version from 0.2.3 to 0.3.0. Here is the overall plan of attack that I had.
- Define a new table (ImdbInfo) that contained just the IMDB information that I did not expect to change, even on shared disks: the IMDB number, plot, actors, directors, cover URL and file.
- Because movies could override some IMDB info, I wanted to have the “original” stored in the new table, but keep modifiable copies in the Movie table. This includes duration, release date, rating, and title.
- The Movie table would still have disk specific info, like format, aspect ratio, audio, cost, good/bad indication, collection name (if applicable), and paid/gift indication.
- The Movie table would still have the selected genre (I had called it ‘category’), but the ImdbInfo table would contain the recommended genres for the IMDB title.
- As part of the migration process to add the new table, I would also download and save any cover files to the Persistent Volume used for static files.
You can clone the v0.3.0 code from GitHub to see all the changes from v0.2.3, and latest content.
Table Definition
Here is the new table definition for the IMDB information:
class ImdbInfo(models.Model):
    """IMDB information for a movie (or series of movies)."""

    title_name = models.CharField(
        max_length=60, help_text="Up to 60 characters for title. May be overridden."
    )
    release_date = models.IntegerField(
        help_text="Four digit year of release. May be overridden."
    )
    genres = models.CharField(help_text="List of genres applicable to the movie.")
    mpaa_rating = models.CharField(
        max_length=5,
        default="?",
        choices=RATING_CHOICES,
        help_text="Select the MPAA rating. May be overridden.",
    )
    run_time = models.TimeField(
        help_text="Duration in hh:mm format. May be overridden."
    )

    # These will be common to every movie with this IMDB #
    identifier = models.CharField(
        max_length=20, unique=True, help_text="IMDB movie ID."
    )
    plot = models.CharField(blank=True, default="", help_text="Plot summary.")
    actors = models.CharField(blank=True, default="", help_text="Top cast.")
    directors = models.CharField(blank=True, default="", help_text="Director(s).")
    cover_url = models.URLField(
        blank=True, default="", help_text="URL where poster image is located."
    )
    cover_file = models.ImageField(
        blank=True,
        null=True,
        upload_to="covers",
        storage=FileSystemStorage(allow_overwrite=True),
    )
Some important things to note here. First, the fields that will remain in the Movie table use different names. This is because multiple forms will be used at movie creation/edit and we need to be able to uniquely address the fields (there is no discriminator based on the form). The sizes and types of the fields are the same.
Second, for the cover file, I selected file system storage to allow overwriting. Otherwise, if you save a cover file with the same name, it will create a new file with a suffix. I saw that when first dealing with multiple disk sets, before I resolved how to ensure only one IMDB for them.
Migration
In the several failed attempts at doing this change, I realized that there were several movie genre types that I needed to translate to what I had, and I wanted to add the “SPORTS” genre. Before creating the new table definition, I modified the CATEGORY_CHOICES dict and then ran the ‘python manage.py makemigrations’ command to create the migration file to update the table. I ran “migrate” to effect the change to the database (no change to existing entries).
With that out of the way, I then built the needed new table definition and ran “makemigrations” again to create a file to add that table, and alter the Movie table to have a ForeignKey to the new ImdbInfo table. Then, I created custom code in the migration file to do the data migration. This was added as another operation, after the table changes were done:
migrations.RunPython(copy_imdb_info, migrations.RunPython.noop),
On a “migrate” it would run the code in copy_imdb_info(), and in a reverse migration, it wouldn’t do anything, and would proceed to remove the new table and the added ForeignKey in the Movie table.
For copy_imdb_info(), the code cycles through all the entries in the current Movie table.
For each movie, the IMDB ID would be identified, and if this was the first time encountered, a new ImdbInfo entry created. All the data from the entry would be obtained via API request (versus copying from the Movie, which may have customized info). Existing extraction functions were used to obtain the info, and the cover file would be stored locally. Lastly, the ID of the new or existing ImdbInfo entry would be stored in the Movie table’s new ForeignKey field, imdb_info.
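The "create once, then reuse" part of that loop can be sketched in plain Python. This is not the actual copy_imdb_info() from the migration: the dict-based movies, the "imdb_id" key, and the fetch_info callable are hypothetical stand-ins for the Movie rows and the existing API extraction functions.

```python
def link_movies(movies, fetch_info):
    """Attach one shared IMDB info object to every movie with the same ID.

    movies: list of dicts with a hypothetical "imdb_id" key.
    fetch_info: callable that fetches the info for an IMDB ID
                (stands in for the API request and cover download).
    Returns the mapping of IMDB ID to the info that was created.
    """
    created = {}
    for movie in movies:
        imdb_id = movie.get("imdb_id")
        if not imdb_id:
            # No IMDB entry for this disk, so there is nothing to link.
            movie["imdb_info"] = None
            continue
        if imdb_id not in created:
            # First time this ID is seen: fetch it once.
            created[imdb_id] = fetch_info(imdb_id)
        # New or existing entry, the movie points at the shared one.
        movie["imdb_info"] = created[imdb_id]
    return created
```

With this shape, a three-disk set sharing one IMDB ID triggers a single fetch, and all three movies end up referencing the same entry.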
For the movie genres, I had updated the list of genres, based on some new ones that I thought I’d want. Then, for the list of genres provided by IMDB (a comma separated string), I used a new filter_genres() function to extract each one, convert them to ones that matched the list that I had (renaming/substituting as needed), and then would re-build a comma separated string of the genres for storage in the new table.
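A minimal sketch of what such a genre filter could look like follows. The real filter_genres() is in the repo; the genre list, the alias table (including which spelling maps to which), and the choice to drop unknown genres are assumptions made here for illustration.

```python
# Hypothetical subset of the app's genres; the real CATEGORY_CHOICES differs.
KNOWN_GENRES = {"ACTION", "COMEDY", "DRAMA", "MILITARY", "SCI-FI", "SPORTS"}

# Assumed substitutions; the direction of each mapping is a guess.
GENRE_ALIASES = {
    "SCIENCE FICTION": "SCI-FI",
    "WAR": "MILITARY",
    "SPORT": "SPORTS",
}


def filter_genres(imdb_genres: str) -> str:
    """Normalize a comma separated genre string to the app's genre names."""
    result = []
    for raw in imdb_genres.split(","):
        name = raw.strip().upper()
        if not name:
            continue
        # Rename/substitute to the spelling used by the app.
        name = GENRE_ALIASES.get(name, name)
        # Assumption: unknown genres and duplicates are dropped, order kept.
        if name in KNOWN_GENRES and name not in result:
            result.append(name)
    return ", ".join(result)
```

Storing the result as a single string keeps the table simple, at the cost of having to split it again whenever the recommended genres are shown.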
Other Changes
With the database migrated (and we still have the movies with original data), I proceeded to change the code to use the new tables.
For the movie listing, the “show details” option now would show some information from the Movie table and some from the ImdbInfo table. I created a custom template tag that would return the URI for the cover. This would use the first of the following that applies: the cover file, if available; the cover URL, if available; a static image indicating there was no cover, if the cover was not valid; or a static image indicating there was no IMDB info for the movie. It would display a red line around the cover if it was using the cover via a URL and not a local file.
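The selection order for the cover can be sketched as a plain function. This is not the actual template tag: the ImdbStub class, the two static image paths, and the returned is_remote flag are hypothetical, and a real tag would be registered with Django's template library and read the fields from the ImdbInfo model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical static image locations.
NO_COVER_IMAGE = "/static/images/no-cover.png"
NO_IMDB_IMAGE = "/static/images/no-imdb.png"


@dataclass
class ImdbStub:
    """Stand-in for an ImdbInfo row; only the cover fields are modeled."""

    cover_file_url: str = ""  # URI of the locally stored file, if any
    cover_url: str = ""       # remote poster URL, if any


def cover_uri(imdb_info: Optional[ImdbStub]) -> Tuple[str, bool]:
    """Return (uri, is_remote); is_remote would drive the red border."""
    if imdb_info is None:
        return NO_IMDB_IMAGE, False
    if imdb_info.cover_file_url:
        return imdb_info.cover_file_url, False
    if imdb_info.cover_url:
        return imdb_info.cover_url, True
    return NO_COVER_IMAGE, False
```

Returning the flag alongside the URI lets the template pick the border style without repeating the same checks.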
For create/edit of movies, we would now have two forms. One for the movie, and one for the IMDB info. Several of the “shared” fields of the IMDB info, would be hidden fields, so that they are passed along.
On “GET” portion of create/edit, if there was an IMDB ID selected (via find, for create, or from the movie for edit), we would query the IMDB and populate the ImdbInfo form with the data. For the genres, if there is IMDB info, we build a list of those entries, followed by all the rest of the possible genres, separated by a dashed line. If there is no IMDB info, the full list of possible genres is displayed. If this was an edit of a movie, the previously selected value would be the default genre.
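Building the genre drop-down in that order might look like the sketch below. The function name, the separator entry, and the (value, label) tuple format are assumptions; the only behavior taken from the description is "IMDB genres first, dashed line, then the rest".

```python
from typing import List, Tuple

# Hypothetical separator entry shown between the two groups.
SEPARATOR = ("", "----------")


def build_genre_choices(imdb_genres: str, all_genres: List[str]) -> List[Tuple[str, str]]:
    """Return (value, label) choices with IMDB suggested genres listed first."""
    suggested = []
    for raw in imdb_genres.split(","):
        name = raw.strip()
        # Keep only genres the app knows about, without duplicates.
        if name in all_genres and name not in suggested:
            suggested.append(name)
    if not suggested:
        # No IMDB info: show the full list with no separator.
        return [(g, g) for g in all_genres]
    rest = [g for g in all_genres if g not in suggested]
    return [(g, g) for g in suggested] + [SEPARATOR] + [(g, g) for g in rest]
```

The previously selected genre for an edit would then be set as the form field's initial value, independent of where it lands in this list.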
On the “POST” portion of create/edit, there are four phases. First, we get the movie form and the movie identifier (or “0” if new). Second, we check to see if the user has requested to “clear” the IMDB info; otherwise we try to get the IMDB identifier (or “unknown” if there is none). Third, we get the ImdbInfo form data (could be an existing entry, new entry, or none).
Lastly, we validate and save. There are several cases that can occur, and different actions required:
Case | Action
There is no IMDB identifier specified. | If there was an existing movie, with IMDB info, note the entry |
New IMDB info and info is valid | Save new ImdbInfo entry. |
New IMDB info and info is invalid | Show form again, along with field errors |
Existing IMDB info, but no cover file | Save the cover file locally |
Existing IMDB info with cover file | No additional actions |
Movie info is not valid | Show form again, along with field errors |
The last step will be to save the movie, associating it with the corresponding IMDB info (or None), and then, if there was a previous IMDB info, check to see if it is referenced any more, and if not, delete the entry and remove the cover file.
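The cleanup of an unreferenced IMDB entry could be sketched as follows. The function name is hypothetical, and it assumes the ForeignKey on Movie has no related_name, so that Django's default reverse accessor movie_set applies; the actual code in the repo may differ.

```python
def remove_if_unused(imdb_info) -> bool:
    """Delete an ImdbInfo entry and its cover file when no movie uses it.

    Returns True if the entry was removed.
    """
    if imdb_info is None:
        return False
    # movie_set is the default reverse accessor for Movie.imdb_info
    # (assumption: no related_name was set on the ForeignKey).
    if imdb_info.movie_set.exists():
        return False
    if imdb_info.cover_file:
        # save=False: no need to re-save a row that is about to be deleted.
        imdb_info.cover_file.delete(save=False)
    imdb_info.delete()
    return True
```

This would be called with the previous IMDB info after the movie has been saved, so that the movie being edited no longer counts as a reference.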
In the GET processing and the form template, any differences in the shared IMDB settings that a movie may have overridden, are noted and displayed in the form, so that one can see when a movie has modified what was in the IMDB info. The field will have a red box border, to show that it was overridden. This applies to the release date, rating, and duration. The title is not highlighted.
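Detecting which fields a movie has overridden comes down to comparing field pairs. In this sketch the ImdbInfo names (release_date, mpaa_rating, run_time) come from the table definition, while the Movie side names (release, rating, duration) and the use of plain dicts are hypothetical.

```python
# (movie field, ImdbInfo field) pairs; the movie side names are made up.
FIELD_PAIRS = [
    ("release", "release_date"),
    ("rating", "mpaa_rating"),
    ("duration", "run_time"),
]


def overridden_fields(movie: dict, imdb_info: dict) -> list:
    """Return the movie field names whose values differ from the IMDB info."""
    return [
        movie_field
        for movie_field, imdb_field in FIELD_PAIRS
        if movie.get(movie_field) != imdb_info.get(imdb_field)
    ]
```

The resulting list can be passed to the template, which would add the red box border to each field named in it.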
There were also issues with displaying movies by genre/date/title and with case insensitive filtering, so those were corrected as well.
One thing to keep in mind is that, when switching between running in development mode and running in production mode, we are sharing the same database, so it is up to date. However, when creating cover files, they are stored in the PVC for production and on my development machine for development and can be out of sync.
I did the migration on my development machine, and did most testing there, and then copied the cover files to the running pod. For example, with a current app pod viewmaster-579bd8d869-ph4nq, I would do:
kubectl cp public/media/covers/* \
viewmaster/viewmaster-579bd8d869-ph4nq:/vol/web/media/covers/
The “viewmaster/” prefix on the pod name is the namespace.
Deployment Changes
During this process, I was having problems serving up the cover files, which were stored in the MEDIA_ROOT (an area in the project tree for development, and in /vol/web/ for the production pod).
I had done some AI queries, and found out a few things. First, I had MEDIA_URL set to “media/” and it should have been “/media/” so that it is an absolute path. There was the same issue with the current STATIC_URL.
Second, when running in production (debug=False), Django’s static clause for URL patterns, which maps URIs to the media area, does nothing; it only serves files when debug=True. It also sounded like Django and gunicorn (used for the production pod) are not very good at serving up static files and that one should use NGINX (or equivalent), which is better suited.
I used the AI prompting to guide me through how to modify my deployment so that the pod would now have both my Django container AND an NGINX container. Requests coming in, for content, would get dispatched to gunicorn and the Django app, and requests for static/media files would be handled directly by NGINX.
This required modifying the deployment YAML to include NGINX and reference the same PV for config and static files, changing the service to use port 80 and redirect to port 8080 for gunicorn, and providing an NGINX config YAML. I ended up breaking up django.yaml into parts, so to update my deployment, I would now run:
kubectl apply -f nginx-config.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
This is assuming that the “viewmaster” namespace was already created and I used it to modify the deployment. Obviously, I had rebuilt the app, with version changed to 0.3.0 (and updated the hard-coded version in movie-list.html), and pushed up the change.
A side benefit was that the app is accessible in production by the LoadBalancer IP, without using a port.
There was a pvc.yaml created as well, since the PVC creation was removed from the deployment.yaml. I already have the PVC, so it didn’t need to be used.
Of note, the deployment.yaml sets workingDir to change directory, and then invokes gunicorn with the desired arguments. The Dockerfile still does that same operation, for the case when we are in development mode and running locally (using Django’s server).
In urls.py, the following was added to the end of urlpatterns…
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
This allows the media area to be available, when running in development mode with debug=True.
I also tweaked the helper scripts as I needed to specify to use the “app” container, and not the “nginx” container.
Summary
I’ve been able to verify this working both in development and in production, making sure that cover files are generated on create (or edit, if missing), and deleted when no longer used.
TODOs
Here are the things I need to work on…
- Under the create/edit form, show cover file, if available. Currently, just fetches URL.
- Do a 0.3.1 release that has a migration to remove the unused fields in the Movie table.
- Measure the page loading time (using Lighthouse under Chrome – I think it is showing 6-7 seconds currently).
- Work on page load time improvements. AI has suggested caching, lazy loading, and turning on HTTP/2 as big hitters. Can also consider the Intersection Observer. Measure after changes.
- Cleanup, as there are some warnings reported with the code (like not specifying a DOCTYPE).
- Turn on HTTPS and add OTP to be able to access remotely.
- Use some of these techniques for other web apps that I have made.