First Impressions: Proxmox VE vs. TrueNAS SCALE

I've spent a few hours each on Proxmox VE and TrueNAS SCALE, the latter of which is now hosting both my personal media collection and a VM-based Plex Media Server view of it. Proxmox and TrueNAS are both very powerful pieces of software with long lists of features, and a few hours of exposure barely scratched the surface. They share a great deal of similarities: they are both based on Linux, they both use the KVM hypervisor for their virtual machines, and they both support software-based data redundancy across multiple storage drives. (No expensive RAID controllers necessary.) But I have found a distinct difference between them that I can sum up as this: TrueNAS SCALE is a great storage server with some application server capabilities, and Proxmox VE is a great application server with an optional storage server component.

After I got past a few of my beginner's mistakes, it was quick and easy to spin up a virtual machine with the Proxmox interface. I felt I had more options at my disposal, and I thought they were better organized than their TrueNAS counterparts. Proxmox also offers more granular monitoring of virtual machine resource usage, with per-VM views of CPU, memory, and network traffic. My pet feature, USB passthrough, allows adding and removing USB hardware from a live virtual machine at runtime under Proxmox. Doing the same under TrueNAS requires rebooting the VM before USB hardware changes are reflected. Another problem I experienced under TrueNAS was that my VM couldn't see the TrueNAS server itself on the network. ("No route to host") I worked around it by using another available Ethernet port on my server, but such an extra port isn't always available. Proxmox VMs could see their Proxmox host just fine over a single shared Ethernet port.

I was able to evaluate Proxmox on a machine with a single large SSD that hosts both Proxmox itself and its virtual machines. In contrast, TrueNAS requires a dedicated system drive plus separate data storage drives. This reflects its NAS focus (you wouldn't want to commingle storage and operating data) but it does mean evaluating TrueNAS requires a commitment of at least two storage devices versus just one for Proxmox.

But storage on TrueNAS is easy and, based on my years with TrueNAS CORE, dependable and reliable on redundant hardware. This is their bread and butter and it works well. In contrast, data storage in Proxmox is an optional component provided via Ceph. I've never played with Ceph myself but, based on skimming that documentation, there's a steeper learning curve than setting up redundant storage with TrueNAS. Ceph seems to be more powerful and can scale up to larger deployments, but that means more complexity at the small end before I can get a minimally viable setup suitable for home use.

My current plan is to skip Ceph and continue using TrueNAS SCALE for my data storage needs. I will also use its KVM hypervisor to run a few long-running virtual machines hosting associated services. (Like Plex Media Server for my media collection.) For quick experimental virtual machines that I expect to have a short lifespan, or for those that require Proxmox-specific features (adding/removing USB hardware live, granular resource monitoring, etc.) I'll run them on my Proxmox evaluation box. Over the next few months and years, I expect to be better able to evaluate which tool is better for which job.

Plex Media Server in TrueNAS SCALE Virtual Machine

After trying and failing to use the default method to run Plex Media Server via TrueNAS SCALE’s “App” catalog, I’m falling back to a manual route: spinning up a virtual machine with Ubuntu Server 22.04 to run Plex Media Server with my own preferred settings. I suppose I could learn about Helm charts so I could write one to run Plex my way, but at the moment I’m not too motivated to do so.

Recently I had been running Plex in a Docker container, which resolved my old gripes about FreeNAS plug-ins and FreeBSD freshports versions of Plex falling out of date. Plex developers maintain the container themselves and it gets updated in sync with official releases. A really nifty feature of their Docker container is that it doesn’t really have Plex in it: it has code to download and run Plex. In order to pick up an updated version of Plex, I don’t have to pull a new container. I just have to stop and restart it, and it downloads the latest and starts running it.

One subtlety of running Plex in Docker is the warning that I shouldn't use a mapped network drive for server configuration data. I had thought it would be a good way to keep my Plex database constantly backed up on a TrueNAS ZFS redundant drive array, but I abandoned that plan after reading a scary disclaimer on the Docker repository README: "Note: the underlying filesystem needs to support file locking. This is known to not be default enabled on remote filesystems like NFS, SMB, and many many others. The 9PFS filesystem used by FreeNAS Corral is known to work but the vast majority will result in database corruption. Use a network share at your own risk."

I could install Docker Engine in my virtual machine for Plex and repeat my configuration, but it seems weird to have nested virtualization mechanisms. (Docker inside KVM.) So this time I will run Plex as a native service installed in my virtual machine, from the Plex-maintained official repository. Migrating my Plex database started by finding the correct directory in both my existing Docker container volume and my new Ubuntu Server virtual machine at "/var/lib/plexmediaserver/". Copying the files directly was not sufficient (Plex Media Server would fail to launch) because I forgot to update file permissions as well. That was fixed by running "chown -R plex:plex Library" on the library database directory tree.
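Roughly, the whole move looked like this. (A sketch, not exact commands: the source path is my placeholder for wherever the old Docker volume contents were staged on the new VM.)

# Stop the Plex service so the database isn't touched mid-copy
sudo systemctl stop plexmediaserver

# Copy the library tree from the staged copy of the old Docker volume (placeholder path)
sudo cp -a /path/to/old/docker/volume/Library /var/lib/plexmediaserver/

# Fix ownership so the plex service account can read its own database
sudo chown -R plex:plex /var/lib/plexmediaserver/Library

sudo systemctl start plexmediaserver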

One unexpected obstacle is that a VM running under TrueNAS SCALE couldn't see the TrueNAS server itself. It didn't matter whether I tried NFS mapping, SSH, or HTTP: the server address was unreachable and timed out with the error "No route to host". I confirmed this is not a general KVM issue, as my Ubuntu Desktop laptop with KVM and my Proxmox VE box had no such problems. I hypothesize it has something to do with how TrueNAS SCALE configured the network bridge for virtual machines, but I don't know enough Linux networking to know for sure. As a workaround, I could bind my virtual machine to the Realtek chipset Ethernet port integrated on my server's motherboard. TrueNAS runs on an Intel NIC (network interface card) because FreeNAS didn't support the Realtek onboard port years ago. Now under TrueNAS SCALE I have access to both ports, so I run the TrueNAS server on the Intel NIC and bind my virtual machines to the motherboard's onboard Ethernet. Not the most satisfying solution, but it uses what I have on hand and is good enough.

TrueNAS and Plex Media Server

A painless and uneventful TrueNAS migration from FreeBSD-based CORE to their newer Linux-based SCALE is the latest in a long series of successful operations with this line of software. I decided to build a machine for home storage about six years ago and chose TrueNAS (called FreeNAS at the time) and it has worked flawlessly as a storage appliance ever since. There has been no loss of data from software failures, and it has successfully recovered from multiple hard drive failures. Based on my experience, I can heartily recommend TrueNAS for reliable data storage.

In addition to simple and reliable data storage capabilities, FreeNAS/TrueNAS also has an application services side, and I've had a bumpier relationship with those features. Among the data stored on my TrueNAS box is my personal music collection, and I've been using Plex as one of several ways I consume my media. Plex Media Server has thus been my guinea pig for TrueNAS application hosting, and I've found problems with every approach so far. I've tried running it as a FreeNAS Plugin (stale and infrequently updated), as a self-managed application inside a FreeBSD jail (less stale but still a significant delay), and as a Docker container running inside an Ubuntu virtual machine (timely updates, but the bhyve hypervisor has problems with Ubuntu). Now that I've migrated to TrueNAS SCALE, I have new options to try.

Plex Media Server is one of the options under TrueNAS SCALE’s “Apps” menu representing a collection of Helm charts. I have some vague idea this mechanism is related to Kubernetes, but I haven’t invested time into learning the details as I just wanted to use it as a point-and-click user. I tried to create an instance with “Enable Host Path for Plex Data Volume” option pointing to my existing media share. My attempt failed with the following error:

Error: [EINVAL]
chart_release_create.app
VolumeMounts.data.hostPathEnabled.hostPath: Invalid mount path. 
Following service(s) use this path: SMB Share, NFS Share

A bit of web searching found this is expected behavior: in order to avoid any potential problems with conflicting operations, the Helm chart verifies the app has exclusive control of all its volumes before proceeding. This is a reasonable thing to do for most applications, but unnecessarily cautious for media data on a read-only network share. Furthermore, it is not compatible with my media consumption pattern, which requires leaving SMB and NFS sharing running. Thus, I add "Plex Helm Chart" to the running list of TrueNAS application services I've tried without success.

I will now try a different approach: create a virtual machine for running Plex.

Successful TrueNAS CORE to SCALE Migration

I took a quick test drive of Proxmox VE and first impressions looked good, including a feature that I considered important: USB passthrough (also known as USB device redirection) for virtual machines. It was something I had a chance to try earlier and really liked. Thinking back to that experiment, I remembered my motivation for investigating the KVM hypervisor was that I had problems with the bhyve hypervisor of TrueNAS CORE and wanted to try something else.

Since I already had TrueNAS up and running, I looked into switching from TrueNAS CORE to TrueNAS SCALE. The difference between the sibling products is that CORE was built on FreeBSD and SCALE was built on Linux. Moving to Linux also meant a change from the bhyve hypervisor I've had problems with to the KVM hypervisor that has worked well for me. However, when I last looked at TrueNAS SCALE, it was at version 22.02 ("Angelfish") and it didn't support USB passthrough. Checking the issues database after my Proxmox test drive, I saw that USB passthrough is now in 22.12 ("Bluefin"). This feature made it compelling enough for me to migrate.

TrueNAS documentation includes a page dedicated to this process: Migrating from TrueNAS CORE to SCALE. I found it interesting that the migration is only supported one-way: CORE to SCALE and not the reverse. This strongly hints that TrueNAS CORE is on its way out, so it is good that I have motivation to move now instead of being forced to later.

I only had a few minor tasks to prepare for the migration, as I hadn't been using any of the TrueNAS features that would make migration challenging. The upgrade process itself was impressively seamless. The documentation page has paragraphs of information about upgrading using an ISO file or a "Manual Update" file, but the easiest method was given only a single sentence: "The easiest method is to upgrade from the CORE system UI, but your system must have the CORE 13.0 major release installed to use this method."

I am running CORE 13, so I went to the software updates page where I usually pick up system updates. There is a drop-down box where I can select an update train, and one of the options is to migrate to SCALE. I selected that option, confirmed, and within half an hour I was running TrueNAS SCALE. No fuss, no muss. All of my ZFS data volumes carried over, as did my network shares. That's all I could ask for in a NAS migration. I was very impressed at how smooth it was.

After I confirmed all of my network shares were working as expected, I created a new virtual machine to test USB passthrough. Results were mixed. The good news is that USB passthrough exists and works. The bad news is that USB device configuration changes for virtual machines don't seem to take effect immediately. I needed to reboot the virtual machine before a newly added USB device was visible and accessible inside the VM.

This is fine for mainstream scenarios like USB license keys that stay plugged in. It doesn't work so well for ESPHome, where I want to plug in an ESP8266/ESP32 microcontroller and update it. Still, it's better than nothing. Since I rarely perform the initial USB firmware flash, I guess having to reboot the VM isn't a huge deal. But it does mean I'm going to hold off migrating my Home Assistant OS VM; I'll migrate something else first to test the waters.

My First Proxmox VM

After using an old laptop to dabble with running virtual machines under the KVM hypervisor, I've decided to dedicate a computer to virtual machine hosting. The heart of the machine is the CPU, memory, and an M.2 SSD, all mounted to a small Mini-ITX mainboard. On these pages, they were formerly the core of Luggable PC Mark II, which was decommissioned and disassembled two years ago. Now it will run Proxmox VE (Virtual Environment), which offers both virtual machine and container hosting managed with a browser-based interface. Built on top of a Debian Linux distribution, Proxmox uses KVM as its virtual machine hypervisor, which I've used successfully before.

Enterprise Subscription

I downloaded Proxmox VE 7.4, the latest as of this writing, and its installation was uneventful. Its setup procedure was no more complex than an Ubuntu installation. Once up and running, the first dialog box to greet me was a "You haven't paid for an Enterprise subscription!" reminder. This warning dialog box repeats every time I log on to the administration dashboard. Strictly speaking, a subscription is optional for running the core features of Proxmox: switching to the no-subscription repository keeps those features available, just without the more thoroughly tested packages and service level agreement of the Enterprise repository. If I end up loving Proxmox and using it on an ongoing basis, I may choose to subscribe in the future. In the meantime, I have to dismiss this dialog every time I log on, even though I'm on the no-subscription repository. I understand the subscription is what pays the bills for this project so I don't begrudge them for promoting it. A single dialog box per logon isn't overly pushy by my standards.

ISO Images for Operating System Installation

My first impression of the administration interface is that it is very tightly packed with information. A big blue "Create VM" button in the upper right corner made it obvious how to create a new virtual machine, but it took some time before I figured out how to install an operating system on it. During VM creation there's a dialog box for installation media, but I couldn't upload an Ubuntu Server 22.04 ISO on that screen. It took some poking around before I found I needed to click on the Proxmox node representing my computer, click on its local storage, and at that point I could upload an ISO. Or, conveniently, I could download from a URL if I didn't have an ISO to upload. I could even enter the SHA256 checksum to verify integrity of the download! That's pretty slick. (Once I'd found it.)
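For what it's worth, the same result can be reached from the Proxmox host's shell. A rough sketch, assuming the default "local" storage path for ISO images; the URL and filename here are examples, not necessarily the exact release you want:

# Download the installer ISO directly into the default "local" ISO storage
cd /var/lib/vz/template/iso
wget https://releases.ubuntu.com/22.04/ubuntu-22.04-live-server-amd64.iso

# Compare against the published checksum
sha256sum ubuntu-22.04-live-server-amd64.iso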

Helpful Help

After an installation ISO was on my Proxmox host, everything else went smoothly. This was helped tremendously by the fact that every part of the Proxmox user interface has a link to its corresponding section in the online HTML documentation. I've learned to like this approach, because it lets me see that information in the context of other related information in the same section. In contrast, clicking help in TrueNAS gives me just a short description; if that's not enough, I've got to hit the web and search on my own.

USB Passthrough: Success

Once my virtual machine was up and running, I tested my must-have feature: USB passthrough. While the virtual machine is up and running, I can go into the Proxmox interface and add a USB passthrough device. It immediately showed up in the virtual machine as if I had just hot-plugged the USB hardware into a port. Excellent! This brings Proxmox to parity with my existing Home Assistant VM setup using Ubuntu + Virtual Machine Manager, and ahead of TrueNAS SCALE 22.02 ("Angelfish"), which lacked USB passthrough.

When I looked at TrueNAS SCALE earlier with an eye to running my Home Assistant VM, I found the TrueNAS bug database entry tracking the USB passthrough feature request. Revisiting that item, I saw USB passthrough has since been added to TrueNAS SCALE 22.12 (“Bluefin”). Well, now. That means it’s time for another look at TrueNAS SCALE.

Hello Proxmox Virtual Environment

Last time I played with virtualization, my motivation was to run Home Assistant Operating System (HAOS) within a hypervisor that could reliably reboot my virtual machines. I was successful running HAOS under KVM (kernel-based virtual machine) on an old laptop. A bonus feature of KVM was USB passthrough, allowing a virtual machine to access USB hardware. This allowed ESPHome to perform the initial firmware flash. (After that initial flash, ESPHome can update wirelessly, but the first flash must use a USB cable.) Once I had a taste of USB passthrough, it was promoted from a "bonus" to a "must-have" feature.

I wasn't up for learning the full suite of command-line tools for managing KVM, so I installed Virtual Machine Manager for a friendlier graphical user interface. Once everything was set up for HAOS, it was easy for me to add virtual machines for experiments. Some were quick and fleeting, others lasted weeks or months. And when I was done with an experiment, I could delete those virtual machines just as easily. I could install software within a VM without risk of interference from earlier experiments, because they were isolated in entirely different VMs. I now understand the appeal of having a fleet of disposable virtual machines!

With growing VM use, it was inevitable I'd start running into limitations of an old laptop. I had expected the processor to be the first barrier, as it was a meager Core i5-3320M with two hyperthreaded cores. But I hadn't been running processor-intensive experiments, so that CPU was actually fine. A standard 2.5″ laptop hard drive slot made for easy upgrades in SSD capacity. The biggest barrier turned out to be RAM: there was only 4GB of it, and it doesn't make much economic sense to buy DDR3 SODIMMs to upgrade this old laptop. Not when I already have more capable machines on hand I could allocate to the task.

This laptop screen has only 1366×768 resolution, which was a minor impediment. In its use as a KVM host, I only ever have to look at that screen when I bring up Virtual Machine Manager to perform VM housekeeping. (Tasks I have yet to learn to do remotely with virsh commands over ssh.) For such usage, the screen is serviceable but also cramped. I frequently wished I could manage KVM remotely from my desktop with its large monitor.

Now that I'm contemplating setting up a dedicated computer, I decided to try something more task-focused than the Ubuntu Desktop + Virtual Machine Manager combination I have been using. My desire to dedicate a computer to hosting a small number of virtual machines under the KVM hypervisor, managed over the local network, led me to Proxmox Virtual Environment. I learned about Proxmox VE when an acquaintance posted about setting it up on their machine a few weeks ago. As I read through the Proxmox website I thought, "That would be interesting to investigate later."

It is time.

Notes on Automating Ubuntu Updates

I grew up when computers were major purchases with four digits on the price tag. As technology advanced, perfectly capable laptops could be found for three digits. That was a major psychological barrier in my mind, and now I have another adjustment to make: today we can get a full-fledged PC (new or used) for well under a hundred bucks. Affordable enough that we can set up these general-purpose machines for a single specialized role and leave them alone.

I've had a few Raspberry Pis around the house running specialized tasks like OctoPi and a TrueNAS replication target, and I've always known that I've been slacking off on keeping those systems updated. Security researchers and malicious actors are in a never-ending game to one-up each other, and it's important to keep up with security updates. The good news is that Ubuntu distributions come with an automated update mechanism called unattended-upgrades, so many security patches are automatically applied. However, its default settings only cover critical security updates, and sometimes those need a system reboot before taking effect. This is because Ubuntu chose default behavior that is least disruptive to actively used computers.

But what about task-specific machines that see infrequent user logins? We can configure unattended-upgrades to be more aggressive. I went searching for more information and found a lot of coverage on this topic. I chose to start with the very old and frequently viewed AskUbuntu thread "How do I enable automatic updates?" The top two answer links lead to the "AutomaticSecurityUpdates" page on help.ubuntu.com, and to "Automatic updates" in the Ubuntu Server package management documentation. Browsing beyond official Ubuntu resources, I found "How to Install & Configure Unattended-Upgrades on Ubuntu 20.04" on LinuxCapable.com to be a pretty good overview.

For my specific situation, the highlights are:

  • Configuration file is at /etc/apt/apt.conf.d/50unattended-upgrades
  • Look at the Allowed-Origins entry up top. The line that ends with “-security” is active (as expected) and the line that ends with “-updates” is not. Uncomment that line to automatically pick up all updates, not just critical security fixes.
  • In order to pick up fixes that require a reboot, let unattended-upgrades reboot the machine as needed by setting "Unattended-Upgrade::Automatic-Reboot" to "true". (See the sketch after this list.)
  • (Optional) For computers that sleep most of the day, we may need to add an entry to the root cron job table (sudo crontab -e) to run /usr/bin/unattended-upgrade at a specified time within the machine's waking time window.
  • (Optional) There are several lines about automatically cleaning up unused packages and dependencies. Setting them to “true” will reduce chances of filling our disk.
  • Log files are written to directory /var/log/unattended-upgrades
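Putting those bullets together, the relevant parts of /etc/apt/apt.conf.d/50unattended-upgrades end up looking something like this. (A sketch; exact origin strings and surrounding entries vary by Ubuntu release.)

Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}-security";
        "${distro_id}:${distro_codename}-updates";
};

// Clean up packages that are no longer needed
Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
Unattended-Upgrade::Remove-Unused-Dependencies "true";

// Reboot automatically when an update requires it
Unattended-Upgrade::Automatic-Reboot "true";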

Linux Shell Control of Sleep and Wake

I've extracted the 3.5″ SATA HDD from a Seagate Backup+ Hub external USB hard drive and installed it internally in a desktop PC tower case. I configured the PC as a TrueNAS replication target so it will keep a backup copy of data stored on my TrueNAS array. I couldn't figure out how to make it "take over" or "continue" the existing replication set already on this disk from its Raspberry Pi days, so I created an entirely new ZFS dataset instead. It's a backup anyway and I have plenty of space.

But replication only happens once a day for a few minutes, and I didn't want to keep the PC running around the clock. I had automated my Raspberry Pi's power supply via Home Assistant, but that complexity is unnecessary for a modern PC, which includes the low-power sleep modes missing from a (default) Raspberry Pi. I just needed to figure out how to access that capability from the command line, and I found an answer with rtcwake and crontab.

rtcwake

There are many power-saving sleep modes available in the PC ecosystem, not all of which run seamlessly under Linux, as each requires some level of hardware and/or software driver support. Running rtcwake --list-modes is supposed to show what's applicable to a piece of hardware. However, I found that even though "disk" (hibernate to disk) was listed, my attempt to use it merely caused the system to become unresponsive without going to sleep. (I had to reset the system.) I then tried "mem" (suspend system, keep power only to memory) and that seemed to work as expected. Your mileage will vary depending on hardware. I can tell my computer to sleep until 11:55PM with:

sudo rtcwake --mode mem --date 23:55

hwclock

The command above allowed me to put the computer to sleep and schedule a wake for five minutes before midnight. On my machine, it displayed the target time and went to sleep. But the listed target time was not 23:55! I thought I did something wrong, but after a bit of poking around I realized I didn't. I wanted 23:55 my local time, and Ubuntu had set up my PC's internal hardware clock to the UTC time zone. The listed target time was relative to the hardware clock's UTC time. To check our current local time zone we can run timedatectl. To see the current hardware clock we can run this command:

sudo hwclock --show --verbose

I wasn’t surprised that putting the computer to sleep required “sudo” privileges, but I was surprised to see that hwclock needed that privilege as well. Why is reading the hardware clock important to protect? I don’t know. Sure, I can understand setting the clock may require privileges, but reading? timedatectl didn’t require sudo privileges to read. So hwclock‘s requirement was a surprise.

ssh

Another consequence of running rtcwake from an ssh session is that a sleeping computer leaves my ssh prompt hanging. It will eventually time out with "broken pipe" but if I want to hurry that along, there's a special key sequence to terminate an ssh session immediately: press the <Enter> key, then type ~. (tilde symbol followed by period.)

crontab

But I didn't really want to run the command manually, anyway. I want to automate that part as well. In order to schedule a job to execute that command at a specific time and interval, I added it to the cron job table. Since I need root privileges to run rtcwake, I had to add this line to the root user's table with "sudo crontab -e":

10 0 * * * rtcwake --mode mem --date 23:55

The first number is minutes, the next number hours. "10 0" means to run this command ten minutes after midnight, which should be long enough for TrueNAS replication to complete. The three asterisks mean every day of the month, every month, and every day of the week. So "10 0 * * *" translates to "ten minutes after midnight every day," putting this PC to sleep until five minutes before midnight. I chose five minutes because it should be more than enough time for the machine to become visible on the network for TrueNAS replication. When this all works as intended (there have been hiccups I haven't diagnosed yet) this PC, which usually sits unused, wakes up for only fifteen minutes a day instead of wasting power around the clock.

Notes from ZFS Adventures for TrueNAS Replication

My collection of old small SSDs played a game of musical chairs to free up a drive for my TrueNAS replication machine, a process that was an opportunity for hands-on time with some Linux disk administration tools. Now that I have my system drive up and running on Ubuntu Server 22.04 LTS, it's time to wade into the land of ZFS again. It's been long enough that I had to refer to documentation to rediscover what I needed to do, so I'm taking down these notes for when I need to do it again.

Installation

ZFS tools are not installed by default on Ubuntu 22.04, and there seem to be two separate packages for ZFS. I don't understand the tradeoffs between those two options; I chose to sudo apt install zfsutils-linux because that's what Ubuntu's ZFS tutorial used.

Creation

Since my drive was already set up as a replication storage drive, I didn't have to create a new ZFS pool from scratch. If I did, though, here are the steps (excerpts from the Ubuntu tutorial linked above):

  • Either “fdisk -l” or “lsblk” to list all the storage devices attached to the machine.
  • Find the target device name (example: /dev/sdb) and choose a pool name (example: myzfs)
  • “zpool create myzfs /dev/sdb” would create a new storage pool with a single device. Many ZFS advantages require multiple disks, but for TrueNAS replication I just output to a single drive.

Once a pool exists, we need to create our first dataset on that pool.

  • “zfs create myzfs/myset” to create a dataset “myset” on pool “myzfs”
  • Optional: "zfs set compression=lz4 myzfs/myset" to enable LZ4 compression on the specified dataset.

Maintenance

  • “zpool scrub myzfs” to check integrity of data on disk. With a single drive it wouldn’t be possible to automatically repair any errors, but at least we would know that problems exist.
  • “zpool export myzfs” is the closest thing I found to “ejecting” a ZFS pool. Ideally, we do this before we move a pool to another machine.
  • “zpool import myzfs” brings an existing ZFS pool onto the system. Ideally this pool had been “export”-ed from the previous machine, but as I found out when my USB enclosure died, this was not strictly required. I was able to import it into my new replication machine. (I don’t know what risks I took when I failed to export.)
  • “zfs list -t snapshot” to show all ZFS snapshots on record.

TrueNAS Replication

The big unknown for me is figuring out permissions for a non-root replication user. So far, I've only had luck doing this with the root account on the replication target, which is bad for many reasons. But every time I tried to use a non-root account, replication failed with the error umount: only root can use "--types" option

  • On TrueNAS: System/SSH Keypairs. “Add” to generate a new pair of private/public key. Copy the public key.
  • On replication target: add that public key to /root/.ssh/authorized_keys (a sketch of this step follows the list)
  • On TrueNAS: System/SSH Connections. “Add” to create a new connection. Enter a name and IP address, and select the keypair generated earlier. Click “Discover Remote Host Key” which is our first test to see if SSH is setup correctly.
  • On TrueNAS: Tasks/Replication Tasks. “Add” to create a replication job using the newly created SSH connection to push replication data to the zfs dataset we just created.
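As referenced above, the authorized_keys step and a manual sanity check of the SSH leg look roughly like this. (A sketch: "backuptarget", the key path, and the key text itself are my placeholders.)

# On the replication target, as root: append the public key copied from TrueNAS
mkdir -p /root/.ssh && chmod 700 /root/.ssh
echo "ssh-ed25519 AAAA...example... replication@truenas" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

# From a TrueNAS shell: confirm key-based login works before creating the task
ssh -i /path/to/private/key root@backuptarget zfs list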

Monitor Disk Activity

The problem with an automated task going directly to root is that I couldn't tell what (if anything) was happening. There are several Linux tools to monitor disk activity. I first tried "iotop" but was unhappy with the fact that it requires admin privileges, and that this is not considered a bug. ("Please stop opening bugs on this.") Looking for an alternative, I found this list and decided dstat was the best fit for my needs. It is not installed on Ubuntu Server by default, but I could run sudo apt install pcp to install it, followed by dstat -cd --disk-util --disk-tps to see the activity level of all disks.

Notes on Linux Disk Tools

I am setting up an old PC as a TrueNAS replication target to back up data on my drive array. Fitting a modern SSD into the box was only part of the challenge; I also needed an SSD to put in it. This is a problem easily solved with money, because I don't need a big system drive for this task and we live in an era of 256GB SSDs on sale for under $20.(*) But where's the fun in that? I already have some old and small SSDs, I just need to play a bit of musical chairs to free one up.

These small drives are running various machines in my hoard of old PC hardware: 64-bit capable machines run Ubuntu LTS and 32-bit only hardware runs Raspberry Pi Desktop. Historically they were quite… disposable, in the sense that I usually wipe the system and start fresh whenever I want to repurpose them. This time is different: one of these is currently a print server, turning my old Canon imageCLASS D550 laser printer into a network-connected printer. Getting Canon's Linux driver up and running on this old printer was a miserable experience. Canon has since updated the imageCLASS D550 Linux driver so things might be better now, but I didn't want to risk repeating that experience. Instead of wiping a disk and starting fresh, I took this as an opportunity to learn and practice Linux disk administration tools.

Clonezilla

My first attempt tried using Clonezilla Live to move my print server from one drive to another. This failed with errors that scrolled by too fast for me to read. I rediscovered the “Scroll Lock” key on my keyboard to pause text scrolling so I could read the errors: partition table information was expected by one stage of the tool but was missing from a file created by an earlier stage of the tool. I have no idea how to resolve that. Time to try something else.

dd

I decided it was long overdue for me to learn and practice using the Linux disk tool dd. My primary reference is the Arch Linux Wiki page for dd. It's a powerful tool with many options, but I didn't need anything fancy for my introduction: I just wanted to directly copy from one drive to another (larger) drive. To list all of my installed storage drives, I knew about fdisk -l, but this time I also learned of lsblk, which doesn't require entering the root password before listing all block storage device names and their capacities. Once I figured out the name of the source (/dev/sdc) and the destination (/dev/sde) I could perform a direct copy:

sudo dd if=/dev/sdc of=/dev/sde bs=512K status=progress

The "bs" parameter is "block size" and apparently the ideal value varies depending on hardware capabilities. It defaults to 512 bytes for historical reasons, which is apparently far too small for modern hardware. I bumped it up several orders of magnitude to 512 kilobytes without really understanding the tradeoffs involved. "status=progress" prints the occasional status report so I know the process is ongoing, as it can take some time to complete.

gparted

After the successful copy, I wanted to extend the partition so my print server can take advantage of new space. Resizing the partition with Ubuntu’s “disks” app failed with an error message “Unable to satisfy all constraints on the partition.” Fortunately, gparted had no such complaints, and my print server was back up and running with more elbow room.

Back to dd

Before I erase the smaller drive, though, I thought I would try making a disk image backup of it. If the Canon driver installation had been painless, I would not have bothered: in case of SSD failure, I would just replace the drive, reinstall Ubuntu, and set up a new print server. But the Canon driver installation was painful, and I wanted an image to restore if needed. I went about looking for how to create a disk image and, in the Linux world of "everything is a file," I was not too surprised to find it's a matter of using a file name (~/canonserver.img) instead of a device name (/dev/sde) for dd output.

sudo dd if=/dev/sdc of=~/canonserver.img bs=512K status=progress

gzip and xz

But that raw disk image file is rather large: exactly the size of the source drive. (80GB in my case.) To compress this data, the Arch Linux Wiki page on dd had examples of how to pipe dd output into gzip for compression. Following those directions worked fine, but I noticed Ubuntu's "disks" app recognized img.xz natively as a compressed disk image file format and not img.gz. Looking into that xz suffix, I learned xz is a different compression tool analogous to gzip, and I could generate my own img.xz image by piping dd output into xz, which in turn emits its output into a file, with the following command:

sudo dd if=/dev/sdc bs=512K status=progress | xz --compress -9 --block-size=100MiB -T4 > ~/canonserver.img.xz

I used the xz parameter "-9" for maximum compression. "-T4" means spinning up four threads to work in parallel, as I was running this on a quad-core processor. "--block-size=100MiB" sets how big a chunk of data each thread receives to work on.

I used a spinning-platter HDD as a test output device and verified that restoring this compressed image worked. Now I need to move this file to my TrueNAS array for backup, kind of bringing the project full circle. At 20GB, it is far smaller than the raw 80GB file but still nontrivial to move.
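For the record, the restore test was essentially the reverse pipe. (A sketch: /dev/sdX stands in for the actual destination drive, and everything on that drive gets overwritten.)

xz --decompress --stdout ~/canonserver.img.xz | sudo dd of=/dev/sdX bs=512K status=progress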

gio

I tried to mount my TrueNAS SMB share as a CIFS filesystem but kept running into errors: it would mount and I could read files, I just couldn't write any. After several failures I started looking for an alternative and found gio.

gio mount --anonymous "smb://servername/sharename"
gio copy --progress ~/canonserver.img.xz "smb://servername/sharename/canonserver.img.xz"

OK, that worked, but what did I just use? The name "gio" is far too generic. My first search hit was a "Cross-Platform GUI for Go," which is definitely wrong. My second hit, "GNOME Input/Output," might be correct or at least related. As a beginner this is all very fuzzy; perhaps it'll get better with practice. For today I have an operating system disk up and running, so I can move on to my ZFS data storage drive.

Local Development Web Host nginx Docker Container

During the course of Codecademy’s skill path for website publishing, we are given several off-platform assignments. “Off-platform” in this context means we are to build a website on our own using something outside Codecademy’s in-browser learning environment. I decided to put these assignments on my GitHub account, because then it’s easy to publish them via GitHub Pages. (When I decided this, I hadn’t realized GitHub Pages would be the explicit focus for one of the assignments.) But there’s a several-minute delay between pushing git commits and seeing those changes reflected on GitHub Pages. So I also wanted a local development web host for immediate feedback as I work. I decided to try using nginx for this purpose.

Local development web hosting is just about the lightest duty workload possible for web server software, so using nginx is sheer overkill. The renowned speed and response of nginx running high traffic websites is completely wasted serving my single browser. Furthermore, some of nginx's performance comes from its high-performance caching system, and I wanted to turn that off as well. Running nginx and not caring about cache is like buying a Toyota Prius and not caring about fuel efficiency. Such is the contradiction of using nginx as a local development web host. I will be making many changes and I want to see their effect immediately. I don't want to risk looking at results from a stale cached copy.

The reason I'm using this overkill solution (or arguably the wrong tool for the job) is that I hoped it would give me a beginner's level view of working with nginx. The easy part comes from the fact that nginx distributes their code as a Docker container, so I could quickly launch an instance of "nginx:stable-alpine" and play with it. According to the tagging schema described on the nginx Docker Hub page, "stable" represents the latest stable release, which is fine by me as I don't need the latest features. And "alpine" refers to a container built on top of the Alpine distribution of Linux, with a focus on minimal size and complexity.

To disable caching, I copied the default configuration file (/etc/nginx/nginx.conf) out of the nginx container so I could add a few lines to it: one to turn off nginx server-side caching (from nginx documentation) and another to ask the browser not to cache (from this StackOverflow post.)

    # Ask server not to cache
    proxy_no_cache $http_pragma $http_authorization;

    # Ask browser not to cache
    add_header 'Cache-Control' 'no-cache, no-store, must-revalidate';

After editing, I use Docker to map my modified version over the default. I don't think this is best Docker practice, but I'm focused on "easy" right now. I think the "right" way to do this is to build my own Docker container on top of the nginx release with the modified configuration file baked in, something like what's described in this person's blog post.
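In a compose file, that override might look something like this. (A sketch with placeholder service name and paths; my actual files are linked below.)

version: "3.8"

services:
  devserver:
    image: nginx:stable-alpine
    ports:
      - 18080:80
    volumes:
      # Override the default configuration with the edited copy
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      # Serve the project directory as the web root
      - .:/usr/share/nginx/html:ro

With the project directory mounted as the web root, edits show up on the next browser refresh.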

I specified the settings I typically use in a Docker Compose file. Now all I need to do is to go into my project directory and run “docker compose up” to have a non-caching local development web host. To double check, I used curl -I (uppercase i) to verify my intended Cache-Control headers have been added.

$ curl -I http://localhost:18080
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Wed, 09 Nov 2022 20:07:11 GMT
Content-Type: text/html
Content-Length: 4868
Last-Modified: Wed, 09 Nov 2022 18:28:46 GMT
Connection: keep-alive
ETag: "636bf15e-1304"
Cache-Control: no-cache, no-store, must-revalidate
Accept-Ranges: bytes

Looks good! My modified nginx.conf and my docker-compose.yml files are publicly available from my Codecademy HTML off-platform projects repository. After this work getting nginx set up for local hosting and GitHub Pages set up for global hosting, it’s time to jump into my Codecademy off-platform assignments!

Mid 2022 Snapshot of Unity DOTS Transition

As a curious hobbyist, I can learn Unity at my own pace, choosing topics that interest me. More importantly, I have the luxury of pausing when I'm more interested in learning something else. There's no game development shipping deadline to meet, just a few Unity projects for fun here and there. This meant I first learned about Unity DOTS at the end of 2021, and I've had to catch up on what has happened since. Since Unity's DOTS transition is still in progress, the information on this page will quickly become outdated.

My sources are Unity blog posts tagged with DOTS, plus two YouTube playlists:

  1. Unite Copenhagen 2019 – DOTS. This playlist of 17 videos was recorded at Unity's own conference in 2019, where they invited people to start sinking their teeth into DOTS. There was a lot of future-looking discussion about goals and aims, but there were enough tools and support infrastructure to start experimentation. (As opposed to Unite LA 2018, which was more of a DOTS introduction as fewer tools were available for testing.)
  2. Unity at GDC 2022. This playlist of 17 videos spanned Unity presentations at the Game Developer Conference earlier this year. Not all of the videos on this list involve DOTS, but those that did gave us an update on how things have progressed (or not) since 2019.

Given that information, my understanding today is this: Unity DOTS adoption aims to improve performance of Unity titles on modern hardware while still preserving Unity's flexibility, existing codebase, and friendliness to users. Especially beginners!

“Improved performance” is usually shown off by demonstrating huge complex scenes, but at the core it aims to better align Unity runtime code with how modern multicore CPUs go about their work. Yes, this would allow huge and complex scenes previously impossible, but it also means reducing power and resources consumed to deliver scenes built today. This is especially important for those publishing titles for battery-powered cell phones and tablets.

But that is runtime code. At design time, Unity will stay with the current GameObject-oriented design paradigm. All those tutorials out there will still work, and all the components on the Unity Marketplace can still be used. Paraphrasing a presenter: "Design time stays people-friendly; DOTS changes the runtime to become computer-friendly." Key to this duality is a procedure that was called "conversion" but has since been renamed "baking," which translates a set of GameObject data for use via Entities and Components. GameObject code is converted to Systems that execute on that data. These systems ideally work in units that can be compiled to native code with the Burst compiler and scheduled by the Jobs system for distribution across all available CPU cores. But if that's too big of a leap to make in one conversion, Unity aims to support intermediate levels of functionality so developers can adopt DOTS piecemeal as they are ready and do so in places that make sense.

Of course, it is possible to dive into the deep end of the pool and directly create Entities/Components/Systems. However, the Unity editor is not (yet?) focused on supporting that workflow; current work is focused on helping the user base make this "baking" transition over the next few years. Which means certain Unity initiatives for a fully DOTS era may be put on hold.

Learning DOTS, the biggest mental hurdle for me has been around Entities. Conceptually, Unity GameObjects are already composed of components. Though the actual code differs between the two schools of components, it was a close-enough concept for me to understand. It was similarly easy for me to comprehend that executable code logic moves into Systems that work on those components. From there, it was easy for me to conclude that GameObjects are converted to Entities, but that is wrong. (Or at least, it hinders maximizing the potential of DOTS.) I'm still struggling with this myself and I hope for an "A-ha!" moment at some point in the future.

Notes on “Hardspace:Shipbreaker” Release

Just before 2021 ended, I bought the game Hardspace: Shipbreaker in an incomplete early-access state. I had a lot of fun despite its flaws. In May, the game exited early access to become an official release, followed quickly by a 1.1 release. This post documents a few observations from an enthusiastic player.

The best news: many annoying bugs were fixed! A few examples:

  • Temperature Control Units no longer invisibly attach ship exterior to interior.
  • Waste Disposal Units are no longer glued to adjacent plates.
  • Armor Plates can now be separated for barge recycling, independent of the adjacent hull plate that goes to the processor.

Sadly, not all of my annoyance points were fixed. The worst one is "same button for picking up a part and pushing it away." That is still the case, and I still occasionally blast parts off into space when I intended to grab them, which means I have to waste time chasing them down.

The most charming new feature is the variation in ship interiors. The 0.7 release had variations in exterior livery that corresponded to the fictional companies that owned and used these ships, but the interiors had been generically identical. Now there are a few cosmetic variations, and I was most amused by the green carpet in old passenger liners. It gave me a real 70s vibe in a futuristic spaceship.

The most useful new feature is the ability to save partial ship salvage progress. Version 0.7 lacked this feature and it meant once we started a ship, we were committed to keeping the game running until we were done. (Either by playing through multiple shifts in one sitting or leaving the computer on and running the game until we could return.) Saving ship progress allows us to save and quit the game and return to our partially complete ship later. This feature noticeably lengthens game load and save times, but I think it is a worthwhile tradeoff.

In version 0.7, the single-player campaign plotline only went to an Act II cliffhanger. It now has an Act III conclusion, but that did not make the plot more appealing to me. The antagonist went too far and entered the realm of annoying caricature. Note I did not say "unrealistic," because there definitely exist people who climb into positions of power in order to abuse others. I've had to deal with that in real workplaces and didn't enjoy having it in my fictional workplace. I was also disappointed with the storybook depiction of unionization; real life union-busting is far more brutal, though I don't particularly need to experience that in my entertainment, either. But aside from imposing some pauses in the shipbreaking action, the single-player plotline does not impact the core game loop of taking ships apart. Lastly: the "little old space truck" side quest now ties into the conclusion, because getting it fixed up is your ticket out of that hellhole.

Based on earlier information, the development team should now be focused on releasing this title for game consoles. I've been playing it using a game controller on my PC and found it an acceptable tradeoff, with its own upsides and downsides relative to keyboard-and-mouse. I hope it will do well on consoles; I want to see more puzzle-solving teardown games on the market.

But the reason I started playing this game at all was because I had been learning about Unity game engine’s new Data Oriented Technology Stack (DOTS) and wanted to see an application of it in action. As much as I enjoyed the game, I hadn’t forgotten the educational side of my project.

Unity Without OpenGL ES 2.0 Is All Pink on Fire TV Stick

I dug up my disassembled Fire TV Stick (second generation) and it is now back up and running. That was an entertaining diversion in its own right, but during the update and Android Studio onboarding process, I kept thinking: why might I want to do this beyond "to see if I could"? This was the same question I asked myself when I investigated the Roku Independent Developer's Kit just before taking apart some Roku devices. For a home tinkerer, what advantage do they have over a roughly comparable Raspberry Pi Zero? I didn't have a good answer for Roku, but I have a good answer for Fire TV: because it is an Android device, and Android is a target platform for Unity. Raspberry Pi and Roku IDK, in comparison, are not.

I don't know if this will be useful for me personally, but at the very least I could try installing my existing Unity project Bouncy Bouncy Lights on the device. Loading up Unity Hub, I saw that Unity had recently released 2021 LTS, so I thought I might as well upgrade my project before installing the Unity Android target platform tools. Since Bouncy Bouncy Lights was a very simple Unity project, there were no problems upgrading. Then I could build my *.apk file, which I could install on my Fire TV just like the introductory Android Studio projects. There were no error messages upon installation, but upon launch I got a warning: "Your device does not match the hardware requirements of this application." What's the requirement? I didn't know yet, but I got a hint when I chose to continue anyway: everything on screen rendered a uniform shade of pink.

Going online for answers, I found many different problems and solutions for Unity rendering all pink. I understand pink-ness is a symptom of something going wrong in the Unity graphics rendering pipeline, and it is a symptom that can have many different causes. With no single solution, further experimentation and diagnosis were required.

Most of the problems (and solutions) are in the Unity "Edit"/"Project Settings…"/"Player"/"Other Settings" menu. This Unity forum thread with the same "hardware requirements" error message suggests checking to ensure "Auto Graphics API" is checked (it was) and setting "Rendering Path" to Linear (no effect). This person's blog post was also dealing with a Fire TV, and their solution was checking "Auto Graphics API," which I am already doing. But what if I uncheck that box? What does this menu option do (or not do)?

Unchecking that box unveils a list of two graphics APIs: Vulkan and OpenGLES3. Hmm, I think I see the problem here. The Fire TV Stick second generation hardware specification page says it only supports OpenGL ES 2.0. Digging further into Unity documentation, I found that OpenGL ES 2.0 support is deprecated and not included by default, but we can add it to a project if we need it. Clicking the plus sign allowed me to add it as a graphics API for use in my Unity app.

Once OpenGL ES 2.0 is included in the project as a fallback graphics API, I could rebuild the *.apk file and install the updated version.

I got colors back! It is no longer all pink, and cubes that are supposed to be pink now look pink. So the cubes look fine, but all color has disappeared from the platform, which was supposed to have splotches of color cast by the randomly colored lights attached to each block.

Instead of showing different colors, the platform has apparently averaged them into a uniform gray. I guess this is where an older graphics API gets tripped up, and why we want newer APIs for best results. But at least it is better than a screen full of pink, even if the solution in my case was to uncheck "Auto Graphics API", the opposite of what other people have said online! Ah well.

Move Calculation Off Microcontroller to Node-RED

After I added MQTT for distributing data, I wanted to change how calculations were done. Using a microcontroller to read a voltage requires doing math somewhere along the line. The ADC (analog-to-digital converter) peripheral on a microcontroller returns an integer value suitable for hardware registers, and we have to convert that to a floating-point voltage value that makes sense to me. In my first draft ESP8266 voltage measurement node, getting this conversion right was an iterative process:

  • Take an ADC reading and convert to voltage using a conversion multiplier.
  • Compare against the voltage reading from my multimeter.
  • Calculate a better conversion factor.
  • Reflash the ESP8266 with an Arduino sketch that includes the new conversion factor.
  • Repeat.

The ESP8266 ADC is pretty noisy, with probable contributions from other factors like temperature variations. So there is no single right conversion factor value; it varies through time. The best I can hope for is a pretty-close average tradeoff. While looking for that value, the loop of recalculating and uploading got to be pretty repetitious. I want to move that conversion work off of the Arduino so it can be more easily refined and updated.

One option is to move that work to the data consumption side. This means logging raw ADC values into InfluxDB, and whoever queries that data is responsible for conversion. This preserves original unmodified measurement data, allowing the consumers to be smart about dealing with jitter and such. I like that idea but I'm not ready to dive into that sort of data analysis just yet.

To address both of these points, I pulled Node-RED into the mix. I've played with this flow computing tool before, and I think my current project aligns well with the strengths of Node-RED. The voltage conversion, specifically, is a type of data transformation people do so often in Node-RED that there is a standard node, Range, for this purpose. Performing voltage conversion in a Range node means I can fine-tune the conversion and update it by clicking "Deploy," which is much less cumbersome than recompiling and uploading an Arduino sketch.

Node-RED also allows me to carry both the original and converted data through the flow. I use a Change node to save the original ADC value to another property before using Range to convert the ADC value to voltage, giving me a Node-RED message with both original and converted data. To put that into the database, I searched the public Node-RED library for "InfluxDB" and decided to try node-red-contrib-stackhero-influxdb-v2 first, since it explicitly supports version 2 of InfluxDB. I'm storing the raw ADC values now even though I'm not doing anything with them yet; the idea is to keep a record so in the future I can explore voltage conversion on the data consumption side.

To test this new infrastructure design using MQTT and Node-RED, I’ll pull an ESP32 development board out of my pile of parts.


Here is my Node-RED function to package data in the format expected by the node-red-contrib-stackhero-influxdb-v2 InfluxDB write node. Input data: msg.raw_ADC is the original ADC value, and msg.payload is the voltage value converted by the Range node:

var fields = {V: msg.payload, ADC: msg.raw_ADC};
var tags = {source: 'batt_monitor_02',location: 'lead_acid'};
var point = {measurement: 'voltage',
      tags: tags,
      fields: fields};
var root = {data: [point]};
msg.payload = root;
return msg;

My simple docker-compose.yml for running Node-RED:

version: "3.8"

services:
  nodered:
    image: nodered/node-red:latest
    restart: unless-stopped
    ports:
      - 1880:1880
    volumes:
      - ./data:/data

Routing Data Reports Through MQTT

Once a read-only Grafana dashboard was up and running, I had end-to-end data flow from voltage measurement to a graph of those measurements over time. This was a good first draft; from here I can pick and choose where to start refining the system. First thought: it was cool that an ESP8266 Arduino could log data straight to an InfluxDB2 server, but I don't think that is the best way to go.

InfluxDB is a great database to track historical data, but sometimes I want just the most recent measurement. I don't want to have to spin up a full InfluxDB client library and perform a query just to retrieve a single data point. That would be a ton of unnecessary overhead! Even worse, due to that overhead, not everything has an InfluxDB client library, and I don't want to be limited to the subset that does. And finally, tracking historical data is only one aspect of the system; at some point I want to take action based on data and measurements, and InfluxDB doesn't help at all with that.

To improve on these fronts, I'm going to add an MQTT broker to my home network in the form of a Docker container running Eclipse Mosquitto. MQTT is a simple publish/subscribe system. The voltage measuring node I've built out of an ESP8266 is a publisher, and InfluxDB is a subscriber for that data. If I want the most recent measurement, I can subscribe to the same data source and see it at the same time InfluxDB does.
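For example, peeking at the latest reading only takes a one-line subscriber using the Mosquitto command-line client. (A sketch: the broker hostname and topic name are hypothetical placeholders for whatever the ESP8266 actually publishes to.)

# Print each message published to the voltage topic as it arrives
mosquitto_sub -h mqtt-broker.local -t "home/battery/voltage" -v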

I've read the Hackaday MQTT primer, and I understand MQTT is a popular tool for these types of projects. It has been on my to-do list ever since, and this is my test project for playing with it. Putting MQTT into this system lets me start small, with a single publisher and a single subscriber. If I expand the system later, it will be easy to add more nodes to my home MQTT network.

As a popular and lightweight protocol, MQTT enjoys a large support ecosystem. While not everything has an InfluxDB client library, almost everything has an MQTT client library, including the ESP8266 Arduino core and InfluxDB itself in the form of a Telegraf plugin. But after looking over that page, I understand the plugin is designed for direct consumption and has few (no?) options for data transformation. This is where Node-RED enters the discussion.
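
For context, the direct Telegraf route I decided against would look roughly like the sketch below; every address, topic, and credential here is a placeholder. It delivers whatever arrives on the topic straight to InfluxDB, with no room to massage the data along the way.

# Sketch of a Telegraf pipeline from MQTT to InfluxDB 2 (placeholder values throughout).
[[inputs.mqtt_consumer]]
  servers = ["tcp://192.168.1.10:1883"]
  topics = ["home/battery/voltage"]
  data_format = "value"      # treat the raw payload as a single value
  data_type = "float"

[[outputs.influxdb_v2]]
  urls = ["http://192.168.1.10:8086"]
  token = "$INFLUX_TOKEN"
  organization = "home"
  bucket = "power"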


My simple docker-compose.yml for running Mosquitto:

version: "3.8"

services:
  mqtt-broker:
    image: eclipse-mosquitto:latest
    restart: unless-stopped
    ports:
      - 1883:1883
      - 9001:9001
    volumes:
      - ./config:/mosquitto/config
      - ./data:/mosquitto/data
      - ./log:/mosquitto/log
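
The compose file mounts a ./config directory because recent Mosquitto releases refuse connections from other machines unless a listener is explicitly configured. A minimal mosquitto.conf along these lines (an assumption of what a simple LAN setup needs, with no authentication) goes in that directory:

# Minimal Mosquitto config for LAN use (no authentication).
listener 1883
allow_anonymous true

listener 9001
protocol websockets

persistence true
persistence_location /mosquitto/data/

log_dest file /mosquitto/log/mosquitto.log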

Using Grafana Despite Chronograf Integration

I’ve learned some valuable lessons making an ESP8266 Arduino log data into InfluxDB, giving me ideas on things to try for future iterations. But for the moment I’m getting data and I want to start playing with it. This means diving into Chronograf, the visualization charting component of InfluxDB.

In order to get data around the clock, I've changed my plans for monitoring solar panel voltage and connected the datalogging ESP8266 to the lead-acid battery array instead. This allows me to continue refining and experimenting with data at night, when the solar panel generates no power. It also means the ESP8266's unnecessarily high power consumption is now draining the battery, but an ESP8266 at full power still consumes only a small percentage of what my lead-acid battery array can deliver, so I'm postponing that problem.

Chronograf was pretty easy to get up and running: querying on the tags of my voltage measurement, plotting logged voltage values over time. This is without getting distracted by all the nifty toys Chronograf has to offer. Getting a basic graph allowed me to explore how to present this data in some sort of dashboard, and here I ran into a problem. There doesn't seem to be a way within InfluxDB to present a Chronograf chart in a read-only manner. I found no access control on the data visualization dashboard, nor could I find access restriction options at the InfluxDB user level. Not that I could create new users from the UI anyway.

Additional users cannot be created in the InfluxDB UI.

A search for more information online directed me to the Chronograf GitHub repository, where there is a reference to a Chronograf "Viewer" role. Unfortunately, that issue is several years old, and I think this feature got renamed sometime in the past few years. Today a search for "viewer role" in the InfluxDB Chronograf documentation comes up empty.

The only access control I've found in InfluxDB is via API tokens, and I don't know how that helps me when I'm logged in to use Chronograf. The only way I know to utilize API tokens is from outside the system, which means firing up a separate Docker container running another visualization charting package: Grafana. There I can add InfluxDB as a data source with a read-only API token, so a Grafana dashboard has no way to modify InfluxDB data. This feels clumsy and I'm probably making a beginner's mistake somewhere, but it gives me peace of mind to leave Grafana displaying on a screen without worrying about my InfluxDB data integrity. This lets me see the data, so I know the system is working end-to-end as I go back and rework how data is communicated.
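
For reference, this is roughly the InfluxDB 2 CLI invocation that creates such a read-only token scoped to a single bucket; the org name and bucket ID here are placeholders, and exact flag names may vary between InfluxDB 2.x releases.

influx auth create \
  --org my-home-org \
  --read-bucket 0123456789abcdef \
  --description "Grafana read-only access"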

For reference, here is my very simple docker-compose.yml for my Grafana instance.

version: "3.8"

services:
  server:
    image: grafana/grafana:latest
    restart: unless-stopped
    ports:
      - 3000:3000
    volumes:
      - ./data:/var/lib/grafana

Notes on “Hardspace: Shipbreaker” 0.7

I have spent entirely too much time playing Hardspace: Shipbreaker, but it has been very enjoyable time. As of this writing, it is a Steam Early Access title and still in development. The build I've been playing is V.0.7.0.217552, dated December 20th, 2021. (Only a few days before I bought it on Steam.) The developers have announced their goal to take it out of Early Access and formally release in Spring 2022. Comments below are from my experience with this build and do not necessarily reflect the final product.

The game can be played in career mode, where ship teardowns are accompanied by a storyline campaign. My 0.7 build only went up to act 2; the formal release should have an act 3. Personally, I did not find the story compelling. This fictional universe places the player as an indentured servant toiling for an uncaring mega-corporation, and that's depressing. It's too close to the real world of capitalism run amok.

Career mode has several difficulty settings. I started with the easiest, "Open Shift", which removes the stress of managing consumables like my spacesuit oxygen. It also removes the fifteen-minute time limit of a "shift". After I moved up to "Standard" difficulty, I found the oxygen limit is indeed stressful, but I actually started appreciating the fifteen-minute timer because it encourages me to take a break from this game.

Whatever the game mode (career, free play, or competitive race), the core game is puzzle-solving: how to take apart a spaceship quickly and efficiently to maximize revenue. My workspace is a dockyard in Earth orbit, and each job takes apart a ship and sorts its pieces into one of three recycle bins:

  1. Barge: equipment kept intact. Examples: flight terminal computers, temperature control units, power cells, reactors.
  2. Processor: high value materials. Examples: exterior hull plates, structural members.
  3. Furnace: remainder of materials. Example: interior trim.

We don’t need to aim at these recycle bins particularly carefully, as they have an attraction field to suck in nearby objects. Unfortunately, these force fields are also happy to pull in objects we didn’t intend to deposit. Occasionally an object would fall just right between the bins and they would steal from each other!

I haven't decided if the hungry processors/furnaces are a bug or an intended challenge of the game; there are arguments to be made either way. However, the physics engine in the game exhibits behaviors that are definitely bugs. Personally, what catches me off guard the most are small events with outsized effects. The most easily reproducible artifact is interacting with a large ship fragment. Our tractor beam can't move a hull segment several thousand kilograms in mass. But if we use the same tractor beam to pick up a small 10 kilogram component and rub it against the side of the hull segment, the hull segment starts moving.

Another characteristic of the physics engine is that everything has infinite tensile strength. As long as there is a connection, no matter how small, the entire assembly remains rigid. It means that when we try to cut the ship in half, each half weighing tens of thousands of kilograms, we could overlook one tiny thing holding it all together. My most frustrating experience was a piece of fabric trim: a bolt of load-bearing fabric holding the ship together!

But at least that's something I can look for and see connected onscreen. Even more frustrating are bugs where ship parts are held together by objects that are visibly apart on screen. Like a Temperature Control Unit that doesn't look attached to an exterior hull plate, yet the plate won't come free until the TCU is removed from its interior mount, at which point both the TCU and the hull plate are free to move. Or the waste disposal unit that rudely juts out beyond its allotted square.

Since the game is under active development, I see indications of game mechanics that were not available to me. It's not clear to me if these are mechanisms that used to exist and were removed, or if they are promised and yet to come. Example: there were multiple mentions of using coolant to put out fires, and I could collect coolant canisters, but I don't see how to apply coolant to things on fire. Another example: there are hints that our cutter can be upgraded, but I encountered no upgrade opportunity and had to resort to demolition charges. (Absent an upgrade, it's not possible to cut directly into the hull as depicted by the game art.) We also have a side quest to fix up a little space truck, but right now nothing happens when the quest is completed.

The ships being dismantled come in one of several types, so we know roughly what to expect. However, each ship includes randomized variations, so no two ships are dismantled in exactly the same way. This randomization is occasionally hilarious. For example, sometimes the room adjacent to the reactor has a window and computers to resemble a reactor control room, but sometimes the room is set up like crew quarters with chairs and beds. It must be interesting to serve on board that ship, bunking down next to a window looking out onto a big reactor and its radiation warning symbols.

There are a few user interface annoyances. The "F" key is used to pick up certain items in game, but the same key is also used to fire a repulsion field to push items away. Depending on the mood of the game engine, sometimes I press "F" to pick up an item only to blast it away instead, and then I have to chase it down.

But these are all fixable problems and I look forward to the official version 1.0 release. In the meantime I’m still having lots of fun playing in version 0.7. And maybe down the line the developers will have the bandwidth to explore putting this game in virtual reality.

Spaceship Teardowns in “Hardspace: Shipbreaker”

While studying Unity’s upcoming Data-Oriented Technology Stack (DOTS) I browsed various resources on the Unity landing page for this technology preview. Several game studios have already started using DOTS in their titles and Unity showcased a few of them. One of the case studies is Hardspace:Shipbreaker, and it has consumed all of my free time (and then some.)

I decided to look into this game because the name and visuals were vaguely familiar. After playing a while, I remembered I first saw it on Scott Manley's YouTube channel. He made that episode soon after the game became available on Steam, but the game has changed a lot in the past year, as it is an "Early Access Game" still undergoing development. (Windows only for now, with the goal of eventually reaching Xbox and PlayStation consoles.) I assume a lot of bugs have been stamped out in the past year, as it has been mostly smooth sailing in my play. It is tremendously fun even in its current incomplete state.

Hardspace: Shipbreaker was the subject of an episode of Unity's "Behind the Game" podcast. Many aspects of developing this game were covered, and towards the end the developers touched on how DOTS helped them solve some of their performance problems. As covered in the episode, the nature of the game means they couldn't use many of the tried-and-true performance tricks. Light sources move around, so they couldn't pre-render lights and shadows. The ships break apart in unpredictable ways (especially when things start going wrong), so there can be a wide variation in the shapes and sizes of objects in the play area.

I love teardowns and taking things apart. I love science fiction. This game is a fictional world where we play a character that tears down spaceships for a living. It would be a stretch to call this game “realistic” but it does have its own set of realism-motivated rules. As players, we learn to work within the constraints set by these rules and devise plans to tear apart these retired ships. Do it safely so we don’t die. And do it fast because time is money!

This is a novel puzzle-solving game and I’m having a great time! If “Spaceship teardown puzzle game” sounds like fun, you’ll like it too. Highly recommended.

[Title image from Hardspace: Shipbreaker web site]

Unity-Python Communication for ML-Agents: Good, Bad, and Ugly

I’ve only just learned that Unity DOTS exists and it seems like something interesting to learn as an approach for utilizing resources on modern multicore computers. But based on what I’ve learned so far, adopting DOTS by itself won’t necessarily solve the biggest bottleneck in Unity ML-Agents as per this forum thread: the communication between Unity and Python.

Which is unfortunate, because this mechanism is also a huge strength of the system. Unity is a native-code executable with modules written in C# and compiled, while deep learning frameworks like TensorFlow and PyTorch run in an interpreted Python environment. The easiest and most cross-platform-friendly way for these two types of software to interact is via network ports, even though the data is merely looped back on the same computer rather than sent over a network.

A documented communication protocol allows ML-Agents components to evolve independently as long as they conform to the same protocol. This is why they were able to change the default deep learning framework from TensorFlow to PyTorch between ML-Agents versions 1.0 and 2.0 without breaking backwards compatibility. (They did it in Release 10, in case that's important.) Developers who prefer TensorFlow can continue using it; they are not forced to switch to PyTorch as long as everything speaks the same protocol.

Functional, capable, flexible. What's not to love? Well, apparently "performance". I don't know the details of the Unity ML-Agents bottlenecks, but I do know that "fast" for a network protocol is a snail's pace compared to high-performance inter-process communication mechanisms such as shared memory.

To work around the bottleneck, the current recommendations are to manually stack things up in parallel. Starting at the individual agent level: multiple agents can train in parallel if the environment supports it, which explains why the 3D Ball Balancing example scene has twelve agents. If the environment doesn't support it, we can manually copy the same training environment several times in the scene; we can see this in the Crawler example scene, which has ten boxes, one for each crawler. Beyond that, we now have the capability to run multiple Unity instances in parallel, as sketched below.
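
For reference, the multiple-instances route is exposed as trainer command line options. This is only a sketch using the example 3D Ball configuration from the ML-Agents repository and a placeholder build path, since I haven't measured how much it actually helps.

# Train against 4 parallel copies of a built Unity environment (build path is a placeholder).
mlagents-learn config/ppo/3DBall.yaml --env=Builds/3DBall --num-envs=4 --run-id=parallel_test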

All of these feel… suboptimal. The ML-Agents team is aware of the problem and working on solutions but have nothing to announce yet. I look forward to seeing their solution. In the meantime, learning about DOTS has sucked up all of my time. No, not learning… I got sucked into Hardspace:Shipbreaker, a Unity game built with DOTS.