Ingenuity the Mars Helicopter Technology Demonstrator

One of the most exciting parts of the Mars 2020 mission is not visible on the Perseverance rover interactive 3D web page. It is Ingenuity, officially named the Mars Helicopter Technology Demonstrator. (It has its own online interactive 3D model.) For people like me who want to really dig into the technical details, NASA JPL has published many papers on the project, including this one for the 2018 AIAA Atmospheric Flight Mechanics Conference (DOI: 10.2514/6.2018-0023).

Ingenuity is the latest in a long line of technology demonstrator projects from JPL, where ideas are tested at a small scale in a noncritical capacity. Once proven, later missions can make more extensive use of the technology. The Perseverance rover itself is part of such a line, tracing back to Sojourner, which was the technology demonstrator for the concept of a Mars rover, a role reflected in its official name: the Microrover Flight Experiment.

Most of the popular press has covered Ingenuity's rotors and how they had to be designed for Mars. They have the advantage of only needing to lift against Martian gravity, which is much weaker than Earth's. But that advantage is more than balanced out by the disadvantage of having to work in the Martian atmosphere, which is far thinner than Earth's air. Designing and testing the rotors on Earth was a significant challenge.

Mechanically, the part I find most interesting is the motor and control system for the coaxial helicopter. It has been simplified relative to coaxial helicopters flying on Earth, but it is still far more complex than the category of multirotor aircraft commonly called drones. Most multirotor aircraft have no mechanical control linkages at all: their propellers are rigidly attached to motors, and control is strictly electronic via motor power. The paper describes the challenges of implementing a coaxial helicopter control system for Mars, but it doesn't explain why the design was chosen in the first place. I'm sure someone worked through the tradeoffs. Since mechanical simplicity (and hence reliability) is highly valued in planetary missions, I am very curious what factors outweighed it. Perhaps that information was published in another paper?

Electronically, the most exciting thing is Ingenuity's brain, which comes from the Snapdragon line of processors better known as the brains of cell phones and tablets here on Earth. If it works on Mars, it would offer a huge increase in computing power for planetary missions. Perseverance itself runs on a RAD750 computer, which has proven its reliability through many successful spacecraft but is roughly equivalent to a 20-year-old PowerPC desktop. Having more powerful CPUs on future missions will allow our robotic explorers to be more autonomous instead of being dependent on brains back here on Earth to tell them what to do.

Angular CLI as WSL 2.0 Test Case

I've been fascinated by the existence of Windows Subsystem for Linux (WSL) ever since its introduction, and I've played with it occasionally, such as trying to run ROS on it. This time I thought I'd try installing the Angular CLI on a WSL instance, with a twist: this is now WSL 2, a big revamp of the concept. Architecturally, there's now much more of an actual Linux distribution running inside the environment, which promises even better Linux compatibility and performance in Linux-native scenarios. The tradeoff is a reduction in Windows-Linux interoperability performance, but apparently the team decided it was worthwhile.

But first, I had to run through the installation instructions which, on my Windows 10 version 2004 build, ended with an error requiring a Linux kernel update:

WSL 2 requires an update to its kernel component. For information please visit https://aka.ms/wsl2kernel

Then I could install it from the Microsoft Store, followed by an installation of Ubuntu. Next I installed Node.js for Ubuntu, followed by the Angular CLI tools. The last step ran into the same permissions issue I saw on macOS with node_modules ownership. Once I took ownership, I got an entirely new error:

Error: EACCES: permission denied, symlink '../lib/node_modules/@angular/cli/bin/ng' -> '/usr/bin/ng'

The only resolution I found for this was "Run as root." Unsatisfying, and I would be very hesitant if this were a full machine, but I'm willing to tolerate it for a small virtual machine.

Once I installed the Angular CLI, I cloned my "Tour of Heroes" tutorial repository into this WSL instance and tried ng serve. This triggered the error:

Cannot find module '@angular-devkit/build-angular/package.json'

This turned out to be a Node.js beginner mistake. Looking up the error, I found a StackOverflow thread where I learned that cloning the repository was not enough: I also needed to run "npm install" in that directory to set up Node dependencies.

Once those issues were resolved, I was able to run the application, where I found two oddities: (1) somehow I hadn't completely removed the mock HEROES reference on my Mac, and (2) package.json and package-lock.json had lots of changes I did not understand.

But neither of those issues was as important as my hopes for more transparent networking support in WSL 2. Networking code running in WSL was not visible from elsewhere on my local network unless I jumped through some hoops with Windows Firewall, which was what made ROS on WSL largely uninteresting earlier for multi-node robots. WSL 2 claimed to have better networking support, but alas, my Angular application's "ng serve" was similarly unreachable from another computer on my local network.

Even though this test was a failure, judging by the evolution from WSL to WSL 2, I'm hopeful that work will continue to make this more seamless in the future. At the very least I hope I won't have to use the "run as root" last resort.

Until the next experiment!

StackBlitz: a Web App for Building Web Apps

It's always nice when the team behind a particular piece of software puts in the effort to make newcomer introduction easy. Part of Angular's low-friction introductory shopping app tutorial is letting us learn Angular basics without installing anything on our development machines, by using the web-based development environment StackBlitz. After I used it for the Angular tutorial, I took a quick detour to learn a bit more about StackBlitz and see what else it might be useful for.

According to the StackBlitz announcement post in 2017, its foundation is built from components of Visual Studio Code, transformed to be accessible via a browser. Behind the VSCode-based UI is a big pool of virtual machines spun up on demand to support front-end web development. StackBlitz started with support for Angular and React, and more have been added since, plus a beta program for full-stack development, so their ambition is still expanding. These virtual machines compile the code and also run a web server to host the results. These servers are public by default, so developers can check how their results look on a cell phone, on a tablet, etc.

But while StackBlitz gives us online compute resources, their generosity does not extend to online storage. Their “tight GitHub integration” is another way of saying all our project code must be stored in our GitHub repositories. If a user is not willing to enter their GitHub credentials, their work only lasts until they disconnect and the virtual machines are shut down. Since I’m squeamish about spreading my GitHub credentials any more than I absolutely have to, this meant abandoning my Angular introductory app once it was complete.

That was the biggest damper on my enthusiasm for StackBlitz. If it weren't for that bit of security paranoia, I would consider it a good replacement (at least for the supported types of projects) for Cloud9, which used to be another free online development environment. Unfortunately, after Cloud9 was acquired by Amazon, it is no longer free. Even though Cloud9 is still very affordable in absolute terms (doing my LRWave project on Cloud9 only cost me $0.43 USD), there's a huge psychological gap between $0.43 and free. Will StackBlitz lose its free usage in the future? Probably. The money to pay the bills for those virtual machines has to come from somewhere.

If this freebie goes away, so be it. It wouldn’t be the end of the world for the average Angular developer because the primary usage scenario is still centered around installing the tools on a development machine.

Quest for the Whistler Button

I'm a fan of physical, tactile buttons that provide visual feedback. I realize the current trend favors capacitive touch, but I love individual buttons I can find by feel. Some of the best-looking buttons I've seen were in the 1992 movie Sneakers, in the scene where the blind character Whistler used a Braille-labeled device to add a sound effect representing the "thump" of a car going over the seams of a concrete bridge.

They were only on screen for a few seconds, but I was enamored with the black buttons, each with a corresponding red LED. The aesthetics reminded me of 2001, like the eye of HAL in a mini monolith. Or maybe Darth Vader, if the Sith lord were a button. When I first watched the movie many years ago, I thought they were neat and left it at that. But in recent years I’ve started building electronics projects. So when I rewatched the movie recently and saw them again, I decided to research these buttons.

The first task was to determine if they were even a thing. All we saw was the front control panel of an unknown device. It was possible the buttons and LEDs were unrelated components sitting adjacent to each other on the circuit board, visually tied together only by pieces of plastic custom-made for the device. So the first step was to find that device. There was a label at the bottom of the panel below Whistler's hand, but due to the shallow depth of field I could only make out the ending: "… 2002 digital sampler." Time to hit the internet and see if anyone recognized the machine.

My first stop was the Trivia section of the movie's page on the Internet Movie Database, where people contribute random and minute pieces of information. Firearms enthusiasts can usually be counted on to name the specific guns used in a film, and automotive enthusiasts frequently contribute the make and model of cars as well.

Sadly, the electronic audio enthusiasts have not seen fit to contribute to this page, so I went elsewhere on the internet, trying various keyword combinations of "Sneakers", "Whistler", "sampler", etc. The answer was found in a comment on a Hackaday post about the movie. I've complained a lot about the general quality of internet comments, but this time one person's nitpicking correction was my rare nugget of gold.

Whistler's device is a Sequential Circuits Prophet 2002 Digital Sampler rack. As befits the movie character, the sampler's control panel had Braille labels covering the default text, but otherwise it appears relatively unmodified for the movie. I wish the pictures were higher resolution, but their arrangement strongly implies the button and LED are part of a single subcomponent. The strongest evidence came from the presence of four buttons mounted on a vertical axis, rotated 90 degrees from the rest.

Aside: On the far right of the control panel, we can see a sign of the era, a 3.5″ floppy drive for data storage.

Encouraged by this find, I started searching for Prophet 2002 buttons. I quickly found eBay listings offering replacement parts for Sequential Circuits products, including these buttons. What's intriguing to me is that these are sold in "New" condition, not surplus or salvaged from old units. I'm optimistically interpreting this as a hint that these buttons might still be in production, decades after the Prophet 2002 was released in 1985.

Thanks to those eBay listings, I have seen a picture of the component by itself and it is exactly what I hoped it would be: the button’s exterior surface, the electric switch itself, and the LED are integrated into a single through-hole component. Given the tantalizing possibility it is still in active production and something I can buy for my own projects, I went next to electronics supplier Digi-Key.

Digi-Key carries 305,212 components under its “Switches” section, not practical for individual manual review. Fortunately there are subsections and I first tried “Tactile Switches” (5721 items) because those buttons look like they’d give a good tactile response. In the movie we also heard a satisfying click when the button was pressed, but I don’t know if that was added later by the film’s sound mixer.

Within the "Tactile Switches" section, I aggressively filtered, starting with the most optimistic wish that they are active and in stock:

  • Part Status: Active
  • Stocking Options: In Stock
  • Illumination: Illuminated
  • Illuminator: LED, Red

That dropped it to 76 candidates. Almost all of them carried their illumination under the button instead of adjacent to it. The closest candidate is a JF Series switch by NKK Switches, the JF15RP3HC, which has Digi-Key part number 360-3284-ND.

It is a more modern and refined variant of the same concept. The button is sculpted, and the illuminated portion sits flush with the surroundings. This would be a great choice if I were updating the design, but I am chasing a specific aesthetic, and this switch does not look like a monolith or Vader.

So that wasn't too bad, but I wasn't ready to stop. Alongside "Tactile Switches" are several other subsections worth investigating. I next went to "Pushbutton Switches" (175,722 items) and applied the following filters, again starting with the optimistic wish that they are active and in stock:

  • Part Status: Active
  • Stocking Options: In Stock
  • Type: Keyswitch, Illuminated
  • Illumination Type, Color: LED, Red

That filter cut the number of possibilities from 175,722 down to 21, which felt like an overly aggressive shot in the dark, and I expected I would have to adjust the search. But it wouldn't hurt to take a quick look over those 21, and my eyes widened when I saw the list. Most of the 21 results had a very similar aesthetic and would make an acceptable substitute, but that would not be necessary, because I saw the Omron B3J-2100.

Yes, I had hit the jackpot! Even if that isn't precisely the correct replacement for a Prophet 2002 sampler, it has the right aesthetics: a dark angular block with a round LED poking out. And now that I've found the component, I can perform web searches with its name to confirm that others have also decided the Omron B3J is the correct replacement.

Omron’s B3J datasheet showed a list of models, where we can see variations on this design. The button is available in multiple colors, including this black unit and the blue also used by the Prophet 2002. The number and color of LEDs add to the possible combinations, from no LEDs (a few blue examples on a Prophet 2002 have no lights) to two lights in combinations of red, green, or yellow.

Sure, these switches are more expensive than the lowest-bidder options on Amazon. But the premium is a small price to pay when I'm chasing this specific aesthetic. When I want the look that started me on this little research project, only the Omron B3J-2100 will do. And yeah, I'm going to call them "Whistler buttons".

[Follow-up: This post became more popular than I had expected, and I’m glad I made a lot of fellow button enthusiasts happy.]

First Impressions: Paxcess Rockman 200

I had been using a Monoprice PowerCache 220 to store and use power generated by my small Harbor Freight solar array. Due to its degrading battery and erroneous thermal protection circuit, I bought a Paxcess Rockman 200(*) to replace it. Thanks to its lithium-chemistry battery, the Paxcess is far smaller and lighter than the Monoprice unit it replaced, which made a good first impression before I even opened the box.

Two means of charging were included with the Rockman 200, giving users two choices of power source: an automotive "cigarette lighter" socket adapter, or a household AC power adapter. But I intended to charge from solar power, so I had to fashion my own charging adapter. Fortunately the Rockman 200 uses commodity barrel jacks (5.5mm outer diameter, 2.1mm inner diameter), so it was easy to build one from my most recent purchase(*) of such connectors. This was much easier than the hack I had to do for my Monoprice.

Once up and running, I was indeed able to charge from my Harbor Freight solar array. The maximum no-load open-circuit voltage of these panels is around 21V, below the 24V maximum input voltage limit of the Rockman 200. The Rockman 200 has a far more informative display than the very limited one on board the Monoprice PowerCache 220. I like seeing how many watts the solar array is delivering, and how many watts are being drawn by whatever I have plugged in. Unfortunately, there are two disadvantages relative to the PowerCache 220:

  1. It is not possible to use the AC power output while charging. Like the Monoprice, the 12V DC and USB DC outputs can be used while charging. But while the Monoprice was willing to deliver AC power while charging, the Paxcess is not.
  2. When drawing DC power while charging, the cooling fan always comes on, presumably due to some sort of DC voltage conversion process. In contrast, the Monoprice stays silent if it can stay cool enough. Or at least it used to, before its thermal sensing system broke down.

Neither the Monoprice nor the Paxcess attempts to perform maximum power point tracking (MPPT). I realize this means the panel is not operating at maximum efficiency, but an MPPT controller(*) costs significantly more money than its non-MPPT counterpart(*). Given that a standalone controller costs almost as much as the array, or the Paxcess itself, I don't fault the Paxcess for not doing MPPT.

However, the Paxcess is even more non-MPPT than the Monoprice. The Monoprice pulls the panel voltage down to whatever level its internal lead-acid battery is at, which is usually in the range of 11-13 volts. In contrast, the Paxcess drags the voltage all the way down to 9.5 volts, which is even further from the maximum power point, as seen in the input wattage shown on the information display.

The display also shows a charge percentage for its internal battery. This allows me the option to stay within the 30% – 80% range if I want to minimize stress on the battery. Lithium chemistry batteries have a different care and feeding procedure than lead acid batteries. Speaking of which, with the new battery storage unit in hand, I opened up the old one to try to fix it.


(*) Disclosure: As an Amazon Associate I earn from qualifying purchases.

Monitoring Samsung 500T Discharge-Charge Cycle

Getting through Node-RED installation on my Samsung 500T tablet was the hard part. Once that was done, it was trivial to set up a flow to extract the tablet's battery voltage and charge percentage. I didn't want to add any more overhead than I already have, so the flow sends that data off to an MQTT broker, allowing me to analyze the data on a different computer without impacting battery consumption on the 500T.
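
For my own reference, here's a rough sketch of what the receiving end can look like. It assumes the paho-mqtt 1.x client API, and the broker address and topic names below are placeholders rather than my actual flow's values:

# Minimal sketch of an MQTT subscriber logging battery telemetry to a CSV file.
# Assumes the paho-mqtt 1.x package; broker address and topics are placeholders.
import csv
import time

import paho.mqtt.client as mqtt

BROKER = "192.168.1.10"   # placeholder broker address
TOPICS = [("samsung500t/voltage", 0), ("samsung500t/percentage", 0)]

def on_message(client, userdata, message):
    # Each record: timestamp, topic, payload (voltage or percentage as text)
    with open("battery_log.csv", "a", newline="") as f:
        csv.writer(f).writerow([time.time(), message.topic, message.payload.decode()])

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER)
client.subscribe(TOPICS)
client.loop_forever()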

It was instructive to watch those two graphs after I unplugged the 500T, charting its battery discharge down to under 10% followed by a charge back up to 100%. During this time, the 500T kept its screen on displaying ESA's ISS Tracker, plus the normal Windows background tasks, and it also had Node.js running Node-RED to query the battery voltage and charge percentage.

The first observation is that the battery discharge percentage graph is impressively linear. As a user I never felt this way intuitively, as my most memorable episodes are battery meters whose values seemed to drop faster and faster as I raced to finish a task before the battery gave out. A linear graph is impressive because a lithium-ion battery's discharge voltage is not linear, something we can see on the voltage graph. It drops sharply off 8.4V before stabilizing on a gentle slope that gradually flattens as it approaches 7.4V. (That is the commonly listed nominal voltage for two lithium-ion battery cells in series.) Once it dips below 7.4V, the voltage curve starts dropping rapidly, trending toward a steep dive by the time I plugged the 500T back in to a power source.

We can also see that the voltage level is a bit noisy, fluctuating as it discharges. In contrast, except for a little dip off 100%, the percentage graph decreases steadily and reliably under a constant workload. Just as we'd expect, with no surprises. I have a lot of complaints about this machine, but its power management is rock solid.

For the charge cycle, the percentage value is again not based solely on battery voltage, as we can see the two are wildly divergent during charging. When I plugged in to recharge, there was a big jump upwards as the machine switched to its charging cycle. Toward the end of that cycle, as the charge state approached 100%, there was some kind of top-off procedure. I was surprised to see the charge controller allow the battery voltage to exceed 8.4V, something I've been taught to never do when charging bare 2S LiPo battery packs. But according to this voltage graph, exceeding that voltage was part of the process of filling the battery before letting it settle down to around 8.35V. All through this top-off procedure, the battery percentage reported 100%.

I enjoyed this opportunity to geek out over battery management minutiae, but it isn’t just for curiosity’s sake. There was a project idea behind all this.

Brief Look At National Weather Service Web API

I’ve started exploring Node-RED and I like what I see. It’s a different approach to solving some problems and it’s always nice to have tools in the toolbox that would serve specific needs better than anything I had available before. The second tutorial introduced interacting with REST APIs on the web by querying for earthquake data, which was fun.

But while interesting and informative, there's nothing I would do differently after seeing earthquake data. Weather data, on the other hand, is a different story. As of this writing we're living through a heat wave, and knowing the forecast daily highs and lows does affect my decisions: for example, whether to use my home air conditioning to pre-cool the house in the morning, which supposedly helps reduce load on the electric grid in the peak afternoon hours.

So I went looking for weather information and found a lot of taxpayer-funded resources at the National Weather Service (NWS), much of it available as web service APIs. But in order to get data applicable to myself, I first needed to figure out how to identify my location in the form of their grid system.

After a few false starts, I found my starting point (literally) in the points endpoint, which returns a set of metadata relevant to a given latitude and longitude. The metadata includes the applicable grid ID as well as the X and Y grid coordinates corresponding to that latitude and longitude. There are a lot of ways to get usable lat/long values; I went to the Wikipedia page for my city.
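
Here's a sketch of that lookup using Python's requests library (rather than Node-RED), just to show the shape of the data. The coordinates and User-Agent string are placeholders, and the property names reflect my reading of the NWS documentation:

# Sketch: look up NWS grid metadata for a latitude/longitude.
# The coordinates and User-Agent below are placeholders, not my real values.
import requests

LAT, LON = 34.05, -118.25   # placeholder lat/long, e.g. from Wikipedia
headers = {"User-Agent": "weather-experiment (example@example.com)"}

resp = requests.get(f"https://api.weather.gov/points/{LAT},{LON}", headers=headers)
resp.raise_for_status()
props = resp.json()["properties"]

print(props["gridId"], props["gridX"], props["gridY"])
print(props["forecast"])   # URL of the forecast endpoint for this grid square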

Once armed with the gridId, gridX, and gridY values, I could use them to query the remaining endpoints, such as asking for the weather forecast for my grid. There's a wealth of information here that could be a lot of fun for a future project, possibly a smart home concept of some sort, but right now I should set aside this distraction and return to learning Node-RED.

Webcam Test with UWP

Once I had an old webcam taped to the carriage of a retired 3D printer, I shifted focus to writing code to coordinate the electronic and mechanical bits. My most recent experiments in application development were in the Microsoft UWP platform, so I’m going to continue that momentum until I find a good reason to switch to something else.

Microsoft’s development documentation site quickly pointed me to an example that implements a simple camera preview control. It looks like the intent here is to put up a low resolution preview so the user can frame the camera prior to taking a high resolution still image. This should suffice as a starting point.

In order to gain access to the camera feed, my application must declare the webcam capability. This shows the user a dialog box saying the application wants to access the camera, with options to approve or deny, and the user must approve before I can get video. Confusingly, that was not enough: even after approving camera access I still saw errors. It turns out that even though I didn't care about audio, I had to request access to the microphone as well. This seems like a bug, but it's a simple enough workaround in the short term.

Once that was in place, I got a low resolution video feed from the camera. I don't see any way to adjust parameters of this live video. I would like to shift to a higher resolution and I'm willing to accept a lower frame rate. I would also like to reduce noise and I'm willing to accept lower brightness. The closest thing I found to camera options is something called "camera profiles". For the moment this is a moot point, because when I queried for profiles on this camera, IsVideoProfileSupported returned false.

I imagine there is another code path to obtain video feed, used by video conference applications and video recording apps. There must be a way to select different resolutions and adjust other parameters, but I have a basic feed now so I’m content to put that on the TO-DO list and move on.

The next desire was the ability to select a different camera, since laptops usually have a built-in camera and I would be attaching another via USB. Thanks to this thread on Stack Overflow, I found a way to do so by setting the VideoDeviceId property of MediaCaptureInitializationSettings.

And yay, I have a video feed! Now I want to move the 3D printer carriage by pressing the arrow keys. I created keyboard event handlers KeyDown and KeyUp for my application, but the handlers were never called. My effort to understand this problem became the entry point for a deep rabbit hole into the world of keyboard events in UWP.

[Code for this exploration is public on Github.]

I Do Not (Yet?) Meet The Prerequisites For Multiple View Geometry in Computer Vision

Python may not be required for performing computer vision with or without OpenCV, but it does make exploration easier. There are unfortunately limits to the magic of Python, contrary to glowing reviews humorous or serious. An active area of research that is still very challenging is extracting world geometry from an image, something very important for robots that wish to understand their surroundings for navigation.

My understanding of computer vision says that image segmentation comes very close to an answer here, and while it is useful for robotic navigation applications such as autonomous vehicles, it is not quite the whole picture. In the example image, pixels are assigned to a nearby car, but such assignment doesn't tell us how big that car is or how far away it is. For a robot to successfully navigate that situation, it doesn't even really need to know whether a certain blob of pixels corresponds to a car. It just needs to know there's an object, and it needs to know the movement of that object to avoid colliding with it.

For that information, most of today's robots use an active sensor of some sort: expensive LIDAR for self-driving cars capable of highway speeds, repurposed gaming peripherals for indoor hobby robot projects. But those active sensors each have their own limitations. For the Kinect sensor I had experimented with, the limitations were a very short range and indoor-only operation. Ideally I would want something using passive sensors, like stereoscopic cameras, to extract world geometry much as humans do with our eyes.

I did a bit of research, following citations, to figure out where I might get started learning the foundations of this field. One title that came up frequently is the text Multiple View Geometry in Computer Vision.(*) I found the web page for this book, where I was able to download a few sample chapters. Those sample chapters were enough for me to decide I do not (yet) meet the prerequisites for this material. Having a robot make sense of the world via multiple cameras and computer vision is going to take a lot more work than telling Python to import vision.

Given the prerequisites, it looks pretty unlikely I will do this kind of work myself. (Or more accurately, I’m not willing to dedicate the amount of study I’d need to do so.) But that doesn’t mean it’s out of reach, it just means I have to find some related previous work to leverage. “Understand the environment seen by a camera” is a desire that applies to more than just robotics.


(*) Disclosure: As an Amazon Associate I earn from qualifying purchases.

Notes On OpenCV Outside of Python

It was fun taking a brief survey of PyImageSearch.com guides for computer vision, and I'm sure I will return to that site, but I'm also aware there are large areas of computer vision that the site regards as out of scope.

The Python programming language is obviously the focus of that site, as it's right in the name PyImageSearch. However, Python is not the only, or even the primary, interface for OpenCV. According to the official OpenCV introduction, it started as a C library and has since moved to a C++ API. Python is but one of several language bindings on top of that API.

Using OpenCV via the Python binding is advantageous not only because of Python itself; it also opens access to a large world of Python libraries. The most significant one in this context is NumPy. Other languages may have similar counterparts, but Python and NumPy together are a powerful combination. There are valid reasons to use OpenCV without Python, but those projects would have to find their own counterpart to NumPy for the number-crunching heavy lifting.

Just for the sake of exercise, I looked at a few of the other platforms I recently examined.

OpenCV is accessible from JavaScript, or at least Node.js, via projects like opencv4nodejs. This also means OpenCV can be embedded in desktop applications written in ElectronJS, demonstrated by the opencv-electron example project.

If I wanted to use OpenCV in a Universal Windows Platform app, it appears some people have shared compiled forms of OpenCV on Microsoft's NuGet repository. As I understand it, NuGet is to .NET as PyPI is to Python. Maybe there are important differences, but it's a good enough analogy for a first cut. Microsoft's UWP documentation describes using OpenCV via an OpenCVHelper component. And since UWP can be written in C++, and OpenCV is in C++, there's always the option of compiling from source code.

As promising as all this material is, it is merely the foundation for applying computer vision to the kind of problems I’m most interested in: helping a robot understand its environment for mapping, obstacle avoidance, and manipulation. Unfortunately that field starts to get pretty complex for a casual hobbyist to pick up.

Notes After Skimming PyImageSearch

I'm glad I learned of PyImageSearch from Evan and spent some time sitting down to look it over. The amount of information available on this site is large enough that I resorted to skimming, with the intent to revisit specific subjects later as the need arises.

I appreciate the intent of making computer vision accessible to beginners; it is always good to make sure people interested in exploring an area are not frustrated by problems unrelated to the problem domain. Kudos to the guides on command line basics, and on Python's NoneType errors that are bewildering to beginners.

That said, this site does frequently dive into areas that I felt lacked sufficient explanation for beginners. I remember the difficulty I had in understanding how matrix math related to computer graphics. The guide on rotation discussed the corresponding rotation matrix. Readers get the assurance "This matrix looks scary, but I promise you: it's not," but the explanation that followed would not have been enlightening to me back when I was learning the topic. Perhaps a link to more details would be helpful? Still, the effort is appreciated.

There are also bits of Python code that would be confusing to a beginner, not just in Python itself but also in how they leverage the very powerful NumPy library. I had no idea what was going on between tuple and argmin in the code on this page:

extLeft = tuple(c[c[:, :, 0].argmin()][0])

Right now it's a black box of voodoo magic to me, a string of non-alphanumeric operators that more closely resembles something I associate with Perl programming. At some point I need to sit down with the Python documentation and work through this step by step in a Python REPL (read-evaluate-print loop) to understand the syntax. It would be good if the author included footnotes with links to the appropriate Python terminology for these operations.
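
For my own notes, here is how I currently read that one-liner, broken into steps. This assumes c is a single contour returned by cv2.findContours, which is a NumPy array of shape (N, 1, 2) holding (x, y) points:

# My step-by-step reading of the one-liner above; c is assumed to be a contour
# from cv2.findContours, i.e. a NumPy array of shape (N, 1, 2) of (x, y) points.
xs = c[:, :, 0]          # just the x coordinate of every point, shape (N, 1)
index = xs.argmin()      # index of the point with the smallest x (leftmost)
point = c[index][0]      # that point as a 2-element array [x, y]
extLeft = tuple(point)   # same result as the original one-liner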

A fact of life of learning from PyImageSearch is the sales pitch for the author's books. It's not necessarily a good thing or a bad thing, but it is very definitely a thing: a constant and repetitive reminder of "and in my book you will also learn…" on every page. This site exists to draw people in and, if they want to take it further, sell them on the book. I appreciate this openly stated routine over the underhanded ways some other people make money online, but that doesn't make it any less repetitive.

Likely related to the above is the fact that this site also wants to collect e-mail addresses. None of the code download links takes us to an actual download; they instead take us to a form where we have to fill in our e-mail address before we are given a link to download. Fortunately, the simple bits I've followed along with so far are easy to recreate without the download, but I'm sure that will become unavoidable if I go much further.

And finally, this site is focused on OpenCV in Python running on Unix-derived operating systems. Other language bindings for OpenCV are out of scope, as is the Windows operating system. For my project ideas that involve embedded platforms without Python, or those that will be deployed on Windows, I would need to go elsewhere for help.

But what is within scope is covered well, with an eye towards beginner friendliness, and available freely online in a searchable collection. For that I am thankful to the author, even as I acknowledge that there are interesting OpenCV resources beyond this scope.

Skimming Remainder Of PyImageSearch Getting Started Guide

Following through part of, then skimming the rest of, the first section of the PyImageSearch Getting Started guide taught me there's a lot of fascinating information here. Certainly more than enough for me to know I'll be returning to consult it as I tackle project ideas in the future. For now I wanted to skim through the rest and note the problem areas it covers.

The Deep Learning section immediately follows the startup section, because they’re a huge part of recent advancements in computer vision. Like most tutorials I’ve seen on Deep Learning, this section goes through how to set up and train a convolutional neural network to act as an image classifier. Discussions about training data, tuning training parameters, and applications are built around these tasks.

After the Deep Learning section are several more sections, each focused on a genre of popular applications for machine vision.

  • Face Applications range from recognizing the presence of human faces to recognizing individual faces, and applications thereof.
  • Optical Character Recognition (OCR) helps a computer read human text.
  • Object detection is a more generalized form of detecting faces or characters, and there's a whole range of tools here. It will take time to learn which tools are the right ones for specific jobs.
  • Object tracking: once detected, sometimes we want an object tracked.
  • Segmentation: Detect objects and determine which pixels are and aren’t part of that object.

To deploy the algorithms described above, the guide then talks about hardware. Apart from theoretical challenges, there are also hardware constraints, which are especially acute on embedded hardware like the Raspberry Pi, Google Coral, etc.

After hardware, there are a few specific application areas, from medical computer vision, to video processing, to image search engines.

This is an impressively comprehensive overview of computer vision. I think it’ll be a very useful resource for me in the future, as long as I keep in mind a few characteristics of this site.

Skimming “Build OpenCV Mini-Projects” by PyImageSearch: Contours

Getting a taste of OpenCV color operations was interesting, but I didn't really understand what made OpenCV more powerful than other image processing libraries until we got to contours, which cover most of the second half of PyImageSearch's Start Here guide Step 4: Build OpenCV Mini-Projects.

This section started with an example of finding the center of a contour, which in this case means examining a picture of a collection of non-overlapping paper cut-out shapes. The most valuable concept here is that of image moments, which I think of as a "summary" of a particular shape found by OpenCV. We also got names for operations we've seen earlier: binarization operations turn an image into a binary yes/no highlight of potentially interesting features, and edge detection and thresholding are the two we've seen so far.
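
As a reminder to myself, the heart of that center-of-contour example boils down to just a few lines. This is a rough sketch rather than the tutorial's exact code, and the filename is a placeholder:

# Sketch: find each contour's centroid via image moments.
# The filename and threshold value are placeholders for illustration.
import cv2

image = cv2.imread("shapes.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY)

# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    M = cv2.moments(c)        # the moment sums act as a "summary" of the shape
    if M["m00"] == 0:         # skip degenerate contours with zero area
        continue
    cX = int(M["m10"] / M["m00"])   # centroid X
    cY = int(M["m01"] / M["m00"])   # centroid Y
    print("center:", cX, cY)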

Things get exciting when we start putting contours to work. The tutorial starts out easy by finding the extreme points in contours, which breaks down roughly what goes on inside OpenCV's boundingRect function. Such code is then used in tutorials for calculating the size of objects in view, which is close to a project idea on my to-do list.

A prerequisite for that project is code to order coordinates clockwise, which, reading the code, I was surprised to learn was done in Cartesian space. If the objective is clockwise ordering, I thought it would have been a natural candidate for processing in polar coordinate space. This algorithm was apparently originally published with a boundary-condition bug that, as far as I can tell, would not have happened if the coordinate sorting had been done in polar coordinates.
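
To check my intuition, here is a sketch of what clockwise ordering via polar coordinates might look like: compute each point's angle around the centroid with atan2 and sort by that angle. This is my own sketch, not the tutorial's implementation:

# Sketch: order 2D points clockwise by sorting on their angle around the centroid.
# This is my own polar-coordinate take, not the tutorial's Cartesian implementation.
import math

def order_points_clockwise(points):
    cx = sum(x for x, y in points) / len(points)
    cy = sum(y for x, y in points) / len(points)
    # atan2 gives each point's angle around the centroid; sorting by it walks
    # around the shape in one consistent direction. In image coordinates
    # (y grows downward), ascending angle corresponds to clockwise on screen.
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

print(order_points_clockwise([(0, 0), (10, 0), (10, 10), (0, 10)]))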

These components are brought together beautifully in an example document scanner application that detects the trapezoidal shape of a receipt in the image and performs perspective correction to deliver a straight rectangular image of the receipt. This is my favorite feature of Office Lens and if I ever decide to write my own I shall return to this example.

By the end of this section, I was suitably impressed by what I’ve seen of OpenCV, but I also have the feeling a few of my computer vision projects would not be addressed by the parts of OpenCV covered in the rest of PyImageSearch’s Start Here guide.

Skimming “Build OpenCV Mini-Projects” by PyImageSearch: Colors

I followed PyImageSearch's introductory Step 3: Learn OpenCV by Example (Beginner) line by line, both to get a feel for using the Python binding of OpenCV and to learn this particular author's style. Once I felt I had that, I started skimming at a faster pace just to get an idea of the resources available on this site. For Step 4: Build OpenCV Mini-Projects I only read through the instructions without actually following along with my own code.

I was impressed that the first part of Step 4 is dedicated to Python's NoneType errors. The author is right: this is a very common thing to crop up for anyone experimenting with Python. It's the inevitable downside of Python's lack of static type checking. I understand the upsides of flexible runtime types and really enjoy the power they give Python programmers, but when things go bad they can go really bad, and only at runtime. NoneType is certainly not the only way it can manifest, but it is going to be the most common, and I'm glad there's an overview of what beginners can do about it.

Which made the following section more puzzling. The topic was image rotation, and the author brought up the associated rotation matrix. I feel that anyone who would need an explanation of NoneType errors would not know how a mathematical matrix is involved in image rotation. Most people would only know image rotation from selecting a menu item in Photoshop or, at most, grabbing the rotate handle with a mouse. Such beginners to image processing would need an explanation of how matrix math is involved.
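
For example, the bridge I would have needed as a beginner is seeing that rotating an image conceptually means multiplying each pixel coordinate by a small matrix. A quick NumPy sketch of a single point:

# Sketch: a 2D rotation matrix applied to a single pixel coordinate.
# Rotating an image is (conceptually) doing this for every pixel, plus the
# translation needed to rotate about the image center instead of the origin.
import numpy as np

theta = np.deg2rad(30)                      # rotate by 30 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
pixel = np.array([100, 0])                  # a pixel 100 units right of the origin
print(R @ pixel)                            # its location after rotation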

The next few sections were focused on color, which I was happy to see because most of Step 3 dealt with grayscale images stripped of their color information. OpenCV enables some very powerful operations I want to revisit when I have a project that can make use of them. I am most fascinated by the CIE L*a*b color space, something I had never heard of before. A color space focused on how humans perceive color, rather than how computers represent it, means code working in that space will produce more human-understandable results.
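
As a tiny sketch of why that matters (the filename and pixel coordinates are placeholders): after converting with cv2.cvtColor, the distance between two pixels in L*a*b* space roughly tracks how different those colors look to a human eye, which is not true of raw RGB values.

# Sketch: compare two pixels in the CIE L*a*b* color space.
# The filename and pixel locations are placeholders for illustration.
import cv2
import numpy as np

image = cv2.imread("photo.jpg")
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)   # OpenCV scales L*a*b* into 0-255 for 8-bit images

p1 = lab[10, 10].astype(float)
p2 = lab[200, 200].astype(float)
print(np.linalg.norm(p1 - p2))   # larger distance = more perceptually different colors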

But operations like rotation, scaling, and color spaces are relatively common things shared with many other image manipulation libraries. The second half goes into operations that make OpenCV uniquely powerful: contours.

Notes On “Learn OpenCV by Example” By PyImageSearch

Once basic prebuilt binaries of OpenCV had been installed in an Anaconda environment on my Windows PC, Step #2 of the PyImageSearch Start Here Guide goes into command line arguments. This section is an introduction for people who have little experience with the command line, so I was able to skim through it quickly.

Step #3 Learn OpenCV by Example (Beginner) is where I finally got some hands-on interaction with basic OpenCV operations, starting with basic image manipulation routines like scale, rotate, and crop. These are pretty common in any image library, and are illustrated with a still frame from the movie Jurassic Park.

The next two items were more specific to OpenCV: edge detection attempts to extract edges from an image, and thresholding drops detail above and below certain thresholds. I've seen thresholding (or a close relative) in some image libraries, but edge detection is new to me.

Then we return to relatively common image manipulation routines, like drawing operations on an image. This is not unique to OpenCV but very useful because it allows us to annotate an image for human-readable interpretation. Most commonly drawing boxes to mark regions of interest, but also masking out areas not of interest.

Past those operations, the tutorial concludes with a return to OpenCV specialties in the form of contour and shape detection algorithms, executed on a very simple image with a few Tetris shapes.
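
Here is a condensed sketch of the operations covered in this step, roughly in the order the tutorial presents them. The filename and parameter values are placeholders of my own, not the tutorial's:

# Condensed sketch of the Step 3 operations: resize, rotate, crop, edge detect,
# threshold, draw, and find contours. Filename and parameters are placeholders.
import cv2

image = cv2.imread("frame.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

resized = cv2.resize(image, (320, 240))                  # scale
center = (image.shape[1] // 2, image.shape[0] // 2)
M = cv2.getRotationMatrix2D(center, 45, 1.0)             # rotate 45 degrees about center
rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
cropped = image[50:200, 100:300]                         # crop is just array slicing

edges = cv2.Canny(gray, 30, 150)                         # edge detection
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)  # thresholding

annotated = image.copy()
cv2.rectangle(annotated, (100, 50), (300, 200), (0, 255, 0), 2)  # draw a region of interest

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(contours), "contours found")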

After following along through these exercises, I wanted to try those operations on one of my own pictures. I selected a recent image on this blog that I thought would be ideal: high contrast with clear simple shapes.

[Image: Xbox One]

As expected, my first OpenCV run was not entirely successful. I thought this would be an easy image for edge detection and I learned I was wrong. There were false negatives caused by the shallow depth of field: vents on the left side of the Xbox toward the rear were out of focus, and their edges were not picked up. False positives in areas of sharp focus came from two major categories: molded texture on the front of the Xbox, and little bits of lint left by the towel I used to wipe off dust. In hindsight I should have taken a picture before dusting so I could compare how dust vs. lint behaved in edge detection. I could mitigate the false positives somewhat by adjusting the threshold parameters of the edge detection algorithm, but I could not eliminate them completely.

[Image: Canny edge detection of the Xbox One photo, thresholds 30 and 175]

With such noisy results, a naive application of the contour and shape detection algorithms used in the tutorial returned a lot of data I don't yet know how to process. It is apparent those algorithms need cleaner input, and I still have a lot to learn about delivering what they need. But still, it was a fun first run! I look forward to learning more in Step 4: Build OpenCV Mini-Projects.

Notes on OpenCV Installation Guide by PyImageSearch

Once I decided to try PyImageSearch’s Getting Started guide, the obvious step #1 is about installing OpenCV. Like many popular open source projects, there are two ways to get it on a computer system: (1) use a package manager, or (2) build from source code. Since the focus here is using OpenCV from Python, the package manager of choice is pip.

Packages that can be installed via pip are not necessarily built by the original authors of a project. If a project is popular enough, someone will take on the task of building from the open source code and make those binaries available to others, and the PyImageSearch pip install opencv guide says that is indeed the case here for OpenCV.

I appreciated the explanation of the differences between the four available packages, the result of two independent yes/no options: headless or not, and contrib modules or not. The headless option is appropriate for machines used strictly for processing that do not need to display any visual interface, and contrib describes a set of modules contributed by people outside the core OpenCV team that have grown popular enough to be offered as a packaged bundle.

What was even more useful was an explanation of what is not in any of these packages available via pip: modules that implement patented algorithms. These "non-free" components are commonly treated as part of OpenCV, but they are not distributed in compiled binary form. We may build them from source code for exploration, but any binary distribution (for example, in a commercial software product) requires dealing with the lawyers representing the owners of those patents.

Which brings us to the less easy part of the OpenCV installation guide: building from source code. PyImageSearch offers instructions to do so on macOS, Ubuntu, and Raspbian for a Raspberry Pi. The author specifically does not support Windows as a platform for learning OpenCV. If I want to work in Windows, I’m on my own.

Since I’m just starting out, I’m going to choose the easy method of using pre-built binaries. Like most Python tutorials, PyImageSearch highly recommends a Python environment manager and includes instructions for virtualenv. My Windows machine already had Anaconda installed, so I used that instead to install opencv-contrib-python in an environment created for this first phase of OpenCV exploration.
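
After installation, a quick sanity check inside that environment confirms which build was installed and that the contrib modules came along for the ride (the aruco module, for example, only ships in the contrib packages):

# Quick sanity check of the opencv-contrib-python install in this environment.
import cv2

print(cv2.__version__)              # version of the prebuilt OpenCV binary
print(hasattr(cv2, "aruco"))        # True if the contrib modules are present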

Trying OpenCV Getting Started Guide By PyImageSearch

I am happy that I made some headway in writing desktop computer applications that control hardware peripherals over a serial port, in the form of a test program that can perform a few simple operations with a 3D printer. But how will I put this idea to work doing something useful? I have a few potential project ideas that leverage the computing power of a desktop computer, several of them in the form of machine vision.

Which meant it was time to fill another gap in my toolbox of solving problems with software: get a basic understanding of what I can and can't do with machine vision. There are two meanings of "can" in that sentence, and both apply: the "is this even theoretically possible" sense, and the "is this within the reach of my abilities" sense. The latter will obviously be more limiting, but that limit is something I can fix by devoting the time to learn. Getting an idea of the former is also useful so I don't go off on a doomed project trying to build something impossible.

Which meant it was time to learn about OpenCV, the canonical computer vision library. I came across OpenCV in various contexts but it’s just been a label on a black box. I never devoted the time to sit down and learn more about this box and how I might be able to leverage it in my own projects. Given my interest in robotics, I knew OpenCV was on my path but didn’t know when. I guess now is the time.

Given that OpenCV is the starting point for a lot of computer vision algorithms and education, there are many tutorials to choose from, and I will probably go through several different ones before I feel comfortable with OpenCV. Still, I need to pick a starting point. On a recommendation from Evan, whom I met at Superconference, I'll try the Getting Started guide by PyImageSearch. First step: installing OpenCV.

Simple Logging To Text File

Even though I aborted my adventures into Windows ETW logging, I still wanted a logging mechanism to support future experimentation with the Universal Windows Platform. This turned into an educational project in itself, teaching me about other system interfaces of the platform.

Where do I put this log file?

UWP applications are not allowed arbitrary access to the file system, so if I want to write out a log file without explicit user interaction, only a few select locations are available. I found the KnownFolders enumeration, but those were all user data folders, and I didn't want these log files clogging up "My Documents" and such. I ended up putting the log file in ApplicationData.TemporaryFolder. This folder is subject to occasional cleanup by the operating system, which is fine for a log file.

When do I open and close this log file?

This required a trip into the world of the UWP application lifecycle. I check whether the log file exists and, if not, create and open it from three places: OnLaunched, OnActivated, and OnResuming. In practice it looks like I mostly see OnLaunched. The flip side is OnSuspending, where the application template has already set up a suspension deferral, buying me time to write out and close the log file.

How do I write data out to this log file?

There is a helpful Getting Started with file input/output document, in which the standard recommendation is to use the FileIO class. It links to a section of the UWP developer's guide titled Files, folders, and libraries. The page Create, write, and read a file was helpful for seeing how these differ from the classic C file I/O API.

These FileIO classes promise to take care of all the complicated parts, including async/await methods so the application is not blocked on file access. This way the user interface doesn't freeze until a load or save operation completes; it remains responsive while file access is in progress.

But when I used the FileIO API naively, writing to the file upon every line of logging, I received a constant stream of exceptions. Digging into the call stack of the exception (actually several levels deep in the chain) told me there was a file access collision problem. It was the page Best practices for writing to files that cleared things up for me: these async FileIO libraries create a temporary file for each asynchronous action and copy it over the original file upon success. When I was writing once per line, too many operations were happening in too short a time, and the temporary files collided with each other.

The solution was to write less frequently: buffer up a set of log messages and write them as a batch with each FileIO access, rather than calling once per log entry. Reducing the frequency of write operations resolved my collision issue.

[This simple text file logging class is available on GitHub.]

Complexity Of ETW Leaves A Beginner Lost

When experimenting with something new in programming, it's always useful to step through the code in a debugger the first time to see what it does. An unfortunate side effect is execution far slower than normal speed, which interferes with timing-sensitive operations. An alternative is to have a logging mechanism that doesn't slow things down (as much), so we can read the logs afterwards to understand the sequence of events.

Windows has something called Event Tracing for Windows (ETW) that has evolved over the decades. This mechanism is implemented in the Windows kernel and offers dynamic control over which events to log. The mechanism itself was built to be lean, impacting system performance as little as possible while logging. The goal is for it to be so fast and efficient that it barely affects timing-sensitive operations, because one of the primary purposes of ETW is to diagnose system performance issues, and it obviously can't be useful if running ETW itself causes severe slowdowns.

ETW infrastructure is exposed to Universal Windows Platform applications via the Windows.Foundation.Diagnostics namespace, with utility classes that sounded simple enough at first glance: we create a logging session, we establish one or more channels within that session, and we log individual activities to a channel.

Trying to see how it works, though, can be overwhelming to a beginner. All I wanted was a timestamp and a text message, and optionally an indicator of the message's importance. The timestamp is automatic in ETW. The text message can be done with LogEvent, and I can pass in a LoggingLevel to signify whether it is verbose chatter, an informative message, a warning, an error, or a critical event.

In the UWP sample library there is a logging sample application showcasing use of these logging APIs. The source code looks straightforward, and I was able to compile and run it. The problem came when trying to read this log: as part of its low-overhead goal and powerful complexity, the output of ETW is not a simple log file I can browse through. It is a task-specific ETL file format that requires its own applications to read. Such tools are part of the Windows Performance Toolkit, but fortunately I didn’t have to download and install the whole thing. The Windows Performance Analyzer can be installed by itself from the Windows store.

I opened up the ETL file generated by the sample app and… got no further. I could get a timeline of the application, and I could unfold a long list of events. But while I could get a timestamp for each event, I couldn't figure out how to retrieve the messages. The sample application called LogEvent with a chunk of "Lorem ipsum" text, and I could not figure out how to retrieve it.

Long term I would love to know how to leverage the ETW infrastructure for my own application development and diagnosis. But after spending too much time unable to perform a very basic logging task, I shelved ETW for later and wrote my own simple logger that outputs to a plain text file.

Ubuntu and ROS on Raspberry Pi

Since I just discovered that I can replace Ubuntu with lighter-weight Raspbian on old 32-bit PCs, I thought it would be a good time to quickly jot down some notes about going the other way: replacing Raspbian with Ubuntu on a Raspberry Pi.

When I started building Sawppy in early 2018, I was already thinking ahead to turning Sawppy from a remote-controlled toy into an autonomous robot, which meant a quick survey of the state of ROS. At the time, ROS Kinetic was the latest LTS release, targeted at Ubuntu 16.

Unfortunately the official release of Ubuntu 16 did not include an armhf build suitable for running on a Raspberry Pi. Some people build ROS from source code to make it run on Raspbian; I made one attempt, but the build errors took more time to understand and resolve than I wanted to spend. I then chose the less difficult path of finding a derived release of Ubuntu 16 that ran on the platform: Ubuntu MATE 16. An afternoon's worth of testing verified basic ROS Kinetic capability, and I set it aside for revisiting later.

Later in 2018, Ubuntu 18 was released, followed by ROS Melodic matching that platform. By then support for running Debian (& derivatives) on armhf had migrated to Ubuntu, and they released both the snap-based Ubuntu Core and Ubuntu 'classic' for the Raspberry Pi. These are minimalist server images, but desktop UI components can be installed if needed. Information on how to do so can be found on the Ubuntu wiki, but obviously UI is not a priority when I'm looking at robot brains. Besides, if I wanted a UI, Ubuntu MATE 18 is available as well. For Ubuntu 20, released this year, the same choices continue to be offered, which should match well with ROS Noetic.

I don't know how relevant this is yet for ROS on a Raspberry Pi, but I noticed that not only are 32-bit armhf binaries available, so are 64-bit arm64 binaries. The Raspberry Pi 3 and 4 have CPUs capable of running arm64 code, but Raspbian has remained 32-bit for compatibility with existing Pi software and with low-end devices like the Raspberry Pi Zero that are incapable of arm64. Beyond the ability to address more memory, moving to the arm64 instruction set was also a chance to break from some inconvenient bits of architectural legacy, which in turn allows better arm64 performance. Though the performance increase is minor as applied to a Raspberry Pi, ROS releases include precompiled arm64 binaries, so the biggest barrier to entry has already been removed and it might be worth a look.

[UPDATE: I found a good reason to go for arm64: ROS2.]