Virtual Lunar Rovers May Help Sawppy Rovers

Over a year ago I hand-waved a grandiose wish that robots should become smarter to compensate for their imperfections instead of chasing perfection with ever more expensive hardware. This was primarily motivated by a period of disillusionment as I wanted to make use of work by robotics research only to find that their published software tend to require hardware orders of magnitude more expensive than what’s feasible for my Sawppy.

Since then, I’ve noticed imperfection is something that’s coming up more and more frequently. I had my eyes on the DARPA Subterranean Challenge (SubT) for focusing researcher attention towards rugged imperfect environments. They’ve also provided a very promising looking set of tutorials for working with the ROS-based SubT infrastructure. This is a great resource on my to-do list.

Another interesting potential that I wasn’t previously aware of is NASA Space Robotics Phase 2 competition. While phase 1 is a simulation of a humanoid robot on Mars, phase 2 is about simulated rovers on the moon. And just like SubT, there will be challenges with perception making sense of rugged environments and virtual robots trying to navigate their way through. Slippery uneven surfaces, unreliable wheel odometry, all the challenges Sawppy has to deal with in real life.

And good news, at least some of the participants in this challenge are neither big bucks corporations nor secretive “let me publish it first” researchers. One of them, Rud Merriam, is asking questions on ROS Discourse and, even more usefully for me, breaking down the field jargon to language outsiders can understand on his blog. If all goes well, there’ll be findings here useful for Sawppy here on earth! This should be fun to watch.

Micro-ROS Now Supports ESP32

When I looked wistfully at rosserial and how it doesn’t seem to have a future in the world of ROS2, I mentioned micro-ROS appeared to be the spiritual successor but it required more powerful microcontrollers leaving the classic 8-bit chips behind. Micro-ROS doesn’t quite put microcontrollers on a ROS2 system as a first-class peer to other software nodes running on the computer, as there still needs to be a corresponding “agent” node on the computer to act as proxy. But it comes closer than rosserial ever tried to be, and looks great on paper.

Based on the micro-ROS architectural overview diagram, I understand it can support multiple independent software components running in parallel on a microcontroller. This is a boost in capability from rosserial which can only focus on a single task. However, it’s not yet clear to me whether a robot with a microcontroller running micro-ROS can function independent of a computer running full fledged ROS2. On a robot with ROS using rosserial, there still needs to be a computer running the ROS master among other foundational features. Are there similar requirements for a robot with micro-ROS?

I suppose I wouldn’t really know until I set aside the time to dig into it, which has yet to happen. But the likelihood just increased. I knew that the official support for micro-ROS started with various ARM Cortex boards, but reading the system requirements I saw nothing that would have prevented the non-ARM ESP32 from joining the group. Especially since it is a popular piece of hardware and it already runs FreeRTOS by default. I have a few modules already on hand and I expected it was only a matter of time before someone ported micro-ROS to the ESP32. Likely before I build up the expertise and find the time to try it myself.

That expectation was correct! A few days ago an announcement was posted to ROS Discourse that ESP32 is now officially part of the micro-ROS ecosystem. And thus another barrier against ROS2 adoption has been removed.

Scott Locklin’s Take on Robotics

As someone who writes about robots on WordPress, I am frequently shown what other people have written about robots on WordPress. Like this post titled “Open problems in Robotics” by Scott Licklin and I agree with his conclusion: state of the art robotics still struggle to perform tasks that an average one year old human child can do with ease.

He is honest with a disclaimer that he is not a Serious Robotics Researcher, merely a technically competent spectator taking a quick survey of the current state of the art. That’s pretty much the same position I am in, and I agree with his list of big problems that are known and generally unsolved. But more importantly, he was able to explain these unsolved problems in generally understandable terms and not fall into field jargon as longtime practitioners (or wanna-be posers like myself) would be tempted to do. If someone not well versed in the field is curious to see how a new perspective might be able to contribute, Scott’s list is not a bad place to start. Robotics research still has a ton of room for newcomers to bring new ideas and new solutions.

Another important aspect of Scott’s writing is making it clear that unsolved does not mean unsolvable, a tone I see all too frequently from naysayers claiming robotics research is doomed to failure and a waste of time and money. Robotics research has certainly been time consuming and expensive, but I think it’s a stretch to say it’ll stay that way forever.

However, Scott is pessimistic that algorithms running on computers as we know them today would ever solve these problems, hypothesizing that robots would not be successful until they take a different approach to cognition. “more like a simulacrum of a simple nervous system than writing python code in ROS” and here our opinions differ. I agree current computing systems built on silicon aren’t able to duplicate brains built on proteins, but I don’t agree that is a requirement for success.

We have many examples in our daily lives where a human invention works nothing like their natural world inspiration, but have been proven useful regardless of that dissimilarity. Hydraulic cylinders are nothing like muscles, bearings and shafts are nothing like joints, and a Boeing 747 flies nothing like an eagle. I believe robots can effectively operate in our world without having brains that think the way human brains do.

But hey, what do I know? I’m not a big shot researcher, either. So the most honest thing to do is to end my WordPress post here with the exact same words Scott did:

But really, all I know about robotics is that it’s pretty difficult.

Randomized Dungeon Crawling Levels for Robots

I’ve spent more time than I should have on Diablo III, a video game where our hero adventures through endless series of challenges. Each level in the game has a randomly generated layout so it’s not possible to memorize where the most rewarding monsters live or where the best treasures are hidden. This keeps the game interesting because every level is an exploration in an environment I’ve never seen before and will never see its exact duplicate again.

This is what came to my mind when I learned of WorldForge, a new feature of AWS RoboMaker. For those who don’t know: RoboMaker is an AWS offering built around ROS (Robot Operating System) that lets robot builders leverage the advantages of AWS. One example most closely relevant to WorldForge is the ability to run multiple virtual robot simulations in parallel across a large number of AWS machines. It’ll cost money, of course, but less than buying a large number of actual physical computers to run those simulations.

But running a lot of simulations isn’t very useful whey they are all running the same robot through the same test environment, and this is where WorldForge comes in. It’s a tool that accepts a set of parameters, then generate a set of simulation worlds that randomly place or replace features according to those given parameters. Then virtual robots can be set loose to do their thing across AWS machines running in parallel. Consistent successful completion across different environments builds confidence our robot logic is properly generalized and not just memorizing where the best treasures are buried. So basically, a randomized dungeon crawler adventure for virtual robots.

WorldForge launched with ability to generate randomized residential environments, useful for testing robots intended for home use. To broaden the appeal of WorldForge, other types of environments are coming in the future. So robots won’t get bored with the residential tileset, they’ll also get industrial and business tilesets and more to come.

I hope they appreciate the effort to keep their games interesting.

Seeed Studio Odyssey X86J4105 Has Good ROS2 Potential

If I were to experiment with upgrading my Sawppy to ROS2 right now, with what I have on hand, I would start by putting Ubuntu ARM64 on a Raspberry Pi 3 for a quick “Hello World”. However, I would also expect to quickly run into limitations of a Pi 3. If I wanted to buy myself a little more headroom, what would I do?

The Pi 4 is an obvious step up from the 3, but if I’m going to spend money, the Seeed Studio Odyssey X86J4105 is a very promising candidate. Unlike the Pi, it has an Intel Celeron processor on board so I can build x86_64 binaries on my desktop machine and copy them straight over. Something I hope to eventually be a painless option for ROS2 cross compilation to ARM, but we’re not there yet.

This board is larger than a Raspberry Pi, but still well within Sawppy’s carrying capacity. It’s also very interesting that they copied the GPIO pin layout from Raspberry Pi, the idea some HATs can just plug right in is very enticing. Although that’s not a capability that would be immediately useful for Sawppy specifically.

The onboard Arduino co-processor is only useful for this application if it can fit within a ROS2 ecosystem, and the good news is that it is based on the SAMD21. Which makes it powerful enough to run micro-ROS, an option not available to the old school ATmega32U4 on the LattePanda boards.

And finally, the electrical power supply requirements are very robot friendly. The spec sheet lists DC input voltage requirement at 12V-19V, implying we can just put 4S LiPo power straight into the barrel jack and onboard voltage regulators will do the rest.

The combination of computing power, I/O, and power flexibility makes this board even more interesting than an Up Board. Definitely something to keep in mind for Sawppy contemplation and maybe I’ll click “Add to Cart” on this nifty little board (*) sometime in the near future.


(*) Disclosure: As an Amazon Associate I earn from qualifying purchases.

Still Constantly Asked: Is ROS2 Ready Yet?

One more note on this series of “keeping a pulse on ROS2” before returning to my own projects: we’re still very much in the stage where people are constantly asking if ROS2 is ready yet, and the answer has moved from “probably not” to “maybe, depending on what you are doing.” I noticed that since the first LTS was declared, the number of people who have dived in and started using ROS2 has grown. Thanks to the community of people willing to share their work, it helps growing the amount of developer resources available. It also means a lot of reports based on first-hand experience, one pops up on my radar once every few days. The most recent one being this article: 5 features ROS 2 needs in 2020.

The most encouraging thing about this particular post is towards the end: the author listed several problems that he had wanted to write about when he started, but before it was published, then the community had found answers! This highlights how the field is still growing and maturing rapidly. Whenever anyone reads a published article on “Is ROS 2 ready yet?” they must understand that some subset of facts would already be obsolete. Check the publication date, and discount accordingly.

Out of the issues on this list, there were two I hadn’t known about so I’m thankful I could learn about them here. I knew ROS 2 introduced the concept of having different QoS (Quality of Service) guarantees for messages. This allows a robot builder to make sure when the going gets tough, the robot knows which less important messages can be dropped to make sure the important ones get through. What I didn’t know was the fact QoS also introduced confusion in ROS2 debugging and diagnostics tools. I can see how it becomes a matter of perspective: to keep a robot running, the important messages need to get to their recipients and at that very second, it’s not as important to deliver them to a debugger. But if the developer is to understand what went wrong, they need to see the debugger messages! It’ll be interesting to see how those tools reconcile these perspectives.

The other one I hadn’t know about was the lack of visibility into subscriber connections and disconnections. Theoretically in an ideal world robot modules don’t have to care about who is listening to whatever they have to say, but it’s definitely one of the areas where the real world hasn’t matched up well with the theory. Again it’ll be interesting to see how this one evolves.

The next phase of “Is ROS2 ready yet?” will be “yeah, unless you’re doing something exotic” and judging by the pace so far, that’s only a year or two away. I’m definitely seeing a long term trend towards the day when the answer is “Ugh, why are you still using ROS 1?”

Notes on ROS2 and rosserial

I’m happy to see ROS2 improve over the past several releases, each release more mature and suitable for adoption than the last. Tackling some long standing problems like cross compilation and also new frontiers. I know a lot of my current reservations about ROS2 are on the to-do list, but there’s a fairly significant item that appears to be deliberately absent: rosserial.

In ROS, the rosserial module is the default way of for something simple to communicate with the rest of a ROS robot. It’s been a popular way for robot builders to add small dedicated modules that serve their little niche simply and effectively with only an Arduino or similar 8-bit microcontroller. By following its conventions for translating ROS messages into simple serial byte sequences, robot builders don’t have to constantly reinvent this wheel. However, it is only really applicable when we are in control of both the computer and Arduino end of the communication. When one side is outside of our control — such as the case for LX-16A servos used on Sawppy — we can’t use the rosserial protocol and a custom node has to be created.

But while I couldn’t use rosserial for communication with the servos on my own Sawppy, I’ve seen it deployed for other Sawppy builds in different contexts. Rhys Mainwaring’s Curio rover ROS stack uses rosserial to communicate with its Arduino Mega, and Steve [jetdillo] has just installed a battery voltage monitor that reports via rosserial.

With its proven value to budget robotics, I’m sad to see it’s not currently in the cards for ROS2. Officially, the replacement for rosserial is micro-ROS built on the Micro XRCE-DDS Client. DDS is the standardized communication protocol used by ROS2, and XRCE stands for “eXtremely Resource Constrained Environment.” It’s an admirable goal to keep the protocol running with low resource consumption, but “low” is relative. Micro XRCE-DDS proudly listed its resource requirements as thus:

From the point of view of memory footprint, the latest version of this library has a memory consumption of less than 75 KB of Flash memory and 2.5 KB of RAM for a complete publisher and subscriber application.

If we look at the ATmega328P chip at the heart of a basic Arduino, we see it has 32KB of Flash and 2KB of RAM and that’s just not going to work. A straightforward port of rosserial was aborted due to intrinsic ties to ROS, but that Github issue still sees traffic because people want the thing that does not exist. [UPDATE: Now we have a ROS Discourse discussion thread about it too.]

I found a ros2arduino library built on Micro XRCE DDS, and was eager to see how it managed to fit on a simple ATmega328. Disappointingly, it doesn’t. The “Arduino” in the name referred to newer high end boards like the Arduino MKR ZERO, leaving the humble ATmega328 Arduino boards out in the cold.

As far as I can tell, this is by design. Much as how ROS2 has focused on 64-bit computing platforms over 32-bit CPUs, their “low end microcontroller support” is graduating from old school 8-bit chips to newer designs like the ESP32 and various ARM Cortex flavors such as the STM32 family. Given how those microcontrollers have fallen to single digit dollar amounts of cost, it’s hard to argue for the economic advantage of old 8-bit chips. (The processing power per dollar argument was lost long ago.) So even though the old 8-bit chips still hold some advantages, I can see the reasoning, and have resigned to accept it as the way of the future.

ROS2 Receives Cross Compile Love

While I’m on the topic of information on Raspberry Pi running ROS2, another one that I found interesting was that there is now an actual working group focused on tooling. And their pathfinder project is to make ROS2 cross compilation less painful than it was in ROS.

Cross compilation has always been an option for ROS developers, but it was never as easy as it could be and there were many places where it, to put things politely, fell short of expectations. The reason cross-compilation was interesting was, again, the inexpensive Raspberry Pi allowing us to put a full Linux computer on board our ROS robots.

A Raspberry Pi could run ROS but there were performance limitations, and these limitations had even more severe impact when trying to compile ROS code on board the Raspberry Pi itself. The quad cores sounded nice on paper, but before the Pi 4, there was only 1GB of RAM to go around and compilation quickly ate it all up. Every developer is tempted to run four compilers in parallel to optimize for the four cores, and most ROS developers (including this one) have tried it at least once. We quickly learn this was folly. As soon as the RAM was exhausted, the Pi went to virtual memory which was a microSD card, and performance drops off a cliff because they are not designed for random reads and writes. I frequently get better overall performance by limiting compilation to a single core and staying well within available RAM.

Thus the temptation to use cross compilation: use our powerful 64-bit desktop machines to compile ARM32 binaries. Or more specifically, ARMHF, the instruction set for 32-bit ARM processors with (H)ardware (F)loating-point like those on the Raspberry Pi. But the pains of doing so has never proven to be worth the effort.

While Raspberry Pi 4 is now available with up to 8GB of RAM along with a corresponding 64-bit operating system, that’s still a lot less than the memory available on a typical developer workstation. And a modern PC’s solid state drive is still far faster than a Pi’s microSD storage. So best wishes to the ROS2 tooling working group, I hope you can make cross compilation effective for the community.

Update on ARM64: ROS2 on Pi

When I last looked at running ROS on a Raspberry Pi robot brain, I noticed Ubuntu now releases images for Raspberry Pi in both 32-bit and 64-bit flavors but I didn’t know of any compelling reason to move to 64-bit. The situation has now changed, especially if considering a move to the future of ROS2.

The update came courtesy of an announcement on ROS Discourse notifying the community that supporting 32-bit ARM builds have become troublesome, and furthermore, telemetry indicated that very few ROS2 robot builders were using 32-bit anyway. Thus the support for that platform is demoted to tier 3 for the current release Foxy Fitzroy.

This was made official on REP 2000 ROS 2 Releases and Target Platforms showing arm32 as a Tier 3 platform. As per that document, tier 3 means:

Tier 3 platforms are those for which community reports indicate that the release is functional. The development team does not run the unit test suite or perform any other tests on platforms in Tier 3. Installation instructions should be available and up-to-date in order for a platform to be listed in this category. Community members may provide assistance with these platforms.

Looking at the history of ROS 2 releases, we can see 64-bit has always been the focus. The first release Ardent Apalone (December 2017) only supported amd64 and arm64. Support for arm32 was only added a year ago for Dashing Diademata (May 2019) and only at tier 2. They kept it at tier 2 for another release Eloquent Elusor (November 2019) but now it is getting dropped to tier 3.

Another contributing factor is the release of Raspberry Pi 4 with 8GB of memory. It exceeded the 4GB limit of 32-bit addressing. This was accompanied by an update to the official Raspberry Pi operating system, renamed from Raspbian to Raspberry Pi OS, is still 32-bit but with mechanisms to allow addressing 8GB of RAM across the operating system even though individual processes are limited to 3GB. The real way forward is to move to a 64-bit operating system, and there’s a beta 64-bit build of Raspberry Pi OS.

Or we can go straight to Ubuntu’s release of 64-bit operating system for Raspberry Pi.

And the final note on ROS2: a bunch of new tutorials have been posted! The barrier for transitioning to ROS2 is continually getting dismantled, one brick at a time. And it’s even getting some new attention in long problematic areas like cross-compilation.

Learning DOT and Graph Description Languages Exist

One of the conventions of ROS is the /cmd_vel topic. Short for “command velocity”, it is commonly how high-level robot planning logic communicates “I want to move in this direction at this speed” to lower-level robot chassis control nodes of a robot. In ROS tutorials, this is usually the first practical topic that gets discussed. This convention helps with one of the core promises of ROS: portability of modules. High level logic can be created and output to /cmd_vel without worrying about low level motor control details, and robot chassis builders know teaching their hardware to understand /cmd_vel allows them to support a wide range of different robot modules.

Sounds great in theory, but there are limitations in practice and every once a while a discussion arises on how to improve things. I was reading one such discussion when I noticed one message had an illustrative graph accompanied by a “source” link. That went to a Github Gist with just a few simple lines of text describing that graph, and it took me down a rabbit hole learning about graph description languages.

In my computer software experience, I’ve come across graphical description languages like OpenGL, PostScript, and SVG. But they are complex and designed for general purpose computer graphics, I had no idea there were entire languages designed just for describing graphs. This particular file was DOT, with more information available on Wikipedia including the limitations of the language.

I’m filing this under the “TIL” (Today I Learned) section of the blog, but it’s more accurately a “How did I never come across this before?” post. It seems like an obvious and useful tool but not adopted widely enough for me to have seen it before. I’m adding it to my toolbox and look forward to the time when it would be the right tool for the job, versus something heavyweight like firing up Inkscape or Microsoft Office just to create a graph to illustrate an idea.

First Few Issues of ROS on Ubuntu on Crouton on Chrome OS

Some minor wrong turns aside, I think I’ve successfully installed ROS Melodic on Ubuntu 18 running within a Crouton chroot inside a Toshiba Chromebook 2 (CB35-B3340). The first test is to launch roscore, verify it is up and running without errors, then run rostopic /list to verify the default set of topics are listed.

With that done, then next challenge is to see if ROS works across machines. First I tried running roscore on another machine, and set ROS_MASTER_URI to point to that remote machine. With this configuration, rostopic /list shows the expected list of topics.

Then I tried the reverse: I started roscore on the Chromebook and pointed another machine’s ROS_MASTER_URI to the IP address my WiFi router assigned to the Chromebook. In this case rostopic/list failed to communicate with master. There’s probably some sort of networking translation or tunneling between Chrome OS and an installation of Ubuntu running inside Crouton chroot, and that’s something I’ll need to dig into and figure out. Or it might be a firewall issue similar to what I encountered when running ROS under Windows Subsystem for Linux.

In addition to the networking issue, if I want to embed this Chromebook into a robot as its brain, I’ll also need to figure out power-up procedure.

First: upon power-up, a Chromebook in developer mode puts up a dialog box notifying the user as such, letting normal users know a Chromebook in developer mode is not trustworthy for their personal data. This screen is held for about 30 seconds with an audible beep, unless the user presses a key combination prescribed onscreen. How might this work when embedded in a robot?

Second: when Chrome OS boots up, how do I also launch Ubuntu 18 inside Crouton chroot? The good news is that this procedure is covered in Crouton wiki, the bad news is that it is pretty complex and involves removing a few more Chromebook security provisions.

Third: Once Ubuntu 18 is up and running inside Crouton chroot, how do I launch ROS automatically? My current favorite “run on bootup” procedure for Linux is to create a service, but systemctl does not run inside chroot so I’ll need something else.

And that’s only what I can foresee right now, I’m sure there are others I haven’t even thought about yet. There’ll be several more challenges to overcome before a Chrome OS machine can be a robot brain. Perhaps instead of wrestling with Chrome OS, I should consider bypassing Chrome OS entirely?

Ubuntu 18 and ROS on Toshiba Chromebook 2 (CB35-B3340)

Following default instructions, I was able to put Ubuntu 16 on a Chromebook in developer mode. But the current LTS (Longer Term Support) release for ROS (Robot Operating System) is their “M” or Melodic Morenia release whose corresponding Ubuntu LTS is 18. (Bionic Beaver)

As of this writing, Ubuntu 18 is not officially supported for Crouton. It’s not explicitly forbidden, but it does come with a warning: “May work with some effort.” I didn’t know exactly what the problem might be, but given how easy it is to erase and restart on a Chromebook I decided to try it and see what happens.

It failed failed with a hash sum failure during download. This wasn’t the kind of failure I thought might occur with an unsupported build, download hash sum failure seems more like a flawed or compromised download server. I didn’t understand enough about the underlying infrastructure to know what went wrong, never mind fixing it. So in an attempt to tackle a smaller problem with a smaller surface area, I backed off to the minimalist “cli-extra” install of Bionic which skips graphical user interface components. This path succeeded without errors, and I now have a command line interface that reported itself to be Ubuntu 18 Bionic.

As a quick test to see if hardware is visible to software running inside this environment, I plugged in a USB to serial adapter. I was happy to see dmesg reported the device was visible and accessible via /dev/ttyUSB0. Curiously, the owner showed up as serial group instead of the usual dialout I see on Ubuntu installations.

A visible serial peripheral was promising enough for me to proceed and install ROS Melodic. I thought I’d try installation with Python 3 as the Python executable, but that went awry. I then repeated installation with the default Python 2. Since I have no GUI, I installed the ros-melodic-ros-base package. Its installation completed with no errors, allowing me to poke around and see how ROS works in this environment.

Preparing For ROS 2 Transition Looks Complicated

Before I decided to embark on a ROS Melodic software stack for Sawppy, I thought about ignoring the long legacy of ROS 1 and going to the newer ROS 2 built on more modern infrastructure. I mean, I told people to look into it, so I should walk the walk right? Eventually I decided against putting Sawppy on ROS 2, the deal breaker was that the Raspberry Pi is not a tier 1 platform for ROS 2. This means there’s no guarantee on regular binary releases for it, or that it will always function. I may have to build my own arm32 binaries for Raspbian from source code, and I would be on my own to verify functionality. I’ve done a superficial survey of other candidates for a Sawppy brain, but for today Sawppy is still thinking with a Raspberry Pi.

But even after making that decision I wanted to keep ROS 2 in mind. Open Robotics has a  ROS 2 migration guide for helping ROS node authors navigate the transition, and it doesn’t look trivial to me. But then again, I don’t have the ROS expertise to accurately judge the effort involved.

The biggest headache for some nodes will be the lack of Python 2 support. Mainly impact ROS nodes with a long legacy of Python 2 code, it does not impact a new project written against ROS Melodic which is supposed to support Python 3.

The next headache is the fact that it’s not possible to write if/else blocks to allow a single ROS node to simultaneously support ROS 1 and 2. The recommendation is to put all specialized logic into generic non-ROS-specific code in a library that can be shared. Then have separate code tailored to the infrastructure paradigms of ROS and ROS 2. This way all the code integrating with a ROS platform can be separated, but calling into a shared library.

And it also sounds like the ROS/ROS 2 build systems conflict so they can’t even coexist side by side at the same time. Different variants of a node have to live in separate branches of a repository, with the shared library code merged across branches as development continues. Leaving ROS/ROS 2 specific infrastructure code live in their separate branches.

I can see why a vocal fraction of ROS developers are unhappy with this “best practice”. And since ROS is open source, I foresee one or more groups joining forces to keep ROS 1 alive and working with old code even as Open Robotics move on to ROS 2. Right now there are noises being made from people who proclaims to do a similar thing, saying they’ll keep Python 2 alive past official EOL. In a few years we can look back and see if those Python 2 holdouts actually thrived, and we can also see how the ROS 1/ROS 2 situation has evolved.

First ROS 2 LTS Has Arrived, Let’s Switch

Making a decision to go explore the less popular path of smarter software for imperfect robot hardware has a secondary effect: it also means I can switch to ROS 2 going forward. One of the downsides of going over to ROS 2 now is that I lose access to the vast library of open ROS nodes freely available online. But if I’ve decided I’m not going to use most of them anyway, there’s less of a draw to stay in the ROS 1 ecosystem.

ROS 2 offers a lot of infrastructure upgrades that should be, on paper, very helpful for work going forward. First and foremost on my list is the fact I can now use Python 3 to write code for ROS 2. ROS 1 is coupled to Python 2, whose support stops in January 2020 and there’s been a great deal of debate in ROS land on what to do about it. Open Robotics has declared their future work along this line is all Python 3 on ROS 2. So the community has been devising various ways to make Python 3 work on ROS 1. Switching to ROS 2 now let’s me use Python 3 in a fully supported manner, no workarounds necessary.

And finally, investing in learning ROS 2 now has a much lower risk of having that time be thrown away by a future update. ROS 2 Dashing Diademata has just been released, and it is the first longer term service (LTS) release for ROS 2. I read this as a sign that Open Robotics is confident the period of major code churn for ROS 2 is coming to an end. No guarantees, naturally, especially if they learn of something that affects long term viability of ROS 2, but the odds have dropped significantly with evolution over the past few releases.

The only detraction for my personal exploration is the fact that ROS 2 has not yet released binaries for running on Raspberry Pi. I could build my own Raspberry Pi 3 version of ROS 2 from open source code, but I’m more likely to use the little Dell Inspiron 11 (3180) I had bought as candidate robot brain. It is already running Ubuntu 18.04 LTS on an amd64 processor, making it a directly supported Tier 1 platform for ROS 2.

Let’s Learn To Love Imperfect Robots Just The Way They Are

A few months ago, as part of preparing to present Sawppy to the Robotics Society of Southern California, I described a few of the challenges involved in putting ROS on my Sawppy rover. That was just the tip of the iceberg and I’ve been thinking and researching in this problem area on-and-off over the past few months.

Today I see two divergent paths ahead for a ROS-powered rover.

I can take the traditional route, where I work to upgrade Sawppy components to meet expectations from existing ROS libraries. It means spending a lot of money on hardware upgrades:

  • Wheel motors that can deliver good odometry data.
  • Laser distance scanners faster and more capable than one salvaged from a Neato vacuum.
  • Depth camera with better capabilities than a first generation Kinect
  • etc…

This conforms to a lot of what I see in robotics hardware evolution: more accuracy, more precision, an endless pursuit of perfection. I can’t deny the appeal of having better hardware, but it comes at a steeply rising cost. As anyone dealing with precision machinery or machining knows, physical accuracy costs money: how far can you afford to go? My budget is quite limited.

I find more appeal in pursuing the nonconformist route: instead of spending ever more money on precision hardware, make the software smarter to deal with imperfect mechanicals. Computing power today is astonishingly cheap compared to what they cost only a few years ago. We can add more software smarts for far less money than buying better hardware, making upgrades far more affordable. It is also less wasteful: retired software are just bits, while retired hardware gather dust sitting there reminding us of past spending.

And we know there’s nothing fundamentally wrong with looking for a smarter approach, because we have real world examples in our everyday life. Autonomous vehicle research brag about sub-centimeter accuracy in their 3D LIDAR… but I can drive around my neighborhood without knowing the number of centimeters from one curb to another. A lot of ROS navigation is built on an occupancy grid data structure, but again I don’t need a centimeter-aligned grid of my home in order to make my way to a snack in the kitchen. We might not yet understand how it could be done with a robot, but we know the tasks are possible without the precision and accuracy demanded by certain factions of robotics research.

This is the path less traveled by, and trying to make less capable hardware function using smarter software would definitely have their moments of frustration. However, the less beaten path is always a good place to go looking for something interesting and different. I’m optimistic there will be rewarding moments to balance out those moments of frustration. Let’s learn to love imperfect robots just the way they are, and give them the intelligence to work with what they have.

Researching Simulation Speed in Gazebo vs. Unity

In order to train reinforcement learning agents quickly, we want our training environment to provide high throughput. There are many variables involved, but I started looking at two of them: how fast it would be to run a single simulation, and how easy it would be to run multiple simulation in parallel.

The Gazebo simulator commonly associated with ROS research projects has never been known for its speed. Gazebo environment for the NASA Space Robotic Challenge was infamous for slowing far below real time speed. Taking over 6 hours to simulate a 30 minute event. There are ways to speed up Gazebo simulation, but this forum thread implies it’s unrealistic to expect more than 2-3 times as fast as real time speed.

In contrast, Unity simulation can be cranked all the way up to 100 times real time speed. It’s not clear where the maximum limit of 100 comes from, but it is documented under limitations.md. Furthermore, it doesn’t seem to be a theoretical limit no one can realistically reach – at least one discussion on Unity ML Agents indicate people do indeed crank up time multiplier to 100 for training agents.

On the topic of running simulations in parallel, with Gazebo such a resource hog it is difficult to get multiple instances running. This forum thread explains it is possible and how it could be done, but at best it still feels like shoving a square peg in a round hole and it’ll be a tough act to get multiple Gazebo running. And we haven’t even considered the effort to coordinate learning activity across these multiple instances.

Things weren’t much better in Unity until recently. This announcement blog post describes how Unity has just picked up the ability to run multiple simulations on a single machine and, just as importantly, coordinate learning knowledge across all instances.

These bits of information further cements Unity as something I should strongly consider as my test environment for playing with reinforcement learning. Faster than real time simulation speed and option for multiple parallel instances are quite compelling reasons.

 

Making Neato Robot ROS Package More Generally Usable

Neato mini USB cable connection to laptopNow that I have some idea of what happens inside ROS package neato_robot, and modified it to (mostly) run on my Neato vacuum, I thought I’d look in its Github repository’s list of issues to see if I understand what people have said about it. I didn’t have to look very far: top of its open issues list is “Robot compatibility.”

The author agrees this is a good idea but it has to be done by someone with access to other Neato units, pull requests are welcome. This comment was dated four years ago but my experience indicates no pull requests to improve compatibility were ever merged into this code.

But even though modifications were never merged back into this branch, they are still available. I just had to be pointed to know where to look. That comment thread referenced this particular commit which eliminates those problematic fixed arrays of expected response strings. Now it will read a Neato’s responses without trying to match it up against a preset list. Of course, if a specific response is still required (like LeftWheel_PositionInMM and RightWheel_PositionInMM) all it could do is ensure code will not hang, it is not capable of inferring requested data from other robot responses.

But possibly more interesting than code changes is this comment block:

This driver reads responses until it receives a control-z. Neato Robotics has
documented that all responses have a control-Z (^Z) at the end of the
response string: http://www.neatorobotics.com.au/programmer-s-manual
CTRL_Z = chr(26)

That URL implies it was a link to some type of a programmer’s reference manual, but unfortunately that link is now dead. Still, this claim of control-Z termination (if true) gives me ideas on how I would approach talking to a Neato with my own code.

Neato Robot ROS Package Splits Laser Scan Read Operations

Neato mini USB cable connection to laptopI want to understand how the neato_robot ROS package works, and debugging its hang inside getMotors() was a great first step. Now that I’m past the problem of mismatching responses between author’s Neato robot vacuum and mine, I started looking at other parts of neato_driver.py for potential explanations to its new (possibly timing-related) hang.

Since I had just examined getMotors(), I looked at its counterpart setMotors() and found nothing remarkable. getAnalogSensors(), getDigitalSensors(), and getCharger() used the exact pattern as getMotors() which meant they share the same fragility against different Neato responses but they’ll work for now. setBacklight() is nearly trivial.

That leaves the two methods for reading Neato’s laser distance scanner. Wait, two? Yes, and that’s where my concern lies. Other data retrieval methods like getMotors() would issue a command and then parse its response before returning to caller. For the laser distance scanner, this is split into a requestScan() which issues a getldsscan command and immediately returns. Reading and parsing laser distance data is done in a separate method getScanRanges() which the caller is expected to call later.

Why was this code structured in such a manner? My hypothesis is this was motivated by performance. When a getldsscan command is issued over serial, a lot of data is returned. There are 360 lines of data, one for each degree of laser scanning with distance and intensity information, plus a few more lines of overhead. This is far more than any of the other data retrieval methods. So rather than wait for all that data to transmitted before it could be processed, this two-call pattern allows the caller to go off and do something else. The transmitted data is trusted to be buffered and waiting in serial communication module.

But this only works when the caller is diligent about making sure these two calls always follow each other, with no chance for some other command to accidentally get in between. If they fail to do so, everything will fall out of whack. Would that cause the hang I’ve observed? I’m not sure yet, but it would be better if we didn’t even have to worry about such things.

Neato Robot ROS Package Expects Specific Response But Responses Actually Differ Between Neato Units

Neato mini USB cable connection to laptop

I got the outdated neato_robot ROS package mostly working just by adding a timeout to its serial communications. But this only masked the symptom of an unknown problem with no understanding of why it failed. To understand what happened, I removed the timeout and add the standard Python debugging library to see where it had hung.

import pdb; pdb.set_trace()

I found the hang was getMotors() in neato_driver.py. It is waiting for my Neato to return all the motor parameters specified in the list xv11_motor_info. This list appears to reflect data returned by author’s Neato robot vacuum, but my Neato returns a much shorter list with only a partial overlap. Hence getMotors() waits forever for data that will never come. This is a downside of writing ROS code without full information from hardware maker: We could write code that works on our own Neato, but we would have no idea how responses differ across different robot vacuums, or how to write code to accommodate those variations.

Turning attention back to this code,  self.state[] is supposed to be filled with responses to the kind of data listed in xv11_motor_info.  Once I added a timeout, though, getMotors() breaks out of its for loop with incomplete data in self.state[].  How would this missing information manifest? What behavior does it change for the robot?

Answer: it doesn’t affect behavior at all. At the end of getMotors()we see that it only really cared about two parameters: LeftWheel_PositionInMM and RightWheel_PositionInMM. Remainder parameters are actually ignored. Happily, the partial overlap between author’s Neato and my Neato does include these two critical parameters, and that’s why I was able to obtain /odom data running on my Neato after adding a timeout. (Side note: I have only looked to see there is data – I have not yet checked to see if /odom data reflects actual robot odometry.)

Next I need to see if there are other similar problems in this code. I changed xv11_motor_info list of parameters to match those returned by my Neato. Now getMotors() will work as originally intended and cycle through all the parameters returned by my Neato (even though it only needs two of them.) If this change to neato_robot package still hangs without a timeout, I know there are similar problems hiding elsewhere in this package. If my modification allow it to run without a timeout, I’ll know there aren’t any others I need to go hunt for.

Experiment result: success! There are no other hangs requiring a timeout to break out of their loop. This was encouraging, so I removed import pdb.

Unfortunately, that removal caused the code to hang again. Unlike the now-understood problem, adding a timeout does not restore functionality. Removal of debugger package isn’t supposed to affect behavior, but when it does, it usually implies a threading or related timing issue in the code. This one will be annoying as the hang only manifests without Python’s debugging library, which meant I’d have to track it down without debugger support.

Neato Robot ROS Package Runs After Adding Serial Communication Timeout

Neato mini USB cable connection to laptop

The great thing about ROS is that it is a popular and open platform for everyone to work with, resulting in a large ecosystem of modules that cover a majority of robot hardware. But being so open to everyone does have a few downsides, because not everyone has the same resources to develop and maintain their ROS modules. This means some modules fall out of date faster than others, or works with its author’s version of hardware but not another version.

Such turned out to be the case for the existing neato_robot ROS package, which was created years ago and while the author kept updated it through four ROS releases, that maintenance effort stopped and so this code is now five releases behind latest ROS. That in itself is not necessarily a problem – Open Source Robotics Foundation tries to keep ROS backwards compatible as much as they can – but it does mean potential for hidden gotchas.

When I followed instructions for installation and running, the first error message was about configuring the serial port which was fine. But after that… nothing. No information published to /odom or /base_scan, and no response to command sent to /cmd_vel.

Digging into the source code, the first thing to catch my attention was that the code opened a serial port for communication but did not set provision for timeout. Any problem in serial communication will cause it to hang, which is exactly what it is doing now. Adding a timeout is a quick and easy way to test if this is actually related.

Existing:

self.port = serial.Serial(port,115200)

Modified:

self.port = serial.Serial(port,115200,timeout=0.01)

And that was indeed helpful, restoring nominal functionality to the ROS node. I could now drive the robot vacuum by sending commands to /cmd_vel and I can see a stream of data published to /odom and /base_scan.

I briefly contemplated being lazy and stopping there, but I thought I should try to better understand the failure. Onward to debugging!