Unity-Python Communication for ML-Agents: Good, Bad, and Ugly

I’ve only just learned that Unity DOTS exists and it seems like something interesting to learn as an approach for utilizing resources on modern multicore computers. But based on what I’ve learned so far, adopting DOTS by itself won’t necessarily solve the biggest bottleneck in Unity ML-Agents as per this forum thread: the communication between Unity and Python.

Which is unfortunate, because this mechanism is also a huge strength of the system. Unity is a native-code executable with modules written in C# and compiled, while deep learning frameworks like TensorFlow and PyTorch run under an interpreted Python environment. The easiest and most cross-platform-friendly way for these two types of software to interact is via network ports, even though the data is merely looped back within the same computer and never sent over an actual network.
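To make that concrete, here’s a minimal sketch of what the Python side of this loopback connection looks like, using the mlagents_envs low-level API as I understand it from the release 18 era. Treat it as illustrative, not canonical:

```python
from mlagents_envs.environment import UnityEnvironment

# file_name=None attaches to a Unity editor session waiting to train;
# a standalone build path could be passed instead. Either way, the data
# travels over a local network port looped back on the same machine.
env = UnityEnvironment(file_name=None)
env.reset()

# Every Agent behavior in the scene appears as a named entry here.
behavior_name = list(env.behavior_specs)[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
print(f"{behavior_name}: {len(decision_steps)} agents awaiting decisions")

env.close()
```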

A documented communication protocol allowed ML-Agents components to evolve independently, as long as they conformed to that protocol. This is why the team was able to change the default deep learning framework from TensorFlow to PyTorch between ML-Agents versions 1.0 and 2.0 without breaking backwards compatibility. (They did it in release 10, in case it’s important.) Developers who preferred TensorFlow could continue using it; nobody was forced to switch to PyTorch as long as everyone spoke the same language.

Functional, capable, flexible. What’s not to love? Well, apparently “performance”. I don’t know the details of Unity ML-Agents’ bottlenecks, but I do know that “fast” for a network protocol is a snail’s pace compared to high-performance inter-process communication mechanisms such as shared memory.
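For a taste of the alternative, Python’s standard library can already express the kind of zero-copy sharing that a loopback connection can’t. This is purely an illustration of the concept, not anything ML-Agents actually does:

```python
import numpy as np
from multiprocessing import shared_memory

# Writer side: create a shared block and view it as an observation array.
shm = shared_memory.SharedMemory(create=True, size=12 * 8)
observations = np.ndarray((12,), dtype=np.float64, buffer=shm.buf)
observations[:] = 1.0  # visible to any attached process, no copying

# A reader process would attach by name, with no serialization at all:
#   peer = shared_memory.SharedMemory(name=shm.name)
#   view = np.ndarray((12,), dtype=np.float64, buffer=peer.buf)

shm.close()
shm.unlink()
```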

To work around the bottleneck, the current recommendations are to manually stack things up in parallel. Starting at the individual agent level: multiple agents can train in parallel, if the environment supports it. This explains why the 3D Ball Balancing example scene has twelve agents. If the environment doesn’t support it, we can manually copy the same training environment several times in the scene. We can see this in the Crawler example scene, which has ten boxes, one for each crawler. Beyond that, we now have the capability to run multiple Unity instances in parallel.
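The trainer script exposes the multiple-instance option as --num-envs. With the low-level API, I believe the same effect comes from the worker_id parameter, which offsets the network port each instance uses so they don’t collide. A sketch, with a hypothetical build path:

```python
from mlagents_envs.environment import UnityEnvironment

# Launch four copies of the same Unity build side by side. Each worker_id
# shifts the base port so the loopback connections stay separate.
envs = [
    UnityEnvironment(file_name="3DBall.x86_64", worker_id=i, no_graphics=True)
    for i in range(4)
]
for env in envs:
    env.reset()

# ... gather experience from each instance in turn ...

for env in envs:
    env.close()
```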

All of these feel… suboptimal. The ML-Agents team is aware of the problem and working on solutions, but has nothing to announce yet. I look forward to seeing their solution. In the meantime, learning about DOTS has sucked up all of my time. No, not learning… I got sucked into Hardspace: Shipbreaker, a Unity game built with DOTS.

Unity DOTS = Data-Oriented Technology Stack

Looking over resources for the Unity ML-Agents toolkit for reinforcement learning AI algorithms, I’ve come across multiple discussion threads about how it has difficulty scaling up to take advantage of modern multicore computers. This is not just an ML-Agents challenge, it is a Unity-wide challenge. Arguably even a software-industry-wide challenge. When CPUs stopped gaining clock speed and started gaining more cores, games had problems taking advantage of them. Historically, while a game engine is running, one CPU core runs at maximum. The remaining cores may help out with a few auxiliary tasks but mostly sit idle. This is why gamers have focused on single-core performance in CPU benchmarks.

Having multiple CPUs running in parallel isn’t new, nor are the challenges of leveraging that power in full. From established software toolkits to leading edge theoretical research papers, there are many different approaches out there. Reading these Unity forum posts, I learned that Unity is working on a big revamp under the umbrella of DOTS: Data-Oriented Technology Stack.

I came across the DOTS acronym several times without understanding what it was. But after it came up in the context of better physics simulation and a request for ML-Agents to adopt DOTS, I knew I couldn’t ignore the acronym anymore.

I don’t know a whole lot yet, but I’ve got the distinct impression that working under DOTS will require a mental shift in programming. There were multiple references to Unity developers saying it took some time for the concepts to “click”, so I expect some head-scratching ahead. Here are some resources I plan to use to get oriented:

DOTS is Unity’s implementation of Data-Oriented Design, a more generalized set of concepts that help write code that runs well on modern machines with many cores and multiple levels of memory caches. An online eBook for Data-Oriented Design is available, which might be good to read so I can decide whether to adopt these concepts in my own programming projects outside of Unity.
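To make the core idea concrete without any Unity machinery: data-oriented design favors storing the same field for many entities contiguously, so hot loops stream through memory instead of chasing pointers between objects. A toy sketch in Python/NumPy (DOTS itself is C#, so this is only an analogy):

```python
import numpy as np

N, DT = 10_000, 0.02

# Object-oriented layout: one object per ball, fields scattered in memory.
class Ball:
    def __init__(self):
        self.position = np.zeros(3)
        self.velocity = np.random.randn(3)

balls = [Ball() for _ in range(N)]
for b in balls:                 # pointer-chasing, cache-unfriendly
    b.position += b.velocity * DT

# Data-oriented layout: one contiguous array per field.
positions = np.zeros((N, 3))
velocities = np.random.randn(N, 3)
positions += velocities * DT    # one streaming, cache-friendly pass
```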

And just to bring this full circle: it looks like the ML-Agents team has already started DOTS work as well. However, it’s not clear to me how DOTS will (or will not) help with the current gating performance factor: the Unity environment’s communication with PyTorch (formerly TensorFlow) running in a Python environment.

Miscellaneous Gems from ML-Agents Resources

I browsed through ML-Agents GitHub issues and forums looking for an explanation of why there hasn’t been a release in half a year, but I came up empty-handed on an answer. The time was not all wasted, though, since I found a few other scattered tidbits that might be useful in the future.

The simulation time scale used in most examples is 20, which is also the default used by the mlagents-learn script. If the script is not used, the simulation runs in real time and will feel very slow. The tradeoff here is accuracy of the physics simulation, as per the comment: “If you go too fast, the physics gets kind of wonky, and sometimes objects/agents will go through each other.”
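For reference, the time scale can also be set from the low-level Python API through the engine configuration side channel. This sketch should mirror what mlagents-learn does by default, with API names from the release-18-era mlagents_envs package:

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import (
    EngineConfigurationChannel,
)

# Ask Unity to run the simulation at 20x real time, matching the
# default that the mlagents-learn script applies.
channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name=None, side_channels=[channel])
channel.set_configuration_parameters(time_scale=20.0)
env.reset()
env.close()
```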

The official “Hummingbird” tutorial on Unity Learn targets the LTS build of Unity with ML-Agents version 1. It looks like the goal is to update it to work with a new release of ML-Agents, and instructions for getting a preview have been posted.

Issue #4129 is pretty old, but it contains a lot of detail about why Unity ML-Agents doesn’t necessarily benefit from GPU-accelerated neural network training. Since then, ML-Agents has switched from TensorFlow to PyTorch, but many of the points might still apply.

But never mind the GPU: ML-Agents can’t even make full use of multicore CPUs, like this person’s 16-core Threadripper. The Unity person who responded explained this is on the list of things to improve, but there’s nothing to show yet. In the meantime, there are workarounds.

Also on the subject of utilizing parallel hardware, here’s a request for Unity to use Google’s Brax physics engine. Based on my experience I was certain the answer was “No” (the physics engine isn’t something that can “just” be swapped out), but this question was valuable for two reasons. First, it led me to look up and learn about Google Brax, a physics engine for reinforcement learning that runs on Google TPUs and CUDA GPUs. Second, Unity actually is investing in work towards a faster physics engine “for the DOTS platform.” Um… what’s DOTS? Time for some reading.

Browsing ML-Agents Resources: GitHub and Forums

I had taken a quick look back through the evolution of Unity’s ML-Agents from the initial prerelease announcement to the present day. The features have been good, but looking at the dates highlighted a conspicuous gap: there have been no releases or announcements in the second half of 2021. This is notable because ML-Agents had been on a rapid release schedule since the original prerelease announcement, with versions coming out every few weeks. But things had come to a screeching halt. Why?

[Update: Release 19 became available on 2022/1/14, but I have yet to find information on why there was a long delay between 18 and 19.]

In the absence of official announcements, I went poking around through other publicly available information. The first resource was the GitHub commit history for the ML-Agents repository. We’ve been warned that the “main” branch can be unstable, and I thought that meant it was the active development branch. This is apparently false, as I found hints of additional working branches that are out of public view. For example, the original commit for “Deterministic actions” mentioned “release 19” even though the latest release is currently 18. Those mentions of “release 19” had to be removed in merge #5637, whose description mentions the branch “release_19_docs”, which is not visible to us.

With development progress partially obscured from view, I moved on to the currently open GitHub issues. When taking a first glance at an unfamiliar repository, I look at the most recent items (the default view) and then sort by “Most Commented” to see the most popular items. Nothing caught my eye as a likely reason for the delay. However, it did eliminate the possibility that Unity had killed the project, because team members are still responding to issues. So that’s comforting.

The next resource was the Unity Forum for ML-Agents. Scrolling through posts between June and now, I saw multiple references to memory leaks. (“Unity ML 2.0.0 Memory leak”, “Training is more and more slow”, “Memory leak using imitation learning”) Returning to the issues list, I saw #5458 “Memory Leak” is currently open for investigation. However, there’s no comment one way or another on whether this bug (or another) is holding up release 19, so my quest came up empty-handed. Still, I did get the idea that if I run into a memory leak with release 18, I can try downgrading to release 17 to see if that helps. Plus I came across a bunch of other interesting pieces of information.

Notes on ML-Agents Development History (Part 2: Version 1.0 to Present)

Looking back at Unity blog posts and GitHub release notes, we can see ML-Agents’ evolution during the prerelease beta phase. From the initial announcement leading up to an official version 1.0, they added many features promised in the original announcement and made big architectural changes, like how brains fit in the object hierarchy of a Unity project.

On 2020/5/12, ML-Agents reached an official version 1.0, with a package organization that is covered by version compatibility guarantees going forward. This guarantee is significant because it means users can have better confidence their own projects will keep functioning. It also means more work for Unity, because any future large-scale architectural changes will have to be made in a compatible way.

Another change in ML-Agents development is that they’re no longer writing a Unity blog post for every release. I had thought this merely reflected slower, more deliberate development with fewer changes to announce, but looking over the release notes I still see plenty of significant changes. Given this, I suspect the lower blog traffic reflects a change in customer communication priorities inside the Unity organization. Perhaps they’ve moved on to YouTube videos or something? If so, that would be a shame, as I prefer the written word.

In any case, the ML-Agents GitHub repository release notes made it clear development continued rapidly:

  • Release 2 (2020/5/20) has minor fixes and is the current “Verified” build.
  • Release 3 (2020/6/10)
  • Release 4 (2020/7/15) added parameter randomization
  • Release 5 (2020/7/31)
  • Release 6 (2020/8/17) updated version requirements: Python now 3.6.1 and NumPy now 1.19.0 in sync with TensorFlow
  • Release 7 (2020/9/21) IActuator abstract classes for generic action spaces. Initial PyTorch implementation.
  • Release 8 (2020/10/14)
  • Release 9 (2020/11/3)
  • Release 10 (2020/11/19) Match3 environment (ML-Agents play Bejeweled!) and PyTorch is now the default.
  • Release 11 (2020/12/21)
  • Release 12 (2020/12/22)

The above releases were summarized in the ML-Agents 2020 End of Year recap blog post, and development continued through 2021:

  • Release 13 (2021/2/24) TensorFlow removed. (--torch-device=cpu to tell PyTorch to use CPU for training. This will be useful later.)
  • Release 14 (2021/3/8)
  • Release 15 (2021/3/17) BufferSensor for agents to observe variable number of entities. MultiAgentGroup interface for training multiple different agents simultaneously, and MA-POCA trainer for them.
  • Release 16 (2021/4/13)
  • Release 17 (2021/4/27): Minimum Unity version raised to 2019.4. API breaking changes. Multiple behaviors via HyperNetworks
  • Release 18 (2021/6/9): Added colab notebooks.

The version and API changes for release 17 smelled like preparation for a new version, which was confirmed by a blog post about training complex cooperative behaviors. This is all very exciting stuff, but I noticed development activity came to a screeching halt. After years of releases every few weeks (sometimes multiple times in a single month), there hasn’t been anything in the second half of 2021. I don’t know why, but I poked around a bit to see if I could find clues.

[Update: Release 19 became available on 2022/1/14.]

Notes on ML-Agents Development History (Part 1: Up to Version 1.0)

I’ve just installed and tested basic functionality of Unity ML-Agents Release 18. And just before that, I did the same with Release 2, which is also referred to as “Verified 1.0.8”. I was surprised at the changes visible just between these two releases. This made me curious about how the package evolved, so I went looking for information from its past.

Most releases were announced on the Unity blog, but some only had GitHub release notes. Here is a compilation of links alongside a few highlights that caught my eye; follow the links for a complete list of changes:

2017/6/26: The earliest public information I could find was Unity announcing their intent to join in AI research and applications. Annoyingly, some of the linked blog posts have since disappeared, apparently in some sort of migration of their blog hosting system. For example the “second part of this blog series” link now leads to a 404 error.

2017/9/18: The ML-Agents Toolkit officially kicks off with version 0.1, describing a general architecture that I’m sure has since evolved and a long list of ambitious ideas they wanted to support. Many of them did come to be! Though of course not all of them, and some have since disappeared.

2017/12/8: Version 0.2 introduced curriculum learning, and launched a community challenge to motivate people to play with the toolkit.

2018/3/15: Version 0.3 introduced imitation learning, multi-brain training, and an optional poll model. Recurrent Neural Networks came in as part of a “Memory-Enhanced Agents” umbrella.

2018/6/18: Version 0.4 allowed training using the Unity editor, no longer requiring a compiled executable. An Udacity nanodegree was introduced, though sadly that’s too rich for my blood. More training environments were added; one (Pyramids) specifically demonstrates the “Curiosity” capability. Curiosity got its own blog post.

2018/9/11: Version 0.5 added a Gym interface and replicated a few environments from OpenAI Gym. It also expanded the capability to enable/disable discrete actions, though it’s not clear whether that was related to OpenAI Gym.
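For a sense of what that Gym interface looks like, here’s a sketch using the gym-unity wrapper. Note the class names come from the current (release-18-era) package rather than the original 0.5 release, and the build path is hypothetical:

```python
from mlagents_envs.environment import UnityEnvironment
from gym_unity.envs import UnityToGymWrapper

# Wrap a single-agent Unity build so it behaves like a standard Gym env.
unity_env = UnityEnvironment(file_name="GridWorld.x86_64")
env = UnityToGymWrapper(unity_env)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```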

2018/12/17: Version 0.6 is an architectural revamp changing how ml-agents AI brains fit in the Unity object hierarchy. Introduced “demonstration recorder” for off-line imitation learning. Is that still around?

2019/3/1: Version 0.7 is another big infrastructure change, switching runtime neural network inference from external TensorFlowSharp to Unity’s own Inference Engine (a.k.a. Barracuda) to support more Unity runtime platforms.

2019/4/15: Version 0.8 is an infrastructure change that allows multiple Unity simulations to run in parallel on a single machine. It is strange that this is the recommended approach to take advantage of machines with many processing cores. (Later research found that Unity is working to improve multicore performance across the board, not just for ml-agents, with something called DOTS.)

2019/8/1: Version 0.9 (release notes) is the first of two releases focused on throughput and efficiency.

2019/9/30: Version 0.10 finished what 0.9 started: improving sample throughput (asynchronous environments) and sample efficiency via the GAIL (0.9) and SAC (0.10) algorithms.

2019/11/4: Version 0.11 (release notes) changed the brain’s place in the Unity object hierarchy yet again.

2019/12/2: Version 0.12 (release notes) moved from TensorFlow 1 to 2 via the TF1-compatible interfaces. It appears this work was never finished; ml-agents moved to PyTorch instead of completing the TF2 migration.

2020/1/8: Version 0.13 (release notes)

2020/2/28: Version 0.14 added the ability to train via adversarial self-play. The announcement includes a short history of learning from self-play.

2020/3/6: Not a version, but this is when ml-agents got serious enough to get a course on Unity Learn (Hummingbirds) as well as an “AI for Beginners” course on Unity Learn Premium.

2020/3/18: Version 0.15 (release notes) wrapped up a lot of housekeeping in preparation for 1.0 release.

Development focus for ml-agents shifted to refinement after the 1.0 release, along with a corresponding reduction in blog announcements.

Notes on Installing Unity ML-Agents (Release 18)

I’m dipping my toes into deep reinforcement learning via Unity’s ML-Agents package. I made my first run with the safest, most mature option, “Verified Package 1.0.8”, which maps to Release 2 in the ML-Agents repository versioning scheme. No problems were encountered during installation, and I was able to run the 3D balancing ball project in the Getting Started guide. From there I could either explore Release 2 further or try a more adventurous release. I chose the latter and proceeded to install ML-Agents Release 18.

Doing this experiment on the same machine meant I had to keep the two installations separate. Unity Hub is already well suited to keeping distinct versions isolated so they can run in parallel (Unity 2019.4.25f1 for release 18), though there’s a potential point of conflict if the Unity editors required different versions of Visual Studio Community Edition for editing code. On the Python side, Anaconda is well suited to keeping Python environments separate. Since files are referenced by directory, though, I cloned the ml-agents GitHub repository separately for each release instead of trying to switch back and forth within the same directory.

I very much appreciate Unity’s project documentation, as my installation and Getting Started process went just as smoothly. I didn’t expect to notice much difference between releases 2 and 18, but even just in installation and Getting Started I saw they’ve made changes. The biggest one that caught my eye is that ml-agents switched from TensorFlow to PyTorch somewhere in between. There are other smaller changes; the most welcome one to me is the much more comprehensive collection of example configurations in the release_18 /config/ subdirectory. Release 2 had only a handful of files; release 18 has a far larger directory tree to give people (like me) more than one starting point.

I’m not quite sure where to go from here, but given how well documented ml-agents appears to be, I thought it would be interesting to take a quick look back to see where they’ve been.

Notes on Installing Unity ML-Agents (Release 2)

I thought it would be fun to play with reinforcement learning via Unity ML-Agents. The official product landing page sends us to the ml-agents repository on GitHub. And just like with every other repository, it’s a good idea to look over the README.md to understand the branch organization, especially before we start cloning anything.

And indeed, the README includes a handy chart of releases. As of this writing there are eight releases listed, plus main, which is labeled as unstable. I’m glad I didn’t blindly clone main! Of the eight stable releases, six are named “Release 13” through “Release 18” inclusive. The final two are named “Verified Package 1.0.7” and “Verified Package 1.0.8”.

The “Verified” label is a guarantee of safety in the lifecycle of Unity packages. Therefore the most recent release with the highest guarantee of functionality is “Verified Package 1.0.8”. In Unity’s world, these verified packages are good enough for commercial production use. If our needs aren’t quite that rigorous, we can use the builds labeled “Release”. The numbering scheme is explained on the ML-Agents versioning page, and the “Release” builds are something we can play with if we aren’t shouldering the weight of commercial Unity production.

I think I’m fine playing with the more recent “Release” builds, but I wanted to start with the build carrying the strongest guarantees, to make sure I can at least get that working. That meant cloning the build labeled “Verified Package 1.0.8”, which maps to “Release 2”.

In order to open the Unity project that is part of this release, I wanted the version of Unity that exactly matched the version number in ProjectVersion.txt: 2018.4.17f1. When I tried to install Unity 2018 from Unity Hub, it offered me 2018.4.36f1, because that was the most recent supported version. To match the version exactly, I had to click the download archive link and look for 2018.4.17 under the 2018 builds. (It was released February 11th, 2020.) Once found, I could click the “Unity Hub” button to prompt Unity Hub to install that build on my machine.

While Unity installed, I cloned the repository at tag release_2 and installed the corresponding Python packages. I encountered no problems following the installation directions for this release, though there were slight modifications since I use Anaconda Individual to manage my Python virtual environments. I had the option of installing the locally cloned versions of the ml-agents-envs and ml-agents packages, and I did so. I noticed that the installation used TensorFlow in CPU-only mode, but running without GPU acceleration is perfectly fine for a starting point.

Once Unity Editor 2018.4.17 was installed, I used it to open the Project directory of my cloned release_2 repository. It opened without errors. I proceeded to the Getting Started guide for this release and verified I had basic functionality, both running the pretrained 3D Balance Ball model and training a model of my own. The training was pretty quick: it took just under 8 minutes on the Core i5-7300HQ CPU of my Dell 7577 laptop plugged in to an AC power adapter.

Encouraged by this success, I proceeded to try Release 18 as well.

Switching Back to Unity ML-Agents

It was quite enlightening for me to read Deep Reinforcement Learning Doesn’t Work Yet. And to be honest, a little depressing as well. I was vaguely aware of the challenges involved but only in a general sense. Just small tidbits here and there over the past few years, as I looked at this field with interest. Now that I finally got around to looking at reinforcement learning in more detail, I realized that it was overly optimistic of me to expect all major problems to have been solved by now.

My original motivation for getting into reinforcement learning was to make my Sawppy an autonomous rover. Based on what I’ve learned so far, my original hopes for Sawppy intelligence via reinforcement learning were extremely ambitious and still quite far away. If I want deep RL projects more likely to succeed in the near term, I probably shouldn’t put them on a real physical rover. In all likelihood, whatever can be accomplished on a real robot using deep reinforcement learning could be done faster and more easily with some other AI technique.

It would certainly be nice if some aspect of Sawppy intelligence will eventually result in a research project that can contribute to the state of the art. But I’m not so arrogant as to assume I can accomplish that feat and certainly not as my first project in reinforcement learning. I’ll aim for something simple as my starting point. Got to crawl before I can walk, and all that.

Transferring reinforcement learning from a simulator to work in the real world is still a lot to tackle. So I’m going to look at a simulated world and stay within that simulated world while I learn the ropes. And before I can realistically think about contributing to algorithm advancements, I should get familiar with applying existing implementations of reinforcement learning. All of these new priorities turned my attention back to the game world of Unity ML-Agents.