Notes on ML-Agents Development History (Part 2: Version 1.0 to Present)

Looking back at Unity blog posts and GitHub release notes, we can see ML-Agents’ evolution during the prerelease beta phase. Between the initial announcement and the official version 1.0, they added many of the features promised at the start and made big architectural changes, such as how brains fit into the object hierarchy of a Unity project.

On 2020/5/12, ML-Agents reached an official version 1.0, with a package organization covered by version compatibility guarantees going forward. This guarantee is significant because it means users can have better confidence that their own projects will continue to function. It also means more work for Unity, because any future large-scale architectural changes will have to be made in a compatible way.

Another change in ML-Agents development is that they’re no longer writing a Unity blog post for every release. I had thought this merely reflected a slower, more deliberate development pace with fewer changes to announce, but looking over release notes I still see plenty of significant changes. Given this, I suspect the lower blog traffic reflects a change in customer communication priorities inside the Unity organization. Perhaps they’ve moved on to YouTube videos or something? If so, that would be a shame, as I prefer the written word.

In any case, the ML-Agents GitHub repository release notes make it clear development continued rapidly:

  • Release 2 (2020/5/20) has minor fixes and is the basis of the current “Verified” build.
  • Release 3 (2020/6/10)
  • Release 4 (2020/7/15) added parameter randomization
  • Release 5 (2020/7/31)
  • Release 6 (2020/8/17) updated version requirements: Python now 3.6.1 and NumPy now 1.19.0 in sync with TensorFlow
  • Release 7 (2020/9/21) IActuator abstract classes for generic action spaces. Initial PyTorch implementation.
  • Release 8 (2020/10/14)
  • Release 9 (2020/11/3)
  • Release 10 (2020/11/19) Match3 environment (ML-Agents play Bejeweled!) and PyTorch is now the default.
  • Release 11 (2020/12/21)
  • Release 12 (2020/12/22)

The above releases were summarized in the ML-Agents 2020 End of Year recap blog post, and development continued through 2021:

  • Release 13 (2021/2/24) TensorFlow removed. (--torch-device=cpu to tell PyTorch to use CPU for training. This will be useful later.)
  • Release 14 (2021/3/8)
  • Release 15 (2021/3/17) BufferSensor for agents to observe variable number of entities. MultiAgentGroup interface for training multiple different agents simultaneously, and MA-POCA trainer for them.
  • Release 16 (2021/4/13)
  • Release 17 (2021/4/27): Minimum Unity version raised to 2019.4. Breaking API changes. Multiple behaviors via HyperNetworks.
  • Release 18 (2021/6/9): Added colab notebooks.

The version and API changes for release 17 smelled like preparation for a new major version, which was confirmed by a blog post about training complex cooperative behaviors. This is all very exciting stuff, but I noticed development activity then came to a screeching halt. After years of releases every few weeks (sometimes multiple times in a single month), there hasn’t been anything in the second half of 2021. I don’t know why, but I poked around a bit to see if I could find clues.

[Update: Release 19 became available on 2022/1/14.]

Notes on ML-Agents Development History (Part 1: Up to Version 1.0)

I’ve just installed and tested basic functionality of Unity ML-Agents Release 18. Just before that, I did the same with Release 2, which is also referred to as “Verified 1.0.8”. I was surprised at the changes visible between just these two releases. This made me curious about how the package evolved, so I went looking for information from its past.

Most of these releases were announced on the Unity blog, but some only had GitHub release notes. Here is a compilation of links alongside a few highlights that caught my eye; follow these links for a complete list of changes:

2017/6/26: The earliest public information I could find was Unity announcing their intent to join in AI research and applications. Annoyingly, some of the linked blog posts have since disappeared, apparently in some sort of migration of their blog hosting system. For example the “second part of this blog series” link now leads to a 404 error.

2017/9/18: The ML-Agents Toolkit officially kicks off with version 0.1, describing a general architecture that I’m sure has since evolved and a long list of ambitious ideas they wanted to support. Many of them did come to be! Though of course not all of them, and some have since disappeared.

2017/12/8: Version 0.2 introduced curriculum learning, and launched a community challenge to motivate people to play with the toolkit.

2018/3/15: Version 0.3 introduced imitation learning, multi-brain training, and an optional poll model. Recurrent Neural Networks came in as part of a “Memory-Enhanced Agents” umbrella.

2018/6/18: Version 0.4 allowed training using the Unity editor, no longer requiring a compiled executable. An Udacity nanodegree was introduced, though sadly that’s too rich for my blood. More training environments were added, one (Pyramids) specifically demonstrates the “Curiosity” capability. Curiosity got its own blog post.

2018/9/11: Version 0.5 added a Gym interface and replicated a few environments from OpenAI Gym. It also expanded the capability to enable/disable discrete actions, though it’s not clear whether that was related to OpenAI Gym.

2018/12/17: Version 0.6 is an architectural revamp changing how ml-agents AI brains fit into the Unity object hierarchy. It introduced a “demonstration recorder” for offline imitation learning. Is that still around?

2019/3/1: Version 0.7 is another big infrastructure change, switching runtime neural network inference from external TensorFlowSharp to Unity’s own Inference Engine (a.k.a. Barracuda) to support more Unity runtime platforms.

2019/4/15: Version 0.8 infrastructure change allows multiple Unity simulations to run in parallel on a single machine. Strangely, this is the recommended approach to take advantage of machines with many processing cores. (Later research found that Unity is working to improve multicore performance across the board, not just for ml-agents, with something called DOTS.)

2019/8/1: Version 0.9 (release notes) is the first of two releases focused on throughput and efficiency.

2019/9/30: Version 0.10 finished what 0.9 started. Improving sample throughput (asynchronous environments) and sample efficiency via GAIL (0.9) and SAC (0.10) algorithms.

2019/11/4: Version 0.11 (release notes) changed the brain’s place in the Unity object hierarchy yet again.

2019/12/2: Version 0.12 (release notes) moved from TensorFlow 1 to 2 via the TF1 compatibility interfaces. It appears this work was never finished; ml-agents moved to PyTorch instead of completing the TF2 migration.

2020/1/8: Version 0.13 (release notes)

2020/2/28: Version 0.14 added the ability to train via adversarial self-play. The announcement includes a short history of learning from self-play.

2020/3/6: Not a version, but this is when ml-agents got serious enough to get a course up on Unity Learn (Hummingbirds), as well as an “AI for Beginners” course on Unity Learn Premium.

2020/3/18: Version 0.15 (release notes) wrapped up a lot of housekeeping in preparation for 1.0 release.

After the 1.0 release, development focus for ml-agents shifted toward refinement, with a corresponding reduction in blog announcements.

Notes on Installing Unity ML-Agents (Release 18)

I’m dipping my toes into playing with deep reinforcement learning via Unity’s ML-Agents package. I made my first run with the safest, most mature option, “Verified Package 1.0.8”, which maps to Release 2 in the ML-Agents repository versioning scheme. No problems were encountered during installation, and I was able to run the 3D balancing ball project in the Getting Started guide. From there I could either explore Release 2 further or try a more adventurous release. I chose the latter and proceeded to install ML-Agents Release 18.

Doing this experiment on the same machine meant I had to keep the two installations separate. Unity Hub is already well suited to keeping distinct editor versions isolated so they can run in parallel (Unity 2019.4.25f1 for Release 18), though there’s a potential point of conflict if the Unity editors require different versions of Visual Studio Community Edition for editing code. On the Python side, Anaconda is well suited to keeping Python environments separate. Since files are referenced by directory, though, I cloned the ml-agents GitHub repository separately for each release instead of trying to switch back and forth within the same directory.

I very much appreciate Unity’s project documentation, as this installation and Getting Started process went just as smoothly. I didn’t expect to notice much difference between Release 2 and Release 18, but even just in installation and Getting Started I saw they’ve made changes. The biggest one that caught my eye is that ml-agents switched from TensorFlow to PyTorch in the interim. There are other, smaller changes; the most welcome one to me is a much more comprehensive collection of example configurations in the release_18 /config/ subdirectory. Release 2 had only a handful of files; Release 18 has a far larger directory tree that gives people (like me) more than one starting point.

I’m not quite sure where to go from here, but given how well documented ml-agents appears to be, I thought it would be interesting to take a quick look back to see where they’ve been.

Notes on Installing Unity ML-Agents (Release 2)

I thought it would be fun to play with reinforcement learning via Unity ML-Agents. The official product landing page sends us to the ml-agents repository on GitHub. As with every other repository, it’s a good idea to look over the README.md to understand the branch organization, especially before we start cloning anything.

And indeed, the README includes a handy chart of releases. As of this writing there are eight releases listed plus main which is labeled as unstable. I’m glad I didn’t blindly clone main! Of the eight stable releases, six are named “Release 13” to “Release 18” inclusive. The final two are named “Verified Package 1.0.7” and “Verified Package 1.0.8”.

The “Verified” label marks the highest level of stability in the lifecycle of Unity packages, so the most recent release with the strongest guarantee of functionality is “Verified Package 1.0.8”. In Unity’s world, these verified packages are good enough for commercial production use. If our needs aren’t quite that rigorous, we can use the builds labeled “Release”. These numbering schemes are explained on the ML-Agents versioning page, and the “Release” builds are something we can play with if we aren’t shouldering the weight of commercial Unity production.

I think I’m fine playing with more recent “Release” builds, but I wanted to start with the build carrying the strongest guarantees to make sure I could at least get that working. That meant cloning the build labeled “Verified Package 1.0.8”, which maps to “Release 2”.

In order to open the Unity project that is part of this release, I wanted the version of Unity that exactly matched the version number in ProjectVersion.txt: 2018.4.17f1. When I tried to install Unity 2018 through Unity Hub, it offered me 2018.4.36f1 because that was the most recent supported version. To match versions exactly, I had to click the download archive link and look for 2018.4.17 under 2018 builds. (It was released February 11th, 2020.) Once found, I could click the “Unity Hub” button to prompt Unity Hub to install that build on my machine.

While Unity installed, I cloned the repository tagged release_2 and installed the corresponding Python packages. I encountered no problems following the installation directions for this release, though there were slight modifications as I used Anaconda Individual to manage my Python virtual environments. I had the option of installing the locally cloned versions of the ml-agents-envs and ml-agents packages, and I did so. I noticed that the installation pulled in TensorFlow in CPU-only mode, but running without GPU acceleration is perfectly fine for a starting point.

Once Unity editor 2018.4.17 was installed, I used it to open the Project directory of my cloned release_2 repository. It opened without errors. I proceeded to the Getting Started guide for this release and verified I had basic functionality, both running the pretrained 3D Balance Ball model and training a model of my own. The training was pretty quick: it took just under 8 minutes on the Core i5-7300HQ CPU of my Dell 7577 laptop plugged into an AC power adapter.

Encouraged by this success, I proceeded to try Release 18 as well.

Switching Back to Unity ML-Agents

It was quite enlightening for me to read Deep Reinforcement Learning Doesn’t Work Yet. And to be honest, a little depressing as well. I was vaguely aware of the challenges involved but only in a general sense. Just small tidbits here and there over the past few years, as I looked at this field with interest. Now that I finally got around to looking at reinforcement learning in more detail, I realized that it was overly optimistic of me to expect all major problems to have been solved by now.

My original motivation for getting into reinforcement learning was to make my Sawppy an autonomous rover. Based on what I’ve learned so far, my original hope for Sawppy intelligence via reinforcement learning is extremely ambitious and still quite far away. If I want to do some deep RL projects more likely to succeed in the near term, I probably shouldn’t put them on a real physical rover. In all likelihood, whatever can be accomplished on a real robot using deep reinforcement learning could be done faster and more easily with some other AI technique.

It would certainly be nice if some aspect of Sawppy intelligence eventually resulted in a research project that could contribute to the state of the art. But I’m not so arrogant as to assume I can accomplish that feat, certainly not as my first project in reinforcement learning. I’ll aim for something simple as my starting point. Got to crawl before I can walk, and all that.

Transferring reinforcement learning from a simulator to work in the real world is still a lot to tackle. So I’m going to look at a simulated world and stay within that simulated world while I learn the ropes. And before I can realistically think about contributing to algorithm advancements, I should get familiar with applying existing implementations of reinforcement learning. All of these new priorities turned my attention back to the game world of Unity ML-Agents.

Notes on “Deep Reinforcement Learning Doesn’t Work Yet”

OpenAI’s guide Spinning Up in Deep RL has been very educational for me to read, even though I only understood a fraction of the information on my first pass and I hardly understood the code examples at all. But the true riches of this guide are in the links, so I faithfully followed the first link on the first page of the Resources section (Spinning Up as a Deep RL Researcher) and got a big wet towel dampening my enthusiasm.

Spinning Up’s link text is “it’s hard and it doesn’t always work”, and it led to Alex Irpan’s blog post titled Deep Reinforcement Learning Doesn’t Work Yet. The blog post covered all the RL pitfalls I’d already learned about, either from Spinning Up or elsewhere, and added many more I hadn’t known about. And boy, it paints a huge gap between the promise of reinforcement learning and what had actually been accomplished as of its publication date, almost four years ago in February 2018.

The whole thing was a great read. At the end, my first question was: has the situation materially changed since its publication in February 2018? As a beginner I have yet to learn of the sources that would help me confirm or refute the post, so I started with the resource right at hand: the blog it was hosted on. Fortunately there weren’t too many posts, so I could skim the content of the past four years within a few hours. The author still seems to be involved in the field of reinforcement learning and has critiqued some notable papers during this time. But none seemed particularly earth-shattering.

In the “Doesn’t Work Yet” blog post, the author made references to ImageNet. (Example quote: Perception has gotten a lot better, but deep RL has yet to have its “ImageNet for control” moment.) I believe this refers to the ImageNet 2012 Challenge. Historically the top performers in this competition were separated by very narrow margins, but in 2012 AlexNet won with a margin of more than 10% using a GPU-trained convolutional neural network. This was one of the major events (sometimes credited as THE event) that kicked off the current wave of deep learning.

So for robotics control systems, reinforcement learning has yet to see that kind of breakthrough. There have been many notable advancements; OpenAI themselves hyped a robot hand that could manipulate a Rubik’s Cube. (Something Alex Irpan has also written about.) But looking under the covers, they’ve all had too many asterisks for the results to make a significant impact on real-world applications. Researchers are making headway, but it’s been a long, tough slog of incremental advances.

I appreciate OpenAI Spinning Up linking to the Doesn’t Work Yet blog post. Despite all the promise and all the recent advances, there’s still a long way to go. People new to the field need to have realistic expectations and maybe make adjustments to their plans.

Notes on Reinforcement Learning Algorithm Implementations Published by OpenAI

OpenAI’s Spinning Up in Deep RL guide has copious written resources, but it also offers resources in the form of algorithm implementations in code. Each implementation is accompanied by a background section explaining how the algorithm works, summarized by a “Quick Facts” section that serves as a high-level view of how these algorithms differ from each other. As of this writing, there are six implementations. The main division is that there are three “on-policy” algorithms and three “off-policy” algorithms. Within each division, the three algorithms are roughly sorted by age and illustrate an arc of research in the field. The best(?) on-policy algorithm here is Proximal Policy Optimization (PPO), and representing the off-policy vanguard is Soft Actor-Critic (SAC).

These implementations are geared towards teaching these algorithms. Since priority is placed on this learning context, these implementations are missing enhancements and optimizations that would make the code harder to understand. For example, many of these implementations do not support parallelization.
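To give a sense of what using one of these looks like, here is a minimal sketch of launching the PyTorch PPO implementation from Python, based on my reading of the Spinning Up documentation. The environment choice, network sizes, and output directory are placeholder values of my own, not anything the guide prescribes:

import gym
import torch
from spinup import ppo_pytorch as ppo

# Spinning Up algorithms take a function that constructs the environment.
env_fn = lambda: gym.make("CartPole-v1")

# Small actor-critic network; sizes here are arbitrary placeholders.
ac_kwargs = dict(hidden_sizes=[64, 64], activation=torch.nn.ReLU)

# Where progress logs and saved models go.
logger_kwargs = dict(output_dir="./ppo_cartpole", exp_name="ppo_cartpole")

ppo(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=4000, epochs=10,
    logger_kwargs=logger_kwargs)

The other implementations follow the same general calling pattern, differing mainly in their hyperparameters.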

The teaching focus is great for its intended purpose of supplementing Spinning Up, but some people want an already-optimized implementation. For this audience, OpenAI published the OpenAI Baselines a few years ago to serve as high-water marks for algorithm implementation. These baselines can serve either as starting points for other projects or as benchmarks to measure potential improvements against.

However, it appears these baselines have gone a little stale. That’s not any fault of the implementations themselves, but merely due to the rapidly moving nature of this research field. The repository hosts implementations using TensorFlow 1, which has been deprecated. There’s a (partial? full?) conversion to TensorFlow 2 in a separate branch, but that never merged back into the main branch for whatever reason. As of this writing there’s an open issue asking for PyTorch implementations, prompted by OpenAI’s own proclamation that they will be standardizing on PyTorch. (This proclamation actually led to the PyTorch conversion of the Spinning Up in Deep RL examples.) However, there’s no word yet on PyTorch conversions for OpenAI Baselines, so anyone who wants implementations in PyTorch will either have to optimize the Spinning Up implementations themselves or look elsewhere.

Of course, all of this assumes reinforcement learning will actually solve the problem, which I’ve learned might not be a good assumption.

Notes on Deep Reinforcement Learning Resources by OpenAI

I’m going through the Spinning Up in Deep RL (Reinforcement Learning) guide published by OpenAI, and I found the Introduction to RL section quite enlightening. While it was just a bit too advanced for me to understand everything, I understood enough to make sense of many other things I’ve come across recently as I had browsed the publicly available resources in this field. Following that introductory section was a Resources section with the following pages:

Spinning Up as a Deep RL Researcher

This section is for a target audience that wishes to contribute to the frontier of research in reinforcement learning. It covers building the right background, learning by doing, and how to be rigorous in research projects. It is filled with information that feels like it was distilled from the author’s experience. The most interesting part for me is the observation that “broken RL code almost always fails silently”, meaning there are no error messages, just a failure of the agent to learn from its experience. The worst part when this happens is that it’s hard to tell the difference between a flawed algorithm and a flawed implementation of a good algorithm.

Key Papers in Deep RL

This was the completely expected directory of papers, roughly organized by the taxonomy used for the introduction in this guide. I believe the author assembled this list to form a solid foundation for further exploration. It appears that most (and possibly all) of them are freely available, a refreshing change from the paywalls that usually block curious minds.

Exercises

Oh no, I didn’t know there would be homework in this class! Yet here we are. A few problems for readers to attempt implementing using their choice of TensorFlow or PyTorch, along with solutions to check answers against. The first set covers basic implementation; the second set covers algorithm failure modes. This reinforces what was covered earlier: broken RL code fails silently, so an aspiring practitioner must recognize the symptoms of failure modes.

Benchmarks for Spinning Up Implementations

The Spinning Up in Deep RL guide is accompanied by implementations of several representative algorithms. Most of them have two implementations: one in TensorFlow and one in PyTorch. But when I run them on my own computer, how would I know whether they’re running correctly? This page is a valuable resource to check against. It has charts of these algorithms’ performance in five MuJoCo Gym environments, under conditions also described on the page. And once I have confidence they are running correctly, these also serve as benchmarks to see if I can improve their performance, since the implementations are written for beginners to read and understand, which means they skip some of the esoteric, hard-to-understand optimizations that readers are challenged to put in themselves.

For people who want optimized implementations of these algorithms, OpenAI has them covered too. Or at least, they used to.

Notes on “Introduction to RL” by OpenAI

With some confidence that I could practice simple reinforcement learning algorithms, I moved on to what I consider the meat of OpenAI’s Spinning Up in Deep RL guide: the section titled Introduction to RL. I thought it was pretty good stuff.

Part 1: Key Concepts in RL

Since I had been marginally interested in the field, the first section, What Can RL Do?, was mostly review for me. More than anything else, it confirmed that I was reading the right page. The guide then proceeded to Key Concepts and Terminology, which quickly caught up to and then surpassed what knowledge I already had. I was fine with the verbal descriptions and the code snippets, but once the author started phrasing things in terms of equations it became more abstract than I can easily follow. I’m a code person, not an equations person. Given my difficulty comprehending the formalized equations, I was a little bemused at the third section, (Optional) Formalism, as things were already more formal than my usual style.

Part 2: Kinds of RL Algorithms

Given that I didn’t comprehend everything in part 1, I was a little scared of what I would find in part 2. Gold. I found pure gold. Here is a quick survey of the major characteristics of various reinforcement learning approaches, explaining those terms in ways I (mostly) understood: explanations of what “Model-Free” vs. “Model-Based” means, and what “on-policy” vs. “off-policy” means. I had seen a lot of these terms thrown around in things I read before, but I never found a place laying them out against one another until this section. This section alone was worth the effort of digging into the guide and will help me understand other things. For example, a lot of these terms were in the documentation for Unity ML-Agents. I didn’t understand what they meant at the time, but now I hope to understand more when I review them.

Part 3: Intro to Policy Optimization

Cheered by how informative I found part 2, I proceeded to part 3 and was promptly brought back down to size by the equations. The good thing is that this section has both equations and a sample code implementation, and they are explained more or less in parallel. Someone like myself can read the code and try to understand the equations. I expect there are people out there who are the reverse, more comfortable with the equations and appreciating what they look like in code. That said, I can’t claim I completely understand the code, either. Some of the mathematical complexities represented in the equations were not in the sample source code: they were handed off to implementations in the PyTorch library.
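To give a flavor of that hand-off, here is my own stripped-down sketch of the core idea in the guide’s simplest policy gradient example (a paraphrase from memory, not the guide’s actual code): PyTorch’s distribution classes handle the probability math, and autograd handles the differentiation.

import torch
from torch.distributions.categorical import Categorical

# Pretend policy network output for a batch of 3 observations in a 2-action environment.
logits = torch.randn(3, 2, requires_grad=True)
actions = torch.tensor([0, 1, 1])          # actions that were actually taken
weights = torch.tensor([1.0, 0.5, -0.2])   # e.g. reward-to-go for each action (made-up numbers)

# "Pseudo-loss" whose gradient is the policy gradient estimate from the equations.
logp = Categorical(logits=logits).log_prob(actions)
loss = -(logp * weights).mean()
loss.backward()                            # autograd does the calculus the equations describe
print(logits.grad)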

I foresee a lot of studying PyTorch documentation before I can get proficient at writing something new on my own. But just from reading this Introduction to RL section, I have the basic understanding to navigate PyTorch API documentation and maybe the skill to apply existing implementations written by somebody else. And as expected of an introduction, there are tons of links to more resources.

Old PyTorch Without GPU Is Enough To Start

I’ve mostly successfully followed the installation instructions for OpenAI’s Spinning Up in Deep RL. I am optimistic this particular Anaconda environment (in tandem with the OpenAI guide) will be enough to get me off the ground. However, I don’t expect it to be enough for doing anything extensive, because when I checked the installation, I saw it pulled down PyTorch 1.3.1, which is now fairly old. As of this writing, PyTorch LTS is 1.8.2 and Stable is 1.10. On top of that, this old PyTorch runs without CUDA GPU acceleration.

>>> import torch
>>> print(torch.__version__)
1.3.1
>>> print(torch.cuda.is_available())
False

NVIDIA’s CUDA API has been credited with making the current boom in deep learning possible, because CUDA opened up GPU hardware for usage other than its original gaming intent. Such massively parallel computing hardware made previously impractical algorithms practical. However, such power does incur overhead. For one thing, data has to be copied from a computer’s main memory to memory modules on board the GPU before it can be processed. And then the results have to be copied back. For smaller problems, the cost of this overhead can swamp the benefit of GPU acceleration.
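As a toy illustration of that round trip (my own example, with arbitrary sizes), this is roughly what the overhead looks like in PyTorch:

import torch

x = torch.randn(1000, 1000)        # tensor starts out in the computer's main memory
if torch.cuda.is_available():
    x_gpu = x.to("cuda")           # copy from host memory to GPU memory (overhead)
    y_gpu = x_gpu @ x_gpu          # the actual computation runs on the GPU
    y = y_gpu.to("cpu")            # copy the result back to host memory (more overhead)
else:
    y = x @ x                      # small problems may finish faster by just staying on the CPU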

I believe I ran into this when working through Codecademy’s TensorFlow exercises. I initially set up CPU-only TensorFlow on my computer to get started and, once I had that initial experience, I installed TensorFlow with GPU support. I was a little surprised to see that small teaching examples from Codecademy took more time overall on the GPU accelerated installation than the CPU-only installation. One example took two minutes to train in CPU-only mode, but took two and a half minutes with GPU overhead. By all reports, the GPU will become quite important once I start tackling larger neural networks, but I’m not going to sweat the complexity until then.

So for Spinning Up in Deep RL, my first goal is to get some experience running these algorithms in a teaching examples context. If I am successful through that phase (which is by no means guaranteed) and start going beyond small examples, then I’ll worry about setting up another Anaconda environment with a modern version of PyTorch with GPU support.

Installing Code for OpenAI “Spinning Up in Deep RL”

I want to start playing with training deep reinforcement learning agents, and OpenAI’s “Spinning Up in Deep RL” guide seems like a good place to start. But I was a bit worried about the age of this guide, parts of which have become out of date. One example: it still says MuJoCo requires a paid license, but MuJoCo has since become free to use.

Despite this, I decided it was worth my time to try its software installation guide on my computer running Ubuntu. OpenAI’s recommended procedure uses Anaconda, meaning I’ll have a Python environment dedicated to this adventure and largely isolated from everything else, starting from the Python version (3.6, instead of the latest 3.10) on down to all the dependent libraries. The good news is that all the Python-based infrastructure seemed to work without problems. But MuJoCo is not Python and thus not under Anaconda isolation, so all my problems came from trying to install mujoco_py (a Python library to bridge MuJoCo).

The first problem was apparently expected by the project’s authors, as I got a prompt to set my LD_LIBRARY_PATH environment variable. The script that gave me this prompt even gave me its best guess on the solution:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/roger/.mujoco/mujoco210/bin

That looked reasonable to me, so I tried it and it successfully allowed me to see the second and third problems. Both were Ubuntu packages that were not explicitly named in the instructions, but I had to install them before things could move on. I did a web search for the first error, which returned the suggestion to run “sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3” and that seemed to have resolved this message:

/home/roger/anaconda3/envs/spinningup/lib/python3.6/site-packages/mujoco_py/gl/osmesashim.c:1:10: fatal error: GL/osmesa.h: No such file or directory

The second missing package error was straightforward to fix, running “sudo apt install patchelf” to resolve this message:

FileNotFoundError: [Errno 2] No such file or directory: 'patchelf': 'patchelf'

The final step was to install dependencies for MuJoCo-based Gym environments. The instruction page said to run pip install gym[mujoco,robotics] but when I did so, I was distressed to see my work to install mujoco-py was removed. (!!)

  Attempting uninstall: mujoco-py
    Found existing installation: mujoco-py 2.1.2.14
    Uninstalling mujoco-py-2.1.2.14:
      Successfully uninstalled mujoco-py-2.1.2.14

The reason pip wanted to do this was that the gym packages asked for a mujoco_py version greater than or equal to 1.50 but less than 2.0.

Collecting mujoco-py<2.0,>=1.50
Downloading mujoco-py-1.50.1.68.tar.gz (120 kB)

Fortunately(?) this attempt to install 1.50.1.68 failed, and pip rolled everything back, restoring the 2.1.2.14 I had already installed and leaving me in a limbo state.

Well, when in doubt, try the easy thing first. I decided to be optimistic and moved on to the “check that things are working” command to run the Walker2d-v2 environment. I saw a lot of numbers and words flash by. I only marginally understood them, but I didn’t see anything I recognized as an error message. So while I see a few potential problems ahead, right now it appears this mishmash of old and new components will work well enough for a beginner.
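For reference, the kind of quick sanity check I have in mind can also be done in a few lines of Python (my own snippet using the gym API of that era, not part of the Spinning Up instructions):

import gym

# Constructing a MuJoCo environment is where a broken mujoco_py install would fail.
env = gym.make("Walker2d-v2")
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()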

Today I Learned: MuJoCo Is Now Free To Use

I’ve contemplated going through OpenAI’s guide Spinning Up in Deep RL. It’s one of many resources OpenAI has made available, and it builds upon the OpenAI Gym system of environments for training deep reinforcement learning agents. These range from very simple text-based environments, to 2D Atari games, to full 3D environments built with MuJoCo, whose documentation explains the name is shorthand for the type of interactions it simulates: “Multi Joint Dynamics with Contact”.

I’ve seen MuJoCo mentioned in various research contexts, and I’ve inferred it is a better physics simulation than something that we would find in, say, a game engine like Unity. No simulation engine is perfect, they each make different tradeoffs, and it sounds like AI researchers (or at least those at OpenAI) believe MuJoCo to be the best one to use for training deep reinforcement learning agents with the best chance of being applicable to the real world.

The problem is that, when I looked at OpenAI Gym the first time, MuJoCo was expensive. This time around, I visited the MuJoCo page hoping they had launched a more affordable licensing tier, and there I got the news: sometime in the past two years (I didn’t see a date stamp) DeepMind acquired MuJoCo and intends to release it as free open-source software.

DeepMind was itself acquired by Google and, when that collection of companies was reorganized, it became one of several companies under the parent company Alphabet. At a practical level, this meant DeepMind had indirect access to Google money for buying things like MuJoCo. There’s lots of flowery wordsmithing about how opening up MuJoCo will advance research; what I care about is the fact that everyone (including myself) can now use MuJoCo without worrying about the licensing fees it previously required. This is a great thing.

At the moment MuJoCo is only available as compiled binaries, which is fine by me. Eventually it is promised to be fully open-sourced at a GitHub repository set up for the purpose. The README of that repository makes one thing very clear:

This is not an officially supported Google product.

I interpret this to mean I’ll be on my own to figure things out without Google technical support. Is that a bad thing? I won’t know until I dive in and find out.

Window Shopping: OpenAI Spinning Up in Deep Reinforcement Learning

I was very encouraged after my second look at Unity ML-Agents. When I first looked at it a few years ago, it motivated me to look at the OpenAI Gym for training AI agents via reinforcement learning. But once I finished the “Hello World” examples, I got lost trying to get further and quickly became distracted by other more fruitful project ideas. With this past experience in mind, I set out to find something more instructive for a beginner and found OpenAI’s guide “Spinning Up in Deep RL“. It opened with the following text:

Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).

This sounds exactly like what I needed. Jackpot! At first I thought this was something new since my last look, but the timeline at the bottom of the page indicated this was already available when I last looked at reinforcement learning resources on OpenAI. I had missed it! I regret the lost opportunity, but at least I’ve found it this time.

The problem with finding such a resource a few years after publication is that it may already be out of date. The field of deep learning moves so fast! I’m pretty sure the fundamentals will still be applicable, but the state of the art has certainly moved on. I’m also worried about the example code that goes with this resource, which looks stale at first glance. For example, it launched with examples that used the now-deprecated TensorFlow 1 API. (Instead of the current TensorFlow 2 API.) I don’t care to learn TF1 just for the sake of this course, but fortunately in January 2020 they added alternative examples implemented using PyTorch instead. If I’m lucky, PyTorch hasn’t made a major version breaking change and I could still use those examples.

In addition to the PyTorch examples, there’s another upside to finding this resource now instead of earlier. For 3D environment simulations, OpenAI uses MuJoCo. When I looked at OpenAI Gym earlier, running the 3D environments required a MuJoCo license that cost $500/year, and I couldn’t justify that money for playing around. But good news! MuJoCo is now free to use.

Unity Machine Learning Agents Almost Within My Reach

While poking around Google’s Machine Learning Crash Course, I found that they have released a TensorFlow library for building agents with deep reinforcement learning. This might be fun but I don’t know enough about the field to make use of that library yet. It also reminded me to take another look at game engine Unity 3D’s development in this area. A lot has happened!

I first took a quick glance at Unity ML-Agents more than two years ago. At the time, the project was still an experimental thing for Unity and a lot was still in flux. Since I didn’t know much about working in Unity or in reinforcement learning, that was too many variables in flux for my taste. A year later, Unity ML-Agents reached an official version 1.0, but it was still technically a preview technology. Not long after that, it became a “verified” package for use with the Unity 2020.3 LTS build, signifying a mature tool. As part of being a verified package for use with Unity LTS, ML-Agents got some nice things like an official Unity technology landing page, and a few pieces of curriculum have been posted to Unity Learn to help people get started.

The primary focus of Unity ML-Agents is creating agents in the virtual world of a Unity game, not necessarily for the real-world robots where my interests lie. This is an important caveat because the Unity physics engine is not an accurate representation of the real world, and reinforcement learning agents are notorious for exploiting flaws in simulation engines to do “impossible” things. But that’s no reason to give up on Unity, which can still be a useful tool for robotics research. These caveats are just some of the many tradeoffs to keep in mind.

While Unity evolved their ML-Agents library, I occasionally dabbled in Unity with projects like Bouncy Bouncy Lights. I’m not bold enough to call myself a Unity developer yet, but I’m no longer completely overwhelmed by the Unity editor user interface as I once was. I haven’t done much more in Unity because I haven’t felt particularly motivated to make games. But ML-Agents? That looks like pretty good motivation for me to put serious effort into understanding reinforcement learning.

Window Shopping: Google Machine Learning Crash Course

I’ve learned some machine learning fundamentals through Codecademy Pro, and I took a quick look at what Kaggle offers for free. There’s another option, from another name that has (for better and worse) taken leadership in monetizing machine learning: Google. Their freely available Machine Learning Crash Course supposedly started as a way for Google to get their own developers up to speed, and has since been made suitable for public consumption and is now available to all.

My favorite part of this course was its prerequisites & prework page. I used it as a guide to choose some of my courses on Codecademy Pro; it’s how I knew NumPy and Pandas were important to the field. However, their linked “UltraQuick Tutorial” for those libraries was a bit too terse for me. I have no doubt they provide enough background for people smarter than I am; personally, I needed help from Codecademy to get ramped up on NumPy and Pandas.

For interactive learning, these courses use Google Colab, which feels a lot like Kaggle in the sense that they are both cloud-hosted Jupyter (or at least Jupyter-style) notebooks. They differ in server-side hardware and in the software images running on that hardware. I’m too new at this to have any feel for the tradeoffs between the two options, but I do understand the storage difference: Google Colab uses my Google Drive account for storage.

The primary reason I didn’t go with this Machine Learning Crash Course was that instruction comes in the form of video lectures, and that turns me off. For self-directed education I prefer the written word, because I can go at my own pace and easily search and skip around. Many people prefer video lectures as the closest counterpart to in-person instruction, but I don’t see it that way. For me the most valuable aspect of in-person instruction is the interactive question-and-answer, which is completely lost in a video lecture.

And finally, this course does not cover reinforcement learning, which is my primary interest in applying machine learning. A search for RL on the Google developer site leads to the Agents library and associated documentation, but that material was written to help people apply reinforcement learning rather than teach it. I’ll return to the TensorFlow Agents library after getting ramped up on deep RL somewhere else.

Window Shopping: Kaggle Courses

Over the past few months, I’ve spent much of my free time learning with a Codecademy Pro subscription. I was on a self-curated curriculum starting with “Learn Python 3” and ending with “Building Deep Learning Models on TensorFlow”. I felt it was time (and money) well spent, helping me get up to speed on a lot of the concepts and vocabulary of machine learning and deep learning. I enjoyed it a lot and can recommend going through the same path I did, with upsides and caveats documented over the past few blog entries.

But no one path is best for everyone, so I thought it worth noting a few alternative approaches. Top of the list is the learning or “Courses” section on Kaggle. These are free of monetary cost and do not require the equivalent of a Codecademy Pro subscription. The course materials are Kaggle’s interactive documents that resemble Jupyter notebooks. (Whether Kaggle was merely inspired by Jupyter or is actually cloud-hosted Jupyter, I don’t know.) They provide an interactive learning environment of a different style than Codecademy’s. In hindsight I can see how this approach might be better: the Codecademy courses tell us about Jupyter and Kaggle and encourage us to learn them, but going with Kaggle courses means we can start with The Real Deal immediately. Another upside is that I found it much easier to navigate, skip, and skim through Kaggle course sections in their notebook format. In comparison, I thought it was cumbersome to navigate through Codecademy units whenever I wanted to go back and review material.

One downside is that these courses assume more background than Codecademy does. For example there’s nothing to teach basic statistics and probability — the student is assumed to know them (or learn them elsewhere.) A final downside for me personally is that there’s very little here on applying deep learning to reinforcement learning. Their “Intro to Game AI and Reinforcement Learning” unit has just a single notebook “Deep Reinforcement Learning” that opens with the disclaimer:

In this notebook, we won’t be able to explore this complex field in detail, but you’ll learn about the big picture and explore code that you can use to train your own agent.

Given that what I find interesting is missing, I don’t think I’ll spend very much time on Kaggle. In principle, there’s value in seeing Kaggle instructors take on the same fundamentals for a more well-rounded foundation. In practice, I’m impatient to get closer to my target. Perhaps I’ll return later for a refresher. In the meantime, Kaggle Learn might be a better option than Codecademy Pro for those who (A) already have a stronger background in the area, (B) prefer the notebook learning format, or (C) could not justify spending money on Codecademy Pro.

And for those that prefer video lecture format, there’s Google’s Machine Learning Crash Course.

Notes on Codecademy “Build Deep Learning Models with TensorFlow”

Once I upgraded to a Codecademy Pro membership, I started taking courses from its Python catalog with the goal of building a foundation to understand deep learning neural networks. Aside from a few scenic detours, most of my course choices were intended to build upon each other to fulfill what I consider prerequisites for a Codecademy “Skill Path”: Build Deep Learning Models with TensorFlow

This was the first “Skill Path” I took, and I wasn’t quite sure what to expect, as Codecademy implied they are different from the courses I took before. But once I got into this “skill path”… it felt pretty much like another course, just a longer one with more sessions. It picked up where the “Learn the Basics of Machine Learning” course left off with neural perceptrons, and dived deeper into neural networks.

In contrast to earlier courses that taught various concepts by using them to solve regression problems, this course spent more time on classification problems. We still use scikit-learn a lot, but as promised by the title we’re also using TensorFlow. Note that the coursework mostly stayed within the Keras subset of the TensorFlow 2 API. Keras used to be a separate library that made it easier to work with TensorFlow version 1, but it has since been merged into TensorFlow version 2 as part of the big revamp between versions.

I want to call attention to an item linked as “additional resources” for the skill path: a book titled “Deep Learning with Python” by François Chollet. (Author, or at least one of the primary people, behind Keras.) Following various links associated with the title, I found that there’s since been a second edition and the first chapter of the book is available to read online for free! I loved reading this chapter, which managed to condense a lot of background on deep learning into a concise history of the field. If the rest of the book is as good as the first chapter, I will learn a lot. The only reason I haven’t bought the book (yet) is that, based on the index, the book doesn’t get into unsupervised reinforcement learning like the type I want to put into my robot projects.

Back to the Codecademy course… err, skill path: we get a lot of hands-on exercises using Keras to build TensorFlow models and train them on data for various types of problems. This is great, but I felt there was a significant gap in the material. I appreciated learning that different loss functions and optimizers are used for regression versus classification problems, and we put them to work in their respective domains. But we were merely told which function to use for each exercise; the course doesn’t go into why they were chosen for the problem. I had hoped the Keras documentation’s Optimizers Overview page would describe the relative strengths and weaknesses of each optimizer, but it was merely a list of optimizers by name. I feel like such a comparison chart must exist somewhere, but it’s not there.
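For illustration, here’s a rough sketch of the pattern those exercises follow, with the loss function being the main thing that changes between problem types. The layer sizes, input shape, and class count are arbitrary placeholders of mine:

from tensorflow import keras

# Regression: a single linear output neuron, trained with mean squared error.
regression_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1),
])
regression_model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Classification: one output neuron per class with softmax, trained with cross-entropy.
classification_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    keras.layers.Dense(3, activation="softmax"),
])
classification_model.compile(optimizer="adam", loss="categorical_crossentropy",
                             metrics=["accuracy"])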

I didn’t quite finish this skill path. I lost motivation during the “Portfolio Project” portion, where we are directed to create a forest cover classification model. My motivation for deep learning lies in reinforcement learning, not classification or regression problems, so my attention wandered elsewhere. At this point I believe I’ve exhausted all the immediately applicable resources on Codecademy, as there is no further deep learning material, nor is there anything on reinforcement learning. So I bid a grateful farewell to Codecademy for teaching me many important basics over the past few months, and started looking elsewhere.

Notes on Codecademy Intermediate Python Courses

I thought Codecademy’s course “Getting Started Off Platform for Data Science” really deserved more attention than I gave it when I initially browsed the catalog, and I regret that I only saw it at the end of my perusal of beginner-friendly Python courses. But life moves on. I started going through some intermediate courses with an eye on future studies in machine learning. Here are some notes:

  • Learn Recursion with Python I took purely for fun and curiosity with no expectation of applicability to modern machine learning. In school I learned recursion with Lisp, a language ideally suited for the task. Python wasn’t as good of a fit for the subject, but it was alright. Lisp was also the darling of artificial intelligence research for a while, but I guess the focus has since evolved.
  • Learn Data Visualization with Python gave me more depth on two popular Python graphing libraries: Matplotlib and Seaborn. These are both libraries with lots of functionality so “more depth” is still only a brief overview. Still, I anticipate skills here to be useful in the future and not just in machine learning adventures.
  • Learn Statistics with NumPy was one I expected to be a direct follow-up to the beginner-friendly Statistics with Python course, but it is not a direct sequel, and there is more overlap than I thought there’d be. This course is shorter, with less coverage of statistics but more about NumPy. After taking the course I think I had parsed the title as “(Learn Statistics) with NumPy”, but it’s more accurate to think of it as “Learn (Statistics with NumPy)”.
  • Linear Regression in Python is a small but important step up the foothills on the way to climbing the mountain of machine learning. Finding the best line to fit a set of data teaches important concepts like loss functions, and doing it on a 2D plot of points gives us an intuitive grasp of what the process looks like before we start adding variables and increasing the number of dimensions involved. Many concepts are described, and we get exercises using the scikit-learn library, which implements those algorithms (see the short sketch after this list).
  • Learn the Basics of Machine Learning was the obvious follow-up, diving deeper into machine learning fundamentals. All of my old friends are here: Pandas, NumPy, scikit-learn, and more. It’s a huge party of Python libraries! I see this course as a survey of major themes in machine learning, of which neural networks are only one part. It describes a broader context, which I believe is a good thing to have in the back of my head. I hope it helps me avoid the trap of trying to use neural nets to solve everything, a.k.a. “when I get a shiny new hammer, everything looks like a nail”.
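As an aside, the linear regression exercises mentioned above boil down to something like this minimal scikit-learn sketch (toy data and numbers of my own choosing):

import numpy as np
from sklearn.linear_model import LinearRegression

# Noisy samples scattered around the line y = 3x + 2.
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * x.ravel() + 2 + np.random.normal(scale=1.0, size=50)

model = LinearRegression().fit(x, y)      # fits by minimizing mean squared error
print(model.coef_[0], model.intercept_)   # should land near 3 and 2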

Several months after I started reorienting myself with Python 3, I felt like I had the foundation I needed to start digging into the current state of the art of deep learning research. I have no illusions about being able to contribute anything, I’m just trying to learn enough to apply what I can read in papers. My next step is to learn to build a deep learning model.

Notes on Codecademy “Getting Started Off Platform for Data Science”

I like Codecademy’s format of presenting a bit of information followed immediately by an opportunity to try it myself. I like learning by doing as a beginner, even if the teaching/learning environment can be limited at times. But one thing I didn’t like was that, to put my Python knowledge to use, I would have to venture outside of the learning environment, and Codecademy didn’t used to provide much information on how.

The Learn Python 3 course made an effort to help students work outside of the Codecademy environment with its “Off-Platform Projects”. These came in the form of Jupyter notebooks that I could download, and a page with some instructions on how to use them: a link to Codecademy’s command line course, a link to instructions for installing Python on my own computer, and a link on installing Jupyter notebooks. It’s a bit scattered.

What I didn’t know at the time was that Codecademy had already assembled an entire course covering these points. Getting Started Off Platform for Data Science is an orientation for everyone who will eventually venture off Codecademy’s learning platform. It starts with an introduction to the command line, then Python development tools like Jupyter notebooks and other IDEs, wrapping up with an introduction to GitHub. This is great! Why didn’t they put more emphasis on this earlier? I think it would have been super helpful to beginners.

Though admittedly, I didn’t follow those installation instructions anyway. Python isn’t very good about library version management and the community has sidestepped the issue by using virtual environments to keep Python libraries separated in different per-project worlds. I’ve used venv and Anaconda to do this, and recently I’ve also started playing with Docker containers. For my own trip through Codecademy’s off-platform projects using Jupyter notebooks, I ran Jupyter Lab using their jupyter/datascience-notebook Docker image. That turned out to be sheer overkill and I probably could have just used the much lighter-weight jupyter/base-notebook image.

In hindsight I think it would have been useful to review Getting Started Off Platform for Data Science before I started reorienting myself with Python. I wouldn’t have followed it to the letter, but it had information that would have been useful to know beforehand. But as fate had it, it became the final course I took in the beginner-friendly section before I started trying intermediate courses.

Codecademy Beginner Friendly Python Fields

Once Codecademy got me reoriented with the Python programming language, I looked at some of their other beginner-friendly courses under the Python umbrella. I wanted to get some practice using Python, but I didn’t want to go through exercises for the sake of exercises. I wanted to make some effort at keeping things focused on my ultimate goal of learning about modern advances in machine learning.

  1. Learn Data Analysis with Pandas was my first choice, because I recognized “Pandas” as the name of a popular Python library for preparing data for machine learning. Making it relevant to the direction I am aiming for. The course title has “Data Analysis” and not “Machine Learning” but that was fine because it was only an introduction to the library. Not enough to get into field-specific knowledge, but more than enough to teach me Pandas vocabulary so I could navigate Pandas references and find my own answers in the future.
  2. How to Clean Data with Python followed up with more examples of Pandas in action. Again the course is nominally focused for data analytics but all the same concepts apply to cleaning data before feeding into machine learning algorithms.
  3. Exploratory Data Analysis in Python is a longer course with more ways to apply Pandas, including a machine learning specific section. Relative to other courses, this one is heavy on reading and light on hands-on practice, a consequence of the more general nature of the topic. And finally, this course let me dip my toes in another popular Python library I wanted to learn: NumPy.
  4. Learn Statistics with Python was how I dove into NumPy waters. After barely skating by some statistics and number crunching in the previous course, I wanted a refresher on basic statistics. Alongside the refresher I also learned how to calculate common statistics using the NumPy library. And after the statistics calculations are done, we want to visualize them! Enter yet another popular Python library: matplotlib.
  5. Probability is the natural course to follow a refresher on basic statistics. It covers only the most basic and common applications of statistics and probability for data analysis; we’re on our own to explore further depth outside of the class. I anticipate probability will play a role in machine learning, as some answers are going to be vague with room for interpretation, and I foresee that a poor (or misleadingly confident) grasp of probability would lead me astray.
  6. Differential Calculus was a course I poked my head into. I remembered it being quite a complex subject in school and was surprised Codecademy claimed anyone could learn it in two hours. It turns out the course would be more accurately titled “an introduction to numpy.gradient()” (see the small example after this list). Which… yes, is a numerical application of differential calculus, but it is definitely not the entirety of the subject. I guess it follows the trend of these courses: overly simplified titles that skim the basics of a few things, teaching just enough for us to learn more on our own later.
  7. Linear Algebra starts to get into Python code that has direct relevance to machine learning. I know linear regression is a starting point and I knew I needed an introduction to linear algebra before I could grasp how linear regression algorithms work.
  8. Learn How to Get Started with Natural Language Processing was a disappointment to me, but that was not the fault of the course. It’s just that the machine learning systems in this field aren’t usually reinforcement learning systems, which is the subfield of machine learning that most interests me. At least the course was short, and it taught me enough to know I can skip Codecademy’s other natural language courses.
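And to show what I mean by the numpy.gradient() comment above, here is a tiny example of my own (not from the course):

import numpy as np

# Sample f(x) = x squared at integer points; the exact derivative is 2x.
x = np.arange(5)
y = x ** 2
print(np.gradient(y, x))   # [1. 2. 4. 6. 7.] -- central differences inside, one-sided at the edges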

The final Codecademy “Beginner friendly” Python course I took was titled “Getting Started Off Platform for Data Science.” I don’t think Codecademy put enough emphasis on this one.