So far I’ve dipped my toes in the water of reinforcement learning, reinstalled Ubuntu and TensorFlow, and looked into Unity ML-Agents. It looks like I have a tentative plan for building my own reinforcement learning agents trained with TensorFlow in an Unity 3D environment.
There’s one problem with this plan, though: I have GPU-accelerated TensorFlow running in Ubuntu, but today Unity editor only supports MacOS and Windows. If I wanted to put them all together, on paper it means I’d have to get Nvidia GPU support up and running on my Windows partition and take on all the headaches that entails.
Thankfully, I’m not under a deadline to make this work immediately, so I can hope that Unity brings their editing and creation environment to Ubuntu. The latest preview build was released only a few days ago, and they expected that Linux will be a fully supported operating system for Unity Editor by the end of the year.
I suspect that they’ll be ready before I am, because I still have to climb the newcomer learning curve of reinforcement learning. I first have to learn the ropes using prebuilt OpenAI environments. It’ll be awhile before I can realistically contemplate designing on own agents and simulation environments.
Once I reach that point I hope I will be able to better evaluate whether my plan will actually work. Will Unity ML-Agents work with GPU-accelerated TensorFlow running in a Docker container on Ubuntu? I’ll have to find out when I get there.
In order to train reinforcement learning agents quickly, we want our training environment to provide high throughput. There are many variables involved, but I started looking at two of them: how fast it would be to run a single simulation, and how easy it would be to run multiple simulation in parallel.
The Gazebo simulator commonly associated with ROS research projects has never been known for its speed. Gazebo environment for the NASA Space Robotic Challenge was infamous for slowing far below real time speed. Taking over 6 hours to simulate a 30 minute event. There are ways to speed up Gazebo simulation, but this forum thread implies it’s unrealistic to expect more than 2-3 times as fast as real time speed.
In contrast, Unity simulation can be cranked all the way up to 100 times real time speed. It’s not clear where the maximum limit of 100 comes from, but it is documented under
limitations.md. Furthermore, it doesn’t seem to be a theoretical limit no one can realistically reach – at least one discussion on Unity ML Agents indicate people do indeed crank up time multiplier to 100 for training agents.
On the topic of running simulations in parallel, with Gazebo such a resource hog it is difficult to get multiple instances running. This forum thread explains it is possible and how it could be done, but at best it still feels like shoving a square peg in a round hole and it’ll be a tough act to get multiple Gazebo running. And we haven’t even considered the effort to coordinate learning activity across these multiple instances.
Things weren’t much better in Unity until recently. This announcement blog post describes how Unity has just picked up the ability to run multiple simulations on a single machine and, just as importantly, coordinate learning knowledge across all instances.
These bits of information further cements Unity as something I should strongly consider as my test environment for playing with reinforcement learning. Faster than real time simulation speed and option for multiple parallel instances are quite compelling reasons.
Out of all the general categories of machine learning, I find myself most interested in reinforcement learning. These problems (and associated solutions) are most applicable to robotics, forming the foundation of projects like Amazon’s DeepRacer. And the fundamental requirement of reinforcement learning is a training environment where our machine can learn by experimentation.
While it is technically possible to train a reinforcement learning algorithm in the real world with real robots, it is not really very practical. First, because a physical environment will be subject to wear and tear, and second, because doing things in the real world at real time takes too long.
For that reason there are many digital simulation environments in which to train reinforcement learning algorithms. I thought it would be an obvious application of robot simulation software like Gazebo for ROS, but this turned out to only be partially true. Gazebo only addresses half of the requirements: a virtual environment that can be easily rearranged and rebuilt and not subject to wear and tear. However, Gazebo is designed to run in a single instance, and its simulation engine is complex enough it can fall behind real time meaning it takes longer to simulation something than it would be in the real world.
For faster training of reinforcement learning algorithms, what we want is a simulation environment that can scale up to run multiple instances in parallel and can run faster than real time. This is why people started looking at 3D game engines. They were designed from the start to represent a virtual environment for entertainment, and they were built for performance in mind for high frame rates.
The physics simulation inside Unity would be less accurate than Gazebo, but it might be good enough for exploring different concepts. Certainly the results would be good enough if the whole goal is to build something for a game with no aspirations for adapting them to the real world.
Hence the Unity ML-Agents toolkit for training reinforcement learning agents inside Unity game engine. The toolkit is nominally focused on building smart agents for game non-player characters (NPC) but that is a big enough toolbox to offer possibilities into much more. It has definitely earned a spot on my to-do list for closer examination in the future.
After an intense few weeks with Unity, I’ve decided to put it on hold and come back to it later. The biggest problem is that, even though it can output to web browsers via WebGL, it is extreme overkill for projects that can be easily handled via Plain Jane HTML.
As an Unity exercise, I pulled art and assets from the old game Star Control 2. This was possible because (1) the creators of the game released the code in 2002, and (2) I’m no longer working at Microsoft. I was aware of the code release when it occurred, but my employment as a Microsoft developer at the time meant I could not look at anything released under GPL. Even that of an old game with zero relation to the work I’m paid to do.
Since it was a simple 2D game, bringing the assets to life using Unity severely underutilized Unity capabilities. That isn’t a problem, the part that gave me pause is that the end product is a lot heavier-weight than I thought it would be: Even when there’s only a few 2D sprites animating with some text overlay, it is a larger download with correspondingly slower startup than I wanted to see.
Given what I see so far, Unity is the wrong tool for this job. Now I have a decision to make. Option #1: Keep digging in on an old favorite, and work on SC2 code using some other framework, or Option #2: Find a project that plays more to Unity’s (many) strengths.
Option #1 goes to the past, option #2 goes to the future. (Specifically VR…. Unity is great for exploring VR development.)
Today, I find that I’ve caught the old SC2 fever again, so I’m going with #1. I’ll come back to Unity/VR/other explorations later.
Unity’s learning center has a lot of information, I chose to start with the headliner tutorials. These appear to be full-day lectures during the Unite conference, where they take the class through building a game from beginning to end.
Since real games take more than a day to build, many shortcuts were taken. One of these shortcuts were the use of built art assets. All came already created, complete with their own associated animation sequences. The tutorial only covered how to import the items and write a few lines of code to trigger the animation sequences.
I have no illusions about being an artist but I also know I don’t have an artist to call upon as I learn Unity. So I had to know something about creating these assets for myself. I thought I would start small with a few simple sprite animations… that turned out to be not so simple.
The Unity animation engine (sometimes called Mecanim in the documentation) is a very complex machine optimized to work with humanoid figures in 3D space. It can certainly do simple sprite animations, but trying to do so became an exercise in figuring out what to turn off in the big complex machine. It keeps trying to do too much, blending and interpolating and trying to be helpful when all I really want was to put a few 2D images on screen at discrete coordinates at specific points in time.
It took way more time than it should, but (1) I got my simple sprite animations working, and (2) I learned a whole bunch about what the animation engine can do for me, down the line, when I’m ready to move beyond sprites.
It was a bit frustrating, but now that I’m through it, I’ll call it a win. Time to move on.
After deciding to move on from Phaser, I looked around for other game engines to generate web games. I came across Unity again and again. Unity is not new to me, but I also knew their web support was via the Unity web player browser plug-in. This is a problem, as browser plug-ins have fallen into disfavor. All the major desktop browsers Firefox / Chrome / IE, are moving away from plug-ins.
So… dead end (or at least dying). Plug-ins are not the future of the web.
Because of the web player, I dismissed every mention of Unity for web development I encountered, until I came across information that slapped me upside the head and woke me up. The web player is not the only Unity web target: Unity now has another web-friendly build target: WebGL, no plug-in required! Wow!
This warrants a closer look. While WebGL might be immature and inconsistent across browsers, it is a shared goal of many interests to evolve and mature the platform and I share their high hopes. The browser plug-in model is going away. WebGL may or may not take off but all the signs today look promising.
And now I learned that I can even explore some parts of the web world with Unity. It is quite the Swiss Army Knife of software development. With this latest discovery, Unity has moved to the top of my to-do list. It is time to roll up my sleeves and dig in.
This should be fun.