Notes on “Deep Reinforcement Learning Doesn’t Work Yet”

OpenAI’s guide Spinning Up in Deep RL has been very educational for me to read, even though I only understood a fraction of the information on my first pass through and I hardly understood the code examples at all. But the true riches of this guide are in the links, so I faithfully followed the first link on the first page (Spinning Up as a Deep RL Researcher) of the Resources section and got a big wet towel dampening my enthusiasm.

Spinning Up’s link text is “it’s hard and it doesn’t always work” and it led to Alex Irpan’s blog post titled Deep Reinforcement Learning Doesn’t Work Yet. The blog post covered all the RL pitfalls I’ve already learned about, either from Spinning Up or elsewhere, and added many more I hadn’t known about. And boy, it describes a huge gap between the promise of reinforcement learning and what had actually been accomplished as of its publishing date almost four years ago in February 2018.

The whole thing was a great read. At the end, my first question was: has the situation materially changed since its publishing in February 2018? As a beginner I have yet to learn of the sources that would help me confirm or disprove this post, so I started with the resource right at hand: the blog site this item was hosted on. Fortunately there weren’t too many posts, so I could skim the content of the past four years within a few hours. The author seems to still be involved in the field of reinforcement learning and has critiqued some notable papers during this time. But none of them seemed particularly earth-shattering.

In the “Doesn’t Work Yet” blog post, the author made references to ImageNet. (Example quote: Perception has gotten a lot better, but deep RL has yet to have its “ImageNet for control” moment.) I believe this is referring to the ImageNet 2012 Challenge. Historically the top performers in this competition were separated by very narrow margins, but in 2012 AlexNet won with a margin of more than 10 percentage points in top-5 error rate using a GPU-trained convolutional neural network. This was one of the major events (sometimes credited as THE event) that kicked off the current wave of deep learning.

So for robotics control systems, reinforcement learning has yet to see that kind of breakthrough. There have been many notable advancements; OpenAI themselves hyped a robot hand that could manipulate a Rubik’s Cube. (Something Alex Irpan has also written about.) But looking under the covers, they’ve all had too many asterisks for the results to make a significant impact on real world applications. Researchers are making headway, but it’s been a long tough slog for incremental advances.

I appreciate OpenAI Spinning Up linking to the Doesn’t Work Yet blog post. Despite all the promise and all the recent advances, there’s still a long way to go. People new to the field need to have realistic expectations and maybe adjust their plans accordingly.
