I’m going through the Spinning Up in Deep RL (Reinforcement Learning) guide published by OpenAI, and I found the Introduction to RL section quite enlightening. While it was a bit too advanced for me to understand everything, I understood enough to make sense of many other things I’ve come across recently while browsing the publicly available resources in this field. Following that introductory section was a Resources section with the following pages:
This section is aimed at readers who wish to contribute to the frontier of research in reinforcement learning. It covers building the right background, learning by doing, and being rigorous in research projects. The section is filled with information that feels like it was distilled from the author’s experience. The most interesting part for me is the observation that “broken RL code almost always fails silently”: there are no error messages, just an agent that fails to learn from its experience. The worst part is that, when this happens, it’s hard to tell the difference between a flawed algorithm and a flawed implementation of a good algorithm.
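To make the “fails silently” idea concrete for myself, here is a tiny sketch of my own. It is not taken from the guide, and the function names are hypothetical. It computes the discounted reward-to-go for a short episode twice: once correctly, and once with a loop that runs in the wrong direction. Both versions run without any error and return a list of the right length, so nothing flags the mistake.

```python
def reward_to_go(rewards, gamma=0.99):
    """Discounted sum of FUTURE rewards at every timestep (intended behavior)."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the total from the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


def reward_to_go_buggy(rewards, gamma=0.99):
    """Same intent, but the loop runs forward, so it accumulates PAST rewards."""
    out = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards)):  # bug: should be reversed(range(...))
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Hypothetical three-step episode where only the last step is rewarded.
rewards = [0.0, 0.0, 1.0]
good = reward_to_go(rewards, gamma=0.5)
bad = reward_to_go_buggy(rewards, gamma=0.5)

print(good)  # [0.25, 0.5, 1.0]  earlier steps get credit for the later reward
print(bad)   # [0.0, 0.0, 1.0]   earlier steps get no credit at all
```

With the buggy version, the actions that led up to the reward never get any credit, so an agent trained on these numbers would probably just learn poorly. Nothing crashes, which is exactly what makes this kind of mistake hard to spot.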
This was the completely expected directory of papers, roughly organized by the taxonomy used for the introduction in this guide. I believe the author assembled this list to form a solid foundation for further exploration. It appears that most (and possibly all) of them are freely available, a refreshing change from the paywalls that usually block curious minds.
Oh no, I didn’t know there would be homework in this class! Yet here we are. There are a few problems for readers to attempt implementing using their choice of TensorFlow or PyTorch, along with solutions to check answers against. The first set covers basic implementation, and the second set covers some algorithm failure modes. This reinforces what was covered earlier: broken RL code fails silently, so an aspiring practitioner must learn to recognize the symptoms of failure modes.
The Spinning Up in Deep RL guide is accompanied by implementations of several representative algorithms. Most of them come in two versions: one in TensorFlow and one in PyTorch. But when I run them on my own computer, how would I know if they’re running correctly? This page is a valuable resource to check against. It has charts of these algorithms’ performance in five MuJoCo Gym environments, under conditions also described on this page. And once I have confidence they are running correctly, the same charts serve as benchmarks to see if I can improve their performance. These implementations are written for beginners to read and understand, which means they skip some of the esoteric, hard-to-understand optimizations that readers are challenged to put in themselves.
For people who want optimized implementations of these algorithms, OpenAI has them covered too. Or at least, they used to.