Notes on Reinforcement Learning Algorithm Implementations Published by OpenAI

OpenAI’s Spinning Up in Deep RL guide has copious written resources, but it also offers resources in the form of algorithm implementations in code. Each implementation is accompanied by a background section explaining how the algorithm works, summarized by a “Quick Facts” section that serves as a high-level view of how these algorithms differ from each other. As of this writing, there are six implementations. I understand the main division is between three “on-policy” algorithms and three “off-policy” algorithms. Within each division, the three algorithms are roughly sorted by age and illustrate an arc of research in the field. The best(?) on-policy algorithm here is Proximal Policy Optimization (PPO), and representing the off-policy vanguard is Soft Actor-Critic (SAC).
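To give a flavor of what distinguishes PPO, its central trick is a clipped surrogate objective: it limits how far a policy update can move by clipping the probability ratio between the new and old policies. Here is a minimal NumPy sketch of that objective; the function name and argument shapes are my own illustration, not code from Spinning Up:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO's clipped surrogate objective (to be maximized).

    logp_new, logp_old: log-probabilities of the taken actions under the
    current and behavior policies; advantages: advantage estimates.
    eps: clip range (0.2 is the commonly cited default).
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = np.exp(logp_new - logp_old)
    # Clip the ratio so a single update cannot move the policy too far.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Take the pessimistic (minimum) of the two surrogate terms.
    return np.minimum(ratio * advantages, clipped * advantages).mean()
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage; when the ratio strays outside [1 - eps, 1 + eps] in the direction the advantage favors, the clipped term caps the incentive, which is what keeps PPO updates conservative.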

These implementations are geared toward teaching these algorithms. Because priority is placed on this learning context, they omit enhancements and optimizations that would make the code harder to understand. For example, many of these implementations do not support parallelization.

This is great for its intended purpose of supplementing Spinning Up, but some people want an already-optimized implementation. For this audience, OpenAI published OpenAI Baselines a few years ago to serve as high-water marks for algorithm implementation. These baselines can serve either as starting points for other projects or as benchmarks to measure potential improvements against.

However, it appears these baselines have gone a little stale. This is not any fault of the implementations themselves, but merely due to the rapidly moving nature of this research field. The repository hosts implementations written against TensorFlow 1.x, which has since been deprecated. There’s a (partial? full?) conversion to TensorFlow 2.0 in a separate branch, but it was never merged back to the main branch for whatever reason. As of this writing, there’s an open issue asking for PyTorch implementations, prompted by OpenAI’s own proclamation that it will be standardizing on PyTorch. (This proclamation actually led to the PyTorch conversion of the Spinning Up in Deep RL examples.) However, there’s no word yet on PyTorch conversions for OpenAI Baselines, so anyone who wants implementations in PyTorch will either have to optimize the Spinning Up implementations themselves or look elsewhere.

Of course, all of this assumes reinforcement learning will actually solve the problem, which I’ve learned might not be a good assumption.
