Notes on “Introduction to RL” by OpenAI

With some confidence that I could practice simple reinforcement learning algorithms, I moved on to what I consider the meat of OpenAI’s Spinning Up in Deep RL guide: the section titled Introduction to RL. I thought it was pretty good stuff.

Part 1: Key Concepts in RL

Since I had been marginally interested in the field, the first section, What Can RL Do?, was mostly review for me. More than anything else, it confirmed that I was reading the right page. The next section, Key Concepts and Terminology, quickly caught up to and then surpassed what I already knew. I was fine with the verbal descriptions and the code snippets, but once the author started phrasing things in terms of equations, it became more abstract than I can easily follow. I’m a code person, not an equations person. Given my difficulty comprehending the equations, I was a little bemused by the third section, (Optional) Formalism, as things were already more formalized than my usual style.
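For a code-first reader, one of the equations in that section, the discounted return of a trajectory, can be written as a short loop. This is a minimal sketch, not code from the guide: the function name `discounted_return` and the sample rewards are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is scaled by gamma**t."""
    total = 0.0
    # Walk backwards so each step folds in the already-discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards collected over a three-step episode.
episode_rewards = [1.0, 1.0, 1.0]

undiscounted = discounted_return(episode_rewards, gamma=1.0)
discounted = discounted_return(episode_rewards, gamma=0.5)
print(undiscounted, discounted)
```

With a discount factor of 1.0 this is just the plain sum of rewards; with a smaller factor, later rewards count for less.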

Part 2: Kinds of RL Algorithms

Given that I didn’t comprehend everything in part 1, I was a little scared at what I would find in part 2. Gold. I found pure gold. Here I found a quick survey of major characteristics of various reinforcement learning approaches, explaining those terms in ways I (mostly) understood. Explanations like what “Model-Free” vs. “Model-Based” means. What “on-policy” vs. “off-policy” means. I had seen a lot of these terms thrown around in things I read before, but I never found a place laying them down against one another until I found this section. This section alone was worth the effort of digging into this guide and will help me understand other things. For example, a lot of these terms were in the documentation for Unity ML-Agents. I didn’t understand what they meant at the time, but now I hope to understand more when I review them.
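One common textbook way to see the on-policy vs. off-policy distinction in code is to compare the update targets of SARSA and Q-learning. This is a hedged sketch rather than anything taken from the guide: the table `q`, the state and action names, and both function names are assumptions made up for illustration.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy: bootstrap from the action the current policy actually took next."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy: bootstrap from the best next action, whatever was actually taken."""
    return reward + gamma * max(q[next_state].values())


# Hypothetical action-value table for a single state with two actions.
q = {"s1": {"left": 1.0, "right": 3.0}}

# Suppose the agent explored and took "left" in s1, even though "right" looks better.
on_policy = sarsa_target(q, reward=0.0, next_state="s1", next_action="left", gamma=0.5)
off_policy = q_learning_target(q, reward=0.0, next_state="s1", gamma=0.5)
print(on_policy, off_policy)
```

The on-policy target depends on what the behaving policy did, while the off-policy target does not, which is why off-policy methods can learn from data gathered by an older or different policy.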

Part 3: Intro to Policy Optimization

Cheered by how informative I found part 2, I proceeded to part 3 and was promptly brought back down to size by the equations. The good thing is that this section had both equations and a code example implementation, and they were explained more or less in parallel. Someone like me can read the code and try to understand the equations. I expect there are people out there who are the reverse: more comfortable with the equations, and appreciative of seeing what they look like in code. That said, I can’t claim I completely understand the code, either. Some of the mathematical complexities represented in the equations were not in the sample source code: they were handed off to implementations in the PyTorch library.
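To illustrate how much of the math gets handed off, here is a minimal sketch in the spirit of a simple policy gradient loss. It is not the guide's code: the network sizes, the random observations, and the all-ones weights standing in for episode returns are assumptions for illustration. The log-probability and the gradient never appear as hand-written formulas; `Categorical.log_prob` and `backward()` take care of them.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

torch.manual_seed(0)

# Made-up sizes: 4 observation features, 2 discrete actions, 5 sampled steps.
obs_dim, n_actions, batch = 4, 2, 5

# A small network that maps an observation to one logit per action.
policy_net = nn.Sequential(
    nn.Linear(obs_dim, 32),
    nn.Tanh(),
    nn.Linear(32, n_actions),
)

obs = torch.randn(batch, obs_dim)           # stand-in observations
dist = Categorical(logits=policy_net(obs))  # action distribution per observation
actions = dist.sample()                     # actions drawn from the policy
weights = torch.ones(batch)                 # stand-in for returns (assumption)

# PyTorch computes log pi(a|s); we only weight it and average.
log_probs = dist.log_prob(actions)
loss = -(log_probs * weights).mean()

# Autograd derives the gradient with respect to every network parameter.
loss.backward()
```

After `backward()`, each parameter of `policy_net` carries a `.grad` tensor that an optimizer such as `torch.optim.Adam` could use for an update step.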

I foresee a lot of studying PyTorch documentation before I can get proficient at writing something new on my own. But just from reading this Introduction to RL section, I have the basic understanding to navigate PyTorch API documentation and maybe the skill to apply existing implementations written by somebody else. And as expected of an introduction, there are tons of links to more resources.

New Screwdriver

My Project Diary of Coding, Making, and Tinkering
