RLAI
- RLAI
- Introduction
- Quick Start
- Installation, Use, and Development
- Features
- Case Studies
- Figures from the Textbook
- Links to Code by Topic
- Links to Code by Book Chapter
- Chapter 1: Introduction
- Chapter 2: Multi-armed Bandits
- Chapter 3: Finite Markov Decision Processes
- Chapter 4: Dynamic Programming
- Chapter 5: Monte Carlo Methods
- Chapter 6: Temporal-Difference Learning
- Chapter 8: Planning and Learning with Tabular Methods
- Chapter 9: On-policy Prediction with Approximation
- Chapter 10: On-policy Control with Approximation
- Chapter 13: Policy Gradient Methods
Introduction
This is an implementation of concepts and algorithms described in “Reinforcement Learning: An Introduction” (Sutton and Barto, 2018, 2nd edition). It is a work in progress, implemented with the following objectives in mind.
- Complete conceptual and algorithmic coverage: Implement all concepts and algorithms described in the text, plus a few extras.
- Minimal dependencies: All computation specific to the text is implemented here.
- Complete test coverage: All implementations are paired with unit tests.
- General-purpose design: The text provides concise pseudocode that is not difficult to implement for the examples covered; however, such implementations do not necessarily lead to reusable, extensible code that applies beyond those examples. The approach taken here is intended to be generally applicable well beyond the text.
Quick Start
A graphical interface for RLAI can be launched with a single click via Binder.
Note that Binder notebooks are hosted for free by sponsors who donate computational infrastructure. Limitations are placed on each notebook, so don’t expect the Binder interface to support heavy workloads. See the following section for alternatives.
Installation, Use, and Development
The RLAI code is distributed via PyPI and can be installed with `pip install rlai`.
There are several ways to use the package.
- JupyterLab notebook: Most of the RLAI functionality is exposed via the companion JupyterLab notebook. See the JupyterLab guide for more information.
- Package dependency: See the example repository for how a project can be structured to consume the RLAI package functionality within source code.
- Command-line interface: Using RLAI from the command-line interface (CLI) is demonstrated in the case studies below and is also explored in the CLI guide.
- Raspberry Pi: See here for how to use RLAI on a Raspberry Pi system.
Looking for a place to dig in? Below are a few ideas organized by area of interest.
- Explore new OpenAI Gym environments: OpenAI Gym provides a wide range of interesting environments, and experimenting with them can be as simple as modifying an existing training command (e.g., the one for the inverted pendulum) and replacing the `--gym-id` value with something else. Other changes might be needed depending on the environment, but Gym is particularly convenient. See the first sketch following this list.
- Incorporate new statistical learning methods: The RLAI SKLearnSGD module demonstrates how to use methods in scikit-learn (in this case, stochastic gradient descent regression) to approximate state-action value functions. This is just one approach, and it would be interesting to compare its time, memory, and reward performance with a nonparametric approach like KNN regression. See the second sketch following this list.
- Feel free to ask questions, submit issues, and submit pull requests.
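As a rough illustration of the first idea, the sketch below (not RLAI code) uses the `gym` package directly to inspect a few environments before substituting an ID into a training command's `--gym-id` argument. The environment IDs shown are standard Gym IDs chosen for illustration, not specific RLAI case studies.

```python
# Minimal sketch: inspect Gym environments before pointing an RLAI training
# command at one of them via --gym-id. Assumes the `gym` package is installed.
import gym

for env_id in ["CartPole-v1", "Acrobot-v1", "MountainCarContinuous-v0"]:
    env = gym.make(env_id)
    # The observation and action spaces determine how an agent must be
    # parameterized (e.g., discrete vs. continuous control).
    print(env_id, env.observation_space, env.action_space)
    env.close()
```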
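For the second idea, the following is a minimal, self-contained sketch of such a comparison, assuming scikit-learn is installed. The feature matrix and targets are synthetic placeholders rather than RLAI output, and the hyperparameters are illustrative only; the point is simply that both regressors expose the same predict interface while differing in how they are updated.

```python
# Illustrative sketch only: RLAI's SKLearnSGD module uses scikit-learn's
# stochastic gradient descent regression; a nonparametric alternative such as
# KNeighborsRegressor could be compared on the same (feature, return) pairs.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(12345)
X = rng.normal(size=(1000, 4))  # stand-in for state-action feature vectors
y = X @ np.array([0.5, -1.0, 2.0, 0.0]) + rng.normal(scale=0.1, size=1000)  # stand-in returns

sgd = SGDRegressor(learning_rate="constant", eta0=0.01)
sgd.partial_fit(X, y)  # supports incremental updates, as in online RL

knn = KNeighborsRegressor(n_neighbors=10)
knn.fit(X, y)  # batch fit; no incremental update available

print("SGD MSE:", np.mean((sgd.predict(X) - y) ** 2))
print("KNN MSE:", np.mean((knn.predict(X) - y) ** 2))
```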
Features
- Diagnostic and interpretation tools: Diagnostic and interpretation tools become critical as the environment and agent increase in complexity (e.g., from tabular methods in small, discrete-space gridworlds to value function approximation methods in large, continuous-space control problems). Such tools can be found here.
Case Studies
The gridworld and other simple environments (e.g., gambler’s problem) are used throughout the package to develop, implement, and test algorithmic concepts. Sutton and Barto do a nice job of explaining how reinforcement learning works for these environments. Below is a list of environments that are not covered in as much detail (e.g., the mountain car) or are not covered at all (e.g., Robocode). They are more difficult to train agents for and are instructive for understanding how agents are parameterized and rewarded.
OpenAI Gym
OpenAI Gym is a collection of environments that range from traditional control to advanced robotics. Case studies have been developed for the following OpenAI Gym environments, which are ordered roughly by increasing complexity:
- Inverted Pendulum
- Acrobot
- Mountain Car
- Mountain Car with Continuous Control
- Lunar Lander with Continuous Control
- MuJoCo Swimming Worm with Continuous Control
- A follow-up using process-level parallelization for faster, better results.
Robocode
Robocode is a simulation-based robotic combat programming game with a dynamically rich environment, multi-agent teaming, and a large user community. Read more here.
Figures from the Textbook
A list of figures can be found here. Most of these are reproductions of those shown in the Sutton and Barto text; however, even the reproductions typically provide detail not shown in the text.