


This is an implementation of concepts and algorithms described in “Reinforcement Learning: An Introduction” (Sutton and Barto, 2018, 2nd edition). It is a work in progress, implemented with the following objectives in mind.

  1. Complete conceptual and algorithmic coverage: Implement all concepts and algorithms described in the text, plus some.
  2. Minimal dependencies: All computation specific to the text is implemented here.
  3. Complete test coverage: All implementations are paired with unit tests.
  4. General-purpose design: The text provides concise pseudocode that is not difficult to implement for the examples covered; however, such implementations do not necessarily lead to reusable and extensible code that is generally applicable beyond such examples. The approach taken here should be generally applicable well beyond the text.

Quick Start

For single-click access to a graphical interface for RLAI, please click below:


Note that Binder notebooks are hosted for free by sponsors who donate computational infrastructure. Limitations are placed on each notebook, so don’t expect the Binder interface to support heavy workloads. See the following section for alternatives.

Installation, Use, and Development

The RLAI code is distributed via PyPI and can be installed with pip install rlai. There are several ways to use the package.
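After running pip install rlai, one way to confirm that the package is visible to the active Python environment is to query its installed distribution metadata. The snippet below is a minimal sketch that uses only the standard library (Python 3.8 or later); the helper name installed_version is hypothetical and is not part of RLAI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of the RLAI package.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


rlai_version = installed_version("rlai")
if rlai_version is None:
    print("rlai is not installed; run: pip install rlai")
else:
    print(f"rlai {rlai_version} is installed")
```

Checking the metadata rather than importing the package keeps the check cheap and avoids loading any of the package's own dependencies.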

Looking for a place to dig in? Below are a few ideas organized by area of interest.


Case Studies

The gridworld and other simple environments (e.g., gambler’s problem) are used throughout the package to develop, implement, and test algorithmic concepts. Sutton and Barto do a nice job of explaining how reinforcement learning works for these environments. Below is a list of environments that are not covered in as much detail (e.g., the mountain car) or are not covered at all (e.g., Robocode). They are more difficult to train agents for and are instructive for understanding how agents are parameterized and rewarded.
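To give a sense of how small these simple environments are, here is a standalone sketch of value iteration for the gambler's problem (Example 4.3 in the text). It does not use the RLAI API. The function name and parameters are assumptions made for illustration, and the terminal win is encoded by fixing the value of the goal state at 1.

```python
from typing import List


def gambler_value_iteration(
    p_heads: float = 0.4,
    goal: int = 100,
    theta: float = 1e-9,
) -> List[float]:
    """Estimate the probability of reaching `goal` from each capital level.

    Illustrative sketch only; names and defaults are assumptions, not RLAI code.
    """
    # v[s] is the estimated probability of winning from capital s.
    v = [0.0] * (goal + 1)
    v[goal] = 1.0  # Reaching the goal is the only rewarded outcome.

    while True:
        delta = 0.0
        for s in range(1, goal):
            # A stake may not exceed current capital or overshoot the goal.
            best = max(
                p_heads * v[s + stake] + (1.0 - p_heads) * v[s - stake]
                for stake in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # In-place (Gauss-Seidel style) update.
        if delta < theta:
            return v


values = gambler_value_iteration()
print(f"P(win | capital=25) = {values[25]:.4f}")
print(f"P(win | capital=50) = {values[50]:.4f}")
print(f"P(win | capital=75) = {values[75]:.4f}")
```

With a heads probability below 0.5, staking everything at a capital of 50 wins with probability 0.4, which is a convenient sanity check on the result. A greedy policy can be read off by taking, for each capital level, a stake that attains the maximum in the update.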

OpenAI Gym

OpenAI Gym is a collection of environments that range from traditional control to advanced robotics. Case studies have been developed for the following OpenAI Gym environments, which are ordered roughly by increasing complexity:


Robocode

Robocode is a simulation-based robotic combat programming game with a dynamically rich environment, multi-agent teaming, and a large user community. Read more here.

Figures from the Textbook

A list of figures can be found here. Most of these are reproductions of those shown in the Sutton and Barto text; however, even the reproductions typically provide detail not shown in the text.

Links to Code by Topic





Feature Extractors



Training and Running Agents

Value Estimation

Links to Code by Book Chapter

Chapter 1: Introduction

Chapter 2: Multi-armed Bandits

Chapter 3: Finite Markov Decision Processes

Chapter 4: Dynamic Programming

Chapter 5: Monte Carlo Methods

Chapter 6: Temporal-Difference Learning

Chapter 8: Planning and Learning with Tabular Methods

Chapter 9: On-policy Prediction with Approximation

Chapter 10: On-policy Control with Approximation

Chapter 13: Policy Gradient Methods