
RLAI

Introduction

This is an implementation of concepts and algorithms described in “Reinforcement Learning: An Introduction” (Sutton and Barto, 2018, 2nd edition). It is a work in progress, implemented with the following objectives in mind.

  1. Complete conceptual and algorithmic coverage: Implement all concepts and algorithms described in the text, plus a few extras.
  2. Minimal dependencies: All computation specific to the text is implemented here.
  3. Complete test coverage: All implementations are paired with unit tests.
  4. General-purpose design: The text provides concise pseudocode that is straightforward to implement for the examples covered; however, such implementations do not necessarily yield reusable, extensible code. The approach taken here aims to be generally applicable well beyond the text's examples.

Status

Quick Start

For single-click access to a graphical interface for RLAI, please click below:

Binder

Note that Binder notebooks are hosted for free by sponsors who donate computational infrastructure. Limitations are placed on each notebook, so don’t expect the Binder interface to support heavy workloads. See the following section for alternatives.

Installation and Use

RLAI requires swig and ffmpeg to be installed on the system. These can be installed using a package manager on your OS (e.g., Homebrew for macOS, apt for Ubuntu, etc.). If installing with Homebrew on macOS, then you might need to add an environment variable pointing to ffmpeg as follows:

echo 'export IMAGEIO_FFMPEG_EXE="/opt/homebrew/bin/ffmpeg"' >> ~/.bash_profile
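
Note that on recent macOS versions the default login shell is zsh, so ~/.zprofile may be the more appropriate file than ~/.bash_profile. For reference, the package-manager installs themselves might look like the following (a sketch; swig and ffmpeg are assumed to be the package names in your platform's repositories):

brew install swig ffmpeg        # macOS (Homebrew)
sudo apt install swig ffmpeg    # Ubuntu (apt)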

The RLAI code is distributed via PyPI. There are several ways to use the package.
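
The most direct route is a standard pip install; assuming the PyPI distribution is named rlai, the following should suffice (installing into a virtual environment is advisable to keep the package's dependencies isolated):

pip install rlai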

Development

Looking for a place to dig in? Below are a few ideas organized by area of interest.

Features

Case Studies

The gridworld and other simple environments (e.g., the gambler's problem) are used throughout the package to develop, implement, and test algorithmic concepts. Sutton and Barto do a nice job of explaining how reinforcement learning works for these environments. Below is a list of environments that are not covered in as much detail (e.g., the mountain car) or are not covered at all (e.g., Robocode). Training agents for these environments is more difficult, and they are instructive for understanding how agents are parameterized and rewarded.

Gymnasium

Gymnasium is a collection of environments that range from traditional control to advanced robotics. Case studies have been developed for the following environments, which are ordered roughly by increasing complexity:

MuJoCo

RLAI works with MuJoCo either via Gymnasium (described above) or directly via the MuJoCo-provided Python bindings. On macOS, see here for how to fix OpenGL errors.
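
If taking the direct route, the bindings and the Gymnasium MuJoCo environments are both available from PyPI. The following commands are a sketch, assuming the standard package name mujoco and the gymnasium MuJoCo extra:

pip install mujoco                # direct MuJoCo Python bindings
pip install "gymnasium[mujoco]"   # MuJoCo environments via Gymnasium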

Robocode

Robocode is a simulation-based robotic combat programming game with a dynamically rich environment, multi-agent teaming, and a large user community. Read more here.

Figures from the Textbook

A list of figures can be found here. Most of these are reproductions of those shown in the Sutton and Barto text; however, even the reproductions typically provide detail not shown in the text.

Links to Code

See here.

Incrementing and Tagging Versions with Poetry

  1. Begin the next prerelease number within the current prerelease phase (e.g., 0.1.0a0 → 0.1.0a1):
    OLD_VERSION=$(poetry version --short)
    poetry version prerelease
    VERSION=$(poetry version --short)
    git commit -a -m "Next prerelease number:  ${OLD_VERSION} → ${VERSION}"
    git push
    
  2. Begin the next prerelease phase (e.g., 0.1.0a1 → 0.1.0b0):
    OLD_VERSION=$(poetry version --short)
    poetry version prerelease --next-phase
    VERSION=$(poetry version --short)
    git commit -a -m "Next prerelease phase:  ${OLD_VERSION} → ${VERSION}"
    git push
    

    The phases progress as alpha (a), beta (b), and release candidate (rc), each time resetting to a prerelease number of 0. After rc, the prerelease suffix (e.g., rc3) is stripped, leaving the major.minor.patch version.
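
    As a hypothetical illustration of this progression (version numbers invented for the example):
    poetry version prerelease --next-phase    # 0.1.0a1 → 0.1.0b0
    poetry version prerelease --next-phase    # 0.1.0b0 → 0.1.0rc0
    poetry version prerelease --next-phase    # 0.1.0rc0 → 0.1.0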

  3. Release the next minor version (e.g., 0.1.0b1 → 0.1.0):
    OLD_VERSION=$(poetry version --short)
    poetry version minor
    VERSION=$(poetry version --short)
    git commit -a -m "New minor release:  ${OLD_VERSION} → ${VERSION}"
    git push
    
  4. Release the next major version (e.g., 0.1.0a0 → 1.0.0):
    OLD_VERSION=$(poetry version --short)
    poetry version major
    VERSION=$(poetry version --short)
    git commit -a -m "New major release:  ${OLD_VERSION} → ${VERSION}"
    git push
    
  5. Tag the current version:
    VERSION=$(poetry version --short)
    git tag -a -m "rlai v${VERSION}" "v${VERSION}"
    git push --follow-tags
    
  6. Begin the next minor prerelease (e.g., 0.1.0 → 0.2.0a0):
    OLD_VERSION=$(poetry version --short)
    poetry version preminor
    VERSION=$(poetry version --short)
    git commit -a -m "Next minor prerelease:  ${OLD_VERSION} → ${VERSION}"
    git push