Links to Code by Topic

Actions

Agents

Diagnostics

Environments

Feature Extractors

Rewards

States

Training and Running Agents

Value Estimation

Links to Code by Book Chapter

Chapter 1: Introduction

Chapter 2: Multi-armed Bandits

Chapter 3: Finite Markov Decision Processes

Chapter 4: Dynamic Programming

Chapter 5: Monte Carlo Methods

Chapter 6: Temporal-Difference Learning

Chapter 8: Planning and Learning with Tabular Methods

Chapter 9: On-policy Prediction with Approximation

Chapter 10: On-policy Control with Approximation

Chapter 13: Policy Gradient Methods