rlai | This is a Python implementation of concepts and algorithms described in “Reinforcement Learning: An Introduction” (Sutton and Barto, 2018, 2nd edition).

Chapter 8: Planning and Learning with Tabular Methods

rlai.core.environments.mdp.MdpPlanningEnvironment

An MDP planning environment, used to generate simulated experience from a model of the MDP that is learned through direct experience with the actual environment.
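The idea of planning from a learned model can be illustrated with a minimal Dyna-Q-style sketch. This is not the rlai implementation: the function `plan_with_model`, the dictionary-based deterministic model, and the two-step chain are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def plan_with_model(q, model, actions, n_steps, alpha=0.5, gamma=0.9, seed=0):
    """Apply n_steps one-step Q-learning updates using simulated experience.

    q:     defaultdict(float) mapping (state, action) -> estimated value
    model: dict mapping (state, action) -> (reward, next_state), assumed to have
           been filled in from real experience (hypothetical structure)
    """
    rng = random.Random(seed)
    observed = sorted(model)  # only previously experienced pairs can be simulated
    for _ in range(n_steps):
        state, action = rng.choice(observed)
        reward, next_state = model[(state, action)]  # simulated experience
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


# Hypothetical chain: s0 -> s1 -> end, with reward 1 on the final transition.
model = {("s0", "go"): (0.0, "s1"), ("s1", "go"): (1.0, "end")}
q = plan_with_model(defaultdict(float), model, actions=["go"], n_steps=200)
```

After enough planning steps the value of `("s1", "go")` approaches 1 and the value of `("s0", "go")` approaches `gamma`, even though no further real experience was gathered.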

rlai.core.environments.mdp.EnvironmentModel

An environment model.

rlai.core.environments.mdp.PrioritizedSweepingMdpPlanningEnvironment

State-action transitions are prioritized by how much a learning update changes their values, and the transitions with the highest priority are explored during planning.
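A minimal sketch of prioritized sweeping over a deterministic model follows. It is an illustration under stated assumptions, not the rlai class: the function name, the dictionary model, the threshold `theta`, and the three-step chain are all hypothetical.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(q, model, actions, start, n_updates,
                         alpha=1.0, gamma=0.9, theta=1e-4):
    """Run up to n_updates planning updates, highest-priority pair first.

    model maps (state, action) -> (reward, next_state) (hypothetical structure).
    Returns the number of updates actually performed.
    """
    # predecessors[s] holds the (state, action) pairs the model predicts lead to s.
    predecessors = defaultdict(set)
    for (s, a), (_, s_next) in model.items():
        predecessors[s_next].add((s, a))

    def td_error(s, a):
        r, s_next = model[(s, a)]
        return r + gamma * max(q[(s_next, b)] for b in actions) - q[(s, a)]

    queue = []  # heapq is a min-heap, so priorities are stored negated
    if abs(td_error(*start)) > theta:
        heapq.heappush(queue, (-abs(td_error(*start)), start))

    updates = 0
    while queue and updates < n_updates:
        _, (s, a) = heapq.heappop(queue)
        # Stale duplicate entries are harmless: their update is simply near zero.
        q[(s, a)] += alpha * td_error(s, a)
        updates += 1
        # The value of s changed, so pairs leading into s may now need updating.
        for s_prev, a_prev in predecessors[s]:
            p = abs(td_error(s_prev, a_prev))
            if p > theta:
                heapq.heappush(queue, (-p, (s_prev, a_prev)))
    return updates


# Hypothetical chain: s0 -> s1 -> s2 -> end, reward 1 on the final transition.
model = {("s0", "go"): (0.0, "s1"), ("s1", "go"): (0.0, "s2"), ("s2", "go"): (1.0, "end")}
q = defaultdict(float)
n = prioritized_sweeping(q, model, ["go"], start=("s2", "go"), n_updates=10)
```

Starting from the rewarding transition, the value change is swept backward through predecessor pairs, so the whole chain is updated in three steps rather than by many uniformly random samples.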

rlai.core.environments.mdp.StochasticEnvironmentModel

A stochastic environment model.
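One common way to build a stochastic model is to count the outcomes observed for each state-action pair and sample from those counts. The sketch below assumes that approach; the class `CountBasedModel` and its methods are hypothetical and do not describe the interface of `StochasticEnvironmentModel`.

```python
import random
from collections import Counter, defaultdict


class CountBasedModel:
    """Hypothetical model: remembers how often each (reward, next_state)
    outcome followed each (state, action) pair."""

    def __init__(self, seed=0):
        self.outcome_counts = defaultdict(Counter)
        self.rng = random.Random(seed)

    def update(self, state, action, reward, next_state):
        """Record one real transition."""
        self.outcome_counts[(state, action)][(reward, next_state)] += 1

    def probability(self, state, action, reward, next_state):
        """Empirical probability of an outcome (0.0 for unseen pairs)."""
        counts = self.outcome_counts.get((state, action), Counter())
        total = sum(counts.values())
        return counts[(reward, next_state)] / total if total else 0.0

    def sample(self, state, action):
        """Draw a (reward, next_state) outcome in proportion to its count."""
        counts = self.outcome_counts.get((state, action))
        if not counts:
            raise KeyError(f"No experience recorded for {(state, action)}")
        outcomes = list(counts)
        weights = [counts[o] for o in outcomes]
        return self.rng.choices(outcomes, weights=weights)[0]


model = CountBasedModel()
for _ in range(3):
    model.update("s0", "go", 0.0, "s1")
model.update("s0", "go", 1.0, "s2")
p = model.probability("s0", "go", 0.0, "s1")  # 3 of 4 observations
outcome = model.sample("s0", "go")
```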

rlai.core.environments.mdp.TrajectorySamplingMdpPlanningEnvironment

State-action transitions are selected by the agent according to its policy, and the selected transitions are explored during planning.
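Trajectory sampling can be sketched as simulating an episode from the model while following the current policy and updating only the pairs that the simulated trajectory visits. The function below is an assumption-laden illustration (epsilon-greedy policy, deterministic dictionary model, hypothetical names), not the rlai implementation.

```python
import random
from collections import defaultdict


def plan_along_trajectory(q, model, actions, start_state, max_steps,
                          alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Simulate one trajectory from the model under an epsilon-greedy policy,
    applying a one-step Q-learning update at each visited (state, action).

    model maps (state, action) -> (reward, next_state) (hypothetical structure).
    Returns the list of visited (state, action) pairs.
    """
    rng = random.Random(seed)
    state = start_state
    visited = []
    for _ in range(max_steps):
        # Only actions the model has experience for can be simulated.
        known = [a for a in actions if (state, a) in model]
        if not known:  # terminal or never-visited state: trajectory ends
            break
        if rng.random() < epsilon:
            action = rng.choice(known)
        else:
            action = max(known, key=lambda a: q[(state, a)])
        reward, next_state = model[(state, action)]
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        visited.append((state, action))
        state = next_state
    return visited


# Hypothetical chain: s0 -> s1 -> end, reward 1 on the final transition.
model = {("s0", "go"): (0.0, "s1"), ("s1", "go"): (1.0, "end")}
q = defaultdict(float)
visited = plan_along_trajectory(q, model, ["go"], start_state="s0", max_steps=10)
```

Compared with sampling pairs uniformly, this concentrates planning updates on the states the current policy is likely to reach.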