Chapter 2: Multi-armed Bandits

rlai.core.environments.bandit.Arm

Bandit arm.

rlai.core.EpsilonGreedyQValueAgent

Nonassociative, epsilon-greedy agent.
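
For orientation, epsilon-greedy action selection can be sketched as follows. This is a minimal, standalone illustration and not the rlai implementation; the function name `epsilon_greedy_action` and its parameters are hypothetical.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return an arm index (hypothetical helper, not part of rlai).

    With probability epsilon, explore by picking a uniformly random arm;
    otherwise exploit by picking an arm with the highest estimated value.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))

    best = max(q_values)
    # Break ties uniformly at random among the greedy arms.
    greedy_arms = [arm for arm, q in enumerate(q_values) if q == best]
    return rng.choice(greedy_arms)
```

With `epsilon=0.0` the selection is purely greedy; with `epsilon=1.0` it is uniformly random.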

rlai.core.q_value.QValueAgent

Nonassociative, q-value agent.

rlai.core.environments.bandit.KArmedBandit

K-armed bandit.

rlai.utils.IncrementalSampleAverager

An incremental sample averager that runs in constant time and constant memory. Supports both decreasing step sizes (i.e., the unweighted sample average) and constant step sizes (i.e., the exponential recency-weighted average, pp. 32-33).
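
The two step-size modes share one update rule, new average = old average + step size × (value − old average). The sketch below illustrates that rule under stated assumptions; the class `SampleAverager` and its `alpha` parameter are hypothetical names chosen for this example and are not the rlai API.

```python
class SampleAverager:
    """Illustrative incremental averager (hypothetical, not rlai's class)."""

    def __init__(self, alpha=None):
        # alpha=None selects the decreasing step size 1/n (unweighted average);
        # a float selects a constant step size (recency-weighted average).
        self.alpha = alpha
        self.average = 0.0
        self.n = 0

    def update(self, value):
        """Fold one new value into the running average in O(1) time and memory."""
        self.n += 1
        step = self.alpha if self.alpha is not None else 1.0 / self.n
        self.average += step * (value - self.average)
        return self.average
```

With the decreasing step size, every sample contributes equally; with a constant step size, older samples are discounted geometrically, which suits nonstationary rewards.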

rlai.core.UpperConfidenceBoundAgent

Nonassociative, upper-confidence-bound agent.
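
As a rough illustration of upper-confidence-bound selection, the sketch below scores each arm by its value estimate plus an exploration bonus that shrinks as the arm is pulled more often. The function `ucb_action` and its parameters (`counts`, `t`, `c`) are hypothetical and do not reflect the rlai signature.

```python
import math


def ucb_action(q_values, counts, t, c=2.0):
    """Return an arm index using a UCB rule (hypothetical helper, not rlai).

    q_values: current value estimate per arm
    counts:   number of times each arm has been pulled
    t:        current time step (>= 1)
    c:        exploration coefficient (assumed default)
    """
    # An arm that has never been pulled is treated as maximally uncertain.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```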

rlai.core.PreferenceGradientAgent

Preference-gradient agent.
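
A preference-gradient method keeps a numerical preference per arm, selects arms through a softmax over those preferences, and nudges preferences according to how the reward compares with a baseline. The sketch below is one minimal reading of that update; `softmax`, `update_preferences`, and all parameter names are hypothetical and are not taken from rlai.

```python
import math


def softmax(preferences):
    """Convert preferences into selection probabilities."""
    shift = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(h - shift) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def update_preferences(preferences, action, reward, baseline, alpha):
    """Return updated preferences after taking `action` and observing `reward`.

    The chosen arm's preference moves by alpha * (reward - baseline) * (1 - p);
    every other arm moves by -alpha * (reward - baseline) * p.
    """
    probs = softmax(preferences)
    return [
        h + alpha * (reward - baseline) * ((1.0 if arm == action else 0.0) - p)
        for arm, (h, p) in enumerate(zip(preferences, probs))
    ]
```

When the reward exceeds the baseline, the chosen arm becomes more likely and the others less likely; a reward below the baseline has the opposite effect.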