Chapter 2: Multi-armed Bandits
rlai.core.environments.bandit.Arm
Bandit arm.
rlai.core.EpsilonGreedyQValueAgent
Nonassociative, epsilon-greedy agent.
rlai.core.QValueAgent
Nonassociative, q-value agent.
rlai.core.environments.bandit.KArmedBandit
K-armed bandit.
rlai.utils.IncrementalSampleAverager
An incremental sample averager that runs in constant time and constant memory. Supports both decreasing step sizes
(i.e., the unweighted sample average) and constant step sizes (i.e., the exponential recency-weighted average, pp. 32-33).
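The two step-size modes described above can be illustrated with a minimal sketch. This is not the rlai implementation: the class name `SketchAverager`, its `alpha` parameter, and the `update` method are hypothetical names chosen for illustration. The sketch assumes the standard incremental rule, new average = old average + step * (value - old average), with step 1/n for the unweighted sample average and a fixed alpha for the recency-weighted average.

```python
from typing import Optional


class SketchAverager:
    """Hypothetical illustration only; not the rlai.utils.IncrementalSampleAverager API."""

    def __init__(self, initial_value: float = 0.0, alpha: Optional[float] = None):
        self.average = initial_value  # current estimate
        self.n = 0                    # number of samples seen so far
        self.alpha = alpha            # None selects the decreasing step size 1/n

    def update(self, value: float) -> float:
        self.n += 1
        # Decreasing step (1/n) gives the unweighted sample average;
        # constant step (alpha) gives an exponential recency-weighted average.
        step = self.alpha if self.alpha is not None else 1.0 / self.n
        self.average += step * (value - self.average)
        return self.average


# Unweighted sample average of 1, 2, 3, 4.
mean_averager = SketchAverager()
for x in [1.0, 2.0, 3.0, 4.0]:
    mean_averager.update(x)

# Constant step size: recent samples carry more weight than older ones.
recency_averager = SketchAverager(alpha=0.5)
for x in [1.0, 1.0]:
    recency_averager.update(x)
```

Either mode stores only the running estimate and a counter, which is why time and memory per update stay constant regardless of how many samples have been seen.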
rlai.core.UpperConfidenceBoundAgent
Nonassociative, upper-confidence-bound agent.
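As a rough sketch of upper-confidence-bound action selection, assuming the common rule of picking the action that maximizes Q(a) + c * sqrt(ln t / N(a)) and treating untried actions as maximizing. The function `ucb_select` and its parameters are hypothetical and do not reflect the rlai agent's actual interface.

```python
import math
from typing import Sequence


def ucb_select(q_values: Sequence[float], counts: Sequence[int], t: int, c: float = 2.0) -> int:
    """Hypothetical helper: return the index of the action chosen by a UCB rule.

    q_values: current value estimate per action
    counts:   number of times each action has been selected
    t:        current time step (1-based)
    c:        exploration coefficient (assumed default)
    """
    # An action that has never been tried is selected first.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    # Value estimate plus an uncertainty bonus that shrinks as the action is tried more often.
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal selection counts the bonus is identical for every action, so the choice falls back to the highest value estimate; a rarely tried action can still win on its bonus alone.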
rlai.core.PreferenceGradientAgent
Preference-gradient agent.