bayesianbandits

A Python library for Bayesian Multi-Armed Bandits.

This library implements a variety of multi-armed bandit algorithms, including epsilon-greedy, Thompson sampling, and upper confidence bound. It also handles a number of complications that commonly arise in practice, including contextual bandits, delayed reward, and restless bandits.

This library is designed to be easy to use and extend. It is built on top of scikit-learn and uses scikit-learn-style estimators to model the arms. Any estimator that implements the partial_fit and sample methods can serve as the learner for an arm in a bandit; restless bandits additionally require a decay method.
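As an illustrative sketch, the duck-typed contract described above can be expressed as a typing.Protocol. Note that this Protocol is not a class exported by the library; it only restates the required interface:

```python
from typing import Protocol

import numpy as np
from numpy.typing import NDArray


class BayesianLearner(Protocol):
    """Illustrative duck-typed interface for an arm's learner."""

    def partial_fit(
        self, X: NDArray[np.float64], y: NDArray[np.float64]
    ) -> "BayesianLearner":
        """Incrementally update the posterior with a new batch of observations."""
        ...

    def sample(
        self, X: NDArray[np.float64], size: int = 1
    ) -> NDArray[np.float64]:
        """Draw samples from the posterior predictive distribution."""
        ...

    def decay(self, X: NDArray[np.float64]) -> None:
        """Increase the variance of the prior; needed only for restless bandits."""
        ...
```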

The Agent API found in bayesianbandits.api is reasonably stable and is currently used in production.

Agent API

The Agent API is the most ergonomic way to use this library in production. It is designed to maximize your IDE’s ability to autocomplete and type-check your code. Additionally, it is designed to make it easy to modify the arms and the policies of your bandit as your needs change.

The Agent API uses a slightly different interface for choice policies than the older Bandit API, but the new policy objects and the old policy functions share the same underlying code, and both remain available for backwards compatibility. A usage sketch follows the summaries below.

Agent(arms, policy[, random_seed])

Agent for a non-contextual multi-armed bandit problem.

ContextualAgent(arms, policy[, random_seed])

Agent for a contextual multi-armed bandit problem.

EpsilonGreedy([epsilon, samples])

Policy object for epsilon-greedy.

ThompsonSampling()

Policy object for Thompson sampling.

UpperConfidenceBound([alpha, samples])

Policy object for upper confidence bound.
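As a usage sketch, modeled on the library's README example (the arm tokens and context features below are made up for illustration):

```python
import numpy as np

from bayesianbandits import Arm, NormalInverseGammaRegressor
from bayesianbandits.api import ContextualAgent, ThompsonSampling

# Two arms, identified by their action tokens, each with its own Bayesian learner.
arms = [
    Arm(1, learner=NormalInverseGammaRegressor()),
    Arm(2, learner=NormalInverseGammaRegressor()),
]
agent = ContextualAgent(arms, ThompsonSampling(), random_seed=0)

context = np.array([[1.0, 0.0, 0.0]])  # one made-up feature row

tokens = agent.pull(context)             # choose an arm for this context
agent.update(context, np.array([1.0]))   # learn from the observed reward
```

Because arms are plain objects rather than class attributes, they can be added to or removed from the agent as your needs change.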

Bandit and Arm Classes

The Arm class is the base class for all arms in a bandit. Its constructor takes an action_token, which identifies the action taken when the arm is pulled, an optional reward_function, which computes the reward from the outcome of that action, and an optional learner, which models the arm's reward distribution. A construction sketch follows the summaries below.

Bandit([rng, cache])

Base class for bandits.

Arm(action_token[, reward_function, learner])

Arm of a bandit.
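For illustration, an arm might be constructed like this (the action token is hypothetical, and the identity reward_function simply treats the outcome as the reward):

```python
from bayesianbandits import Arm, NormalInverseGammaRegressor

arm = Arm(
    "show_banner_a",                          # hypothetical action token
    reward_function=lambda outcome: outcome,  # identity: outcome is the reward
    learner=NormalInverseGammaRegressor(),
)
```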

Bandit Decorators

These class decorators can be used to specialize Bandit subclasses for particular problems; a sketch of their use follows the list.

contextual(cls)

Decorator for making a bandit contextual.

restless(cls)

Decorator for restless bandits.
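As a rough sketch, assuming the class-keyword subclassing pattern of the Bandit base class (exact details may differ between versions), a contextual bandit could be declared like this:

```python
import numpy as np

from bayesianbandits import (
    Arm,
    Bandit,
    DirichletClassifier,
    contextual,
    epsilon_greedy,
)

clf = DirichletClassifier({"yes": 1.0, "no": 1.0})


@contextual
class MyContextualBandit(Bandit, learner=clf, policy=epsilon_greedy(epsilon=0.1)):
    # Arms are declared as class attributes; the tokens here are hypothetical.
    # The reward is the sampled probability of the "yes" outcome.
    arm1 = Arm("action_1", reward_function=lambda x: np.take(x, 0, axis=-1))
    arm2 = Arm("action_2", reward_function=lambda x: np.take(x, 0, axis=-1))
```

The restless decorator is applied in the same way, but, as noted above, it requires learners that implement the decay method.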

Policies

These functions create policy functions for the Bandit API. The resulting policy should be passed as the policy argument when defining a Bandit subclass, as in the decorator sketch above; a brief example follows the list.

epsilon_greedy([epsilon, samples])

Creates an epsilon-greedy choice algorithm.

thompson_sampling()

Creates a Thompson sampling choice algorithm.

upper_confidence_bound([alpha, samples])

Creates a UCB choice algorithm.
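For example (the epsilon value here is illustrative):

```python
from bayesianbandits import epsilon_greedy, thompson_sampling, upper_confidence_bound

policy = epsilon_greedy(epsilon=0.1)    # explore uniformly 10% of the time
# policy = thompson_sampling()          # probability matching via posterior draws
# policy = upper_confidence_bound()     # optimism under uncertainty
```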

Estimators

These estimators are the underlying models for the arms in a bandit. They can be passed to the learner argument of an Arm, or as the learner argument when defining a Bandit subclass. Each of these Bayesian estimators can be converted into a recursive estimator by passing a learning_rate argument less than 1 to its constructor. Each implements a decay method that uses the learning_rate to increase the variance of the prior; this is a type of state-space model that is useful for restless bandits. A sketch follows the list.

DirichletClassifier(alphas, *[, ...])

Intercept-only Dirichlet classifier.

GammaRegressor(alpha, beta, *[, ...])

Intercept-only Gamma regression model.

NormalRegressor(alpha, beta, *[, ...])

A Bayesian linear regression model that assumes a Gaussian noise distribution.

NormalInverseGammaRegressor(*[, mu, lam, a, ...])

Bayesian linear regression with unknown variance.
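As an illustrative sketch of the recursive behavior (the data below are made up, and the method calls follow the partial_fit/sample/decay interface described above):

```python
import numpy as np

from bayesianbandits import NormalInverseGammaRegressor

# A learning_rate below 1 makes the estimator gradually "forget" old data.
est = NormalInverseGammaRegressor(learning_rate=0.99)

X = np.array([[1.0], [1.0]])
y = np.array([0.5, 0.7])

est.partial_fit(X, y)    # update the posterior with new observations
draws = est.sample(X)    # draw from the posterior predictive distribution
est.decay(X)             # inflate the prior variance (restless bandits)
```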

Exceptions

These are custom exceptions raised by the bandit classes.

DelayedRewardException

Exception raised when the user does not handle delayed reward bandits correctly.

DelayedRewardWarning

Warning raised when the user does not handle delayed reward bandits correctly.