bayesianbandits.LipschitzContextualAgent
- class bayesianbandits.LipschitzContextualAgent(arms: Sequence[Arm[Any, TokenType]], policy: PolicyProtocol[Any, TokenType], arm_featurizer: ArmFeaturizer[TokenType], learner: Learner[Any], batch_reward_function: Callable[[ndarray[tuple[int, ...], dtype[float64]], List[Any]], ndarray[tuple[int, ...], dtype[float64]]] | Callable[[ndarray[tuple[int, ...], dtype[float64]], List[Any], Sized], ndarray[tuple[int, ...], dtype[float64]]] | None = None, random_seed: int | None | Generator = None)
Bases: Generic[TokenType]

Contextual agent with a shared learner and a configurable design matrix.

This is the most general agent in the library. The design matrix, constructed by the arm_featurizer, encodes your assumptions about how arms relate to each other and to the context. By choosing the right design matrix structure, you can express a spectrum of models:

- One-hot arms only (no context features): recovers Agent (non-contextual bandits).
- One-hot arms interacted with context (block-diagonal design matrix): recovers ContextualAgent (disjoint bandits, independent parameters per arm).
- Shared features + arm-specific intercepts: hybrid bandits with cross-arm learning and Bayesian shrinkage toward shared structure (see the hybrid-bandits tutorial).
- Continuous arm features: Lipschitz-style bandits (the namesake), where nearby arms in feature space share information.
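The structures listed above differ only in how the columns of each design-matrix row are laid out. The following is a minimal NumPy sketch of three of them, written by hand rather than with a bayesianbandits featurizer; the helper names `one_hot` and `design_row` are hypothetical and exist only for illustration.

```python
import numpy as np

def one_hot(arm: int, n_arms: int) -> np.ndarray:
    """Indicator vector for a single arm."""
    e = np.zeros(n_arms)
    e[arm] = 1.0
    return e

def design_row(x: np.ndarray, arm: int, n_arms: int, kind: str) -> np.ndarray:
    """Hypothetical design-matrix row for one (context, arm) pair."""
    e = one_hot(arm, n_arms)
    if kind == "one_hot":      # arm indicators only, context ignored
        return e
    if kind == "disjoint":     # block-diagonal: context copied into the arm's own block
        return np.kron(e, x)
    if kind == "hybrid":       # shared context columns + arm-specific intercepts
        return np.concatenate([x, e])
    raise ValueError(f"unknown kind: {kind}")

x = np.array([1.0, 2.0])   # one context with two features
n_arms = 3

row_one_hot = design_row(x, 1, n_arms, "one_hot")    # [0, 1, 0]
row_disjoint = design_row(x, 1, n_arms, "disjoint")  # [0, 0, 1, 2, 0, 0]
row_hybrid = design_row(x, 1, n_arms, "hybrid")      # [1, 2, 0, 1, 0]
```

In the disjoint layout no column is shared between arms, so a single learner fitted on these rows behaves like independent per-arm models. In the hybrid layout the first two columns are shared, which is what lets data from one arm inform the others. The continuous-feature case would replace the indicator vector with a real-valued arm descriptor.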
Formally, the agent uses a single shared learner that conditions on both context and arm features:

\[\tilde{x}_{a} = \phi(x, a), \qquad r \mid \tilde{x}_{a} \sim p(r \mid \theta, \tilde{x}_{a})\]

where \(\phi\) is the arm featurizer that constructs the design matrix from context \(x\) and arm identity \(a\), and \(\theta\) is the shared parameter vector. At each round, posterior samples for all arms are drawn in a single vectorized call:

\[a^* = \pi\bigl( \{\tilde{\theta} \sim p(\theta \mid \mathcal{D})\}, \; \{\phi(x, a)\}_{a=1}^{K}\bigr)\]

- Parameters:
arms (Sequence[Arm[Any, TokenType]]) – Arms to choose from. Arms may have learner=None; the shared learner is set on every arm during initialization.

policy (PolicyProtocol[Any, TokenType]) – Policy object for arm selection. All built-in policies (ThompsonSampling, UpperConfidenceBound, EpsilonGreedy) are compatible.

arm_featurizer (ArmFeaturizer[TokenType]) – Featurizer that constructs the design matrix from (context, action_tokens) in a single vectorized call. The structure of this matrix encodes assumptions about how arms relate to each other and to the context – see Notes.

learner (Learner) – Shared learner instance that will be set on all arms. Because all arms share this object, updates to any arm improve predictions for every arm.

batch_reward_function (BatchRewardFunction or ContextAwareBatchRewardFunction or None, default None) – Optional function that processes rewards for all arms at once.

Traditional signature:

    def batch_reward(samples, action_tokens):
        # samples: shape (n_arms, n_contexts, size, ...)
        # action_tokens: list of length n_arms
        return rewards  # shape (n_arms, n_contexts, size)

Context-aware signature:

    def batch_reward(samples, action_tokens, X):
        # X: original context, shape (n_contexts, n_features)
        return rewards  # shape (n_arms, n_contexts, size)

The action_tokens list is ordered to match the first dimension of samples. If None and all arms use the identity reward function, an optimized batch identity is used automatically.

random_seed (int, np.random.Generator, or None, default None) – Controls the random number generator shared by the policy and the learner. Pass an int for reproducible results across calls.
See also

ContextualAgent
    Independent-learner agent; equivalent to this agent with a block-diagonal design matrix (no parameter sharing).
Agent
    Non-contextual (intercept-only) agent; equivalent to this agent with one-hot arm indicators and no context features.
ArmColumnFeaturizer
    Default featurizer that appends an arm identifier column to the context matrix.
Notes

Vectorized pull. During pull(), contexts are enriched for all arms in a single featurizer call, followed by a single learner sample call for the entire (n_arms * n_contexts) batch. This yields significant speedups when \(K \gg 100\).

Selective update. During update(), contexts are enriched only for the selected arm, so the update cost is independent of \(K\).

Design matrix as assumption encoding. The structure of \(\phi(x, a)\) is the mechanism by which you encode domain knowledge about the relationship between arms [3]. A block-diagonal design matrix (one-hot arms interacted with context) yields fully independent parameters per arm – equivalent to ContextualAgent. Adding shared columns (e.g. user features that affect all arms) introduces cross-arm learning: the shared learner pools data across arms for those features while keeping arm-specific effects separate. This creates a “poor man’s hierarchical model” where Bayesian priors automatically shrink arm-specific effects toward the shared structure. See the hybrid-bandits tutorial for a worked example.

Relationship to other agents. Agent and ContextualAgent are special cases of this agent with particular design matrix structures. This makes LipschitzContextualAgent the most general agent in the library, suitable for any problem where you can describe the arm structure through features.

Name origin. The class name comes from the Lipschitz bandit literature [1] [2], where rewards vary smoothly with continuous arm features. The agent is not limited to that setting – it works equally well with discrete arms and arbitrary feature structures.
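The vectorized pull described in these notes can be illustrated with plain NumPy. This is a sketch under stated assumptions, not the library's implementation: the shared learner is replaced by a stand-in Gaussian posterior, the featurizer by hand-built one-hot arm columns, and the policy by a Thompson-style argmax over one posterior draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, n_contexts, n_features = 4, 3, 2

X = rng.normal(size=(n_contexts, n_features))      # contexts
arm_features = np.eye(n_arms)                      # one-hot arm columns

# Enrich every context for every arm in one shot, stacked arm-major:
# rows [a * n_contexts:(a + 1) * n_contexts] belong to arm a.
X_enriched = np.hstack([
    np.tile(X, (n_arms, 1)),                       # shared context columns
    np.repeat(arm_features, n_contexts, axis=0),   # arm-specific columns
])                                                 # (n_arms * n_contexts, d)

# Stand-in Gaussian posterior over the shared parameter vector theta.
d = X_enriched.shape[1]
theta_mean, theta_cov = np.zeros(d), np.eye(d)

# One posterior draw and one matrix product cover all (arm, context) pairs.
theta = rng.multivariate_normal(theta_mean, theta_cov)
scores = (X_enriched @ theta).reshape(n_arms, n_contexts)

# Thompson-style choice: best sampled arm for each context.
chosen = scores.argmax(axis=0)                     # shape (n_contexts,)
```

The point of the arm-major stacking is that a single reshape recovers the (n_arms, n_contexts) layout, so no Python-level loop over arms is needed.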
References
Examples
Create an agent for product recommendation with 100 products:
>>> import numpy as np
>>> from bayesianbandits import Arm, NormalRegressor, ThompsonSampling
>>> from bayesianbandits import ArmColumnFeaturizer, LipschitzContextualAgent
>>>
>>> # Define action space - product IDs
>>> product_ids = list(range(100))
>>>
>>> # Create arms without learners initially
>>> arms = [Arm(token, learner=None) for token in product_ids]
>>>
>>> # Create agent with shared learner
>>> agent = LipschitzContextualAgent(
...     arms=arms,
...     policy=ThompsonSampling(),
...     arm_featurizer=ArmColumnFeaturizer(column_name='product_id'),
...     learner=NormalRegressor(alpha=1.0, beta=1.0),
...     random_seed=0
... )
>>>
>>> # Use normally - single call handles all arms efficiently
>>> X = np.array([[25, 50000], [35, 75000]])  # age, income
>>> selected_products = agent.pull(X)  # Returns [product_id1, product_id2]
>>>
>>> # Update with observed rewards
>>> for token, context, reward in zip(selected_products, X, [1.0, 0.5]):
...     agent.select_for_update(token).update(np.atleast_2d(context), np.array([reward]))
Using a batch reward function for revenue optimization:
>>> # Pre-compute revenue array for all products (vectorized approach)
>>> n_products = 100
>>> product_revenues = np.random.uniform(0.5, 3.0, n_products)  # Revenue per product
>>>
>>> # Create vectorized batch reward function
>>> def revenue_batch_reward(samples, action_tokens):
...     # Direct numpy indexing - fully vectorized
...     multipliers = product_revenues[action_tokens]
...     # Broadcast to match samples shape: (n_arms, n_contexts, size)
...     return samples * multipliers[:, np.newaxis, np.newaxis]
>>>
>>> # Create agent with batch reward function
>>> agent = LipschitzContextualAgent(
...     arms=arms,
...     policy=ThompsonSampling(),
...     arm_featurizer=ArmColumnFeaturizer(column_name="product_id"),
...     learner=NormalRegressor(alpha=1, beta=1),
...     batch_reward_function=revenue_batch_reward
... )
Using a context-aware batch reward function:
>>> # Context-aware: calculate gross profit from prices, costs, and taxes
>>> # Arms represent different price points
>>> price_points = np.array([9.99, 14.99, 19.99, 24.99, 29.99])
>>> arms = [Arm(i, learner=None) for i in range(len(price_points))]
>>>
>>> def gross_profit_reward(samples, action_tokens, X):
...     # X contains: [customer_value, cost_per_unit, tax_rate]
...     costs = X[:, 1]      # shape: (n_contexts,)
...     tax_rates = X[:, 2]  # shape: (n_contexts,)
...
...     # Get prices for selected arms
...     prices = price_points[action_tokens]  # shape: (n_arms,)
...
...     # Vectorized profit calculation for all (arm, context) pairs
...     # Revenue after tax: price * (1 - tax_rate)
...     # Gross profit: revenue_after_tax - cost
...     revenue_after_tax = prices[:, np.newaxis] * (1 - tax_rates[np.newaxis, :])
...     gross_profit = revenue_after_tax - costs[np.newaxis, :]
...
...     # Apply profit multiplier to samples, clamping negative profits to 0
...     profit_multiplier = np.maximum(gross_profit, 0)
...     return samples * profit_multiplier[:, :, np.newaxis]
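Before passing a context-aware reward function to the agent, it can help to call it directly on synthetic input and confirm the output shape. The sketch below repeats a condensed copy of the function so that it runs on its own; the context values and the all-ones `samples` array are made up for illustration.

```python
import numpy as np

price_points = np.array([9.99, 14.99, 19.99, 24.99, 29.99])

def gross_profit_reward(samples, action_tokens, X):
    # Condensed copy of the context-aware reward function.
    prices = price_points[action_tokens]                      # (n_arms,)
    profit = prices[:, None] * (1 - X[None, :, 2]) - X[None, :, 1]
    return samples * np.maximum(profit, 0)[:, :, None]

n_arms, n_contexts, size = len(price_points), 2, 3
# Columns: customer_value, cost_per_unit, tax_rate (illustrative numbers).
X = np.array([[100.0, 5.0, 0.10],
              [250.0, 12.0, 0.20]])
samples = np.ones((n_arms, n_contexts, size))   # stand-in posterior draws
tokens = list(range(n_arms))

rewards = gross_profit_reward(samples, tokens, X)
print(rewards.shape)   # (5, 2, 3), i.e. (n_arms, n_contexts, size)
```

With these numbers the cheapest price point is unprofitable in the second context (9.99 * 0.8 is below the unit cost of 12), so its reward is clamped to zero there.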
- __init__(arms: Sequence[Arm[Any, TokenType]], policy: PolicyProtocol[Any, TokenType], arm_featurizer: ArmFeaturizer[TokenType], learner: Learner[Any], batch_reward_function: Callable[[ndarray[tuple[int, ...], dtype[float64]], List[Any]], ndarray[tuple[int, ...], dtype[float64]]] | Callable[[ndarray[tuple[int, ...], dtype[float64]], List[Any], Sized], ndarray[tuple[int, ...], dtype[float64]]] | None = None, random_seed: int | None | Generator = None)
- add_arm(arm: Arm[Any, TokenType]) -> None
Add an arm to the agent and set the shared learner.
- Parameters:
arm (Arm[Any, TokenType]) – Arm to add to the agent.
- Raises:
ValueError – If the arm’s action token is already in the agent.
- arm(token: TokenType) -> Arm[Any, TokenType]
Get an arm by its action token.
- Parameters:
token (TokenType) – Action token of the arm to get.
- Returns:
Arm with the action token.
- Return type:
Arm[Any, TokenType]
- Raises:
KeyError – If the arm’s action token is not in the agent.
- decay(X: Sized, decay_rate: float | None = None) -> None
Decay the shared learner once for the given contexts.
- Parameters:
X (Sized) – Context matrix to use for decaying.
decay_rate (Optional[float], default None) – Decay rate to use. If None, the learner’s default decay rate is used.
Notes
This method enriches contexts with a single arm’s features and applies decay to the shared learner once. This ensures that the amount of decay depends on the number of contexts, not the number of arms.
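To see why decaying once per context matters, consider a toy learner whose decay multiplies its precision matrix by `decay_rate` raised to the number of rows it receives. That rule is an assumption made for this illustration, not a statement about how any particular bayesianbandits learner decays; `decay_precision` is a hypothetical helper.

```python
import numpy as np

def decay_precision(precision, n_rows, decay_rate):
    """Scale a precision matrix once per row received (assumed decay rule)."""
    return precision * decay_rate ** n_rows

precision = 4.0 * np.eye(3)
n_arms, n_contexts, decay_rate = 10, 2, 0.5

# Decay driven by the number of contexts only.
once_per_context = decay_precision(precision, n_contexts, decay_rate)

# What would happen if the rows for every arm were counted instead.
once_per_arm_row = decay_precision(precision, n_arms * n_contexts, decay_rate)
```

Under this assumed rule, counting every (arm, context) row would shrink the precision by a factor of 0.5 ** 20 rather than 0.5 ** 2, so adding arms would silently make the agent forget faster.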
- pull(X: Sized) -> List[TokenType]
- pull(X: Sized, *, top_k: int) -> List[List[TokenType]]
Choose arm(s) and pull based on the context(s).
- Parameters:
X (Sized) – Context matrix to use for choosing arms.
top_k (int, optional) – Number of arms to select per context. If None (default), selects single best arm per context. If specified, selects top k arms per context.
- Returns:
If top_k is None: List of action tokens (one per context).
If top_k is int: List of lists of action tokens.
- Return type:
List[TokenType] or List[List[TokenType]]
Notes
When top_k is None, arm_to_update is set to the last selected arm. When top_k is specified, arm_to_update is NOT updated - you must explicitly call select_for_update() before update() to specify which arm’s feedback you’re providing.
The method performs vectorized operations:
1. Single featurizer call for all arms (major efficiency gain)
2. Single learner sample call for all arm-context pairs
3. Efficient reshape and reward function application
4. Policy selection using standard interface
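The shape of the top_k return value can be illustrated with a small NumPy sketch. It ranks arms per context from a score matrix laid out as (n_arms, n_contexts); the helper `top_k_tokens` and the scores are hypothetical, and the actual ranking is done by the policy, which may work differently.

```python
import numpy as np

def top_k_tokens(scores, action_tokens, k):
    """Return the k highest-scoring tokens per context, best first.

    scores: array of shape (n_arms, n_contexts); rows follow action_tokens.
    """
    order = np.argsort(-scores, axis=0)[:k]        # (k, n_contexts) arm indices
    return [[action_tokens[i] for i in order[:, c]]
            for c in range(scores.shape[1])]

tokens = ["a", "b", "c", "d"]
scores = np.array([[0.1, 0.9],
                   [0.7, 0.2],
                   [0.4, 0.8],
                   [0.3, 0.5]])

ranked = top_k_tokens(scores, tokens, k=2)   # [['b', 'c'], ['a', 'c']]
```

Because several tokens come back per context, feedback for each one has to be routed explicitly, for example by calling select_for_update(token) and then update() once per token that received a reward.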
- remove_arm(token: TokenType) -> None
Remove an arm from the agent.
- Parameters:
token (TokenType) – Action token of the arm to remove.
- Raises:
KeyError – If the arm’s action token is not in the agent.
- property rng: Generator
- select_for_update(token: TokenType) -> Self
Set the arm_to_update and return self for chaining.
- Parameters:
token (TokenType) – Action token of the arm to update.
- Returns:
Self for chaining.
- Return type:
Self
- Raises:
KeyError – If the arm’s action token is not in the agent.
- update(X: Sized, y: ndarray[tuple[int, ...], dtype[float64]], sample_weight: ndarray[tuple[int, ...], dtype[float64]] | None = None) -> None
Update the arm_to_update with the context(s) and the reward(s).
- Parameters:
X (Sized) – Context matrix to use for updating the arm.
y (NDArray[np.float64]) – Reward(s) to use for updating the arm.
sample_weight (Optional[NDArray[np.float64]], default None) – Sample weights to use for updating the arm. If None, all samples are weighted equally.
Notes
This method enriches contexts with ONLY the selected arm’s features, then delegates to the policy’s update method which will call arm.update() using the shared learner.
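A rough picture of the selective update, assuming a hybrid design (shared context columns plus one-hot arm columns) and a conjugate Gaussian linear model kept in precision form. Both helpers, `enrich_for_arm` and `bayes_linear_update`, are hypothetical stand-ins for the featurizer and the shared learner, not library code.

```python
import numpy as np

def enrich_for_arm(X, arm, n_arms):
    """Append one-hot columns for a single arm to every context row."""
    one_hot = np.zeros((len(X), n_arms))
    one_hot[:, arm] = 1.0
    return np.hstack([X, one_hot])

def bayes_linear_update(precision, shift, X_enriched, y, noise_precision=1.0):
    """Conjugate Gaussian update in precision form (shift = precision @ mean)."""
    new_precision = precision + noise_precision * X_enriched.T @ X_enriched
    new_shift = shift + noise_precision * X_enriched.T @ y
    return new_precision, new_shift

n_arms, n_features = 50, 2
d = n_features + n_arms
precision, shift = np.eye(d), np.zeros(d)       # standard-normal prior

X = np.array([[0.5, 1.0], [1.5, 0.0]])          # two observed contexts
y = np.array([1.0, 0.5])                        # their rewards

rows = enrich_for_arm(X, arm=7, n_arms=n_arms)  # 2 rows, regardless of n_arms
precision, shift = bayes_linear_update(precision, shift, rows, y)
posterior_mean = np.linalg.solve(precision, shift)
```

Only two rows are built no matter how many arms exist. In this sketch the coefficients of the other arms stay at their prior mean, while the shared context coefficients move, which is how feedback on one arm changes predictions for all of them.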