Usage and Examples
This guide provides a progressive learning path from basic concepts to advanced production scenarios.
Getting Started
Start here if you’re new to multi-armed bandits or this library.
What you’ll learn: Your first bandit with Thompson sampling, covering the basic pull-update cycle and core concepts like the exploration vs. exploitation trade-off.
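The cycle itself is only a few lines. A minimal sketch using the library's `Agent`, `Arm`, and `ThompsonSampling` interface; the arm names and the `NormalInverseGammaRegressor` learner here are illustrative choices, not the only options:

```python
import numpy as np
from bayesianbandits import Agent, Arm, NormalInverseGammaRegressor, ThompsonSampling

# One arm per action; each arm carries its own Bayesian model of its reward.
arms = [
    Arm("variant_a", learner=NormalInverseGammaRegressor()),
    Arm("variant_b", learner=NormalInverseGammaRegressor()),
]
agent = Agent(arms, ThompsonSampling(), random_seed=0)

token = agent.pull()               # sample each posterior, act on the best draw
reward = 1.0                       # observe the outcome of that action
agent.update(np.array([reward]))   # posterior update for the arm just pulled
```

Thompson sampling handles the exploration vs. exploitation trade-off automatically: uncertain arms have wide posteriors, so they occasionally win the draw and get explored.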
Core Concepts
Essential techniques for effective bandit implementation.
Bayesian Updating (counts): Understanding how bandits learn from rewards using conjugate priors and posterior distributions.
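The mechanics are easiest to see without the library. A plain Beta-Bernoulli sketch in numpy (illustrative, not this library's API): with a conjugate prior, computing the posterior is just adding observed counts to the prior counts:

```python
import numpy as np

alpha, beta = 1.0, 1.0             # Beta(1, 1): a uniform prior over the arm's win rate
rewards = [1, 0, 1, 1]             # binary rewards observed for this arm

alpha += sum(rewards)              # posterior: Beta(1 + successes, 1 + failures)
beta += len(rewards) - sum(rewards)

rng = np.random.default_rng(0)
draw = rng.beta(alpha, beta)       # Thompson sampling scores the arm with a draw from here
```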
Contextual Bandits (linear-bandits): Using user and item features to make personalized decisions with linear models.
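A minimal contextual sketch, assuming `ContextualAgent` and the `NormalRegressor` Bayesian linear learner; the feature values are made up:

```python
import numpy as np
from bayesianbandits import Arm, ContextualAgent, NormalRegressor, ThompsonSampling

# A separate Bayesian linear model per arm; each context row is a feature vector.
arms = [
    Arm("article_a", learner=NormalRegressor()),
    Arm("article_b", learner=NormalRegressor()),
]
agent = ContextualAgent(arms, ThompsonSampling(), random_seed=0)

context = np.array([[1.0, 0.0, 3.2]])   # e.g. user features for this request
choice = agent.pull(context)            # personalized decision
agent.update(context, np.array([1.0]))  # reward observed under that context
```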
Advanced Techniques
Powerful methods for improved efficiency and performance.
Cross-Arm Learning (hybrid-bandits): Share knowledge across similar arms for faster learning and better sample efficiency. Essential for large action spaces.
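The intuition, sketched in plain numpy rather than with `LipschitzContextualAgent` itself: embed each arm, learn one shared weight posterior over joint (context, arm) features, and every update transfers to similar arms:

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d_ctx, d_emb = 50, 4, 3
arm_embeddings = rng.normal(size=(n_arms, d_emb))   # e.g. article embeddings

def joint_features(ctx, emb):
    return np.outer(ctx, emb).ravel()   # one feature map over (context, arm) pairs

d = d_ctx * d_emb
precision, mean = np.eye(d), np.zeros(d)   # Gaussian posterior over shared weights

context = rng.normal(size=d_ctx)

# One posterior draw scores every arm at once.
w = rng.multivariate_normal(mean, np.linalg.inv(precision))
choice = int(np.argmax([joint_features(context, e) @ w for e in arm_embeddings]))

# Conjugate Gaussian update (unit noise) to the shared posterior: this one
# observation sharpens the estimates for every arm with a similar embedding.
x, reward = joint_features(context, arm_embeddings[choice]), 1.0
new_precision = precision + np.outer(x, x)
mean = np.linalg.solve(new_precision, precision @ mean + reward * x)
precision = new_precision
```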
Production Deployment (persistence): Patterns for saving, loading, and updating bandits in live systems with proper serialization.
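One common pattern is plain pickling, sketched below; this assumes every component of your agent is picklable and that you re-serialize after each update:

```python
import pickle
import numpy as np
from bayesianbandits import Agent, Arm, NormalInverseGammaRegressor, ThompsonSampling

agent = Agent(
    [Arm("a", learner=NormalInverseGammaRegressor()),
     Arm("b", learner=NormalInverseGammaRegressor())],
    ThompsonSampling(),
)

blob = pickle.dumps(agent)          # e.g. write to a database or object store

restored = pickle.loads(blob)       # later, possibly in another process
choice = restored.pull()            # keep serving decisions...
restored.update(np.array([1.0]))    # ...and learning from new rewards
blob = pickle.dumps(restored)       # persist the updated state back
```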
Specialized Scenarios
Handle challenging real-world conditions.
Adversarial Environments (adversarial): Robust bandits for non-stationary environments using the EXP3A algorithm.
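For flavor, here is classic EXP3 in plain numpy (EXP3A is a variant of it; this is not this library's implementation): multiplicative weights over importance-weighted rewards, with no stationarity assumption baked in:

```python
import numpy as np

rng = np.random.default_rng(0)
k, gamma = 3, 0.1                  # number of arms and exploration rate
weights = np.ones(k)

for _ in range(1000):
    probs = (1 - gamma) * weights / weights.sum() + gamma / k
    arm = int(rng.choice(k, p=probs))
    reward = float(rng.random() < (0.2, 0.5, 0.8)[arm])   # stand-in environment
    x_hat = reward / probs[arm]    # importance weighting keeps estimates unbiased
    weights[arm] *= np.exp(gamma * x_hat / k)
```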
Delayed Feedback (delayed-reward): Handling scenarios where rewards arrive hours or days after actions.
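The usual pattern is to persist each decision's context and arm token at pull time, then route the late reward back to the arm that produced it. The sketch below assumes the `select_for_update` hook for that routing; verify the name against your installed version:

```python
import numpy as np
from bayesianbandits import Arm, ContextualAgent, NormalRegressor, ThompsonSampling

agent = ContextualAgent(
    [Arm("a", learner=NormalRegressor()),
     Arm("b", learner=NormalRegressor())],
    ThompsonSampling(),
)

pending = {}  # decision_id -> (context, arm token); use a durable store in production

context = np.array([[1.0, 0.0]])
token = agent.pull(context)
pending["decision-1"] = (context, token)

# Hours or days later, when the reward finally lands:
ctx, tok = pending.pop("decision-1")
agent.select_for_update(tok).update(ctx, np.array([1.0]))
```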
Historical Data (offline-learning): Bootstrap bandits using existing data before deploying online.
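A sketch of the replay pattern, assuming the same `select_for_update` hook as above; the historical log here is made up:

```python
import numpy as np
from bayesianbandits import Arm, ContextualAgent, NormalRegressor, ThompsonSampling

agent = ContextualAgent(
    [Arm("a", learner=NormalRegressor()),
     Arm("b", learner=NormalRegressor())],
    ThompsonSampling(),
)

# Hypothetical log of (context, action taken, observed reward) from an old system.
history = [
    (np.array([[1.0, 0.0]]), "a", 1.0),
    (np.array([[0.0, 1.0]]), "b", 0.0),
]

# Replay the log through the normal update path so the agent goes online
# with informed posteriors instead of flat priors.
for ctx, token, reward in history:
    agent.select_for_update(token).update(ctx, np.array([reward]))
```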
Agent Types Guide
Choose the right agent for your use case; the sketch after this list shows how the call signatures differ:
- Agent: Non-contextual bandits where all users/items are treated identically. Use for simple A/B testing.
- ContextualAgent: Contextual bandits with a separate model per arm. Use when arms are completely different (e.g., different product categories).
- LipschitzContextualAgent: Contextual bandits with a shared model across arms. Use for large action spaces where arms are similar (e.g., thousands of articles or products).
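The practical difference shows up in the call signatures. A side-by-side sketch for the first two (constructor details for LipschitzContextualAgent vary, so it is omitted here):

```python
import numpy as np
from bayesianbandits import (
    Agent, Arm, ContextualAgent, NormalInverseGammaRegressor, ThompsonSampling,
)

def make_arms():
    return [Arm(t, learner=NormalInverseGammaRegressor()) for t in ("a", "b")]

# Agent: no features anywhere; pull() takes nothing, update() takes only rewards.
simple = Agent(make_arms(), ThompsonSampling())
simple.pull()
simple.update(np.array([1.0]))

# ContextualAgent: every decision and update carries a context row.
contextual = ContextualAgent(make_arms(), ThompsonSampling())
X = np.array([[0.2, 1.0]])
contextual.pull(X)
contextual.update(X, np.array([1.0]))
```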
Pipeline Integration
Integrate with sklearn preprocessing pipelines:
```python
import numpy as np
from bayesianbandits import Arm, ContextualAgent, NormalRegressor, ThompsonSampling
from bayesianbandits.pipelines import AgentPipeline, LearnerPipeline
from sklearn.preprocessing import StandardScaler

# Arms and policy to wrap (NormalRegressor is a Bayesian linear learner).
arms = [
    Arm('a', learner=NormalRegressor()),
    Arm('b', learner=NormalRegressor()),
]
policy = ThompsonSampling()

# Preprocess contexts before the bandit sees them
pipeline = AgentPipeline([
    ('scaler', StandardScaler()),
    ('agent', ContextualAgent(arms, policy)),
])

# Or preprocess features within learners
learner = LearnerPipeline([
    ('preprocessor', StandardScaler()),
    ('regressor', NormalRegressor()),
])
```
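Usage then mirrors the wrapped agent. A sketch, assuming AgentPipeline forwards pull and update through the preprocessing steps (check the pipelines module docs for the exact surface):

```python
X = np.array([[25.0, 3.1]])          # raw context; the scaler transforms it first
choice = pipeline.pull(X)            # decision made on the preprocessed context
pipeline.update(X, np.array([1.0]))  # reward flows back through the same steps
```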