Deploying to Production
Serialize with joblib
joblib is a dependency of scikit-learn, so it’s already installed.
Note
Lambdas, closures, and factory-produced functions (like the
make_profit_reward pattern in Writing Custom Reward Functions) are not
picklable with standard pickle. If your arms use reward
functions like these, use a callable class with __call__
instead, or serialize with cloudpickle.
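To make the note concrete, here is a minimal sketch that contrasts a closure-returning factory with a callable class under standard pickle. The body of make_profit_reward is assumed for illustration, and ProfitReward, margin, and is_picklable are hypothetical names that do not come from the library.

```python
import pickle


def make_profit_reward(margin):
    """Factory returning a closure (assumed shape, for illustration only)."""
    def reward(outcome):
        return outcome * margin
    return reward


class ProfitReward:
    """Hypothetical callable class carrying the same logic as the closure."""

    def __init__(self, margin):
        self.margin = margin

    def __call__(self, outcome):
        return outcome * self.margin


def is_picklable(obj):
    """Return True if standard pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Closures and lambdas cannot be looked up by name at load time, so pickle rejects them.
closure_ok = is_picklable(make_profit_reward(0.3))
lambda_ok = is_picklable(lambda outcome: outcome * 0.3)

# The class instance pickles as "class reference + attributes" and round-trips.
restored = pickle.loads(pickle.dumps(ProfitReward(0.3)))

print(closure_ok, lambda_ok, restored(10.0))
```

The class must be importable under the same module path in the process that loads the file, since pickle stores a reference to the class rather than its code.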
import joblib
import numpy as np
from bayesianbandits import Agent, Arm, GammaRegressor, ThompsonSampling
arms = [
Arm("ad_a", learner=GammaRegressor(alpha=1, beta=1)),
Arm("ad_b", learner=GammaRegressor(alpha=1, beta=1)),
]
agent = Agent(arms, ThompsonSampling(), random_seed=42)
(choice,) = agent.pull()
agent.update(np.array([1.0]))
joblib.dump(agent, "agent.pkl", compress=True)
loaded = joblib.load("agent.pkl")
# Learned state is preserved
assert loaded.arms[0].learner.coef_[1][0] == agent.arms[0].learner.coef_[1][0]
Uncompressed pickles of precision matrices can be large, which is why the example passes compress=True. The learned state compresses efficiently: a sparse model with 1M features and 4M nonzeros takes a couple hundred KB on disk.
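The effect of compress=True can be checked on a stand-in object. The sketch below does not use a real learner: it assumes a dense, mostly-zero NumPy matrix as a rough proxy for learned state, so the sizes it prints only illustrate the trend and are not the figures quoted above.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for learned state: a 1000x1000 matrix that is zero
# everywhere except for a random diagonal.
rng = np.random.default_rng(0)
state = np.zeros((1000, 1000))
np.fill_diagonal(state, rng.random(1000))

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.pkl")
    packed_path = os.path.join(tmp, "packed.pkl")

    joblib.dump(state, raw_path)                    # no compression
    joblib.dump(state, packed_path, compress=True)  # zlib compression

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)
    restored = joblib.load(packed_path)             # loads transparently

print(f"uncompressed: {raw_size} bytes, compressed: {packed_size} bytes")
```

joblib.load detects the compression on its own, so the loading code does not change when compression is turned on.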
Reseed the RNG after loading
Deserialization restores the RNG exactly as it was at save time, so every copy loaded from the same file replays the same exploration sequence. Reseed immediately after loading:
loaded = joblib.load("agent.pkl")
loaded.rng = None # seeds from OS entropy
This creates a fresh numpy.random.Generator and propagates it to
all arm learners. Pass an int instead if you need reproducibility.
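The replay problem is easy to reproduce without an agent. The following sketch works on a bare numpy.random.Generator, with pickle.dumps standing in for saving the agent; it is meant to show why reseeding matters, not how the library does it internally.

```python
import pickle

import numpy as np

rng = np.random.default_rng(42)
rng.random(5)  # advance the stream, as a running agent would

blob = pickle.dumps(rng)  # stand-in for saving the agent to disk

# Two "servers" load the same file: both resume from the identical RNG state.
copy_a = pickle.loads(blob)
copy_b = pickle.loads(blob)
same_without_reseed = np.array_equal(copy_a.random(3), copy_b.random(3))

# Reseeding from OS entropy gives each copy its own independent stream.
fresh_a = np.random.default_rng()
fresh_b = np.random.default_rng()
same_after_reseed = np.array_equal(fresh_a.random(3), fresh_b.random(3))

# An int seed trades independence for reproducibility.
seeded_a = np.random.default_rng(7).random(3)
seeded_b = np.random.default_rng(7).random(3)

print(same_without_reseed, same_after_reseed, np.array_equal(seeded_a, seeded_b))
```

If several replicas need streams that are reproducible yet distinct, giving each one a different int seed is one option.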
Add and remove arms at runtime
New arms start with a fresh prior. Existing arms keep their learned state:
from bayesianbandits import Arm, GammaRegressor
# Add a new arm
loaded.add_arm(Arm("ad_c", learner=GammaRegressor(alpha=1, beta=1)))
# Remove an underperforming arm
loaded.remove_arm("ad_a")
joblib.dump(loaded, "agent.pkl", compress=True)
Removing an arm is destructive: once the agent is re-serialized, that arm's learned state is gone. Action tokens must be unique across arms.
Important
Isolate from your application server. BLAS and LAPACK, which
back every pull() and update() call, will eagerly use all
available cores. A single Cholesky solve can saturate a machine for
the duration of the call. If the bandit lives on the same server as
your application, a burst of pulls can starve your request-handling
threads. Run the bandit in a separate process or on a dedicated
host.
Agents are mutable and not thread-safe. Keep one agent per process.
joblib.load produces an independent copy, so you can run
multiple reader processes that pull concurrently and funnel updates
through a single writer.
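One way to arrange the single-writer pattern is sketched below. It is an assumption-laden outline, not the library's mechanism: ToyAgent is a hypothetical stand-in for a real agent, standard pickle replaces joblib to keep the example dependency-free, and the atomic publish step (temporary file plus os.replace) is a design choice added here so that readers never load a half-written file.

```python
import os
import pickle
import tempfile


class ToyAgent:
    """Hypothetical stand-in for a bandit agent: sums rewards per arm."""

    def __init__(self):
        self.totals = {}

    def update(self, arm, reward):
        self.totals[arm] = self.totals.get(arm, 0.0) + reward


def publish(agent, path):
    """Write a snapshot to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(agent, f)
    os.replace(tmp_path, path)  # atomic when both paths share a filesystem


def load_snapshot(path):
    """Reader side: every load returns an independent copy of the agent."""
    with open(path, "rb") as f:
        return pickle.load(f)


def writer_loop(agent, updates, path, every=2):
    """Single writer: apply updates in order, publishing every `every` updates."""
    for i, (arm, reward) in enumerate(updates, start=1):
        agent.update(arm, reward)
        if i % every == 0:
            publish(agent, path)
    publish(agent, path)  # final snapshot so no update is left unpublished


with tempfile.TemporaryDirectory() as tmp:
    snapshot_path = os.path.join(tmp, "agent.pkl")
    queued = [("ad_a", 1.0), ("ad_b", 0.5), ("ad_a", 2.0)]
    writer_loop(ToyAgent(), queued, snapshot_path)

    reader_1 = load_snapshot(snapshot_path)
    reader_2 = load_snapshot(snapshot_path)
    print(reader_1.totals, reader_1 is reader_2)
```

In a real deployment the updates would arrive over a queue or message broker and the readers would live in separate processes that reload the snapshot on a schedule. Those parts are left out because they depend on the surrounding infrastructure.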