Bandit Persistence Recipes

In production, it is often necessary to persist data to disk. This notebook demonstrates how an Agent can be persisted to disk, reloaded, and even have its set of arms changed on the fly.

First, let’s create a simple Agent with two arms. We’ll train it a little, then persist it to disk.

[1]:
import numpy as np
from bayesianbandits import Arm, GammaRegressor, Agent, EpsilonGreedy

arms = [
    Arm("Action 1", learner=GammaRegressor(alpha=1, beta=1)),
    Arm("Action 2", learner=GammaRegressor(alpha=1, beta=1)),
]

agent = Agent(arms, EpsilonGreedy(epsilon=0.1), random_seed=1)

We’ll pull an arm once and update the agent with a reward, then persist it to disk.

[2]:
agent.pull()
agent.update(np.atleast_1d(1))

print(f"Learned alpha and beta for arm 1: {agent.arms[0].learner.coef_[1]}")
Learned alpha and beta for arm 1: [2. 2.]

joblib is a great library for persisting Python objects to disk. It is a dependency of scikit-learn, so it is already installed alongside bayesianbandits.

Below, we dump the agent to disk and reload it. As the output shows, the learned state survives the round trip: the reloaded agent is in exactly the same state as before.

[3]:
import joblib

joblib.dump(agent, "agent.pkl")

loaded: Agent[GammaRegressor, str] = joblib.load("agent.pkl")

print(f"Learned alpha and beta for arm 1: {loaded.arms[0].learner.coef_[1]}")
Learned alpha and beta for arm 1: [2. 2.]
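If the pickles grow large, joblib can also compress on write, and joblib.load handles compressed files transparently. A minimal sketch; the filename and compression level here are arbitrary choices, not anything bayesianbandits requires:

import joblib

# compress takes an integer from 0 to 9; 3 is a moderate
# speed/size trade-off.
joblib.dump(agent, "agent_compressed.pkl", compress=3)
compressed = joblib.load("agent_compressed.pkl")  # no special handling needed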

After being reloaded, the agent can be used as normal.

[4]:
loaded.pull()
loaded.update(np.atleast_1d(1))

print(f"Learned alpha and beta for arm 1: {loaded.arms[0].learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded.arms[1].learner.coef_[1]}")

joblib.dump(loaded, "agent.pkl")
Learned alpha and beta for arm 1: [3. 3.]
Learned alpha and beta for arm 2: [1. 1.]
[4]:
['agent.pkl']
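In a long-running service, it is also worth making the dump atomic, so that a crash mid-write cannot leave a truncated pickle behind. A minimal sketch of the usual pattern; dump_atomically and the .tmp suffix are illustrative, not part of bayesianbandits:

import os

import joblib

def dump_atomically(obj, path: str) -> None:
    """Write to a temporary file, then rename it into place."""
    tmp_path = path + ".tmp"
    joblib.dump(obj, tmp_path)
    os.replace(tmp_path, path)  # atomic replace on POSIX and Windows

dump_atomically(loaded, "agent.pkl")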

After your learning session has gone on for some time, you may get an idea for a new arm. You want to try it out, but you don’t want to lose the state of the agent you’ve already trained. Fortunately, you can reload the agent from disk and add the new arm with add_arm. The new arm’s learner starts from its prior, while the existing arms keep their learned state.

Note that the learned state of arms 1 and 2 is preserved.

[5]:
arm_3 = Arm("Action 3", learner=GammaRegressor(alpha=1, beta=1))
loaded_with_new_def: Agent[GammaRegressor, str] = joblib.load("agent.pkl")
loaded_with_new_def.add_arm(arm_3)

print(f"Learned alpha and beta for arm 1: {loaded_with_new_def.arms[0].learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded_with_new_def.arms[1].learner.coef_[1]}")

print(f"Arms: {loaded_with_new_def.arms}")
Learned alpha and beta for arm 1: [3. 3.]
Learned alpha and beta for arm 2: [1. 1.]
Arms: [Arm(action_token=Action 1, reward_function=<function identity at 0x7fde749b6f70>, Arm(action_token=Action 2, reward_function=<function identity at 0x7fde749b6f70>, Arm(action_token=Action 3, reward_function=<function identity at 0x7fde749b6f70>]

Again, the agent can be used as normal.

[6]:
loaded_with_new_def.pull()
loaded_with_new_def.update(np.atleast_1d(1))

print(f"Learned alpha and beta for arm 1: {loaded_with_new_def.arms[0].learner.coef_[1]}")
print(f"Learned alpha and beta for arm 2: {loaded_with_new_def.arms[1].learner.coef_[1]}")
print(f"Learned alpha and beta for arm 3: {loaded_with_new_def.arms[2].learner.coef_[1]}")

joblib.dump(loaded_with_new_def, "agent.pkl")
Learned alpha and beta for arm 1: [3. 3.]
Learned alpha and beta for arm 2: [1. 1.]
Learned alpha and beta for arm 3: [2. 2.]
[6]:
['agent.pkl']

Now, you may decide that arm 2 is not a good arm, and you want to remove it from the agent. You can do this by reloading the agent from disk and calling remove_arm with the arm’s action token. The remaining arms keep their learned state.

Note that this is a destructive operation upon re-serialization: once the agent is dumped again without arm 2, its learned state is lost forever!

[7]:
loaded_with_removed_arm: Agent[GammaRegressor, str] = joblib.load("agent.pkl")
loaded_with_removed_arm.remove_arm("Action 2")

print(f"Arms: {loaded_with_new_def.arms}")

print(f"Learned alpha and beta for arm 1: {loaded_with_removed_arm.arms[0].learner.coef_[1]}")
print(f"Learned alpha and beta for arm 3: {loaded_with_removed_arm.arms[1].learner.coef_[1]}")


Arms: [Arm(action_token=Action 1, reward_function=<function identity at 0x7fde749b6f70>, Arm(action_token=Action 3, reward_function=<function identity at 0x7fde749b6f70>]
Learned alpha and beta for arm 1: [3. 3.]
Learned alpha and beta for arm 3: [2. 2.]
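Because the removal only becomes permanent when you overwrite the pickle, a cautious pattern is to copy the previous file aside first. A sketch; the .bak naming scheme is arbitrary:

import shutil

import joblib

# Keep the old state so the removed arm can be recovered if needed.
shutil.copy("agent.pkl", "agent.pkl.bak")
joblib.dump(loaded_with_removed_arm, "agent.pkl")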