Integrating with sklearn Transformers
bayesianbandits estimators expect numeric arrays, but real data
arrives as JSON dicts, DataFrames, or raw features that need scaling.
Two wrapper classes let you plug in any sklearn transformer:
AgentPipeline
    Wraps an Agent or ContextualAgent. Transforms run once before every pull() and update() call.
LearnerPipeline
    Wraps the shared learner inside a LipschitzContextualAgent. Transforms run after the arm featurizer enriches the context, right before the underlying estimator sees the data.
Important
Every transformer must be stateless (like
FunctionTransformer) or pre-fitted before the pipeline is
created. Pipelines never call fit() on transformers.
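As a minimal sketch of the two allowed cases, the snippet below uses only sklearn: a stateless FunctionTransformer that can transform without ever being fitted, and a check that distinguishes a pre-fitted scaler from an unfitted one. The helper name `is_fitted` is introduced here for illustration and is not part of bayesianbandits.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.utils.validation import check_is_fitted

# Stateless: the output depends only on the input, so no fit() is needed.
log_transform = FunctionTransformer(np.log1p)
X = np.array([[0.0, 9.0]])
logged = log_transform.transform(X)


def is_fitted(transformer) -> bool:
    """Hypothetical helper: True if sklearn considers the transformer fitted."""
    try:
        check_is_fitted(transformer)
    except NotFittedError:
        return False
    return True


# Stateful: StandardScaler must be fitted before it goes into a pipeline.
unfitted_scaler = StandardScaler()
fitted_scaler = StandardScaler().fit(np.array([[1.0], [3.0]]))

print(is_fitted(unfitted_scaler), is_fitted(fitted_scaler))
```

Running a check like this before constructing a pipeline surfaces an unfitted transformer early, rather than at the first pull() call.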
Accept JSON input
Use DictVectorizer to convert dicts to sparse feature matrices:
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from bayesianbandits import (
    Arm, ContextualAgent, NormalRegressor, ThompsonSampling, AgentPipeline,
)

# Pre-fit the vectorizer: the pipeline never calls fit() on its steps
vectorizer = DictVectorizer(sparse=True)
vectorizer.fit([
    {"user_age": 25, "region": "US"},
    {"user_age": 40, "region": "EU"},
])

# sparse=True on the learner matches the vectorizer's sparse output
arms = [
    Arm(f"variant_{i}", learner=NormalRegressor(alpha=1.0, beta=1.0, sparse=True))
    for i in range(3)
]
agent = ContextualAgent(arms, ThompsonSampling(), random_seed=42)

pipeline = AgentPipeline(
    steps=[("vectorize", vectorizer)],
    final_agent=agent,
)

contexts = [{"user_age": 30, "region": "US"}]
(action,) = pipeline.pull(contexts)
pipeline.update(contexts, y=np.array([1.0]))
Accept DataFrame input
Use ColumnTransformer to handle mixed column types:
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from bayesianbandits import (
    Arm, ContextualAgent, NormalRegressor, ThompsonSampling, AgentPipeline,
)

# Representative sample used to pre-fit the transformer
sample_df = pd.DataFrame({
    "age": [25, 40, 35],
    "region": ["US", "EU", "US"],
})

ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(sparse_output=False), ["region"]),
])
ct.fit(sample_df)

arms = [
    Arm(f"variant_{i}", learner=NormalRegressor(alpha=1.0, beta=1.0))
    for i in range(3)
]
agent = ContextualAgent(arms, ThompsonSampling(), random_seed=42)

pipeline = AgentPipeline(
    steps=[("preprocess", ct)],
    final_agent=agent,
)

df = pd.DataFrame({"age": [30], "region": ["US"]})
(action,) = pipeline.pull(df)
pipeline.update(df, y=np.array([1.0]))
Scale numeric features
Fit the scaler on a representative historical dataset. In an online setting, any single batch may cover only a narrow slice of the input range, so fitting on live data would produce unstable statistics:
from sklearn.preprocessing import StandardScaler
from bayesianbandits import (
    Arm, ContextualAgent, NormalRegressor, ThompsonSampling, AgentPipeline,
)

# historical_features is your own array of shape (n_samples, n_features);
# it is not defined in this snippet.
scaler = StandardScaler()
scaler.fit(historical_features)

arms = [
    Arm(f"variant_{i}", learner=NormalRegressor(alpha=1.0, beta=1.0))
    for i in range(3)
]
agent = ContextualAgent(arms, ThompsonSampling(), random_seed=42)

pipeline = AgentPipeline(
    steps=[("scale", scaler)],
    final_agent=agent,
)
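To make the instability concrete, the sketch below compares a scaler fitted on historical data with one fitted on a single live row, using only sklearn and NumPy. The `historical_features` array is synthetic and stands in for your real data; its means and spreads are arbitrary assumptions. A scaler fitted on one row has a mean equal to that row and zero variance, so it maps the row to all zeros and discards the signal.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for historical data: 1000 rows, 2 features.
historical_features = rng.normal(
    loc=[50.0, 0.5], scale=[10.0, 0.1], size=(1000, 2)
)

# A single context arriving online.
live_batch = np.array([[55.0, 0.45]])

# Scaler fitted once on history: the live row keeps its relative position.
historical_scaler = StandardScaler().fit(historical_features)
scaled_with_history = historical_scaler.transform(live_batch)

# Scaler fitted on the live row itself: every feature collapses to zero.
batch_scaler = StandardScaler().fit(live_batch)
scaled_with_batch = batch_scaler.transform(live_batch)

print(scaled_with_history, scaled_with_batch)
```

Larger live batches soften the effect but do not remove it, since each batch would still produce different means and variances.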
Access pipeline internals
Both pipeline types support indexing by name or position:
pipeline["vectorize"]            # by name
name, transformer = pipeline[0]  # by position
pipeline.named_steps             # {"vectorize": DictVectorizer(...)}
All agent methods (add_arm, remove_arm, select_for_update,
etc.) are forwarded through AgentPipeline unchanged.