bayesianbandits.BayesianGLM#
- class bayesianbandits.BayesianGLM(alpha: float = 1.0, *, link: Literal['logit', 'log'] = 'logit', learning_rate: float = 1.0, approximator: PosteriorApproximator | None = None, sparse: bool = False, random_state: int | None | Generator = None)#
Bases:
BaseEstimator,RegressorMixinBayesian Generalized Linear Model with Laplace approximation.
Extends Bayesian linear regression to non-Gaussian likelihoods (binary and count data) using a configurable posterior approximation method. The default uses Laplace approximation via iteratively reweighted least squares (IRLS). Supports both dense and sparse feature matrices, online learning via
partial_fit, and non-stationary environments viadecay.- Parameters:
alpha (
float, default1.0) – Prior precision for the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Higher values give stronger regularization toward zero.link (
{'logit', 'log'}, default'logit') –Link function relating the linear predictor to the mean of the response distribution:
'logit': Inverse logit (sigmoid). Use for binary outcomes (Bernoulli likelihood).'log': Exponential. Use for count outcomes (Poisson likelihood).
learning_rate (
float, default1.0) – Decay rate for the posterior precision on each call todecay. Values less than 1 geometrically shrink the precision matrix, increasing posterior uncertainty over time. This converts the model into a forgetting (non-stationary) estimator suitable for restless bandit problems.approximator (
PosteriorApproximator, defaultLaplaceApproximator()) – Strategy object for approximating the posterior. The defaultLaplaceApproximatorperforms 5 IRLS iterations per update. For fast online updates, useLaplaceApproximator(n_iter=1). For batch convergence, increasen_iterand set atol.sparse (
bool, defaultFalse) – If True, use sparse matrix operations for the precision matrix. InputXmust be ascipy.sparse.csc_array. When CHOLMOD is available (viascikit-sparse), it is used for efficient Cholesky factorization; otherwise falls back to SuperLU.random_state (
int,np.random.Generator, orNone, defaultNone) – Controls the random number generator forsample. Pass an int for reproducible results across calls.
- coef_#
Posterior mean of the weight vector (MAP estimate).
- Type:
ndarrayofshape (n_features,)
- cov_inv_#
Posterior precision matrix (inverse covariance). Dense when
sparse=False, sparse CSC whensparse=True.- Type:
ndarrayofshape (n_features,n_features)orscipy.sparse.csc_array
- n_features_#
Number of features seen during
fit.- Type:
int
See also
NormalRegressorBayesian linear regression with known noise variance.
NormalInverseGammaRegressorBayesian linear regression with unknown noise variance.
LaplaceApproximatorThe default posterior approximation strategy.
Notes
The model places a Gaussian prior on the weight vector:
\[w \sim \mathcal{N}(0, \alpha^{-1} I)\]For the logit link (Bernoulli likelihood):
\[p(y=1 \mid x) = \sigma(x^T w)\]For the log link (Poisson likelihood):
\[\mathbb{E}[y \mid x] = \exp(x^T w)\]Since these likelihoods are not conjugate to the Gaussian prior, the posterior is approximated using the Laplace approximation. This finds the MAP estimate \(\hat{w}\) and approximates the posterior as:
\[p(w \mid \mathcal{D}) \approx \mathcal{N}(\hat{w}, \Lambda^{-1})\]where \(\Lambda = \alpha I + X^T W X\) is the posterior precision and \(W\) is the diagonal matrix of IRLS weights. See Chapter 8 of [1] for details.
When
learning_rate < 1, callingdecayscales the precision matrix by \(\gamma^n\) where \(\gamma\) is the learning rate and \(n\) is the number of samples. This uniformly increases posterior uncertainty, allowing the model to adapt to non-stationary environments.References
Examples
Binary classification with logistic regression:
>>> from bayesianbandits import BayesianGLM >>> X = np.array([[1], [2], [3], [4]]) >>> y = np.array([0, 0, 1, 1]) # Binary outcomes >>> model = BayesianGLM(alpha=1.0, link='logit') >>> model.fit(X, y) BayesianGLM() >>> model.predict(X) # Returns probabilities array([0.56154064, 0.62124455, 0.67748791, 0.72902237])
Count regression with Poisson:
>>> y_counts = np.array([1, 2, 5, 8]) # Count data >>> model = BayesianGLM(alpha=1.0, link='log') >>> model.fit(X, y_counts) BayesianGLM(link='log') >>> model.predict(X) # Returns expected counts array([1.72636481, 2.98033545, 5.14514623, 8.88239939])
Poisson rate modeling with varying exposure (e.g., different observation periods). For Poisson with log link, fitting the rate
y / exposurewithsample_weight=exposureis mathematically equivalent to using an offset oflog(exposure)in the linear predictor:>>> exposure = np.array([1, 2, 5, 10]) >>> y_counts = np.array([2, 5, 12, 30]) >>> model = BayesianGLM(alpha=1.0, link='log') >>> model.fit(X, y_counts / exposure, sample_weight=exposure) BayesianGLM(link='log') >>> model.predict(X) * exposure # Scale predicted rates by exposure array([ 1.32755877, 3.52482459, 11.69852952, 31.06097099])
Online learning with fast single-iteration updates:
>>> from bayesianbandits import LaplaceApproximator >>> approximator = LaplaceApproximator(n_iter=1) # Fast online updates >>> model = BayesianGLM(alpha=1.0, link='logit', approximator=approximator) >>> stream = [(np.array([[1]]), [0]), (np.array([[2]]), [0]), ... (np.array([[3]]), [1]), (np.array([[4]]), [1])] >>> for X_batch, y_batch in stream: ... model = model.partial_fit(X_batch, y_batch) >>> model.predict(np.array([[1], [2], [3], [4]])) array([0.59667319, 0.68637901, 0.76402365, 0.82728258])
Batch convergence with tighter tolerance:
>>> approximator = LaplaceApproximator(n_iter=500, tol=1e-6) >>> model = BayesianGLM(alpha=0.1, approximator=approximator) >>> model = model.fit(X, y) # Iterates until convergence >>> model.predict(X) array([0.57027497, 0.63782727, 0.70034041, 0.75618799])
- __init__(alpha: float = 1.0, *, link: Literal['logit', 'log'] = 'logit', learning_rate: float = 1.0, approximator: PosteriorApproximator | None = None, sparse: bool = False, random_state: int | None | Generator = None) None#
- property cov_: Covariance | CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor#
Posterior covariance matrix (cached, lazily computed).
Returns a
scipy.stats.Covarianceobject (dense) or aSparseFactor(sparse) that wraps the Cholesky factorization of the covariance. Automatically invalidated when the model is updated viafit,partial_fit, ordecay.Warning
For dense models, this is an \(O(p^3)\) computation with \(O(p^2)\) memory. For high-dimensional problems, prefer
sparse=Trueor avoid accessing this property directly.
- decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) None#
Decay the posterior precision to increase uncertainty.
Scales the precision matrix by \(\gamma^n\), where \(\gamma\) is the decay rate and \(n\) is the number of rows in
X. This uniformly increases posterior variance while leaving the posterior mean unchanged, allowing the model to adapt to non-stationary environments.Has no effect if the model has not been fitted.
- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Used only for its number of rowsn_samples, which determines the exponent of the decay factor.decay_rate (
float, defaultNone) – Decay factor per sample. If None, uses the model’slearning_rate. Values less than 1 increase uncertainty; a value of 1 has no effect.
See also
partial_fitUpdate the model with new observations.
- fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#
Fit the model from scratch, resetting the prior.
Initializes the prior \(w \sim \mathcal{N}(0, \alpha^{-1} I)\) and computes the posterior using the configured approximation method. Any previously learned parameters are discarded.
- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Training data. Must be ascipy.sparse.csc_arraywhensparse=True.y (
array-likeofshape (n_samples,)) – Target values. Forlink='logit', values should be 0 or 1. Forlink='log', values should be non-negative counts.sample_weight (
array-likeofshape (n_samples,), defaultNone) – Individual weights for each sample. If None, all samples are given equal weight.
- Returns:
self – Fitted estimator.
- Return type:
See also
partial_fitIncremental update without resetting the prior.
- partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#
Incrementally update the model with new data.
Uses the current posterior as the prior for the new update. If the model has not been fitted yet, this is equivalent to calling
fit.- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Training data. Must be ascipy.sparse.csc_arraywhensparse=True.y (
array-likeofshape (n_samples,)) – Target values. Forlink='logit', values should be 0 or 1. Forlink='log', values should be non-negative counts.sample_weight (
array-likeofshape (n_samples,), defaultNone) – Individual weights for each sample. If None, all samples are given equal weight.
- Returns:
self – Updated estimator.
- Return type:
See also
fitFit from scratch, resetting the prior.
- predict(X: NDArray[Any] | csc_array) NDArray[Any]#
Predict mean of the response distribution for each sample.
Computes the inverse link applied to the linear predictor \(g^{-1}(X \hat{w})\), where \(\hat{w}\) is the posterior mean.
If the model has not been fitted, the prior mean (zero) is used, returning 0.5 for logit link and 1.0 for log link.
- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Samples to predict. Must be ascipy.sparse.csc_arraywhensparse=True.- Returns:
y_pred – Predicted values. For
link='logit', probabilities in [0, 1]. Forlink='log', expected counts (positive reals).- Return type:
ndarrayofshape (n_samples,)
See also
sampleDraw from the posterior predictive distribution.
- sample(X: NDArray[Any] | csc_array, size: int = 1) NDArray[np.float64]#
Sample from the posterior predictive distribution.
Draws weight vectors from the posterior \(w \sim \mathcal{N}(\hat{w}, \Lambda^{-1})\) and computes \(g^{-1}(X w)\) for each draw. This marginalizes over parameter uncertainty, producing samples from the posterior predictive distribution.
If the model has not been fitted, samples are drawn from the prior predictive distribution.
- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Input data. Must be ascipy.sparse.csc_arraywhensparse=True.size (
int, default1) – Number of posterior samples to draw.
- Returns:
samples – Predicted values for each posterior sample. For
link='logit', probabilities in [0, 1]. Forlink='log', expected counts (positive reals).- Return type:
ndarrayofshape (size,n_samples)
See also
predictPoint predictions using the posterior mean.