bayesianbandits.BayesianGLM#

class bayesianbandits.BayesianGLM(alpha: float = 1.0, *, link: Literal['logit', 'log'] = 'logit', learning_rate: float = 1.0, approximator: PosteriorApproximator | None = None, sparse: bool = False, random_state: int | None | Generator = None)#

Bases: BaseEstimator, RegressorMixin

Bayesian Generalized Linear Model with Laplace approximation.

Extends Bayesian linear regression to non-Gaussian likelihoods (binary and count data) using a configurable posterior approximation method. The default uses Laplace approximation via iteratively reweighted least squares (IRLS). Supports both dense and sparse feature matrices, online learning via partial_fit, and non-stationary environments via decay.

Parameters:
  • alpha (float, default 1.0) – Prior precision for the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Higher values give stronger regularization toward zero.

  • link ({'logit', 'log'}, default 'logit') –

    Link function relating the linear predictor to the mean of the response distribution:

    • 'logit': Inverse logit (sigmoid). Use for binary outcomes (Bernoulli likelihood).

    • 'log': Exponential. Use for count outcomes (Poisson likelihood).

  • learning_rate (float, default 1.0) – Decay rate for the posterior precision on each call to decay. Values less than 1 geometrically shrink the precision matrix, increasing posterior uncertainty over time. This converts the model into a forgetting (non-stationary) estimator suitable for restless bandit problems.

  • approximator (PosteriorApproximator, default LaplaceApproximator()) – Strategy object for approximating the posterior. The default LaplaceApproximator performs 5 IRLS iterations per update. For fast online updates, use LaplaceApproximator(n_iter=1). For batch convergence, increase n_iter and set a tol.

  • sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise falls back to SuperLU.

  • random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.

coef_#

Posterior mean of the weight vector (MAP estimate).

Type:

ndarray of shape (n_features,)

cov_inv_#

Posterior precision matrix (inverse covariance). Dense when sparse=False, sparse CSC when sparse=True.

Type:

ndarray of shape (n_features, n_features) or scipy.sparse.csc_array

n_features_#

Number of features seen during fit.

Type:

int

See also

NormalRegressor

Bayesian linear regression with known noise variance.

NormalInverseGammaRegressor

Bayesian linear regression with unknown noise variance.

LaplaceApproximator

The default posterior approximation strategy.

Notes

The model places a Gaussian prior on the weight vector:

\[w \sim \mathcal{N}(0, \alpha^{-1} I)\]

For the logit link (Bernoulli likelihood):

\[p(y=1 \mid x) = \sigma(x^T w)\]

For the log link (Poisson likelihood):

\[\mathbb{E}[y \mid x] = \exp(x^T w)\]

Since these likelihoods are not conjugate to the Gaussian prior, the posterior is approximated using the Laplace approximation. This finds the MAP estimate \(\hat{w}\) and approximates the posterior as:

\[p(w \mid \mathcal{D}) \approx \mathcal{N}(\hat{w}, \Lambda^{-1})\]

where \(\Lambda = \alpha I + X^T W X\) is the posterior precision and \(W\) is the diagonal matrix of IRLS weights. See Chapter 8 of [1] for details.

When learning_rate < 1, calling decay scales the precision matrix by \(\gamma^n\) where \(\gamma\) is the learning rate and \(n\) is the number of samples. This uniformly increases posterior uncertainty, allowing the model to adapt to non-stationary environments.

References

Examples

Binary classification with logistic regression:

>>> from bayesianbandits import BayesianGLM
>>> X = np.array([[1], [2], [3], [4]])
>>> y = np.array([0, 0, 1, 1])  # Binary outcomes
>>> model = BayesianGLM(alpha=1.0, link='logit')
>>> model.fit(X, y)
BayesianGLM()
>>> model.predict(X)  # Returns probabilities
array([0.56154064, 0.62124455, 0.67748791, 0.72902237])

Count regression with Poisson:

>>> y_counts = np.array([1, 2, 5, 8])  # Count data
>>> model = BayesianGLM(alpha=1.0, link='log')
>>> model.fit(X, y_counts)
BayesianGLM(link='log')
>>> model.predict(X)  # Returns expected counts
array([1.72636481, 2.98033545, 5.14514623, 8.88239939])

Poisson rate modeling with varying exposure (e.g., different observation periods). For Poisson with log link, fitting the rate y / exposure with sample_weight=exposure is mathematically equivalent to using an offset of log(exposure) in the linear predictor:

>>> exposure = np.array([1, 2, 5, 10])
>>> y_counts = np.array([2, 5, 12, 30])
>>> model = BayesianGLM(alpha=1.0, link='log')
>>> model.fit(X, y_counts / exposure, sample_weight=exposure)
BayesianGLM(link='log')
>>> model.predict(X) * exposure  # Scale predicted rates by exposure
array([ 1.32755877,  3.52482459, 11.69852952, 31.06097099])

Online learning with fast single-iteration updates:

>>> from bayesianbandits import LaplaceApproximator
>>> approximator = LaplaceApproximator(n_iter=1)  # Fast online updates
>>> model = BayesianGLM(alpha=1.0, link='logit', approximator=approximator)
>>> stream = [(np.array([[1]]), [0]), (np.array([[2]]), [0]),
...           (np.array([[3]]), [1]), (np.array([[4]]), [1])]
>>> for X_batch, y_batch in stream:
...     model = model.partial_fit(X_batch, y_batch)
>>> model.predict(np.array([[1], [2], [3], [4]]))
array([0.59667319, 0.68637901, 0.76402365, 0.82728258])

Batch convergence with tighter tolerance:

>>> approximator = LaplaceApproximator(n_iter=500, tol=1e-6)
>>> model = BayesianGLM(alpha=0.1, approximator=approximator)
>>> model = model.fit(X, y)  # Iterates until convergence
>>> model.predict(X)
array([0.57027497, 0.63782727, 0.70034041, 0.75618799])
__init__(alpha: float = 1.0, *, link: Literal['logit', 'log'] = 'logit', learning_rate: float = 1.0, approximator: PosteriorApproximator | None = None, sparse: bool = False, random_state: int | None | Generator = None) None#
property cov_: Covariance | CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor#

Posterior covariance matrix (cached, lazily computed).

Returns a scipy.stats.Covariance object (dense) or a SparseFactor (sparse) that wraps the Cholesky factorization of the covariance. Automatically invalidated when the model is updated via fit, partial_fit, or decay.

Warning

For dense models, this is an \(O(p^3)\) computation with \(O(p^2)\) memory. For high-dimensional problems, prefer sparse=True or avoid accessing this property directly.

decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) None#

Decay the posterior precision to increase uncertainty.

Scales the precision matrix by \(\gamma^n\), where \(\gamma\) is the decay rate and \(n\) is the number of rows in X. This uniformly increases posterior variance while leaving the posterior mean unchanged, allowing the model to adapt to non-stationary environments.

Has no effect if the model has not been fitted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.

  • decay_rate (float, default None) – Decay factor per sample. If None, uses the model’s learning_rate. Values less than 1 increase uncertainty; a value of 1 has no effect.

See also

partial_fit

Update the model with new observations.

fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#

Fit the model from scratch, resetting the prior.

Initializes the prior \(w \sim \mathcal{N}(0, \alpha^{-1} I)\) and computes the posterior using the configured approximation method. Any previously learned parameters are discarded.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.

  • y (array-like of shape (n_samples,)) – Target values. For link='logit', values should be 0 or 1. For link='log', values should be non-negative counts.

  • sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.

Returns:

self – Fitted estimator.

Return type:

BayesianGLM

See also

partial_fit

Incremental update without resetting the prior.

partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#

Incrementally update the model with new data.

Uses the current posterior as the prior for the new update. If the model has not been fitted yet, this is equivalent to calling fit.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.

  • y (array-like of shape (n_samples,)) – Target values. For link='logit', values should be 0 or 1. For link='log', values should be non-negative counts.

  • sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.

Returns:

self – Updated estimator.

Return type:

BayesianGLM

See also

fit

Fit from scratch, resetting the prior.

predict(X: NDArray[Any] | csc_array) NDArray[Any]#

Predict mean of the response distribution for each sample.

Computes the inverse link applied to the linear predictor \(g^{-1}(X \hat{w})\), where \(\hat{w}\) is the posterior mean.

If the model has not been fitted, the prior mean (zero) is used, returning 0.5 for logit link and 1.0 for log link.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict. Must be a scipy.sparse.csc_array when sparse=True.

Returns:

y_pred – Predicted values. For link='logit', probabilities in [0, 1]. For link='log', expected counts (positive reals).

Return type:

ndarray of shape (n_samples,)

See also

sample

Draw from the posterior predictive distribution.

sample(X: NDArray[Any] | csc_array, size: int = 1) NDArray[np.float64]#

Sample from the posterior predictive distribution.

Draws weight vectors from the posterior \(w \sim \mathcal{N}(\hat{w}, \Lambda^{-1})\) and computes \(g^{-1}(X w)\) for each draw. This marginalizes over parameter uncertainty, producing samples from the posterior predictive distribution.

If the model has not been fitted, samples are drawn from the prior predictive distribution.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.

  • size (int, default 1) – Number of posterior samples to draw.

Returns:

samples – Predicted values for each posterior sample. For link='logit', probabilities in [0, 1]. For link='log', expected counts (positive reals).

Return type:

ndarray of shape (size, n_samples)

See also

predict

Point predictions using the posterior mean.