bayesianbandits.NormalRegressor

class bayesianbandits.NormalRegressor(alpha: float, beta: float, *, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)

Bases: BaseEstimator, RegressorMixin

Bayesian linear regression with known noise variance.

Places a Gaussian prior on the weight vector and performs exact conjugate updates. Supports both dense and sparse feature matrices, online learning via partial_fit, and non-stationary environments via decay.

Parameters:
  • alpha (float) – Prior precision for the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Higher values give stronger regularization toward zero.

  • beta (float) – Known noise precision. The likelihood is \(y \mid x, w \sim \mathcal{N}(x^T w, \beta^{-1})\).

  • learning_rate (float, default 1.0) – Decay rate for the posterior precision on each call to decay. Values less than 1 geometrically shrink the precision matrix, increasing posterior uncertainty over time. This converts the model into a forgetting estimator suitable for restless bandit problems.

  • sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise the model falls back to UMFPACK (scikit-umfpack) or SuperLU.

  • random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.

coef_

Posterior mean of the weight vector.

Type:

ndarray of shape (n_features,)

cov_inv_

Posterior precision matrix (inverse covariance).

Type:

ndarray of shape (n_features, n_features) or scipy.sparse.csc_array

n_features_

Number of features seen during fit.

Type:

int

See also

NormalInverseGammaRegressor

Bayesian linear regression with unknown noise variance (marginal posterior is a multivariate t).

EmpiricalBayesNormalRegressor

Automatic hyperparameter tuning via evidence maximization.

BayesianGLM

Bayesian GLM for non-Gaussian likelihoods.

Notes

This model implements the “known variance” Bayesian linear regression formulation described in Chapter 7 of [1]. The posterior is:

\[\Lambda_n = \gamma^n \Lambda_0 + \beta X^T W X, \qquad \mu_n = \Lambda_n^{-1} (\gamma^n \Lambda_0 \mu_0 + \beta X^T W y)\]

where \(\gamma\) is the learning rate (1.0 for standard Bayesian update) and \(W\) is a diagonal matrix of effective sample weights incorporating both user-supplied weights and learning-rate decay.

When learning_rate < 1, calling decay scales the precision matrix by \(\gamma^n\), where \(n\) is the number of rows passed to decay. This uniformly increases posterior uncertainty while preserving the mean.
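
The posterior formula above can be sketched in NumPy for the unweighted, no-decay case (\(W = I\), \(\gamma = 1\), \(\mu_0 = 0\), \(\Lambda_0 = \alpha I\)). The helper name posterior_update is hypothetical and is not part of bayesianbandits; the data and hyperparameters mirror the basic example on this page.

```python
import numpy as np

def posterior_update(X, y, alpha, beta):
    """Conjugate update from the prior N(0, alpha^{-1} I) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Lambda_n = Lambda_0 + beta * X^T X, with Lambda_0 = alpha * I
    precision = alpha * np.eye(n_features) + beta * (X.T @ X)
    # mu_n = Lambda_n^{-1} (beta * X^T y), since the prior mean is zero
    mean = np.linalg.solve(precision, beta * (X.T @ y))
    return mean, precision

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])
mean, precision = posterior_update(X, y, alpha=0.1, beta=1.0)
print(mean)       # approximately [0.99818512]
print(precision)  # approximately [[55.1]]
```

With a single feature the update reduces to scalars: the precision is 0.1 + 55 = 55.1 and the mean is 55 / 55.1, which agrees with the coef_ value shown in the Examples.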

References

Examples

Basic linear regression:

>>> import numpy as np
>>> X = np.array([[1], [2], [3], [4], [5]])
>>> y = np.array([1, 2, 3, 4, 5])
>>> model = NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.fit(X, y)
NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.predict(X)
array([0.99818512, 1.99637024, 2.99455535, 3.99274047, 4.99092559])

The posterior mean weights are stored in coef_:

>>> model.coef_
array([0.99818512])

Online learning with partial_fit:

>>> model.partial_fit(X, y)
NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.predict(X)
array([0.99909173, 1.99818347, 2.9972752 , 3.99636694, 4.99545867])

Sampling from the posterior predictive:

>>> model.sample(X)
array([[1.0110742 , 2.02214839, 3.03322259, 4.04429678, 5.05537098]])

__init__(alpha: float, beta: float, *, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None) -> None

property cov_: Covariance | CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor

Posterior covariance matrix (cached, lazily computed).

Returns a scipy.stats.Covariance object (dense) or a SparseFactor (sparse) wrapping the Cholesky factorization. Automatically invalidated when the model is updated via fit, partial_fit, or decay.

Warning

For dense models, this is an \(O(p^3)\) computation with \(O(p^2)\) memory.
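
To illustrate where the dense cost comes from, the following sketch recovers a covariance matrix from a precision matrix through a Cholesky factorization. It uses plain NumPy with a made-up 2x2 precision matrix and is not the library's internal code, which wraps the factorization in a scipy.stats.Covariance object.

```python
import numpy as np

# Hypothetical 2-feature posterior precision (symmetric positive definite).
precision = np.array([[4.0, 1.0],
                      [1.0, 3.0]])

# Cholesky factorization: precision = L @ L.T. This is the O(p^3) step.
L = np.linalg.cholesky(precision)

# Covariance = precision^{-1} = L^{-T} @ L^{-1}; storing it takes O(p^2) memory.
L_inv = np.linalg.solve(L, np.eye(precision.shape[0]))
cov = L_inv.T @ L_inv

print(np.allclose(cov @ precision, np.eye(2)))  # True
```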

decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) -> None

Decay the posterior precision to increase uncertainty.

Scales the precision matrix by \(\gamma^n\), where \(\gamma\) is the decay rate and \(n\) is the number of rows in X. The posterior mean is unchanged.

Has no effect if the model has not been fitted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.

  • decay_rate (float, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.

See also

partial_fit

Update the model with new observations.
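
A minimal sketch of the decay step as described above, assuming a dense precision matrix. The helper decay_precision is hypothetical; the numbers reuse the one-feature posterior from the class-level example.

```python
import numpy as np

def decay_precision(precision, n_rows, decay_rate):
    """Scale the precision by decay_rate ** n_rows (hypothetical helper)."""
    return (decay_rate ** n_rows) * precision

mean = np.array([0.99818512])    # posterior mean; decay leaves it untouched
precision = np.array([[55.1]])   # posterior precision before decay

X = np.zeros((3, 1))             # only the row count (3) is used
decayed = decay_precision(precision, X.shape[0], decay_rate=0.9)

print(decayed)                                       # approximately [[40.1679]]
print(1.0 / decayed[0, 0] > 1.0 / precision[0, 0])   # True: the variance grew
```

Because only the precision is rescaled, point predictions stay the same while posterior draws become more spread out.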

fit(X_fit: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) -> Self

Fit the model from scratch, resetting the prior.

Initializes the prior \(w \sim \mathcal{N}(0, \alpha^{-1} I)\) and computes the exact posterior. Any previously learned parameters are discarded.

Parameters:
  • X_fit (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.

Returns:

self – Fitted estimator.

Return type:

NormalRegressor

See also

partial_fit

Incremental update without resetting the prior.
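
The role of sample_weight can be sketched from the posterior formula in the class Notes, where the weights enter as the diagonal matrix \(W\). Under that formula and with no decay, an integer weight of 2 on a row is equivalent to repeating the row. The helper fit_posterior and the toy data are hypothetical.

```python
import numpy as np

def fit_posterior(X, y, alpha, beta, sample_weight=None):
    """Weighted posterior from the prior N(0, alpha^{-1} I) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if sample_weight is None:
        w = np.ones(len(y))
    else:
        w = np.asarray(sample_weight, dtype=float)
    # (X.T * w) scales column i of X.T by w[i], i.e. it computes X^T W
    precision = alpha * np.eye(X.shape[1]) + beta * (X.T * w) @ X
    mean = np.linalg.solve(precision, beta * (X.T * w) @ y)
    return mean, precision

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.5])

# Weight 2 on the last row ...
mean_w, prec_w = fit_posterior(X, y, alpha=0.1, beta=1.0, sample_weight=[1, 1, 2])

# ... gives the same posterior as repeating that row.
X_rep = np.vstack([X, X[-1:]])
y_rep = np.append(y, y[-1])
mean_r, prec_r = fit_posterior(X_rep, y_rep, alpha=0.1, beta=1.0)

print(np.allclose(mean_w, mean_r), np.allclose(prec_w, prec_r))  # True True
```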

partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) -> Self

Incrementally update the posterior with new data.

Uses the current posterior as the prior for the new update, decayed by learning_rate. If the model has not been fitted, this is equivalent to calling fit.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.

Returns:

self – Updated estimator.

Return type:

NormalRegressor

See also

fit

Fit from scratch, resetting the prior.

decay

Increase uncertainty without observing new data.
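
The incremental update can be sketched for the no-decay case (learning_rate = 1.0), where the current posterior simply becomes the prior for the next batch. The helper partial_update is hypothetical, and the sketch deliberately omits learning-rate decay and sample weights.

```python
import numpy as np

def partial_update(mean, precision, X, y, beta):
    """One conjugate update that uses (mean, precision) as the prior (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    new_precision = precision + beta * (X.T @ X)
    new_mean = np.linalg.solve(new_precision, precision @ mean + beta * (X.T @ y))
    return new_mean, new_precision

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])

# Start from the prior N(0, alpha^{-1} I) with alpha = 0.1.
mean0, prec0 = np.zeros(1), 0.1 * np.eye(1)
mean1, prec1 = partial_update(mean0, prec0, X, y, beta=1.0)   # first batch
mean2, prec2 = partial_update(mean1, prec1, X, y, beta=1.0)   # second batch

print(mean1)  # approximately [0.99818512]
print(mean2)  # approximately [0.99909173]

# Without decay, two sequential updates match one update on the stacked data.
mean_b, prec_b = partial_update(mean0, prec0, np.vstack([X, X]), np.tile(y, 2), beta=1.0)
print(np.allclose(mean2, mean_b), np.allclose(prec2, prec_b))  # True True
```

The two means agree with the predictions in the class-level Examples before and after the partial_fit call.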

predict(X: NDArray[Any] | csc_array) -> NDArray[Any]

Predict target values using the posterior mean.

Computes \(X \hat{w}\) where \(\hat{w}\) is the posterior mean of the weight vector.

If the model has not been fitted, the prior mean (zero) is used, returning all zeros.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.

Returns:

y_pred – Predicted target values.

Return type:

ndarray of shape (n_samples,)

See also

sample

Draw from the posterior predictive distribution.

sample(X: NDArray[Any] | csc_array, size: int = 1) -> NDArray[np.float64]

Sample from the posterior predictive distribution.

Draws weight vectors from the posterior \(w \sim \mathcal{N}(\hat{w}, \Lambda^{-1})\) and computes \(X w\) for each draw. This marginalizes over parameter uncertainty but not observation noise.

If the model has not been fitted, samples are drawn from the prior predictive distribution.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.

  • size (int, default 1) – Number of posterior samples to draw.

Returns:

samples – Predicted values for each posterior draw.

Return type:

ndarray of shape (size, n_samples)

See also

predict

Point predictions using the posterior mean.
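
The sampling procedure can be sketched for a dense model as follows. The helper sample_predictions is hypothetical, and the Cholesky-based draw is one standard way to sample from \(\mathcal{N}(\hat{w}, \Lambda^{-1})\), not necessarily the route the library takes. No observation noise is added, in line with the description above.

```python
import numpy as np

def sample_predictions(mean, precision, X, size=1, rng=None):
    """Draw weights from N(mean, precision^{-1}) and return X @ w per draw (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    # precision = L @ L.T, so mean + L^{-T} z has covariance precision^{-1}
    L = np.linalg.cholesky(precision)
    z = rng.standard_normal((mean.shape[0], size))
    weights = mean[:, None] + np.linalg.solve(L.T, z)   # shape (n_features, size)
    return (X @ weights).T                               # shape (size, n_samples)

mean = np.array([0.99909173])     # one-feature posterior from the class example
precision = np.array([[110.1]])
X = np.array([[1], [2], [3], [4], [5]])

draws = sample_predictions(mean, precision, X, size=3, rng=0)
print(draws.shape)  # (3, 5)
```

With a single feature, every row of draws is one sampled weight multiplied by the column of X, which is why the sample output in the class-level Examples is an exact multiple of [1, 2, 3, 4, 5].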