bayesianbandits.NormalRegressor
- class bayesianbandits.NormalRegressor(alpha: float, beta: float, *, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)
Bases: BaseEstimator, RegressorMixin
Bayesian linear regression with known noise variance.
Places a Gaussian prior on the weight vector and performs exact conjugate updates. Supports both dense and sparse feature matrices, online learning via partial_fit, and non-stationary environments via decay.
- Parameters:
alpha (float) – Prior precision for the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Higher values give stronger regularization toward zero.
beta (float) – Known noise precision. The likelihood is \(y \mid x, w \sim \mathcal{N}(x^T w, \beta^{-1})\).
learning_rate (float, default 1.0) – Decay rate for the posterior precision on each call to decay. Values less than 1 geometrically shrink the precision matrix, increasing posterior uncertainty over time. This turns the model into a forgetting estimator suitable for restless bandit problems.
sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise the model falls back to UMFPACK (scikit-umfpack) or SuperLU.
random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.
- coef_
Posterior mean of the weight vector.
- Type:
ndarray of shape (n_features,)
- cov_inv_
Posterior precision matrix (inverse covariance).
- Type:
ndarray of shape (n_features, n_features) or scipy.sparse.csc_array
- n_features_
Number of features seen during fit.
- Type:
int
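With a zero prior mean and no decay, the posterior mean has the closed form \((\alpha I + \beta X^T X)^{-1} \beta X^T y\), which is the ridge-regression solution with penalty \(\alpha / \beta\); this is why a larger alpha regularizes harder toward zero. The NumPy sketch below checks the identity numerically. `posterior_mean` and `ridge` are hypothetical helpers written for illustration, not part of bayesianbandits.

```python
import numpy as np


def posterior_mean(X, y, alpha, beta):
    """Posterior mean of w for prior N(0, alpha^-1 I) and noise precision beta."""
    precision = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    return np.linalg.solve(precision, beta * X.T @ y)


def ridge(X, y, lam):
    """Ridge solution (X^T X + lam I)^-1 X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha, beta = 2.0, 4.0
w_bayes = posterior_mean(X, y, alpha, beta)
w_ridge = ridge(X, y, alpha / beta)
print(np.allclose(w_bayes, w_ridge))  # True

# A stronger prior (larger alpha) pulls the weights toward zero
w_strong = posterior_mean(X, y, 100.0, beta)
print(np.linalg.norm(w_strong) < np.linalg.norm(w_bayes))  # True
```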
See also
NormalInverseGammaRegressor: Bayesian linear regression with unknown noise variance (marginal posterior is a multivariate t).
EmpiricalBayesNormalRegressor: Automatic hyperparameter tuning via evidence maximization.
BayesianGLM: Bayesian GLM for non-Gaussian likelihoods.
Notes
This model implements the “known variance” Bayesian linear regression formulation described in Chapter 7 of [1]. The posterior is:
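\[\Lambda_n = \gamma^n \Lambda_0 + \beta X^T W X, \qquad \mu_n = \Lambda_n^{-1} (\gamma^n \Lambda_0 \mu_0 + \beta X^T W y)\]
where \(\Lambda_0\) and \(\mu_0\) are the prior precision and mean (\(\alpha I\) and \(0\) for fit, the current posterior for partial_fit), \(\gamma\) is the learning rate (1.0 for a standard Bayesian update), \(n\) is the number of observations in the update, and \(W\) is a diagonal matrix of effective sample weights incorporating both user-supplied weights and learning-rate decay.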
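The two update equations can be transcribed almost literally into dense NumPy. The sketch below does that under stated assumptions: `conjugate_update` is a hypothetical helper, the effective weights \(W\) are passed in explicitly, and it does not reproduce how bayesianbandits folds learning-rate decay into \(W\).

```python
import numpy as np


def conjugate_update(prior_precision, prior_mean, X, y, beta, gamma=1.0, weights=None):
    """One update following the Notes formula (hypothetical helper):

    Lambda_n = gamma**n * Lambda_0 + beta * X^T W X
    mu_n     = Lambda_n^{-1} (gamma**n * Lambda_0 @ mu_0 + beta * X^T W y)

    W = diag(weights) is taken as given (assumption: decay is not folded in here).
    """
    n = X.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    decayed = gamma**n * prior_precision
    XtW = X.T * w  # scales column i of X^T by w_i, i.e. X^T W
    precision = decayed + beta * XtW @ X
    mean = np.linalg.solve(precision, decayed @ prior_mean + beta * XtW @ y)
    return precision, mean


alpha, beta = 0.1, 1.0
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# First batch from the prior N(0, alpha^-1 I), no decay: mu_1 is about 55 / 55.1
Lambda_1, mu_1 = conjugate_update(alpha * np.eye(1), np.zeros(1), X, y, beta)

# Second batch: the previous posterior is the prior, shrunk by 0.9**5 first
Lambda_2, mu_2 = conjugate_update(Lambda_1, mu_1, X, y, beta, gamma=0.9)
```

Solving the linear system instead of forming \(\Lambda_n^{-1}\) explicitly is the usual choice for numerical stability.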
When learning_rate < 1, calling decay scales the precision matrix by \(\gamma^n\), where \(n\) is the number of rows passed to decay, uniformly increasing posterior uncertainty while preserving the mean.
References
Examples
Basic linear regression:
>>> import numpy as np
>>> from bayesianbandits import NormalRegressor
>>> X = np.array([[1], [2], [3], [4], [5]])
>>> y = np.array([1, 2, 3, 4, 5])
>>> model = NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.fit(X, y)
NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.predict(X)
array([0.99818512, 1.99637024, 2.99455535, 3.99274047, 4.99092559])
The posterior mean weights are stored in coef_:
>>> model.coef_
array([0.99818512])
Online learning with partial_fit:
>>> model.partial_fit(X, y)
NormalRegressor(alpha=0.1, beta=1, random_state=0)
>>> model.predict(X)
array([0.99909173, 1.99818347, 2.9972752 , 3.99636694, 4.99545867])
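With a single feature and a zero prior mean, the numbers above can be checked by hand. After fit, the posterior mean is \(\beta \sum x_i y_i / (\alpha + \beta \sum x_i^2) = 55 / 55.1\). partial_fit on the same five points (with the default learning_rate=1.0) uses that posterior as the prior, giving precision \(55.1 + 55 = 110.1\) and mean \(110 / 110.1\). A plain NumPy check, independent of the library:

```python
import numpy as np

alpha, beta = 0.1, 1.0
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x.copy()

# fit: prior precision alpha, prior mean 0
prec_fit = alpha + beta * np.sum(x * x)        # 55.1
coef_fit = beta * np.sum(x * y) / prec_fit     # 55 / 55.1

# partial_fit on the same data, no decay: the fitted posterior is the new prior
prec_pf = prec_fit + beta * np.sum(x * x)                          # 110.1
coef_pf = (prec_fit * coef_fit + beta * np.sum(x * y)) / prec_pf   # 110 / 110.1

print(round(coef_fit, 8), round(coef_pf, 8))  # 0.99818512 0.99909173
```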
Sampling from the posterior predictive:
>>> model.sample(X)
array([[1.0110742 , 2.02214839, 3.03322259, 4.04429678, 5.05537098]])
- __init__(alpha: float, beta: float, *, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None) → None
- property cov_: Covariance | CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor
Posterior covariance matrix (cached, lazily computed).
Returns a scipy.stats.Covariance object (dense) or a SparseFactor (sparse) wrapping the Cholesky factorization. The cache is automatically invalidated when the model is updated via fit, partial_fit, or decay.
Warning
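For dense models, computing this is an \(O(p^3)\) operation with \(O(p^2)\) memory, where \(p\) is the number of features. Because the result is cached, the cost is paid on the first access after each update.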
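The cubic cost comes from factorizing and inverting the \(p \times p\) precision matrix. The NumPy sketch below shows one standard route through the Cholesky factor; it illustrates where the cost comes from and is not necessarily how cov_ is computed internally (`covariance_from_precision` is a hypothetical helper).

```python
import numpy as np


def covariance_from_precision(precision):
    """Invert an SPD precision matrix through its Cholesky factor.

    precision = L @ L.T  implies  covariance = inv(L).T @ inv(L).
    The factorization and the inverse of L each cost O(p^3) for p features.
    """
    L = np.linalg.cholesky(precision)               # lower triangular factor
    L_inv = np.linalg.solve(L, np.eye(L.shape[0]))  # generic solve; a triangular solver is cheaper
    return L_inv.T @ L_inv


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
alpha, beta = 1.0, 2.0
precision = alpha * np.eye(5) + beta * X.T @ X  # dense analogue of cov_inv_
cov = covariance_from_precision(precision)
print(np.allclose(cov @ precision, np.eye(5)))  # True
```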
- decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) → None
Decay the posterior precision to increase uncertainty.
Scales the precision matrix by \(\gamma^n\), where \(\gamma\) is the decay rate and \(n\) is the number of rows in X. The posterior mean is unchanged.
Has no effect if the model has not been fitted.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.
decay_rate (float, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.
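Because the precision is multiplied by a scalar, every entry of the covariance grows by the same factor \(\gamma^{-n}\) while the mean stays where it was. A NumPy sketch of that arithmetic; `decay_precision` is a hypothetical helper mirroring the description above, not the library's code.

```python
import numpy as np


def decay_precision(precision, n_rows, gamma):
    """Return gamma**n_rows * precision; the posterior mean is not touched."""
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    return gamma**n_rows * precision


precision = np.array([[4.0, 1.0], [1.0, 3.0]])

# e.g. decaying with an X of 3 rows and decay_rate = 0.5
decayed = decay_precision(precision, n_rows=3, gamma=0.5)

# The covariance is inflated entry-wise by 0.5**-3 = 8
ratio = np.linalg.inv(decayed) / np.linalg.inv(precision)
print(np.allclose(ratio, 8.0))  # True
```

With gamma = 1.0 the helper returns the precision unchanged, which matches the default learning_rate.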
See also
partial_fit: Update the model with new observations.
- fit(X_fit: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) → Self
Fit the model from scratch, resetting the prior.
Initializes the prior \(w \sim \mathcal{N}(0, \alpha^{-1} I)\) and computes the exact posterior. Any previously learned parameters are discarded.
- Parameters:
X_fit (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.
y (array-like of shape (n_samples,)) – Target values.
sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.
- Returns:
self – Fitted estimator.
- Return type:
Self
See also
partial_fit: Incremental update without resetting the prior.
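In the Notes formula, sample_weight enters only through \(W\). Assuming the default learning_rate=1.0 (so no decay is folded into \(W\)), an integer weight of \(k\) should then act like repeating that row \(k\) times. The NumPy sketch below checks this with a hypothetical `weighted_posterior` helper; it reproduces the formula, not the library's internals.

```python
import numpy as np


def weighted_posterior(X, y, alpha, beta, sample_weight=None):
    """Posterior (precision, mean) from a zero-mean prior with W = diag(sample_weight)."""
    n, p = X.shape
    w = np.ones(n) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    XtW = X.T * w  # X^T W
    precision = alpha * np.eye(p) + beta * XtW @ X
    mean = np.linalg.solve(precision, beta * XtW @ y)
    return precision, mean


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = rng.normal(size=6)
counts = np.array([1, 3, 1, 2, 1, 1])

prec_w, mean_w = weighted_posterior(X, y, alpha=1.0, beta=2.0, sample_weight=counts)

# Same data with each row physically repeated counts[i] times
prec_r, mean_r = weighted_posterior(
    np.repeat(X, counts, axis=0), np.repeat(y, counts), alpha=1.0, beta=2.0
)
print(np.allclose(prec_w, prec_r), np.allclose(mean_w, mean_r))  # True True
```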
- partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) → Self
Incrementally update the posterior with new data.
Uses the current posterior as the prior for the new update, decayed by learning_rate. If the model has not been fitted, this is equivalent to calling fit.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Training data. Must be a scipy.sparse.csc_array when sparse=True.
y (array-like of shape (n_samples,)) – Target values.
sample_weight (array-like of shape (n_samples,), default None) – Individual weights for each sample. If None, all samples are given equal weight.
- Returns:
self – Updated estimator.
- Return type:
Self
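With the default learning_rate=1.0 no decay is applied, so feeding data in chunks through repeated conjugate updates yields the same posterior as a single update on all rows. The NumPy sketch below demonstrates that property for the no-decay case only; `update` is a hypothetical helper, not a library function.

```python
import numpy as np


def update(precision, mean, X, y, beta):
    """Conjugate update that uses the current posterior (precision, mean) as the prior."""
    new_precision = precision + beta * X.T @ X
    new_mean = np.linalg.solve(new_precision, precision @ mean + beta * X.T @ y)
    return new_precision, new_mean


alpha, beta = 0.5, 2.0
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
y = rng.normal(size=30)
prior = (alpha * np.eye(4), np.zeros(4))

# One update on all 30 rows (like fit)
batch = update(*prior, X, y, beta)

# Three sequential updates on chunks of 10 rows (like repeated partial_fit)
state = prior
for rows in np.split(np.arange(30), 3):
    state = update(*state, X[rows], y[rows], beta)

print(np.allclose(batch[0], state[0]), np.allclose(batch[1], state[1]))  # True True
```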
- predict(X: NDArray[Any] | csc_array) → NDArray[Any]
Predict target values using the posterior mean.
Computes \(X \hat{w}\) where \(\hat{w}\) is the posterior mean of the weight vector.
If the model has not been fitted, the prior mean (zero) is used, returning all zeros.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.
- Returns:
y_pred – Predicted target values.
- Return type:
ndarray of shape (n_samples,)
See also
sample: Draw from the posterior predictive distribution.
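predict returns only the mean. When predictive uncertainty is also wanted in the dense case, it can be computed from coef_ and cov_inv_ with the standard known-variance result: the predictive variance at \(x\) is \(x^T \Lambda^{-1} x + \beta^{-1}\). The helper below is a hypothetical NumPy sketch, not a NormalRegressor method.

```python
import numpy as np


def predictive_moments(X, mean, precision, beta, include_noise=True):
    """Mean and variance of the posterior predictive at each row of X (hypothetical helper).

    mean:     X @ mean                  (what predict returns)
    variance: x_i^T precision^-1 x_i    (parameter uncertainty)
              + 1 / beta                (observation noise, optional)
    """
    mu = X @ mean
    Z = np.linalg.solve(precision, X.T)  # precision^-1 X^T without an explicit inverse
    var = np.einsum("ij,ji->i", X, Z)    # diagonal of X precision^-1 X^T
    if include_noise:
        var = var + 1.0 / beta
    return mu, var


# Posterior from the Examples section after fit (alpha=0.1, beta=1)
X = np.array([[1.0], [2.0], [3.0]])
mu, var = predictive_moments(X, np.array([55 / 55.1]), np.array([[55.1]]), beta=1.0)
```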
- sample(X: NDArray[Any] | csc_array, size: int = 1) → NDArray[np.float64]
Sample from the posterior predictive distribution.
Draws weight vectors from the posterior \(w \sim \mathcal{N}(\hat{w}, \Lambda^{-1})\) and computes \(X w\) for each draw. This marginalizes over parameter uncertainty but not observation noise.
If the model has not been fitted, samples are drawn from the prior predictive distribution.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.
size (int, default 1) – Number of posterior samples to draw.
- Returns:
samples – Predicted values for each posterior draw.
- Return type:
ndarray of shape (size, n_samples)
See also
predict: Point predictions using the posterior mean.
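One common way to draw \(w \sim \mathcal{N}(\hat{w}, \Lambda^{-1})\) without forming the covariance is to factor the precision \(\Lambda = L L^T\) and solve \(L^T v = z\) for standard normal \(z\). The dense NumPy sketch below uses that approach; `sample_predictions` is hypothetical and may differ from the library's implementation. Like sample, it omits observation noise.

```python
import numpy as np


def sample_predictions(X, mean, precision, size=1, random_state=None):
    """Draw X @ w for w ~ N(mean, precision^-1); returns shape (size, n_samples).

    With precision = L @ L.T, v = solve(L.T, z) for z ~ N(0, I) has covariance
    inv(L.T) @ inv(L) = inv(precision), so no explicit covariance is formed.
    """
    rng = np.random.default_rng(random_state)
    L = np.linalg.cholesky(precision)
    z = rng.standard_normal((mean.shape[0], size))
    w = mean[:, None] + np.linalg.solve(L.T, z)  # shape (n_features, size)
    return (X @ w).T


# Posterior from the Examples section after fit: precision 55.1, mean 55 / 55.1
X = np.array([[1.0], [2.0], [3.0]])
mean = np.array([55 / 55.1])
precision = np.array([[55.1]])

draws = sample_predictions(X, mean, precision, size=100_000, random_state=0)
print(draws.shape)  # (100000, 3)
```

Adding `rng.normal(scale=beta**-0.5, size=draws.shape)` to the draws would turn these into full predictive samples that include observation noise.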