bayesianbandits.NormalInverseGammaRegressor
- class bayesianbandits.NormalInverseGammaRegressor(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)
Bases: NormalRegressor
Bayesian linear regression with unknown noise variance.
Extends NormalRegressor by placing a conjugate Normal-Inverse-Gamma (NIG) prior on the weights and noise variance jointly. Because the noise variance is integrated out analytically, the marginal posterior over the weights is a multivariate t distribution, producing heavier-tailed and more robust uncertainty estimates than the known-variance model.
- Parameters:
mu (float or array-like of shape (n_features,), default 0.0) – Prior mean of the weights. A scalar is broadcast to all features.
lam (float, array-like of shape (n_features,), or array-like of shape (n_features, n_features), default 1.0) – Prior precision (inverse covariance) of the weights. A scalar gives \(\lambda I\); a vector gives \(\text{diag}(\lambda)\); a matrix is used directly.
a (float, default 0.1) – Prior shape parameter of the Inverse-Gamma distribution on the noise variance \(\sigma^2\). The prior is \(\sigma^2 \sim \text{IG}(a, b)\).
b (float, default 0.1) – Prior rate parameter of the Inverse-Gamma distribution. The prior mean of the noise variance is \(b / (a - 1)\) for \(a > 1\).
learning_rate (float, default 1.0) – Decay rate for sequential updates. Values less than 1 geometrically shrink the precision and Inverse-Gamma parameters on each call to decay, enabling adaptation to non-stationary environments.
sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise falls back to UMFPACK or SuperLU.
random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.
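The three accepted forms of lam can be illustrated with a small NumPy sketch. The helper name expand_precision is hypothetical and is not part of bayesianbandits; it only mirrors the scalar, vector, and matrix cases described above for the dense case.

```python
import numpy as np

def expand_precision(lam, n_features):
    """Hypothetical helper (not a library function): build a dense prior precision."""
    lam = np.asarray(lam, dtype=float)
    if lam.ndim == 0:
        # Scalar: lambda * I
        return lam * np.eye(n_features)
    if lam.ndim == 1:
        # Vector: diag(lambda)
        return np.diag(lam)
    # Matrix: used directly
    return lam

scalar_case = expand_precision(2.0, 2)
vector_case = expand_precision([1.0, 3.0], 2)
matrix_case = expand_precision([[2.0, 0.5], [0.5, 1.0]], 2)
```

Note that with the default a = 0.1 the prior mean \(b / (a - 1)\) of the noise variance does not exist, since it requires \(a > 1\).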
- coef_
Posterior mean of the weight vector.
- Type:
ndarray of shape (n_features,)
- cov_inv_
Posterior precision matrix of the weights (conditioned on \(\sigma^2\)).
- Type:
ndarray of shape (n_features, n_features) or scipy.sparse.csc_array
- a_
Posterior shape parameter of the Inverse-Gamma distribution.
- Type:
float
- b_
Posterior rate parameter of the Inverse-Gamma distribution.
- Type:
float
- n_features_
Number of features seen during fit.
- Type:
int
See also
NormalRegressor – Known-variance variant (Gaussian posterior on weights).
EmpiricalBayesNormalRegressor – Known-variance with empirical Bayes tuning of alpha and beta.
BayesianGLM – Bayesian GLM for non-Gaussian likelihoods.
Notes
This model implements the “unknown variance” Bayesian linear regression formulation described in Chapter 7 of [1]. The joint prior is:
\[w \mid \sigma^2 \sim \mathcal{N}(\mu_0,\; \sigma^2 \Lambda_0^{-1}), \qquad \sigma^2 \sim \text{IG}(a_0, b_0)\]
After observing data \((X, y)\), where \(N\) is the number of rows of \(X\), the posterior parameters are updated as:
\[\begin{split}\Lambda_n &= \Lambda_0 + X^T X \\ \mu_n &= \Lambda_n^{-1}(\Lambda_0 \mu_0 + X^T y) \\ a_n &= a_0 + \tfrac{N}{2} \\ b_n &= b_0 + \tfrac{1}{2}(y^T y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n)\end{split}\]
The marginal posterior of the weights (integrating out \(\sigma^2\)) is a multivariate t distribution with \(2 a_n\) degrees of freedom, location \(\mu_n\), and shape \((b_n / a_n) \Lambda_n^{-1}\).
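The update equations above can be sketched directly in NumPy. This is a minimal dense illustration under stated assumptions, not the library's implementation: the function name nig_update and the synthetic data are made up for the example. Because the prior is conjugate, applying the update one row at a time should give the same posterior as a single batch update.

```python
import numpy as np

def nig_update(mu0, lam0, a0, b0, X, y):
    """Illustrative NIG posterior update (hypothetical helper, dense only)."""
    lam_n = lam0 + X.T @ X
    # Solve Lambda_n mu_n = Lambda_0 mu_0 + X^T y instead of forming the inverse
    mu_n = np.linalg.solve(lam_n, lam0 @ mu0 + X.T @ y)
    a_n = a0 + X.shape[0] / 2
    b_n = b0 + 0.5 * (y @ y + mu0 @ lam0 @ mu0 - mu_n @ lam_n @ mu_n)
    return mu_n, lam_n, a_n, b_n

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50)

prior = (np.zeros(2), np.eye(2), 0.1, 0.1)

# One batch update over all rows
batch = nig_update(*prior, X, y)

# The same data fed one observation at a time
state = prior
for x_i, y_i in zip(X, y):
    state = nig_update(*state, x_i[None, :], np.array([y_i]))
```

Comparing state with batch (for example with np.allclose) is a quick way to confirm that the sequential and batch forms agree up to floating-point error.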
References
Examples
Batch fitting:
>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from bayesianbandits import NormalInverseGammaRegressor
>>> X, y, coef = make_regression(n_samples=30, n_features=2,
...                              coef=True, random_state=1)
>>> coef
array([34.8898342, 75.0942434])
>>> est = NormalInverseGammaRegressor()
>>> est.fit(X, y)
NormalInverseGammaRegressor()
>>> est.coef_
array([32.89089478, 71.16073032])
Online learning with partial_fit:
>>> est = NormalInverseGammaRegressor(random_state=1)
>>> for x_, y_ in zip(X, y):
...     est = est.partial_fit(x_.reshape(1, -1), np.array([y_]))
>>> est.coef_
array([32.89089478, 71.16073032])
Sampling from the marginal posterior predictive (multivariate t):
>>> est.sample(X[[0]], size=5)
array([[15.01030526],
       [14.64281737],
       [15.21457505],
       [14.1703107 ],
       [14.57089036]])
- __init__(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)
- decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) → None
Decay precision and variance parameters to increase uncertainty.
Applies exponential forgetting to the precision matrix and the Inverse-Gamma parameters:
\[\Lambda \leftarrow \gamma^n \Lambda, \quad a \leftarrow \gamma^n a, \quad b \leftarrow \gamma^n b\]
The posterior mean is unchanged, but the marginal t distribution widens (fewer degrees of freedom and higher scale), reflecting greater uncertainty.
Has no effect if the model has not been fitted.
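The effect of the decay rule on the marginal t distribution can be checked numerically. The sketch below uses a hypothetical helper, decay_state, and made-up values for the fitted state; it is not the library's code. Since a and b are scaled by the same factor, the ratio b / a is unchanged, so the shape matrix \((b / a) \Lambda^{-1}\) grows by \(1 / \gamma^n\) while the degrees of freedom \(2a\) shrink by \(\gamma^n\).

```python
import numpy as np

def decay_state(lam, a, b, n_rows, gamma):
    """Hypothetical helper: apply gamma**n_rows to the precision and IG parameters."""
    factor = gamma ** n_rows
    return factor * lam, factor * a, factor * b

# Made-up fitted state, for illustration only
lam = np.array([[4.0, 1.0], [1.0, 3.0]])
a, b = 5.0, 2.0

lam_d, a_d, b_d = decay_state(lam, a, b, n_rows=3, gamma=0.9)

# Shape matrix of the marginal t distribution before and after decay
shape_before = (b / a) * np.linalg.inv(lam)
shape_after = (b_d / a_d) * np.linalg.inv(lam_d)

# Degrees of freedom before and after decay
dof_before, dof_after = 2 * a, 2 * a_d
```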
- Parameters:
X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.
decay_rate (float, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.
See also
partial_fit – Update the model with new observations.
- sample(X: NDArray[Any] | csc_array, size: int = 1) → NDArray[np.float64]
Sample predicted values from the marginal posterior predictive.
Draws weight vectors from the marginal posterior, which is a multivariate t distribution with \(2 a_n\) degrees of freedom, location \(\mu_n\), and shape \((b_n / a_n) \Lambda_n^{-1}\). The noise variance is integrated out analytically, producing heavier tails than the Gaussian posterior of NormalRegressor.
If the model has not been fitted, samples are drawn from the prior predictive distribution.
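One standard way to draw from this multivariate t distribution is as a scale mixture: first draw \(\sigma^2 \sim \text{IG}(a_n, b_n)\), then draw \(w \mid \sigma^2 \sim \mathcal{N}(\mu_n, \sigma^2 \Lambda_n^{-1})\). The sketch below illustrates that construction with NumPy. The function name sample_predictions and all numeric values are assumptions for the example, and the library may generate its draws differently (for instance, via a Cholesky factor of the shape precision). The sketch only maps weight draws through X.

```python
import numpy as np

def sample_predictions(X, mu, lam, a, b, size, rng):
    """Hypothetical sketch: t-distributed weights via a Normal / Inverse-Gamma mixture."""
    # 1 / sigma^2 ~ Gamma(shape=a, rate=b) is equivalent to sigma^2 ~ IG(a, b)
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=size)
    cov = np.linalg.inv(lam)
    # Standardised draws with covariance Lambda^{-1}, shape (size, n_features)
    z = rng.multivariate_normal(np.zeros(len(mu)), cov, size=size)
    # Scale each draw by its own sigma, then shift to the posterior mean
    w = mu + np.sqrt(sigma2)[:, None] * z
    # Predictions for each draw, shape (size, n_samples)
    return w @ X.T

# Made-up posterior state and inputs, for illustration only
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
lam = 4.0 * np.eye(2)
a, b = 5.0, 2.0
X = np.array([[1.0, 0.0], [1.0, 1.0]])

draws = sample_predictions(X, mu, lam, a, b, size=20000, rng=rng)
```

The result has one row per posterior draw and one column per row of X, matching the (size, n_samples) return shape documented for sample.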
- Parameters:
X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.
size (int, default 1) – Number of posterior samples to draw.
- Returns:
samples – Predicted values for each posterior draw.
- Return type:
ndarray of shape (size, n_samples)
See also
predict – Point predictions using the posterior mean.
- property shape_: CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor | DenseFactor
Precision of the shape matrix for the multivariate t posterior.
The shape covariance is (b/a)·Λ⁻¹, so the shape precision is (a/b)·Λ. For dense models this is represented as a DenseFactor with U_scaled = √(a/b)·U (zero extra factorizations); for sparse models it wraps the existing SparseFactor via scale_factor.
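The reason no extra factorization is needed can be seen in a short NumPy sketch. It assumes U is an upper-triangular Cholesky factor with Λ = UᵀU, which is not stated explicitly above, and it uses plain arrays rather than the library's DenseFactor class. Scaling U by √(a/b) yields a factor of (a/b)·Λ directly.

```python
import numpy as np

# Made-up precision matrix and Inverse-Gamma parameters, for illustration only
lam = np.array([[4.0, 1.0], [1.0, 3.0]])
a, b = 5.0, 2.0

# Upper-triangular Cholesky factor (assumed convention): lam == U.T @ U
U = np.linalg.cholesky(lam).T

# Scaling the existing factor gives a factor of the shape precision (a/b) * lam
U_scaled = np.sqrt(a / b) * U
shape_precision = U_scaled.T @ U_scaled

# Its inverse is the shape matrix (b/a) * lam^{-1} of the multivariate t
shape_cov = np.linalg.inv(shape_precision)
```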