bayesianbandits.NormalInverseGammaRegressor

class bayesianbandits.NormalInverseGammaRegressor(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)

Bases: NormalRegressor

Bayesian linear regression with unknown noise variance.

Extends NormalRegressor by placing a conjugate Normal-Inverse-Gamma (NIG) prior on the weights and noise variance jointly. Because the noise variance is integrated out analytically, the marginal posterior over the weights is a multivariate t distribution, producing heavier-tailed and more robust uncertainty estimates than the known-variance model.

Parameters:

mu (float or array-like of shape (n_features,), default 0.0) – Prior mean of the weights. A scalar is broadcast to all features.
lam (float, array-like of shape (n_features,), or array-like of shape (n_features, n_features), default 1.0) – Prior precision (inverse covariance) of the weights. A scalar gives \(\lambda I\); a vector gives \(\text{diag}(\lambda)\); a matrix is used directly.
a (float, default 0.1) – Prior shape parameter of the Inverse-Gamma distribution on the noise variance \(\sigma^2\). The prior is \(\sigma^2 \sim \text{IG}(a, b)\).
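b (float, default 0.1) – Prior rate parameter of the Inverse-Gamma distribution, that is, the rate of the equivalent Gamma prior on the precision \(1 / \sigma^2\). The prior mean of the noise variance is \(b / (a - 1)\) for \(a > 1\) and is not finite otherwise, including at the default \(a = 0.1\).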
learning_rate (float, default 1.0) – Decay rate for sequential updates. Values less than 1 geometrically shrink the precision and Inverse-Gamma parameters on each call to decay, enabling adaptation to non-stationary environments.
sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must then be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise the solver falls back to UMFPACK or SuperLU.
random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.
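As a rough illustration of the three accepted forms of lam, the sketch below expands a scalar, a vector, and a matrix into a dense prior precision matrix. The helper expand_precision is hypothetical and not part of bayesianbandits; it only mirrors the description above and ignores the sparse case.

```python
import numpy as np


def expand_precision(lam, n_features):
    """Hypothetical helper: build a dense prior precision matrix from lam."""
    lam = np.asarray(lam, dtype=float)
    if lam.ndim == 0:
        # Scalar: lam * I
        return lam * np.eye(n_features)
    if lam.ndim == 1:
        # Vector: diag(lam)
        if lam.shape != (n_features,):
            raise ValueError("vector lam must have shape (n_features,)")
        return np.diag(lam)
    # Matrix: used directly
    if lam.shape != (n_features, n_features):
        raise ValueError("matrix lam must have shape (n_features, n_features)")
    return lam.copy()


scalar_prec = expand_precision(2.0, 3)
vector_prec = expand_precision([1.0, 2.0, 3.0], 3)
matrix_prec = expand_precision(np.array([[2.0, 0.5], [0.5, 1.0]]), 2)
```

A larger lam expresses a tighter prior around mu, so the scalar form is usually the simplest way to set a uniform amount of regularization across features.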

coef_

Posterior mean of the weight vector.

Type: ndarray of shape (n_features,)

cov_inv_

Posterior precision matrix of the weights (conditioned on \(\sigma^2\)).

Type: ndarray of shape (n_features, n_features) or scipy.sparse.csc_array

a_

Posterior shape parameter of the Inverse-Gamma distribution.

Type: float

b_

Posterior rate parameter of the Inverse-Gamma distribution.

Type: float

n_features_

Number of features seen during fit.

Type: int

See also

NormalRegressor: Known-variance variant (Gaussian posterior on weights).
EmpiricalBayesNormalRegressor: Known-variance with empirical Bayes tuning of alpha and beta.
BayesianGLM: Bayesian GLM for non-Gaussian likelihoods.

Notes

This model implements the “unknown variance” Bayesian linear regression formulation described in Chapter 7 of [1]. The joint prior is:

\[w \mid \sigma^2 \sim \mathcal{N}(\mu_0,\; \sigma^2 \Lambda_0^{-1}), \qquad \sigma^2 \sim \text{IG}(a_0, b_0)\]

After observing data \((X, y)\), the posterior parameters are updated as:

\[\begin{split}\Lambda_n &= \Lambda_0 + X^T X \\ \mu_n &= \Lambda_n^{-1}(\Lambda_0 \mu_0 + X^T y) \\ a_n &= a_0 + \tfrac{N}{2} \\ b_n &= b_0 + \tfrac{1}{2}(y^T y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n)\end{split}\]
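The update equations can be sketched in plain NumPy as follows. This is a minimal illustration under the assumption of dense inputs, not the library's implementation; nig_update is a hypothetical name. It also checks that applying the update one row at a time reaches the same posterior as a single batch update, which is what makes online learning with a conjugate prior possible.

```python
import numpy as np


def nig_update(mu0, lam0, a0, b0, X, y):
    """Hypothetical helper: one conjugate NIG update with dense arrays."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam_n = lam0 + X.T @ X
    mu_n = np.linalg.solve(lam_n, lam0 @ mu0 + X.T @ y)
    a_n = a0 + X.shape[0] / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ lam0 @ mu0 - mu_n @ lam_n @ mu_n)
    return mu_n, lam_n, a_n, b_n


# Synthetic data, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

mu0, lam0, a0, b0 = np.zeros(2), np.eye(2), 0.1, 0.1

# Batch update on all rows at once
batch = nig_update(mu0, lam0, a0, b0, X, y)

# Sequential update, feeding each posterior back in as the next prior
state = (mu0, lam0, a0, b0)
for x_i, y_i in zip(X, y):
    state = nig_update(*state, x_i[None, :], np.array([y_i]))
```

Solving the linear system with np.linalg.solve avoids forming \(\Lambda_n^{-1}\) explicitly, which is generally the more stable choice.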

The marginal posterior of the weights (integrating out \(\sigma^2\)) is a multivariate t distribution with \(2 a_n\) degrees of freedom, location \(\mu_n\), and shape \((b_n / a_n) \Lambda_n^{-1}\).
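One way to draw from this marginal is through its scale-mixture form: sample \(\sigma^2\) from the Inverse-Gamma posterior, then sample the weights from the conditional Gaussian. The sketch below assumes dense NumPy arrays and made-up posterior parameters; sample_weights is a hypothetical helper and may differ from how sample is implemented in the library.

```python
import numpy as np


def sample_weights(mu_n, lam_n, a_n, b_n, size, rng):
    """Hypothetical helper: draw weights from the marginal multivariate t."""
    # sigma^2 ~ IG(a_n, b_n)  <=>  1 / sigma^2 ~ Gamma(shape=a_n, rate=b_n)
    sigma2 = 1.0 / rng.gamma(shape=a_n, scale=1.0 / b_n, size=size)
    # w | sigma^2 ~ N(mu_n, sigma^2 * inv(lam_n))
    chol = np.linalg.cholesky(np.linalg.inv(lam_n))
    z = rng.standard_normal((size, mu_n.shape[0]))
    return mu_n + np.sqrt(sigma2)[:, None] * (z @ chol.T)


# Made-up posterior parameters, for illustration only
mu_n = np.array([1.0, -2.0])
lam_n = np.array([[4.0, 1.0], [1.0, 3.0]])
a_n, b_n = 10.0, 5.0

rng = np.random.default_rng(0)
draws = sample_weights(mu_n, lam_n, a_n, b_n, size=200_000, rng=rng)

# For 2 * a_n > 2 the covariance of the t distribution is
# shape * dof / (dof - 2) = b_n / (a_n - 1) * inv(lam_n)
expected_cov = b_n / (a_n - 1.0) * np.linalg.inv(lam_n)
```

The empirical mean and covariance of draws should approach mu_n and expected_cov as the number of samples grows.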

References

Examples

Batch fitting:

>>> import numpy as np
>>> from bayesianbandits import NormalInverseGammaRegressor
>>> from sklearn.datasets import make_regression
>>> X, y, coef = make_regression(n_samples=30, n_features=2,
...                              coef=True, random_state=1)
>>> coef
array([34.8898342, 75.0942434])

>>> est = NormalInverseGammaRegressor()
>>> est.fit(X, y)
NormalInverseGammaRegressor()
>>> est.coef_
array([32.89089478, 71.16073032])

Online learning with partial_fit:

>>> est = NormalInverseGammaRegressor(random_state=1)
>>> for x_, y_ in zip(X, y):
...     est = est.partial_fit(x_.reshape(1, -1), np.array([y_]))
>>> est.coef_
array([32.89089478, 71.16073032])

Sampling from the marginal posterior predictive (multivariate t):

>>> est.sample(X[[0]], size=5)
array([[15.01030526],
       [14.64281737],
       [15.21457505],
       [14.1703107 ],
       [14.57089036]])

__init__(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)

decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) → None

Decay precision and variance parameters to increase uncertainty.

Applies exponential forgetting to the precision matrix and the Inverse-Gamma parameters:

\[\Lambda \leftarrow \gamma^n \Lambda, \quad a \leftarrow \gamma^n a, \quad b \leftarrow \gamma^n b\]

The posterior mean is unchanged, but the marginal t distribution widens (fewer degrees of freedom and higher scale), reflecting greater uncertainty.

Has no effect if the model has not been fitted.

Parameters:

X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.
decay_rate (float or None, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.
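To make the effect of the decay formula concrete, the sketch below applies it to made-up posterior parameters and compares the marginal t distribution before and after. The helper decay_params is hypothetical and only mirrors the formula above; it is not the library's code.

```python
import numpy as np


def decay_params(lam, a, b, n_samples, gamma):
    """Hypothetical helper: scale lam, a, and b by gamma ** n_samples."""
    factor = gamma ** n_samples
    return factor * lam, factor * a, factor * b


# Made-up posterior parameters, for illustration only
lam = np.array([[4.0, 1.0], [1.0, 3.0]])
a, b = 10.0, 5.0

lam_d, a_d, b_d = decay_params(lam, a, b, n_samples=5, gamma=0.9)

# Degrees of freedom of the marginal t shrink by the decay factor
dof_before, dof_after = 2 * a, 2 * a_d

# b / a is unchanged, so the shape matrix grows by 1 / gamma ** n_samples
shape_before = (b / a) * np.linalg.inv(lam)
shape_after = (b_d / a_d) * np.linalg.inv(lam_d)
```

Because a and b are scaled by the same factor, their ratio is preserved; the widening comes entirely from the smaller precision matrix and the reduced degrees of freedom.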
