bayesianbandits.NormalInverseGammaRegressor

class bayesianbandits.NormalInverseGammaRegressor(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)

Bases: NormalRegressor

Bayesian linear regression with unknown noise variance.

Extends NormalRegressor by placing a conjugate Normal-Inverse-Gamma (NIG) prior on the weights and noise variance jointly. Because the noise variance is integrated out analytically, the marginal posterior over the weights is a multivariate t distribution, producing heavier-tailed and more robust uncertainty estimates than the known-variance model.

Parameters:
  • mu (float or array-like of shape (n_features,), default 0.0) – Prior mean of the weights. A scalar is broadcast to all features.

  • lam (float, array-like of shape (n_features,), or array-like of shape (n_features, n_features), default 1.0) – Prior precision (inverse covariance) of the weights. A scalar gives \(\lambda I\); a vector gives \(\text{diag}(\lambda)\); a matrix is used directly.

  • a (float, default 0.1) – Prior shape parameter of the Inverse-Gamma distribution on the noise variance \(\sigma^2\). The prior is \(\sigma^2 \sim \text{IG}(a, b)\).

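  • b (float, default 0.1) – Prior rate parameter \(b\) in \(\sigma^2 \sim \text{IG}(a, b)\). It is the rate of the equivalent Gamma prior on the precision \(1 / \sigma^2\) and acts as the scale of the Inverse-Gamma itself. The prior mean of the noise variance is \(b / (a - 1)\), which exists only for \(a > 1\); under the default \(a = 0.1\) the prior mean is undefined.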

  • learning_rate (float, default 1.0) – Decay rate for sequential updates. Values less than 1 geometrically shrink the precision and Inverse-Gamma parameters on each call to decay, enabling adaptation to non-stationary environments.

  • sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must then be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise the model falls back to UMFPACK or SuperLU.

  • random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.
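The three accepted forms of lam can be illustrated with a small NumPy sketch. The helper name expand_precision is hypothetical and is not part of bayesianbandits; it only mirrors the scalar, vector, and matrix rules stated for lam above, in the dense case.

```python
import numpy as np

def expand_precision(lam, n_features):
    """Hypothetical helper: expand `lam` into a dense prior precision matrix."""
    lam = np.asarray(lam, dtype=float)
    if lam.ndim == 0:
        # Scalar: lambda * I
        return lam * np.eye(n_features)
    if lam.ndim == 1:
        # Vector: diag(lambda)
        return np.diag(lam)
    # Matrix: used directly
    return lam

scalar_prec = expand_precision(2.0, 3)
vector_prec = expand_precision([1.0, 2.0, 3.0], 3)
matrix_prec = expand_precision(np.array([[2.0, 0.5], [0.5, 1.0]]), 2)
```

A larger lam means a tighter prior around mu, so the data must be more informative to move the posterior mean away from it.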

coef_

Posterior mean of the weight vector.

Type:

ndarray of shape (n_features,)

cov_inv_

Posterior precision matrix of the weights (conditioned on \(\sigma^2\)).

Type:

ndarray of shape (n_features, n_features) or scipy.sparse.csc_array

a_

Posterior shape parameter of the Inverse-Gamma distribution.

Type:

float

b_

Posterior rate parameter of the Inverse-Gamma distribution.

Type:

float

n_features_

Number of features seen during fit.

Type:

int

See also

NormalRegressor

Known-variance variant (Gaussian posterior on weights).

EmpiricalBayesNormalRegressor

Known-variance variant with empirical Bayes tuning of alpha and beta.

BayesianGLM

Bayesian GLM for non-Gaussian likelihoods.

Notes

This model implements the “unknown variance” Bayesian linear regression formulation described in Chapter 7 of [1]. The joint prior is:

\[w \mid \sigma^2 \sim \mathcal{N}(\mu_0,\; \sigma^2 \Lambda_0^{-1}), \qquad \sigma^2 \sim \text{IG}(a_0, b_0)\]

After observing data \((X, y)\), where \(X\) has \(N\) rows, the posterior parameters are updated as:

\[\begin{split}\Lambda_n &= \Lambda_0 + X^T X \\ \mu_n &= \Lambda_n^{-1}(\Lambda_0 \mu_0 + X^T y) \\ a_n &= a_0 + \tfrac{N}{2} \\ b_n &= b_0 + \tfrac{1}{2}(y^T y + \mu_0^T \Lambda_0 \mu_0 - \mu_n^T \Lambda_n \mu_n)\end{split}\]

The marginal posterior of the weights (integrating out \(\sigma^2\)) is a multivariate t distribution with \(2 a_n\) degrees of freedom, location \(\mu_n\), and shape \((b_n / a_n) \Lambda_n^{-1}\).
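The update equations above can be checked numerically with a dense NumPy sketch. The function name nig_update and the synthetic data are assumptions made for illustration; this is not the library's implementation, which works with factorizations rather than explicit solves on every call. The sketch also shows that applying the update one row at a time, with each posterior serving as the next prior, should reproduce the batch result.

```python
import numpy as np

def nig_update(mu0, lam0, a0, b0, X, y):
    """Hypothetical helper: conjugate NIG update, following the four equations above."""
    lam_n = lam0 + X.T @ X
    mu_n = np.linalg.solve(lam_n, lam0 @ mu0 + X.T @ y)
    a_n = a0 + X.shape[0] / 2.0
    b_n = b0 + 0.5 * (y @ y + mu0 @ lam0 @ mu0 - mu_n @ lam_n @ mu_n)
    return mu_n, lam_n, a_n, b_n

# Synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=30)

# Prior matching the documented defaults: mu=0, lam=1, a=0.1, b=0.1
prior = (np.zeros(2), np.eye(2), 0.1, 0.1)

# Batch update with all 30 rows at once
batch = nig_update(*prior, X, y)

# Sequential update: one row at a time, posterior becomes the next prior
state = prior
for x_row, y_row in zip(X, y):
    state = nig_update(*state, x_row[None, :], np.array([y_row]))
```

Up to floating-point error, state and batch hold the same \(\mu_n\), \(\Lambda_n\), \(a_n\), and \(b_n\), which is the property that makes online updating equivalent to refitting on all data.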

References

Examples

Batch fitting:

>>> import numpy as np
>>> from bayesianbandits import NormalInverseGammaRegressor
>>> from sklearn.datasets import make_regression
>>> X, y, coef = make_regression(n_samples=30, n_features=2,
...                              coef=True, random_state=1)
>>> coef
array([34.8898342, 75.0942434])
>>> est = NormalInverseGammaRegressor()
>>> est.fit(X, y)
NormalInverseGammaRegressor()
>>> est.coef_
array([32.89089478, 71.16073032])

Online learning with partial_fit:

>>> est = NormalInverseGammaRegressor(random_state=1)
>>> for x_, y_ in zip(X, y):
...     est = est.partial_fit(x_.reshape(1, -1), np.array([y_]))
>>> est.coef_
array([32.89089478, 71.16073032])

Sampling from the marginal posterior predictive (multivariate t):

>>> est.sample(X[[0]], size=5)
array([[15.01030526],
       [14.64281737],
       [15.21457505],
       [14.1703107 ],
       [14.57089036]])

__init__(*, mu: np.typing.ArrayLike = 0.0, lam: np.typing.ArrayLike | csc_array = 1.0, a: float = 0.1, b: float = 0.1, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None)

decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) -> None

Decay precision and variance parameters to increase uncertainty.

Applies exponential forgetting to the precision matrix and the Inverse-Gamma parameters:

\[\Lambda \leftarrow \gamma^n \Lambda, \quad a \leftarrow \gamma^n a, \quad b \leftarrow \gamma^n b\]

The posterior mean is unchanged, but the marginal t distribution widens (fewer degrees of freedom and higher scale), reflecting greater uncertainty.

Has no effect if the model has not been fitted.
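The effect of the decay rule can be seen in a small NumPy sketch. The helper decay_nig and the numbers below are illustrative assumptions, not library code. Because a and b are scaled by the same factor, the ratio b/a is unchanged, so the widening of the shape matrix \((b / a) \Lambda^{-1}\) comes entirely from the shrunken precision, while the degrees of freedom \(2a\) drop.

```python
import numpy as np

def decay_nig(lam, a, b, n_samples, gamma):
    """Hypothetical helper: apply the gamma**n forgetting rule to (Lambda, a, b)."""
    factor = gamma ** n_samples
    return factor * lam, factor * a, factor * b

lam = np.array([[4.0, 1.0], [1.0, 3.0]])
a, b = 5.0, 2.0
gamma, n_samples = 0.9, 3

lam_d, a_d, b_d = decay_nig(lam, a, b, n_samples, gamma)

# Shape matrix of the marginal t distribution before and after decay
shape_before = (b / a) * np.linalg.inv(lam)
shape_after = (b_d / a_d) * np.linalg.inv(lam_d)
```

Here shape_after equals shape_before divided by gamma**n_samples, so three decayed rows at gamma = 0.9 inflate the shape matrix by a factor of about 1.37.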

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Used only for its number of rows n_samples, which determines the exponent of the decay factor.

  • decay_rate (float or None, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.

See also

partial_fit

Update the model with new observations.

sample(X: NDArray[Any] | csc_array, size: int = 1) -> NDArray[np.float64]

Sample predicted values from the marginal posterior predictive.

Draws weight vectors from the marginal posterior, which is a multivariate t distribution with \(2 a_n\) degrees of freedom, location \(\mu_n\), and shape \((b_n / a_n) \Lambda_n^{-1}\). The noise variance is integrated out analytically, producing heavier tails than the Gaussian posterior of NormalRegressor.

If the model has not been fitted, samples are drawn from the prior predictive distribution.
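One standard way to draw from a multivariate t distribution is to scale a Gaussian draw by an independent chi-square variate. The sketch below uses that construction in dense NumPy; the function sample_predictions and all parameter values are assumptions for illustration, and the explicit matrix inverse is used only for readability. It should not be read as the method this class uses internally.

```python
import numpy as np

def sample_predictions(X, mu, lam, a, b, size, rng):
    """Hypothetical helper: draw weights from t_{2a}(mu, (b/a) lam^{-1}), return X @ w."""
    df = 2.0 * a
    shape = (b / a) * np.linalg.inv(lam)
    chol = np.linalg.cholesky(shape)  # shape = chol @ chol.T
    # Gaussian draws with covariance `shape`, one row per sample
    z = rng.standard_normal((size, mu.shape[0])) @ chol.T
    # Chi-square mixing variable produces the heavier t tails
    u = rng.chisquare(df, size=size)
    w = mu + z * np.sqrt(df / u)[:, None]
    # One row of predictions per posterior draw: (size, n_samples)
    return w @ X.T

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
lam = np.array([[10.0, 2.0], [2.0, 8.0]])
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])

samples = sample_predictions(X, mu, lam, a=6.0, b=3.0, size=20000, rng=rng)
```

The result has shape (size, n_samples), matching the documented return shape, and its column means approach X @ mu as size grows.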

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input data. Must be a scipy.sparse.csc_array when sparse=True.

  • size (int, default 1) – Number of posterior samples to draw.

Returns:

samples – Predicted values for each posterior draw.

Return type:

ndarray of shape (size, n_samples)

See also

predict

Point predictions using the posterior mean.

property shape_: CholmodSparseFactor | SuperLUSparseFactor | ScaledSparseFactor | DenseFactor

Precision of the shape matrix for the multivariate t posterior.

The shape covariance is (b/a)·Λ⁻¹, so the shape precision is (a/b)·Λ. For dense models this is represented as a DenseFactor with U_scaled = √(a/b)·U (zero extra factorizations); for sparse models it wraps the existing SparseFactor via scale_factor.
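The reason no extra factorization is needed can be verified with plain NumPy. The sketch assumes U is an upper-triangular Cholesky factor with Λ = UᵀU; that convention, and the example values, are assumptions, and the library's DenseFactor class is not used here.

```python
import numpy as np

lam = np.array([[4.0, 1.0], [1.0, 3.0]])
a, b = 5.0, 2.0

# Upper Cholesky factor of Lambda, so that lam == U.T @ U (assumed convention)
U = np.linalg.cholesky(lam).T

# Scaling the existing factor by sqrt(a/b) factors the shape precision directly
U_scaled = np.sqrt(a / b) * U

shape_precision = (a / b) * lam
shape_covariance = (b / a) * np.linalg.inv(lam)
```

Since U_scaled.T @ U_scaled equals (a/b)·Λ, the factor already computed for Λ can be reused after a scalar multiplication.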