bayesianbandits.EmpiricalBayesNormalRegressor#

class bayesianbandits.EmpiricalBayesNormalRegressor(alpha: float = 1.0, beta: float = 1.0, *, n_eb_iter: int = 10, eb_tol: float = 0.0001, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None, trace_method: str = 'auto')#

Bases: NormalRegressor

Bayesian linear regression with empirical Bayes hyperparameter tuning.

Extends NormalRegressor with automatic optimization of the prior precision (alpha) and noise precision (beta) via MacKay’s evidence maximization framework [1]. During fit, the hyperparameters are iteratively updated to maximize the log marginal likelihood. During partial_fit, a single online MacKay step is performed using accumulated sufficient statistics.

When learning_rate < 1, exponential forgetting is applied to the precision matrix. To prevent the prior contribution from collapsing to zero under repeated decay, stabilized forgetting [2] re-injects a fixed prior floor after each decay step.

Parameters:
  • alpha (float, default 1.0) – Initial prior precision of the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Updated automatically during fitting.

  • beta (float, default 1.0) – Initial noise precision. The likelihood is \(y \mid x, w \sim \mathcal{N}(x^T w, \beta^{-1})\). Updated automatically during fitting.

  • n_eb_iter (int, default 10) – Maximum number of empirical Bayes iterations during fit. Each iteration re-fits the posterior and runs one MacKay update. Set to 0 to disable EB tuning during fit.

  • eb_tol (float, default 1e-4) – Convergence tolerance on the change in log marginal likelihood between successive EB iterations.

  • learning_rate (float, default 1.0) – Decay rate for sequential updates. Values less than 1 geometrically shrink the precision matrix on each call to decay or partial_fit, enabling adaptation to non-stationary environments.

  • sparse (bool, default False) – If True, use sparse matrix operations for the precision matrix. Input X must be a scipy.sparse.csc_array. When CHOLMOD is available (via scikit-sparse), it is used for efficient Cholesky factorization; otherwise falls back to SuperLU.

  • random_state (int, np.random.Generator, or None, default None) – Controls the random number generator for sample. Pass an int for reproducible results across calls.

  • trace_method ({'auto', 'diagonal'}, default 'auto') –

    Method for computing \(\operatorname{tr}(\Lambda^{-1})\) in the MacKay update for alpha:

    • 'auto': Uses Takahashi recursion (exact, exploits sparsity) for sparse matrices, Cholesky for dense.

    • 'diagonal': Fast \(O(p)\) approximation \(\operatorname{tr}(\Lambda^{-1}) \approx \sum_i 1/\Lambda_{ii}\).

log_evidence_#

Log marginal likelihood at convergence (after fit), or -inf if n_eb_iter=0.

Type:

float

n_eb_iterations_#

Number of EB iterations performed during the last fit.

Type:

int

eb_converged_#

Whether the EB loop converged within eb_tol during the last fit.

Type:

bool

See also

NormalRegressor

Base estimator without empirical Bayes tuning.

BayesianGLM

Bayesian GLM for non-Gaussian likelihoods (logistic, Poisson).

Notes

Evidence maximization (MacKay’s update rules)

Given the posterior precision \(\Lambda = \alpha I + \beta X^T X\) and MAP estimate \(\hat{w}\), the effective number of well-determined parameters is:

\[\gamma = p - \alpha \operatorname{tr}(\Lambda^{-1})\]

The hyperparameters are updated as [1]:

\[\alpha_{\text{new}} = \frac{\gamma}{\hat{w}^T \hat{w}}, \qquad \beta_{\text{new}} = \frac{N - \gamma}{ \| y - X \hat{w} \|^2}\]

Stabilized forgetting

Under exponential decay with rate \(\gamma < 1\), the prior contribution to the precision diagonal decays as \(\gamma^t \alpha \to 0\). Stabilized forgetting [2] re-injects \((1 - \gamma^n) \alpha\) after each decay step, so the prior contribution converges to \(\alpha\) instead of zero:

\[s_{t+n} = \gamma^n s_t + (1 - \gamma^n) \alpha\]

where \(s_t\) tracks the cumulative prior scalar on the diagonal.

References

Examples

Batch fitting with automatic hyperparameter tuning:

>>> import numpy as np
>>> from bayesianbandits import EmpiricalBayesNormalRegressor
>>> rng = np.random.default_rng(42)
>>> X = rng.standard_normal((100, 3))
>>> w_true = np.array([1.0, -0.5, 0.0])
>>> y = X @ w_true + 0.1 * rng.standard_normal(100)
>>> model = EmpiricalBayesNormalRegressor(random_state=42)
>>> _ = model.fit(X, y)
>>> model.eb_converged_
True
>>> np.round(model.coef_, 2)  # Close to [1.0, -0.5, 0.0]
array([ 1.  , -0.49, -0.02])

Online learning with forgetting for non-stationary environments:

>>> model = EmpiricalBayesNormalRegressor(
...     learning_rate=0.99, random_state=42
... )
>>> for i in range(10):  
...     X_batch = rng.standard_normal((5, 3))
...     y_batch = X_batch @ w_true + 0.1 * rng.standard_normal(5)
...     model.partial_fit(X_batch, y_batch)
__init__(alpha: float = 1.0, beta: float = 1.0, *, n_eb_iter: int = 10, eb_tol: float = 0.0001, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None, trace_method: str = 'auto') None#
decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) None#

Decay the precision matrix with stabilized prior re-injection.

Applies exponential forgetting Λ_new = γ^n · Λ_old and then re-injects (1 - γ^n)·α onto the diagonal, following stabilized forgetting (Kulhavy & Zarrop 1993). This ensures the prior contribution to the precision converges to alpha rather than decaying to zero under repeated decay steps.

Sufficient statistics (_effective_n, _eff_yTy, _eff_XTy) used for the online MacKay beta update are also decayed.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Used only to determine the number of time steps n for the decay exponent γ^n. The actual feature values are not used.

  • decay_rate (float, default None) – Decay factor \(\gamma\) in (0, 1]. If None, uses self.learning_rate.

See also

partial_fit

Update the model with new observations.

fit(X_fit: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#

Fit the model with empirical Bayes hyperparameter tuning.

Iteratively optimizes alpha (prior precision) and beta (noise precision) by maximizing the log marginal likelihood using MacKay’s evidence framework, then performs a final posterior update with the converged hyperparameters.

Parameters:
  • X_fit (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample. If None, all samples are given weight 1.0.

Returns:

self – Fitted estimator with tuned alpha and beta.

Return type:

EmpiricalBayesNormalRegressor

See also

partial_fit

Incremental update with one online MacKay step.

partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#

Incrementally update the posterior and retune hyperparameters.

Performs a recursive Bayesian update (delegating to NormalRegressor.partial_fit), then runs one online MacKay step to adjust alpha and beta using accumulated sufficient statistics. The precision matrix is corrected to reflect the updated hyperparameters.

When learning_rate < 1, stabilized forgetting (Kulhavy & Zarrop 1993) re-injects (1 - γ^n)·α into the precision diagonal so that the prior contribution converges to alpha instead of decaying to zero.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training data.

  • y (array-like of shape (n_samples,)) – Target values.

  • sample_weight (array-like of shape (n_samples,), optional) – Individual weights for each sample. If None, all samples are given weight 1.0.

Returns:

self – Updated estimator with retuned hyperparameters.

Return type:

EmpiricalBayesNormalRegressor

See also

fit

Fit from scratch with full EB iteration loop.

decay

Increase uncertainty without observing new data.