bayesianbandits.EmpiricalBayesNormalRegressor#
- class bayesianbandits.EmpiricalBayesNormalRegressor(alpha: float = 1.0, beta: float = 1.0, *, n_eb_iter: int = 10, eb_tol: float = 0.0001, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None, trace_method: str = 'auto')#
Bases:
NormalRegressorBayesian linear regression with empirical Bayes hyperparameter tuning.
Extends
NormalRegressorwith automatic optimization of the prior precision (alpha) and noise precision (beta) via MacKay’s evidence maximization framework [1]. Duringfit, the hyperparameters are iteratively updated to maximize the log marginal likelihood. Duringpartial_fit, a single online MacKay step is performed using accumulated sufficient statistics.When
learning_rate < 1, exponential forgetting is applied to the precision matrix. To prevent the prior contribution from collapsing to zero under repeated decay, stabilized forgetting [2] re-injects a fixed prior floor after each decay step.- Parameters:
alpha (
float, default1.0) – Initial prior precision of the weights. The prior is \(w \sim \mathcal{N}(0, \alpha^{-1} I)\). Updated automatically during fitting.beta (
float, default1.0) – Initial noise precision. The likelihood is \(y \mid x, w \sim \mathcal{N}(x^T w, \beta^{-1})\). Updated automatically during fitting.n_eb_iter (
int, default10) – Maximum number of empirical Bayes iterations duringfit. Each iteration re-fits the posterior and runs one MacKay update. Set to 0 to disable EB tuning duringfit.eb_tol (
float, default1e-4) – Convergence tolerance on the change in log marginal likelihood between successive EB iterations.learning_rate (
float, default1.0) – Decay rate for sequential updates. Values less than 1 geometrically shrink the precision matrix on each call todecayorpartial_fit, enabling adaptation to non-stationary environments.sparse (
bool, defaultFalse) – If True, use sparse matrix operations for the precision matrix. InputXmust be ascipy.sparse.csc_array. When CHOLMOD is available (viascikit-sparse), it is used for efficient Cholesky factorization; otherwise falls back to SuperLU.random_state (
int,np.random.Generator, orNone, defaultNone) – Controls the random number generator forsample. Pass an int for reproducible results across calls.trace_method (
{'auto', 'diagonal'}, default'auto') –Method for computing \(\operatorname{tr}(\Lambda^{-1})\) in the MacKay update for
alpha:'auto': Uses Takahashi recursion (exact, exploits sparsity) for sparse matrices, Cholesky for dense.'diagonal': Fast \(O(p)\) approximation \(\operatorname{tr}(\Lambda^{-1}) \approx \sum_i 1/\Lambda_{ii}\).
- log_evidence_#
Log marginal likelihood at convergence (after
fit), or-infifn_eb_iter=0.- Type:
float
- n_eb_iterations_#
Number of EB iterations performed during the last
fit.- Type:
int
- eb_converged_#
Whether the EB loop converged within
eb_tolduring the lastfit.- Type:
bool
See also
NormalRegressorBase estimator without empirical Bayes tuning.
BayesianGLMBayesian GLM for non-Gaussian likelihoods (logistic, Poisson).
Notes
Evidence maximization (MacKay’s update rules)
Given the posterior precision \(\Lambda = \alpha I + \beta X^T X\) and MAP estimate \(\hat{w}\), the effective number of well-determined parameters is:
\[\gamma = p - \alpha \operatorname{tr}(\Lambda^{-1})\]The hyperparameters are updated as [1]:
\[\alpha_{\text{new}} = \frac{\gamma}{\hat{w}^T \hat{w}}, \qquad \beta_{\text{new}} = \frac{N - \gamma}{ \| y - X \hat{w} \|^2}\]Stabilized forgetting
Under exponential decay with rate \(\gamma < 1\), the prior contribution to the precision diagonal decays as \(\gamma^t \alpha \to 0\). Stabilized forgetting [2] re-injects \((1 - \gamma^n) \alpha\) after each decay step, so the prior contribution converges to \(\alpha\) instead of zero:
\[s_{t+n} = \gamma^n s_t + (1 - \gamma^n) \alpha\]where \(s_t\) tracks the cumulative prior scalar on the diagonal.
References
Examples
Batch fitting with automatic hyperparameter tuning:
>>> import numpy as np >>> from bayesianbandits import EmpiricalBayesNormalRegressor >>> rng = np.random.default_rng(42) >>> X = rng.standard_normal((100, 3)) >>> w_true = np.array([1.0, -0.5, 0.0]) >>> y = X @ w_true + 0.1 * rng.standard_normal(100) >>> model = EmpiricalBayesNormalRegressor(random_state=42) >>> _ = model.fit(X, y) >>> model.eb_converged_ True >>> np.round(model.coef_, 2) # Close to [1.0, -0.5, 0.0] array([ 1. , -0.49, -0.02])
Online learning with forgetting for non-stationary environments:
>>> model = EmpiricalBayesNormalRegressor( ... learning_rate=0.99, random_state=42 ... ) >>> for i in range(10): ... X_batch = rng.standard_normal((5, 3)) ... y_batch = X_batch @ w_true + 0.1 * rng.standard_normal(5) ... model.partial_fit(X_batch, y_batch)
- __init__(alpha: float = 1.0, beta: float = 1.0, *, n_eb_iter: int = 10, eb_tol: float = 0.0001, learning_rate: float = 1.0, sparse: bool = False, random_state: int | None | Generator = None, trace_method: str = 'auto') None#
- decay(X: NDArray[Any] | csc_array, *, decay_rate: float | None = None) None#
Decay the precision matrix with stabilized prior re-injection.
Applies exponential forgetting
Λ_new = γ^n · Λ_oldand then re-injects(1 - γ^n)·αonto the diagonal, following stabilized forgetting (Kulhavy & Zarrop 1993). This ensures the prior contribution to the precision converges toalpharather than decaying to zero under repeated decay steps.Sufficient statistics (
_effective_n,_eff_yTy,_eff_XTy) used for the online MacKay beta update are also decayed.- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Used only to determine the number of time stepsnfor the decay exponentγ^n. The actual feature values are not used.decay_rate (
float, defaultNone) – Decay factor \(\gamma\) in (0, 1]. If None, usesself.learning_rate.
See also
partial_fitUpdate the model with new observations.
- fit(X_fit: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#
Fit the model with empirical Bayes hyperparameter tuning.
Iteratively optimizes
alpha(prior precision) andbeta(noise precision) by maximizing the log marginal likelihood using MacKay’s evidence framework, then performs a final posterior update with the converged hyperparameters.- Parameters:
X_fit (
array-likeofshape (n_samples,n_features)) – Training data.y (
array-likeofshape (n_samples,)) – Target values.sample_weight (
array-likeofshape (n_samples,), optional) – Individual weights for each sample. If None, all samples are given weight 1.0.
- Returns:
self – Fitted estimator with tuned
alphaandbeta.- Return type:
See also
partial_fitIncremental update with one online MacKay step.
- partial_fit(X: NDArray[Any] | csc_array, y: NDArray[Any], sample_weight: NDArray[Any] | None = None) Self#
Incrementally update the posterior and retune hyperparameters.
Performs a recursive Bayesian update (delegating to
NormalRegressor.partial_fit), then runs one online MacKay step to adjustalphaandbetausing accumulated sufficient statistics. The precision matrix is corrected to reflect the updated hyperparameters.When
learning_rate < 1, stabilized forgetting (Kulhavy & Zarrop 1993) re-injects(1 - γ^n)·αinto the precision diagonal so that the prior contribution converges toalphainstead of decaying to zero.- Parameters:
X (
array-likeofshape (n_samples,n_features)) – Training data.y (
array-likeofshape (n_samples,)) – Target values.sample_weight (
array-likeofshape (n_samples,), optional) – Individual weights for each sample. If None, all samples are given weight 1.0.
- Returns:
self – Updated estimator with retuned hyperparameters.
- Return type: