Choosing and Tuning a Decay Rate
================================

Decay lets a bandit adapt to non-stationary environments. There are two ways
to apply it:

.. note::

   Decay scales the precision matrix, which increases posterior variance
   without moving the posterior mean. Your point estimate stays the same; you
   just become less confident in it. Wider posteriors drive re-exploration via
   more diverse Thompson samples and higher UCB values.

``learning_rate < 1`` on the estimator
    Decay is coupled to ``partial_fit``: every update automatically
    down-weights the prior by ``learning_rate ** n_samples`` before
    incorporating new data.

Explicit ``agent.decay()`` calls
    Decay is decoupled from updates. You call ``decay()`` on your own
    schedule, independently of when observations arrive.

Start with no decay
--------------------

If you are unsure whether your environment is non-stationary, start with
``learning_rate=1.0`` (the default) and no ``decay()`` calls. Adding decay
when you don't need it throws away information and widens your posterior for
no benefit.

Decouple decay from updates
-----------------------------

Consider a product recommendation system. You ``pull()`` thousands of times
per day as users visit the site, and ``update()`` as purchases arrive. But
user preferences don't shift on a per-request basis -- they shift over weeks
or months. If you set ``learning_rate < 1``, the amount of forgetting depends
on how many observations land in each update batch, not on how fast tastes
actually change.

Keep ``learning_rate=1.0`` and call ``decay()`` on a schedule that matches the
timescale of change in your environment:

.. code-block:: python

    from bayesianbandits import (
        Arm,
        ContextualAgent,
        NormalRegressor,
        ThompsonSampling,
    )
    import numpy as np

    arms = [
        Arm(f"product_{i}", learner=NormalRegressor(alpha=1.0, beta=1.0))
        for i in range(3)
    ]
    agent = ContextualAgent(arms, ThompsonSampling(), random_seed=42)

    # Throughout the day: pull and update as users visit
    X = np.array([[1.0, 2.0]])  # user features
    (action,) = agent.pull(X)
    agent.update(X, y=np.array([1.0]))  # purchase signal

    # Once per day (e.g. nightly cron): decay all arms
    agent.decay(np.array([[0.0, 0.0]]), decay_rate=0.95)

Pass a 1-row array -- ``decay()`` uses ``X.shape[0]`` as the exponent, so a
100-row array would apply ``0.95^100`` instead of ``0.95`` [1]_.

.. [1] ``decay()`` raises ``gamma`` to the power of ``X.shape[0]``, so a
   1-row array gives one decay step (``gamma^1``). This exponent exists so
   that ``partial_fit`` on a batch of 10 observations gives the same
   posterior as fitting them one at a time with decay between each. In
   practice, per-observation decay via ``learning_rate < 1`` is often too
   aggressive: most real systems make many decisions per natural time period
   (thousands of recommendations per day), and decaying once per observation
   in that setting forgets too fast.

Choose a decay rate
--------------------

The decay rate ``gamma`` controls how long the model remembers past
observations. After ``n`` decay steps, an observation's weight is
``gamma^n``. A rough rule of thumb: the effective window is approximately
``1 / (1 - gamma)`` decay steps (observations, when decay is applied once per
observation), the point at which the weight has dropped to about ``1/e``:

======  ========================
gamma   Effective window (steps)
======  ========================
0.999   ~1000
0.99    ~100
0.95    ~20
0.9     ~10
======  ========================

Start conservative (closer to 1.0). You can always decay more aggressively
later.
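The rule of thumb above can be checked numerically. The following is a minimal, standard-library-only sketch; ``effective_window``, ``weight_after``, and ``gamma_for_window`` are hypothetical helper names used for illustration and are not part of ``bayesianbandits``.

```python
import math


def effective_window(gamma: float) -> float:
    """Rule-of-thumb number of decay steps until a weight falls to about 1/e."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    return 1.0 / (1.0 - gamma)


def weight_after(gamma: float, n_steps: float) -> float:
    """Weight remaining on an observation after n_steps decay steps."""
    return gamma ** n_steps


def gamma_for_window(window: float) -> float:
    """Invert the rule of thumb: the gamma whose effective window is `window`."""
    if window <= 1.0:
        raise ValueError("window must be greater than 1 decay step")
    return 1.0 - 1.0 / window


for gamma in (0.999, 0.99, 0.95, 0.9):
    window = effective_window(gamma)
    remaining = weight_after(gamma, window)
    # The remaining weight should sit close to 1/e for every gamma.
    print(f"gamma={gamma}: window ~{window:.0f} steps, weight there = {remaining:.3f}")

print(f"1/e = {math.exp(-1):.3f}")
```

Working backwards is often more natural: decide over how many decay steps (for example, days of a nightly schedule) old data should fade, then pick ``gamma_for_window(steps)`` as a starting point and tune from there.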
Avoid over-decay
-----------------

Aggressive decay can cause problems:

- **Near-singular precision matrix**: with ``NormalRegressor``, the precision
  matrix ``Lambda`` is scaled by ``gamma^n`` on each decay step. If ``gamma``
  is too small or decay is called too frequently, ``Lambda`` approaches zero
  and the Cholesky factorization fails.
- **Prior washed out**: the prior contribution ``alpha * I`` decays along
  with the data. After enough decay steps, the model is effectively
  unregularized.

:class:`~bayesianbandits.EmpiricalBayesNormalRegressor` mitigates the second
problem with stabilized forgetting: after each decay step, it re-injects
``(1 - gamma^n) * alpha`` onto the precision diagonal so the prior
contribution converges to ``alpha`` instead of zero.

If you need decay and want a safety net against prior collapse, use EB:

.. code-block:: python

    from bayesianbandits import EmpiricalBayesNormalRegressor

    learner = EmpiricalBayesNormalRegressor(
        alpha=1.0,
        beta=1.0,
        learning_rate=1.0,  # still decouple from partial_fit
    )

Then call ``decay()`` on a schedule as above.

See the :doc:`delayed reward example ` for a full simulation that tunes the
decay rate with optuna and shows how too much decay hurts.
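To see why the re-injection described above keeps the prior from collapsing, here is a toy scalar sketch. It tracks one diagonal entry of the precision, split conceptually into a prior part and a data part. It is not the library's implementation: ``decay_step`` is a hypothetical helper, and the starting values are made up for illustration.

```python
def decay_step(prior_part, data_part, gamma, alpha=None):
    """Apply one decay step to one diagonal precision entry.

    With alpha=None this is plain decay: both parts shrink by gamma.
    With alpha set, (1 - gamma) * alpha is added back afterwards,
    mimicking the stabilized-forgetting update described in the text.
    """
    prior_part = gamma * prior_part
    data_part = gamma * data_part
    if alpha is not None:
        prior_part += (1.0 - gamma) * alpha
    return prior_part, data_part


alpha = 1.0   # prior precision (assumed value)
gamma = 0.9   # deliberately aggressive decay rate
n_steps = 500

plain = (alpha, 40.0)        # (prior part, data part); 40.0 is illustrative
stabilized = (alpha, 40.0)

for _ in range(n_steps):
    plain = decay_step(*plain, gamma)
    stabilized = decay_step(*stabilized, gamma, alpha=alpha)

plain_total = sum(plain)            # shrinks toward zero
stabilized_total = sum(stabilized)  # settles near alpha

print(f"plain decay:      total precision = {plain_total:.3e}")
print(f"stabilized decay: total precision = {stabilized_total:.3e}")
```

In both cases the data contribution is forgotten. The difference is the floor: under plain decay the total precision heads to zero, which is the near-singular regime described above, while the stabilized update keeps it near ``alpha``.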