Intercept-Only Models
=====================

Two conjugate models for problems without covariates: one for
categorical outcomes, one for rates. Both are stratified by the first
feature value, maintaining an independent posterior per group.


DirichletClassifier
-------------------

Conjugate Dirichlet-Multinomial model for binary or categorical
outcomes (click/no-click, class selection).


Symbols
~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 15 85

   * - Symbol
     - Meaning
   * - :math:`K`
     - Number of classes
   * - :math:`\alpha_k`
     - Concentration parameter for class :math:`k`
   * - :math:`\theta_k`
     - Probability of class :math:`k`
   * - :math:`w_i`
     - Sample weight for observation :math:`i`
   * - :math:`\gamma`
     - Decay factor


Prior and likelihood
~~~~~~~~~~~~~~~~~~~~

.. math::

   \boldsymbol{\theta} \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K)

.. math::

   y_i \mid \boldsymbol{\theta} \sim \mathrm{Categorical}(\boldsymbol{\theta})


Update
~~~~~~

.. math::

   \alpha_k^{\text{post}}
   = \alpha_k^{\text{prior}}
   + \sum_{i=1}^{N} w_i \, \mathbb{1}[y_i = k]


Posterior mean
~~~~~~~~~~~~~~

.. math::

   \mathbb{E}[\theta_k]
   = \frac{\alpha_k}{\sum_j \alpha_j}


Sampling
~~~~~~~~

.. math::

   \boldsymbol{\theta} \sim \mathrm{Dirichlet}(\alpha_1^{\text{post}},
     \ldots, \alpha_K^{\text{post}})

via ``scipy.stats.dirichlet``.


Decay
~~~~~

.. math::

   \alpha_k \leftarrow \gamma\, \alpha_k \quad \forall k

Scaling all concentrations uniformly preserves the mean
:math:`\alpha_k / \sum_j \alpha_j` but increases posterior variance.

**Reference:** Murphy (2012) Chapter 3 [1]_.


GammaRegressor
--------------

Conjugate Gamma-Poisson model for count or rate data (transactions
per period, events per session).


Symbols
~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 15 85

   * - Symbol
     - Meaning
   * - :math:`\lambda`
     - Rate parameter (random)
   * - :math:`\alpha`
     - Gamma shape parameter
   * - :math:`\beta`
     - Gamma rate parameter (inverse scale)
   * - :math:`w_i`
     - Sample weight for observation :math:`i`
   * - :math:`\gamma`
     - Decay factor


Prior and likelihood
~~~~~~~~~~~~~~~~~~~~

.. math::

   \lambda \sim \mathrm{Gamma}(\alpha, \beta)

.. math::

   y_i \mid \lambda \sim \mathrm{Poisson}(\lambda)


Update
~~~~~~

.. math::

   \alpha^{\text{post}}
   &= \alpha^{\text{prior}} + \sum_{i=1}^{N} w_i\, y_i \\
   \beta^{\text{post}}
   &= \beta^{\text{prior}} + \sum_{i=1}^{N} w_i


Posterior moments
~~~~~~~~~~~~~~~~~

.. math::

   \mathbb{E}[\lambda] = \frac{\alpha}{\beta}, \qquad
   \mathrm{Var}[\lambda] = \frac{\alpha}{\beta^2}


Sampling
~~~~~~~~

.. math::

   \lambda \sim \mathrm{Gamma}(\alpha^{\text{post}},\,
     \beta^{\text{post}})

via ``scipy.stats.gamma`` with ``scale = 1/beta``.


Decay
~~~~~

.. math::

   \alpha \leftarrow \gamma\,\alpha, \qquad
   \beta \leftarrow \gamma\,\beta

Both parameters scale equally, so the mean :math:`\alpha/\beta` is
preserved and the variance :math:`\alpha/\beta^2` increases.

**Reference:** Murphy (2012) Chapter 3 [1]_.


When to use each
-----------------

Use :class:`~bayesianbandits.DirichletClassifier` when the outcome is
categorical (binary conversions, multi-class selections). Use
:class:`~bayesianbandits.GammaRegressor` when the outcome is a
positive count or rate.

Both are intercept-only: each unique value of the first feature gets
its own independent posterior. They cannot condition on covariates. For
problems with covariates, use :class:`~bayesianbandits.NormalRegressor`
or :class:`~bayesianbandits.BayesianGLM`.


References
----------

.. [1] Murphy, K. P. (2012). *Machine Learning: A Probabilistic
   Perspective*, Chapter 3. MIT Press.