Intercept-Only Models

Two conjugate models for problems without covariates: one for categorical outcomes and one for counts or rates. Both stratify observations by the value of the first feature and maintain an independent posterior for each group.

DirichletClassifier

Conjugate Dirichlet-Multinomial model for binary or categorical outcomes (click/no-click, class selection).

Symbols

| Symbol | Meaning |
|---|---|
| \(K\) | Number of classes |
| \(\alpha_k\) | Concentration parameter for class \(k\) |
| \(\theta_k\) | Probability of class \(k\) |
| \(w_i\) | Sample weight for observation \(i\) |
| \(\gamma\) | Decay factor |

Prior and likelihood

\[\boldsymbol{\theta} \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K)\]
\[y_i \mid \boldsymbol{\theta} \sim \mathrm{Categorical}(\boldsymbol{\theta})\]

Update

\[\alpha_k^{\text{post}} = \alpha_k^{\text{prior}} + \sum_{i=1}^{N} w_i \, \mathbb{1}[y_i = k]\]
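The update simply adds a weighted count per class to the prior concentrations. A minimal NumPy sketch, assuming integer class labels in 0..K-1; the function name `dirichlet_update` is hypothetical and is not part of the DirichletClassifier API:

```python
import numpy as np


def dirichlet_update(alpha_prior, y, w=None):
    """Return alpha_prior + sum_i w_i * 1[y_i = k] for each class k.

    alpha_prior: shape (K,) prior concentrations.
    y: integer class labels, assumed to lie in 0..K-1.
    w: optional sample weights; every observation gets weight 1 if omitted.
    """
    alpha_prior = np.asarray(alpha_prior, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    # bincount adds w_i into bin y_i, which is exactly the weighted indicator sum
    counts = np.bincount(y, weights=w, minlength=len(alpha_prior))
    return alpha_prior + counts


# Uniform prior over two classes, four weighted observations
alpha_post = dirichlet_update([1.0, 1.0], y=[1, 0, 1, 1], w=[1.0, 1.0, 2.0, 0.5])
# alpha_post is [2.0, 4.5]
```

Using `np.bincount` with `weights` avoids an explicit loop over classes and handles fractional weights directly.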

Posterior mean

\[\mathbb{E}[\theta_k] = \frac{\alpha_k}{\sum_j \alpha_j}\]
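As a quick illustration, the posterior mean is the concentration vector normalized to sum to 1. A small sketch with illustrative concentrations \(\alpha = (2, 4.5)\); `dirichlet_mean` is a hypothetical helper name:

```python
import numpy as np


def dirichlet_mean(alpha):
    """E[theta_k] = alpha_k / sum_j alpha_j."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


# Illustrative posterior concentrations for two classes
mean = dirichlet_mean([2.0, 4.5])
# mean is approximately [0.3077, 0.6923]; the entries always sum to 1
```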

Sampling

\[\boldsymbol{\theta} \sim \mathrm{Dirichlet}(\alpha_1^{\text{post}}, \ldots, \alpha_K^{\text{post}})\]

Draws are generated with scipy.stats.dirichlet.
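A sketch of drawing from the posterior with `scipy.stats.dirichlet.rvs`, assuming illustrative posterior concentrations \((2, 4.5)\) and a seeded NumPy generator for reproducibility. `rvs` returns an array of shape `(size, K)`, so a single draw is row 0:

```python
import numpy as np
from scipy.stats import dirichlet

alpha_post = np.array([2.0, 4.5])
rng = np.random.default_rng(0)

# A single posterior draw of theta; rvs returns shape (size, K), so take row 0
theta = dirichlet.rvs(alpha_post, size=1, random_state=rng)[0]

# Many draws: their average approaches the posterior mean alpha_k / sum_j alpha_j
draws = dirichlet.rvs(alpha_post, size=10_000, random_state=rng)
empirical_mean = draws.mean(axis=0)
```

A single draw like `theta` is what a Thompson-sampling style policy would act on; the averaged draws are only a sanity check against the closed-form mean.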

Decay

\[\alpha_k \leftarrow \gamma\, \alpha_k \quad \forall k\]

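For \(0 < \gamma < 1\), scaling all concentrations by the same factor preserves the mean \(\alpha_k / \sum_j \alpha_j\) but increases the posterior variance: with \(\alpha_0 = \sum_j \alpha_j\), \(\mathrm{Var}[\theta_k] = \mathbb{E}[\theta_k]\,(1 - \mathbb{E}[\theta_k]) / (\alpha_0 + 1)\), and decay shrinks \(\alpha_0\).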
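To check the effect numerically, the sketch below decays an illustrative concentration vector with \(\gamma = 0.5\) and compares means and variances before and after. The helper names `dirichlet_decay` and `dirichlet_var` are hypothetical; the variance uses the standard Dirichlet marginal formula:

```python
import numpy as np


def dirichlet_decay(alpha, decay):
    """alpha_k <- decay * alpha_k for every k (decay plays the role of gamma)."""
    return decay * np.asarray(alpha, dtype=float)


def dirichlet_var(alpha):
    """Var[theta_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1))."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    return alpha * (a0 - alpha) / (a0**2 * (a0 + 1))


alpha = np.array([20.0, 45.0])
decayed = dirichlet_decay(alpha, 0.5)  # [10.0, 22.5]

# Means are identical; variances grow because alpha_0 drops from 65 to 32.5
mean_before, mean_after = alpha / alpha.sum(), decayed / decayed.sum()
var_before, var_after = dirichlet_var(alpha), dirichlet_var(decayed)
```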

Reference: Murphy (2012) Chapter 3 [1].

GammaRegressor

Conjugate Gamma-Poisson model for count or rate data (transactions per period, events per session).

Symbols

| Symbol | Meaning |
|---|---|
| \(\lambda\) | Poisson rate (a random variable under the prior) |
| \(\alpha\) | Gamma shape parameter |
| \(\beta\) | Gamma rate parameter (inverse scale) |
| \(w_i\) | Sample weight for observation \(i\) |
| \(\gamma\) | Decay factor |

Prior and likelihood

\[\lambda \sim \mathrm{Gamma}(\alpha, \beta)\]
\[y_i \mid \lambda \sim \mathrm{Poisson}(\lambda)\]

Update

\[\begin{split}\alpha^{\text{post}} &= \alpha^{\text{prior}} + \sum_{i=1}^{N} w_i\, y_i \\ \beta^{\text{post}} &= \beta^{\text{prior}} + \sum_{i=1}^{N} w_i\end{split}\]
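The two update rules translate directly into weighted sums. A minimal sketch, assuming scalar prior parameters and array-like counts; `gamma_poisson_update` is a hypothetical name, not the GammaRegressor API:

```python
import numpy as np


def gamma_poisson_update(alpha_prior, beta_prior, y, w=None):
    """alpha_post = alpha_prior + sum_i w_i y_i;  beta_post = beta_prior + sum_i w_i."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    alpha_post = alpha_prior + float(np.sum(w * y))  # weighted total count
    beta_post = beta_prior + float(np.sum(w))        # weighted number of observations
    return alpha_post, beta_post


# Gamma(1, 1) prior, four unit-weight counts summing to 10
alpha_post, beta_post = gamma_poisson_update(1.0, 1.0, y=[3, 0, 5, 2])
# alpha_post == 11.0, beta_post == 5.0
```

The shape accumulates observed events while the rate accumulates exposure, which is why their ratio tracks the average count per observation.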

Posterior moments

\[\mathbb{E}[\lambda] = \frac{\alpha}{\beta}, \qquad \mathrm{Var}[\lambda] = \frac{\alpha}{\beta^2}\]
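A small sketch computing both moments for illustrative values \(\alpha = 11\), \(\beta = 5\), cross-checked against `scipy.stats.gamma`, which is parameterized by shape `a` and `scale` \(= 1/\beta\). The helper `gamma_moments` is hypothetical:

```python
from scipy.stats import gamma


def gamma_moments(alpha, beta):
    """Mean alpha/beta and variance alpha/beta^2 of Gamma(shape=alpha, rate=beta)."""
    return alpha / beta, alpha / beta**2


mean, var = gamma_moments(11.0, 5.0)  # 2.2 and 0.44

# Cross-check against SciPy, which uses scale = 1/beta instead of the rate beta
dist = gamma(a=11.0, scale=1.0 / 5.0)
assert abs(dist.mean() - mean) < 1e-12 and abs(dist.var() - var) < 1e-12
```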

Sampling

\[\lambda \sim \mathrm{Gamma}(\alpha^{\text{post}},\, \beta^{\text{post}})\]

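Draws use scipy.stats.gamma, which is parameterized by shape and scale rather than rate, so it is called with shape \(\alpha^{\text{post}}\) and scale \(1/\beta^{\text{post}}\).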
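A sketch of the shape/scale conversion in practice, assuming illustrative posterior parameters \(\alpha^{\text{post}} = 11\), \(\beta^{\text{post}} = 5\) and a seeded generator:

```python
import numpy as np
from scipy.stats import gamma

alpha_post, beta_post = 11.0, 5.0
rng = np.random.default_rng(0)

# SciPy's gamma takes shape `a` and `scale`; pass scale = 1 / beta_post.
# Passing beta_post itself as the scale would give a distribution with mean 55.
lam = gamma.rvs(a=alpha_post, scale=1.0 / beta_post, size=10_000, random_state=rng)
sample_mean = lam.mean()  # close to alpha_post / beta_post = 2.2
```

Confusing rate with scale is an easy mistake here, and it silently shifts the sampled rate by a factor of \(\beta^2\).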

Decay

\[\alpha \leftarrow \gamma\,\alpha, \qquad \beta \leftarrow \gamma\,\beta\]

Both parameters are multiplied by the same factor, so the mean \(\alpha/\beta\) is unchanged while the variance becomes \(\gamma\alpha/(\gamma\beta)^2 = \alpha/(\gamma\beta^2)\), which increases for \(0 < \gamma < 1\).
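A numeric check of the decay rule with illustrative values \(\alpha = 11\), \(\beta = 5\), \(\gamma = 0.5\): the mean stays at 2.2 while the variance doubles from 0.44 to 0.88. `gamma_decay` is a hypothetical helper, with the parameter named `decay` to avoid clashing with `scipy.stats.gamma`:

```python
def gamma_decay(alpha, beta, decay):
    """alpha <- decay * alpha, beta <- decay * beta (decay plays the role of gamma)."""
    return decay * alpha, decay * beta


alpha, beta = 11.0, 5.0
alpha_d, beta_d = gamma_decay(alpha, beta, 0.5)  # 5.5 and 2.5

mean_before, mean_after = alpha / beta, alpha_d / beta_d        # both 2.2
var_before, var_after = alpha / beta**2, alpha_d / beta_d**2    # 0.44 -> 0.88
```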

Reference: Murphy (2012) Chapter 3 [1].

When to use each

Use DirichletClassifier when the outcome is categorical (binary conversions, multi-class selections). Use GammaRegressor when the outcome is a non-negative count or rate.

Both are intercept-only: each unique value of the first feature gets its own independent posterior, and beyond that grouping they cannot condition on covariates. For problems with covariates, use NormalRegressor or BayesianGLM.
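The stratification can be pictured as a dictionary from group value to its own posterior. The class below is a hypothetical illustration (not the library's DirichletClassifier interface) that keys on the first column of `X`, ignores every other column, and falls back to the prior for groups it has never seen:

```python
import numpy as np


class StratifiedDirichletSketch:
    """Hypothetical illustration of intercept-only stratification.

    Each distinct value in the first column of X gets its own Dirichlet
    posterior; every other column is ignored.
    """

    def __init__(self, n_classes, prior=1.0):
        self.n_classes = n_classes
        self.prior = prior
        self.alpha = {}  # group value -> concentration vector of length n_classes

    def _prior_vector(self):
        return np.full(self.n_classes, self.prior, dtype=float)

    def fit(self, X, y, w=None):
        groups = np.asarray(X)[:, 0].tolist()
        y = np.asarray(y, dtype=int)
        w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
        for group, label, weight in zip(groups, y, w):
            alpha = self.alpha.setdefault(group, self._prior_vector())
            alpha[label] += weight  # same conjugate update, applied per group
        return self

    def predict_proba(self, X):
        groups = np.asarray(X)[:, 0].tolist()
        rows = []
        for group in groups:
            # Unseen groups fall back to the prior
            alpha = self.alpha.get(group, self._prior_vector())
            rows.append(alpha / alpha.sum())
        return np.array(rows)


model = StratifiedDirichletSketch(n_classes=2).fit(
    [["mobile"], ["desktop"], ["mobile"]], y=[1, 0, 1]
)
proba = model.predict_proba([["mobile"], ["tablet"]])
# proba is [[0.25, 0.75], [0.5, 0.5]]
```

Because groups never share statistics, a rarely seen group stays close to its prior regardless of how much data other groups accumulate.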

References