Working with Sparse Features
============================

When your feature space is large and sparse (one-hot encoded
categories, text features, high-cardinality IDs), dense precision
matrices become impractical. A model with 100k features needs a
100k x 100k matrix in the dense case. Set ``sparse=True`` on the
estimator and the precision matrix is stored as a sparse CSC array,
so storage and updates scale with the number of nonzero entries
instead of the square of the feature dimension.

All linear estimators support ``sparse=True``. The intercept-only
models (:class:`~bayesianbandits.DirichletClassifier`,
:class:`~bayesianbandits.GammaRegressor`) don't need it.


Enable sparse mode
-------------------

Pass ``sparse=True`` to the estimator and provide context as
``scipy.sparse.csc_array``:

.. code-block:: python

   import numpy as np
   from scipy.sparse import random as sparse_random
   from bayesianbandits import (
       Arm, ContextualAgent, NormalRegressor, ThompsonSampling,
   )

   arms = [
       Arm(f"variant_{i}", learner=NormalRegressor(
           alpha=1.0, beta=1.0, sparse=True,
       ))
       for i in range(3)
   ]
   agent = ContextualAgent(arms, ThompsonSampling(), random_seed=42)

   # Sparse context: 1 row, 10000 features, ~1% density
   X = sparse_random(1, 10000, density=0.01, format="csc", random_state=42)
   (action,) = agent.pull(X)
   agent.update(X, np.array([1.0]))

The estimator will not convert dense arrays to sparse for you. If you
pass a dense array to a ``sparse=True`` estimator, it will raise.

If the precision matrix fills in over time (common with unstructured
sparse features like bag-of-words), sparse operations become slower
than dense. Hierarchical features (one-hot at each level of a
taxonomy) keep fill bounded and are the ideal use case.


Install CHOLMOD for production workloads
-----------------------------------------

Two sparse backends are available:

**SuperLU** ships with scipy and works out of the box. It handles
arbitrary sparse LU decomposition, which is a harder problem than
what we actually need. Precision matrices are symmetric positive
definite, and SuperLU can't exploit that, so it does more work than
necessary. For small models this doesn't matter. For large ones it
dominates your inference time.

**CHOLMOD** (via ``scikit-sparse``) knows the matrix is symmetric
positive definite and takes advantage of it.

.. code-block:: bash

   pip install bayesianbandits[cholmod]

If ``scikit-sparse`` is installed, the library uses CHOLMOD
automatically. No code changes needed. Both backends apply
fill-reducing permutations internally; the library handles
unpermuting so that sampling, prediction, and all other operations
are unaware of the solver choice. To force SuperLU (for debugging or
benchmarking), set the environment variable ``BB_NO_SUITESPARSE=1``.

With CHOLMOD, real-time inference under 10 ms is feasible with models
up to 2\ :sup:`20` features.