Liesel: A Probabilistic Programming Framework#
Welcome to the API documentation of Liesel, a probabilistic programming framework with a focus on semi-parametric regression. It includes:
Liesel, a library to express statistical models as Probabilistic Graphical Models (PGMs). Through the PGM representation, the user can build and update models in a natural way.
Goose, a library to build custom MCMC algorithms with several parameter blocks and MCMC kernels such as the No U-Turn Sampler (NUTS), the Iteratively Weighted Least Squares (IWLS) sampler, or different Gibbs samplers. Goose also takes care of the MCMC bookkeeping and the chain post-processing.
Liesel-GAM, a modeling library built on Liesel which assists the user with the configuration of semi-parametric regression models such as Generalized Additive Models for Location, Scale and Shape (GAMLSS) with different response distributions, spline-based smooth terms and shrinkage priors.
The name “Liesel” is an homage to the Gänseliesel fountain, landmark of Liesel’s birth city Göttingen.
Installation#
You can install Liesel via pip:
pip install liesel
If you want to work with the latest development version of Liesel or use PyGraphviz for prettier plots of the model graphs, see the README in the main repository.
Now you can get started. Throughout this documentation, we import Liesel as follows:
import liesel.model as lsl
import liesel.goose as gs
We also commonly use the following imports:
import jax
import jax.numpy as jnp
import numpy as np
import tensorflow_probability.substrates.jax.distributions as tfd
import tensorflow_probability.substrates.jax.bijectors as tfb
We provide overviews of the most important building blocks provided by liesel.model
and liesel.goose in Model Building (liesel.model) and
MCMC Sampling (liesel.goose), respectively.
Tutorials#
To start working with Liesel, our tutorials might come in handy, starting with a tutorial on linear regression. An overview of our tutorials can be found here: Liesel tutorials.
Further Reading#
For a scientific discussion of the software, see our paper on arXiv.
Acknowledgements#
Liesel is being developed by Paul Wiemann and Hannes Riebl at the University of Göttingen with support from Thomas Kneib. Important contributions were made by Joel Beck, Alex Afanasev, Gianmarco Callegher and Johannes Brachem. We are grateful to the German Research Foundation (DFG) for funding the development through grant 443179956.
API Reference#
This is an overview of the central classes in Liesel.
Model Basics#
The fundamental building blocks of your model graph are given by just three classes. Both are documented with examples, so make sure to check them out.
The model building workflow in Liesel consists of the following steps:
Set up the nodes and variables that make up your model.
Initialize a
Modelwith your root variable(s).
MCMC Setup#
To set up an MCMC engine, goose provides the EngineBuilder. Please refer to
the linked EngineBuilder documentation to learn how to use it.
A recent addition is the MCMCSpec, which can be passed to the
inference argument of a model.Var upon initialization to tell the variable
directly how it should be sampled. You can then use the method
get_engine_builder() of LieselMCMC to
conveniently initialize your EngineBuilder.
Manages the setup of MCMC specifications for a Liesel model. |
|
Specification for the MCMC kernel and optional jitter distribution associated with a model variable. |
|
The |
|
MCMC engine capable of combining multiple transition kernels. |
|
Defines an Epoch in an MCMC algorithm. |
MCMC Kernels#
Goose makes it easy for you to combine different MCMC kernels for different blocks of
model parameters. You can also define your own kernel by implementing
the Kernel protocol.
To draw samples from your posterior, you will want to call
sample_all_epochs(). Once sampling is done, you can obtain the results
with get_results(), which will return a SamplingResults
instance.
An IWLS kernel with dual averaging and an (optional) user-defined function for computing the Cholesky decomposition of the Fisher information matrix, implementing the |
|
A NUTS kernel with dual averaging and an inverse mass matrix tuner, implementing the |
|
A HMC kernel with dual averaging and an inverse mass matrix tuner, implementing the |
|
A random walk kernel. |
|
A Metropolis-Hastings kernel implementing the |
|
MHProposal(position, log_correction) |
|
A Gibbs kernel implementing the |
|
Protocol for a transition kernel. |
Summary & Plots#
The central classes for handling your sampling results are:
Contains the results of the MCMC engine. |
|
A summary object. |
|
A summary object based on a dictionary of samples. |
You can obtain your posterior samples as a dictionary via
get_posterior_samples(). There is also experimental support
for turning your samples into an arviz.InferenceData object via
to_arviz_inference_data().
Goose also comes with a number of plotting functions that give you quick acccess to important diagnostics.
Visualizes posterior samples over time with a trace plot. |
|
Visualizes autocorrelations of posterior samples. |
|
Produces a pairplot panel. |
|
Produces a scatterplot of two parameters. |
|
Visualizes posterior distributions with a density plot. |
|
Visualizes trace plot, density plot and autocorrelation plot of a single subparameter. |
Optimization#
It can often be beneficial to find good starting values to get your MCMC sampling scheme
going. Goose provides the function optim_flat() for this purpose, which allows you
to run stochastic gradient descent on a liesel model.
Optimize the parameters of a Liesel |
|
Handles (early) stopping for |
|
Turns a |
|
Holds the results of model optimization with |
Model (Advanced)#
A |
|
A node of a computational graph that can cache its value. |
|
A |
|
A weak variable evaluating a probability integral transform (PIT). |
|
A transient calculator node that does not cache its value. |
|
A transient distribution node that does not cache its value. |
|
A transient identity node that does not cache its value. |
|
A node that does not cache its value. |
|
A node that groups its inputs for another node. |
|
A graph builder, used to set up a |
Model Interfaces#
A natural option for setting up your model is the use of liesel.model.
However, you are not locked
in to using Model. Goose currently includes the following interfaces:
A |
|
A model interface for a model state represented by a |
|
A model interface for a model state represented by a |
P-Splines#
Builds a B-spline basis matrix. |
|
Create equidistant knots for a B-spline of the specified order. |
|
Builds an (n_param x n_param) P-spline penalty matrix. |
|
A potentially degenerate multivariate normal distribution. |
Experimental API#
Experimental Liesel add-ons. |
Effort-Based Versioning#
Starting with v0.4.0, we will be using effort-based versioning. See the EffVer documentation at https://jacobtomlinson.dev/effver/
The JAX developers provide a wonderful summary: https://docs.jax.dev/en/latest/jep/25516-effver.html
The following description is almost entirely quoted from the linked JAX page, but it describes what we intend with effort-based versioning perfectly.
Effort-based versioning is a three-number versioning system,
similar to the better-known semantic versioning (SemVer: https://semver.org/).
It uses a three-number format: MACRO.MESO.MICRO, where version numbers
are incremented based on the expected effort required to adapt to the change.
As an example, consider software with current version 2.3.4:
Increasing the micro version (i.e. releasing
2.3.5) signals to users that little to no effort is necessary on their part to adapt to the changes.Increasing the meso version (i.e. releasing
2.4.0) signals to users that some small effort will be required for existing code to work with the changes.Increasing the macro version (i.e. releasing
3.0.0) signals to users that significant effort may be required to update to the changes.
In some ways, this captures the essence of more commonly-used semantic versioning, but avoids phrasing in terms of compatibility guarantees that are hard to meet in practice.
Zero Version#
In addition, EffVer gives special meaning to the zero version.
Early releases of software are often versioned 0.X.Y, and in this case:
Xhas the characteristics of the macro version.Yhas the characteristics of the meso version.
Liesel has been in a zero-version state since its initial release, and EffVer’s zero-version case is a good post-facto description of the implicit intent behind Liesel’s releases to date.
In EffVer, bumping from 0.X.Y to version 1.0.0 is recommended
when a certain level of stability has been reached in practice:
If you end up on a version like 0.9.x for many months,
it is a good signal that things are pretty stable
and that it’s time to switch to a 1.0.0 release.