Liesel: A Probabilistic Programming Framework#

Welcome to the API documentation of Liesel, a probabilistic programming framework with a focus on semi-parametric regression. It includes:

  • Liesel, a library to express statistical models as Probabilistic Graphical Models (PGMs). Through the PGM representation, the user can build and update models in a natural way.

  • Goose, a library to build custom MCMC algorithms with several parameter blocks and MCMC kernels such as the No U-Turn Sampler (NUTS), the Iteratively Weighted Least Squares (IWLS) sampler, or different Gibbs samplers. Goose also takes care of the MCMC bookkeeping and the chain post-processing.

  • Liesel-GAM, a modeling library built on Liesel which assists the user with the configuration of semi-parametric regression models such as Generalized Additive Models for Location, Scale and Shape (GAMLSS) with different response distributions, spline-based smooth terms and shrinkage priors.

The name “Liesel” is an homage to the Gänseliesel fountain, landmark of Liesel’s birth city Göttingen.

Installation#

You can install Liesel via pip:

pip install liesel

If you want to work with the latest development version of Liesel or use PyGraphviz for prettier plots of the model graphs, see the README in the main repository.

Now you can get started. Throughout this documentation, we import Liesel as follows:

import liesel.model as lsl
import liesel.goose as gs

We also commonly use the following imports:

import jax
import jax.numpy as jnp
import numpy as np
import tensorflow_probability.substrates.jax.distributions as tfd
import tensorflow_probability.substrates.jax.bijectors as tfb

We provide overviews of the most important building blocks provided by liesel.model and liesel.goose in Model Building (liesel.model) and MCMC Sampling (liesel.goose), respectively.

Tutorials#

To start working with Liesel, our tutorials might come in handy, starting with a tutorial on linear regression. An overview of our tutorials can be found here: Liesel tutorials.

Further Reading#

For a scientific discussion of the software, see our paper on arXiv.

Acknowledgements#

Liesel is being developed by Paul Wiemann and Hannes Riebl at the University of Göttingen with support from Thomas Kneib. Important contributions were made by Joel Beck, Alex Afanasev, Gianmarco Callegher and Johannes Brachem. We are grateful to the German Research Foundation (DFG) for funding the development through grant 443179956.

University of Göttingen Funded by DFG

API Reference#

This is an overview of the central classes in Liesel.

Model Basics#

The fundamental building blocks of your model graph are given by just three classes. Both are documented with examples, so make sure to check them out.

The model building workflow in Liesel consists of the following steps:

  1. Set up the nodes and variables that make up your model.

  2. Initialize a Model with your root variable(s).

Model

A model with a static graph.

Var

A variable in a statistical model.

Dist

A Node subclass that wraps a probability distribution.

MCMC Setup#

To set up an MCMC engine, goose provides the EngineBuilder. Please refer to the linked EngineBuilder documentation to learn how to use it.

A recent addition is the MCMCSpec, which can be passed to the inference argument of a model.Var upon initialization to tell the variable directly how it should be sampled. You can then use the method get_engine_builder() of LieselMCMC to conveniently initialize your EngineBuilder.

LieselMCMC

Manages the setup of MCMC specifications for a Liesel model.

MCMCSpec

Specification for the MCMC kernel and optional jitter distribution associated with a model variable.

EngineBuilder

The EngineBuilder is used to construct an MCMC Engine.

Engine

MCMC engine capable of combining multiple transition kernels.

EpochConfig

Defines an Epoch in an MCMC algorithm.

MCMC Kernels#

Goose makes it easy for you to combine different MCMC kernels for different blocks of model parameters. You can also define your own kernel by implementing the Kernel protocol.

To draw samples from your posterior, you will want to call sample_all_epochs(). Once sampling is done, you can obtain the results with get_results(), which will return a SamplingResults instance.

IWLSKernel

An IWLS kernel with dual averaging and an (optional) user-defined function for computing the Cholesky decomposition of the Fisher information matrix, implementing the liesel.goose.types.Kernel protocol.

NUTSKernel

A NUTS kernel with dual averaging and an inverse mass matrix tuner, implementing the Kernel protocol.

HMCKernel

A HMC kernel with dual averaging and an inverse mass matrix tuner, implementing the Kernel protocol.

RWKernel

A random walk kernel.

MHKernel

A Metropolis-Hastings kernel implementing the Kernel protocol.

MHProposal

MHProposal(position, log_correction)

GibbsKernel

A Gibbs kernel implementing the Kernel protocol.

Kernel

Protocol for a transition kernel.

Summary & Plots#

The central classes for handling your sampling results are:

SamplingResults

Contains the results of the MCMC engine.

Summary

A summary object.

SamplesSummary

A summary object based on a dictionary of samples.

You can obtain your posterior samples as a dictionary via get_posterior_samples(). There is also experimental support for turning your samples into an arviz.InferenceData object via to_arviz_inference_data().

Goose also comes with a number of plotting functions that give you quick acccess to important diagnostics.

plot_trace

Visualizes posterior samples over time with a trace plot.

plot_cor

Visualizes autocorrelations of posterior samples.

plot_pairs

Produces a pairplot panel.

plot_scatter

Produces a scatterplot of two parameters.

plot_density

Visualizes posterior distributions with a density plot.

plot_param

Visualizes trace plot, density plot and autocorrelation plot of a single subparameter.

Optimization#

It can often be beneficial to find good starting values to get your MCMC sampling scheme going. Goose provides the function optim_flat() for this purpose, which allows you to run stochastic gradient descent on a liesel model.

optim_flat

Optimize the parameters of a Liesel Model.

Stopper

Handles (early) stopping for optim_flat().

history_to_df

Turns a OptimResult.history dictionary into a pandas.DataFrame.

OptimResult

Holds the results of model optimization with optim_flat().

Model (Advanced)#

Calc

A Node subclass that calculates its value based on its inputs nodes.

Node

A node of a computational graph that can cache its value.

Value

A Node subclass that holds constant values.

PIT

A weak variable evaluating a probability integral transform (PIT).

TransientCalc

A transient calculator node that does not cache its value.

TransientDist

A transient distribution node that does not cache its value.

TransientIdentity

A transient identity node that does not cache its value.

TransientNode

A node that does not cache its value.

InputGroup

A node that groups its inputs for another node.

GraphBuilder

A graph builder, used to set up a Model.

Model Interfaces#

A natural option for setting up your model is the use of liesel.model. However, you are not locked in to using Model. Goose currently includes the following interfaces:

LieselInterface

A ModelInterface for a Liesel Model.

DictInterface

A model interface for a model state represented by a dict[str, Array] and a corresponding log-probability function.

NamedTupleInterface

A model interface for a model state represented by a NamedTuple and a corresponding log-probability function.

P-Splines#

basis_matrix

Builds a B-spline basis matrix.

equidistant_knots

Create equidistant knots for a B-spline of the specified order.

pspline_penalty

Builds an (n_param x n_param) P-spline penalty matrix.

MultivariateNormalDegenerate

A potentially degenerate multivariate normal distribution.

Experimental API#

experimental

Experimental Liesel add-ons.

Effort-Based Versioning#

Starting with v0.4.0, we will be using effort-based versioning. See the EffVer documentation at https://jacobtomlinson.dev/effver/

The JAX developers provide a wonderful summary: https://docs.jax.dev/en/latest/jep/25516-effver.html

The following description is almost entirely quoted from the linked JAX page, but it describes what we intend with effort-based versioning perfectly.

Effort-based versioning is a three-number versioning system, similar to the better-known semantic versioning (SemVer: https://semver.org/). It uses a three-number format: MACRO.MESO.MICRO, where version numbers are incremented based on the expected effort required to adapt to the change.

As an example, consider software with current version 2.3.4:

  1. Increasing the micro version (i.e. releasing 2.3.5) signals to users that little to no effort is necessary on their part to adapt to the changes.

  2. Increasing the meso version (i.e. releasing 2.4.0) signals to users that some small effort will be required for existing code to work with the changes.

  3. Increasing the macro version (i.e. releasing 3.0.0) signals to users that significant effort may be required to update to the changes.

In some ways, this captures the essence of more commonly-used semantic versioning, but avoids phrasing in terms of compatibility guarantees that are hard to meet in practice.

Zero Version#

In addition, EffVer gives special meaning to the zero version. Early releases of software are often versioned 0.X.Y, and in this case:

  • X has the characteristics of the macro version.

  • Y has the characteristics of the meso version.

Liesel has been in a zero-version state since its initial release, and EffVer’s zero-version case is a good post-facto description of the implicit intent behind Liesel’s releases to date.

In EffVer, bumping from 0.X.Y to version 1.0.0 is recommended when a certain level of stability has been reached in practice: If you end up on a version like 0.9.x for many months, it is a good signal that things are pretty stable and that it’s time to switch to a 1.0.0 release.