Linear Regression#

In this tutorial, we build a linear regression model with Liesel and estimate it with Goose. Our goal is to illustrate the most fundamental features of the software in a straight-forward context.

Imports#

Before we can generate the data and build the model, we need to load Liesel and a number of other packages. We usually import the model building library liesel.model as lsl, and the MCMC library liesel.goose as gs.

import jax
import jax.numpy as jnp
import numpy as np

# We use distributions and bijectors from tensorflow probability
import tensorflow_probability.substrates.jax.distributions as tfd
import tensorflow_probability.substrates.jax.bijectors as tfb

import liesel.goose as gs
import liesel.model as lsl

import matplotlib.pyplot as plt

Generating the data#

Now we can simulate 500 observations from the linear regression model \(y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \;\sigma^2)\) with the true parameters \(\boldsymbol{\beta} = (\beta_0, \beta_1)' = (1, 2)'\) and \(\sigma = 1\). The relationship between the response \(y_i\) and the covariate \(x_i\) is visualized in the following scatterplot.

rng = np.random.default_rng(42)

# sample size and true parameters
n = 500
true_beta = np.array([1.0, 2.0])
true_sigma = 1.0

# data-generating process
x0 = rng.uniform(size=n)
X_mat = np.column_stack([np.ones(n), x0])
eps = rng.normal(scale=true_sigma, size=n)
y_vec = X_mat @ true_beta + eps

# plot the simulated data
plt.scatter(x0, y_vec)
plt.title("Simulated data from the linear regression model")
plt.xlabel("Covariate x")
plt.ylabel("Response y")
plt.show()

Building the Model#

As the most basic building blocks of a model, Liesel provides the Var class for instantiating variables and the Dist class for wrapping probability distributions. The Var class comes with four constructors, namely Var.new_param() for parameters, Var.new_obs() for observed data, Var.new_calc() for variables that are deterministic functions of other variables in the model, and Var.new_value() for fixed values.

The regression coefficients#

Let’s assume the weakly informative prior \(\beta_0, \beta_1 \sim \mathcal{N}(0, 100^2)\) for the regression coefficients. To define this in Liesel, we will be using the Dist class. This class wraps distribution classes with the TensorFlow Probability (TFP) API. Here, we use the TFP distribution object (tfd.Normal), and the two hyperparameters representing the parameters of the distribution. TFP uses the names loc for the mean and scale for the standard deviation, so we have to use the same names here. This is a general feature of Dist, you should always use the parameter names from TFP to refer to the parameters of your distribution.

beta_prior = lsl.Dist(tfd.Normal, loc=0.0, scale=100.0)

Now we can create our regression coefficient with the Var.new_param() constructor:

beta = lsl.Var.new_param(
    value=jnp.array([0.0, 0.0]), distribution=beta_prior, name="beta"
)

The standard deviation#

We define the standard deviation using the weakly informative prior \(\sigma^2 \sim \text{InverseGamma}(a, b)\) with \(a = b = 0.01\).

sigma_sq_prior = lsl.Dist(tfd.InverseGamma, concentration=0.01, scale=0.01)
sigma_sq = lsl.Var.new_param(value=1.0, distribution=sigma_sq_prior, name="sigma_sq")

Since we need to work not only with the variance, but with the scale, we initialize the scale using Var.new_calc(), to compute the square root.

sigma = lsl.Var.new_calc(jnp.sqrt, sigma_sq, name="sigma")

Design matrix, fitted values, and response#

To compute the matrix-vector product \(\mathbf{X}\boldsymbol{\beta}\), we use another variable instantiated via Var.new_calc(). We can view our model as \(y_i \sim \mathcal{N}(\mu_i, \;\sigma^2)\) with \(\mu_i = \beta_0 + \beta_1 x_i\), so we use the name mu for this product.

X = lsl.Var.new_obs(X_mat, name="X")
mu = lsl.Var.new_calc(jnp.dot, X, beta, name="mu")

At last we can define our response, using our observed response values. And since we assumed the model \(y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \;\sigma^2)\), we also need to specify the response’s distribution. We use our sigma and mu to specify this distribution:

y_dist = lsl.Dist(tfd.Normal, loc=mu, scale=sigma)
y = lsl.Var.new_obs(y_vec, distribution=y_dist, name="y")

Bringing the model together#

Now, we can set up the Model. Here, we will only add the response.

model = lsl.Model([y])

The plot_vars() function visualizes the model. More on that in the Model building with Liesel tutorial If the layout of the graph looks messy for you, please make sure you have the pygraphviz package installed.

lsl.plot_vars(model)

MCMC inference with Goose#

This section illustrates the basics of Liesel’s MCMC framework Goose. To use Goose, the user needs to select one or more sampling algorithms, called (transition) kernels, for the model parameters. Goose comes with a number of standard kernels such as Hamiltonian Monte Carlo (HMCKernel) or the No U-Turn Sampler (NUTSKernel). Multiple kernels can be combined in one sampling scheme and assigned to different parameters, and the user can implement their own problem-specific kernels, as long as they are compatible with the Kernel protocol. In any case, the user is responsible for constructing a mathematically valid algorithm.

We start with a very simple sampling scheme, keeping \(\sigma^2\) fixed at the true value and using a NUTS sampler for \(\boldsymbol{\beta}\). More on sampling \(\sigma^2\) can be found in the Parameter transformations tutorial and the Gibbs sampling tutorial. The kernels are added to a Engine, which coordinates the sampling, including the kernel tuning during the warmup, and the MCMC bookkeeping. The engine can be configured step by step with a EngineBuilder. Starting from a Liesel model, it is straight-forward to obtain an engine builder using the LieselMCMC helper. We then need to define the kernels, and the sampling duration. Finally, we can call the EngineBuilder.build() method, which returns a fully configured MCMC engine.

builder = gs.LieselMCMC(model).get_engine_builder(seed=1337, num_chains=4)

builder.add_kernel(gs.NUTSKernel(["beta"]))
builder.set_duration(warmup_duration=1000, posterior_duration=1000)

engine = builder.build()

Now we can run the MCMC algorithm for the specified duration by calling the sample_all_epochs() method. In a first step, the model and the sampling algorithm are compiled, so don’t worry if you don’t see an output right away. The subsequent samples will be generated much faster.

engine.sample_all_epochs()
  0%|                                                  | 0/3 [00:00<?, ?chunk/s]
 33%|##############                            | 1/3 [00:01<00:02,  1.35s/chunk]
100%|##########################################| 3/3 [00:01<00:00,  2.22chunk/s]

  0%|                                                  | 0/1 [00:00<?, ?chunk/s]
100%|########################################| 1/1 [00:00<00:00, 2212.19chunk/s]

  0%|                                                  | 0/2 [00:00<?, ?chunk/s]
100%|########################################| 2/2 [00:00<00:00, 3883.61chunk/s]

  0%|                                                  | 0/4 [00:00<?, ?chunk/s]
100%|########################################| 4/4 [00:00<00:00, 4106.02chunk/s]

  0%|                                                  | 0/8 [00:00<?, ?chunk/s]
100%|########################################| 8/8 [00:00<00:00, 1306.54chunk/s]

  0%|                                                 | 0/20 [00:00<?, ?chunk/s]
100%|#######################################| 20/20 [00:00<00:00, 391.76chunk/s]

  0%|                                                  | 0/2 [00:00<?, ?chunk/s]
100%|########################################| 2/2 [00:00<00:00, 2500.33chunk/s]

  0%|                                                 | 0/40 [00:00<?, ?chunk/s]
 82%|################################1      | 33/40 [00:00<00:00, 318.94chunk/s]
100%|#######################################| 40/40 [00:00<00:00, 303.65chunk/s]

Finally, we can extract the results and print a summary table.

results = engine.get_results()
summary = gs.Summary(results)
summary

Parameter summary:

kernel mean sd q_0.05 q_0.5 q_0.95 sample_size ess_bulk ess_tail rhat
parameter index
beta (0,) kernel_00 0.981 0.087 0.842 0.980 1.127 4000 887.081 1154.253 1.003
(1,) kernel_00 1.914 0.152 1.658 1.915 2.163 4000 883.961 1324.823 1.001

Error summary:

count sample_size sample_size_total relative
kernel error_code error_msg phase
kernel_00 1 divergent transition warmup 52 4000 4000 0.013
posterior 0 4000 4000 0.000

If we need more samples, we can append another epoch to the engine and sample it by calling either the sample_next_epoch() or the sample_all_epochs() method. The epochs are described by EpochConfig objects.

engine.append_epoch(
    gs.EpochConfig(gs.EpochType.POSTERIOR, duration=1000, thinning=1, optional=None)
)
engine.sample_next_epoch()
  0%|                                                 | 0/40 [00:00<?, ?chunk/s]
 80%|###############################2       | 32/40 [00:00<00:00, 311.91chunk/s]
100%|#######################################| 40/40 [00:00<00:00, 298.62chunk/s]

No compilation is required at this point, so this is pretty fast.

Here, we end this first tutorial. We have learned how to build a linear regression model and seen how we can use Kernels for drawing MCMC samples - that is quite a bit for the start. Now, have fun modelling with Liesel!