# Linear Regression

In this tutorial, we build a linear regression model with Liesel and estimate it with Goose. Our goal is to illustrate the most fundamental features of the software in a straightforward context.

## Imports

Before we can generate the data and build the model, we need to load Liesel and a number of other packages. We usually import the model building library `liesel.model` as `lsl`, and the MCMC library `liesel.goose` as `gs`.

``` python
import jax
import jax.numpy as jnp
import numpy as np

# We use distributions and bijectors from tensorflow probability
import tensorflow_probability.substrates.jax.distributions as tfd
import tensorflow_probability.substrates.jax.bijectors as tfb

import liesel.goose as gs
import liesel.model as lsl

import matplotlib.pyplot as plt
```

## Generating the data

Now we can simulate 500 observations from the linear regression model $y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \;\sigma^2)$ with the true parameters $\boldsymbol{\beta} = (\beta_0, \beta_1)' = (1, 2)'$ and $\sigma = 1$. The relationship between the response $y_i$ and the covariate $x_i$ is visualized in the following scatterplot.

``` python
rng = np.random.default_rng(42)

# sample size and true parameters
n = 500
true_beta = np.array([1.0, 2.0])
true_sigma = 1.0

# data-generating process
x0 = rng.uniform(size=n)
X_mat = np.column_stack([np.ones(n), x0])
eps = rng.normal(scale=true_sigma, size=n)
y_vec = X_mat @ true_beta + eps

# plot the simulated data
plt.scatter(x0, y_vec)
plt.title("Simulated data from the linear regression model")
plt.xlabel("Covariate x")
plt.ylabel("Response y")
plt.show()
```
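Before moving on to the Bayesian model, it can help to know roughly which values the data can recover. The following is a minimal sketch, independent of Liesel, that re-creates the simulated data with the same seed and fits ordinary least squares with NumPy. The names `beta_hat` and `sigma_hat` are introduced here for illustration only.

``` python
import numpy as np

# Re-create the simulated data with the same seed and draw order as above.
rng = np.random.default_rng(42)
n = 500
true_beta = np.array([1.0, 2.0])
x0 = rng.uniform(size=n)
X_mat = np.column_stack([np.ones(n), x0])
y_vec = X_mat @ true_beta + rng.normal(scale=1.0, size=n)

# Ordinary least squares: the coefficient vector that minimizes ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X_mat, y_vec, rcond=None)

# Residual standard deviation with n - 2 degrees of freedom.
resid = y_vec - X_mat @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - X_mat.shape[1]))

print("OLS estimate of beta:", beta_hat)
print("Estimate of sigma:", sigma_hat)
```

With a weakly informative prior and 500 observations, the posterior means obtained later should lie close to these least-squares estimates.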

## Building the Model

As the most basic building blocks of a model, Liesel provides the {class}`.Var` class for instantiating variables and the {class}`.Dist` class for wrapping probability distributions. The {class}`.Var` class comes with four constructors, namely {meth}`.Var.new_param` for parameters, {meth}`.Var.new_obs` for observed data, {meth}`.Var.new_calc` for variables that are deterministic functions of other variables in the model, and {meth}`.Var.new_value` for fixed values.

### The regression coefficients

Let’s assume the weakly informative prior $\beta_0, \beta_1 \sim \mathcal{N}(0, 100^2)$ for the regression coefficients. To define this prior in Liesel, we use the {class}`.Dist` class, which wraps distribution classes that follow the TensorFlow Probability (TFP) API. Here, we pass the TFP distribution class [`tfd.Normal`](https://www.tensorflow.org/probability/api_docs/python/tfp/distributions/Normal) together with the two hyperparameters of the distribution. TFP uses the names `loc` for the mean and `scale` for the standard deviation, so we have to use the same names here. This is a general feature of {class}`.Dist`: always use the parameter names from TFP to refer to the parameters of your distribution.

``` python
beta_prior = lsl.Dist(tfd.Normal, loc=0.0, scale=100.0)
```

Now we can create our regression coefficient with the {meth}`.Var.new_param` constructor. We also attach an {class}`~.goose.MCMCSpec` to `beta`, which tells Goose to sample this parameter with a NUTS kernel later on:

``` python
beta = lsl.Var.new_param(
    value=jnp.array([0.0, 0.0]),
    dist=beta_prior,
    name="beta",
    inference=gs.MCMCSpec(gs.NUTSKernel),
)
```

### The variance and standard deviation

We define the variance using the weakly informative prior $\sigma^2 \sim \text{InverseGamma}(a, b)$ with $a = b = 0.01$. In this introductory model, we do not attach an MCMC kernel to `sigma_sq`, so it remains fixed at its initial value during sampling.
``` python
sigma_sq_prior = lsl.Dist(tfd.InverseGamma, concentration=0.01, scale=0.01)
sigma_sq = lsl.Var.new_param(value=1.0, dist=sigma_sq_prior, name="sigma_sq")
```

The response distribution is parameterized by the standard deviation rather than the variance, so we define the scale with {meth}`.Var.new_calc` as the square root of `sigma_sq`.

``` python
sigma = lsl.Var.new_calc(jnp.sqrt, sigma_sq, name="sigma")
```

### Design matrix, fitted values, and response

To compute the matrix-vector product $\mathbf{X}\boldsymbol{\beta}$, we use another variable instantiated via {meth}`.Var.new_calc`. We can view our model as $y_i \sim \mathcal{N}(\mu_i, \;\sigma^2)$ with $\mu_i = \beta_0 + \beta_1 x_i$, so we use the name `mu` for this product.

``` python
X = lsl.Var.new_obs(X_mat, name="X")
mu = lsl.Var.new_calc(jnp.dot, X, beta, name="mu")
```

Finally, we can define our response from the observed response values. Since we assumed the model $y_i \sim \mathcal{N}(\beta_0 + \beta_1 x_i, \;\sigma^2)$, we also need to specify the response’s distribution. We use our `sigma` and `mu` to specify this distribution:

``` python
y_dist = lsl.Dist(tfd.Normal, loc=mu, scale=sigma)
y = lsl.Var.new_obs(y_vec, dist=y_dist, name="y")
```

### Bringing the model together

Now, we can set up the {class}`.Model`. Here, we will only add the response.

``` python
model = lsl.Model(y)
```

The {meth}`.Model.plot()` method visualizes the model. If the layout of the graph looks messy, please make sure you have the `pygraphviz` package installed.

``` python
model.plot()
```
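To make the structure of the model concrete, the following sketch writes out by hand the unnormalized log posterior that the variables above describe: the normal log-likelihood of `y`, the normal prior on `beta`, and the inverse gamma prior on `sigma_sq`. It uses only NumPy and the standard library and is not Liesel's implementation. The function names `normal_logpdf`, `inverse_gamma_logpdf`, and `log_posterior` are hypothetical helpers introduced for this illustration.

``` python
import math
import numpy as np


def normal_logpdf(x, loc, scale):
    """Elementwise log-density of N(loc, scale^2)."""
    z = (x - loc) / scale
    return -0.5 * np.log(2.0 * np.pi) - np.log(scale) - 0.5 * z**2


def inverse_gamma_logpdf(x, concentration, scale):
    """Log-density of InverseGamma(concentration, scale) at a scalar x > 0."""
    a, b = concentration, scale
    return a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(x) - b / x


def log_posterior(beta, sigma_sq, X, y):
    """Unnormalized log posterior of the regression model (hand-written sketch)."""
    sigma = np.sqrt(sigma_sq)  # plays the role of the calculated variable "sigma"
    mu = X @ beta              # plays the role of the calculated variable "mu"
    log_lik = normal_logpdf(y, mu, sigma).sum()
    log_prior_beta = normal_logpdf(beta, 0.0, 100.0).sum()
    log_prior_sigma_sq = inverse_gamma_logpdf(sigma_sq, 0.01, 0.01)
    return log_lik + log_prior_beta + log_prior_sigma_sq


# Re-create the simulated data so that the sketch is self-contained.
rng = np.random.default_rng(42)
n = 500
x0 = rng.uniform(size=n)
X_mat = np.column_stack([np.ones(n), x0])
y_vec = X_mat @ np.array([1.0, 2.0]) + rng.normal(scale=1.0, size=n)

lp_true = log_posterior(np.array([1.0, 2.0]), 1.0, X_mat, y_vec)
lp_init = log_posterior(np.zeros(2), 1.0, X_mat, y_vec)
print("log posterior at the true beta:   ", lp_true)
print("log posterior at the initial beta:", lp_init)
```

The log posterior is higher at the true coefficients than at the initial value `[0.0, 0.0]`. A sampler exploits exactly this kind of difference when it moves away from the starting point.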

## MCMC inference with Goose

This section illustrates the basics of Liesel’s MCMC framework Goose. To use Goose, the user needs to select one or more sampling algorithms, called (transition) kernels, for the model parameters. Goose comes with a number of standard kernels such as Hamiltonian Monte Carlo ({class}`~.goose.HMCKernel`) or the No U-Turn Sampler ({class}`~.goose.NUTSKernel`). Multiple kernels can be combined in one sampling scheme and assigned to different parameters, and the user can implement their own problem-specific kernels, as long as they are compatible with the {class}`.Kernel` protocol. In any case, the user is responsible for constructing a mathematically valid algorithm.

We start with a very simple sampling scheme, keeping $\sigma^2$ fixed at its initial value and using a NUTS sampler for $\boldsymbol{\beta}$. More on sampling $\sigma^2$ can be found in the [Parameter transformations tutorial](01c-transform.md) and the [Gibbs sampling tutorial](01d-gibbs-sampling.md).

The NUTS kernel for `beta` was specified above through the variable’s `inference` attribute. The {class}`.LieselMCMC` helper reads these inference specifications from the model and can run the sampler directly with {meth}`~.goose.LieselMCMC.run_for_epochs`. Here we request 1000 adaptation iterations and 1000 posterior draws per chain.

``` python
results = gs.LieselMCMC(model).run_for_epochs(
    seed=1337, num_chains=4, adaptation=1000, posterior=1000
)
```

    liesel.goose.mcmc_spec - WARNING - No inference specification defined for Var(name="sigma_sq"). If you do not add a kernel for this parameter manually to an EngineBuilder, it will not be sampled.
    liesel.goose.builder - WARNING - No jitter functions provided for position keys 'beta'. The initial values for these keys won't be jittered
    liesel.goose.engine - INFO - Initializing kernels...
    liesel.goose.engine - INFO - Done
    liesel.goose.engine - INFO - Starting epoch: FAST_ADAPTATION, 100 transitions, 25 jitted together

Parameter summary:

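| parameter | index | kernel | mean | sd | q_0.05 | q_0.5 | q_0.95 | sample_size | ess_bulk | ess_tail | rhat |
|---|---|---|---|---|---|---|---|---|---|---|---|
| beta | (0,) | kernel_00 | 0.984 | 0.088 | 0.838 | 0.985 | 1.126 | 4000 | 1151.201 | 1385.802 | 1.002 |
| beta | (1,) | kernel_00 | 1.906 | 0.154 | 1.648 | 1.907 | 2.156 | 4000 | 1199.216 | 1432.785 | 1.003 |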
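Because $\sigma^2$ is held fixed and the prior on $\boldsymbol{\beta}$ is normal, the posterior of $\boldsymbol{\beta}$ is itself normal and available in closed form, with covariance $(\mathbf{X}'\mathbf{X}/\sigma^2 + \mathbf{I}/100^2)^{-1}$. The sketch below computes its mean and standard deviations with NumPy as an independent cross-check of the MCMC summary. It assumes the data were generated exactly as in the simulation code of this tutorial. The names `post_mean` and `post_sd` are introduced here for illustration.

``` python
import numpy as np

# Re-create the simulated data with the same seed and draw order as the tutorial.
rng = np.random.default_rng(42)
n = 500
x0 = rng.uniform(size=n)
X_mat = np.column_stack([np.ones(n), x0])
y_vec = X_mat @ np.array([1.0, 2.0]) + rng.normal(scale=1.0, size=n)

sigma_sq = 1.0    # held fixed at its initial value, as in the sampling scheme
prior_sd = 100.0  # prior: beta_j ~ N(0, 100^2)

# Posterior precision = likelihood precision + prior precision.
precision = X_mat.T @ X_mat / sigma_sq + np.eye(2) / prior_sd**2
cov = np.linalg.inv(precision)

# The prior mean is zero, so only the likelihood term enters the posterior mean.
post_mean = cov @ (X_mat.T @ y_vec) / sigma_sq
post_sd = np.sqrt(np.diag(cov))

print("closed-form posterior mean:", post_mean)
print("closed-form posterior sd:  ", post_sd)
```

If the sampler has converged, the `mean` and `sd` columns of the parameter summary should agree with these values up to Monte Carlo error.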

Acceptance probabilities:

| kernel | positions | phase | acceptance_probability | position_moved |
|---|---|---|---|---|
| kernel_00 | beta | posterior | 0.877 | NaN |
| kernel_00 | beta | warmup | 0.791 | NaN |

Error summary:

| kernel | positions | error_code | error_msg | phase | count | sample_size | sample_size_total | relative |
|---|---|---|---|---|---|---|---|---|
| kernel_00 | beta | 1 | divergent transition | warmup | 56 | 4000 | 4000 | 0.014 |
| kernel_00 | beta | 1 | divergent transition | posterior | 0 | 4000 | 4000 | 0.000 |

This concludes the first tutorial. We have learned how to build a linear regression model, attach a NUTS kernel through an inference specification, and draw MCMC samples, which is quite a lot for a start. Now, have fun modelling with Liesel!