GEV responses#

In this tutorial, we illustrate how to set up a distributional regression model with the generalized extreme value distribution as a response distribution. First, we simulate some data in R:

  • The location parameter (\(\mu\)) is a function of an intercept and a non-linear covariate effect.

  • The scale parameter (\(\sigma\)) is a function of an intercept and a linear effect and uses a log-link.

  • The shape or concentration parameter (\(\xi\)) is a function of an intercept and a linear effect.

After simulating the data, we can configure the model with a single call to the rliesel::liesel() function.

library(rliesel)
Please make sure you are using a virtual or conda environment with Liesel installed, e.g. using `reticulate::use_virtualenv()` or `reticulate::use_condaenv()`. See `vignette("versions", "reticulate")`.

After setting the environment, check if the installed versions of RLiesel and Liesel are compatible with `check_liesel_version()`.
library(VGAM)
Loading required package: stats4

Loading required package: splines
set.seed(1337)

n <- 1000

x0 <- runif(n)
x1 <- runif(n)
x2 <- runif(n)

y <- rgev(
  n,
  location = 0 + sin(2 * pi * x0),
  scale = exp(-3 + x1),
  shape = 0.1 + x2
)

plot(y)

model <- liesel(
  response = y,
  distribution = "GeneralizedExtremeValue",
  predictors = list(
    loc = predictor(~ s(x0)),
    scale = predictor(~ x1, inverse_link = "Exp"),
    concentration = predictor(~ x2)
  )
)
Installed Liesel version 0.2.4 is compatible.

Now, we can continue in Python and use the lsl.dist_reg_mcmc() function to set up a sampling algorithm with IWLS kernels for the regression coefficients (\(\boldsymbol{\beta}\)) and a Gibbs kernel for the smoothing parameter (\(\tau^2\)) of the spline. Note that we need to set \(\beta_0\) for \(\xi\) to 0.1 manually, because \(\xi = 0\) breaks the sampler.

import liesel.model as lsl
import jax.numpy as jnp

model = r.model

# concentration == 0.0 seems to break the sampler
model.vars["concentration_p0_beta"].value = jnp.array([0.1, 0.0])

builder = lsl.dist_reg_mcmc(model, seed=42, num_chains=4)
builder.set_duration(warmup_duration=1000, posterior_duration=1000)

engine = builder.build()
liesel.goose.engine - INFO - Initializing kernels...
liesel.goose.engine - INFO - Done
engine.sample_all_epochs()
liesel.goose.engine - INFO - Starting epoch: FAST_ADAPTATION, 75 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 7, 12, 10, 4 / 75 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 1, 0, 0, 0 / 75 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: SLOW_ADAPTATION, 25 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 2, 1, 1, 3 / 25 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 3, 1, 1, 0 / 25 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 0, 1, 1, 2 / 25 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 2, 1, 0, 1 / 25 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: SLOW_ADAPTATION, 50 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 2, 1, 3, 1 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 1, 1, 1, 2 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 2, 1, 1, 2 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 0, 0, 1, 1 / 50 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: SLOW_ADAPTATION, 100 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 1, 2, 1, 3 / 100 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 1, 1, 1, 2 / 100 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 1, 0, 1, 2 / 100 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 1, 1, 0, 1 / 100 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: SLOW_ADAPTATION, 200 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 1, 1, 1, 1 / 200 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 1, 1, 1, 3 / 200 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 1, 1, 1, 2 / 200 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 3, 0, 1, 1 / 200 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: SLOW_ADAPTATION, 500 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 4, 3, 4, 1 / 500 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 1, 2, 1, 1 / 500 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 1, 1, 1, 2 / 500 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 1, 1, 1, 1 / 500 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Starting epoch: FAST_ADAPTATION, 50 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 0, 1, 1, 1 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_01: 1, 2, 1, 0 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_03: 1, 1, 0, 1 / 50 transitions
liesel.goose.engine - WARNING - Errors per chain for kernel_04: 1, 0, 0, 1 / 50 transitions
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Finished warmup
liesel.goose.engine - INFO - Starting epoch: POSTERIOR, 1000 transitions, 25 jitted together
liesel.goose.engine - WARNING - Errors per chain for kernel_00: 1, 0, 0, 4 / 1000 transitions
liesel.goose.engine - INFO - Finished epoch

Some tabular summary statistics of the posterior samples:

import liesel.goose as gs

results = engine.get_results()
gs.Summary(results)

Parameter summary:

kernel mean sd q_0.05 q_0.5 q_0.95 sample_size ess_bulk ess_tail rhat
parameter index
concentration_p0_beta (0,) kernel_00 0.068 0.051 -0.015 0.068 0.153 4000 274.887 696.928 1.012
(1,) kernel_00 1.070 0.100 0.909 1.071 1.233 4000 136.495 550.203 1.020
loc_np0_beta (0,) kernel_03 0.598 0.226 0.228 0.596 0.972 4000 45.636 86.072 1.078
(1,) kernel_03 0.322 0.124 0.115 0.324 0.521 4000 96.395 249.120 1.052
(2,) kernel_03 -0.365 0.128 -0.614 -0.352 -0.173 4000 46.097 69.378 1.082
(3,) kernel_03 0.362 0.065 0.252 0.363 0.471 4000 62.370 152.229 1.053
(4,) kernel_03 -0.241 0.085 -0.381 -0.243 -0.099 4000 54.831 159.130 1.097
(5,) kernel_03 0.170 0.031 0.119 0.168 0.221 4000 35.365 145.434 1.119
(6,) kernel_03 6.028 0.038 5.970 6.026 6.091 4000 72.399 260.400 1.032
(7,) kernel_03 0.547 0.071 0.431 0.549 0.663 4000 34.101 190.253 1.124
(8,) kernel_03 1.703 0.030 1.656 1.702 1.751 4000 69.504 213.189 1.034
loc_np0_tau2 () kernel_02 6.286 5.872 2.419 5.088 13.443 4000 3623.338 3754.696 1.000
loc_p0_beta (0,) kernel_04 0.027 0.003 0.023 0.027 0.032 4000 76.520 49.237 1.059
scale_p0_beta (0,) kernel_01 -3.063 0.060 -3.159 -3.066 -2.962 4000 84.772 150.819 1.045
(1,) kernel_01 1.041 0.076 0.915 1.043 1.165 4000 155.422 409.469 1.014

Error summary:

count relative
kernel error_code error_msg phase
kernel_00 90 nan acceptance prob warmup 73 0.018
posterior 5 0.001
kernel_01 90 nan acceptance prob warmup 30 0.008
posterior 0 0.000
kernel_03 90 nan acceptance prob warmup 28 0.007
posterior 0 0.000
kernel_04 90 nan acceptance prob warmup 20 0.005
posterior 0 0.000

And the corresponding trace plots:

fig = gs.plot_trace(results, "loc_p0_beta")
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

fig = gs.plot_trace(results, "loc_np0_tau2")
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

fig = gs.plot_trace(results, "loc_np0_beta")
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

fig = gs.plot_trace(results, "scale_p0_beta")
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

fig = gs.plot_trace(results, "concentration_p0_beta")
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)
/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/seaborn/axisgrid.py:118: UserWarning: The figure layout has changed to tight
  self._figure.tight_layout(*args, **kwargs)

We need to reset the index of the summary data frame before we can transfer it to R.

summary = gs.Summary(results).to_dataframe().reset_index()

After transferring the summary data frame to R, we can process it with packages like dplyr and ggplot2. Here is a visualization of the estimated spline vs. the true function:

library(dplyr)
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
library(reticulate)

summary <- py$summary

beta <- summary %>%
  filter(variable == "loc_np0_beta") %>%
  group_by(var_index) %>%
  summarize(mean = mean(mean)) %>%
  ungroup()

beta <- beta$mean
X <- py_to_r(model$vars["loc_np0_X"]$value)
estimate <- X %*% beta

true <- sin(2 * pi * x0)

ggplot(data.frame(x0 = x0, estimate = estimate, true = true)) +
  geom_line(aes(x0, estimate), color = palette()[2]) +
  geom_line(aes(x0, true), color = palette()[4]) +
  ggtitle("Estimated spline (red) vs. true function (blue)") +
  ylab("f") +
  theme_minimal()