GEV responses#
In this tutorial, we illustrate how to set up a distributional regression model with the generalized extreme value distribution as a response distribution. First, we simulate some data in R:
The location parameter (\(\mu\)) is a function of an intercept and a non-linear covariate effect.
The scale parameter (\(\sigma\)) is a function of an intercept and a linear effect and uses a log-link.
The shape or concentration parameter (\(\xi\)) is a function of an intercept and a linear effect.
After simulating the data, we can configure the model with a single call
to the rliesel::liesel()
function.
library(rliesel)
Please make sure you are using a virtual or conda environment with Liesel installed, e.g. using `reticulate::use_virtualenv()` or `reticulate::use_condaenv()`. See `vignette("versions", "reticulate")`.
After setting the environment, check if the installed versions of RLiesel and Liesel are compatible with `check_liesel_version()`.
library(VGAM)
Loading required package: stats4
Loading required package: splines
set.seed(13)
n <- 1000
x0 <- runif(n)
x1 <- runif(n)
x2 <- runif(n)
y <- rgev(
n,
location = 0 + sin(2 * pi * x0),
scale = exp(-3 + x1),
shape = 0.1 + x2
)
plot(y)
model <- liesel(
response = y,
distribution = "GeneralizedExtremeValue",
predictors = list(
loc = predictor(~ s(x0)),
scale = predictor(~ x1, inverse_link = "Exp"),
concentration = predictor(~ x2)
)
)
Installed Liesel version 0.2.10-dev is compatible, continuing to set up model
Now, we can continue in Python and use the lsl.dist_reg_mcmc()
function to set up a sampling algorithm with IWLS kernels for the
regression coefficients (\(\boldsymbol{\beta}\)) and a Gibbs kernel for
the smoothing parameter (\(\tau^2\)) of the spline.
The support of the GEV distribution changes with the parameter values
(compare
Wikipedia).
To ensure that the initial parameters support the observed data we set
\(xi = 0.1\) and disable jittering of the the variance and regression
parameters. For the latter, we supply user-defined jitter functions to
lsl.dist_reg_mcmc
that are essentially the identity function w.r.t.
the parameter value.
import liesel.model as lsl
import jax.numpy as jnp
model = r.model
# concentration == 0.0 seems to break the sampler
model.vars["concentration_p0_beta"].value = jnp.array([0.1, 0.0])
builder = lsl.dist_reg_mcmc(model, seed=42, num_chains=4, tau2_jitter_fn=lambda key, val: val, beta_jitter_fn=lambda key, val: val)
builder.set_duration(warmup_duration=1000, posterior_duration=1000)
engine = builder.build()
engine.sample_all_epochs()
Some tabular summary statistics of the posterior samples:
import liesel.goose as gs
results = engine.get_results()
gs.Summary(results)
Parameter summary:
kernel | mean | sd | q_0.05 | q_0.5 | q_0.95 | sample_size | ess_bulk | ess_tail | rhat | ||
---|---|---|---|---|---|---|---|---|---|---|---|
parameter | index | ||||||||||
concentration_p0_beta | (0,) | kernel_00 | 0.102 | 0.053 | 0.017 | 0.101 | 0.193 | 4000 | 371.025 | 840.414 | 1.015 |
(1,) | kernel_00 | 0.972 | 0.099 | 0.811 | 0.975 | 1.132 | 4000 | 161.693 | 449.305 | 1.038 | |
loc_np0_beta | (0,) | kernel_03 | 0.486 | 0.214 | 0.140 | 0.492 | 0.844 | 4000 | 81.248 | 246.038 | 1.037 |
(1,) | kernel_03 | -0.168 | 0.106 | -0.343 | -0.168 | 0.003 | 4000 | 125.877 | 351.505 | 1.028 | |
(2,) | kernel_03 | -0.462 | 0.143 | -0.684 | -0.469 | -0.211 | 4000 | 107.938 | 201.522 | 1.019 | |
(3,) | kernel_03 | -0.022 | 0.068 | -0.132 | -0.024 | 0.100 | 4000 | 131.218 | 231.624 | 1.026 | |
(4,) | kernel_03 | 0.477 | 0.062 | 0.374 | 0.478 | 0.575 | 4000 | 118.509 | 253.829 | 1.022 | |
(5,) | kernel_03 | 0.457 | 0.027 | 0.410 | 0.456 | 0.500 | 4000 | 122.648 | 230.152 | 1.016 | |
(6,) | kernel_03 | -5.906 | 0.031 | -5.956 | -5.905 | -5.856 | 4000 | 105.971 | 257.752 | 1.031 | |
(7,) | kernel_03 | -0.384 | 0.062 | -0.482 | -0.384 | -0.284 | 4000 | 149.321 | 292.330 | 1.005 | |
(8,) | kernel_03 | -1.788 | 0.026 | -1.831 | -1.788 | -1.746 | 4000 | 103.978 | 245.266 | 1.027 | |
loc_np0_tau2 | () | kernel_02 | 6.048 | 4.345 | 2.356 | 4.932 | 13.018 | 4000 | 3980.546 | 3781.071 | 1.001 |
loc_p0_beta | (0,) | kernel_04 | -0.027 | 0.003 | -0.031 | -0.027 | -0.022 | 4000 | 21.449 | 188.444 | 1.157 |
scale_p0_beta | (0,) | kernel_01 | -3.100 | 0.061 | -3.200 | -3.100 | -2.997 | 4000 | 42.713 | 245.213 | 1.107 |
(1,) | kernel_01 | 1.201 | 0.080 | 1.070 | 1.201 | 1.335 | 4000 | 158.839 | 438.542 | 1.045 |
Error summary:
count | relative | ||||
---|---|---|---|---|---|
kernel | error_code | error_msg | phase | ||
kernel_00 | 90 | nan acceptance prob | warmup | 88 | 0.022 |
posterior | 1 | 0.000 | |||
kernel_01 | 90 | nan acceptance prob | warmup | 45 | 0.011 |
posterior | 0 | 0.000 | |||
kernel_03 | 90 | nan acceptance prob | warmup | 27 | 0.007 |
posterior | 0 | 0.000 | |||
kernel_04 | 90 | nan acceptance prob | warmup | 22 | 0.005 |
posterior | 0 | 0.000 |
And the corresponding trace plots:
fig = gs.plot_trace(results, "loc_p0_beta")
fig = gs.plot_trace(results, "loc_np0_tau2")
fig = gs.plot_trace(results, "loc_np0_beta")
fig = gs.plot_trace(results, "scale_p0_beta")
fig = gs.plot_trace(results, "concentration_p0_beta")
We need to reset the index of the summary data frame before we can transfer it to R.
summary = gs.Summary(results).to_dataframe().reset_index()
After transferring the summary data frame to R, we can process it with packages like dplyr and ggplot2. Here is a visualization of the estimated spline vs. the true function:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
library(reticulate)
summary <- py$summary
beta <- summary %>%
filter(variable == "loc_np0_beta") %>%
group_by(var_index) %>%
summarize(mean = mean(mean)) %>%
ungroup()
beta <- beta$mean
X <- py_to_r(model$vars["loc_np0_X"]$value)
estimate <- X %*% beta
true <- sin(2 * pi * x0)
ggplot(data.frame(x0 = x0, estimate = estimate, true = true)) +
geom_line(aes(x0, estimate), color = palette()[2]) +
geom_line(aes(x0, true), color = palette()[4]) +
ggtitle("Estimated spline (red) vs. true function (blue)") +
ylab("f") +
theme_minimal()