Univariate P-splines in R-INLA and inlabru
P-splines combine a flexible B-spline basis with a roughness penalty on the coefficients. Their main practical appeal is that the fit becomes far less sensitive to the number and locations of the knots used to define the basis. In a Bayesian setting, that penalty is naturally represented by an n-th-order random-walk prior on the spline weights. This short blog post shows how to fit a univariate smoothing P-spline with R-INLA and inlabru.
Useful academic work on P-splines includes:
Paul Eilers and Brian Marx’s original article introducing P-splines, Flexible Smoothing with B-splines and Penalties, https://projecteuclid.org/journals/statistical-science/volume-11/issue-2/Flexible-smoothing-with-B-splines-and-penalties/10.1214/ss/1038425655.pdf.
Paul Eilers and Brian Marx’s masterpiece Practical Smoothing: The Joys of P-splines, https://psplines.bitbucket.io/.
Stefan Lang and Andreas Brezger’s Bayesian P-splines paper, https://www.tandfonline.com/doi/abs/10.1198/1061860043010.
Aritz Adin Urtasun’s Ph.D. thesis Hierarchical and spline-based models in space-time disease mapping. I could not find the original link, so I uploaded the file to my personal Google Drive here.
Dae-Jin Lee’s Ph.D. thesis Smoothing mixed models for spatial and spatio-temporal data, https://e-archivo.uc3m.es/entities/publication/72bffb09-ff83-48eb-acba-8aaed8de45f4.
Related blog posts:
Tristan Mahr’s blog post Random effects and penalized splines are the same thing, https://www.tjmahr.com/random-effects-penalized-splines-same-thing/.
Gavin Simpson’s blog post Extrapolating with B splines and GAMs, https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/.
B-spline basis
For a scalar covariate \(x\), a P-spline smooth approximates the unknown function \(f\) through a finite-dimensional decomposition:
\[ f(x)= \beta_0 + \sum_{j=1}^{K} B_j(x)\alpha_j, \]
where \(B_j(x)\) are spline basis functions defined over the full domain of \(x\). The basis itself is determined by two ingredients: a polynomial degree and a knot sequence. The knots split the covariate range into intervals, and on each interval every \(B_j\) is an ordinary polynomial. In practice one often uses cubic B-splines, that is, splines of degree 3, because they are flexible while remaining smooth enough for most regression problems.
A central property of B-splines is local support. Each \(B_j(x)\) is zero outside a limited portion of the covariate range, so at a fixed location \(x\) only a few basis functions are active. For splines of degree \(d\), at most \(d + 1\) basis functions are nonzero at the same \(x\). This is what makes the basis local: changing one coefficient mainly perturbs the fitted curve near the associated knot region instead of changing the function everywhere.
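This local-support property is easy to check numerically. The sketch below is my own toy example (a small basis on a synthetic grid, not the data used later): it counts how many basis functions are strictly positive at each evaluation point.

```r
library(splines)

x_toy <- seq(0, 10, by = 0.1)
d <- 3  # cubic splines

# A cubic B-spline basis with 12 functions on [0, 10]
B_toy <- bs(x_toy, df = 12, degree = d, intercept = TRUE)

# Count the strictly positive basis functions at each point
active <- rowSums(B_toy > 0)
max(active)  # at most d + 1 = 4
```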
Another useful property is that the basis functions form a partition of unity, meaning
\[ \sum_{j=1}^{K} B_j(x) = 1. \]
This makes the spline behave like a local weighted average of the coefficients. It also clarifies why one has to think about identifiability: if the basis can already represent constant functions, then the intercept and the spline part are not automatically separated unless one imposes a constraint or centers the smooth.
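Both points can be verified in a couple of lines. One subtlety, assuming the splines::bs constructor: with intercept = FALSE one basis function is dropped, precisely for the identifiability reason above, so the partition of unity only holds for the full basis with intercept = TRUE.

```r
library(splines)

x_toy <- seq(0, 10, by = 0.1)

# Full basis: rows sum to one everywhere on the domain
B_full <- bs(x_toy, df = 12, degree = 3, intercept = TRUE)
range(rowSums(B_full))  # both endpoints equal 1 up to rounding

# Reduced basis (one function dropped): rows no longer sum to
# one near the left boundary
B_red <- bs(x_toy, df = 12, degree = 3, intercept = FALSE)
rowSums(B_red)[1]  # far below 1
```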
The coefficients \((\alpha_1,\ldots,\alpha_K)\) control the contribution of each basis function. For observations \(x_1,\ldots,x_n\), the basis enters the model through the design matrix
\[ \mathbf{B} = \begin{pmatrix} B_1(x_1) & \cdots & B_K(x_1) \\ \vdots & \ddots & \vdots \\ B_1(x_n) & \cdots & B_K(x_n) \end{pmatrix}. \]
Row \(i\) shows how the basis combines to represent the smooth at \(x_i\), and column \(j\) shows where coefficient \(\alpha_j\) contributes. Because of local support, this matrix is typically sparse or at least strongly banded in structure, which is one of the reasons B-splines are computationally attractive.
However, the fit quality of B-splines has been shown to be highly sensitive to the number and locations of the knots used to define the basis, leading to poor representation when too few knots are used and to overfitting when too many are used.
P-splines provide a remedy for this issue: instead of leaving the parameters \(\alpha\) unconstrained, they penalise their finite differences so that neighbouring coefficients are encouraged to vary smoothly. With the standard second-order penalty, the second differences satisfy
\[ \alpha_j - 2\alpha_{j-1} + \alpha_{j-2} \sim N(0, \tau^{-1}), \qquad j = 3, \ldots, K. \]
Large values of \(\tau\) shrink the second differences more strongly and therefore produce smoother functions, while smaller values allow more local curvature. The role of the P-spline is therefore split into two parts: the B-spline basis provides a rich approximation space, and the RW2 penalty regularises that space so that the fitted curve remains smooth.
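The interplay between the two parts can be illustrated outside INLA with the frequentist analogue of this prior: a penalised least-squares fit \(\hat\alpha = (B^\top B + \lambda D_2^\top D_2)^{-1} B^\top y\), where \(\lambda\) plays the role of \(\tau\) relative to the noise precision. A self-contained sketch on simulated data (my own toy example, not part of the CAC 40 analysis below):

```r
library(splines)
set.seed(1)

# Noisy observations of a smooth function
x_sim <- seq(0, 1, length.out = 200)
y_sim <- sin(2 * pi * x_sim) + rnorm(200, sd = 0.3)

K <- 20
B <- bs(x_sim, df = K, degree = 3, intercept = TRUE)
D2 <- diff(diag(K), differences = 2)  # second-difference operator
P <- crossprod(D2)                    # RW2-style penalty matrix

# Penalised least squares: solve (B'B + lambda * P) alpha = B'y
pspline_fit <- function(lambda) {
  alpha <- solve(crossprod(B) + lambda * P, crossprod(B, y_sim))
  as.vector(B %*% alpha)
}

f_wiggly <- pspline_fit(1e-4)  # weak penalty: follows the noise
f_smooth <- pspline_fit(1e2)   # strong penalty: much smoother

# The stronger penalty yields smaller summed squared second differences
roughness <- function(f) sum(diff(f, differences = 2)^2)
roughness(f_smooth) < roughness(f_wiggly)  # TRUE
```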
Although it is possible to construct P-splines in several ways, I keep the rest of this post close to plain R-INLA so the basis construction and the RW2 penalty remain explicit.
Data import
To keep the discussion concrete, I will work with daily closing values of the CAC 40 index, available on Yahoo Finance under the ticker ^FCHI. For the purpose of this blog post, I will use a series of n = 1608 daily closes, stored in the vector y and indexed by the trading day x.
Basis construction
For illustrative purposes, I will model the closing series y as a smooth function of the trading-day index x. With \(x = 1, \ldots, n\) (n = 1608), the B-spline basis is built as follows. To reduce boundary effects, I extend the covariate range slightly and construct the basis on the augmented grid from -5 to n + 5, then evaluate it back on the observed index.
library(splines)
library(Matrix)

k <- 80L
B_aug <- seq(-5, max(x) + 5, by = 1)
B_basis <- bs(
  x,
  df = k,
  degree = 3,
  intercept = FALSE,
  Boundary.knots = range(B_aug)
)
B <- unclass(B_basis)
A_ps <- Matrix(B, sparse = TRUE)
Here B is the B-spline design matrix evaluated at the observed covariate values. The argument k controls the basis dimension and therefore the richness of the spline representation. The extended boundary knots help stabilise the fitted spline near the endpoints. With the chosen degree, this induces the following internal knot locations:
print(attr(B_basis, "knots"))
[1] 21.60256 42.20513 62.80769 83.41026 104.01282 124.61538
[7] 145.21795 165.82051 186.42308 207.02564 227.62821 248.23077
[13] 268.83333 289.43590 310.03846 330.64103 351.24359 371.84615
[19] 392.44872 413.05128 433.65385 454.25641 474.85897 495.46154
[25] 516.06410 536.66667 557.26923 577.87179 598.47436 619.07692
[31] 639.67949 660.28205 680.88462 701.48718 722.08974 742.69231
[37] 763.29487 783.89744 804.50000 825.10256 845.70513 866.30769
[43] 886.91026 907.51282 928.11538 948.71795 969.32051 989.92308
[49] 1010.52564 1031.12821 1051.73077 1072.33333 1092.93590 1113.53846
[55] 1134.14103 1154.74359 1175.34615 1195.94872 1216.55128 1237.15385
[61] 1257.75641 1278.35897 1298.96154 1319.56410 1340.16667 1360.76923
[67] 1381.37179 1401.97436 1422.57692 1443.17949 1463.78205 1484.38462
[73] 1504.98718 1525.58974 1546.19231 1566.79487 1587.39744
The resulting basis functions along the covariate index are shown below.
basis_df <- data.frame(
  x = rep(x, times = ncol(B)),
  basis = factor(rep(seq_len(ncol(B)), each = nrow(B))),
  value = c(B)
)
knot_df <- data.frame(knot = unname(attr(B_basis, "knots")))
ggplot(basis_df, aes(x = x, y = value, group = basis, color = basis)) +
  geom_line(linewidth = 0.35, show.legend = FALSE) +
  geom_vline(
    data = knot_df,
    aes(xintercept = knot),
    inherit.aes = FALSE,
    colour = "grey45",
    linetype = 3
  ) +
  scale_color_manual(values = grDevices::hcl.colors(ncol(B), palette = "Dynamic")) +
  labs(
    x = "Trading-day index",
    y = "Basis value",
    title = "B-spline basis along x"
  ) +
  theme_minimal()
The same object can also be viewed directly as a matrix. This makes the local support of the basis especially visible: each column is active only over a limited range of rows, which gives the design matrix its banded structure.
B_long <- data.frame(
  row = rep(seq_len(nrow(B)), times = ncol(B)),
  basis = rep(seq_len(ncol(B)), each = nrow(B)),
  value = c(B)
)
ggplot(B_long, aes(x = row, y = basis, fill = value)) +
  geom_raster() +
  geom_vline(
    data = knot_df,
    aes(xintercept = knot),
    inherit.aes = FALSE,
    colour = "grey45",
    linetype = 3
  ) +
  scale_fill_gradientn(colors = grDevices::hcl.colors(64, palette = "YlOrRd", rev = TRUE)) +
  labs(
    x = "Trading-day index",
    y = "Basis index",
    fill = "Value",
    title = "B-spline design matrix B"
  ) +
  theme_minimal()
RW precision structure
Usual choices for P-splines are RW1 and RW2 precision structures. They enforce smoothness while keeping the precision matrix sparse, which is one of the main computational advantages of these priors. If \(\psi = (\alpha_1,\ldots,\alpha_K)^\top\) denotes the spline coefficient vector, the first- and second-order difference operators are
\[ D_{RW1}\psi = \begin{pmatrix} \alpha_2 - \alpha_1 \\ \alpha_3 - \alpha_2 \\ \vdots \\ \alpha_K - \alpha_{K-1} \end{pmatrix}, \qquad D_{RW2}\psi = \begin{pmatrix} \alpha_3 - 2\alpha_2 + \alpha_1 \\ \alpha_4 - 2\alpha_3 + \alpha_2 \\ \vdots \\ \alpha_K - 2\alpha_{K-1} + \alpha_{K-2} \end{pmatrix}. \]
The associated intrinsic precision matrices are then
\[ Q_{RW1} = D_{RW1}^\top D_{RW1}, \qquad Q_{RW2} = D_{RW2}^\top D_{RW2}. \]
In R, these objects can be derived directly from sparse finite-difference matrices:
n_basis <- ncol(B)
I_k <- Diagonal(n_basis)
D_rw1 <- diff(I_k, differences = 1)
D_rw2 <- diff(I_k, differences = 2)
Q_rw1 <- crossprod(D_rw1)
Q_rw2 <- crossprod(D_rw2)
print(as.matrix(Q_rw1[1:8, 1:8]))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 -1 0 0 0 0 0 0
[2,] -1 2 -1 0 0 0 0 0
[3,] 0 -1 2 -1 0 0 0 0
[4,] 0 0 -1 2 -1 0 0 0
[5,] 0 0 0 -1 2 -1 0 0
[6,] 0 0 0 0 -1 2 -1 0
[7,] 0 0 0 0 0 -1 2 -1
[8,] 0 0 0 0 0 0 -1 2
print(as.matrix(Q_rw2[1:8, 1:8]))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 -2 1 0 0 0 0 0
[2,] -2 5 -4 1 0 0 0 0
[3,] 1 -4 6 -4 1 0 0 0
[4,] 0 1 -4 6 -4 1 0 0
[5,] 0 0 1 -4 6 -4 1 0
[6,] 0 0 0 1 -4 6 -4 1
[7,] 0 0 0 0 1 -4 6 -4
[8,] 0 0 0 0 0 1 -4 6
The RW1 precision penalises successive differences in the coefficients, while the RW2 precision penalises local curvature through second differences. Both matrices are sparse. RW1 has constant vectors in its null space, whereas RW2 has both constant and linear vectors in its null space, which is why additional constraints or an explicit intercept are usually required for identifiability.
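These null-space claims, and the rank deficiencies behind the rankdef arguments used later, can be checked directly on small base-R versions of the same matrices:

```r
# Small RW1/RW2 precision matrices from base-R difference operators
K <- 10
Q1 <- crossprod(diff(diag(K), differences = 1))
Q2 <- crossprod(diff(diag(K), differences = 2))

const <- rep(1, K)
trend <- seq_len(K)

max(abs(Q1 %*% const))  # 0: constants lie in the RW1 null space
max(abs(Q2 %*% const))  # 0: constants lie in the RW2 null space
max(abs(Q2 %*% trend))  # 0: linear trends too, for RW2 only
max(abs(Q1 %*% trend))  # nonzero: RW1 does not annihilate trends

# Rank deficiencies of 1 and 2, respectively
c(qr(Q1)$rank, qr(Q2)$rank)  # K - 1 and K - 2
```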
R-INLA model component
In matrix form, the spline contribution can be written as \(B\psi\). The previous construction makes explicit the precision matrix associated with the prior on \(\psi\):
\[ \psi\vert \tau \sim \mathcal{N}\left(0,[\tau Q_{RW(k)}]^-\right) \]
Here \([\tau Q_{RW(k)}]^-\) denotes the generalised inverse of \(\tau Q_{RW(k)}\), where the precision hyperparameter \(\tau = 1/\sigma^2\) is the inverse of the variance. Such custom latent components can be specified in R-INLA through the generic0 latent model, described here.
Fit with inlabru
The generic0 latent model takes a structure matrix Cmatrix and internally uses the precision form \(Q = \tau C\). Here we simply supply Q_rw2 as the structure matrix. On the inlabru side, bm_matrix(n_basis) tells the component system to interpret the matrix-valued input as a direct matrix multiplication between the supplied basis matrix and the latent state vector. Since CAC 40 closing values are strictly positive, I use a Gamma likelihood instead of a Gaussian one. INLA’s Gamma likelihood uses a default log-link for the mean, so the spline acts on the log-mean scale. For the precision hyperparameter, I use a PC prior of the form \(\Pr(\sigma > u) = \alpha\), implemented in INLA through prior = "pc.prec".
To learn more about PC-priors, see:
- Daniel Simpson, Håvard Rue, Andrea Riebler, Thiago G. Martins, Sigrunn H. Sørbye, Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors, DOI: 10.1214/16-STS576
- Daniel Simpson’s blog post: https://dansblog.netlify.app/#category=PC%20priors
pc_prec <- list(
  prior = "pc.prec",
  param = c(5, 0.01)
)
cmp_ps_generic0 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  pspline(
    B,
    model = "generic0",
    mapper = bm_matrix(n_basis),
    Cmatrix = Q_rw2,
    rankdef = 2,
    constr = TRUE,
    diagonal = 1e-06,
    hyper = list(prec = pc_prec)
  )
lik_ps_generic0 <- bru_obs(
  y ~ Intercept + pspline,
  family = "gamma",
  data = data.frame(y = y)
)
fit_ps_generic0_bru <- bru(
  cmp_ps_generic0,
  lik_ps_generic0
)
The arguments rankdef = 2 and constr = TRUE reflect the intrinsic rank deficiency of the RW2 precision matrix. The small diagonal inflation is a standard numerical stabilisation for intrinsic models represented through generic0. The PC prior above is interpreted as \(\Pr(\sigma_{\text{pspline}} > 5) = 0.01\).
Posterior predictive check
Let us perform a simple posterior predictive check by comparing the observed series with the posterior fitted band on the response scale.
obs_idx <- seq_len(length(y))
fitted_obs <- fit_ps_generic0_bru$summary.fitted.values[obs_idx, , drop = FALSE]
ppc_df <- data.frame(
  x = x,
  y = y,
  fitted_mean = fitted_obs$mean,
  fitted_q025 = fitted_obs$`0.025quant`,
  fitted_q975 = fitted_obs$`0.975quant`
)
ggplot(ppc_df, aes(x = x)) +
  geom_ribbon(
    aes(
      ymin = fitted_q025,
      ymax = fitted_q975,
      fill = "95% credible band"
    ),
    alpha = 0.35
  ) +
  geom_line(
    aes(y = fitted_mean, colour = "Posterior mean", linetype = "Posterior mean"),
    linewidth = 0.7
  ) +
  geom_line(
    aes(y = fitted_q025, colour = "Lower 95% bound", linetype = "Lower 95% bound"),
    linewidth = 0.5
  ) +
  geom_line(
    aes(y = fitted_q975, colour = "Upper 95% bound", linetype = "Upper 95% bound"),
    linewidth = 0.5
  ) +
  geom_line(
    aes(y = y, colour = "Observed", linetype = "Observed"),
    linewidth = 0.35,
    alpha = 0.75
  ) +
  scale_colour_manual(
    name = NULL,
    values = c(
      "Observed" = "black",
      "Posterior mean" = "#08519c",
      "Lower 95% bound" = "#3182bd",
      "Upper 95% bound" = "#3182bd"
    ),
    breaks = c("Observed", "Posterior mean", "Lower 95% bound", "Upper 95% bound")
  ) +
  scale_linetype_manual(
    name = NULL,
    values = c(
      "Observed" = "solid",
      "Posterior mean" = "solid",
      "Lower 95% bound" = "dashed",
      "Upper 95% bound" = "dotdash"
    ),
    breaks = c("Observed", "Posterior mean", "Lower 95% bound", "Upper 95% bound")
  ) +
  scale_fill_manual(
    name = NULL,
    values = c("95% credible band" = "#9ecae1")
  ) +
  guides(
    colour = guide_legend(order = 1),
    linetype = guide_legend(order = 1),
    fill = guide_legend(
      order = 2,
      override.aes = list(alpha = 0.35)
    )
  ) +
  labs(
    x = "Trading-day index",
    y = "CAC 40 close",
    title = "Observed values and posterior fitted band"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  )
Other competing possibilities within R-INLA: RW2, PRW2, and SPDE
R-INLA offers a wide range of latent effects related to the P-spline construction presented above for temporal modelling. Below I briefly highlight three possibilities:
- RW2 latent effect, which differs from the previous P-spline construction in that there is no spline basis (see https://inla.r-inla-download.org/r-inla.org/doc/latent/rw2.pdf);
- PRW2, which is the non-intrinsic version of the RW2 latent effect (see https://inla.r-inla-download.org/r-inla.org/doc/latent/prw2.pdf);
- Gaussian process implemented using the finite-element representation presented in Finn Lindgren, Håvard Rue, and Johan Lindström's 2011 article An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach, https://doi.org/10.1111/j.1467-9868.2011.00777.x.
Other latent effects, such as AR(1), AR(p), or fractional Gaussian noise, are also available; see https://inla.r-inla-download.org/r-inla.org/doc/latent/.
# Case 1: direct RW2 latent effect
cmp_ps_rw2 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  rw2(
    x,
    model = "rw2",
    rankdef = 2,
    constr = TRUE,
    diagonal = 1e-06,
    hyper = list(prec = pc_prec)
  )
lik_rw2 <- bru_obs(
  y ~ Intercept + rw2,
  family = "gamma",
  data = data.frame(y = y, x = x)
)
fit_ps_rw2_bru <- bru(
  cmp_ps_rw2,
  lik_rw2
)
# Case 2: PRW2 latent effect
pc_prw2_range <- list(
  prior = "pc.prw2.range",
  param = c(25, 0.01, 1, 0.01)
)
cmp_ps_prw2 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  prw2(
    x,
    model = "prw2",
    diagonal = 1e-06,
    hyper = list(
      prec = pc_prec,
      range = pc_prw2_range
    )
  )
lik_prw2 <- bru_obs(
  y ~ Intercept + prw2,
  family = "gamma",
  data = data.frame(y = y, x = x)
)
fit_ps_prw2_bru <- bru(
  cmp_ps_prw2,
  lik_prw2
)
# Case 3: Gaussian process
mesh1d <- fmesher::fm_mesh_1d(
  seq(min(x) - 5, max(x) + 5, length.out = 80),
  boundary = "free"
)
matern_1d <- INLA::inla.spde2.pcmatern(
  mesh1d,
  prior.range = c(25, 0.01),
  prior.sigma = c(5, 0.01),
  constr = TRUE
)
cmp_ps_gp <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  gp(
    x,
    model = matern_1d
  )
lik_gp <- bru_obs(
  y ~ Intercept + gp,
  family = "gamma",
  data = data.frame(y = y, x = x),
  domain = list(x = mesh1d)
)
fit_ps_gp_bru <- bru(
  cmp_ps_gp,
  lik_gp
)
The posterior mean fitted values from the four approaches can be compared directly on the response scale.
obs_idx <- seq_len(length(y))
extract_obs_mean <- function(fit) {
  fit$summary.fitted.values[obs_idx, "mean"]
}
fit_compare_df <- data.frame(
  x = x,
  Observed = y,
  Generic0 = extract_obs_mean(fit_ps_generic0_bru),
  RW2 = extract_obs_mean(fit_ps_rw2_bru),
  PRW2 = extract_obs_mean(fit_ps_prw2_bru),
  GP = extract_obs_mean(fit_ps_gp_bru)
)
fit_compare_long <- rbind(
  data.frame(x = x, series = "Observed", value = fit_compare_df$Observed),
  data.frame(x = x, series = "Generic0", value = fit_compare_df$Generic0),
  data.frame(x = x, series = "RW2", value = fit_compare_df$RW2),
  data.frame(x = x, series = "PRW2", value = fit_compare_df$PRW2),
  data.frame(x = x, series = "Gaussian process", value = fit_compare_df$GP)
)
ggplot(fit_compare_long, aes(x = x, y = value, colour = series, linetype = series, alpha = series)) +
  geom_line(linewidth = 1) +
  scale_colour_manual(
    values = c(
      "Observed" = "black",
      "Generic0" = "#08519c",
      "RW2" = "#2ca25f",
      "PRW2" = "#cb181d",
      "Gaussian process" = "#756bb1"
    )
  ) +
  scale_linetype_manual(
    values = c(
      "Observed" = "solid",
      "Generic0" = "solid",
      "RW2" = "longdash",
      "PRW2" = "dotdash",
      "Gaussian process" = "twodash"
    )
  ) +
  scale_alpha_manual(
    values = c(
      "Observed" = 0.7,
      "Generic0" = 1,
      "RW2" = 1,
      "PRW2" = 1,
      "Gaussian process" = 1
    )
  ) +
  labs(
    x = "Trading-day index",
    y = "CAC 40 close",
    colour = NULL,
    linetype = NULL,
    title = "Posterior mean comparison across four latent-effect choices"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  guides(alpha = "none")
The negligible differences can be explained primarily by how each effect approximates the continuous target function. The usual P-spline uses 80 basis functions to define the approximation, much like the SPDE approximation of the Gaussian process, which relies on a mesh with a comparable resolution. By contrast, the RW2 and PRW2 effects use one latent value for each trading-day index.