Univariate P-splines in R-INLA and inlabru
P-splines combine a flexible B-spline basis with a roughness penalty on the coefficients. Their main practical appeal is that the fit becomes far less sensitive to the number and locations of the knots used to define the basis. In a Bayesian setting, that penalty is naturally represented by an n-th-order random-walk prior on the spline weights. This short blog post shows how to fit a univariate smoothing P-spline with R-INLA and inlabru.
Useful academic work on P-splines includes:
Paul Eilers and Brian Marx’s original article introducing P-splines, Flexible Smoothing with B-splines and Penalties, https://projecteuclid.org/journals/statistical-science/volume-11/issue-2/Flexible-smoothing-with-B-splines-and-penalties/10.1214/ss/1038425655.pdf.
Paul Eilers and Brian Marx’s masterpiece Practical Smoothing: The Joys of P-splines, https://psplines.bitbucket.io/.
Stefan Lang and Andreas Brezger’s Bayesian P-splines paper, https://www.tandfonline.com/doi/abs/10.1198/1061860043010.
Aritz Adin Urtasun’s Ph.D. thesis Hierarchical and spline-based models in space-time disease mapping. I could not find the original link, so I uploaded the file to my personal Google Drive here.
Dae-Jin Lee’s Ph.D. thesis Smoothing mixed models for spatial and spatio-temporal data, https://e-archivo.uc3m.es/entities/publication/72bffb09-ff83-48eb-acba-8aaed8de45f4.
Related blog posts:
Tristan Mahr’s blog post Random effects and penalized splines are the same thing, https://www.tjmahr.com/random-effects-penalized-splines-same-thing/.
Gavin Simpson’s blog post Extrapolating with B splines and GAMs, https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/.
B-spline basis
For a scalar covariate \(x\), a P-spline smooth approximates the unknown function \(f\) through a finite-dimensional decomposition:
\[ f(x)= \beta_0 + \sum_{j=1}^{K} B_j(x)\alpha_j, \]
where \(B_j(x)\) are spline basis functions defined over the full domain of \(x\). The basis itself is determined by two ingredients: a polynomial degree and a knot sequence. The knots split the covariate range into intervals, and on each interval every \(B_j\) is an ordinary polynomial. In practice one often uses cubic B-splines, that is, splines of degree 3, because they are flexible while remaining smooth enough for most regression problems.
A central property of B-splines is local support. Each \(B_j(x)\) is zero outside a limited portion of the covariate range, so at a fixed location \(x\) only a few basis functions are active. For splines of degree \(d\), at most \(d + 1\) basis functions are nonzero at the same \(x\). This is what makes the basis local: changing one coefficient mainly perturbs the fitted curve near the associated knot region instead of changing the function everywhere.
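This local-support property is easy to check numerically. The sketch below is my own toy example (a small basis on a synthetic grid, not the data used later): it counts how many basis functions are strictly positive at each evaluation point.

```r
library(splines)

x_toy <- seq(0, 10, by = 0.1)
d <- 3  # cubic splines

# A cubic B-spline basis with 12 functions on [0, 10]
B_toy <- bs(x_toy, df = 12, degree = d, intercept = TRUE)

# Count the strictly positive basis functions at each point
active <- rowSums(B_toy > 0)
max(active)  # at most d + 1 = 4
```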
Another useful property is that the basis functions form a partition of unity, meaning
\[ \sum_{j=1}^{K} B_j(x) = 1. \]
This makes the spline behave like a local weighted average of the coefficients. It also clarifies why one has to think about identifiability: if the basis can already represent constant functions, then the intercept and the spline part are not automatically separated unless one imposes a constraint or centers the smooth.
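Both points can be verified in a couple of lines. One subtlety, assuming the splines::bs constructor: with intercept = FALSE one basis function is dropped, precisely for the identifiability reason above, so the partition of unity only holds for the full basis with intercept = TRUE.

```r
library(splines)

x_toy <- seq(0, 10, by = 0.1)

# Full basis: rows sum to one everywhere on the domain
B_full <- bs(x_toy, df = 12, degree = 3, intercept = TRUE)
range(rowSums(B_full))  # both endpoints equal 1 up to rounding

# Reduced basis (one function dropped): rows no longer sum to
# one near the left boundary
B_red <- bs(x_toy, df = 12, degree = 3, intercept = FALSE)
rowSums(B_red)[1]  # far below 1
```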
The coefficients \((\alpha_1,\ldots,\alpha_K)\) control the contribution of each basis function. For observations \(x_1,\ldots,x_n\), the basis enters the model through the design matrix
\[ \mathbf{B} = \begin{pmatrix} B_1(x_1) & \cdots & B_K(x_1) \\ \vdots & \ddots & \vdots \\ B_1(x_n) & \cdots & B_K(x_n) \end{pmatrix}. \]
Row \(i\) shows how the basis combines to represent the smooth at \(x_i\), and column \(j\) shows where coefficient \(\alpha_j\) contributes. Because of local support, this matrix is typically sparse or at least strongly banded in structure, which is one of the reasons B-splines are computationally attractive.
However, the fit quality of B-splines has been shown to be highly sensitive to the number and locations of the knots used to define the basis, leading to poor representation when too few knots are used and to overfitting when too many are used.
P-splines provide a remedy for this issue: instead of leaving the parameters \(\alpha\) unconstrained, they penalise their finite differences so that neighbouring coefficients are encouraged to vary smoothly. With the standard second-order penalty, the second differences satisfy
\[ \alpha_j - 2\alpha_{j-1} + \alpha_{j-2} \sim N(0, \tau^{-1}), \qquad j = 3, \ldots, K. \]
Large values of \(\tau\) shrink the second differences more strongly and therefore produce smoother functions, while smaller values allow more local curvature. The role of the P-spline is therefore split into two parts: the B-spline basis provides a rich approximation space, and the RW2 penalty regularises that space so that the fitted curve remains smooth.
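The interplay between the two parts can be illustrated outside INLA with the frequentist analogue of this prior: a penalised least-squares fit \(\hat\alpha = (B^\top B + \lambda D_2^\top D_2)^{-1} B^\top y\), where \(\lambda\) plays the role of \(\tau\) relative to the noise precision. A self-contained sketch on simulated data (my own toy example, not part of the CAC 40 analysis below):

```r
library(splines)
set.seed(1)

# Noisy observations of a smooth function
x_sim <- seq(0, 1, length.out = 200)
y_sim <- sin(2 * pi * x_sim) + rnorm(200, sd = 0.3)

K <- 20
B <- bs(x_sim, df = K, degree = 3, intercept = TRUE)
D2 <- diff(diag(K), differences = 2)  # second-difference operator
P <- crossprod(D2)                    # RW2-style penalty matrix

# Penalised least squares: solve (B'B + lambda * P) alpha = B'y
pspline_fit <- function(lambda) {
  alpha <- solve(crossprod(B) + lambda * P, crossprod(B, y_sim))
  as.vector(B %*% alpha)
}

f_wiggly <- pspline_fit(1e-4)  # weak penalty: follows the noise
f_smooth <- pspline_fit(1e2)   # strong penalty: much smoother

# The stronger penalty yields smaller summed squared second differences
roughness <- function(f) sum(diff(f, differences = 2)^2)
roughness(f_smooth) < roughness(f_wiggly)  # TRUE
```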
Although it is possible to construct P-splines in several ways, I keep the rest of this post close to plain R-INLA so the basis construction and the RW2 penalty remain explicit.
Data import
To keep the discussion concrete, I will work with daily closing values of the CAC 40 index, available on Yahoo Finance under the ticker ^FCHI. For the purpose of this blog post, I will use a series of n = 1608 daily closes, stored in the vector y and indexed by the trading day x.
Basis construction
For illustrative purposes, I will model the closing series y as a smooth function of the trading-day index x. With \(x = 1, \ldots, n\) (n = 1608), the B-spline basis is built as follows. To reduce boundary effects, I extend the covariate range slightly and construct the basis on the augmented grid from -5 to n + 5, then evaluate it back on the observed index.
library(splines)
library(Matrix)

k <- 80L
B_aug <- seq(-5, max(x) + 5, by = 1)
B_basis <- bs(
  x,
  df = k,
  degree = 3,
  intercept = FALSE,
  Boundary.knots = range(B_aug)
)
B <- unclass(B_basis)
A_ps <- Matrix(B, sparse = TRUE)
Here B is the B-spline design matrix evaluated at the observed covariate values. The argument k controls the basis dimension and therefore the richness of the spline representation. The extended boundary knots help stabilise the fitted spline near the endpoints. With the chosen degree, this induces the following internal knot locations:
print(attr(B_basis, "knots"))
[1] 21.60256 42.20513 62.80769 83.41026 104.01282 124.61538
[7] 145.21795 165.82051 186.42308 207.02564 227.62821 248.23077
[13] 268.83333 289.43590 310.03846 330.64103 351.24359 371.84615
[19] 392.44872 413.05128 433.65385 454.25641 474.85897 495.46154
[25] 516.06410 536.66667 557.26923 577.87179 598.47436 619.07692
[31] 639.67949 660.28205 680.88462 701.48718 722.08974 742.69231
[37] 763.29487 783.89744 804.50000 825.10256 845.70513 866.30769
[43] 886.91026 907.51282 928.11538 948.71795 969.32051 989.92308
[49] 1010.52564 1031.12821 1051.73077 1072.33333 1092.93590 1113.53846
[55] 1134.14103 1154.74359 1175.34615 1195.94872 1216.55128 1237.15385
[61] 1257.75641 1278.35897 1298.96154 1319.56410 1340.16667 1360.76923
[67] 1381.37179 1401.97436 1422.57692 1443.17949 1463.78205 1484.38462
[73] 1504.98718 1525.58974 1546.19231 1566.79487 1587.39744
The resulting basis functions along the covariate index are shown below.
basis_df <- data.frame(
  x = rep(x, times = ncol(B)),
  basis = factor(rep(seq_len(ncol(B)), each = nrow(B))),
  value = c(B)
)
knot_df <- data.frame(knot = unname(attr(B_basis, "knots")))
ggplot(basis_df, aes(x = x, y = value, group = basis, color = basis)) +
  geom_line(linewidth = 0.35, show.legend = FALSE) +
  geom_vline(
    data = knot_df,
    aes(xintercept = knot),
    inherit.aes = FALSE,
    colour = "grey45",
    linetype = 3
  ) +
  scale_color_manual(values = grDevices::hcl.colors(ncol(B), palette = "Dynamic")) +
  labs(
    x = "Trading-day index",
    y = "Basis value",
    title = "B-spline basis along x"
  ) +
  theme_minimal()
The same object can also be viewed directly as a matrix. This makes the local support of the basis especially visible: each column is active only over a limited range of rows, which gives the design matrix its banded structure.
B_long <- data.frame(
  row = rep(seq_len(nrow(B)), times = ncol(B)),
  basis = rep(seq_len(ncol(B)), each = nrow(B)),
  value = c(B)
)
ggplot(B_long, aes(x = row, y = basis, fill = value)) +
  geom_raster() +
  geom_vline(
    data = knot_df,
    aes(xintercept = knot),
    inherit.aes = FALSE,
    colour = "grey45",
    linetype = 3
  ) +
  scale_fill_gradientn(colors = grDevices::hcl.colors(64, palette = "YlOrRd", rev = TRUE)) +
  labs(
    x = "Trading-day index",
    y = "Basis index",
    fill = "Value",
    title = "B-spline design matrix B"
  ) +
  theme_minimal()
RW precision structure
Usual choices for P-splines are RW1 and RW2 precision structures. They enforce smoothness while keeping the precision matrix sparse, which is one of the main computational advantages of these priors. If \(\psi = (\alpha_1,\ldots,\alpha_K)^\top\) denotes the spline coefficient vector, the first- and second-order difference operators are
\[ D_{RW1}\psi = \begin{pmatrix} \alpha_2 - \alpha_1 \\ \alpha_3 - \alpha_2 \\ \vdots \\ \alpha_K - \alpha_{K-1} \end{pmatrix}, \qquad D_{RW2}\psi = \begin{pmatrix} \alpha_3 - 2\alpha_2 + \alpha_1 \\ \alpha_4 - 2\alpha_3 + \alpha_2 \\ \vdots \\ \alpha_K - 2\alpha_{K-1} + \alpha_{K-2} \end{pmatrix}. \]
The associated intrinsic precision matrices are then
\[ Q_{RW1} = D_{RW1}^\top D_{RW1}, \qquad Q_{RW2} = D_{RW2}^\top D_{RW2}. \]
In R, these objects can be derived directly from sparse finite-difference matrices:
n_basis <- ncol(B)
I_k <- Diagonal(n_basis)
D_rw1 <- diff(I_k, differences = 1)
D_rw2 <- diff(I_k, differences = 2)
Q_rw1 <- crossprod(D_rw1)
Q_rw2 <- crossprod(D_rw2)
print(as.matrix(Q_rw1[1:8, 1:8]))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 -1 0 0 0 0 0 0
[2,] -1 2 -1 0 0 0 0 0
[3,] 0 -1 2 -1 0 0 0 0
[4,] 0 0 -1 2 -1 0 0 0
[5,] 0 0 0 -1 2 -1 0 0
[6,] 0 0 0 0 -1 2 -1 0
[7,] 0 0 0 0 0 -1 2 -1
[8,] 0 0 0 0 0 0 -1 2
print(as.matrix(Q_rw2[1:8, 1:8]))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 -2 1 0 0 0 0 0
[2,] -2 5 -4 1 0 0 0 0
[3,] 1 -4 6 -4 1 0 0 0
[4,] 0 1 -4 6 -4 1 0 0
[5,] 0 0 1 -4 6 -4 1 0
[6,] 0 0 0 1 -4 6 -4 1
[7,] 0 0 0 0 1 -4 6 -4
[8,] 0 0 0 0 0 1 -4 6
The RW1 precision penalises successive differences in the coefficients, while the RW2 precision penalises local curvature through second differences. Both matrices are sparse. RW1 has constant vectors in its null space, whereas RW2 has both constant and linear vectors in its null space, which is why additional constraints or an explicit intercept are usually required for identifiability.
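These null-space claims, and the rank deficiencies behind the rankdef arguments used later, can be checked directly on small base-R versions of the same matrices:

```r
# Small RW1/RW2 precision matrices from base-R difference operators
K <- 10
Q1 <- crossprod(diff(diag(K), differences = 1))
Q2 <- crossprod(diff(diag(K), differences = 2))

const <- rep(1, K)
trend <- seq_len(K)

max(abs(Q1 %*% const))  # 0: constants lie in the RW1 null space
max(abs(Q2 %*% const))  # 0: constants lie in the RW2 null space
max(abs(Q2 %*% trend))  # 0: linear trends too, for RW2 only
max(abs(Q1 %*% trend))  # nonzero: RW1 does not annihilate trends

# Rank deficiencies of 1 and 2, respectively
c(qr(Q1)$rank, qr(Q2)$rank)  # K - 1 and K - 2
```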
R-INLA model component
In matrix form, the spline contribution can be written as \(B\psi\). The previous construction makes explicit the precision matrix associated with the prior on \(\psi\):
\[ \psi\vert \tau \sim \mathcal{N}\left(0,[\tau Q_{RW(k)}]^-\right) \]
Here \([\tau Q_{RW(k)}]^-\) denotes the generalised inverse of \(\tau Q_{RW(k)}\), where the precision hyperparameter \(\tau = 1/\sigma^2\) is the inverse of the variance. Such custom latent components can be specified in R-INLA through the generic0 latent model, described here.
Fit with inlabru
The generic0 latent model takes a structure matrix Cmatrix and internally uses the precision form \(Q = \tau C\). Here we simply supply Q_rw2 as the structure matrix. On the inlabru side, bm_matrix(n_basis) tells the component system to interpret the matrix-valued input as a direct matrix multiplication between the supplied basis matrix and the latent state vector. Since CAC 40 closing values are strictly positive, I use a Gamma likelihood instead of a Gaussian one. INLA’s Gamma likelihood uses a default log-link for the mean, so the spline acts on the log-mean scale. For the precision hyperparameter, I use a PC prior of the form \(\Pr(\sigma > u) = \alpha\), implemented in INLA through prior = "pc.prec".
To learn more about PC-priors, see:
- Daniel Simpson, Håvard Rue, Andrea Riebler, Thiago G. Martins, Sigrunn H. Sørbye, Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors, DOI: 10.1214/16-STS576
- Daniel Simpson’s blog post: https://dansblog.netlify.app/#category=PC%20priors
pc_prec <- list(
  prior = "pc.prec",
  param = c(5, 0.01)
)
cmp_ps_generic0 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  pspline(
    B,
    model = "generic0",
    mapper = bm_matrix(n_basis),
    Cmatrix = Q_rw2,
    rankdef = 2,
    constr = TRUE,
    diagonal = 1e-06,
    hyper = list(prec = pc_prec)
  )
lik_ps_generic0 <- bru_obs(
  y ~ Intercept + pspline,
  family = "gamma",
  data = data.frame(y = y)
)
fit_ps_generic0_bru <- bru(
  cmp_ps_generic0,
  lik_ps_generic0
)
The arguments rankdef = 2 and constr = TRUE reflect the intrinsic rank deficiency of the RW2 precision matrix. The small diagonal inflation is a standard numerical stabilisation for intrinsic models represented through generic0. The PC prior above is interpreted as \(\Pr(\sigma_{\text{pspline}} > 5) = 0.01\).
Posterior predictive check
Let us perform a simple posterior predictive check by comparing the observed series with the posterior fitted band on the response scale.
obs_idx <- seq_len(length(y))
fitted_obs <- fit_ps_generic0_bru$summary.fitted.values[obs_idx, , drop = FALSE]
ppc_df <- data.frame(
  x = x,
  y = y,
  fitted_mean = fitted_obs$mean,
  fitted_q025 = fitted_obs$`0.025quant`,
  fitted_q975 = fitted_obs$`0.975quant`
)
ggplot(ppc_df, aes(x = x)) +
  geom_ribbon(
    aes(
      ymin = fitted_q025,
      ymax = fitted_q975,
      fill = "95% credible band"
    ),
    alpha = 0.35
  ) +
  geom_line(
    aes(y = fitted_mean, colour = "Posterior mean", linetype = "Posterior mean"),
    linewidth = 0.7
  ) +
  geom_line(
    aes(y = fitted_q025, colour = "Lower 95% bound", linetype = "Lower 95% bound"),
    linewidth = 0.5
  ) +
  geom_line(
    aes(y = fitted_q975, colour = "Upper 95% bound", linetype = "Upper 95% bound"),
    linewidth = 0.5
  ) +
  geom_line(
    aes(y = y, colour = "Observed", linetype = "Observed"),
    linewidth = 0.35,
    alpha = 0.75
  ) +
  scale_colour_manual(
    name = NULL,
    values = c(
      "Observed" = "black",
      "Posterior mean" = "#08519c",
      "Lower 95% bound" = "#3182bd",
      "Upper 95% bound" = "#3182bd"
    ),
    breaks = c("Observed", "Posterior mean", "Lower 95% bound", "Upper 95% bound")
  ) +
  scale_linetype_manual(
    name = NULL,
    values = c(
      "Observed" = "solid",
      "Posterior mean" = "solid",
      "Lower 95% bound" = "dashed",
      "Upper 95% bound" = "dotdash"
    ),
    breaks = c("Observed", "Posterior mean", "Lower 95% bound", "Upper 95% bound")
  ) +
  scale_fill_manual(
    name = NULL,
    values = c("95% credible band" = "#9ecae1")
  ) +
  guides(
    colour = guide_legend(order = 1),
    linetype = guide_legend(order = 1),
    fill = guide_legend(
      order = 2,
      override.aes = list(alpha = 0.35)
    )
  ) +
  labs(
    x = "Trading-day index",
    y = "CAC 40 close",
    title = "Observed values and posterior fitted band"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  )
Other competing possibilities within R-INLA: RW2, PRW2, and SPDE
R-INLA offers a wide range of latent effects related to the P-spline construction presented above for temporal modelling. Below I briefly highlight three possibilities:
- RW2 latent effect, which differs from the previous P-spline construction in that there is no spline basis (see https://inla.r-inla-download.org/r-inla.org/doc/latent/rw2.pdf);
- PRW2, which is the non-intrinsic version of the RW2 latent effect (see https://inla.r-inla-download.org/r-inla.org/doc/latent/prw2.pdf);
- Gaussian process implemented using the finite-element representation presented in Finn Lindgren, Håvard Rue, and Johan Lindström's 2011 article An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach, https://doi.org/10.1111/j.1467-9868.2011.00777.x.
Other latent effects, such as AR(1), AR(p), or fractional Gaussian noise, are also available; see https://inla.r-inla-download.org/r-inla.org/doc/latent/.
# Case 1: direct RW2 latent effect
cmp_ps_rw2 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  rw2(
    x,
    model = "rw2",
    rankdef = 2,
    constr = TRUE,
    diagonal = 1e-06,
    hyper = list(prec = pc_prec)
  )
lik_rw2 <- bru_obs(
  y ~ Intercept + rw2,
  family = "gamma",
  data = data.frame(y = y, x = x)
)
fit_ps_rw2_bru <- bru(
  cmp_ps_rw2,
  lik_rw2
)
# Case 2: PRW2 latent effect
pc_prw2_range <- list(
  prior = "pc.prw2.range",
  param = c(25, 0.01, 1, 0.01)
)
cmp_ps_prw2 <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  prw2(
    x,
    model = "prw2",
    diagonal = 1e-06,
    hyper = list(
      prec = pc_prec,
      range = pc_prw2_range
    )
  )
lik_prw2 <- bru_obs(
  y ~ Intercept + prw2,
  family = "gamma",
  data = data.frame(y = y, x = x)
)
fit_ps_prw2_bru <- bru(
  cmp_ps_prw2,
  lik_prw2
)
# Case 3: Gaussian process
mesh1d <- fmesher::fm_mesh_1d(
  seq(min(x) - 5, max(x) + 5, length.out = 80),
  boundary = "free"
)
matern_1d <- INLA::inla.spde2.pcmatern(
  mesh1d,
  prior.range = c(25, 0.01),
  prior.sigma = c(5, 0.01),
  constr = TRUE
)
cmp_ps_gp <- ~
  Intercept(1,
    prec.linear = 5,
    mean.linear = 0) +
  gp(
    x,
    model = matern_1d
  )
lik_gp <- bru_obs(
  y ~ Intercept + gp,
  family = "gamma",
  data = data.frame(y = y, x = x),
  domain = list(x = mesh1d)
)
fit_ps_gp_bru <- bru(
  cmp_ps_gp,
  lik_gp
)
The posterior mean fitted values from the four approaches can be compared directly on the response scale.
obs_idx <- seq_len(length(y))
extract_obs_mean <- function(fit) {
  fit$summary.fitted.values[obs_idx, "mean"]
}
fit_compare_df <- data.frame(
  x = x,
  Observed = y,
  Generic0 = extract_obs_mean(fit_ps_generic0_bru),
  RW2 = extract_obs_mean(fit_ps_rw2_bru),
  PRW2 = extract_obs_mean(fit_ps_prw2_bru),
  GP = extract_obs_mean(fit_ps_gp_bru)
)
fit_compare_long <- rbind(
  data.frame(x = x, series = "Observed", value = fit_compare_df$Observed),
  data.frame(x = x, series = "Generic0", value = fit_compare_df$Generic0),
  data.frame(x = x, series = "RW2", value = fit_compare_df$RW2),
  data.frame(x = x, series = "PRW2", value = fit_compare_df$PRW2),
  data.frame(x = x, series = "Gaussian process", value = fit_compare_df$GP)
)
ggplot(fit_compare_long, aes(x = x, y = value, colour = series, linetype = series, alpha = series)) +
  geom_line(linewidth = 1) +
  scale_colour_manual(
    values = c(
      "Observed" = "black",
      "Generic0" = "#08519c",
      "RW2" = "#2ca25f",
      "PRW2" = "#cb181d",
      "Gaussian process" = "#756bb1"
    )
  ) +
  scale_linetype_manual(
    values = c(
      "Observed" = "solid",
      "Generic0" = "solid",
      "RW2" = "longdash",
      "PRW2" = "dotdash",
      "Gaussian process" = "twodash"
    )
  ) +
  scale_alpha_manual(
    values = c(
      "Observed" = 0.7,
      "Generic0" = 1,
      "RW2" = 1,
      "PRW2" = 1,
      "Gaussian process" = 1
    )
  ) +
  labs(
    x = "Trading-day index",
    y = "CAC 40 close",
    colour = NULL,
    linetype = NULL,
    title = "Posterior mean comparison across four latent-effect choices"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    legend.box = "vertical"
  ) +
  guides(alpha = "none")
The negligible differences can be explained primarily by how each effect approximates the continuous target function. The usual P-spline uses 80 basis functions to define the approximation, much like the SPDE approximation of the Gaussian process, which relies on a mesh with a comparable resolution. By contrast, the RW2 and PRW2 effects use one latent value for each trading-day index.