MMM goes bru: how to implement the hill(adstock(·)) function from Meridian in inlabru?

Methods
Author

Olivier Supplisson

Published

May 1, 2026

I was recently looking at the Marketing Mix Modelling (MMM) literature, which is essentially Bayesian probabilistic modelling applied to marketing and adjacent business questions. In January 2025, Google released Meridian, a JAX-based package that provides convenient utilities for fitting these models.

Three remarks on the MMM literature, and on Meridian in particular. As often happens when a field develops in relative isolation, some parts of the literature have the distinct flavour of rediscovering existing statistical machinery under new names. A striking example is this Uber paper on Bayesian time-varying coefficient models: https://arxiv.org/html/2106.03322v4. One can only hope they will soon discover that coefficients may also vary spatially, as in Gelfand et al. (2003) https://www.tandfonline.com/doi/abs/10.1198/016214503000170, and therefore, with only a modest additional leap, spatio-temporally.

One charmingly retro feature is that spatially specific parameters are treated as i.i.d. latent effects, as explained here, almost as if the spatial statistics literature had politely stayed home for the last few decades. That is unfortunate, since there is a wide range of spatially structured latent effects designed precisely to borrow strength across neighbouring areas and improve spatial modelling, including models that explicitly mix structured and unstructured spatial variation, as presented here.

Another charmingly anachronistic habit is Bayesian model selection by BIC, as in this Google paper. BIC has the word “Bayesian” in its name, which is apparently doing quite a lot of work here, since the criterion itself is not Bayesian in any meaningful posterior-predictive or decision-theoretic sense. For a modern, proper way to do Bayesian model comparison, have a look at https://mc-stan.org/loo/articles/online-only/faq.html and the references therein. For background on scoring rules, have a look at https://www.tandfonline.com/doi/abs/10.1198/016214506000001437.

Beyond these somewhat painful archaeological findings, I found in the Meridian documentation two interesting domain-specific non-linear effects that intuitively make sense: lagged carryover and saturation. Lagged carryover is handled through an adstock transformation:

\[ \operatorname{Adstock}(x_t, x_{t-1}, \cdots, x_{t-L}; \alpha) = \frac{ \sum_{s = 0}^{L} w(s; \alpha)x_{t-s} }{ \sum_{s = 0}^{L} w(s; \alpha) }, \]

where \(w(s;\alpha)\) is a non-negative weight function, \(x_t \geq 0\) is media execution at time \(t\), \(\alpha \in [0,1]\) is the decay parameter, and \(L\) is the maximum lag duration.

The weights encode how much of a past media exposure is still active at the current date. In the geometric version, \(w_s = \alpha^s\), so small values of \(\alpha\) put almost all the mass on the current exposure, while values closer to one spread the effect more evenly over the lag window. The normalization keeps the transformed media variable on a comparable scale when \(\alpha\) changes; the parameter then controls the temporal allocation of the exposure rather than mechanically changing its total mass. Meridian also proposes a binomial decay, which can keep more weight in the latter part of the lag window and is therefore useful when carryover is expected to be more persistent.
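To make the role of \(\alpha\) concrete, here is a small illustrative sketch (a throwaway helper of my own, not Meridian code) of the normalized geometric weights for two decay values, using the same lag window \(L = 4\) as the simulation below:

```r
# Illustrative sketch (not Meridian code): normalized geometric adstock
# weights for two decay values, with lag window L = 4.
geometric_weights <- function(alpha, max_lag) {
  w <- alpha^(0:max_lag)
  w / sum(w)
}

w_low  <- geometric_weights(alpha = 0.1, max_lag = 4)
w_high <- geometric_weights(alpha = 0.9, max_lag = 4)

round(w_low, 3)   # nearly all mass on the current exposure
round(w_high, 3)  # mass spread across the whole lag window
```

In both cases the weights sum to one, so changing \(\alpha\) reallocates the exposure over time without changing its total mass.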

Saturation is handled through a Hill function:

\[ \operatorname{Hill}(x; ec, \operatorname{slope}) = \frac{1}{ 1 + \left(\frac{x}{ec}\right)^{-\operatorname{slope}} }. \]

Here \(x \geq 0\), \(ec > 0\) is the half-saturation point, meaning that \(\operatorname{Hill}(x = ec; ec, \operatorname{slope}) = 0.5\), and \(\operatorname{slope} > 0\) controls the shape of the curve. When \(\operatorname{slope} \leq 1\), the response is concave. When \(\operatorname{slope} > 1\), the response is S-shaped: convex for \(x < ec\) and concave for \(x > ec\).
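A quick numerical sketch of these properties (values chosen for illustration; the hill() helper here is a throwaway function of my own, not the inlabru component defined later):

```r
# Throwaway helper, algebraically identical to 1 / (1 + (x / ec)^(-slope)).
hill <- function(x, ec, slope) x^slope / (x^slope + ec^slope)

hill(7, ec = 7, slope = 1.4)            # 0.5 at the half-saturation point
hill(c(1, 7, 50), ec = 7, slope = 0.8)  # concave response
hill(c(1, 7, 50), ec = 7, slope = 3)    # S-shaped response
```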

By default, Meridian applies the Hill transformation after adstock, so the media contribution is essentially of the form \(\beta\operatorname{Hill}(\operatorname{Adstock}(x); ec, \operatorname{slope})\).

These two transformations are actually quite sensible: advertising may have delayed effects, and additional spend should not be expected to produce indefinitely linear returns. The useful question, then, is less whether this structure is reasonable and more how to implement the same hill(adstock(·)) construction in a more general latent Gaussian modelling framework such as inlabru.

Turns out that it works like a charm. What a surprise.

The hill(adstock(·)) function

The implementation below uses the geometric adstock specification. I pad the media series with zeros at the beginning, build normalized weights over the lag window \(0,\ldots,L\), compute the adstocked media stock \(a_t\), and then pass it through the Hill saturation curve.

library(INLA)
library(inlabru)

set.seed(20260501)

# beta * Hill(Adstock(media)) with geometric adstock weights.
# `media` and `max_lag` are taken from the calling environment.
media_effect <- function(beta, alpha, ec, slope, row = seq_along(media)) {
  beta <- as.numeric(beta)
  alpha <- as.numeric(alpha)
  ec <- as.numeric(ec)
  slope <- as.numeric(slope)

  # Normalized geometric weights over the lag window 0, ..., max_lag.
  weights <- alpha^(0:max_lag)
  weights <- weights / sum(weights)
  # Zero-pad so the full lag window exists for the first observations.
  media_padded <- c(rep(0, max_lag), media)

  # Adstocked media stock a_t = sum_s w(s) x_{t-s}.
  adstock <- vapply(seq_along(media), function(t) {
    sum(media_padded[max_lag + t - 0:max_lag] * weights)
  }, numeric(1))

  # Hill saturation: x^slope / (x^slope + ec^slope).
  hill <- adstock^slope / (adstock^slope + ec^slope)
  as.numeric(beta * hill[row])
}
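The vapply loop above costs \(O(nL)\), which is fine here but can be replaced by a one-sided convolution for long series. A sketch of the equivalence, using only base R (the function names are mine, chosen for the comparison):

```r
# Sketch: the adstock loop rewritten as a one-sided convolution (base R only).
adstock_loop <- function(x, alpha, max_lag) {
  w <- alpha^(0:max_lag)
  w <- w / sum(w)
  xp <- c(rep(0, max_lag), x)
  vapply(seq_along(x),
         function(t) sum(xp[max_lag + t - 0:max_lag] * w),
         numeric(1))
}

adstock_filter <- function(x, alpha, max_lag) {
  w <- alpha^(0:max_lag)
  w <- w / sum(w)
  xp <- c(rep(0, max_lag), x)
  # filter() with sides = 1 returns sum_s w[s + 1] * xp[i - s]; the first
  # max_lag entries of the padded series are dropped to realign with x.
  out <- stats::filter(xp, filter = w, method = "convolution", sides = 1)
  as.numeric(out)[(max_lag + 1):length(xp)]
}

x_toy <- rgamma(50, shape = 2, rate = 0.25)
all.equal(adstock_loop(x_toy, 0.55, 4), adstock_filter(x_toy, 0.55, 4))
```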

Data simulation

n <- 10000L
max_lag <- 4L
time <- seq_len(n)
media <- rgamma(n, shape = 2, rate = 0.25)

beta_true <- 1.6
alpha_true <- 0.55
ec_true <- 7
slope_true <- 1.4
intercept_true <- 0.25
sigma_true <- 0.15

eta <- intercept_true + media_effect(
  beta = beta_true,
  alpha = alpha_true,
  ec = ec_true,
  slope = slope_true
)
y <- eta + rnorm(n, sd = sigma_true)
dat <- data.frame(y = y, time = time)

Model components

id <- 1L
beta_prec <- 1 / 2^2
alpha_shape <- c(2, 2)
ec_rate <- 1 / median(media)
slope_rate <- 1

components <-
  ~ Intercept(1) +
  beta(id, model = "iid", n = 1,
       hyper = list(prec = list(initial = log(beta_prec), fixed = TRUE))) +
  alpha(id, model = "iid", n = 1,
        hyper = list(prec = list(initial = 0, fixed = TRUE)),
        marginal = bm_marginal(qbeta, pbeta, dbeta,
                               shape1 = alpha_shape[1], shape2 = alpha_shape[2])) +
  ec(id, model = "iid", n = 1,
     hyper = list(prec = list(initial = 0, fixed = TRUE)),
     marginal = bm_marginal(qexp, pexp, dexp, rate = ec_rate)) +
  slope(id, model = "iid", n = 1,
        hyper = list(prec = list(initial = 0, fixed = TRUE)),
        marginal = bm_marginal(qexp, pexp, dexp, rate = slope_rate))

Note: bm_marginal() applies a quantile transformation. The latent field is represented internally as a standard Gaussian variable \(z\); its Gaussian probability level \(\Phi(z)\) is then pushed through the target quantile function. In this example, alpha is evaluated as \(q_{\mathrm{Beta}}(\Phi(z))\), while ec and slope are evaluated as \(q_{\mathrm{Exp}}(\Phi(z))\). This gives constrained parameters with the requested marginal distributions inside the predictor, while the raw summary.random output remains on the internal Gaussian scale unless it is transformed back.
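To see the mapping in action, here is a standalone sketch of the transform for alpha (the shape parameters match alpha_shape above; the variable names are mine):

```r
# Standalone sketch of the quantile transform behind the alpha component:
# internal Gaussian value z -> probability level pnorm(z) -> Beta quantile.
z <- c(-2, 0, 2)
alpha_on_beta_scale <- qbeta(pnorm(z), shape1 = 2, shape2 = 2)
round(alpha_on_beta_scale, 3)  # strictly increasing, inside (0, 1); 0.5 at z = 0
```

The same composition with qexp() in place of qbeta() yields the positive ec and slope parameters.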

Fit

lik <- inlabru::bru_obs(
  family = "gaussian",
  formula = y ~ 0 + Intercept + media_effect(
    beta = beta_eval(id),
    alpha = alpha_eval(id),
    ec = ec_eval(id),
    slope = slope_eval(id),
    row = time
  ),
  data = dat
)

fit <- inlabru::bru(
  components,
  lik,
  options = list(
    bru_verbose = 0,
    bru_max_iter = 1000
  )
)

Evaluate vs. truth

truth <- c(
  Intercept = intercept_true,
  beta = beta_true,
  alpha = alpha_true,
  ec = ec_true,
  slope = slope_true
)

estimate <- c(
  # Posterior mean for the fixed effect; the random-effect entries below use
  # the posterior mode on the internal Gaussian scale, pushed back through
  # the quantile transform.
  Intercept = fit$summary.fixed["Intercept", "mean"],
  beta = fit$summary.random$beta$mode[1],
  alpha = qbeta(pnorm(fit$summary.random$alpha$mode[1]),
                shape1 = alpha_shape[1], shape2 = alpha_shape[2]),
  ec = qexp(pnorm(fit$summary.random$ec$mode[1]), rate = ec_rate),
  slope = qexp(pnorm(fit$summary.random$slope$mode[1]), rate = slope_rate)
)

print(round(cbind("Truth" = truth, "Mode" = estimate), 3))
##           Truth  Mode
## Intercept  0.25 0.249
## beta       1.60 1.573
## alpha      0.55 0.554
## ec         7.00 6.792
## slope      1.40 1.445