MMD estimator

Methods
Author

Olivier Supplisson

Published

April 18, 2026

Download inla_mmd_cloglike.c

Last summer, I discovered Jeremias Knoblauch’s line of research around post-Bayesian modelling. His talks are truly fascinating, and his foundational paper even more so; the references below are a good entry point into this literature.

Navigating through all the citations and ramifications of this work, I ended up learning about generalised Bayes, and more specifically about the seminal work of Bissiri, Holmes, and Walker, A general framework for updating belief distributions, available at https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssb.12158. The idea is quite simple: keep the Bayesian update rule but replace the likelihood with an exponentiated negative loss targeting whatever quantity you care about. In practice, this unlocks a range of benefits, such as Huber-type robustness, without tying you to a specific probabilistic model.
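A sketch of the resulting update, with a learning rate \(\omega\) that is often added in practice:

\[
\pi_\ell(\theta \mid x_{1:n}) \;\propto\; \exp\Big\{-\omega \sum_{i=1}^{n} \ell(\theta, x_i)\Big\}\, \pi(\theta).
\]

Taking \(\ell(\theta, x) = -\log p(x \mid \theta)\) and \(\omega = 1\) recovers the standard posterior.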

From there, I ended up learning more about the Maximum Mean Discrepancy estimator and Pierre Alquier’s line of work. Pierre Alquier’s YoungStatS post, Universal estimation with maximum mean discrepancy (MMD), is a concise entry point: https://youngstats.github.io/post/2022/01/13/universal-estimation-with-maximum-mean-discrepancy-mmd/. Digging further, I came across the regMMD package, developed with Mathieu Gerber and described here: https://computo-journal.org/published-202511-alquier-regmmd/published-202511-alquier-regmmd.pdf.
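The MMD estimator, roughly, picks the parameter whose model distribution is closest to the empirical distribution in the MMD metric. To make the criterion concrete, here is a minimal sketch of the biased (V-statistic) estimate of the squared MMD between two samples, assuming a Gaussian kernel \(k(u, v) = \exp\{-(u - v)^2 / (2\gamma^2)\}\); regMMD’s exact kernel parametrisation may differ by a constant:

```python
import math

def gaussian_kernel(u, v, bandwidth):
    """Gaussian kernel k(u, v) = exp(-(u - v)^2 / (2 * bandwidth^2))."""
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples xs and ys:
    mean k(x, x') - 2 * mean k(x, y) + mean k(y, y')."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy
```

The V-statistic version is exactly zero when the two samples coincide and grows as they drift apart, which is what the estimator minimises over the model parameters.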

The new possibility of implementing custom likelihoods in R-INLA, together with my interest in genBayes and MMD estimation, led me to wonder whether I could implement Alquier’s estimator to use it myself as a loss within a Bayesian framework. Since the cloglike interface requires a per-observation log-likelihood, I focused only on the Gaussian model for the \(\tilde{\theta}_{reg}\) estimator, with the scale estimated.
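For the Gaussian model with a Gaussian kernel, the expectations in the MMD criterion are available in closed form, which is what makes a per-observation implementation feasible. The sketch below is my reading of such a per-observation term, not the exact contents of inla_mmd_cloglike.c; the kernel parametrisation and the sign convention are assumptions:

```python
import math

def mmd_obs_loss(y, mu, sigma, gamma):
    """Per-observation MMD-type loss for a N(mu, sigma^2) model with
    Gaussian kernel k(u, v) = exp(-(u - v)^2 / (2 * gamma^2)).

    Uses the Gaussian-convolution closed forms
      E k(Y, Y') = gamma / sqrt(gamma^2 + 2 sigma^2)        (Y, Y' iid)
      E k(Y, y)  = gamma / sqrt(gamma^2 + sigma^2)
                   * exp(-(y - mu)^2 / (2 (gamma^2 + sigma^2))).
    """
    cross = gamma / math.sqrt(gamma ** 2 + sigma ** 2) * math.exp(
        -((y - mu) ** 2) / (2.0 * (gamma ** 2 + sigma ** 2)))
    pair = gamma / math.sqrt(gamma ** 2 + 2.0 * sigma ** 2)
    return pair - 2.0 * cross
```

The loss is minimised when the observation sits at the model mean, and in the degenerate limit \(\sigma \to 0\) with \(y = \mu\) it tends to \(-1\), which is a convenient sanity check for an implementation.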

Simulation design

For the comparison study, I fixed the regression dimension, the true coefficients, the true scale, and the bandwidth rule, then repeated the fits over several sample sizes and kernels.
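The design can be sketched as follows; the regression dimension, the coefficients, the scale, and the median-heuristic bandwidth rule used here are placeholders for illustration, not the values of the actual study:

```python
import random
import statistics

def simulate(n, beta=(1.0, -2.0, 0.5), sigma=1.0, seed=1):
    """Draw y = b0 + b1*x1 + b2*x2 + sigma*eps with standard-normal
    covariates and noise (placeholder truth, not the post's values)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + sigma * rng.gauss(0, 1)
        data.append((x1, x2, y))
    return data

def median_heuristic(ys):
    """Median of pairwise |y_i - y_j| distances, a common bandwidth rule."""
    gaps = [abs(a - b) for i, a in enumerate(ys) for b in ys[i + 1:]]
    return statistics.median(gaps)
```

Each replicate then fits both regMMD and the R-INLA cloglike model on the same simulated data, for every combination of sample size and kernel.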

All runs completed successfully. As expected, R-INLA required more computation time than regMMD at smaller sample sizes, although the difference decreased as the sample size increased.

Kernel summaries

The following sections summarise the differences between R-INLA and regMMD for each kernel.
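For reference, one standard parametrisation of the three kernels compared below, with bandwidth \(\gamma\); regMMD’s exact conventions may differ by a constant factor:

\[
k_{\mathrm{G}}(u,v) = \exp\!\big\{-(u-v)^2/(2\gamma^2)\big\}, \qquad
k_{\mathrm{L}}(u,v) = \exp\!\big\{-|u-v|/\gamma\big\}, \qquad
k_{\mathrm{C}}(u,v) = \big(1 + (u-v)^2/\gamma^2\big)^{-1}.
\]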

Gaussian kernel

These summaries report the minimum, mean, and maximum regMMD point estimates together with the corresponding INLA posterior modes over the successful runs. For the Gaussian kernel, the primary question is whether the INLA mode range follows the regMMD range as the sample size increases and whether any parameters exhibit a persistent shift.

In the Gaussian case, the table indicates very close agreement between the two methods for the regression coefficients, even at small sample sizes, with the min/mean/max summaries becoming nearly indistinguishable as the sample size increases. The clearest discrepancy concerns sigma when n = 50 and n = 100, where the INLA posterior mode is systematically higher than the regMMD estimate. This difference decreases rapidly from n = 250 onward. At the largest sample sizes, both the central tendency of the estimates and the dispersion across replicates are nearly identical for all parameters.

Laplace kernel

This section applies the same min/mean/max comparison to the Laplace kernel. The relevant question is whether the heavier-tailed kernel alters the alignment pattern, especially for sigma and for coefficients whose ranges remain wide across repeated fits.

In the Laplace case, the overall pattern remains close to the Gaussian one: the regression coefficients from INLA and regMMD are already fairly well aligned at small sample sizes, and their min/mean/max summaries become very similar as n increases. The principal discrepancy again concerns sigma, with the INLA posterior mode systematically above the regMMD estimate at n = 50 and n = 100, although the difference is smaller than in the Gaussian table and diminishes rapidly thereafter. There are also modest small-sample differences for some coefficients, especially X2, but these are not persistent and are largely absent at larger sample sizes.

Cauchy kernel

The Cauchy kernel is numerically the most demanding case and therefore provides the clearest setting in which to examine discrepancies between the INLA posterior modes and the regMMD point estimates. The relevant issues are whether the INLA ranges stabilise as the sample size increases and whether the ordering of parameters remains consistent with the regMMD reference.

The Cauchy table exhibits the clearest small-sample differences between the two methods. When n = 50, the INLA posterior mode is noticeably higher for sigma, lower for X1, and less negative for X2, and these differences remain visible at n = 100, although they are already smaller. From n = 250 onward the agreement improves substantially, and at the largest sample sizes the min/mean/max summaries from INLA and regMMD are again very close for all parameters, with only minor residual differences in range.

Reproducing section 4.2.4

This final section reproduces the analysis from section 4.2.4 of the regMMD article, available at https://computo-journal.org/published-202511-alquier-regmmd/published-202511-alquier-regmmd.pdf. Rather than conducting an additional simulation study, it reconstructs the article’s applied example on the airquality data and compares the regMMD and R-INLA outputs under the same modelling ingredients.

OLS reference and regMMD default fit

This section first computes the ordinary least squares references, both with and without the low-response observation, in order to assess the sensitivity of the classical fit to that potential outlier. It then reproduces the default regMMD fit from section 4.2.4 and extracts the coefficients, scale parameter, and selected bandwidth used in the subsequent comparison with R-INLA.
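The sensitivity check amounts to fitting OLS twice, with and without the suspect observation, and comparing the coefficients. A toy simple-regression illustration with made-up numbers (not the airquality data):

```python
def ols_simple(xs, ys):
    """Simple-regression OLS (intercept, slope) via the normal equations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Refit with and without the suspect low-response observation (last point)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, -5.0]   # made-up numbers, not airquality
fit_full = ols_simple(xs, ys)
fit_drop = ols_simple(xs[:-1], ys[:-1])
```

Here dropping the single low-response point flips the sign of the slope, which is exactly the kind of sensitivity the two OLS references are meant to expose.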

R-INLA default fit

The same Gaussian loss is fitted here through the R-INLA cloglike interface. The code fixes the bandwidth and prior settings, estimates the model, and then collects posterior summaries for the coefficients and sigma for direct comparison with the regMMD output.

Parameter-specific comparison

This table reports the regMMD point estimate, the INLA posterior mode, and the absolute difference between the INLA posterior median and the regMMD estimate for each parameter.

In this example, the absolute median differences are smallest for the intercept, X2, and X5, all of which remain within approximately 0.003 of the regMMD estimate. The largest differences occur for X4 and X6, with smaller but still visible deviations for X1, X3, and sigma. The INLA modes remain close to the regMMD point estimates overall, although noticeable differences remain for X4, X6, and sigma.

95% ETI vs 95% CI

The final comparison turns from point estimates to uncertainty summaries. It contrasts the classical 95% confidence intervals from OLS with the INLA 95% equal-tailed intervals in order to assess whether the two approaches convey a similar level of uncertainty for each parameter.
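An equal-tailed interval is just a pair of empirical quantiles of posterior draws; a minimal sketch of the 95% version:

```python
import statistics

def equal_tailed_interval_95(draws):
    """95% equal-tailed interval: the 2.5% and 97.5% empirical quantiles
    of the posterior draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return cuts[0], cuts[-1]
```

In practice R-INLA reports these quantiles directly from the posterior marginals; the sketch just makes the definition being compared against the OLS confidence intervals explicit.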

The INLA equal-tailed intervals are wider than the classical OLS confidence intervals for every parameter, sometimes only slightly, as for the intercept and sigma, and sometimes much more noticeably, as for X1, X4, and X6. The broad qualitative conclusions remain unchanged for the intercept, X1, X3, X5, and sigma, but the comparison is particularly informative for X2 and X4: their OLS confidence intervals exclude zero, whereas the INLA equal-tailed intervals include zero. Overall, the INLA fit yields a more conservative characterisation of uncertainty than the classical OLS reference in this example.

Conclusion

In the end, I believe the regMMD and cloglike implementations are coherent with one another: I will use the cloglike version in some application posts. Feel free to do the same if you find the implementation useful.