Censored binomial likelihood in R-INLA

Methods
Author

Olivier Supplisson

Published

April 12, 2026

Download cloglike-censored-binomial.c

R-INLA provides a wide range of built-in likelihoods, documented on the project likelihood page. Since INLA 25.08.21, an experimental interface also allows users to implement custom likelihoods. In this post, I use that feature to define a censored binomial likelihood. Note that censored likelihoods are already available for the Poisson and negative binomial distributions.

This likelihood is useful when counts are deliberately masked for disclosure control. A common situation is left censoring at a fixed threshold: if the true count is small, the released dataset does not report the exact value and only indicates that it lies below the threshold. For example, some public spatially-indexed health datasets in France suppress counts at or below 10.

Binomial and censored binomial likelihoods

Let \(Y\) denote the number of successes in a sample of size \(N\). A natural model is the binomial distribution

\[ Y \mid p,N \sim \mathcal{B}(N, p), \]

where \(p\) is the success probability. The corresponding probability mass function is

\[ \Pr(Y = j \mid p, N) = \binom{N}{j} p^j (1 - p)^{N - j}, \qquad j = 0, 1, \dots, N. \]

In regression settings, it is standard to connect \(p\) to a linear predictor \(\eta\) through the logit link

\[ p = \operatorname{logit}^{-1}(\eta) = \frac{\exp(\eta)}{1 + \exp(\eta)}. \]

Now suppose that counts are left censored at a threshold \(c\). We no longer observe \(Y\) exactly for small values. Instead, we observe

\[ Z = \begin{cases} Y, & \text{if } Y > c, \\ \text{censored}, & \text{if } Y \le c. \end{cases} \]

The observed-data likelihood therefore has two contributions.

For an uncensored observation, with reported count \(z > c\),

\[ \Pr(Z = z \mid p, N, c) = \Pr(Y = z \mid p, N) = \binom{N}{z} p^z (1 - p)^{N - z}. \]

For a censored observation,

\[ \Pr(Z = \text{censored} \mid p, N, c) = \Pr(Y \le c \mid p, N) = \sum_{k=0}^{c} \binom{N}{k} p^k (1 - p)^{N-k}. \]

So the censored binomial likelihood is just the usual binomial likelihood for uncensored observations, and the binomial cumulative distribution function for censored observations.

For implementation in INLA it is convenient to write the contribution on the log scale, as this is required by the cloglike interface.

For an uncensored observation with reported count \(z > c\),

\[ \log \Pr(Z = z \mid \eta, N, c) = \log \binom{N}{z} + z \eta - N \log(1 + \exp(\eta)). \]

For a censored observation,

\[ \log \Pr(Z = \text{censored} \mid \eta, N, c) = \log \left[ \sum_{k=0}^{c} \binom{N}{k} \operatorname{logit}^{-1}(\eta)^k \left(1 - \operatorname{logit}^{-1}(\eta)\right)^{N-k} \right]. \]

In practice, the censored term should be evaluated through the binomial CDF rather than by summing probabilities manually. In R, this corresponds to pbinom(c, size = N, prob = plogis(eta), log.p = TRUE), while the uncensored term is dbinom(z, size = N, prob = plogis(eta), log = TRUE).

Simulation study: why do we need to account for censoring?

The code below fits a binomial model in the specific case where \(y_i \mid N_i, p \sim \mathcal{B}(N_i, p)\) for \(i \in \{1, \dots, 10000\}\), with \(N_i \mid \lambda = 20 \sim \mathcal{P}(\lambda)\) and \(p = 0.3\). The censoring threshold is set to 5. For the fit, a uniform prior on \((0,1)\) is placed on \(p\).

set.seed(1)
#Packages
library(INLA)
library(inlabru)

#Simulation setup
prob <- 0.3
nsample <- 10000
N <- rpois(n = nsample, lambda = 20)
censoring_threshold <- 5
Y <- rbinom(n = nsample, size = N, prob = prob)
censored <- as.integer(Y <= censoring_threshold)
#Naive censored response: censored counts are replaced by the threshold
Z <- Y
Z[censored == 1L] <- censoring_threshold

#Bru components
comp <- ~ p(
  1,
  mean.linear = 0,
  prec.linear = 1,
  marginal = bm_marginal(qbeta, pbeta, dbeta, shape1 = 1, shape2 = 1)
)

#Uncensored fit
likelihood_uncensored <- bru_obs(
  formula = Y ~ 0 + qlogis(p),
  family = "binomial",
  data = data.frame(Y,N),
  Ntrials = N,
  control.family = list(link = "logit")
)

fit_uncensored <- bru(comp, 
                      likelihood=likelihood_uncensored, 
                      options = list(verbose = FALSE,
                                     bru_verbose = 0
                                     )
                      )

#Censored fit
likelihood_censored <- bru_obs(
  formula = Z ~ 0 + qlogis(p),
  family = "binomial",
  data = data.frame(Z,N),
  Ntrials = N,
  control.family = list(link = "logit")
)

fit_censored <- bru(comp, 
                      likelihood=likelihood_censored, 
                      options = list(verbose = FALSE,
                                     bru_verbose = 0
                                     )
                      )

#Compute the posterior mean and ETI95% of p
draws_uncensored <- predict(fit_uncensored, formula = ~data.frame(p = p), n.samples = 100)
draws_censored <- predict(fit_censored, formula = ~data.frame(p = p), n.samples = 100)

#Format with a decimal comma, e.g. 0.3007 -> "0,3007"
fmt2 <- function(x) sub("\\.", ",", sprintf("%.4f", x))

result_uncensored <- paste0(
  fmt2(draws_uncensored$mean), " [",
  fmt2(draws_uncensored$q0.025), ", ",
  fmt2(draws_uncensored$q0.975), "]"
)

result_censored <- paste0(
  fmt2(draws_censored$mean), " [",
  fmt2(draws_censored$q0.025), ", ",
  fmt2(draws_censored$q0.975), "]"
)

The mean [ETI95%] is 0,3007 [0,2985, 0,3024] for the uncensored fit and 0,3263 [0,3243, 0,3282] for the naive fit on the censored data, showing that ignoring the censoring inflates the inferred success probability upward.

Censored binomial implementation

The cloglike-censored-binomial.c file, downloadable at the start of this post, implements the censored binomial likelihood. Let us give it a try with the previous setup.

The cloglike fit needs a matrix-valued response passed through data=, with a response name that matches the left-hand side of the formula, as exemplified below.

#Look at the rgeneric vignette for the process;
#`shlib` is the path to the shared library compiled from cloglike-censored-binomial.c
cloglike <- inla.cloglike.define(model = "inla_cloglike_censored_binomial", shlib = shlib)

#Matrix-valued response: observed count, number of trials, censoring indicator
Y_cens <- INLA::inla.mdata(cbind(Z, N, censored))

#Bru likelihood
likelihood_censored_new <- bru_obs(
  formula = Y_cens ~ 0 + qlogis(p),
  family = "cloglike",
  data = list(
    Y_cens = Y_cens
  ),
  control.family = list(cloglike = cloglike),
  is_rowwise = FALSE
)

#Fit
fit_censored_new <- bru(
  comp,
  likelihood = likelihood_censored_new,
  options = list(
    verbose = FALSE,
    bru_verbose = 0
  )
)

#Posterior draws
draws_censored_new <- predict(fit_censored_new, formula = ~data.frame(p = p), n.samples = 100)

#Summary
result_censored_new <- paste0(
  fmt2(draws_censored_new$mean), " [",
  fmt2(draws_censored_new$q0.025), ", ",
  fmt2(draws_censored_new$q0.975), "]"
)

With this likelihood implementation, the true parameter is recovered: the posterior mean [ETI95%] is 0,3006 [0,2990, 0,3026].