Daily joint prediction of CAC40 opening and closing levels

Applications
Author

Olivier Supplisson

Published

April 12, 2026

This application studies a joint forecasting problem for the CAC 40 opening and closing levels. Rather than treating the intraday move as a single response, the specification uses two linked observation equations: one for the opening level and one for the closing level. The link is simple: the opening equation is anchored by the previous close, and the closing equation is anchored by the same-day open.

The aim is to build a model able to maintain a probabilistic view of both levels during the day, quantify uncertainty, and update the implied distribution of the close once the opening level is known.

Data

The code below downloads daily CAC 40 opening and closing data from Yahoo Finance under the ticker ^FCHI and starts the analysis on January 1, 2000.

library(tidyverse)
date_start <- as.Date("2000-01-01")
date_end <- as.Date("2026-04-14")
forecast_start <- as.Date("2026-03-01")
forecast_end <- as.Date("2026-04-14")

time_index <- tibble(date = seq(from = date_start, to = date_end, by = "1 day")) |> 
  mutate(time_id = row_number() + 1)

cac40_xts <- quantmod::getSymbols(
  Symbols = "^FCHI",
  src = "yahoo",
  from = as.Date("1999-01-01"),
  to = date_end,
  auto.assign = FALSE
)

cac40 <- data.frame(
  date = zoo::index(cac40_xts),
  open = as.numeric(quantmod::Op(cac40_xts)),
  close = as.numeric(quantmod::Cl(cac40_xts))
)


cac40 <- cac40 |>
  filter(!is.na(open) & !is.na(close))  |> 
  mutate(intercept = 1) |> 
  mutate(close_lag = lag(close)) |> 
  filter(date >= date_start) |> 
  left_join(time_index, by = "date")

The observed series contains 6715 trading days, from 2000-01-03 to 2026-04-13.

The forecasting exercise below uses the trading days from 2026-03-01 to 2026-04-14 as a rolling one-step-ahead evaluation window. For each trading day in that range, the model is refit using only the data available up to the previous trading day.

The opening and closing levels are shown below.

plot_df <- rbind(
  data.frame(date = cac40$date, series = "Open", value = cac40$open),
  data.frame(date = cac40$date, series = "Close", value = cac40$close)
)

ggplot2::ggplot(plot_df, ggplot2::aes(x = date, y = value, colour = series)) +
  ggplot2::geom_line(linewidth = 0.55) +
  ggplot2::scale_colour_manual(values = c("Open" = "#6baed6", "Close" = "#08519c")) +
  ggplot2::labs(
    x = "Date",
    y = "CAC 40 level",
    title = "Daily CAC 40 opening and closing levels",
    colour = NULL
  ) +
  ggplot2::theme_minimal()

The two series move together over the long run, but the gap between them still changes enough from day to day to justify modeling the opening and closing levels separately.
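That claim can be quantified directly from the data already loaded. A minimal sketch, reusing the `cac40` data frame built above, summarises the overnight log-gap between each open and the previous close:

```r
# Overnight log-gap between today's open and yesterday's close
cac40 |>
  mutate(overnight_gap = log(open) - log(close_lag)) |>
  summarise(
    mean_gap = mean(overnight_gap, na.rm = TRUE),
    sd_gap   = sd(overnight_gap, na.rm = TRUE),
    q0.025   = quantile(overnight_gap, 0.025, na.rm = TRUE),
    q0.975   = quantile(overnight_gap, 0.975, na.rm = TRUE)
  )
```

A non-degenerate spread in this gap is what makes it worthwhile to give the open and the close their own observation equations.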

Model

Let \(O_t\) and \(C_t\) denote the CAC 40 opening and closing levels on trading day \(t\). Since both quantities are strictly positive, the observation layer uses the lognormal likelihood, as documented in R-INLA.

Conditional on latent predictors \(\eta_t^{(O)}\) and \(\eta_t^{(C)}\), the observation model is

\[ \log O_t \mid \eta_t^{(O)}, \tau_O \sim \mathcal{N}(\eta_t^{(O)}, \tau_O^{-1}), \qquad \log C_t \mid \eta_t^{(C)}, \tau_C \sim \mathcal{N}(\eta_t^{(C)}, \tau_C^{-1}). \]

The corresponding linear predictors are

\[ \eta_t^{(O)} = \log C_{t-1} + \alpha_O + h_t^{(O)}, \qquad \eta_t^{(C)} = \log O_t + \alpha_C + h_t^{(C)}, \]

where the notation mirrors the code through CloseLag, Open, InterceptOpen, InterceptClose, betaOpen, and betaClose. The previous close enters the opening equation as an offset, while the same-day open enters the closing equation as an offset. The latent terms \(h_t^{(O)}\) and \(h_t^{(C)}\) are modeled separately, so the dependence between the two equations comes from this sequential conditioning structure rather than from a shared latent field.
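Chaining the two equations makes the implied day-over-day dynamics of the close explicit. Substituting the opening equation into the closing one, and writing \(\varepsilon_t^{(O)}\) for the observation noise of \(\log O_t\), gives

\[ \eta_t^{(C)} = \log C_{t-1} + \alpha_O + \alpha_C + h_t^{(O)} + h_t^{(C)} + \varepsilon_t^{(O)}, \qquad \varepsilon_t^{(O)} \sim \mathcal{N}(0, \tau_O^{-1}), \]

so, on the log scale, the close behaves like a random walk with drift \(\alpha_O + \alpha_C\), smooth latent corrections split between the two equations, and the overnight observation noise.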

Here, I am using a PRW2 latent effect (a P-spline with a second-order random-walk penalty), but other smoothers could be substituted with likely little to no change, as highlighted in this methodological post on P-splines in R-INLA. I fit the corresponding model with inlabru.

The code below defines the latent components:
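A sketch of what these components could look like, given the names used in the observation formulas below. The priors, the `scale.model` choice, and the `const` pass-through for the offsets are assumptions, not necessarily the exact settings used:

```r
library(inlabru)

# PC prior on the latent precision (an assumption, mirroring the likelihood prior)
prior_prec_latent <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

cmp <- ~ 0 +
  # offsets: previous close and same-day open, passed through unchanged
  CloseLag(close_lag, model = "const") +
  Open(open, model = "const") +
  # one intercept per equation (using the intercept column created above)
  InterceptOpen(intercept, model = "linear") +
  InterceptClose(intercept, model = "linear") +
  # one smooth latent effect per equation, indexed by calendar time
  betaOpen(time_id, model = "rw2", scale.model = TRUE, hyper = prior_prec_latent) +
  betaClose(time_id, model = "rw2", scale.model = TRUE, hyper = prior_prec_latent)
```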

Fit

The remaining piece is the observation model. For the lognormal likelihood, the INLA implementation uses an identity link on the log scale together with a precision hyperparameter. In the current code, the opening and closing equations are written as two separate bru_obs() objects. Both equations receive the same PC prior on the observation precision.

prior_prec_likelihood <- list(
  prior = "pc.prec",
  param = c(1, 0.01)
)


likopen <- bru_obs(
  open ~ log(CloseLag) + InterceptOpen + betaOpen,
  family = "lognormal",
  data = cac40,
  control.family = list(
    link = "identity",
    hyper = list(prec = prior_prec_likelihood)
  )
)

likclose <- bru_obs(
  close ~ log(Open) + InterceptClose + betaClose,
  family = "lognormal",
  data = cac40,
  control.family = list(
    link = "identity",
    hyper = list(prec = prior_prec_likelihood)
  )
)

This is where the two-equation formulation is assembled: likopen and likclose are fitted together in a single bru() call against the same component object cmp.

The fit below keeps the configuration output needed later for posterior simulation, predictive checks, and rolling forecasts.
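A sketch of that call, assuming the components are stored in `cmp` and that `config = TRUE` is what preserves the configurations needed for posterior sampling:

```r
fit <- bru(
  components = cmp,
  likopen,
  likclose,
  options = list(
    # keep the Gaussian approximation configurations for posterior sampling
    control.compute = list(config = TRUE)
  )
)

summary(fit)
```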

In-sample posterior predictive check

As a first sanity check, a simple posterior predictive assessment can be carried out. The approach uses a shenanigan hidden in this post on the R-INLA discussion forum.

The first step of this magical trick is to draw samples from the joint posterior distribution of the latent state.

ndraws <- 1000
state <- evaluate_state(
  result = fit,
  property = "sample",
  n = ndraws,
  seed = 0L
)

Once these draws are available, they can be combined to rebuild any predictor of interest. Here, the posterior predictive check is performed on the same data set that was used for estimation.

predictor_observed <- evaluate_model(
  model = fit$bru_info$model,
  result = fit,
  state = state,
  data = cac40,
  predictor = ~ data.frame(
    predictor_close = InterceptClose + betaClose,
    predictor_open = InterceptOpen + betaOpen,
    Precision_for_open = Precision_for_the_lognormal_observations,
    Precision_for_close = Precision_for_the_lognormal_observations_2_
  )
)

predictor_observed <- purrr::map2_dfr(
  predictor_observed,
  seq_along(predictor_observed),
  ~cbind(.x, cac40) %>% mutate(draw_id = .y)
) %>% 
  dplyr::mutate(
    predicted_open = stats::rlnorm(
      dplyr::n(),
      meanlog = log(close_lag) + predictor_open,
      sdlog = 1 / sqrt(Precision_for_open)
    ),
    predicted_close = stats::rlnorm(
      dplyr::n(),
      meanlog = log(open) + predictor_close,
      sdlog = 1 / sqrt(Precision_for_close)
    )
  ) |> 
  group_by(date) |> 
  summarise(
    mean_predicted_open = mean(predicted_open),
    mean_predicted_close = mean(predicted_close),
    q0.025_predicted_open = quantile(predicted_open, 0.025),
    q0.025_predicted_close = quantile(predicted_close, 0.025),
    q0.975_predicted_open = quantile(predicted_open, 0.975),
    q0.975_predicted_close = quantile(predicted_close, 0.975)
  ) |> 
  left_join(cac40, by = "date")

The figure below compares the observed and replicated series over the 2026 segment.

fun_to_plot_date <- function(str_date_start,
                             str_date_end){
  plot <- predictor_observed |> 
    filter(date >= as.Date(str_date_start),
           date <= as.Date(str_date_end)) |> 
    pivot_longer(cols = c(open,
                          close,
                          mean_predicted_open,
                          mean_predicted_close,
                          q0.025_predicted_open,
                          q0.025_predicted_close,
                          q0.975_predicted_open,
                          q0.975_predicted_close)) |> 
    mutate(facet = case_when(
      str_detect(name, "open") ~ "Open",
      str_detect(name, "close") ~ "Close"
    ),
    type = case_when(
      name %in% c("open", 
                  "close") ~ "Observed",
      name %in% c("q0.025_predicted_open",
                  "q0.025_predicted_close",
                  "q0.975_predicted_open",
                  "q0.975_predicted_close",
                  "mean_predicted_open", 
                  "mean_predicted_close") ~ "Simulated"
    ),
    label_legend = case_when(
      name == "open" ~ "Open",
      name == "close" ~ "Close",
      name == "q0.025_predicted_open" ~ "LB ETI95%",
      name == "q0.025_predicted_close" ~ "LB ETI95%",
      name == "q0.975_predicted_open" ~ "UB ETI95%",
      name == "q0.975_predicted_close" ~ "UB ETI95%",
      name == "mean_predicted_open" ~ "Average",
      name == "mean_predicted_close" ~ "Average"
    )) |> 
      ggplot() + 
      geom_line(aes(x = date, 
                    y = value, 
                    color = label_legend, 
                    linetype = type)) + 
        facet_wrap(~facet, nrow = 2)+
    labs(
      x = "Date",
      y = "Index",
      color = "Series: ",
      linetype = "Type of series: "
    ) +
    theme_minimal()+
    theme(legend.position="bottom")
  
  # Never said this blog was for good coding practice ;-)
  d <- plot$data |> select(-type, -label_legend, -facet) |> pivot_wider(names_from = name, values_from = value)
  pct_open <- round(100 * mean(with(d, open < q0.025_predicted_open | open > q0.975_predicted_open)), 2)
  pct_close <- round(100 * mean(with(d, close < q0.025_predicted_close | close > q0.975_predicted_close)), 2)
  return(list("plot" = plot,
         "pct_open" = pct_open,
         "pct_close" = pct_close))

}
# From "2026-01-01" to "2026-04-14"
p <- fun_to_plot_date("2026-01-01", date_end)
p$plot

The plot shows that the posterior means can miss individual days, but the interval bands are the more relevant object here because the application is about sequential uncertainty quantification rather than about a single deterministic path. The observed values almost always fall within this uncertainty band: over the plotted 2026 segment, only 1.43% of the observed opening values and 0% of the observed closing values lie outside the ETI95%. Across the full period, these percentages are 1.77% and 2.49%.

Not bad! However, an in-sample check only tells us whether the fitted model can reproduce the training data. The more relevant exercise is a one-step-ahead forecasting problem.

Out-of-sample day-ahead forecast

In this section, I keep the same state-based workflow as above, but instead of simulating one long path over the whole hold-out period, I refit the model day by day. For each trading day in the evaluation window, both open and close are hidden, the model is rerun using only the earlier observations, and I simulate the same-day opening and closing levels. The opening draw uses the observed lagged close from the previous trading day, while the closing draw uses the simulated opening level from the same posterior draw.

This mirrors the situation of a trader who, at the end of day \(t\), forms a prediction to plan actions on day \(t+1\), based on the predictive distribution of the CAC 40 opening and closing levels.

This produces one predictive distribution per day in the hold-out window, which can then be compared with the realised opening and closing levels over the evaluation period.
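The refitting loop itself is sketched below, under the same assumptions as the full fit above; the result is collected in `periodic_forecast`, the object used by the plotting code that follows. One subtlety glossed over here is evaluating the RW2 terms at the held-out day's `time_id`; in practice this can be handled by including the held-out day with `NA` responses, or by defining the latent effect over the full `time_index`.

```r
eval_dates <- cac40 |>
  filter(date >= forecast_start, date <= forecast_end) |>
  pull(date)

periodic_forecast <- purrr::map_dfr(eval_dates, function(d) {
  past  <- filter(cac40, date < d)   # information set at the end of day d - 1
  today <- filter(cac40, date == d)  # held-out day

  # refit both observation equations on past data only
  fit_d <- bru(
    components = cmp,
    bru_obs(open ~ log(CloseLag) + InterceptOpen + betaOpen,
            family = "lognormal", data = past,
            control.family = list(link = "identity",
                                  hyper = list(prec = prior_prec_likelihood))),
    bru_obs(close ~ log(Open) + InterceptClose + betaClose,
            family = "lognormal", data = past,
            control.family = list(link = "identity",
                                  hyper = list(prec = prior_prec_likelihood))),
    options = list(control.compute = list(config = TRUE))
  )

  # draw latent states and evaluate the predictors at the held-out day
  state_d <- evaluate_state(result = fit_d, property = "sample",
                            n = 1000, seed = 0L)
  pred <- evaluate_model(
    model = fit_d$bru_info$model, result = fit_d, state = state_d, data = today,
    predictor = ~ data.frame(
      predictor_open  = InterceptOpen + betaOpen,
      predictor_close = InterceptClose + betaClose,
      prec_open  = Precision_for_the_lognormal_observations,
      prec_close = Precision_for_the_lognormal_observations_2_
    )
  )

  draws <- dplyr::bind_rows(pred) |>
    mutate(
      # open: anchored on the observed previous close
      sim_open = rlnorm(n(), meanlog = log(today$close_lag) + predictor_open,
                        sdlog = 1 / sqrt(prec_open)),
      # close: anchored on the simulated open of the same draw
      sim_close = rlnorm(n(), meanlog = log(sim_open) + predictor_close,
                         sdlog = 1 / sqrt(prec_close))
    )

  tibble(
    date = d, open = today$open, close = today$close,
    mean_predicted_open    = mean(draws$sim_open),
    mean_predicted_close   = mean(draws$sim_close),
    q0.025_predicted_open  = quantile(draws$sim_open, 0.025),
    q0.975_predicted_open  = quantile(draws$sim_open, 0.975),
    q0.025_predicted_close = quantile(draws$sim_close, 0.025),
    q0.975_predicted_close = quantile(draws$sim_close, 0.975)
  )
})
```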

periodic_forecast |>
  pivot_longer(cols = c(open,
                        close,
                        mean_predicted_open,
                        mean_predicted_close,
                        q0.025_predicted_open,
                        q0.025_predicted_close,
                        q0.975_predicted_open,
                        q0.975_predicted_close)) |>
  mutate(
    facet = case_when(
      str_detect(name, "open") ~ "Open",
      str_detect(name, "close") ~ "Close"
    ),
    type = case_when(
      name %in% c("open", "close") ~ "Observed",
      TRUE ~ "Simulated"
    ),
    label_legend = case_when(
      name == "open" ~ "Open",
      name == "close" ~ "Close",
      name == "q0.025_predicted_open" ~ "LB ETI95%",
      name == "q0.025_predicted_close" ~ "LB ETI95%",
      name == "q0.975_predicted_open" ~ "UB ETI95%",
      name == "q0.975_predicted_close" ~ "UB ETI95%",
      name == "mean_predicted_open" ~ "Average",
      name == "mean_predicted_close" ~ "Average"
    )
  ) |>
  ggplot() +
  geom_line(aes(
    x = date,
    y = value,
    color = label_legend,
    linetype = type
  ))  +
  geom_point(aes(
    x = date,
    y = value,
    color = label_legend
  ), size = 1) +
  facet_wrap(~facet, nrow = 2) +
  labs(
    x = "Date",
    y = "Index",
    color = "Series: ",
    linetype = "Type of series: "
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

As expected, more observed values fall outside the uncertainty interval: over the hold-out window, 24.14% of the observed opening values and 6.9% of the observed closing values fall outside the ETI95%. A wider uncertainty band, such as the ETI99%, would reduce these percentages.
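If the loop above is modified to also return the raw draws rather than only per-day summaries (here assumed stacked in a long data frame `forecast_draws` with columns `date`, `sim_open`, and `sim_close`; this object is hypothetical), the ETI99% coverage is a small variation on the ETI95% computation:

```r
# Hypothetical: forecast_draws stacks all per-day predictive draws
forecast_draws |>
  group_by(date) |>
  summarise(
    q0.005_open  = quantile(sim_open, 0.005),
    q0.995_open  = quantile(sim_open, 0.995),
    q0.005_close = quantile(sim_close, 0.005),
    q0.995_close = quantile(sim_close, 0.995)
  ) |>
  left_join(select(cac40, date, open, close), by = "date") |>
  summarise(
    pct_open_outside  = 100 * mean(open < q0.005_open | open > q0.995_open),
    pct_close_outside = 100 * mean(close < q0.005_close | close > q0.995_close)
  )
```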