| Title: | Approximating Evidence via Bounded Harmonic Means |
|---|---|
| Description: | Implements the Elliptical Covering Marginal Likelihood Estimator (ECMLE), a geometric method for approximating marginal likelihood from posterior draws and log-posterior evaluations. The method constructs a collection of non-overlapping ellipsoids in a high-posterior-density region, computes the covered volume, and combines this with posterior sample coverage to estimate model evidence. It is designed to stabilize harmonic-mean-based evidence approximation and can be applied in multimodal settings. The methodology is described in Naderi et al. (2025) <doi:10.48550/arXiv.2510.20617>. |
| Authors: | Dana Naderi [aut, cre], Christian P. Robert [aut], Kaniav Kamary [aut], Darren Wraith [aut] |
| Maintainer: | Dana Naderi <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-05-15 09:12:15 UTC |
| Source: | https://github.com/da-na-deri/ecmle |
Implements Elliptical Covering Marginal Likelihood Estimator (ECMLE), a geometric method for approximating marginal likelihood from posterior draws and log-posterior evaluations. The approach constructs a collection of non-overlapping ellipsoids within a high-posterior-density region, computes the covered volume, and combines this with posterior sample coverage to estimate model evidence. The method is designed to stabilize harmonic-mean-based evidence approximation and can be applied in multimodal settings. The methodology is described in Naderi et al. (2025) <doi:10.48550/arXiv.2510.20617>.
The package provides the main estimator ecmle() with print, summary,
and plot methods, visualisation functions plot_ecmle_2d() and
draw_ellipse_2d(), a pair plot utility pair_plot(), and
Rosenbrock (banana) posterior benchmark functions for testing and reproducible
experiments.
Dana Naderi [aut, cre], Christian P Robert [aut], Kaniav Kamary [aut], Darren Wraith [aut]
Computes an ECMLE estimate of the log marginal likelihood from posterior draws, corresponding log unnormalized posterior evaluations, and a function that evaluates the log unnormalized posterior density at arbitrary points.
ecmle( post_samples, lps, log_post_fn, hpd_level = 0.75, subsample_frac = 0.05, r_max = NULL, bisect_tol = 1e-04, seed = NULL, verbose = FALSE )ecmle( post_samples, lps, log_post_fn, hpd_level = 0.75, subsample_frac = 0.05, r_max = NULL, bisect_tol = 1e-04, seed = NULL, verbose = FALSE )
post_samples |
Numeric matrix of posterior draws; each row is one draw. |
lps |
Numeric vector of log unnormalized posterior values (log prior +
log likelihood, up to an additive constant) evaluated at the rows of
|
log_post_fn |
A function that accepts a numeric vector of length |
hpd_level |
Fraction in |
subsample_frac |
Fraction of HPD points retained for candidate ellipsoid
centers. Default |
r_max |
Upper bracketing radius used by the directional bisection solver.
If |
bisect_tol |
Tolerance for the directional radius solver. Default
|
seed |
Optional integer seed used for HPD subsampling. |
verbose |
Logical; if |
The algorithm splits posterior draws into two halves. The first half is used to define and pack ellipsoids inside a high-posterior-density region. The second half is used to estimate the coverage-adjusted harmonic-mean quantity that yields the final estimate.
An object of class "ecmle", a list with components:
log_marginal_likelihoodFinal log marginal likelihood estimate.
log_marginal_likelihood_iterRunning estimate over the second half-sample.
ellipsoidsList of fitted ellipsoids.
total_volumeTotal ellipsoid volume.
n_ellipsoidsNumber of ellipsoids.
points_in_ellipsoidsNumber of evaluation points inside the ellipsoids.
n_samplesNumber of evaluation points in the second half-sample.
coverage_rateFraction of evaluation points covered by the ellipsoids.
hpd_levelHPD level used in the fit.
Consistency of lps and log_post_fn is critical.
Both must evaluate the same log unnormalized posterior (log prior +
log likelihood) up to the same additive constant. If they differ by more
than a constant the marginal likelihood estimate will be incorrect.
set.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) Y_bar <- rosenbrock_generate_data( theta_true = seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8 ) samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 500L) lps <- rosenbrock_log_post_vec(samps, Y_bar, n = 20, b = b, a = a, sigma = 8) log_post_fn <- function(theta) rosenbrock_log_post(theta, Y_bar, n = 20, b = b, a = a, sigma = 8) fit <- ecmle(samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) fit summary(fit) plot_ecmle_2d(fit, post_samples = samps)set.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) Y_bar <- rosenbrock_generate_data( theta_true = seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8 ) samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 500L) lps <- rosenbrock_log_post_vec(samps, Y_bar, n = 20, b = b, a = a, sigma = 8) log_post_fn <- function(theta) rosenbrock_log_post(theta, Y_bar, n = 20, b = b, a = a, sigma = 8) fit <- ecmle(samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) fit summary(fit) plot_ecmle_2d(fit, post_samples = samps)
S3 methods for inspecting ECMLE fits.
## S3 method for class 'ecmle' print(x, ...) ## S3 method for class 'ecmle' summary(object, ...) ## S3 method for class 'summary.ecmle' print(x, ...) ## S3 method for class 'ecmle' plot( x, y, xlab = "Iteration", ylab = "Running log marginal likelihood", main = "ECMLE running estimate", ... )## S3 method for class 'ecmle' print(x, ...) ## S3 method for class 'ecmle' summary(object, ...) ## S3 method for class 'summary.ecmle' print(x, ...) ## S3 method for class 'ecmle' plot( x, y, xlab = "Iteration", ylab = "Running log marginal likelihood", main = "ECMLE running estimate", ... )
x |
An object of class |
object |
An object of class |
y |
Unused. |
xlab |
X-axis label for the plot. |
ylab |
Y-axis label for the plot. |
main |
Plot title. |
... |
Unused or additional graphical parameters. |
summary.ecmle() returns an object of class "summary.ecmle".
The print and plot methods return their input invisibly.
Produces a lower-triangular scatter/density pair plot of a posterior sample matrix. Diagonal panels display scaled marginal histograms. Lower off-diagonal panels display a 2-D pixel-density image using a blue colour ramp. Upper panels are left blank.
pair_plot(samples, pixs = 1, labels = NULL)pair_plot(samples, pixs = 1, labels = NULL)
samples |
Numeric matrix with one row per posterior draw and one column
per parameter. Column names are used as axis labels when |
pixs |
Positive scalar controlling the pixel size of the 2-D density
image. Smaller values produce finer resolution at higher computational
cost. Default |
labels |
Optional character vector of length |
Invisibly returns NULL. Called for its side-effect (drawing a plot).
set.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) Y_bar <- rosenbrock_generate_data(seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8) samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 2000L) pair_plot(samps, pixs = 0.5)set.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) Y_bar <- rosenbrock_generate_data(seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8) samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 2000L) pair_plot(samps, pixs = 0.5)
plot_ecmle_2d visualises the output of ecmle() for a
two-dimensional posterior. The second-half evaluation samples are colour-coded
by ellipsoid membership (navy = inside, grey = outside) and the fitted
ellipsoid boundaries are overlaid in red. For the function emits
an informative message and returns fit invisibly without plotting.
draw_ellipse_2d is the low-level helper that traces a single ellipse
boundary on the current graphics device. It can also be called directly to
add ellipses to an existing plot.
plot_ecmle_2d( fit, post_samples, col_inside = "navy", col_outside = "grey", col_ellipse = "red", pch = 19, cex = 0.3, lwd_ellipse = 1.5, npoints = 100L, xlab = expression(theta[1]), ylab = expression(theta[2]), main = NULL, legend_pos = "topright", ... ) draw_ellipse_2d(center, Sigma, npoints = 100L, col = "red", lwd = 1.5, ...)plot_ecmle_2d( fit, post_samples, col_inside = "navy", col_outside = "grey", col_ellipse = "red", pch = 19, cex = 0.3, lwd_ellipse = 1.5, npoints = 100L, xlab = expression(theta[1]), ylab = expression(theta[2]), main = NULL, legend_pos = "topright", ... ) draw_ellipse_2d(center, Sigma, npoints = 100L, col = "red", lwd = 1.5, ...)
fit |
An object of class |
post_samples |
Numeric matrix of posterior draws (N by d); the same
matrix passed to |
col_inside |
Colour for samples inside at least one ellipsoid.
Default |
col_outside |
Colour for samples outside all ellipsoids.
Default |
col_ellipse |
Colour for ellipsoid boundaries. Default |
pch |
Point character; default |
cex |
Point size; default |
lwd_ellipse |
Line width for ellipse boundaries; default |
npoints |
Integer; number of boundary points per ellipse.
Default |
xlab |
X-axis label; default |
ylab |
Y-axis label; default |
main |
Plot title. When |
legend_pos |
Legend position string passed to
|
center |
Numeric vector of length 2; ellipse center. |
Sigma |
2 by 2 symmetric positive-definite covariance matrix defining the ellipse boundary. |
col |
Line colour for |
lwd |
Line width for |
... |
Additional graphical parameters passed to
|
plot_ecmle_2d returns fit invisibly.
draw_ellipse_2d returns NULL invisibly.
Both are called for their side-effect (drawing on the active graphics device).
set.seed(42) post_samps <- cbind(rnorm(400), rnorm(400)) lps <- apply(post_samps, 1, function(z) sum(dnorm(z, log = TRUE))) log_post_fn <- function(theta) sum(dnorm(theta, log = TRUE)) fit <- ecmle(post_samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) plot_ecmle_2d(fit, post_samples = post_samps) # draw_ellipse_2d standalone usage plot(0, 0, xlim = c(-3, 3), ylim = c(-3, 3), type = "n", asp = 1, xlab = "", ylab = "") draw_ellipse_2d(center = c(0, 0), Sigma = matrix(c(1, 0.4, 0.4, 1), 2, 2))set.seed(42) post_samps <- cbind(rnorm(400), rnorm(400)) lps <- apply(post_samps, 1, function(z) sum(dnorm(z, log = TRUE))) log_post_fn <- function(theta) sum(dnorm(theta, log = TRUE)) fit <- ecmle(post_samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) plot_ecmle_2d(fit, post_samples = post_samps) # draw_ellipse_2d standalone usage plot(0, 0, xlim = c(-3, 3), ylim = c(-3, 3), type = "n", asp = 1, xlab = "", ylab = "") draw_ellipse_2d(center = c(0, 0), Sigma = matrix(c(1, 0.4, 0.4, 1), 2, 2))
Functions for the Rosenbrock (banana-shaped) posterior, providing data generation, exact posterior sampling, and log-posterior evaluation.
rosenbrock_generate_data draws one sufficient statistic vector.
rosenbrock_exact_posterior draws exact posterior samples.
rosenbrock_log_post evaluates the log-posterior at a single parameter
vector (suitable as the log_post_fn argument to ecmle()).
rosenbrock_log_post_vec is the vectorised version over a sample matrix.
rosenbrock_generate_data(theta_true, n, b, a, sigma) rosenbrock_exact_posterior(Y_bar, n, b, a, sigma, n_samples = 10000L) rosenbrock_log_post(theta, Y_bar, n, b, a, sigma) rosenbrock_log_post_vec(theta_matrix, Y_bar, n, b, a, sigma)rosenbrock_generate_data(theta_true, n, b, a, sigma) rosenbrock_exact_posterior(Y_bar, n, b, a, sigma, n_samples = 10000L) rosenbrock_log_post(theta, Y_bar, n, b, a, sigma) rosenbrock_log_post_vec(theta_matrix, Y_bar, n, b, a, sigma)
theta_true |
Numeric vector of length |
n |
Positive integer; number of observations used to form the sufficient statistic. |
b |
Numeric vector of length |
a |
Numeric vector of length |
sigma |
Positive scalar; observation standard deviation. |
Y_bar |
Numeric vector of length |
n_samples |
Positive integer; number of posterior draws to generate.
Default |
theta |
Numeric vector of length |
theta_matrix |
Numeric matrix with |
rosenbrock_generate_dataNumeric vector of length .
rosenbrock_exact_posteriorNumeric matrix of dimensions
n_samples by , with column names
theta1, ..., thetad.
rosenbrock_log_postA single numeric value.
rosenbrock_log_post_vecNumeric vector of length
nrow(theta_matrix).
set.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) # 1. Generate one dataset Y_bar <- rosenbrock_generate_data( theta_true = seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8 ) # 2. Draw exact posterior samples samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 500L) # 3. Evaluate the log-posterior lps <- rosenbrock_log_post_vec(samps, Y_bar, n = 20, b = b, a = a, sigma = 8) # 4. Build a log_post_fn closure for ecmle() log_post_fn <- function(theta) rosenbrock_log_post(theta, Y_bar, n = 20, b = b, a = a, sigma = 8) fit <- ecmle(samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) fitset.seed(1) d <- 2 b <- rep(-1, d - 1) a <- rep(0, d - 1) # 1. Generate one dataset Y_bar <- rosenbrock_generate_data( theta_true = seq(1, 2.8, length.out = d), n = 20, b = b, a = a, sigma = 8 ) # 2. Draw exact posterior samples samps <- rosenbrock_exact_posterior(Y_bar, n = 20, b = b, a = a, sigma = 8, n_samples = 500L) # 3. Evaluate the log-posterior lps <- rosenbrock_log_post_vec(samps, Y_bar, n = 20, b = b, a = a, sigma = 8) # 4. Build a log_post_fn closure for ecmle() log_post_fn <- function(theta) rosenbrock_log_post(theta, Y_bar, n = 20, b = b, a = a, sigma = 8) fit <- ecmle(samps, lps, log_post_fn, hpd_level = 0.75, seed = 1L) fit