
Parameters are grouped into families defined by the prefix before the first "[" in their name (for example, "beta[1]" and "beta[2]" belong to the family "beta"). For each family, the function computes median efficiency metrics and derived quantities that help identify bottlenecks.
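The grouping rule can be illustrated with a few lines of base R. This is a minimal sketch of the naming convention described above, not the package's internal code; `family_of` is a hypothetical helper name introduced only for this example.

```r
# Hypothetical helper: drop everything from the first "[" onward,
# so that "beta[1]" and "beta[2]" both map to the family "beta".
family_of <- function(param_names) {
  sub("\\[.*$", "", param_names)
}

params <- c("beta[1]", "beta[2]", "sigma", "alpha[1, 2]")
families <- family_of(params)

# List the members of each family.
split(params, families)
```

Names without an index, such as "sigma", form a family of their own.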

Usage

identify_bottlenecks_family(
  samples,
  runtime_s,
  ess_threshold = 1000,
  sampler_params = NULL,
  model = NULL,
  mcmc_conf = NULL,
  ignore_patterns = c("^lifted_", "^logProb_"),
  strict_sampler_only = TRUE,
  auto_configure = TRUE,
  rhat_threshold = 1.01,
  ess_per_s_min = 0
)

Arguments

samples

An object containing MCMC samples; typically an object of class mcmc.list, mcmc, matrix, or data.frame.

runtime_s

Numeric scalar. Wall-clock runtime of the MCMC run in seconds.

ess_threshold

Numeric scalar. Target ESS per family (default is 1000).

sampler_params

Optional character vector of parameter names to keep when defining families. Parameters not in this vector are ignored.

model

Optional nimbleModel (compiled or uncompiled). Defaults to NULL.

mcmc_conf

Optional MCMC configuration (from configureMCMC). If NULL and auto_configure is TRUE, a baseline configuration is built internally.

ignore_patterns

Character vector of regular expressions for node or family names to exclude from the bottleneck search.
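As an illustration of how such patterns filter names, the sketch below applies the default patterns to a few made-up node names with base R's grepl. How the function applies the patterns internally is not documented here, so `drop_ignored` is a hypothetical helper and the node names are assumptions.

```r
# Hypothetical helper: remove every name matched by at least one pattern.
drop_ignored <- function(node_names, ignore_patterns) {
  if (length(ignore_patterns) == 0L) return(node_names)
  # One logical vector per pattern, combined with element-wise OR.
  hit <- Reduce(`|`, lapply(ignore_patterns, grepl, x = node_names))
  node_names[!hit]
}

# Made-up node names for illustration only.
node_names <- c("beta[1]", "lifted_sqrt_tau", "logProb_y[3]", "sigma")
kept <- drop_ignored(node_names, c("^lifted_", "^logProb_"))
kept
```

Because the default patterns are anchored with "^", only names that start with "lifted_" or "logProb_" are dropped.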

strict_sampler_only

Logical; if TRUE, only nodes actually sampled by a user-level sampler are considered.

auto_configure

Logical; if TRUE, the function will configure a baseline MCMC when mcmc_conf is missing.

rhat_threshold

Numeric scalar kept for API symmetry (not used in the ranking).

ess_per_s_min

Numeric scalar. Optional CE threshold (ESS per second) used to flag families below this value. Use 0 to deactivate.

Value

A list describing bottleneck families and their structural and computational load, with components:

type

Character string, either "ok" or "degenerate_only".

details

List with components ce, ae, time, and degenerate summarising the diagnostics.

per_family

Data frame (or tibble) of metrics by family.

summary

Single-row data frame (or tibble) with global summaries across families.

top3

Data frame containing the three worst families according to the main ranking criterion.

Details

Group parameters into families and rank them by median efficiency metrics.

For each family, the following median metrics are computed:

  • AE_med = median(AE) (low values are worse),

  • CE_med = median(CE) (low values are worse, CE is ESS per second),

  • ESS_med = median(ESS),

  • Rhat_med = median(Rhat, na.rm = TRUE).
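The per-family medians listed above can be sketched in base R as follows. The per-parameter values are made up for illustration, the column layout of `per_param` is an assumption, and `family_median` is a hypothetical helper; the package may compute these quantities differently.

```r
# Illustrative per-parameter diagnostics (made-up numbers, not real output).
per_param <- data.frame(
  parameter = c("beta[1]", "beta[2]", "beta[3]", "sigma"),
  ESS  = c(400, 600, 800, 1500),
  AE   = c(0.04, 0.06, 0.08, 0.15),
  CE   = c(4, 6, 8, 15),            # ESS per second
  Rhat = c(1.02, NA, 1.00, 1.01),
  stringsAsFactors = FALSE
)
per_param$family <- sub("\\[.*$", "", per_param$parameter)

# Hypothetical helper: median of one metric within each family.
family_median <- function(x, na.rm = FALSE) {
  tapply(x, per_param$family, median, na.rm = na.rm)
}

ess_med <- family_median(per_param$ESS)
per_family <- data.frame(
  family   = names(ess_med),
  AE_med   = as.numeric(family_median(per_param$AE)),
  CE_med   = as.numeric(family_median(per_param$CE)),
  ESS_med  = as.numeric(ess_med),
  # Missing Rhat values are dropped, as in the definition above.
  Rhat_med = as.numeric(family_median(per_param$Rhat, na.rm = TRUE)),
  stringsAsFactors = FALSE
)
per_family
```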

From these, the following diagnostics are derived:

  • slow_node_time = ess_threshold / CE_med (seconds needed to reach the target ESS; higher is worse),

  • meet_target = logical flag, TRUE when slow_node_time <= runtime_s.
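A small worked example of the two derived diagnostics, using made-up CE_med values and a hypothetical 100-second run. The column name `below_ce_min` for the ess_per_s_min flag is an assumption made for this sketch, not a documented output column.

```r
ess_threshold <- 1000   # target ESS per family
runtime_s     <- 100    # hypothetical wall-clock runtime in seconds
ess_per_s_min <- 10     # optional CE floor; 0 would deactivate the flag

# Made-up family medians for illustration only.
per_family <- data.frame(
  family = c("beta", "sigma"),
  CE_med = c(6, 15),
  stringsAsFactors = FALSE
)

# Seconds needed to reach the target ESS at the observed median CE.
per_family$slow_node_time <- ess_threshold / per_family$CE_med

# TRUE when the target would have been reached within the actual runtime.
per_family$meet_target <- per_family$slow_node_time <= runtime_s

# Assumed flag name: families whose median CE is below the optional floor.
per_family$below_ce_min <- ess_per_s_min > 0 & per_family$CE_med < ess_per_s_min

per_family
```

Here "beta" needs roughly 167 seconds to reach an ESS of 1000 and therefore misses the target in a 100-second run, while "sigma" needs roughly 67 seconds and meets it.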

Families with degenerate metrics (non-finite or non-positive ESS, AE, or CE) are reported in the degenerate component and excluded from the ranking.
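The degeneracy rule can be sketched as a simple filter. The values below are made up, and `is_valid`, `ranked`, and `degenerate` are hypothetical names; the snippet only illustrates the "finite and strictly positive" condition stated above.

```r
# Made-up family medians, including several degenerate cases.
per_family <- data.frame(
  family  = c("alpha", "beta", "gamma", "sigma"),
  ESS_med = c(900, 0, 1200, NaN),
  AE_med  = c(0.09, 0, 0.12, 0.10),
  CE_med  = c(9, 0, Inf, 10),
  stringsAsFactors = FALSE
)

# A metric is usable only if it is finite and strictly positive.
# For NA or NaN, is.finite() is FALSE, so the result is FALSE rather than NA.
is_valid <- function(x) is.finite(x) & x > 0

ok <- is_valid(per_family$ESS_med) &
  is_valid(per_family$AE_med) &
  is_valid(per_family$CE_med)

ranked     <- per_family[ok, ]    # families that enter the ranking
degenerate <- per_family[!ok, ]   # families reported separately
```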

When sampler_params is provided, only parameters whose names are included in sampler_params are used to form families (typically stochastic nodes that are actually sampled).
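The effect of sampler_params on family formation can be sketched as a name filter applied before grouping. `restrict_to_sampled` is a hypothetical helper and the parameter names are made up; treating NULL as "keep everything" mirrors the documented default but is otherwise an assumption about the internals.

```r
# Hypothetical helper: keep only names listed in sampler_params.
restrict_to_sampled <- function(param_names, sampler_params = NULL) {
  if (is.null(sampler_params)) return(param_names)
  param_names[param_names %in% sampler_params]
}

# Made-up names: "mu[1]" stands for a node that is not sampled directly.
all_params     <- c("beta[1]", "beta[2]", "mu[1]", "sigma")
sampler_params <- c("beta[1]", "beta[2]", "sigma")

kept <- restrict_to_sampled(all_params, sampler_params)
families <- unique(sub("\\[.*$", "", kept))
families
```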