Diagnose model structure, dependencies, and sampler time (parameter and family levels)

Inspects a NIMBLE model to list its nodes, classify them as stochastic or deterministic, count the downstream dependencies of each node, map configured samplers to their target nodes, and (optionally) profile per-sampler run time. Produces tidy tables and publication-quality figures at both the parameter level and the family level. A "family" is the base variable name with its indices replaced by empty square brackets; for example, "beta[1]" and "beta[2]" both belong to the family "beta[]".
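As a rough illustration of the family naming rule, the sketch below collapses the index block of each node name with a regular expression. The helper family_of() is hypothetical and not part of the documented API; leaving scalar nodes such as "sigma" unchanged is also an assumption, since the documentation only gives the indexed example.

```r
# Hypothetical helper (not part of the documented API): collapse a
# trailing index block such as "[1]" or "[1, 3]" to "[]".
family_of <- function(nodes) {
  sub("\\[.*\\]$", "[]", nodes)
}

# Toy node names, made up for illustration.
nodes <- c("beta[1]", "beta[2]", "sigma", "theta[1, 3]")
families <- family_of(nodes)
print(families)

# Number of nodes per family.
print(table(families))
```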

Usage

diagnose_model_structure(
  model,
  include_data = FALSE,
  removed_nodes = NULL,
  ignore_patterns = c("^lifted_", "^logProb_"),
  make_plots = TRUE,
  output_dir = NULL,
  save_csv = FALSE,
  node_of_interest = NULL,
  sampler_times = NULL,
  sampler_times_unit = "seconds",
  auto_profile = TRUE,
  profile_niter = 3000L,
  profile_burnin = 100L,
  profile_thin = 1L,
  profile_seed = NULL,
  np = 0.1,
  by_family = TRUE,
  family_stat = c("median", "mean", "sum"),
  time_normalize = c("none", "per_node"),
  only_family_plots = FALSE,
  ...
)

Arguments

model: A compiled or uncompiled nimbleModel (required).
include_data: Logical; include data nodes when enumerating nodes (default FALSE).
removed_nodes: Character vector of nodes to exclude explicitly (default NULL).
ignore_patterns: Character vector of regular expressions; nodes whose names match any pattern are excluded (default c("^lifted_", "^logProb_")).
make_plots: Logical; if TRUE, generate ggplot objects and optionally save them (default TRUE).
output_dir: Directory where figures/CSVs will be saved; if NULL, nothing is written to disk (default NULL).
save_csv: Logical; if TRUE, write CSV exports (dependencies per node, family tables) (default FALSE).
node_of_interest: Optional character scalar naming a node to highlight or subset (reserved for user logic) (default NULL).
sampler_times: Optional numeric vector of per-sampler times, one entry per sampler in the order returned by nimble::configureMCMC(model)$getSamplers() (default NULL).
sampler_times_unit: Character label for time axis (for example "seconds", "ms") (default "seconds").
auto_profile: Logical; if TRUE and sampler_times is NULL, profile sampler times automatically via a short MCMC run (default TRUE).
profile_niter: Integer; iterations used by the auto-profiler (default 3000L).
profile_burnin: Integer; burn-in iterations for auto-profiler (kept for API compatibility; not used internally) (default 100L).
profile_thin: Integer; thinning for auto-profiler (kept for API compatibility; not used internally) (default 1L).
profile_seed: Optional integer seed for reproducibility (default NULL).
np: Proportion in (0, 1] used elsewhere for "worst sampler" selection (kept for API compatibility) (default 0.1).
by_family: Logical; if TRUE, compute and plot family-level summaries in addition to parameter-level summaries (default TRUE).
family_stat: One of c("median", "mean", "sum"); summary statistic for family-level aggregation (default "median").
time_normalize: One of c("none", "per_node"); if "per_node", divide family time by the number of distinct nodes in the family (default "none").
only_family_plots: Logical; if TRUE, only family-level figures are exported (parameter-level plots are not written) (default FALSE).
...: Additional arguments forwarded to nimble::configureMCMC().
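The interaction between ignore_patterns and removed_nodes can be pictured with a small base R sketch. It is only an approximation of the filtering step under stated assumptions: the node names are toy values, and combining the patterns with "|" and applying them through grepl() is one plausible way to do it, not necessarily what the function does internally.

```r
# Toy node names; in practice these would come from the model.
nodes <- c("alpha", "beta[1]", "beta[2]", "lifted_sqrt_tau", "logProb_y[1]")
ignore_patterns <- c("^lifted_", "^logProb_")
removed_nodes <- c("alpha")

# Assumption: a node is ignored if it matches any of the patterns.
ignored <- grepl(paste(ignore_patterns, collapse = "|"), nodes)

# Keep nodes that are neither ignored by pattern nor removed by name.
kept <- nodes[!ignored & !(nodes %in% removed_nodes)]
print(kept)
```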

Value

A named list containing:

dependencies_df: data.frame of (node, dependency) pairs.
dep_counts: data.frame with per-parameter downstream dependency counts.
samplers_df: data.frame listing samplers and their target nodes (list-column).
per_param_times: data.frame with per-parameter aggregated sampler time.
deps_df: tidy data.frame used for parameter-level dependency plotting.
sampler_df: tidy data.frame used for parameter-level time plotting.
fam_deps_df: family-level dependency summary (one statistic per family).
fam_time_df: family-level time summary (optionally normalized).
plots: list of ggplot objects (some may be NULL): plot_dependencies, plot_sampler_time, plot_combined, plot_dependencies_family, plot_sampler_time_family, plot_combined_family.

Details

Key features:

Robust filtering of nodes (ignored patterns, removal list).
Downstream dependency counts per node and per family.
Per-sampler times aggregated to parameters and families.
Optional auto-profiling of samplers via a short MCMC run.
Backward compatible: the original per-parameter plots are still produced unless only_family_plots = TRUE.
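To make the family-level time summary concrete, the sketch below aggregates toy per-parameter times with family_stat = "median" and then applies time_normalize = "per_node". The data values and column names are invented for illustration, and the order of operations (summarise first, then divide by the number of distinct nodes) is an assumption about how the two options combine.

```r
# Toy per-parameter times in seconds; values are made up.
per_param_times <- data.frame(
  node = c("beta[1]", "beta[2]", "beta[3]", "sigma"),
  time = c(0.2, 0.4, 0.9, 0.1),
  stringsAsFactors = FALSE
)

# Family = base name with the index block replaced by "[]".
per_param_times$family <- sub("\\[.*\\]$", "[]", per_param_times$node)

# family_stat = "median": one summary value per family.
fam_time <- aggregate(time ~ family, data = per_param_times, FUN = median)

# time_normalize = "per_node" (assumed order: summarise, then divide
# by the number of distinct nodes in the family).
n_nodes <- tapply(per_param_times$node, per_param_times$family,
                  function(x) length(unique(x)))
fam_time$time_per_node <- fam_time$time / as.numeric(n_nodes[fam_time$family])
print(fam_time)
```

Swapping FUN = median for mean or sum mimics the other family_stat choices; with "sum", the per-node normalization reduces to the mean time per node.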

Examples

if (FALSE) { # \dontrun{
res <- diagnose_model_structure(
  model             = my_nimble_model,
  make_plots        = TRUE,
  output_dir        = "outputs/diagnostics",
  save_csv          = TRUE,
  by_family         = TRUE,
  family_stat       = "median",
  time_normalize    = "per_node",
  only_family_plots = FALSE
)
} # }