
Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5)#6198

Open
AntoineRichard wants to merge 2 commits into isaac-sim:develop from AntoineRichard:antoiner/benchmark-runtime-startup


@AntoineRichard commented Jun 16, 2026

Collaborator

Description

Part 2 of 5 of the benchmark refactor series — the unified non-RL entry scripts.

Stacked on Part 1 (#6197). The diff against develop below also includes Part 1's core until #6197 merges. For the incremental Part 2 changes only, view:
AntoineRichard/IsaacLab@antoiner/benchmark-core...antoiner/benchmark-runtime-startup

Series: Part 1/5 core (#6197) → Part 2/5 runtime + startup (this PR) → Part 3/5 training (#6199) → Part 4/5 play (#6201) → Part 5/5 cleanup.

This PR is purely additive — it adds the new scripts alongside the existing benchmark_non_rl.py / benchmark_startup.py / run_*.sh, which keep working unchanged. Removal of the legacy scripts is deferred to Part 5/5 so downstream consumers can migrate at their own pace.

Adds:

  • scripts/benchmarks/runtime.py: steps an environment with random actions (no policy) and emits a RuntimeBundle.
  • scripts/benchmarks/startup.py: cProfile-based profiling of the five startup phases; emits a StartupBundle.
  • scripts/benchmarks/_common.py: shared CLI helpers (get_backend_type / get_backend_types, preset_tokens, and a module loader).
  • Smoke tests for both scripts, gated up front on Isaac Sim availability so they only run when Isaac Sim is present; when they do run, a genuine non-zero exit is a hard failure.
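To make "steps an environment with random actions and records per-frame wall times" concrete, here is a minimal sketch. StubEnv and run_random_action_loop are hypothetical names, not the code in runtime.py; the stub only mimics the gymnasium reset/step/close shape, whereas real Isaac Lab environments are batched and use torch tensors.

```python
import random
import statistics
import time


class StubEnv:
    """Hypothetical stand-in for a gymnasium-style env (reset/step/close)."""

    def __init__(self, action_dim: int = 4):
        self.action_dim = action_dim

    def reset(self):
        return [0.0] * self.action_dim, {}

    def step(self, action):
        obs = [a * 0.5 for a in action]
        # (obs, reward, terminated, truncated, info), as in the gymnasium API
        return obs, 0.0, False, False, {}

    def close(self):
        pass


def run_random_action_loop(env, num_steps: int, seed: int = 0) -> list[float]:
    """Step env with uniform random actions (no policy) and return per-step wall times in seconds."""
    rng = random.Random(seed)
    env.reset()
    step_times = []
    for _ in range(num_steps):
        action = [rng.uniform(-1.0, 1.0) for _ in range(env.action_dim)]
        t0 = time.perf_counter()  # time only the step, not action sampling
        env.step(action)
        step_times.append(time.perf_counter() - t0)
    return step_times


env = StubEnv()
try:
    times = run_random_action_loop(env, num_steps=100)
finally:
    env.close()
print(f"steps={len(times)} mean={statistics.mean(times) * 1e6:.2f} us max={max(times) * 1e6:.2f} us")
```

Timing only the step call keeps random-action sampling out of the per-frame numbers.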

Simulation backends are selected with presets= Hydra tokens (as in train.py); the output format is chosen with --benchmark_backend, which defaults to schema and accepts a comma-separated list (e.g. schema,omniperf). Existing OmniPerf, JSON, and Osmo outputs are unchanged apart from the additional peak rows contributed by Part 1's recorders.
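A comma-separated --benchmark_backend value could be split into an ordered, de-duplicated list as sketched below, with unrecognized tokens such as presets=... left over for Hydra. This is an illustration only: parse_backend_list, the KNOWN_BACKENDS names, and the presets=example token are hypothetical and are not the actual get_backend_types implementation.

```python
import argparse

# Hypothetical output-backend names; the real registry lives in Part 1's core.
KNOWN_BACKENDS = ("schema", "omniperf", "json", "osmo")


def parse_backend_list(spec: str) -> list[str]:
    """Split 'schema,omniperf' into ['schema', 'omniperf'], dropping blanks and duplicates."""
    backends = []
    for token in spec.split(","):
        name = token.strip().lower()
        if not name:
            continue
        if name not in KNOWN_BACKENDS:
            raise ValueError(f"unknown benchmark backend: {name!r}")
        if name not in backends:  # keep first occurrence, preserve order
            backends.append(name)
    if not backends:
        raise ValueError("at least one benchmark backend is required")
    return backends


parser = argparse.ArgumentParser()
parser.add_argument("--benchmark_backend", default="schema")
# parse_known_args leaves Hydra-style tokens (hypothetical 'presets=example') in `remaining`.
args, remaining = parser.parse_known_args(["--benchmark_backend", "schema,omniperf", "presets=example"])
print(parse_backend_list(args.benchmark_backend))  # ['schema', 'omniperf']
print(remaining)  # ['presets=example']
```

Preserving order and dropping duplicates means a repeated name never produces two identical output files.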

Validated on develop (Newton/MJWarp): both smoke suites pass, including the multi-backend (schema,omniperf) run that emits two distinct output files.

Fixes # (n/a)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation (testing/benchmarks.rst, migration/comparing_simulation_isaacgym.rst)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels Jun 16, 2026
@AntoineRichard AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from e1f4ef7 to 935b759 on June 16, 2026 12:22
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 16, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
@AntoineRichard AntoineRichard marked this pull request as ready for review June 16, 2026 12:33

greptile-apps Bot commented Jun 16, 2026

Contributor

Greptile Summary

This PR adds unified runtime.py and startup.py benchmark entry scripts that replace the legacy benchmark_non_rl.py, benchmark_startup.py, and their shell-runner wrappers, and introduces the supporting core library modules (builders, capture, metrics, profiling, stepping, backend_descriptor, SchemaBundleFile) along with a comprehensive test suite.

  • scripts/benchmarks/runtime.py — steps an environment with random actions, records per-frame wall times, and emits a RuntimeBundle JSON; benchmark_non_rl.py and run_non_rl_benchmarks.sh are removed.
  • scripts/benchmarks/startup.py — wraps each of five startup phases in its own cProfile session and emits a StartupBundle JSON; benchmark_startup.py and run_physx_benchmarks.sh are removed.
  • source/isaaclab/isaaclab/test/benchmark/ — new pure-stdlib helper modules (builders, capture, metrics, profiling, stepping) and SchemaBundleFile backend; BaseIsaacLabBenchmark is extended to accept multiple comma-separated backends and a typed attach_bundle hook.

Confidence Score: 3/5

The core library modules and tests are solid, but two data-correctness bugs in the entry scripts need fixing before the output can be trusted.

In startup.py, both start_utc and end_utc are captured in consecutive calls inside _run_main just before bundle assembly, so duration_s in every StartupBundle is always approximately zero regardless of how long the profiling session actually took. Separately, import_module_from_path in _common.py registers the module in sys.modules before calling exec_module; if exec_module raises, the half-initialized module object is left in the cache and all subsequent calls silently return the broken object rather than re-raising. Both issues produce silent wrong data rather than loud failures, making them easy to miss in testing.

scripts/benchmarks/startup.py (timestamp ordering) and scripts/benchmarks/_common.py (import_module_from_path error path) need the most attention before merge.

Important Files Changed

Filename | Overview
--- | ---
scripts/benchmarks/startup.py | New startup profiling entry point; two bugs: start_utc/end_utc captured simultaneously (making duration_s always ~0), and env.close() not called when env.reset() raises.
scripts/benchmarks/_common.py | New shared CLI helpers; import_module_from_path leaves a broken module in sys.modules if exec_module raises, causing silent failures on retry.
scripts/benchmarks/runtime.py | New runtime benchmark entry point; env.close() is not inside a try/finally, so it is skipped if any exception occurs after gym.make.
source/isaaclab/isaaclab/test/benchmark/benchmark_core.py | Extended to accept list/comma-separated backends and attach_bundle; multi-backend finalization loop and deduplication look correct.
source/isaaclab/isaaclab/test/benchmark/backends.py | Added SchemaBundleFile backend; lazy serialize import is a clean design; no-bundle warning path is correctly handled.
source/isaaclab/isaaclab/test/benchmark/builders.py | New pure-assembly functions for RuntimeBundle, TrainingBundle, and StartupBundle; clean, well-documented, no issues.
source/isaaclab/isaaclab/test/benchmark/capture.py | New stdlib-only capture helpers; defensive defaulting throughout, clean design.
source/isaaclab/isaaclab/test/benchmark/metrics.py | New metric helpers (TensorBoard parsing, EMA, MeanStd, convergence checking, success-rate tracking); no issues found.
source/isaaclab/isaaclab/test/benchmark/profiling.py | New cProfile parsing; uses internal stats.stats dict (noted as non-public API with a migration comment), whitelist+placeholder logic correct.
source/isaaclab/isaaclab/test/benchmark/stepping.py | New lightweight stepping helper; single-agent and multi-agent paths, no issues.
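The summary describes startup.py as wrapping each startup phase in its own cProfile session, and profiling.py as reading the non-public stats.stats dict. A rough sketch of that pattern follows; profile_phases, top_functions, and the two phase names are hypothetical and do not reproduce the five real phases or the whitelist/placeholder logic.

```python
import cProfile
import pstats
import time


def profile_phases(phases):
    """Run each (name, callable) under its own cProfile session; return {name: (wall_s, Stats)}."""
    results = {}
    for name, fn in phases:
        profiler = cProfile.Profile()  # fresh session per phase, so phases never mix
        t0 = time.perf_counter()
        profiler.enable()
        try:
            fn()
        finally:
            profiler.disable()
        results[name] = (time.perf_counter() - t0, pstats.Stats(profiler))
    return results


def top_functions(stats: pstats.Stats, n: int = 3) -> list[str]:
    # stats.stats is the internal {(file, line, func): (cc, nc, tt, ct, callers)} dict,
    # i.e. the same non-public attribute the review notes profiling.py relies on.
    rows = [(ct, func) for (_, _, func), (_, _, _, ct, _) in stats.stats.items()]
    return [func for _, func in sorted(rows, reverse=True)[:n]]


def fake_imports():  # hypothetical stand-in for a heavy import phase
    sum(i * i for i in range(50_000))


def fake_env_creation():  # hypothetical stand-in for environment construction
    sorted(range(100_000), key=lambda x: -x)


results = profile_phases([("imports", fake_imports), ("env_creation", fake_env_creation)])
for name, (wall, stats) in results.items():
    print(f"{name}: {wall:.4f} s, top by cumulative time: {top_functions(stats)}")
```

Because stats.stats is not public API, a future Python release could change its shape; the per-phase wall time from perf_counter does not depend on it.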

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as CLI / isaaclab.sh
    participant RT as runtime.py / startup.py
    participant Common as _common.py
    participant Core as BaseIsaacLabBenchmark
    participant Backends as MetricsBackend
    participant Schema as SchemaBundleFile
    participant Builders as builders.py
    participant Capture as capture.py

    CLI->>RT: argv
    RT->>Common: get_backend_types(--benchmark_backend)
    RT->>Common: preset_tokens(remaining)
    RT->>Core: "__init__(backend_type=[...])"
    Core->>Backends: get_instance(type) x N
    Backends-->>Core: [SchemaBundleFile, OmniPerfKPIFile, ...]
    RT->>Capture: capture_versions / capture_hardware / capture_resources
    Capture-->>RT: Versions, Hardware, Resources
    RT->>Builders: build_runtime_bundle / build_startup_bundle
    Builders-->>RT: RuntimeBundle / StartupBundle
    RT->>Core: attach_bundle(bundle)
    RT->>Core: _finalize_impl()
    loop for each backend
        Core->>Backends: add_metrics(phase)
        Core->>Backends: "finalize(path, filename_backend, bundle=bundle)"
        alt "backend == schema"
            Schema->>Schema: write_bundle_file(bundle, path)
        else other backend
            Backends->>Backends: write flat KPI JSON
        end
    end
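The finalize loop in the diagram can be read as follows: every backend receives the same phase metrics, but only the schema backend serializes the attached bundle, while the others write flat KPI JSON. The sketch below illustrates that dispatch with hypothetical classes (FlatKpiBackend, SchemaBackend, finalize_all) and a plain dict standing in for a RuntimeBundle; it is not the BaseIsaacLabBenchmark or SchemaBundleFile code.

```python
import json
import tempfile
from pathlib import Path


class FlatKpiBackend:
    """Hypothetical flat-KPI writer standing in for OmniPerf/JSON-style outputs."""

    name = "json"

    def __init__(self):
        self.metrics = {}

    def add_metrics(self, phase: str, values: dict):
        self.metrics.update({f"{phase}/{k}": v for k, v in values.items()})

    def finalize(self, out_dir: Path, bundle=None):
        path = out_dir / f"kpis_{self.name}.json"
        path.write_text(json.dumps(self.metrics, indent=2))
        return path


class SchemaBackend(FlatKpiBackend):
    """Hypothetical schema writer: serializes the attached bundle instead of flat rows."""

    name = "schema"

    def finalize(self, out_dir: Path, bundle=None):
        if bundle is None:
            print("warning: schema backend has no bundle attached; nothing written")
            return None
        path = out_dir / "bundle_schema.json"
        path.write_text(json.dumps(bundle, indent=2))
        return path


def finalize_all(backends, phases: dict, out_dir: Path, bundle=None) -> list[Path]:
    """Feed every backend the same metrics, then let each finalize with the shared bundle."""
    written = []
    for backend in backends:
        for phase, values in phases.items():
            backend.add_metrics(phase, values)
        path = backend.finalize(out_dir, bundle=bundle)
        if path is not None:
            written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    files = finalize_all(
        [SchemaBackend(), FlatKpiBackend()],
        phases={"runtime": {"mean_step_ms": 1.5}},
        out_dir=Path(tmp),
        bundle={"kind": "RuntimeBundle", "steps": 100},  # dict stand-in for a real bundle
    )
    print(sorted(p.name for p in files))  # ['bundle_schema.json', 'kpis_json.json']
```

Passing the bundle as a keyword to every finalize call lets flat backends ignore it, which is how a single run can emit two distinct output files.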

Comments Outside Diff (2)

  1. scripts/benchmarks/startup.py, lines 556-560

    P2 env.close() not called if env.reset() raises

    env is set to None before the first try block. If gym.make succeeds but env.reset() raises, the exception propagates past both try blocks: the second try is never entered, so its finally clause (if env is not None: env.close()) never runs, and the live environment is leaked. Wrapping both the make+reset call and the subsequent profiling in a single outer try/finally that calls env.close() whenever env is not None would fix this.
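A minimal sketch of the suggested single outer try/finally, assuming a hypothetical FlakyEnv whose reset() always fails; make_env stands in for gym.make, and the profiled phases are elided. The env is not None guard also covers the case where creation itself raises.

```python
class FlakyEnv:
    """Hypothetical env whose reset() fails, to show that close() still runs."""

    def __init__(self):
        self.closed = False

    def reset(self):
        raise RuntimeError("reset failed")

    def close(self):
        self.closed = True


def profile_startup(make_env):
    env = None
    try:
        env = make_env()  # stands in for gym.make(...)
        env.reset()
        # ... profiled startup phases would run here ...
    finally:
        # Runs whether creation, reset, or profiling raised; skipped only if creation failed.
        if env is not None:
            env.close()


created = []


def make_env():
    env = FlakyEnv()
    created.append(env)
    return env


try:
    profile_startup(make_env)
except RuntimeError as exc:
    print(f"propagated: {exc}")  # propagated: reset failed
print(created[0].closed)  # True
```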

  2. scripts/benchmarks/runtime.py, lines 265-326

    P2 env.close() not in a try/finally — leaked on exception

    env.close() is the last statement inside the with launch_simulation block. Any exception raised between gym.make and env.close() (e.g., inside stepping.run_runtime_loop, builders.build_runtime, or capture.*) skips the close call. The environment and its simulation resources are not released. Wrapping the block in try/finally ensures cleanup regardless of the exit path.
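Besides an explicit try/finally, contextlib.closing gives the same guarantee and nests naturally inside an existing with launch_simulation block, so the env is closed before the simulation context exits. The sketch below uses a hypothetical TrackedEnv whose step() fails on the first call.

```python
import contextlib


class TrackedEnv:
    """Hypothetical env stand-in with a gymnasium-style close()."""

    def __init__(self):
        self.closed = False

    def step(self, action):
        raise RuntimeError("simulated failure inside the stepping loop")

    def close(self):
        self.closed = True


env = TrackedEnv()
try:
    # contextlib.closing() calls env.close() on exit, including when the body raises.
    with contextlib.closing(env):
        for _ in range(10):
            env.step(action=None)
except RuntimeError:
    pass
print(env.closed)  # True
```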

Reviews (1): Last reviewed commit: "Add unified runtime and startup benchmar..."

Comment on lines +288 to +289
start_utc = capture.now_utc_iso()
end_utc = capture.now_utc_iso()


P1 start_utc / end_utc captured at the same instant — duration_s always ≈ 0

Both timestamps are obtained in consecutive calls inside _run_main, just before bundle assembly. build_run_identity derives duration_s from end_utc - start_utc, so the result is always on the order of microseconds, even though the actual run lasts many seconds. Compare runtime.py, which captures start_utc before launch_simulation and end_utc after the stepping loop.

start_utc should be captured before the first profiled phase — either as a module-level variable (alongside _imports_time_begin) or passed into _run_main as a parameter so the elapsed wall time of the full profiling session is reflected in the bundle.
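A sketch of the suggested ordering, assuming capture.now_utc_iso() returns an ISO-8601 UTC string; the local now_utc_iso, duration_s, and run_main below are hypothetical stand-ins, and time.sleep replaces the profiled phases. start_utc is taken once at module level and passed in, and end_utc only after the last phase.

```python
import time
from datetime import datetime, timezone


def now_utc_iso() -> str:
    # Hypothetical stand-in for capture.now_utc_iso(); assumed to return ISO-8601 UTC.
    return datetime.now(timezone.utc).isoformat()


def duration_s(start_utc: str, end_utc: str) -> float:
    # Mirrors the end_utc - start_utc derivation the review attributes to build_run_identity.
    return (datetime.fromisoformat(end_utc) - datetime.fromisoformat(start_utc)).total_seconds()


# Captured once, before the first profiled phase (e.g. next to _imports_time_begin).
START_UTC = now_utc_iso()


def run_main(start_utc: str) -> dict:
    time.sleep(0.2)  # stands in for the profiled startup phases
    end_utc = now_utc_iso()  # captured only after the last phase finishes
    return {"start_utc": start_utc, "end_utc": end_utc, "duration_s": duration_s(start_utc, end_utc)}


identity = run_main(START_UTC)
print(f"duration_s = {identity['duration_s']:.2f}")
```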

Comment on lines +108 to +111
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module


P1 Incomplete module cached in sys.modules if exec_module raises

The module is registered in sys.modules before exec_module executes it. If exec_module raises (e.g., a syntax error or import error in the target file), the half-initialized module object remains in the cache. A subsequent call with the same module_name hits the early-return path and silently returns the broken module instead of retrying or surfacing the original failure.

Suggested change:
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
try:
    spec.loader.exec_module(module)
except Exception:
    # Evict the half-initialized module so a retry re-raises instead of returning it.
    sys.modules.pop(module_name, None)
    raise
return module
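The effect of the suggested change can be checked with a file that fails on import: with the eviction, both attempts raise; without it, the second call would hit the cache and return the broken module. The signature import_module_from_path(module_name, file_path) is an assumption, since the review only shows part of the helper's body.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def import_module_from_path(module_name: str, file_path: str):
    # Hypothetical signature; the body follows the suggested change above.
    if module_name in sys.modules:  # the early-return cache path
        return sys.modules[module_name]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        sys.modules.pop(module_name, None)
        raise
    return module


with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "broken_bench_task.py"
    broken.write_text("raise ImportError('missing dependency')\n")
    failures = 0
    for _ in range(2):
        try:
            import_module_from_path("broken_bench_task", str(broken))
        except ImportError:
            failures += 1
    print(failures, "broken_bench_task" in sys.modules)  # 2 False
```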

Introduce the capture, metrics, builders, stepping, profiling, and
backend_descriptor submodules for assembling the schema-v1 benchmark
bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit
several backends in one run via a new attach_bundle hook. Unit tests
cover each submodule plus the schema backend and multi-backend finalize.

Part 1 of a series splitting the oversized benchmark refactor
(core -> runtime/startup -> training -> play).
@AntoineRichard AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from 935b759 to c248c6a on June 17, 2026 07:43
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

Add backend-agnostic runtime.py (random-action stepping, emits a
RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a
StartupBundle), wired to develop's launch API (launch_simulation and
add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra
without folding). Remove the legacy benchmark_non_rl.py and
benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and
run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve
at _common.get_backend_type.

Part 2 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Part 1 (isaac-sim#6197).
@AntoineRichard AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from c248c6a to b50c1eb on June 17, 2026 08:53
AntoineRichard added a commit to AntoineRichard/IsaacLab that referenced this pull request Jun 17, 2026
Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl,
sb3}; each adapter runs real training under BenchmarkMonitor and emits a
TrainingBundle via the shared core, with an optional success-metric early
stop. Scripts use develop's launch API (launch_simulation from
isaaclab.app; preset tokens forwarded without folding). Remove the legacy
benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the
run_training_benchmarks.sh runner shell, and the obsolete utils.py helper.

Part 3 of the benchmark refactor series (core -> runtime/startup ->
training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).
@AntoineRichard AntoineRichard changed the title from "Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/4)" to "Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5)" on Jun 17, 2026