Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5) by AntoineRichard · Pull Request #6198 · isaac-sim/IsaacLab

AntoineRichard · 2026-06-16T07:34:52Z

Description

Part 2 of 5 of the benchmark refactor series — the unified non-RL entry scripts.

Stacked on Part 1 (#6197). The diff against develop below also includes Part 1's core until #6197 merges. For the incremental Part 2 changes only, view:
AntoineRichard/IsaacLab@antoiner/benchmark-core...antoiner/benchmark-runtime-startup

Series: Part 1/5 core (#6197) → Part 2/5 runtime + startup (this PR) → Part 3/5 training (#6199) → Part 4/5 play (#6201) → Part 5/5 cleanup.

This PR is purely additive — it adds the new scripts alongside the existing benchmark_non_rl.py / benchmark_startup.py / run_*.sh, which keep working unchanged. Removal of the legacy scripts is deferred to Part 5/5 so downstream consumers can migrate at their own pace.

Adds:

scripts/benchmarks/runtime.py — steps an environment with random actions (no policy) and emits a RuntimeBundle.
scripts/benchmarks/startup.py — cProfile startup-phase profiling (5 phases), emits a StartupBundle.
scripts/benchmarks/_common.py — shared CLI helpers (get_backend_type(s), preset_tokens, module loader).
Smoke tests for both scripts, gated up front on Isaac Sim availability (a genuine non-zero exit from a script is a hard failure).
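For readers unfamiliar with the random-action pattern, here is a minimal sketch of what "stepping with random actions and timing each step" can look like. `FakeEnv` and `run_random_steps` are hypothetical stand-ins for illustration only; they are not the environment class or the stepping helper added by this PR.

```python
import random
import time


class FakeEnv:
    """Hypothetical stand-in for a simulation environment (not an Isaac Lab class)."""

    def __init__(self, action_dim: int) -> None:
        self.action_dim = action_dim
        self.steps = 0

    def step(self, action: list) -> None:
        # A real environment would advance the simulation here.
        assert len(action) == self.action_dim
        self.steps += 1


def run_random_steps(env: FakeEnv, num_steps: int, seed: int = 0) -> list:
    """Step `env` with uniform random actions; return per-step wall times in seconds."""
    rng = random.Random(seed)
    step_times = []
    for _ in range(num_steps):
        action = [rng.uniform(-1.0, 1.0) for _ in range(env.action_dim)]
        start = time.perf_counter()
        env.step(action)
        step_times.append(time.perf_counter() - start)
    return step_times


env = FakeEnv(action_dim=4)
step_times = run_random_steps(env, num_steps=10)
```

Only the `env.step` call is timed, so the cost of sampling the action does not leak into the per-step measurement.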

Backends are selected with presets= Hydra tokens (same as train.py); the output format is chosen with --benchmark_backend (defaults to schema, accepts a comma-separated list, e.g. schema,omniperf). Existing OmniPerf / JSON / Osmo outputs are unchanged except for the additive peak rows contributed by Part 1's recorders.
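As an illustration of the comma-separated `--benchmark_backend` value, the sketch below splits such a string into an ordered, de-duplicated list and falls back to `schema` when nothing is given. `parse_backend_types` is a hypothetical name and its exact behavior is an assumption; the real helper is `get_backend_types` in `_common.py` and may differ.

```python
def parse_backend_types(raw: str, default: str = "schema") -> list:
    """Split a comma-separated backend string, dropping blanks and duplicates."""
    names = [token.strip() for token in raw.split(",") if token.strip()]
    if not names:
        return [default]
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(names))


backends = parse_backend_types("schema,omniperf")
```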

Validated on develop (Newton/MJWarp): both smoke suites pass, including the multi-backend (schema,omniperf) run that emits two distinct output files.

Fixes # (n/a)

Type of change

New feature (non-breaking change which adds functionality)

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation (testing/benchmarks.rst, migration/comparing_simulation_isaacgym.rst)
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package
I have added my name to the CONTRIBUTORS.md or my name already exists there

Add training.py dispatching over --rl_library {rsl_rl, rl_games, skrl, sb3}; each adapter runs real training under BenchmarkMonitor and emits a TrainingBundle via the shared core, with an optional success-metric early stop. Scripts use develop's launch API (launch_simulation from isaaclab.app; preset tokens forwarded without folding). Remove the legacy benchmark_rsl_rl.py / benchmark_rlgames.py scripts, the run_training_benchmarks.sh runner shell, and the obsolete utils.py helper. Part 3 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Parts 1-2 (isaac-sim#6197, isaac-sim#6198).

greptile-apps · 2026-06-16T12:38:02Z

Greptile Summary

This PR adds unified runtime.py and startup.py benchmark entry scripts that replace the legacy benchmark_non_rl.py, benchmark_startup.py, and their shell-runner wrappers, and introduces the supporting core library modules (builders, capture, metrics, profiling, stepping, backend_descriptor, SchemaBundleFile) along with a comprehensive test suite.

scripts/benchmarks/runtime.py — steps an environment with random actions, records per-frame wall times, and emits a RuntimeBundle JSON; benchmark_non_rl.py and run_non_rl_benchmarks.sh are removed.
scripts/benchmarks/startup.py — wraps each of five startup phases in its own cProfile session and emits a StartupBundle JSON; benchmark_startup.py and run_physx_benchmarks.sh are removed.
source/isaaclab/isaaclab/test/benchmark/ — new pure-stdlib helper modules (builders, capture, metrics, profiling, stepping) and SchemaBundleFile backend; BaseIsaacLabBenchmark is extended to accept multiple comma-separated backends and a typed attach_bundle hook.
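To make "each phase in its own cProfile session" concrete, here is a minimal sketch using only the standard library. `profile_phase`, `fake_phase`, and the phase names are hypothetical; the actual phase boundaries and parsing logic live in `startup.py` and `profiling.py`.

```python
import cProfile
import pstats
from typing import Any, Callable


def profile_phase(fn: Callable[[], Any]) -> tuple:
    """Run `fn` inside its own cProfile session; return (result, pstats.Stats)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = fn()
    finally:
        # Always stop the profiler so a failing phase does not leave it enabled.
        profiler.disable()
    return result, pstats.Stats(profiler)


def fake_phase() -> int:
    """Hypothetical phase body standing in for, e.g., environment creation."""
    return sum(i * i for i in range(10_000))


phase_totals = {}
for name in ("phase_a", "phase_b"):
    value, stats = profile_phase(fake_phase)
    # total_tt is the total time spent in profiled functions, in seconds.
    phase_totals[name] = stats.total_tt
```

Using a fresh `cProfile.Profile()` per phase keeps the statistics of one phase from mixing into the next.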

Confidence Score: 3/5

The core library modules and tests are solid, but two data-correctness bugs in the entry scripts need fixing before the output can be trusted.

In startup.py, both start_utc and end_utc are captured in consecutive calls inside _run_main just before bundle assembly, so duration_s in every StartupBundle is always approximately zero regardless of how long the profiling session actually took. Separately, import_module_from_path in _common.py registers the module in sys.modules before calling exec_module; if exec_module raises, the half-initialized module object is left in the cache and all subsequent calls silently return the broken object rather than re-raising. Both issues produce silent wrong data rather than loud failures, making them easy to miss in testing.

scripts/benchmarks/startup.py (timestamp ordering) and scripts/benchmarks/_common.py (import_module_from_path error path) need the most attention before merge.

Important Files Changed

| Filename | Overview |
|---|---|
| scripts/benchmarks/startup.py | New startup profiling entry point; two bugs: `start_utc`/`end_utc` captured simultaneously (making `duration_s` always ~0), and `env.close()` not called when `env.reset()` raises. |
| scripts/benchmarks/_common.py | New shared CLI helpers; `import_module_from_path` leaves a broken module in `sys.modules` if `exec_module` raises, causing silent failures on retry. |
| scripts/benchmarks/runtime.py | New runtime benchmark entry point; `env.close()` is not inside a `try/finally`, so it is skipped if any exception occurs after `gym.make`. |
| source/isaaclab/isaaclab/test/benchmark/benchmark_core.py | Extended to accept list/comma-separated backends and `attach_bundle`; multi-backend finalization loop and deduplication look correct. |
| source/isaaclab/isaaclab/test/benchmark/backends.py | Added `SchemaBundleFile` backend; lazy `serialize` import is a clean design; no-bundle warning path is correctly handled. |
| source/isaaclab/isaaclab/test/benchmark/builders.py | New pure-assembly functions for RuntimeBundle, TrainingBundle, and StartupBundle; clean, well-documented, no issues. |
| source/isaaclab/isaaclab/test/benchmark/capture.py | New stdlib-only capture helpers; defensive defaulting throughout, clean design. |
| source/isaaclab/isaaclab/test/benchmark/metrics.py | New metric helpers (TensorBoard parsing, EMA, MeanStd, convergence checking, success-rate tracking); no issues found. |
| source/isaaclab/isaaclab/test/benchmark/profiling.py | New cProfile parsing; uses internal `stats.stats` dict (noted as non-public API with a migration comment), whitelist+placeholder logic correct. |
| source/isaaclab/isaaclab/test/benchmark/stepping.py | New lightweight stepping helper; single-agent and multi-agent paths, no issues. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant CLI as CLI / isaaclab.sh
    participant RT as runtime.py / startup.py
    participant Common as _common.py
    participant Core as BaseIsaacLabBenchmark
    participant Backends as MetricsBackend
    participant Schema as SchemaBundleFile
    participant Builders as builders.py
    participant Capture as capture.py

    CLI->>RT: argv
    RT->>Common: get_backend_types(--benchmark_backend)
    RT->>Common: preset_tokens(remaining)
    RT->>Core: "__init__(backend_type=[...])"
    Core->>Backends: get_instance(type) x N
    Backends-->>Core: [SchemaBundleFile, OmniPerfKPIFile, ...]
    RT->>Capture: capture_versions / capture_hardware / capture_resources
    Capture-->>RT: Versions, Hardware, Resources
    RT->>Builders: build_runtime_bundle / build_startup_bundle
    Builders-->>RT: RuntimeBundle / StartupBundle
    RT->>Core: attach_bundle(bundle)
    RT->>Core: _finalize_impl()
    loop for each backend
        Core->>Backends: add_metrics(phase)
        Core->>Backends: "finalize(path, filename_backend, bundle=bundle)"
        alt "backend == schema"
            Schema->>Schema: write_bundle_file(bundle, path)
        else other backend
            Backends->>Backends: write flat KPI JSON
        end
    end
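
The finalize loop in the diagram can be sketched roughly as follows. Every class, method signature, and file name here (`Bundle`, `SchemaBackend`, `FlatKpiBackend`, `finalize_all`) is a hypothetical simplification for illustration; the real `BaseIsaacLabBenchmark`, `MetricsBackend`, and `SchemaBundleFile` interfaces are richer and may differ.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Bundle:
    """Hypothetical stand-in for a typed bundle such as RuntimeBundle."""

    task: str
    mean_step_ms: float


class SchemaBackend:
    """Writes the attached bundle as one JSON document."""

    def finalize(self, out_dir: Path, bundle: Bundle | None) -> Path | None:
        if bundle is None:
            return None  # nothing to write without an attached bundle
        path = out_dir / "bundle_schema.json"
        path.write_text(json.dumps(asdict(bundle)))
        return path


class FlatKpiBackend:
    """Writes flat key/value metrics and ignores the bundle."""

    def __init__(self) -> None:
        self.metrics = {"mean_step_ms": 1.5}

    def finalize(self, out_dir: Path, bundle: Bundle | None) -> Path | None:
        path = out_dir / "kpis_flat.json"
        path.write_text(json.dumps(self.metrics))
        return path


def finalize_all(backends: list, out_dir: Path, bundle: Bundle | None) -> list:
    """Call finalize on every backend and collect the files that were written."""
    written = []
    for backend in backends:
        path = backend.finalize(out_dir, bundle)
        if path is not None:
            written.append(path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    written = finalize_all([SchemaBackend(), FlatKpiBackend()], Path(tmp), Bundle("demo", 1.5))
    written_names = [p.name for p in written]
```

Passing the same bundle to every backend lets the schema backend serialize it while the flat-KPI backends keep their existing output, which matches the two distinct files described for the `schema,omniperf` run.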

Comments Outside Diff (2)

scripts/benchmarks/startup.py, lines 556-560

env.close() not called if env.reset() raises

env is set to None before the first try block. If gym.make succeeds but env.reset() raises, the exception propagates past both try blocks (the second is never entered, so its finally: if env is not None: env.close() never runs). The live environment is leaked. Wrapping both the make+reset call and the subsequent profiling in a single outer try/finally that calls env.close() unconditionally would fix this.

scripts/benchmarks/runtime.py, lines 265-326

env.close() not in a try/finally — leaked on exception

env.close() is the last statement inside the with launch_simulation block. Any exception raised between gym.make and env.close() (e.g., inside stepping.run_runtime_loop, builders.build_runtime, or capture.*) skips the close call. The environment and its simulation resources are not released. Wrapping the block in try/finally ensures cleanup regardless of the exit path.
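
A minimal sketch of the cleanup pattern suggested in both comments above, assuming a single outer `try/finally`. `FakeEnv` and `run_benchmark` are hypothetical stand-ins, not the scripts' actual code; the fake `reset()` always raises so the failure path is exercised.

```python
class FakeEnv:
    """Hypothetical environment whose reset() always fails."""

    def __init__(self) -> None:
        self.closed = False

    def reset(self) -> None:
        raise RuntimeError("reset failed")

    def close(self) -> None:
        self.closed = True


def run_benchmark(make_env) -> None:
    env = None
    try:
        env = make_env()
        env.reset()
        # Stepping, profiling, and bundle assembly would go here.
    finally:
        # Runs on success and on any exception; the None check covers
        # the case where make_env() itself raised.
        if env is not None:
            env.close()


env = FakeEnv()
try:
    run_benchmark(lambda: env)
except RuntimeError:
    pass  # the original error still propagates to the caller
```

The exception is not swallowed: `finally` only guarantees that `close()` runs before the error reaches the caller.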

Reviews (1): Last reviewed commit: "Add unified runtime and startup benchmar..."

greptile-apps · 2026-06-16T12:38:06Z

+        start_utc = capture.now_utc_iso()
+        end_utc = capture.now_utc_iso()


start_utc / end_utc captured at the same instant — duration_s always ≈ 0

Both timestamps are obtained in consecutive calls inside _run_main, just before bundle assembly. build_run_identity derives duration_s from end_utc - start_utc, so it will always be microseconds even though the actual run lasted many seconds. Compare runtime.py, which captures start_utc before launch_simulation and end_utc after the stepping loop.

start_utc should be captured before the first profiled phase — either as a module-level variable (alongside _imports_time_begin) or passed into _run_main as a parameter so the elapsed wall time of the full profiling session is reflected in the bundle.
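
A small sketch of the intended ordering, assuming the duration is derived from two ISO-8601 UTC timestamps. The local `now_utc_iso` and `duration_s` functions are hypothetical stand-ins for `capture.now_utc_iso` and the computation inside `build_run_identity`, whose real implementations are not shown here.

```python
import time
from datetime import datetime, timezone


def now_utc_iso() -> str:
    """Return the current UTC time as an ISO-8601 string."""
    return datetime.now(timezone.utc).isoformat()


def duration_s(start_utc: str, end_utc: str) -> float:
    """Seconds elapsed between two ISO-8601 timestamps."""
    delta = datetime.fromisoformat(end_utc) - datetime.fromisoformat(start_utc)
    return delta.total_seconds()


start_utc = now_utc_iso()  # captured before the first profiled phase
time.sleep(0.05)           # stand-in for the profiled phases
end_utc = now_utc_iso()    # captured after the last phase
elapsed = duration_s(start_utc, end_utc)
```

With both calls placed back to back, `elapsed` would be on the order of microseconds, which is the symptom described above.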

greptile-apps · 2026-06-16T12:38:07Z

+    module = importlib.util.module_from_spec(spec)
+    sys.modules[module_name] = module
+    spec.loader.exec_module(module)
+    return module


Incomplete module cached in sys.modules if exec_module raises

The module is registered in sys.modules before exec_module executes it. If exec_module raises (e.g., a syntax error or import error in the target file), the half-initialized module object remains in the cache. A subsequent call with the same module_name hits the early-return path and silently returns the broken module instead of retrying or surfacing the original failure.

Suggested change

-    module = importlib.util.module_from_spec(spec)
-    sys.modules[module_name] = module
-    spec.loader.exec_module(module)
-    return module
+    module = importlib.util.module_from_spec(spec)
+    sys.modules[module_name] = module
+    try:
+        spec.loader.exec_module(module)
+    except Exception:
+        sys.modules.pop(module_name, None)
+        raise
+    return module
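
For context, a self-contained sketch of a loader with the suggested cleanup, exercised against a deliberately broken file. The function signature and the early-return cache check are assumptions inferred from the comment above, not a copy of `_common.py`.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def import_module_from_path(module_name: str, path: Path) -> ModuleType:
    """Load a module from a file path, evicting it from the cache on failure."""
    if module_name in sys.modules:
        return sys.modules[module_name]  # assumed early-return path
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        # Drop the half-initialized module so a retry re-executes the file.
        sys.modules.pop(module_name, None)
        raise
    return module


with tempfile.TemporaryDirectory() as tmp:
    broken = Path(tmp) / "broken_mod.py"
    broken.write_text("raise RuntimeError('boom')\n")
    try:
        import_module_from_path("broken_mod_demo", broken)
        first_call_raised = False
    except RuntimeError:
        first_call_raised = True
    cached_after_failure = "broken_mod_demo" in sys.modules
```

After the failed import, the name is absent from `sys.modules`, so a second call would re-run the file and surface the error again instead of returning a broken module.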

Introduce the capture, metrics, builders, stepping, profiling, and backend_descriptor submodules for assembling the schema-v1 benchmark bundles, add a schema output backend, and let BaseIsaacLabBenchmark emit several backends in one run via a new attach_bundle hook. Unit tests cover each submodule plus the schema backend and multi-backend finalize. Part 1 of a series splitting the oversized benchmark refactor (core -> runtime/startup -> training -> play).

Add backend-agnostic runtime.py (random-action stepping, emits a RuntimeBundle) and startup.py (cProfile startup-phase profiling, emits a StartupBundle), wired to develop's launch API (launch_simulation and add_launcher_args from isaaclab.app; preset tokens forwarded to Hydra without folding). Remove the legacy benchmark_non_rl.py and benchmark_startup.py scripts plus the run_non_rl_benchmarks.sh and run_physx_benchmarks.sh runner shells; repoint benchmark_hydra_resolve at _common.get_backend_type. Part 2 of the benchmark refactor series (core -> runtime/startup -> training -> play); stacked on Part 1 (isaac-sim#6197).

github-actions Bot added the documentation and isaac-lab labels Jun 16, 2026

This was referenced Jun 16, 2026

Add unified training benchmark dispatcher and RL adapters (benchmark refactor, Part 3/5) #6199

Open

Add play (inference) benchmark with PlayBundle (benchmark refactor, Part 4/5) #6201

Open

AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from e1f4ef7 to 935b759 on June 16, 2026 12:22

AntoineRichard marked this pull request as ready for review June 16, 2026 12:33

AntoineRichard requested review from Mayankm96, jtigue-bdai, kellyguo11 and ooctipus as code owners June 16, 2026 12:33

greptile-apps Bot reviewed Jun 16, 2026

AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from 935b759 to c248c6a on June 17, 2026 07:43

AntoineRichard force-pushed the antoiner/benchmark-runtime-startup branch from c248c6a to b50c1eb on June 17, 2026 08:53

This was referenced Jun 17, 2026

Remove legacy benchmark scripts (benchmark refactor, Part 5/5) #6206

Open

Add backend-agnostic benchmark core (benchmark refactor, Part 1/5) #6197

Open

AntoineRichard changed the title ~~Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/4)~~ Add unified runtime and startup benchmark scripts (benchmark refactor, Part 2/5) Jun 17, 2026

AntoineRichard wants to merge 2 commits into isaac-sim:develop from AntoineRichard:antoiner/benchmark-runtime-startup
