Skip to content

feat: add FabricStageCache service for shared hierarchy handles#5676

Draft
pv-nvidia wants to merge 1 commit into
isaac-sim:developfrom
pv-nvidia:pv/fabric-stage-cache
Draft

feat: add FabricStageCache service for shared hierarchy handles#5676
pv-nvidia wants to merge 1 commit into
isaac-sim:developfrom
pv-nvidia:pv/fabric-stage-cache

Conversation

@pv-nvidia
Copy link
Copy Markdown
Contributor

@pv-nvidia pv-nvidia commented May 18, 2026

Motivation

FabricFrameView currently creates its own usdrt.Usd.Stage.Attach() and IFabricHierarchy handle on every initialization. When multiple views exist (common in multi-asset environments), this duplicates work and creates independent hierarchy handles that don't share change-tracking state.

Solution

Introduce FabricStageCache — a service registered on SimulationContext via the typed service locator (#5672, already merged). It caches:

  1. The usdrt.Usd.Stage attachment (already synchronized to Fabric)
  2. IFabricHierarchy handles keyed by Fabric attachment ID

Multiple FabricFrameView instances now share a single hierarchy handle. The cache is automatically closed when SimulationContext.clear_instance() is called.

Changes

  • FabricStageCache: new class in isaaclab_physx.sim
  • FabricFrameView._initialize_fabric(): refactored to use FabricStageCache via SimulationContext.get_service()/set_service()
  • Removed: direct usdrt.Usd.Stage.Attach() and IFabricHierarchy creation from FabricFrameView

Tests

  • 7 new unit tests for FabricStageCache lifecycle (register, retrieve, cache, close, clear_instance, replacement)

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label May 18, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR introduces a well-designed caching mechanism for Fabric stage attachments and hierarchy handles. The service locator pattern with SimulationContext.services is clean and type-safe.

✅ Strengths

  • Clean architecture: The FabricStageCache service with automatic cleanup on clear_instance() is well-designed
  • Good documentation: Docstrings are thorough with cross-references to related classes
  • Better error handling: Replacing assert with RuntimeError for the device check is correct (not stripped under python -O)
  • Future-proofing: The dict-keyed hierarchy cache anticipates multi-GPU scenarios
  • Comprehensive tests: 7 unit tests covering service lifecycle (register, retrieve, cache, close, clear_instance, replacement)

📁 Files Changed

File Changes
fabric_stage_cache.py +68 lines — new cache class with usdrt.Usd.Stage attachment and IFabricHierarchy handle caching
fabric_frame_view.py +21/-8 lines — refactored to use FabricStageCache via SimulationContext.services
test_fabric_stage_cache.py +101 lines — comprehensive service lifecycle tests
fabric-stage-cache.rst +10 lines — changelog fragment

✅ CI Status

  • pre-commit: ✅ Passing
  • changelog-check: ✅ Passing
  • Build Wheel: ✅ Passing
  • license-check: ⏳ Pending
  • Installation Tests: ⏳ Pending
  • Docs Build: ⏳ Pending

💡 Suggestions (Non-blocking)

Defensive check in get_hierarchy()

If close() is called and then get_hierarchy() is invoked (e.g., due to a stale reference), it will crash with AttributeError on self._stage.GetFabricId(). Consider:

def get_hierarchy(self):
    if self._stage is None:
        raise RuntimeError("FabricStageCache has been closed")
    # ... rest of method

This provides a clearer error message for debugging stale reference issues.

Consider an integration test for multi-view sharing

The unit tests verify the service lifecycle well. An integration test confirming that two FabricFrameView instances share the same hierarchy handle would strengthen confidence:

def test_multiple_frame_views_share_hierarchy(self):
    """Multiple FabricFrameView instances share the same hierarchy handle."""
    view1 = FabricFrameView(prim_paths=["/World/robot1"])
    view2 = FabricFrameView(prim_paths=["/World/robot2"])
    
    assert view1._fabric_hierarchy is view2._fabric_hierarchy

Dependencies

  • Requires #5672 (service locator on SimulationContext) to be merged first

Summary

Clean, focused PR with good architecture and test coverage. Assuming CI tests pass and #5672 merges first, this is ready to go. 👍


Automated review by Isaac Lab Review Bot — updated for SHA ae83c65

@pv-nvidia pv-nvidia force-pushed the pv/fabric-stage-cache branch from 3875437 to 068e120 Compare May 18, 2026 13:09
@pv-nvidia pv-nvidia force-pushed the pv/fabric-stage-cache branch 3 times, most recently from 3ad6108 to 575be92 Compare May 18, 2026 17:10
pv-nvidia added a commit that referenced this pull request May 20, 2026
## Motivation

`SimulationContext` is the natural lifecycle owner for backend-specific
caches (e.g. UsdRT stage handles, Fabric hierarchy data). Currently
these either live as class-level globals (no lifecycle, leak across
stages) or get baked directly into SimulationContext (pollutes it with
backend imports).

## Solution

Add a lightweight typed `ServiceLocator` exposed via
`SimulationContext.services`. Backends register their own singletons
using subscript syntax:

```python
sim.services[FabricStageCache] = FabricStageCache(stage)
cache = sim.services[FabricStageCache]
del sim.services[FabricStageCache]  # closes and removes
```

All registered services are closed when `clear_instance()` is called.
Exceptions during close are collected and raised after full teardown
completes.

## Design

- Keyed by service class (typed retrieval)
- Services with a `close()` method are automatically closed on deletion
or teardown
- `close_all(caught_exceptions)` always collects — no silent failures
- Purely additive — no existing behavior changes

## Downstream

This is used by the Fabric stage cache PR (#5676) to manage
`IFabricHierarchy` handles per stage.
@pv-nvidia pv-nvidia force-pushed the pv/fabric-stage-cache branch 3 times, most recently from ae83c65 to becd3ff Compare May 20, 2026 15:48
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary — Updated for SHA becd3ff

This PR has been rebased onto develop and the implementation remains well-designed. The FabricStageCache service properly manages shared hierarchy handles across multiple FabricFrameView instances.

✅ Strengths

  • Clean architecture: The FabricStageCache service with automatic cleanup on clear_instance() is well-designed
  • Good documentation: Docstrings are thorough with cross-references to related classes
  • Better error handling: Replacing assert with RuntimeError for the device check is correct (not stripped under python -O)
  • Future-proofing: The dict-keyed hierarchy cache anticipates multi-GPU scenarios
  • Comprehensive tests: 7 unit tests covering service lifecycle (register, retrieve, cache, close, clear_instance, replacement)

📁 Files Changed

File Changes
fabric_stage_cache.py +67 lines — new cache class with usdrt.Usd.Stage attachment and IFabricHierarchy handle caching
fabric_frame_view.py +16/-6 lines — refactored to use FabricStageCache via SimulationContext.services
test_fabric_stage_cache.py +101 lines — comprehensive service lifecycle tests
fabric-stage-cache.rst +10 lines — changelog fragment

✅ CI Status

  • pre-commit: ✅ Passing
  • changelog-check: ✅ Passing
  • Build Wheel: ✅ Passing
  • license-check: ⏳ Pending
  • Installation Tests: ⏳ Pending
  • Docs Build: ⏳ Pending

💡 Suggestions (Non-blocking)

Defensive check in get_hierarchy()

If close() is called and then get_hierarchy() is invoked (e.g., due to a stale reference), it will crash with AttributeError on self._stage.GetFabricId(). Consider:

def get_hierarchy(self):
    if self._stage is None:
        raise RuntimeError("FabricStageCache has been closed")
    # ... rest of method

This provides a clearer error message for debugging stale reference issues.

Consider an integration test for multi-view sharing

The unit tests verify the service lifecycle well. An integration test confirming that two FabricFrameView instances share the same hierarchy handle would strengthen confidence:

def test_multiple_frame_views_share_hierarchy(self):
    """Multiple FabricFrameView instances share the same hierarchy handle."""
    view1 = FabricFrameView(prim_paths=["/World/robot1"])
    view2 = FabricFrameView(prim_paths=["/World/robot2"])
    
    assert view1._fabric_hierarchy is view2._fabric_hierarchy

Dependencies

  • Requires #5672 (service locator on SimulationContext) — now merged into develop

Summary

Clean, focused PR with good architecture and test coverage. The rebase onto develop (which now includes #5672) means this PR is ready to land once CI passes. 👍


Automated review by Isaac Lab Review Bot — updated for SHA becd3ff

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary — Updated for SHA 4c262b9

This PR has been rebased onto the latest develop branch. The implementation remains well-designed. The FabricStageCache service properly manages shared hierarchy handles across multiple FabricFrameView instances.

✅ Strengths

  • Clean architecture: The FabricStageCache service with automatic cleanup on clear_instance() is well-designed
  • Good documentation: Docstrings are thorough with cross-references to related classes
  • Better error handling: Replacing assert with RuntimeError for the device check is correct (not stripped under python -O)
  • Future-proofing: The dict-keyed hierarchy cache anticipates multi-GPU scenarios
  • Comprehensive tests: 7 unit tests covering service lifecycle (register, retrieve, cache, close, clear_instance, replacement)

📁 Files Changed

File Changes
fabric_stage_cache.py +68 lines — new cache class with usdrt.Usd.Stage attachment and IFabricHierarchy handle caching
fabric_frame_view.py +16/-6 lines — refactored to use FabricStageCache via SimulationContext.services
test_fabric_stage_cache.py +101 lines — comprehensive service lifecycle tests
fabric-stage-cache.rst +10 lines — changelog fragment

✅ CI Status

  • pre-commit: ✅ Passing
  • changelog-check: ✅ Passing
  • Build Wheel: ✅ Passing
  • Check for Broken Links: ❌ Failed (unrelated to this PR — likely repo-wide doc link issue)
  • license-check: ⏳ Pending
  • Installation Tests: ⏳ Pending
  • Docs Build: ⏳ Pending

💡 Suggestions (Non-blocking)

Defensive check in get_hierarchy()

If close() is called and then get_hierarchy() is invoked (e.g., due to a stale reference), it will crash with AttributeError on self._stage.GetFabricId(). Consider:

def get_hierarchy(self):
    if self._stage is None:
        raise RuntimeError("FabricStageCache has been closed")
    # ... rest of method

This provides a clearer error message for debugging stale reference issues.

Consider an integration test for multi-view sharing

The unit tests verify the service lifecycle well. An integration test confirming that two FabricFrameView instances share the same hierarchy handle would strengthen confidence:

def test_multiple_frame_views_share_hierarchy(self):
    """Multiple FabricFrameView instances share the same hierarchy handle."""
    view1 = FabricFrameView(prim_paths=["/World/robot1"])
    view2 = FabricFrameView(prim_paths=["/World/robot2"])
    
    assert view1._fabric_hierarchy is view2._fabric_hierarchy

Summary

Clean, focused PR with good architecture and test coverage. The rebase onto develop keeps this PR current. Ready to merge once CI passes. 👍


Automated review by Isaac Lab Review Bot — updated for SHA 4c262b9

@pv-nvidia
Copy link
Copy Markdown
Contributor Author

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

@pv-nvidia pv-nvidia closed this May 22, 2026
@pv-nvidia pv-nvidia reopened this May 22, 2026
@pv-nvidia pv-nvidia force-pushed the pv/fabric-stage-cache branch from 4c262b9 to 612274a Compare May 23, 2026 18:15
Introduces FabricStageCache — a lightweight cache for the usdrt stage
attachment and IFabricHierarchy handles, registered as a service on
SimulationContext via the service locator (set_service/get_service).

Multiple FabricFrameView instances now share a single hierarchy handle
instead of each creating its own. The cache is automatically closed on
SimulationContext.clear_instance().

Also replaces the assert device check with a proper RuntimeError and
removes the now-unused isaaclab.sim import from fabric_frame_view.py.
@pv-nvidia pv-nvidia force-pushed the pv/fabric-stage-cache branch from 612274a to f3d4667 Compare May 23, 2026 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant