refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory by pv-nvidia · Pull Request #5673 · isaac-sim/IsaacLab

pv-nvidia · 2026-05-18T12:52:05Z

Problem

FabricFrameView has an internal _use_fabric flag that falls back to UsdFrameView when Fabric is disabled or the device is unsupported. This is a misleading abstraction: the class claims to be a Fabric-accelerated view but sometimes silently delegates everything to a different (slower) implementation. Callers have no way to know which code path actually ran.

Solution

Move the Fabric-enabled + device-supported check from FabricFrameView.__init__ up to the FrameView factory's _get_backend() method. The factory now dispatches to three implementations:

Condition	Backend key	Class
PhysX + Fabric enabled + supported device	`physx`	`FabricFrameView`
PhysX without Fabric or unsupported device	`usd`	`UsdFrameView`
Newton	`newton`	`NewtonSiteFrameView`

UsdFrameView is eagerly registered on the factory since it lives in isaaclab (not a backend package like isaaclab_physx), so FactoryBase's dynamic import (isaaclab_{backend}.sim.views) can't discover it.

Changes

FrameView._get_backend(): checks /physics/fabricEnabled setting + device support via _is_fabric_supported_device(), returns "physx", "usd", or "newton"
_is_fabric_supported_device(): accepts any valid cuda:<N> index (not just cuda:0) for multi-GPU support
FabricFrameView: stripped of all _use_fabric conditionals, SettingsManager import, _fabric_supported_devices constant — it just does Fabric, nothing else
UsdFrameView: eagerly registered as "usd" backend on the factory

Result

Each FrameView implementation now has a single responsibility. FabricFrameView assumes Fabric is available (the factory guarantees this). No runtime behavior change for existing callers — the same code paths execute, just dispatched from a cleaner location.

Multi-GPU setups (cuda:1, cuda:2, etc.) are properly supported by the device check function.

Dependencies

None — this PR applies cleanly to current develop.

isaaclab-review-bot

🤖 Isaac Lab Review Bot

PR #5673: refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory

Summary

This PR contains 3 commits that introduce a service locator pattern and refactor the FrameView backend dispatch logic:

596bedb - Add typed ServiceLocator to SimulationContext
dac5a16 - Add unit tests and changelog fragment
4c71366 - Move Fabric/USD dispatch from FabricFrameView to FrameView factory

✅ What's Good

Clean Service Locator Implementation (service_locator.py)
- Well-designed subscript API (services[cls] = instance)
- Proper lifecycle management with close() handling
- Good defensive coding with _try_close() checking callable
Improved Separation of Concerns
- Factory pattern in FrameView now handles backend selection logic
- FabricFrameView is now simpler - only handles Fabric operations
- Removed conditional checks inside FabricFrameView methods
Comprehensive Tests
- test_service_locator.py covers edge cases (non-callable close, missing keys, etc.)
- Good coverage of the service locator API
Proper Integration
- SimulationContext.clear_instance() now calls services.close_all()
- Services property exposed for external use

🔍 Review Items

1. `_FABRIC_SUPPORTED_DEVICES` Duplication (Minor)

# In frame_view.py:
_FABRIC_SUPPORTED_DEVICES = ("cpu", "cuda", "cuda:0")

This was previously in fabric_frame_view.py. Consider:

Making this a module-level constant that can be imported
Or document why the factory owns this knowledge

2. Device Argument Extraction (frame_view.py:58-60)

device = kwargs.get("device", "cpu")
if len(args) >= 2:
    device = args[1]

This relies on positional argument order. Consider adding a comment documenting the expected signature or using keyword-only enforcement.

3. Return Type Consistency (`ServiceLocator.getitem`)

def __getitem__(self, cls: type[_T]) -> _T | None:
    return self._services.get(cls)  # returns None if missing

Returning None for missing keys differs from standard dict behavior (which raises KeyError). This is documented but worth noting - callers must always handle None.

4. No `SettingsManager` Import Removed from `FabricFrameView`

The import of SettingsManager was removed from fabric_frame_view.py since Fabric checks moved to the factory. 👍

📋 Checklist

Code follows Isaac Lab style guidelines
Type hints present and correct
Docstrings provided for public APIs
Unit tests added for new functionality
Changelog fragment included
No obvious regressions

Verdict: Approve ✅

This is a clean refactor that improves code organization by moving dispatch logic to the factory where it belongs. The service locator is a useful addition for managing backend-specific singletons. Minor suggestions above are non-blocking.

Reviewed at commit: 4c71366

isaaclab-review-bot

🤖 Isaac Lab Review Bot

PR #5673: refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory

Summary

This PR refactors the backend dispatch logic by moving Fabric/USD selection from FabricFrameView.__init__ to the FrameView factory's _get_backend() method. This is a clean separation-of-concerns improvement.

Changes:

frame_view.py: Factory now handles Fabric-enabled check and device support validation
fabric_frame_view.py: Simplified to only handle Fabric operations (assumes Fabric is available)

✅ What's Good

Clean Single Responsibility
- FabricFrameView no longer pretends to be one thing while sometimes behaving as another
- Each implementation (Fabric, USD, Newton) now has one clear responsibility
- Eliminates the Liskov substitution concern mentioned in the PR description
Factory Owns Dispatch Logic
- _get_backend() now centralizes all backend selection decisions
- Device support check moved to the right location (factory, not implementation)
- Warning message preserved for unsupported device configurations
Code Reduction
- Net deletion of ~35 lines from fabric_frame_view.py
- Removed conditional branches (if not self._use_fabric) throughout the class
- Removed the internal _use_fabric flag and related imports
No Behavioral Change for Callers
- Same code paths execute, just dispatched from a cleaner location
- Warning message text identical to original

🔍 Findings

1. Behavior Change When `SimulationContext` is None (Medium)

Location: frame_view.py:56

if ctx is None:
    return "usd"  # Previously returned "physx"

The old _get_backend() returned "physx" when no context existed, which would instantiate FabricFrameView (with its internal fallback). Now it returns "usd" directly.

Risk: Code that instantiates FrameView before SimulationContext exists will now get UsdFrameView instead of FabricFrameView. This is likely the correct behavior (no context = no physics = USD fallback), but worth documenting or testing.

2. Device Argument Extraction Relies on Positional Order (Low)

Location: frame_view.py:70-72

device = kwargs.get("device", "cpu")
if len(args) >= 2:
    device = args[1]

This extraction relies on the assumption that device is the second positional argument. Consider:

Adding a brief comment documenting the expected signature: FrameView(prim_path, device, ...)
Or using inspect.signature for more robust extraction (though likely overkill for internal code)

3. Multi-GPU Device Support (Informational)

Location: frame_view.py:22

_FABRIC_SUPPORTED_DEVICES = ("cpu", "cuda", "cuda:0")

The original code had a TODO about extending to cuda:N for multi-GPU support. This constant was moved but the limitation persists. Not a regression—just noting this is still a future consideration.

4. Eager Registration Pattern is Good (Positive)

Location: frame_view.py:100-102

# Eagerly register UsdFrameView — it lives in isaaclab, not a backend package
FrameView.register("usd", UsdFrameView)

This is the right approach since FactoryBase's dynamic import can't discover UsdFrameView in the standard location.

📋 Checklist

Code follows Isaac Lab style guidelines
Type hints present and correct
Docstrings updated to reflect new behavior
No obvious regressions
Changelog fragment (CI reports missing—expected for pure refactors)

Verdict

This is a well-executed refactor that improves code organization by moving dispatch logic to the factory where it belongs. The SimulationContext is None behavior change (#1) is worth considering but is likely correct. Minor suggestion for documenting positional arg extraction (#2) is non-blocking.

Reviewed at commit: 5e6be23

Update (3d5caa6): ✅ Changes fully address the refactor goals and improve multi-GPU support.

Incremental Changes Reviewed

Multi-GPU Device Support Added (Positive)
- New _is_fabric_supported_device() helper in frame_view.py now accepts any cuda:N device
- Resolves the TODO mentioned in Finding #3 of original review
- Proper validation via int(device.split(":", 1)[1]) to reject invalid device strings
Complete Factory Dispatch (Positive)
- All if not self._use_fabric conditionals removed from FabricFrameView
- _use_fabric flag and related imports fully eliminated
- FabricFrameView is now instantiated only when Fabric path is appropriate
Docstring Updates (Positive)
- Updated FabricFrameView docstring correctly states it only handles Fabric operations
- Device argument documentation expanded to cover cuda:1, cuda:2, etc.
- Factory docstring clarifies the three PhysX dispatch paths
Test Coverage (Informational)
- test_views_xform_prim_fabric.py adds multi-GPU test cases for cuda:1
- Tests properly guarded by ISAACLAB_TEST_MULTI_GPU env var and torch.cuda.device_count() check

Minor Note

The warning message in the factory differs slightly from the original:

Original: "only supported on cpu, cuda, cuda:0" (joined _fabric_supported_devices)
New: "only supported on cpu and cuda:<N> devices"

This is fine — the new message is more accurate since any CUDA index is now supported.

Verdict

The incremental changes complete the refactor cleanly. Multi-GPU support is a nice bonus. LGTM.

# Description Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU. Changes: - **Drop device allowlist.** Removes `_fabric_supported_devices`, the device guard in `__init__`, and the corresponding assertion in `_initialize_fabric`. Any CUDA device (or CPU) now works. - **Multi-GPU test coverage.** Three `cuda:1`-parameterized tests gated by `ISAACLAB_TEST_MULTI_GPU=1` env var, plus a dedicated CI workflow on the multi-GPU runner that sets it. - **Fix deprecated `wp.to_torch()` calls.** Replaced with `.torch` accessor on ProxyArray (avoids DeprecationWarning). - **TODOs for follow-up PRs.**: - #5673 - #5674 ## Type of change - New feature (non-breaking change which adds functionality) `cuda:0` continues to work exactly as before; `cuda:1`+ now also works instead of silently falling back to USD. No public API surface changed. ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there > Note: this PR uses a fragment file at `source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst` per the fragment-based changelog system. ## Test plan Three new tests gated by `ISAACLAB_TEST_MULTI_GPU=1` and parameterized with `["cuda:1"]`: - `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device. - `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do not write back to USD. - `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write path on `cuda:1`. A dedicated CI workflow (`test-fabric-multi-gpu.yaml`) runs on the `[self-hosted, linux, x64, gpu, multi-gpu]` runner with `ISAACLAB_TEST_MULTI_GPU=1` set. Pre-flights with `nvidia-smi` and `torch.cuda.device_count()`, fails loudly if the runner has < 2 GPUs. To verify locally on a multi-GPU machine: ```bash ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ``` To verify the `cuda:0` path is unchanged (multi-GPU tests auto-skip): ```bash ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ```

…factory FabricFrameView had an internal _use_fabric flag that fell back to UsdFrameView when Fabric was disabled or the device was unsupported. This violated single-responsibility: FabricFrameView pretended to be one class but sometimes behaved as another. Now the FrameView factory handles all dispatch: - PhysX + Fabric enabled + supported device → FabricFrameView - PhysX without Fabric (or unsupported device) → UsdFrameView - Newton → NewtonSiteFrameView FabricFrameView no longer checks _use_fabric or _fabric_supported_devices. It assumes Fabric is available (the factory guarantees this). UsdFrameView is eagerly registered on the factory since it lives in isaaclab (not a backend package), so FactoryBase's dynamic import (isaaclab_{backend}.sim.views) can't discover it.

pv-nvidia · 2026-05-22T13:11:33Z

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

github-actions Bot added the isaac-lab Related to Isaac Lab team label May 18, 2026

pv-nvidia force-pushed the pv/fabric-view-no-fallback branch 3 times, most recently from b51594f to 4c71366 Compare May 18, 2026 17:10

isaaclab-review-bot Bot approved these changes May 18, 2026

View reviewed changes

pv-nvidia force-pushed the pv/fabric-view-no-fallback branch from 4c71366 to 5e6be23 Compare May 20, 2026 14:10

isaaclab-review-bot Bot reviewed May 20, 2026

View reviewed changes

isaaclab-review-bot Bot mentioned this pull request May 20, 2026

Enable mgpu in FrameView #5514

Merged

7 tasks

pv-nvidia mentioned this pull request May 21, 2026

feat: Full Fabric acceleration stack — local poses, stage cache, fused compose #5728

Draft

pv-nvidia force-pushed the pv/fabric-view-no-fallback branch from 5e6be23 to 3d5caa6 Compare May 22, 2026 00:05

pv-nvidia closed this May 22, 2026

pv-nvidia reopened this May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory#5673

refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory#5673
pv-nvidia wants to merge 1 commit into
isaac-sim:developfrom
pv-nvidia:pv/fabric-view-no-fallback

pv-nvidia commented May 18, 2026 •

edited

Loading

Uh oh!

isaaclab-review-bot Bot left a comment

Uh oh!

isaaclab-review-bot Bot left a comment •

edited

Loading

Uh oh!

pv-nvidia commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pv-nvidia commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Solution

Changes

Result

Dependencies

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

🤖 Isaac Lab Review Bot

Summary

✅ What's Good

🔍 Review Items

1. _FABRIC_SUPPORTED_DEVICES Duplication (Minor)

2. Device Argument Extraction (frame_view.py:58-60)

3. Return Type Consistency (ServiceLocator.__getitem__)

4. No SettingsManager Import Removed from FabricFrameView

📋 Checklist

Verdict: Approve ✅

Uh oh!

isaaclab-review-bot Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

🤖 Isaac Lab Review Bot

Summary

✅ What's Good

🔍 Findings

1. Behavior Change When SimulationContext is None (Medium)

2. Device Argument Extraction Relies on Positional Order (Low)

3. Multi-GPU Device Support (Informational)

4. Eager Registration Pattern is Good (Positive)

📋 Checklist

Verdict

Incremental Changes Reviewed

Minor Note

Verdict

Uh oh!

pv-nvidia commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pv-nvidia commented May 18, 2026 •

edited

Loading

1. `_FABRIC_SUPPORTED_DEVICES` Duplication (Minor)

3. Return Type Consistency (`ServiceLocator.getitem`)

4. No `SettingsManager` Import Removed from `FabricFrameView`

isaaclab-review-bot Bot left a comment •

edited

Loading

1. Behavior Change When `SimulationContext` is None (Medium)