refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory#5673
refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory#5673pv-nvidia wants to merge 1 commit into
Conversation
b51594f to
4c71366
Compare
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
PR #5673: refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory
Summary
This PR contains 3 commits that introduce a service locator pattern and refactor the FrameView backend dispatch logic:
596bedb- Add typedServiceLocatortoSimulationContextdac5a16- Add unit tests and changelog fragment4c71366- Move Fabric/USD dispatch fromFabricFrameViewtoFrameViewfactory
✅ What's Good
-
Clean Service Locator Implementation (
service_locator.py)- Well-designed subscript API (
services[cls] = instance) - Proper lifecycle management with
close()handling - Good defensive coding with
_try_close()checking callable
- Well-designed subscript API (
-
Improved Separation of Concerns
- Factory pattern in
FrameViewnow handles backend selection logic FabricFrameViewis now simpler - only handles Fabric operations- Removed conditional checks inside
FabricFrameViewmethods
- Factory pattern in
-
Comprehensive Tests
test_service_locator.pycovers edge cases (non-callable close, missing keys, etc.)- Good coverage of the service locator API
-
Proper Integration
SimulationContext.clear_instance()now callsservices.close_all()- Services property exposed for external use
🔍 Review Items
1. _FABRIC_SUPPORTED_DEVICES Duplication (Minor)
# In frame_view.py:
_FABRIC_SUPPORTED_DEVICES = ("cpu", "cuda", "cuda:0")This was previously in fabric_frame_view.py. Consider:
- Making this a module-level constant that can be imported
- Or document why the factory owns this knowledge
2. Device Argument Extraction (frame_view.py:58-60)
device = kwargs.get("device", "cpu")
if len(args) >= 2:
device = args[1]This relies on positional argument order. Consider adding a comment documenting the expected signature or using keyword-only enforcement.
3. Return Type Consistency (ServiceLocator.__getitem__)
def __getitem__(self, cls: type[_T]) -> _T | None:
return self._services.get(cls) # returns None if missingReturning None for missing keys differs from standard dict behavior (which raises KeyError). This is documented but worth noting - callers must always handle None.
4. No SettingsManager Import Removed from FabricFrameView
The import of SettingsManager was removed from fabric_frame_view.py since Fabric checks moved to the factory. 👍
📋 Checklist
- Code follows Isaac Lab style guidelines
- Type hints present and correct
- Docstrings provided for public APIs
- Unit tests added for new functionality
- Changelog fragment included
- No obvious regressions
Verdict: Approve ✅
This is a clean refactor that improves code organization by moving dispatch logic to the factory where it belongs. The service locator is a useful addition for managing backend-specific singletons. Minor suggestions above are non-blocking.
Reviewed at commit: 4c71366
4c71366 to
5e6be23
Compare
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
PR #5673: refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory
Summary
This PR refactors the backend dispatch logic by moving Fabric/USD selection from FabricFrameView.__init__ to the FrameView factory's _get_backend() method. This is a clean separation-of-concerns improvement.
Changes:
frame_view.py: Factory now handles Fabric-enabled check and device support validationfabric_frame_view.py: Simplified to only handle Fabric operations (assumes Fabric is available)
✅ What's Good
-
Clean Single Responsibility
FabricFrameViewno longer pretends to be one thing while sometimes behaving as another- Each implementation (Fabric, USD, Newton) now has one clear responsibility
- Eliminates the Liskov substitution concern mentioned in the PR description
-
Factory Owns Dispatch Logic
_get_backend()now centralizes all backend selection decisions- Device support check moved to the right location (factory, not implementation)
- Warning message preserved for unsupported device configurations
-
Code Reduction
- Net deletion of ~35 lines from
fabric_frame_view.py - Removed conditional branches (
if not self._use_fabric) throughout the class - Removed the internal
_use_fabricflag and related imports
- Net deletion of ~35 lines from
-
No Behavioral Change for Callers
- Same code paths execute, just dispatched from a cleaner location
- Warning message text identical to original
🔍 Findings
1. Behavior Change When SimulationContext is None (Medium)
Location: frame_view.py:56
if ctx is None:
return "usd" # Previously returned "physx"The old _get_backend() returned "physx" when no context existed, which would instantiate FabricFrameView (with its internal fallback). Now it returns "usd" directly.
Risk: Code that instantiates FrameView before SimulationContext exists will now get UsdFrameView instead of FabricFrameView. This is likely the correct behavior (no context = no physics = USD fallback), but worth documenting or testing.
2. Device Argument Extraction Relies on Positional Order (Low)
Location: frame_view.py:70-72
device = kwargs.get("device", "cpu")
if len(args) >= 2:
device = args[1]This extraction relies on the assumption that device is the second positional argument. Consider:
- Adding a brief comment documenting the expected signature:
FrameView(prim_path, device, ...) - Or using
inspect.signaturefor more robust extraction (though likely overkill for internal code)
3. Multi-GPU Device Support (Informational)
Location: frame_view.py:22
_FABRIC_SUPPORTED_DEVICES = ("cpu", "cuda", "cuda:0")The original code had a TODO about extending to cuda:N for multi-GPU support. This constant was moved but the limitation persists. Not a regression—just noting this is still a future consideration.
4. Eager Registration Pattern is Good (Positive)
Location: frame_view.py:100-102
# Eagerly register UsdFrameView — it lives in isaaclab, not a backend package
FrameView.register("usd", UsdFrameView)This is the right approach since FactoryBase's dynamic import can't discover UsdFrameView in the standard location.
📋 Checklist
- Code follows Isaac Lab style guidelines
- Type hints present and correct
- Docstrings updated to reflect new behavior
- No obvious regressions
- Changelog fragment (CI reports missing—expected for pure refactors)
Verdict
This is a well-executed refactor that improves code organization by moving dispatch logic to the factory where it belongs. The SimulationContext is None behavior change (#1) is worth considering but is likely correct. Minor suggestion for documenting positional arg extraction (#2) is non-blocking.
Reviewed at commit: 5e6be23
Update (3d5caa6): ✅ Changes fully address the refactor goals and improve multi-GPU support.
Incremental Changes Reviewed
-
Multi-GPU Device Support Added (Positive)
- New
_is_fabric_supported_device()helper inframe_view.pynow accepts anycuda:Ndevice - Resolves the TODO mentioned in Finding #3 of original review
- Proper validation via
int(device.split(":", 1)[1])to reject invalid device strings
- New
-
Complete Factory Dispatch (Positive)
- All
if not self._use_fabricconditionals removed fromFabricFrameView _use_fabricflag and related imports fully eliminatedFabricFrameViewis now instantiated only when Fabric path is appropriate
- All
-
Docstring Updates (Positive)
- Updated
FabricFrameViewdocstring correctly states it only handles Fabric operations - Device argument documentation expanded to cover
cuda:1,cuda:2, etc. - Factory docstring clarifies the three PhysX dispatch paths
- Updated
-
Test Coverage (Informational)
test_views_xform_prim_fabric.pyadds multi-GPU test cases forcuda:1- Tests properly guarded by
ISAACLAB_TEST_MULTI_GPUenv var andtorch.cuda.device_count()check
Minor Note
The warning message in the factory differs slightly from the original:
- Original:
"only supported on cpu, cuda, cuda:0"(joined_fabric_supported_devices) - New:
"only supported on cpu and cuda:<N> devices"
This is fine — the new message is more accurate since any CUDA index is now supported.
Verdict
The incremental changes complete the refactor cleanly. Multi-GPU support is a nice bonus. LGTM.
# Description Removes the `cuda:0`-only restriction in `FabricFrameView`. USDRT `SelectPrims` now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., `cuda:1`) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU. Changes: - **Drop device allowlist.** Removes `_fabric_supported_devices`, the device guard in `__init__`, and the corresponding assertion in `_initialize_fabric`. Any CUDA device (or CPU) now works. - **Multi-GPU test coverage.** Three `cuda:1`-parameterized tests gated by `ISAACLAB_TEST_MULTI_GPU=1` env var, plus a dedicated CI workflow on the multi-GPU runner that sets it. - **Fix deprecated `wp.to_torch()` calls.** Replaced with `.torch` accessor on ProxyArray (avoids DeprecationWarning). - **TODOs for follow-up PRs.**: - #5673 - #5674 ## Type of change - New feature (non-breaking change which adds functionality) `cuda:0` continues to work exactly as before; `cuda:1`+ now also works instead of silently falling back to USD. No public API surface changed. ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there > Note: this PR uses a fragment file at `source/isaaclab_physx/changelog.d/feat-frame-view-enable-mgpu.rst` per the fragment-based changelog system. ## Test plan Three new tests gated by `ISAACLAB_TEST_MULTI_GPU=1` and parameterized with `["cuda:1"]`: - `test_fabric_cuda1_world_pose_roundtrip` — `set_world_poses` → `get_world_poses` returns the same values on a non-primary CUDA device. - `test_fabric_cuda1_no_usd_writeback` — Fabric writes on `cuda:1` do not write back to USD. - `test_fabric_cuda1_scales_roundtrip` — covers the `set_scales` write path on `cuda:1`. A dedicated CI workflow (`test-fabric-multi-gpu.yaml`) runs on the `[self-hosted, linux, x64, gpu, multi-gpu]` runner with `ISAACLAB_TEST_MULTI_GPU=1` set. Pre-flights with `nvidia-smi` and `torch.cuda.device_count()`, fails loudly if the runner has < 2 GPUs. To verify locally on a multi-GPU machine: ```bash ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ``` To verify the `cuda:0` path is unchanged (multi-GPU tests auto-skip): ```bash ./isaaclab.sh -p -m pytest \ source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v ```
…factory
FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.
Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView
FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).
UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.
5e6be23 to
3d5caa6
Compare
|
Superseded by the consolidated PR #5728 (pv/fabric-full-stack). |
Problem
FabricFrameViewhas an internal_use_fabricflag that falls back toUsdFrameViewwhen Fabric is disabled or the device is unsupported. This is a misleading abstraction: the class claims to be a Fabric-accelerated view but sometimes silently delegates everything to a different (slower) implementation. Callers have no way to know which code path actually ran.Solution
Move the Fabric-enabled + device-supported check from
FabricFrameView.__init__up to theFrameViewfactory's_get_backend()method. The factory now dispatches to three implementations:physxFabricFrameViewusdUsdFrameViewnewtonNewtonSiteFrameViewUsdFrameViewis eagerly registered on the factory since it lives inisaaclab(not a backend package likeisaaclab_physx), soFactoryBase's dynamic import (isaaclab_{backend}.sim.views) can't discover it.Changes
FrameView._get_backend(): checks/physics/fabricEnabledsetting + device support via_is_fabric_supported_device(), returns"physx","usd", or"newton"_is_fabric_supported_device(): accepts any validcuda:<N>index (not justcuda:0) for multi-GPU supportFabricFrameView: stripped of all_use_fabricconditionals,SettingsManagerimport,_fabric_supported_devicesconstant — it just does Fabric, nothing elseUsdFrameView: eagerly registered as"usd"backend on the factoryResult
Each FrameView implementation now has a single responsibility.
FabricFrameViewassumes Fabric is available (the factory guarantees this). No runtime behavior change for existing callers — the same code paths execute, just dispatched from a cleaner location.Multi-GPU setups (
cuda:1,cuda:2, etc.) are properly supported by the device check function.Dependencies
None — this PR applies cleanly to current develop.