Skip to content

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677

Draft
pv-nvidia wants to merge 7 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses
Draft

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677
pv-nvidia wants to merge 7 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses

Conversation

@pv-nvidia
Copy link
Copy Markdown
Contributor

Motivation

Resolves the core feature request: Fabric-accelerated local-pose operations for FabricFrameView. Previously, only world poses were GPU-accelerated via Fabric — local poses fell back to USD round-trips, defeating the purpose of Fabric for parent-child hierarchies.

Changes

New operations on FabricFrameView:

Method Description
get_local_poses() Read local pos/quat from omni:fabric:localMatrix via indexed kernels
set_local_poses() Write local pos/quat → propagate to world matrix
get_scales() Read scale from Fabric world matrix
set_scales() Write scale → recompose world matrix

Dirty propagation:

  • set_local_poses() marks world matrix dirty → next get_world_poses() re-propagates
  • set_world_poses() marks local matrix dirty → next get_local_poses() re-derives
  • Per-view tracking: multiple views don't interfere with each other's dirty state

Topology recovery:

  • _rebuild_fabric_arrays(): when PrepareForReuse() detects topology changes (prims added/removed), all selections and indexed arrays are rebuilt automatically instead of crashing

Infrastructure used:

Tests

49 tests pass (36 existing + 13 new):

  • Local-pose round-trips (set→get identity)
  • World↔local propagation correctness
  • Per-view dirty isolation
  • Topology-change recovery
  • Scaled parent/child combinations

Dependencies


Merge Order

This PR is part of a stacked series for Fabric-accelerated local poses:

  1. feat: add typed service locator to SimulationContext #5672 — service locator
  2. feat: add FabricStageCache service for shared hierarchy handles #5676 — FabricStageCache
  3. feat: add indexed fabric transform kernels for local/world matrix propagation #5675 — indexed fabric kernels
  4. refactor: move Fabric/USD dispatch from FabricFrameView to FrameView factory #5673 — factory dispatch (independent refactor)
  5. refactor: fuse set_world_poses/set_scales into single _compose_fabric_transform #5674 — fused compose (independent refactor)
  6. feat: Fabric-accelerated get/set_local_poses via indexedfabricarray #5677 — Fabric local poses ← this PR (depends on feat: add typed service locator to SimulationContext #5672, feat: add FabricStageCache service for shared hierarchy handles #5676, feat: add indexed fabric transform kernels for local/world matrix propagation #5675)

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels May 18, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00

Summary

The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:

  1. Service locator infrastructure on SimulationContext
  2. Service locator tests + changelog
  3. Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
  4. FabricStageCache as a shared hierarchy handle
  5. (NEW) Merge commit to consolidate branches
  6. (NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking

✅ New Additions Since Last Review

  • Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
  • Bidirectional world↔local sync:
    • set_world_poses → recomputes localMatrix via _sync_local_from_world()
    • set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
  • Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
  • Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
  • Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
  • Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation

🔧 Remaining Observations

[Minor] Index array dtype mismatch still present

The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:

# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)

Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.

[Minor] Undefined buffer references in get_local_poses

get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:

if use_cached:
    translations_wp = self._fabric_local_translations_buf
    orientations_wp = self._fabric_local_orientations_buf

These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.

📋 Architecture Notes

The world↔local propagation design is clean:

  • Write world → update local: _sync_local_from_world() runs update_indexed_local_matrix_from_world kernel immediately after world writes
  • Write local → lazy world update: _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read

This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.

📋 Verdict

LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.


Automated review by isaaclab-review-bot • Reviewed at 80838c00

@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch 3 times, most recently from 40b5aeb to 80838c0 Compare May 18, 2026 17:10
pv-nvidia added 7 commits May 20, 2026 13:52
…pagation

Add Warp kernels that operate on wp.indexedfabricarray for direct
local↔world matrix propagation without round-tripping through USD:

- decompose_indexed_fabric_transforms: extract pos/quat/scale from ifa
- compose_indexed_fabric_transforms: write pos/quat/scale into ifa
- update_indexed_local_matrix_from_world: local = inv(parent) * world
- update_indexed_world_matrix_from_local: world = parent * local

Also refactor existing kernels to use wp.where (branchless) instead of
if/else for broadcast index selection.

These kernels are the foundation for Fabric-accelerated get/set_local_poses
in FabricFrameView.
…factory

FabricFrameView had an internal _use_fabric flag that fell back to
UsdFrameView when Fabric was disabled or the device was unsupported.
This violated single-responsibility: FabricFrameView pretended to be
one class but sometimes behaved as another.

Now the FrameView factory handles all dispatch:
- PhysX + Fabric enabled + supported device → FabricFrameView
- PhysX without Fabric (or unsupported device) → UsdFrameView
- Newton → NewtonSiteFrameView

FabricFrameView no longer checks _use_fabric or _fabric_supported_devices.
It assumes Fabric is available (the factory guarantees this).

UsdFrameView is eagerly registered on the factory since it lives in
isaaclab (not a backend package), so FactoryBase's dynamic import
(isaaclab_{backend}.sim.views) can't discover it.
Extract _compose_fabric_transform() to deduplicate the kernel-launch
logic shared by set_world_poses and set_scales. The initial USD->Fabric
sync now composes position, orientation, and scale in one call, so
PrepareForReuse is invoked exactly once per logical update.

Also replace assert with RuntimeError in _rebuild_fabric_arrays so the
topology-change guard survives python -O.
Introduces FabricStageCache — a lightweight cache for the usdrt stage
attachment and IFabricHierarchy handles, registered as a service on
SimulationContext via the service locator (set_service/get_service).

Multiple FabricFrameView instances now share a single hierarchy handle
instead of each creating its own. The cache is automatically closed on
SimulationContext.clear_instance().

Also replaces the assert device check with a proper RuntimeError and
removes the now-unused isaaclab.sim import from fabric_frame_view.py.
# Conflicts:
#	source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
- get_local_poses / set_local_poses via indexed fabric kernels
- get_scales / set_scales with full Fabric acceleration
- Per-view dirty tracking with automatic re-propagation
- Topology-change recovery via _rebuild_fabric_arrays
- 13 new integration tests (49 total pass)
@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch from 80838c0 to 9ff3155 Compare May 20, 2026 14:11
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Reviewing new commits: PR rebased and updated since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155

Summary

This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.

✅ Strengths

1. Clean Architecture

  • FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
  • Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
  • Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable

2. Robust World↔Local Consistency

  • Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
  • _sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
  • Per-view dirty flags prevent concurrent views from clearing each other's pending syncs

3. Topology-Adaptive Design

  • PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
  • _rebuild_trans_ro_arrays() consolidates index and indexed-array refresh

4. Excellent Test Coverage

  • 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
  • test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
  • test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs

🔧 Minor Observations

[Minor] Index array dtype mismatch

_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:

return wp.array(indices, dtype=wp.int32, device=self._device)

Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.

[Nit] Docstring transpose-convention note

The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.

[Style] Empty sentinel shape

_fabric_empty_2d_array_sentinel uses shape (0, 0):

self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)

This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.

📋 Architecture Notes

The asymmetric sync strategy is well-reasoned:

  • Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
  • Write local → lazy world sync: _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read

📋 CI Status

CI checks are currently pending. The pre-commit and changelog checks have passed.

📋 Verdict

LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.


Automated review by isaaclab-review-bot • Reviewed at 9ff31550

@pv-nvidia
Copy link
Copy Markdown
Contributor Author

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

@pv-nvidia pv-nvidia closed this May 22, 2026
@pv-nvidia pv-nvidia reopened this May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant