[IR Container] Phase 2.4 Per-fusion statement tracking by mdavis36 · Pull Request #5961 · NVIDIA/Fuser

mdavis36 · 2026-02-12T22:08:17Z

Summary

Add per-Fusion ownership tracking maps to IrContainer so each Fusion can efficiently query only its own Vals and Exprs within a shared container. Update all Fusion-level accessors to filter by ownership.

This is the highest-risk change in the Phase 2 chain. Every accessor call path is touched — vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), and more now return ownership-filtered results rather than raw container contents. For single-Fusion containers the results are identical, but the implementation changes underneath every consumer.

Relationship to Phase 2

This is the critical invariant that makes shared containers safe:

Invariant: Fusion accessors filter by ownership

  Fusion A ─┐
             ├──→ shared_ptr<IrContainer> ──→ {val_0(A), val_1(A), val_0'(B), val_1'(B)}
  Fusion B ─┘

  A.vals()  →  {val_0, val_1}       // Only A's vals
  B.vals()  →  {val_0', val_1'}     // Only B's vals
  container->vals()  →  {val_0, val_1, val_0', val_1'}  // ALL vals (raw)

Without per-Fusion filtering, a shared container copy would break every consumer — Fusion::copy would iterate all vals (including other Fusions'), deterministic_vals() would return interleaved results, and StatementGuard rollback would destroy statements belonging to other Fusions.

This invariant is what allows Phase 2 to maintain independent IR graphs despite shared storage:

Invariant : fusion->vals() returns {v : v in container AND v->container() == fusion}
Invariant : Fusion::clear() only clears THIS Fusion's state (not ir_container()->clear())

CI Risk

Highest of all Phase 2 PRs. Every accessor-dependent code path is touched. For single-Fusion containers, filtered results are identical to unfiltered — but regressions would surface in accessor-heavy code paths (scheduling, lowering, codegen).

github-actions · 2026-02-12T22:09:11Z

Description

Transition Fusion::ir_container_ from unique_ptr to shared_ptr to enable container sharing
Add per-fusion statement tracking with per_fusion_vals_/per_fusion_exprs_ maps in IrContainer
Update Fusion accessors (vals(), deterministic_exprs(), etc.) to return per-Fusion filtered results
Add Fusion tracking API (addFusion/removeFusion/transferFusion/sharingCount) to IrContainer
Temporarily disable parallel compilation during shared_ptr transition

Changes walkthrough

Relevant files

Enhancement

fusion.cpp `Update Fusion methods for shared_ptr and per-fusion tracking` csrc/fusion.cpp Update Fusion constructor/destructor to use shared_ptr and track fusion ownership Modify clear() to use removeStatementsOwnedBy() for per-Fusion cleanup Update registerVal/registerExpr to populate per-fusion tracking maps Update removeVal/removeExpr to clean up per-fusion maps Fix swap() to transfer statement ownership between Fusions Update copy() to pass destination fusion for proper ownership tracking	+22/-23
container.cpp `Implement per-fusion statement tracking and ownership management` csrc/ir/container.cpp Add per-fusion tracking maps (per_fusion_vals_, per_fusion_exprs_) and sharing_fusions_ set Implement Fusion tracking methods (addFusion, removeFusion, transferFusion, etc.) Add per-Fusion filtered accessors (valsOwnedBy, exprsOwnedBy, deterministicValsOwnedBy, etc.) Add transferStatementOwnership() and removeStatementsOwnedBy() for ownership management Update copy() to accept destination fusion parameter Update swap() to handle per-fusion tracking maps	+161/-4
fusion.h `Update Fusion interface for shared_ptr and per-fusion accessors` csrc/fusion.h Change ir_container_ from unique_ptr to shared_ptr Add ir_container_ptr() accessor method Update deterministic_vals() and deterministic_exprs() to return per-Fusion filtered results Update vals() and unordered_exprs() to return per-Fusion filtered results Update deterministic_vals_map() and deterministic_exprs_map() to return per-Fusion filtered results	+15/-14
container.h `Update IrContainer interface for shared ownership and per-fusion` `tracking` csrc/ir/container.h Remove parent_ pointer and add sharing_fusions_ set for multiple fusion tracking Add per-fusion tracking maps (per_fusion_vals_, per_fusion_exprs_) Add Fusion tracking API declarations (addFusion, removeFusion, transferFusion, etc.) Add per-Fusion filtered accessor declarations (valsOwnedBy, exprsOwnedBy, etc.) Update copy() signature to accept destination fusion parameter	+29/-9

Configuration changes

fusion_kernel_runtime.cpp `Temporarily disable parallel compilation during transition` csrc/runtime/fusion_kernel_runtime.cpp Add kPhase2DisableParallelCompile flag to temporarily disable parallel compilation Update compileFusionParallel() to check the temporary disable flag	+7/-2

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review
Memory Management The change from unique_ptr to shared_ptr for ir_container_ suggests multiple Fusions can now share the same IrContainer. The transferStatementOwnership and removeStatementsOwnedBy methods need careful review to ensure proper memory management and avoid dangling pointers when Fusions are destroyed. Fusion::Fusion() : ir_container_(std::make_shared<IrContainer>()) { ir_container_->addFusion(this); } Performance Regression The addition of kPhase2DisableParallelCompile flag suggests potential concurrency issues with the new per-fusion tracking. This may cause performance regressions in multi-segment fusion compilation. Need to verify this is temporary and understand the underlying thread safety concerns. // TODO: Remove when std::shared_mutex is added to IrContainer. constexpr bool kPhase2DisableParallelCompile = true; API Compatibility Multiple method signatures have changed (e.g., deterministic_vals() now returns deque<Val> instead of const deque<Val>, copy() now takes additional Fusion* parameter). This breaks the const-correctness and may impact existing code that depends on these APIs. std::deque<Val> deterministicValsOwnedBy( const Fusion fusion) const noexcept; std::deque<Expr> deterministicExprsOwnedBy( const Fusion fusion) const noexcept; std::unordered_map<Val, int64_t> deterministicValsMapOwnedBy( const Fusion fusion) const noexcept; std::unordered_map<Expr, int64_t> deterministicExprsMapOwnedBy( const Fusion fusion) const noexcept;

mdavis36 · 2026-02-12T22:09:32Z

!test

Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each Fusion can efficiently query only its own statements in a shared container. Fusion forwarding methods (vals(), unordered_exprs(), deterministic_vals(), etc.) now return per-Fusion filtered results. Fusion::clear() uses removeStatementsOwnedBy(this) instead of ir_container()->clear().

mdavis36 · 2026-02-18T06:37:34Z

!test

greptile-apps · 2026-02-18T06:44:04Z

Greptile Summary

This PR adds per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) to IrContainer, enabling each Fusion to efficiently query only its own Vals and Exprs within a (potentially shared) container. All Fusion-level accessors (vals(), deterministic_vals(), unordered_exprs(), etc.) now return ownership-filtered results.

Ownership tracking: New std::unordered_map<const Fusion*, std::unordered_set<...>> maps in IrContainer track which Vals/Exprs belong to which Fusion. Registration, removal, and rollback paths all update these maps.
Filtered accessors: New valsOwnedBy, exprsOwnedBy, deterministicValsOwnedBy, etc. methods on IrContainer provide per-Fusion filtered views. Fusion's public accessors now delegate to these.
Swap and transfer: Fusion::swap now calls transferStatementOwnership to correctly re-key the per-Fusion maps after container content swaps.
Clear behavior: Fusion::clear() now calls removeStatementsOwnedBy(this) instead of ir_container()->clear(), preserving other Fusions' state.

Key concerns for shared-container correctness:

removeStatementsCreatedAfter (used by StatementGuard) operates on container-global deque positions. In a multi-Fusion container, it could destroy statements belonging to another Fusion and leave dangling pointers in that Fusion's per-fusion maps.
numExprs()/numVals() still return container-global counts, creating an inconsistency with the per-Fusion filtered accessors.
Fusion::swap transfer logic is incorrect when both Fusions share the same container (unlikely today but relevant for future phases).

For single-Fusion containers (the current runtime scenario), this PR is correct and preserves existing behavior. The identified issues are latent hazards for the planned shared-container path.

Confidence Score: 3/5

Safe for single-Fusion containers (current runtime), but contains latent correctness issues for the intended shared-container scenario.
The per-Fusion ownership tracking is well-structured and correctly maintains bookkeeping for all register/remove/clear paths in the single-Fusion case. However, removeStatementsCreatedAfter (used by StatementGuard) can destroy another Fusion's statements in a shared container, numExprs/numVals are inconsistent with filtered accessors, and Fusion::swap has incorrect transfer logic when both Fusions share a container. These are latent issues that will surface when shared containers are activated in later phases.
csrc/fusion.cpp (removeStatementsCreatedAfter shared-container safety, swap transfer logic), csrc/fusion.h (numExprs/numVals inconsistency)

Important Files Changed

Filename	Overview
csrc/fusion.cpp	Core per-Fusion ownership logic: register/remove/swap/clear all updated with per-fusion map bookkeeping. removeStatementsCreatedAfter has a latent shared-container bug where it can destroy another Fusion's statements.
csrc/fusion.h	Accessor methods now delegate to per-Fusion filtered IrContainer methods. numExprs/numVals still return container-global counts — inconsistent with filtered accessors and potentially unsafe for StatementGuard in shared containers.
csrc/ir/container.cpp	New per-Fusion query and management methods: valsOwnedBy, exprsOwnedBy, transferStatementOwnership, removeStatementsOwnedBy, and deterministic*OwnedBy. removeStatementsOwnedBy has O(n²) complexity from mid-deque erasure.
csrc/ir/container.h	Declares per-Fusion tracking maps and new public query/management API. Clean interface design with const-correct method signatures.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph FusionA["Fusion A"]
        A_vals["vals()"]
        A_exprs["unordered_exprs()"]
        A_det["deterministic_vals()"]
        A_reg["registerVal() / registerExpr()"]
        A_clear["clear()"]
    end

    subgraph FusionB["Fusion B"]
        B_vals["vals()"]
        B_exprs["unordered_exprs()"]
        B_reg["registerVal() / registerExpr()"]
        B_clear["clear()"]
    end

    subgraph IrContainer["Shared IrContainer"]
        raw_vals["vals_up_ / vals_ (all statements)"]
        pf_vals["per_fusion_vals_[A] = {v0, v1}"]
        pf_vals_b["per_fusion_vals_[B] = {v2, v3}"]
        pf_exprs["per_fusion_exprs_[A] = {e0}"]
        pf_exprs_b["per_fusion_exprs_[B] = {e1}"]
        transfer["transferStatementOwnership()"]
        remove["removeStatementsOwnedBy()"]
    end

    A_vals -->|"valsOwnedBy(A)"| pf_vals
    A_exprs -->|"exprsOwnedBy(A)"| pf_exprs
    A_det -->|"deterministicValsOwnedBy(A)"| pf_vals
    A_reg -->|"inserts into"| pf_vals
    A_reg -->|"inserts into"| pf_exprs
    A_clear -->|"calls"| remove

    B_vals -->|"valsOwnedBy(B)"| pf_vals_b
    B_exprs -->|"exprsOwnedBy(B)"| pf_exprs_b
    B_reg -->|"inserts into"| pf_vals_b
    B_reg -->|"inserts into"| pf_exprs_b
    B_clear -->|"calls"| remove

    pf_vals --- raw_vals
    pf_vals_b --- raw_vals

_{Last reviewed commit: 8b162d9}

greptile-apps

_{4 files reviewed, 5 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-18T06:44:11Z

csrc/ir/container.cpp

+void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
+  auto vals_it = per_fusion_vals_.find(fusion);
+  if (vals_it != per_fusion_vals_.end()) {
+    for (auto it = vals_up_.begin(); it != vals_up_.end();) {
+      if (vals_it->second.count(it->get()) > 0) {
+        vals_.erase(it->get());
+        it = vals_up_.erase(it);
+      } else {
+        ++it;
+      }
+    }
+    per_fusion_vals_.erase(vals_it);
+  }
+
+  auto exprs_it = per_fusion_exprs_.find(fusion);
+  if (exprs_it != per_fusion_exprs_.end()) {
+    for (auto it = exprs_up_.begin(); it != exprs_up_.end();) {
+      if (exprs_it->second.count(it->get()) > 0) {
+        exprs_.erase(it->get());
+        it = exprs_up_.erase(it);
+      } else {
+        ++it;
+      }
+    }
+    per_fusion_exprs_.erase(exprs_it);
+  }
+}


O(n) linear scan on deque erasure

removeStatementsOwnedBy iterates the entire vals_up_ and exprs_up_ deques, and calls erase on the deque mid-iteration. Each std::deque::erase in the middle is O(n), making this overall O(n²) in the number of statements.

For large fusions (which NVIDIA workloads commonly are), this could become a noticeable performance bottleneck, especially since Fusion::clear() is called from the destructor and from copy operations. Consider using a remove-then-erase idiom or std::erase_if (C++20) to reduce this to a single pass:

Suggested change

void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {

auto vals_it = per_fusion_vals_.find(fusion);

if (vals_it != per_fusion_vals_.end()) {

for (auto it = vals_up_.begin(); it != vals_up_.end();) {

if (vals_it->second.count(it->get()) > 0) {

vals_.erase(it->get());

it = vals_up_.erase(it);

} else {

++it;

}

}

per_fusion_vals_.erase(vals_it);

}

auto exprs_it = per_fusion_exprs_.find(fusion);

if (exprs_it != per_fusion_exprs_.end()) {

for (auto it = exprs_up_.begin(); it != exprs_up_.end();) {

if (exprs_it->second.count(it->get()) > 0) {

exprs_.erase(it->get());

it = exprs_up_.erase(it);

} else {

++it;

}

}

per_fusion_exprs_.erase(exprs_it);

}

}

void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {

auto vals_it = per_fusion_vals_.find(fusion);

if (vals_it != per_fusion_vals_.end()) {

const auto& owned_vals = vals_it->second;

std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) {

if (owned_vals.count(v.get()) > 0) {

vals_.erase(v.get());

return true;

}

return false;

});

per_fusion_vals_.erase(vals_it);

}

auto exprs_it = per_fusion_exprs_.find(fusion);

if (exprs_it != per_fusion_exprs_.end()) {

const auto& owned_exprs = exprs_it->second;

std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {

if (owned_exprs.count(e.get()) > 0) {

exprs_.erase(e.get());

return true;

}

return false;

});

per_fusion_exprs_.erase(exprs_it);

}

}

greptile-apps · 2026-02-18T06:44:15Z

csrc/ir/container.cpp

+const std::unordered_set<Val*>& IrContainer::valsOwnedBy(
+    const Fusion* fusion) const {
+  static const std::unordered_set<Val*> empty;
+  auto it = per_fusion_vals_.find(fusion);
+  return it != per_fusion_vals_.end() ? it->second : empty;
+}
+
+const std::unordered_set<Expr*>& IrContainer::exprsOwnedBy(
+    const Fusion* fusion) const {
+  static const std::unordered_set<Expr*> empty;
+  auto it = per_fusion_exprs_.find(fusion);
+  return it != per_fusion_exprs_.end() ? it->second : empty;
+}


Static local variables for empty sets

valsOwnedBy and exprsOwnedBy both use static const local variables to return an empty set by reference when a Fusion has no entries. While functionally correct and safe (since they are const), using static locals in what could become multi-threaded code is worth noting. The const-ness avoids race conditions on the data itself, but if future code ever removes the const, the static would become a thread-safety hazard. An alternative is to use an inline static const class member.

This is a minor style observation — the current implementation is safe.

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-02-18T06:44:16Z

csrc/fusion.cpp

  IrContainer::swap(*(a.ir_container()), *(b.ir_container()));

-  // Fix parent pointers after swapping containers
-  // After swap, each Fusion owns a different IrContainer, so we must
-  // update the parent backpointers in those containers to point to their new
-  // owners
+  // After swapping container contents, per-Fusion tracking keys point to the
+  // wrong Fusions. Rename: a's container had b's entries, b's had a's.
+  a.ir_container()->transferStatementOwnership(&b, &a);
+  b.ir_container()->transferStatementOwnership(&a, &b);


Swap + transfer is not safe when a and b share the same container

When a and b share the same IrContainer (i.e., a.ir_container_.get() == b.ir_container_.get()):

IrContainer::swap(*Ca, *Ca) is a no-op (swapping with itself)

a.ir_container()->transferStatementOwnership(&b, &a) merges b's entries into a's

b.ir_container()->transferStatementOwnership(&a, &b) then moves all entries (original a + original b) into b's — leaving a with nothing

This may not be reachable today, but if shared containers are the end-goal, this path would need a guard (e.g., early return if a.ir_container_.get() == b.ir_container_.get()).

greptile-apps · 2026-02-18T06:44:19Z

Additional Comments (2)

csrc/fusion.cpp
Shared-container safety gap in rollback

In removeStatementsCreatedAfter, statements are popped from the back of the container-global exprs_up_ deque (line 445), but only the current Fusion's per-fusion map is cleaned up (line 449: c->per_fusion_exprs_[this].erase(e)).

In a shared container scenario where another Fusion has appended statements after the StatementGuard snapshot, the statement at the back of exprs_up_ could belong to the other Fusion. This code would:

Destroy the unique_ptr (freeing the Expr memory)
Remove it from exprs_ (global set)
Only erase from per_fusion_exprs_[this] — leaving a dangling pointer in the other Fusion's per-fusion set

The same issue applies to the vals_up_ loop below (line 467).

This is safe for single-Fusion containers today, but since this PR's stated goal is enabling shared-container safety, the rollback path should also clean up the correct owner's per-fusion map — e.g., by looking up which Fusion actually owns each statement being removed.

csrc/fusion.h
numExprs/numVals still return container-global counts

numExprs() and numVals() forward directly to the container's total counts, not per-Fusion counts. This is inconsistent with the fact that vals(), unordered_exprs(), and deterministic_* now return per-Fusion filtered results.

More importantly, StatementGuard uses numExprs() and numVals() to snapshot statement counts and later passes them to removeStatementsCreatedAfter, which pops from the global deques. In a shared container where another Fusion adds statements concurrently, the snapshot will include the other Fusion's additions, and the rollback will remove them — destroying statements that don't belong to this Fusion.

For single-Fusion containers this is fine, but these methods should be updated to return per-Fusion counts (or at least documented as container-global) before shared containers are actually used.

mdavis36 changed the base branch from main to md/phase2-shared-ptr February 12, 2026 22:08

mdavis36 force-pushed the md/phase2-shared-ptr branch from 3a199c8 to 53e5045 Compare February 12, 2026 22:09

mdavis36 changed the title ~~[IR Container] Phase 2 Per-fusion statement tracking~~ [IR Container] Phase 2.4 Per-fusion statement tracking Feb 18, 2026

mdavis36 force-pushed the md/phase2-per-fusion branch from 33629cb to 8b162d9 Compare February 18, 2026 03:13

mdavis36 force-pushed the md/phase2-shared-ptr branch from 53e5045 to f8ff364 Compare February 18, 2026 03:13

This was referenced Feb 18, 2026

[IR Container] Phase 2.6 Concurrency & Thread Safety #5971

Open

[IR Container] Phase 2 IR Container Refactor #5975

Draft

mdavis36 marked this pull request as ready for review February 18, 2026 06:37

greptile-apps bot reviewed Feb 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[IR Container] Phase 2.4 Per-fusion statement tracking #5961

[IR Container] Phase 2.4 Per-fusion statement tracking #5961
mdavis36 wants to merge 1 commit intomd/phase2-shared-ptrfrom
md/phase2-per-fusion

mdavis36 commented Feb 12, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 12, 2026

Changes walkthrough

PR Reviewer Guide

Uh oh!

mdavis36 commented Feb 12, 2026

Uh oh!

mdavis36 commented Feb 18, 2026

Uh oh!

greptile-apps bot commented Feb 18, 2026

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 18, 2026

Uh oh!

greptile-apps bot Feb 18, 2026

Uh oh!

greptile-apps bot Feb 18, 2026

Uh oh!

greptile-apps bot commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mdavis36 commented Feb 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Relationship to Phase 2

CI Risk

Uh oh!

github-actions bot commented Feb 12, 2026

Description

Changes walkthrough

PR Reviewer Guide

Uh oh!

mdavis36 commented Feb 12, 2026

Uh oh!

mdavis36 commented Feb 18, 2026

Uh oh!

greptile-apps bot commented Feb 18, 2026

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 18, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Feb 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mdavis36 commented Feb 12, 2026 •

edited

Loading