[IR Container] Phase 2.4 Per-fusion statement tracking #5961
[IR Container] Phase 2.4 Per-fusion statement tracking #5961mdavis36 wants to merge 1 commit intomd/phase2-shared-ptrfrom
Conversation
Description
|
| Relevant files | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Enhancement |
| ||||||||
| Configuration changes |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review |
Memory Management
|
3a199c8 to
53e5045
Compare
|
!test |
Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each Fusion can efficiently query only its own statements in a shared container. Fusion forwarding methods (vals(), unordered_exprs(), deterministic_vals(), etc.) now return per-Fusion filtered results. Fusion::clear() uses removeStatementsOwnedBy(this) instead of ir_container()->clear().
33629cb to
8b162d9
Compare
53e5045 to
f8ff364
Compare
|
!test |
Greptile SummaryThis PR adds per-Fusion ownership tracking maps (
Key concerns for shared-container correctness:
For single-Fusion containers (the current runtime scenario), this PR is correct and preserves existing behavior. The identified issues are latent hazards for the planned shared-container path. Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph FusionA["Fusion A"]
A_vals["vals()"]
A_exprs["unordered_exprs()"]
A_det["deterministic_vals()"]
A_reg["registerVal() / registerExpr()"]
A_clear["clear()"]
end
subgraph FusionB["Fusion B"]
B_vals["vals()"]
B_exprs["unordered_exprs()"]
B_reg["registerVal() / registerExpr()"]
B_clear["clear()"]
end
subgraph IrContainer["Shared IrContainer"]
raw_vals["vals_up_ / vals_ (all statements)"]
pf_vals["per_fusion_vals_[A] = {v0, v1}"]
pf_vals_b["per_fusion_vals_[B] = {v2, v3}"]
pf_exprs["per_fusion_exprs_[A] = {e0}"]
pf_exprs_b["per_fusion_exprs_[B] = {e1}"]
transfer["transferStatementOwnership()"]
remove["removeStatementsOwnedBy()"]
end
A_vals -->|"valsOwnedBy(A)"| pf_vals
A_exprs -->|"exprsOwnedBy(A)"| pf_exprs
A_det -->|"deterministicValsOwnedBy(A)"| pf_vals
A_reg -->|"inserts into"| pf_vals
A_reg -->|"inserts into"| pf_exprs
A_clear -->|"calls"| remove
B_vals -->|"valsOwnedBy(B)"| pf_vals_b
B_exprs -->|"exprsOwnedBy(B)"| pf_exprs_b
B_reg -->|"inserts into"| pf_vals_b
B_reg -->|"inserts into"| pf_exprs_b
B_clear -->|"calls"| remove
pf_vals --- raw_vals
pf_vals_b --- raw_vals
Last reviewed commit: 8b162d9 |
| void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) { | ||
| auto vals_it = per_fusion_vals_.find(fusion); | ||
| if (vals_it != per_fusion_vals_.end()) { | ||
| for (auto it = vals_up_.begin(); it != vals_up_.end();) { | ||
| if (vals_it->second.count(it->get()) > 0) { | ||
| vals_.erase(it->get()); | ||
| it = vals_up_.erase(it); | ||
| } else { | ||
| ++it; | ||
| } | ||
| } | ||
| per_fusion_vals_.erase(vals_it); | ||
| } | ||
|
|
||
| auto exprs_it = per_fusion_exprs_.find(fusion); | ||
| if (exprs_it != per_fusion_exprs_.end()) { | ||
| for (auto it = exprs_up_.begin(); it != exprs_up_.end();) { | ||
| if (exprs_it->second.count(it->get()) > 0) { | ||
| exprs_.erase(it->get()); | ||
| it = exprs_up_.erase(it); | ||
| } else { | ||
| ++it; | ||
| } | ||
| } | ||
| per_fusion_exprs_.erase(exprs_it); | ||
| } | ||
| } |
There was a problem hiding this comment.
O(n) linear scan on deque erasure
removeStatementsOwnedBy iterates the entire vals_up_ and exprs_up_ deques, and calls erase on the deque mid-iteration. Each std::deque::erase in the middle is O(n), making this overall O(n²) in the number of statements.
For large fusions (which NVIDIA workloads commonly are), this could become a noticeable performance bottleneck, especially since Fusion::clear() is called from the destructor and from copy operations. Consider using a remove-then-erase idiom or std::erase_if (C++20) to reduce this to a single pass:
| void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) { | |
| auto vals_it = per_fusion_vals_.find(fusion); | |
| if (vals_it != per_fusion_vals_.end()) { | |
| for (auto it = vals_up_.begin(); it != vals_up_.end();) { | |
| if (vals_it->second.count(it->get()) > 0) { | |
| vals_.erase(it->get()); | |
| it = vals_up_.erase(it); | |
| } else { | |
| ++it; | |
| } | |
| } | |
| per_fusion_vals_.erase(vals_it); | |
| } | |
| auto exprs_it = per_fusion_exprs_.find(fusion); | |
| if (exprs_it != per_fusion_exprs_.end()) { | |
| for (auto it = exprs_up_.begin(); it != exprs_up_.end();) { | |
| if (exprs_it->second.count(it->get()) > 0) { | |
| exprs_.erase(it->get()); | |
| it = exprs_up_.erase(it); | |
| } else { | |
| ++it; | |
| } | |
| } | |
| per_fusion_exprs_.erase(exprs_it); | |
| } | |
| } | |
| void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) { | |
| auto vals_it = per_fusion_vals_.find(fusion); | |
| if (vals_it != per_fusion_vals_.end()) { | |
| const auto& owned_vals = vals_it->second; | |
| std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) { | |
| if (owned_vals.count(v.get()) > 0) { | |
| vals_.erase(v.get()); | |
| return true; | |
| } | |
| return false; | |
| }); | |
| per_fusion_vals_.erase(vals_it); | |
| } | |
| auto exprs_it = per_fusion_exprs_.find(fusion); | |
| if (exprs_it != per_fusion_exprs_.end()) { | |
| const auto& owned_exprs = exprs_it->second; | |
| std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) { | |
| if (owned_exprs.count(e.get()) > 0) { | |
| exprs_.erase(e.get()); | |
| return true; | |
| } | |
| return false; | |
| }); | |
| per_fusion_exprs_.erase(exprs_it); | |
| } | |
| } |
| const std::unordered_set<Val*>& IrContainer::valsOwnedBy( | ||
| const Fusion* fusion) const { | ||
| static const std::unordered_set<Val*> empty; | ||
| auto it = per_fusion_vals_.find(fusion); | ||
| return it != per_fusion_vals_.end() ? it->second : empty; | ||
| } | ||
|
|
||
| const std::unordered_set<Expr*>& IrContainer::exprsOwnedBy( | ||
| const Fusion* fusion) const { | ||
| static const std::unordered_set<Expr*> empty; | ||
| auto it = per_fusion_exprs_.find(fusion); | ||
| return it != per_fusion_exprs_.end() ? it->second : empty; | ||
| } |
There was a problem hiding this comment.
Static local variables for empty sets
valsOwnedBy and exprsOwnedBy both use static const local variables to return an empty set by reference when a Fusion has no entries. While functionally correct and safe (since they are const), using static locals in what could become multi-threaded code is worth noting. The const-ness avoids race conditions on the data itself, but if future code ever removes the const, the static would become a thread-safety hazard. An alternative is to use an inline static const class member.
This is a minor style observation — the current implementation is safe.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| IrContainer::swap(*(a.ir_container()), *(b.ir_container())); | ||
|
|
||
| // Fix parent pointers after swapping containers | ||
| // After swap, each Fusion owns a different IrContainer, so we must | ||
| // update the parent backpointers in those containers to point to their new | ||
| // owners | ||
| // After swapping container contents, per-Fusion tracking keys point to the | ||
| // wrong Fusions. Rename: a's container had b's entries, b's had a's. | ||
| a.ir_container()->transferStatementOwnership(&b, &a); | ||
| b.ir_container()->transferStatementOwnership(&a, &b); |
There was a problem hiding this comment.
Swap + transfer is not safe when a and b share the same container
When a and b share the same IrContainer (i.e., a.ir_container_.get() == b.ir_container_.get()):
IrContainer::swap(*Ca, *Ca)is a no-op (swapping with itself)a.ir_container()->transferStatementOwnership(&b, &a)merges b's entries into a'sb.ir_container()->transferStatementOwnership(&a, &b)then moves all entries (original a + original b) into b's — leaving a with nothing
This may not be reachable today, but if shared containers are the end-goal, this path would need a guard (e.g., early return if a.ir_container_.get() == b.ir_container_.get()).
Additional Comments (2)
In In a shared container scenario where another Fusion has appended statements after the
The same issue applies to the This is safe for single-Fusion containers today, but since this PR's stated goal is enabling shared-container safety, the rollback path should also clean up the correct owner's per-fusion map — e.g., by looking up which Fusion actually owns each statement being removed.
More importantly, For single-Fusion containers this is fine, but these methods should be updated to return per-Fusion counts (or at least documented as container-global) before shared containers are actually used. |
Summary
Add per-Fusion ownership tracking maps to IrContainer so each Fusion can efficiently query only its own Vals and Exprs within a shared container. Update all Fusion-level accessors to filter by ownership.
This is the highest-risk change in the Phase 2 chain. Every accessor call path is touched —
vals(),deterministic_vals(),deterministic_exprs(),unordered_exprs(), and more now return ownership-filtered results rather than raw container contents. For single-Fusion containers the results are identical, but the implementation changes underneath every consumer.Relationship to Phase 2
This is the critical invariant that makes shared containers safe:
Without per-Fusion filtering, a shared container copy would break every consumer —
Fusion::copywould iterate all vals (including other Fusions'),deterministic_vals()would return interleaved results, andStatementGuardrollback would destroy statements belonging to other Fusions.This invariant is what allows Phase 2 to maintain independent IR graphs despite shared storage:
fusion->vals()returns{v : v in container AND v->container() == fusion}Fusion::clear()only clears THIS Fusion's state (notir_container()->clear())CI Risk
Highest of all Phase 2 PRs. Every accessor-dependent code path is touched. For single-Fusion containers, filtered results are identical to unfiltered — but regressions would surface in accessor-heavy code paths (scheduling, lowering, codegen).