Skip to content

[IR Container] Phase 2.4 Per-fusion statement tracking #5961

Open
mdavis36 wants to merge 1 commit intomd/phase2-shared-ptrfrom
md/phase2-per-fusion
Open

[IR Container] Phase 2.4 Per-fusion statement tracking #5961
mdavis36 wants to merge 1 commit intomd/phase2-shared-ptrfrom
md/phase2-per-fusion

Conversation

@mdavis36
Copy link
Collaborator

@mdavis36 mdavis36 commented Feb 12, 2026

Summary

Add per-Fusion ownership tracking maps to IrContainer so each Fusion can efficiently query only its own Vals and Exprs within a shared container. Update all Fusion-level accessors to filter by ownership.

This is the highest-risk change in the Phase 2 chain. Every accessor call path is touched — vals(), deterministic_vals(), deterministic_exprs(), unordered_exprs(), and more now return ownership-filtered results rather than raw container contents. For single-Fusion containers the results are identical, but the implementation changes underneath every consumer.

Relationship to Phase 2

This is the critical invariant that makes shared containers safe:

Invariant: Fusion accessors filter by ownership

  Fusion A ─┐
             ├──→ shared_ptr<IrContainer> ──→ {val_0(A), val_1(A), val_0'(B), val_1'(B)}
  Fusion B ─┘

  A.vals()  →  {val_0, val_1}       // Only A's vals
  B.vals()  →  {val_0', val_1'}     // Only B's vals
  container->vals()  →  {val_0, val_1, val_0', val_1'}  // ALL vals (raw)

Without per-Fusion filtering, a shared container copy would break every consumer — Fusion::copy would iterate all vals (including other Fusions'), deterministic_vals() would return interleaved results, and StatementGuard rollback would destroy statements belonging to other Fusions.

This invariant is what allows Phase 2 to maintain independent IR graphs despite shared storage:

  • Invariant : fusion->vals() returns {v : v in container AND v->container() == fusion}
  • Invariant : Fusion::clear() only clears THIS Fusion's state (not ir_container()->clear())

CI Risk

Highest of all Phase 2 PRs. Every accessor-dependent code path is touched. For single-Fusion containers, filtered results are identical to unfiltered — but regressions would surface in accessor-heavy code paths (scheduling, lowering, codegen).

@mdavis36 mdavis36 changed the base branch from main to md/phase2-shared-ptr February 12, 2026 22:08
@github-actions
Copy link

Description

  • Transition Fusion::ir_container_ from unique_ptr to shared_ptr to enable container sharing

  • Add per-fusion statement tracking with per_fusion_vals_/per_fusion_exprs_ maps in IrContainer

  • Update Fusion accessors (vals(), deterministic_exprs(), etc.) to return per-Fusion filtered results

  • Add Fusion tracking API (addFusion/removeFusion/transferFusion/sharingCount) to IrContainer

  • Temporarily disable parallel compilation during shared_ptr transition

Changes walkthrough

Relevant files
Enhancement
fusion.cpp
Update Fusion methods for shared_ptr and per-fusion tracking

csrc/fusion.cpp

  • Update Fusion constructor/destructor to use shared_ptr and track
    fusion ownership
  • Modify clear() to use removeStatementsOwnedBy() for per-Fusion cleanup
  • Update registerVal/registerExpr to populate per-fusion tracking maps
  • Update removeVal/removeExpr to clean up per-fusion maps
  • Fix swap() to transfer statement ownership between Fusions
  • Update copy() to pass destination fusion for proper ownership tracking
  • +22/-23 
    container.cpp
    Implement per-fusion statement tracking and ownership management

    csrc/ir/container.cpp

  • Add per-fusion tracking maps (per_fusion_vals_, per_fusion_exprs_) and
    sharing_fusions_ set
  • Implement Fusion tracking methods (addFusion, removeFusion,
    transferFusion, etc.)
  • Add per-Fusion filtered accessors (valsOwnedBy, exprsOwnedBy,
    deterministicValsOwnedBy, etc.)
  • Add transferStatementOwnership() and removeStatementsOwnedBy() for
    ownership management
  • Update copy() to accept destination fusion parameter
  • Update swap() to handle per-fusion tracking maps
  • +161/-4 
    fusion.h
    Update Fusion interface for shared_ptr and per-fusion accessors

    csrc/fusion.h

  • Change ir_container_ from unique_ptr to shared_ptr
  • Add ir_container_ptr() accessor method
  • Update deterministic_vals() and deterministic_exprs() to return
    per-Fusion filtered results
  • Update vals() and unordered_exprs() to return per-Fusion filtered
    results
  • Update deterministic_vals_map() and deterministic_exprs_map() to
    return per-Fusion filtered results
  • +15/-14 
    container.h
    Update IrContainer interface for shared ownership and per-fusion
    tracking

    csrc/ir/container.h

  • Remove parent_ pointer and add sharing_fusions_ set for multiple
    fusion tracking
  • Add per-fusion tracking maps (per_fusion_vals_, per_fusion_exprs_)
  • Add Fusion tracking API declarations (addFusion, removeFusion,
    transferFusion, etc.)
  • Add per-Fusion filtered accessor declarations (valsOwnedBy,
    exprsOwnedBy, etc.)
  • Update copy() signature to accept destination fusion parameter
  • +29/-9   
    Configuration changes
    fusion_kernel_runtime.cpp
    Temporarily disable parallel compilation during transition

    csrc/runtime/fusion_kernel_runtime.cpp

  • Add kPhase2DisableParallelCompile flag to temporarily disable parallel
    compilation
  • Update compileFusionParallel() to check the temporary disable flag
  • +7/-2     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review
    Memory Management

    The change from unique_ptr to shared_ptr for ir_container_ suggests multiple Fusions can now share the same IrContainer. The transferStatementOwnership and removeStatementsOwnedBy methods need careful review to ensure proper memory management and avoid dangling pointers when Fusions are destroyed.

    Fusion::Fusion() : ir_container_(std::make_shared<IrContainer>()) {
      ir_container_->addFusion(this);
    }
    Performance Regression

    The addition of kPhase2DisableParallelCompile flag suggests potential concurrency issues with the new per-fusion tracking. This may cause performance regressions in multi-segment fusion compilation. Need to verify this is temporary and understand the underlying thread safety concerns.

    // TODO: Remove when std::shared_mutex is added to IrContainer.
    constexpr bool kPhase2DisableParallelCompile = true;
    API Compatibility

    Multiple method signatures have changed (e.g., deterministic_vals() now returns deque<Val*> instead of const deque<Val*>, copy() now takes additional Fusion* parameter). This breaks the const-correctness and may impact existing code that depends on these APIs.

    std::deque<Val*> deterministicValsOwnedBy(
        const Fusion* fusion) const noexcept;
    std::deque<Expr*> deterministicExprsOwnedBy(
        const Fusion* fusion) const noexcept;
    std::unordered_map<Val*, int64_t> deterministicValsMapOwnedBy(
        const Fusion* fusion) const noexcept;
    std::unordered_map<Expr*, int64_t> deterministicExprsMapOwnedBy(
        const Fusion* fusion) const noexcept;

    @mdavis36 mdavis36 force-pushed the md/phase2-shared-ptr branch from 3a199c8 to 53e5045 Compare February 12, 2026 22:09
    @mdavis36
    Copy link
    Collaborator Author

    !test

    @mdavis36 mdavis36 changed the title [IR Container] Phase 2 Per-fusion statement tracking [IR Container] Phase 2.4 Per-fusion statement tracking Feb 18, 2026
    Add per_fusion_vals_ / per_fusion_exprs_ maps to IrContainer so each
    Fusion can efficiently query only its own statements in a shared
    container. Fusion forwarding methods (vals(), unordered_exprs(),
    deterministic_vals(), etc.) now return per-Fusion filtered results.
    Fusion::clear() uses removeStatementsOwnedBy(this) instead of
    ir_container()->clear().
    @mdavis36
    Copy link
    Collaborator Author

    !test

    @mdavis36 mdavis36 marked this pull request as ready for review February 18, 2026 06:37
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Greptile Summary

    This PR adds per-Fusion ownership tracking maps (per_fusion_vals_, per_fusion_exprs_) to IrContainer, enabling each Fusion to efficiently query only its own Vals and Exprs within a (potentially shared) container. All Fusion-level accessors (vals(), deterministic_vals(), unordered_exprs(), etc.) now return ownership-filtered results.

    • Ownership tracking: New std::unordered_map<const Fusion*, std::unordered_set<...>> maps in IrContainer track which Vals/Exprs belong to which Fusion. Registration, removal, and rollback paths all update these maps.
    • Filtered accessors: New valsOwnedBy, exprsOwnedBy, deterministicValsOwnedBy, etc. methods on IrContainer provide per-Fusion filtered views. Fusion's public accessors now delegate to these.
    • Swap and transfer: Fusion::swap now calls transferStatementOwnership to correctly re-key the per-Fusion maps after container content swaps.
    • Clear behavior: Fusion::clear() now calls removeStatementsOwnedBy(this) instead of ir_container()->clear(), preserving other Fusions' state.

    Key concerns for shared-container correctness:

    • removeStatementsCreatedAfter (used by StatementGuard) operates on container-global deque positions. In a multi-Fusion container, it could destroy statements belonging to another Fusion and leave dangling pointers in that Fusion's per-fusion maps.
    • numExprs()/numVals() still return container-global counts, creating an inconsistency with the per-Fusion filtered accessors.
    • Fusion::swap transfer logic is incorrect when both Fusions share the same container (unlikely today but relevant for future phases).

    For single-Fusion containers (the current runtime scenario), this PR is correct and preserves existing behavior. The identified issues are latent hazards for the planned shared-container path.

    Confidence Score: 3/5

    • Safe for single-Fusion containers (current runtime), but contains latent correctness issues for the intended shared-container scenario.
    • The per-Fusion ownership tracking is well-structured and correctly maintains bookkeeping for all register/remove/clear paths in the single-Fusion case. However, removeStatementsCreatedAfter (used by StatementGuard) can destroy another Fusion's statements in a shared container, numExprs/numVals are inconsistent with filtered accessors, and Fusion::swap has incorrect transfer logic when both Fusions share a container. These are latent issues that will surface when shared containers are activated in later phases.
    • csrc/fusion.cpp (removeStatementsCreatedAfter shared-container safety, swap transfer logic), csrc/fusion.h (numExprs/numVals inconsistency)

    Important Files Changed

    Filename Overview
    csrc/fusion.cpp Core per-Fusion ownership logic: register/remove/swap/clear all updated with per-fusion map bookkeeping. removeStatementsCreatedAfter has a latent shared-container bug where it can destroy another Fusion's statements.
    csrc/fusion.h Accessor methods now delegate to per-Fusion filtered IrContainer methods. numExprs/numVals still return container-global counts — inconsistent with filtered accessors and potentially unsafe for StatementGuard in shared containers.
    csrc/ir/container.cpp New per-Fusion query and management methods: valsOwnedBy, exprsOwnedBy, transferStatementOwnership, removeStatementsOwnedBy, and deterministic*OwnedBy. removeStatementsOwnedBy has O(n²) complexity from mid-deque erasure.
    csrc/ir/container.h Declares per-Fusion tracking maps and new public query/management API. Clean interface design with const-correct method signatures.

    Flowchart

    %%{init: {'theme': 'neutral'}}%%
    flowchart TD
        subgraph FusionA["Fusion A"]
            A_vals["vals()"]
            A_exprs["unordered_exprs()"]
            A_det["deterministic_vals()"]
            A_reg["registerVal() / registerExpr()"]
            A_clear["clear()"]
        end
    
        subgraph FusionB["Fusion B"]
            B_vals["vals()"]
            B_exprs["unordered_exprs()"]
            B_reg["registerVal() / registerExpr()"]
            B_clear["clear()"]
        end
    
        subgraph IrContainer["Shared IrContainer"]
            raw_vals["vals_up_ / vals_ (all statements)"]
            pf_vals["per_fusion_vals_[A] = {v0, v1}"]
            pf_vals_b["per_fusion_vals_[B] = {v2, v3}"]
            pf_exprs["per_fusion_exprs_[A] = {e0}"]
            pf_exprs_b["per_fusion_exprs_[B] = {e1}"]
            transfer["transferStatementOwnership()"]
            remove["removeStatementsOwnedBy()"]
        end
    
        A_vals -->|"valsOwnedBy(A)"| pf_vals
        A_exprs -->|"exprsOwnedBy(A)"| pf_exprs
        A_det -->|"deterministicValsOwnedBy(A)"| pf_vals
        A_reg -->|"inserts into"| pf_vals
        A_reg -->|"inserts into"| pf_exprs
        A_clear -->|"calls"| remove
    
        B_vals -->|"valsOwnedBy(B)"| pf_vals_b
        B_exprs -->|"exprsOwnedBy(B)"| pf_exprs_b
        B_reg -->|"inserts into"| pf_vals_b
        B_reg -->|"inserts into"| pf_exprs_b
        B_clear -->|"calls"| remove
    
        pf_vals --- raw_vals
        pf_vals_b --- raw_vals
    
    Loading

    Last reviewed commit: 8b162d9

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    4 files reviewed, 5 comments

    Edit Code Review Agent Settings | Greptile

    Comment on lines +223 to +249
    void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
    auto vals_it = per_fusion_vals_.find(fusion);
    if (vals_it != per_fusion_vals_.end()) {
    for (auto it = vals_up_.begin(); it != vals_up_.end();) {
    if (vals_it->second.count(it->get()) > 0) {
    vals_.erase(it->get());
    it = vals_up_.erase(it);
    } else {
    ++it;
    }
    }
    per_fusion_vals_.erase(vals_it);
    }

    auto exprs_it = per_fusion_exprs_.find(fusion);
    if (exprs_it != per_fusion_exprs_.end()) {
    for (auto it = exprs_up_.begin(); it != exprs_up_.end();) {
    if (exprs_it->second.count(it->get()) > 0) {
    exprs_.erase(it->get());
    it = exprs_up_.erase(it);
    } else {
    ++it;
    }
    }
    per_fusion_exprs_.erase(exprs_it);
    }
    }
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    O(n) linear scan on deque erasure

    removeStatementsOwnedBy iterates the entire vals_up_ and exprs_up_ deques, and calls erase on the deque mid-iteration. Each std::deque::erase in the middle is O(n), making this overall O(n²) in the number of statements.

    For large fusions (which NVIDIA workloads commonly are), this could become a noticeable performance bottleneck, especially since Fusion::clear() is called from the destructor and from copy operations. Consider using a remove-then-erase idiom or std::erase_if (C++20) to reduce this to a single pass:

    Suggested change
    void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
    auto vals_it = per_fusion_vals_.find(fusion);
    if (vals_it != per_fusion_vals_.end()) {
    for (auto it = vals_up_.begin(); it != vals_up_.end();) {
    if (vals_it->second.count(it->get()) > 0) {
    vals_.erase(it->get());
    it = vals_up_.erase(it);
    } else {
    ++it;
    }
    }
    per_fusion_vals_.erase(vals_it);
    }
    auto exprs_it = per_fusion_exprs_.find(fusion);
    if (exprs_it != per_fusion_exprs_.end()) {
    for (auto it = exprs_up_.begin(); it != exprs_up_.end();) {
    if (exprs_it->second.count(it->get()) > 0) {
    exprs_.erase(it->get());
    it = exprs_up_.erase(it);
    } else {
    ++it;
    }
    }
    per_fusion_exprs_.erase(exprs_it);
    }
    }
    void IrContainer::removeStatementsOwnedBy(const Fusion* fusion) {
    auto vals_it = per_fusion_vals_.find(fusion);
    if (vals_it != per_fusion_vals_.end()) {
    const auto& owned_vals = vals_it->second;
    std::erase_if(vals_up_, [&](const std::unique_ptr<Val>& v) {
    if (owned_vals.count(v.get()) > 0) {
    vals_.erase(v.get());
    return true;
    }
    return false;
    });
    per_fusion_vals_.erase(vals_it);
    }
    auto exprs_it = per_fusion_exprs_.find(fusion);
    if (exprs_it != per_fusion_exprs_.end()) {
    const auto& owned_exprs = exprs_it->second;
    std::erase_if(exprs_up_, [&](const std::unique_ptr<Expr>& e) {
    if (owned_exprs.count(e.get()) > 0) {
    exprs_.erase(e.get());
    return true;
    }
    return false;
    });
    per_fusion_exprs_.erase(exprs_it);
    }
    }

    Comment on lines +191 to +203
    const std::unordered_set<Val*>& IrContainer::valsOwnedBy(
    const Fusion* fusion) const {
    static const std::unordered_set<Val*> empty;
    auto it = per_fusion_vals_.find(fusion);
    return it != per_fusion_vals_.end() ? it->second : empty;
    }

    const std::unordered_set<Expr*>& IrContainer::exprsOwnedBy(
    const Fusion* fusion) const {
    static const std::unordered_set<Expr*> empty;
    auto it = per_fusion_exprs_.find(fusion);
    return it != per_fusion_exprs_.end() ? it->second : empty;
    }
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Static local variables for empty sets

    valsOwnedBy and exprsOwnedBy both use static const local variables to return an empty set by reference when a Fusion has no entries. While functionally correct and safe (since they are const), using static locals in what could become multi-threaded code is worth noting. The const-ness avoids race conditions on the data itself, but if future code ever removes the const, the static would become a thread-safety hazard. An alternative is to use an inline static const class member.

    This is a minor style observation — the current implementation is safe.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Comment on lines 112 to +117
    IrContainer::swap(*(a.ir_container()), *(b.ir_container()));

    // Fix parent pointers after swapping containers
    // After swap, each Fusion owns a different IrContainer, so we must
    // update the parent backpointers in those containers to point to their new
    // owners
    // After swapping container contents, per-Fusion tracking keys point to the
    // wrong Fusions. Rename: a's container had b's entries, b's had a's.
    a.ir_container()->transferStatementOwnership(&b, &a);
    b.ir_container()->transferStatementOwnership(&a, &b);
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Swap + transfer is not safe when a and b share the same container

    When a and b share the same IrContainer (i.e., a.ir_container_.get() == b.ir_container_.get()):

    1. IrContainer::swap(*Ca, *Ca) is a no-op (swapping with itself)
    2. a.ir_container()->transferStatementOwnership(&b, &a) merges b's entries into a's
    3. b.ir_container()->transferStatementOwnership(&a, &b) then moves all entries (original a + original b) into b's — leaving a with nothing

    This may not be reachable today, but if shared containers are the end-goal, this path would need a guard (e.g., early return if a.ir_container_.get() == b.ir_container_.get()).

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 18, 2026

    Additional Comments (2)

    csrc/fusion.cpp
    Shared-container safety gap in rollback

    In removeStatementsCreatedAfter, statements are popped from the back of the container-global exprs_up_ deque (line 445), but only the current Fusion's per-fusion map is cleaned up (line 449: c->per_fusion_exprs_[this].erase(e)).

    In a shared container scenario where another Fusion has appended statements after the StatementGuard snapshot, the statement at the back of exprs_up_ could belong to the other Fusion. This code would:

    1. Destroy the unique_ptr (freeing the Expr memory)
    2. Remove it from exprs_ (global set)
    3. Only erase from per_fusion_exprs_[this] — leaving a dangling pointer in the other Fusion's per-fusion set

    The same issue applies to the vals_up_ loop below (line 467).

    This is safe for single-Fusion containers today, but since this PR's stated goal is enabling shared-container safety, the rollback path should also clean up the correct owner's per-fusion map — e.g., by looking up which Fusion actually owns each statement being removed.


    csrc/fusion.h
    numExprs/numVals still return container-global counts

    numExprs() and numVals() forward directly to the container's total counts, not per-Fusion counts. This is inconsistent with the fact that vals(), unordered_exprs(), and deterministic_* now return per-Fusion filtered results.

    More importantly, StatementGuard uses numExprs() and numVals() to snapshot statement counts and later passes them to removeStatementsCreatedAfter, which pops from the global deques. In a shared container where another Fusion adds statements concurrently, the snapshot will include the other Fusion's additions, and the rollback will remove them — destroying statements that don't belong to this Fusion.

    For single-Fusion containers this is fine, but these methods should be updated to return per-Fusion counts (or at least documented as container-global) before shared containers are actually used.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant