[IR Container] Phase 2 IR Container Refactor #5975

Draft
mdavis36 wants to merge 1 commit into main from md/phase2-ir-refactor

Summary

Complete Phase 2 of the IR Container Refactor: transition IrContainer from unique_ptr to shared_ptr ownership, enabling multiple Fusions to share one container while maintaining independent IR graphs. This lays the foundation for Phase 3 scalar unification and Host IR JIT Intermediate compilation.

The work is organized into two layers: foundation changes (PRs A–B) that restructure Fusion to own all per-Fusion state and control statement registration, and shared_ptr changes (PRs 1–4) that add shared ownership, per-Fusion tracking, copy/move/swap semantics, and thread safety.

Motivation

The nvFuser segmenter splits a user-authored Fusion into segment Fusions, each of which currently gets a full clone of all IR nodes — including scalars that are identical across segments. This duplication prevents a single ExpressionEvaluator from evaluating buffer sizes across segment boundaries, requiring expensive per-segment evaluation.

Phase 2 enables shared container storage so that Phase 3 can reuse scalars:

Phase 1 (unique_ptr — separate containers, duplicated scalars):
  completeFusion → [Container_0] → {s0, s1, tv0, tv1}
  segment_0      → [Container_1] → {s0', s1', tv0', tv1'}    // Full clones
  segment_1      → [Container_2] → {s0'', s1'', tv2', tv3'}  // Full clones

Phase 2 (shared_ptr — shared containers, duplicated scalars):
  completeFusion ─┐
  segment_0      ─┼──→ [Shared Container] → {s0, s1, tv0, tv1, s0', s1', tv0', tv1', s0'', s1'', tv2', tv3'}
  segment_1      ─┘

Phase 3 (shared_ptr — shared containers, shared scalars):
  completeFusion ─┐
  segment_0      ─┼──→ [Shared Container] → {s0, s1, tv0, tv1, tv0', tv1', tv2', tv3'}
  segment_1      ─┘
  // s0, s1 are SHARED (same Val* objects) — not cloned
  // Single ExpressionEvaluator bound to s0, s1 evaluates all segments

What Changed

Foundation (PRs #5954, #5958): restructure Fusion as the owner of per-Fusion state:

| Component | Change |
| --- | --- |
| Special values | zero_val_, one_val_, etc. moved from IrContainer to Fusion (lazy-cached) |
| Axioms & metadata | axioms_, metadata_ moved from IrContainer to Fusion |
| Statement registration | registerVal/registerExpr/removeVal/removeExpr inlined into Fusion |
| StatementGuard | Rollback nulls special val cache pointers to prevent dangling references |
| SubstituteInExpr | Guards against self-substitution (exposed by shared zero_val_ identity) |
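
The lazy caching and rollback behavior in the table can be illustrated with a minimal sketch. MiniVal, MiniFusion, zeroVal(), and rollbackTo() are hypothetical stand-ins invented for this example; they are not the classes or signatures touched by this PR, and the real StatementGuard logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; not nvFuser's real classes.
struct MiniVal {
  long value;
};

class MiniFusion {
 public:
  // Lazily create the cached zero value on first request.
  MiniVal* zeroVal() {
    if (zero_val_ == nullptr) {
      vals_.push_back(std::make_unique<MiniVal>(MiniVal{0}));
      zero_val_ = vals_.back().get();
    }
    return zero_val_;
  }

  std::size_t numVals() const { return vals_.size(); }

  // Roll back to an earlier statement count, as a guard's destructor might.
  void rollbackTo(std::size_t num_vals) {
    while (vals_.size() > num_vals) {
      // Null the cache before the node is destroyed so it cannot dangle.
      if (vals_.back().get() == zero_val_) {
        zero_val_ = nullptr;
      }
      vals_.pop_back();
    }
  }

 private:
  std::vector<std::unique_ptr<MiniVal>> vals_;
  MiniVal* zero_val_ = nullptr;
};
```

If the cache were not nulled during rollback, the next zeroVal() call would return a pointer to a destroyed node; with the reset, the value is simply recreated on demand.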

shared_ptr transition (PRs #5960, #5961, #5964, #5971) — Enable shared container storage:

| Component | Change |
| --- | --- |
| Pointer type | unique_ptr<IrContainer> → shared_ptr<IrContainer> |
| Fusion tracking | IrContainer tracks sharing Fusions via sharing_fusions_ set |
| Per-Fusion tracking | per_fusion_vals_, per_fusion_exprs_ maps for ownership queries |
| Accessor filtering | vals(), deterministic_vals(), etc. return only the calling Fusion's nodes |
| Copy semantics | Copy constructor shares container, clones nodes into shared storage |
| Move/swap | Ownership-filtered pointer swap with three-case handling |
| Name counters | Per-Fusion val_type_name_map_ ensures cloned Vals get matching names |
| Thread safety | std::shared_mutex with two-layer locking (IrContainer self-locking + Fusion ContainerMutator) |
| Fusion::clear() | Uses removeStatementsOwnedBy(this) instead of ir_container()->clear() |
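
As a rough illustration of the two-layer locking idea, the sketch below pairs self-locking accessors with a scoped mutator that holds one exclusive lock across several steps. It is a generic std::shared_mutex pattern written under assumptions: MiniContainer, its nested Mutator, and fillConcurrently() are hypothetical names, and the actual ContainerMutator in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical container for illustration only.
class MiniContainer {
 public:
  // Layer 1: public methods take the lock themselves.
  std::size_t size() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // many readers
    return vals_.size();
  }
  void add(int v) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // one writer
    vals_.push_back(v);
  }

  // Layer 2: hold one exclusive lock across a multi-step mutation.
  class Mutator {
   public:
    explicit Mutator(MiniContainer& c) : c_(c), lock_(c.mutex_) {}
    // These touch storage directly; calling the self-locking methods
    // while this lock is held would deadlock.
    void add(int v) { c_.vals_.push_back(v); }
    std::size_t size() const { return c_.vals_.size(); }

   private:
    MiniContainer& c_;
    std::unique_lock<std::shared_mutex> lock_;
  };

 private:
  mutable std::shared_mutex mutex_;
  std::vector<int> vals_;
};

// Run several writer threads and return the final element count.
std::size_t fillConcurrently(MiniContainer& c, int threads, int per_thread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&c, per_thread] {
      for (int i = 0; i < per_thread; ++i) {
        c.add(i);
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return c.size();
}
```

The design point is that single calls stay safe on their own, while compound updates (for example, registering a Val and updating ownership maps together) can be made atomic by holding the scoped mutator for their whole duration.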

Architecture Before and After

BEFORE (main):
  class Fusion {
      unique_ptr<IrContainer> ir_container_;  // Exclusive ownership
      // Special vals, axioms, metadata, registration — all in IrContainer
  };

AFTER (this PR):
  class Fusion {
      shared_ptr<IrContainer> ir_container_;  // Shared ownership

      // Per-Fusion state (moved from IrContainer):
      Val* zero_val_, *one_val_, *true_val_, *false_val_;
      NamedScalar* magic_zero_val_;
      std::vector<Val*> axioms_;
      std::unordered_map<Val*, Val*> metadata_;
      std::unordered_map<ValType, int64_t> val_type_name_map_;  // Name counters
      int64_t expr_name_counter_;

      // Statement registration inlined (was in IrContainer)
      void registerVal(Val*);
      void registerExpr(Expr*);
      void removeVal(Val*);
      void removeExpr(Expr*);
  };

  class IrContainer {
      mutable std::shared_mutex mutex_;                              // Thread safety
      std::unordered_set<Fusion*> sharing_fusions_;                  // Fusion tracking
      std::unordered_map<Fusion*, std::unordered_set<Val*>> per_fusion_vals_;   // Ownership tracking
      std::unordered_map<Fusion*, std::unordered_set<Expr*>> per_fusion_exprs_;
      // Pure storage: vals_up_, exprs_up_, vals_, exprs_, deterministic_*
  };
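
The ownership tracking and filtered accessors above can be sketched in a few lines. This is a simplified model, not the PR's code: MiniVal, MiniContainer, MiniFusion, create(), valsOwnedBy(), and newVal() are hypothetical names, and only a per_fusion_vals_-style map is modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only.
struct MiniVal {
  int id;
};
class MiniFusion;

class MiniContainer {
 public:
  // Store the node and record which Fusion owns it.
  MiniVal* create(const MiniFusion* owner, int id) {
    storage_.push_back(std::make_unique<MiniVal>(MiniVal{id}));
    MiniVal* v = storage_.back().get();
    per_fusion_vals_[owner].insert(v);
    return v;
  }

  // Ownership query: only the nodes registered by `owner`.
  const std::unordered_set<MiniVal*>& valsOwnedBy(
      const MiniFusion* owner) const {
    static const std::unordered_set<MiniVal*> empty;
    auto it = per_fusion_vals_.find(owner);
    return it == per_fusion_vals_.end() ? empty : it->second;
  }

  std::size_t totalVals() const { return storage_.size(); }

 private:
  std::vector<std::unique_ptr<MiniVal>> storage_;
  std::unordered_map<const MiniFusion*, std::unordered_set<MiniVal*>>
      per_fusion_vals_;
};

class MiniFusion {
 public:
  explicit MiniFusion(std::shared_ptr<MiniContainer> c)
      : container_(std::move(c)) {}

  MiniVal* newVal(int id) { return container_->create(this, id); }

  // Filtered accessor: sees only this Fusion's nodes.
  const std::unordered_set<MiniVal*>& vals() const {
    return container_->valsOwnedBy(this);
  }

 private:
  std::shared_ptr<MiniContainer> container_;
};
```

Two MiniFusion objects built from the same shared_ptr see disjoint vals() sets even though the container stores every node.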

Key Design Invariants

  1. Statement ownership is singular. Each Statement has exactly one owning Fusion.
  2. Fusion accessors filter by ownership. fusion->vals() returns only this Fusion's nodes, so it can differ from ir_container()->vals().
  3. Mutations only affect owned statements. Swap/move never touch other Fusions' nodes.
  4. Fusion::clear() only clears this Fusion's state. Not the shared container.
  5. Moved-from Fusion is valid. Gets a new empty container.
  6. Container lifetime is reference-counted. Destroyed when last Fusion is destroyed.
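
Invariants 4 to 6 can be checked against a toy model. The sketch below is an assumption-laden illustration: ToyContainer and ToyFusion are hypothetical, nodes are reduced to a per-owner count, and the move constructor shows only the simplest case rather than the three-case handling in this PR.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

class ToyFusion;

// Hypothetical shared storage: tracks how many nodes each Fusion owns.
struct ToyContainer {
  std::map<const ToyFusion*, int> owned;
  int total() const {
    int n = 0;
    for (const auto& kv : owned) {
      n += kv.second;
    }
    return n;
  }
};

class ToyFusion {
 public:
  ToyFusion() : c_(std::make_shared<ToyContainer>()) {}
  explicit ToyFusion(std::shared_ptr<ToyContainer> c) : c_(std::move(c)) {}

  // Move: take over the source's nodes, then give it a fresh container
  // so the moved-from object stays valid (invariant 5).
  ToyFusion(ToyFusion&& other) : c_(other.c_) {
    auto it = c_->owned.find(&other);
    if (it != c_->owned.end()) {
      int n = it->second;
      c_->owned.erase(it);
      c_->owned[this] = n;
    }
    other.c_ = std::make_shared<ToyContainer>();
  }

  ~ToyFusion() { clear(); }

  void addNode() { ++c_->owned[this]; }

  // Clears only this Fusion's nodes, never the whole container (invariant 4).
  void clear() { c_->owned.erase(this); }

  int numNodes() const {
    auto it = c_->owned.find(this);
    return it == c_->owned.end() ? 0 : it->second;
  }

  const std::shared_ptr<ToyContainer>& container() const { return c_; }

 private:
  // Reference-counted: storage dies with its last Fusion (invariant 6).
  std::shared_ptr<ToyContainer> c_;
};
```

Observing the container through a std::weak_ptr is one way to confirm that the shared storage outlives any single Fusion and is released only when the last one goes away.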

Validation Results

| Test Suite | Result |
| --- | --- |
| Full C++ suite (6224 tests) | 5280 passed, 2 CUDA OOM (environmental), ~940 skipped |
| Host IR tests (84 tests) | 82 passed, 2 skipped |
| Python repro tests (92 tests) | 80 passed, 12 skipped |
| Python frontend tests (235 tests) | 233 passed, 1 CUDA OOM (environmental) |
| Smoke tests (34 tests) | 34/34 passed |
| ASAN smoke tests (34 tests) | 34/34 passed, ASAN clean |
| ASAN copy/move tests (122 tests) | 121 passed, 1 skipped, ASAN clean |

Zero regressions from the shared_ptr changes. All failures are pre-existing CUDA out-of-memory errors.

PR Chain

This comprehensive PR is composed of six atomic, individually CI-green increments:

Foundation (standalone, CI-clean Fusion API restructuring):

shared_ptr transition:

  1. [IR Container] Phase 2.3 Basic shared ptr (#5960): shared_ptr + Fusion tracking infrastructure
  2. [IR Container] Phase 2.4 Per-fusion statement tracking (#5961): per-Fusion statement tracking and ownership-filtered accessors
  3. [IR Container] Phase 2.5 Copy-Move Semantics (#5964): copy/move/swap semantics + per-Fusion name counters
  4. [IR Container] Phase 2.6 Concurrency & Thread Safety (#5971): thread safety + re-enable parallel compilation + dead code removal
