Add alias-aware token threading for memory operations #89
shreyas-omkar wants to merge 1 commit into JuliaGPU:main from
Introduce alias-analysis-based token threading:
- Group pointers into alias sets.
- Maintain per-alias-set token chains.
- Thread tokens only between potentially aliasing operations.
- Conservatively fall back to the global set for unknown pointers.
- Preserve existing control-flow token merging semantics.

Enables independent memory operations to execute without unnecessary serialization.
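The per-alias-set chaining described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `MemOp`, `GLOBAL_SET`, `alias_set`, and `thread_tokens` are hypothetical names, alias sets are plain symbols, and loads and stores are treated alike (a real implementation would likely let reads of the same set run unordered).

```julia
# Hypothetical names throughout; not the PR's actual types or functions.
const GLOBAL_SET = :global  # conservative alias set for unknown pointers

struct MemOp
    name::String
    ptr::Symbol  # pointer the operation accesses
end

# Known pointers map to their alias set; unknown pointers fall back to GLOBAL_SET.
alias_set(aliases::Dict{Symbol,Symbol}, ptr::Symbol) = get(aliases, ptr, GLOBAL_SET)

# For each operation, list the operations whose tokens it must wait on.
function thread_tokens(ops::Vector{MemOp}, aliases::Dict{Symbol,Symbol})
    last_in_set = Dict{Symbol,String}()  # tail of each per-alias-set token chain
    deps = Dict{String,Vector{String}}()
    for op in ops
        set = alias_set(aliases, op.ptr)
        if set === GLOBAL_SET
            # May alias anything: wait on the tail of every chain, then reset them.
            deps[op.name] = sort!(collect(values(last_in_set)))
            empty!(last_in_set)
        else
            # Wait on the previous op in the same set and on the last global op, if any.
            deps[op.name] = String[last_in_set[s] for s in (set, GLOBAL_SET)
                                   if haskey(last_in_set, s)]
        end
        last_in_set[set] = op.name
    end
    return deps
end

# Two arrays in distinct alias sets, plus one store through an unknown pointer.
aliases = Dict(:a => :A, :b => :B)
ops = [MemOp("load_a", :a), MemOp("load_b", :b),
       MemOp("store_a", :a), MemOp("store_x", :x)]
deps = thread_tokens(ops, aliases)
```

In this toy run `load_b` waits on nothing, so it is not serialized behind `load_a`; `store_a` waits only on `load_a`; and `store_x`, whose pointer is unknown, waits on the tails of both chains.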
maleadt left a comment
Did you test this with a concrete example that would benefit from it?
    for arg in stmt.args[2:end]
        # Find the pointer argument and propagate
        arg_aliases = tracker[arg]
        if arg_aliases !== ALIAS_UNIVERSE || arg_aliases isa Set
What else can arg_aliases be if not ALIAS_UNIVERSE or an AliasSet?
Yes, this condition is redundant. Will fix it.
    function is_tile_array_constructor(func)
        # Check if this is a TileArray constructor call
        # You'll need to detect the specific GlobalRef for TileArray
        return false # TODO: implement
    end
TileArrays are never constructed in the kernel. Or do you mean tensor and partition views?
You're right, that was a misnaming on my part. I'm renaming this to is_partition_or_tensor_view and implementing it to detect partition/tensor view call sites. The intent was to identify the point where a new SSA value gets a distinct alias set rooted at a specific base argument.
    # Block has args, body, terminator
    # body is an iterator that yields (ssa_idx, entry) where entry has .stmt and .typ
    for (ssa_idx, entry) in block.body
        analyze_statement!(tracker, SSAValue(ssa_idx), entry.stmt)
    end
    return
The flat traversal was intentional as a first pass: I wanted to establish correct alias propagation at the top level before handling the loop/branch cases, since nested blocks raise questions about how loop-carried pointer SSA values should inherit alias sets across iterations. I will add the recursion now and descend into nested blocks from analyze_statement!. I have a benchmark with an interleaved multi-array kernel in progress to confirm that per-alias chains form correctly across branch boundaries before pushing.
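For illustration, descending into nested blocks from the statement visitor might look roughly like the sketch below. It uses a hypothetical toy IR (`ToyStmt` with a `regions` field), not the project's real block and statement types. The two-argument `analyze_statement!` here only records visit order as a stand-in for alias propagation, and the sketch does not address how loop-carried values should inherit alias sets.

```julia
# Hypothetical toy IR; the real block and statement types differ.
struct ToyStmt
    name::Symbol
    regions::Vector{Vector{ToyStmt}}  # nested blocks, e.g. loop or branch bodies
end
ToyStmt(name::Symbol) = ToyStmt(name, Vector{ToyStmt}[])

function analyze_statement!(visited::Vector{Symbol}, stmt::ToyStmt)
    push!(visited, stmt.name)  # stand-in for the real alias propagation
    for region in stmt.regions
        analyze_block!(visited, region)  # descend into nested blocks
    end
    return visited
end

function analyze_block!(visited::Vector{Symbol}, body::Vector{ToyStmt})
    for stmt in body
        analyze_statement!(visited, stmt)
    end
    return visited
end

# A top-level block containing a loop whose body has two memory operations.
loop = ToyStmt(:for_loop, [[ToyStmt(:load_a), ToyStmt(:store_b)]])
visited = analyze_block!(Symbol[], [ToyStmt(:view_a), loop, ToyStmt(:store_a)])
```

With the recursion in place, the statements inside the loop body are visited between `:for_loop` and `:store_a`, whereas a flat traversal would skip them.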
Feat #1