perf: avoid O(N^2) exiting-branch checks in CodeFolding#8599
perf: avoid O(N^2) exiting-branch checks in CodeFolding#8599Changqing-JING wants to merge 2 commits intoWebAssembly:mainfrom
Conversation
1dae3f3 to
66dff99
Compare
| } | ||
| // Pre-populate the cache once at the top level so all subsequent | ||
| // exitingBranchCache_ lookups are O(1). | ||
| if (num == 0) { |
There was a problem hiding this comment.
- We are called more than once with
num == 0, so I think this is doing more work than needed? (there are three calls to this, two withnum == 0as the default value) - We may also not end up needing the cache at all, if other issues stop us earlier.
- We will also only need the cache for some expressions, not the entire function.
To fix those issues, how about making new line 702 call a function that checks for external break targets. That function would lazily populate a cache internally, that is, given a specific expression it would compute it and cache results for that expression and all children (avoiding walking children already in the cache).
There was a problem hiding this comment.
@kripken
Thank you for review
Addressed all three suggestions: cache is now lazily populated on first hasExitingBranches() call (avoiding work when not needed), uses unordered_set instead of map<Expression*, bool>, and a exitingBranchCachePopulated_ bool prevents redundant re-computation when the cache is empty.
daf81f7 to
f263f08
Compare
|
|
||
| // Cache of expressions that have branches exiting to targets defined | ||
| // outside them. Populated lazily on first access via PostWalker. | ||
| std::unordered_set<Expression*> exitingBranchCache_; |
There was a problem hiding this comment.
| std::unordered_set<Expression*> exitingBranchCache_; | |
| std::unordered_set<Expression*> exitingBranchCache; |
We don't use a convention like that for "internal" things.
|
|
||
| // Cache of expressions that have branches exiting to targets defined | ||
| // outside them. Populated lazily on first access via PostWalker. | ||
| std::unordered_set<Expression*> exitingBranchCache_; |
There was a problem hiding this comment.
Please move this down to the function that uses it. Then a single comment will work for both (atm the comment appears twice).
| // efficient bottom-up traversal. | ||
| bool hasExitingBranches(Expression* expr) { | ||
| if (!exitingBranchCachePopulated_) { | ||
| populateExitingBranchCache(getFunction()->body); |
There was a problem hiding this comment.
Looks like this still scans the entire function. I suggest that we only scan expr itself. That will still avoid re-computing things, but avoid scanning things that we never need to look at.
This does require that the cache store a bool, so we know if we scanned or not, and if we did, if we found branches out or not. But I think that is worth it - usually we will scan very few things.
Follow up PR of #8586 to optimize CodeFolding
optimizeTerminatingTailscallsEffectAnalyzerper tail item, each walking the full subtree. On deeply nested blocks this is O(N^2).Replace the per-item walks with a single O(N) bottom-up
PostWalker(populateExitingBranchCache) that pre-computes exiting-branch results for every node, making subsequent lookups O(1).Example: AssemblyScript GC compiles
__visit_membersas abr_tabledispatch over all types, producing ~N nested blocks with ~N tails. The old code walks each tail's subtree separately -- O(N^2) total node visits. With this change, one bottom-up walk covers all nodes, then each tail lookup is O(1).benchmark data
The test module is from issue #7319
#7319 (comment)
In main head
time ./build/bin/wasm-opt -Oz --enable-bulk-memory --enable-multivalue --enable-reference-types --enable-gc --enable-tail-call --enable-exception-handling -o /dev/null ./test3.wasm real 9m16.111s user 35m33.985s sys 0m51.000sIn the PR