70 changes: 67 additions & 3 deletions src/passes/CodeFolding.cpp
@@ -63,6 +63,7 @@
 #include "ir/effects.h"
 #include "ir/eh-utils.h"
 #include "ir/find_all.h"
+#include "ir/iteration.h"
 #include "ir/label-utils.h"
 #include "ir/utils.h"
 #include "pass.h"
@@ -299,13 +300,78 @@ struct CodeFolding
       returnTails.clear();
       unoptimizables.clear();
       modifieds.clear();
+      exitingBranchCache.clear();
+      exitingBranchCachePopulated = false;
       if (needEHFixups) {
         EHUtils::handleBlockNestedPops(func, *getModule());
       }
     }
   }

 private:
+  // Cache of expressions that have branches exiting to targets defined
+  // outside them. Populated lazily on first access via hasExitingBranches().
+  std::unordered_set<Expression*> exitingBranchCache;
+  bool exitingBranchCachePopulated = false;
+
+  bool hasExitingBranches(Expression* expr) {
+    if (!exitingBranchCachePopulated) {
+      populateExitingBranchCache(getFunction()->body);
Member

Looks like this still scans the entire function. I suggest that we only scan expr itself. That will still avoid re-computing things, but avoid scanning things that we never need to look at.

This does require that the cache store a bool, so we know if we scanned or not, and if we did, if we found branches out or not. But I think that is worth it - usually we will scan very few things.
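A minimal sketch of the per-expression cache suggested here, written against a toy tree rather than Binaryen's IR. `Node`, `collectUnresolved`, `PerExpressionCache`, and `makeExample` are hypothetical names for illustration, not Binaryen APIs, and the sketch assumes labels are unique and that branches only target enclosing nodes. The map stores a bool per scanned node, so a missing key means "not scanned yet".

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR node (hypothetical, not Binaryen's Expression).
struct Node {
  std::string label;                 // non-empty: defines a branch target
  std::vector<std::string> branches; // labels this node branches to
  std::vector<std::unique_ptr<Node>> children;
};

// Gathers labels branched to inside `n` that `n` does not itself resolve.
// Assumes labels are unique and branches only target enclosing nodes.
void collectUnresolved(const Node& n, std::unordered_set<std::string>& out) {
  for (const auto& child : n.children) {
    collectUnresolved(*child, out);
  }
  for (const auto& name : n.branches) {
    out.insert(name);
  }
  if (!n.label.empty()) {
    out.erase(n.label);
  }
}

struct PerExpressionCache {
  // A key being present means "already scanned"; the value is the result.
  std::unordered_map<const Node*, bool> results;
  std::size_t scans = 0; // number of subtree scans actually performed

  bool hasExitingBranches(const Node& n) {
    auto it = results.find(&n);
    if (it != results.end()) {
      return it->second;
    }
    std::unordered_set<std::string> unresolved;
    collectUnresolved(n, unresolved);
    ++scans;
    bool result = !unresolved.empty();
    results.emplace(&n, result);
    return result;
  }
};

// Builds (block $outer (block $inner (br $outer))).
std::unique_ptr<Node> makeExample() {
  auto br = std::make_unique<Node>();
  br->branches.push_back("outer");
  auto inner = std::make_unique<Node>();
  inner->label = "inner";
  inner->children.push_back(std::move(br));
  auto outer = std::make_unique<Node>();
  outer->label = "outer";
  outer->children.push_back(std::move(inner));
  return outer;
}

// Usage check: the inner block has a branch that leaves it, the outer does not.
void usageExample() {
  auto root = makeExample();
  const Node& inner = *root->children[0];
  PerExpressionCache cache;
  assert(cache.hasExitingBranches(inner));
  assert(!cache.hasExitingBranches(*root));
  assert(cache.hasExitingBranches(inner)); // answered from the cache
  assert(cache.scans == 2);
}
```

Repeated queries on the same node are free in this sketch, but each distinct node still pays for one scan of its own subtree.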

Contributor Author

The per-expression cache would still be O(N^2) in the nested block case. AssemblyScript GC emits __visit_members with deeply nested blocks + br_table, where the nesting level equals the number of classes (4000+ in real apps). Each nested block gets queried by optimizeTerminatingTails, and each query walks its overlapping subtree independently, giving O(N + (N-1) + ... + 1) = O(N^2) total work even with the cache.

We also cannot reuse a child's cached bool to compute a parent's result, because knowing "child has exiting branches" does not tell us which names exit: the parent may define (and so resolve) some of them. To compose results bottom-up, we would need to store the full set of unresolved names per expression. I benchmarked that approach (storing unordered_map<Expression*, unordered_set<Name>> and propagating name sets upward), but the per-node set allocation overhead on millions of nodes made -Oz significantly slower than the baseline (~13min vs ~5min).

The whole-function scan avoids both issues by computing all results in a single O(N) pass using only integer counters, with no per-node name storage.
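To make the cost argument concrete, here is a rough sketch on a toy tree. It is an illustration under stated assumptions, not the PR's implementation: `Node`, `SinglePass`, `perQueryCost`, and `makeNestedChain` are hypothetical names, and the integer-only state used here (the shallowest nesting depth targeted by any branch in a subtree) is just one possible way to avoid per-node name sets. Labels are assumed unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR node (hypothetical, not Binaryen's Expression).
struct Node {
  std::string label;                 // non-empty: defines a branch target
  std::vector<std::string> branches; // labels this node branches to
  std::vector<std::unique_ptr<Node>> children;
};

constexpr int kNoBranch = std::numeric_limits<int>::max();

struct SinglePass {
  std::unordered_map<std::string, int> labelDepth; // labels currently in scope
  std::unordered_set<const Node*> exiting;         // nodes with exiting branches
  std::size_t visits = 0;

  // Returns the shallowest depth targeted by any branch in n's subtree.
  // Recursive for brevity; a real pass would likely use an explicit stack.
  int walk(const Node& n, int depth) {
    ++visits;
    if (!n.label.empty()) {
      labelDepth[n.label] = depth;
    }
    int minTarget = kNoBranch;
    for (const auto& name : n.branches) {
      auto it = labelDepth.find(name);
      // A label that is not in scope is defined outside the walked tree.
      minTarget = std::min(minTarget, it == labelDepth.end() ? -1 : it->second);
    }
    for (const auto& child : n.children) {
      minTarget = std::min(minTarget, walk(*child, depth + 1));
    }
    if (!n.label.empty()) {
      labelDepth.erase(n.label);
    }
    // A target shallower than this node is defined by a strict ancestor.
    if (minTarget < depth) {
      exiting.insert(&n);
    }
    return minTarget;
  }
};

std::size_t countSubtree(const Node& n) {
  std::size_t total = 1;
  for (const auto& child : n.children) {
    total += countSubtree(*child);
  }
  return total;
}

// Nodes visited if every labelled block is answered by rescanning its subtree.
std::size_t perQueryCost(const Node& n) {
  std::size_t cost = n.label.empty() ? 0 : countSubtree(n);
  for (const auto& child : n.children) {
    cost += perQueryCost(*child);
  }
  return cost;
}

// Builds n nested blocks b0..b(n-1); the innermost holds a branch to b0.
std::unique_ptr<Node> makeNestedChain(int n) {
  assert(n > 0);
  auto cur = std::make_unique<Node>();
  cur->branches.push_back("b0");
  for (int i = n - 1; i >= 0; --i) {
    auto block = std::make_unique<Node>();
    block->label = "b" + std::to_string(i);
    block->children.push_back(std::move(cur));
    cur = std::move(block);
  }
  return cur;
}
```

With `makeNestedChain(100)`, `SinglePass::walk` visits each of the 101 nodes once, while `perQueryCost` reports 5150 node visits for one independent scan per block, which is the quadratic growth described above.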

+      exitingBranchCachePopulated = true;
+    }
+    return exitingBranchCache.count(expr);
+  }
+
+  // Pre-populate the exiting branch cache for all sub-expressions of root
+  // in a single O(N) bottom-up walk. After this, exitingBranchCache
+  // lookups are O(1).
+  void populateExitingBranchCache(Expression* root) {
+    struct CachePopulator
+      : public PostWalker<CachePopulator,
+                          UnifiedExpressionVisitor<CachePopulator>> {
+      std::unordered_set<Expression*>& cache;
+      // Track unresolved branch targets at each node. We propagate children's
+      // targets upward: add uses, remove defs. If any remain, the expression
+      // has exiting branches.
+      std::unordered_map<Expression*, std::unordered_set<Name>> targetSets;
+
+      CachePopulator(std::unordered_set<Expression*>& cache) : cache(cache) {}
+
+      void visitExpression(Expression* curr) {
+        std::unordered_set<Name> targets;
+        // Merge children's target sets into ours (move to avoid copies)
+        ChildIterator children(curr);
+        for (auto* child : children) {
+          auto it = targetSets.find(child);
+          if (it != targetSets.end()) {
+            if (targets.empty()) {
+              targets = std::move(it->second);
+            } else {
+              targets.merge(it->second);
+            }
+            targetSets.erase(it);
+          }
+        }
+        // Add branch uses (names this expression branches to)
+        BranchUtils::operateOnScopeNameUses(
+          curr, [&](Name& name) { targets.insert(name); });
+        // Remove branch defs (names this expression defines as targets)
+        BranchUtils::operateOnScopeNameDefs(curr, [&](Name& name) {
+          if (name.is()) {
+            targets.erase(name);
+          }
+        });
+        bool hasExiting = !targets.empty();
+        if (hasExiting) {
+          cache.insert(curr);
+          targetSets[curr] = std::move(targets);
+        }
+      }
+    };
+    CachePopulator populator(exitingBranchCache);
+    populator.walk(root);
+  }
+
// check if we can move a list of items out of another item. we can't do so
// if one of the items has a branch to something inside outOf that is not
// inside that item
@@ -637,9 +703,7 @@ struct CodeFolding
   // TODO: this should not be a problem in
   //       *non*-terminating tails, but
   //       double-verify that
-  if (EffectAnalyzer(
-        getPassOptions(), *getModule(), newItem)
-        .hasExternalBreakTargets()) {
+  if (hasExitingBranches(newItem)) {
     return true;
   }
   return false;