fix(ui): guard Barnes-Hut octree against unbounded recursion on coincident nodes #821

Closed
Ljove02 wants to merge 2 commits into DeusData:main from Ljove02:fix/ui-graph-octree-oom

Conversation

Ljove02 (Contributor) commented Jul 3, 2026

What & why

octree_insert in src/ui/layout3d.c never terminates when two bodies occupy
the same (or nearly the same) position: it keeps pushing them into the same
child octant, halving half_size each level, until it allocates octree cells
without bound. On large graphs — where coincident/near-coincident positions are
common — this exhausts memory and freezes / SIGKILLs the UI process.

Root cause behind:

The fix

Add a depth / minimum-cell-size floor to octree_insert. When reached (points
are effectively coincident) we stop subdividing and fold the body into the cell
as a mass-weighted aggregate — how Barnes-Hut already treats a far cluster. This
restores the intended O(n log n) time / O(n) memory for any input.
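
A minimal sketch of such a guard, assuming a simplified cell layout. Every `sketch_` name is hypothetical and is not the actual `layout3d.c` code; the two floor values are the ones quoted later in this thread for the merged commit. Once a cell is too deep or too small, the body is folded into the cell's mass-weighted centroid instead of triggering another split:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names; floor values as quoted in the follow-up commit. */
#define SKETCH_MAX_DEPTH 26
#define SKETCH_MIN_HALF 1e-4f

typedef struct sketch_cell {
    float cx, cy, cz, half;       /* cell centre and half edge length */
    float mass;                   /* total mass folded into this cell */
    float mx, my, mz;             /* mass-weighted centroid */
    int count;                    /* number of bodies under this cell */
    int internal;                 /* 1 once the cell has been split */
    struct sketch_cell *child[8]; /* octants, allocated lazily */
} sketch_cell;

static int sketch_cells_allocated = 0;

static sketch_cell *sketch_new(float cx, float cy, float cz, float half) {
    sketch_cell *c = calloc(1, sizeof *c);
    if (!c) abort();
    c->cx = cx; c->cy = cy; c->cz = cz; c->half = half;
    sketch_cells_allocated++;
    return c;
}

/* Fold a body into the cell's aggregate (requires m > 0). */
static void sketch_accumulate(sketch_cell *c, float x, float y, float z, float m) {
    assert(m > 0.0f);
    float total = c->mass + m;
    c->mx = (c->mx * c->mass + x * m) / total;
    c->my = (c->my * c->mass + y * m) / total;
    c->mz = (c->mz * c->mass + z * m) / total;
    c->mass = total;
    c->count++;
}

static void sketch_insert(sketch_cell *c, float x, float y, float z, float m, int depth);

/* Route a body into the octant that contains it, creating the child on demand. */
static void sketch_push_down(sketch_cell *c, float x, float y, float z, float m, int depth) {
    int o = (x >= c->cx) | ((y >= c->cy) << 1) | ((z >= c->cz) << 2);
    if (!c->child[o]) {
        float h = c->half * 0.5f;
        c->child[o] = sketch_new(c->cx + ((o & 1) ? h : -h),
                                 c->cy + ((o & 2) ? h : -h),
                                 c->cz + ((o & 4) ? h : -h), h);
    }
    sketch_insert(c->child[o], x, y, z, m, depth + 1);
}

static void sketch_insert(sketch_cell *c, float x, float y, float z, float m, int depth) {
    if (!c->internal) {
        if (c->count == 0) {            /* empty leaf: just store the body */
            sketch_accumulate(c, x, y, z, m);
            return;
        }
        /* The guard: too deep or too small means the points are effectively
         * coincident, so fold into this cell instead of splitting again. */
        if (depth >= SKETCH_MAX_DEPTH || c->half < SKETCH_MIN_HALF) {
            sketch_accumulate(c, x, y, z, m);
            return;
        }
        /* Occupied leaf above the floor: split and push the resident body down. */
        c->internal = 1;
        sketch_push_down(c, c->mx, c->my, c->mz, c->mass, depth);
    }
    sketch_accumulate(c, x, y, z, m);
    sketch_push_down(c, x, y, z, m, depth);
}
```

With this shape, inserting many bodies at one identical position allocates at most one cell per level down to the floor, rather than one cell per halving without end.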

With the runaway gone, the caps that were masking it are lifted:

  • server HARD_MAX_NODES 10000 -> 200000 (DEFAULT_MAX_NODES 2000 -> 60000)
  • UI GRAPH_RENDER_NODE_LIMIT 2000 -> 60000

Verification

A real 45,511-node / 66,839-edge project: /api/layout returns all 45,511 nodes
in ~2.9s, memory flat, no freeze; renders through the existing InstancedMesh
unchanged. Previously this froze the machine (34 GB swap) during layout.

The cap values are a policy choice — happy to gate behind CBM_UI_MAX_RENDER_NODES
or split into a separate commit if you'd prefer the octree guard alone.

…ident nodes

octree_insert never terminated when bodies shared (or nearly shared) a
position: half_size shrank toward zero and octree cells were allocated
without bound, exhausting memory and freezing / SIGKILLing the UI on large
graphs (DeusData#498, DeusData#726, DeusData#402). Add a depth/min-size floor that folds coincident
points into an aggregate cell (mass-weighted centroid), restoring the
octree's O(n log n) time and O(n) memory for any input.

With the runaway removed, lift the node caps that were masking it: server
HARD_MAX_NODES 10000 -> 200000 (default 2000 -> 60000) and the UI's
GRAPH_RENDER_NODE_LIMIT 2000 -> 60000. A 45,511-node / 66,839-edge project
now lays out in ~2.9s and renders via the existing InstancedMesh; it
previously froze the machine.

Signed-off-by: Ljove02 <135197334+Ljove02@users.noreply.github.com>

Ljove02 requested a review from DeusData as a code owner July 3, 2026 20:53

DeusData (Owner) commented Jul 3, 2026

Heads-up: the security / codeql-gate failure on this PR is not your change — it was a repo-side gate bug (the check scanned only the 5 newest CodeQL runs and lost track of PRs on a busy queue). That's fixed on main now (#820). Any push to your branch — or simply clicking GitHub's Update branch button — will trigger a fresh run with the fixed gate and it should go green. Sorry for the noise, and thanks for your patience!

DeusData added the bug, stability/performance, ux/behavior, and priority/high labels Jul 4, 2026

DeusData (Owner) commented Jul 4, 2026

Thank you — the octree recursion guard was exactly right, and the coincident-point analysis (unbounded subdivision once half_size underflows the coordinate ULP) precisely explained the 34GB-swap freezes. We carried the guard half over the line as 2e8f0f0 (PR #848) with you credited as co-author: the depth+half_size floor with the mass-weighted centroid fold, and the HARD_MAX_NODES raise so power users of CBM_UI_MAX_RENDER_NODES get the higher ceiling. We deferred the default render-cap change (2000→60000) — that's a real UX/perf policy shift for every user (multi-MB layout JSON, ~3s server layout), and it deserves its own PR and discussion; please do open that as a follow-up, it's a reasonable ask on its own merits. Refs #498/#726/#402 (awaiting reporter confirmation). Closing in favor of the distill — thanks again for a sharp fix!

DeusData closed this Jul 4, 2026

rarepops pushed a commit to rarepops/codebase-memory-mcp that referenced this pull request Jul 4, 2026

Coincident (or sub-ULP-separated) bodies made octree_insert subdivide
forever in the graph-UI 3D layout, calloc-ing one cell per level until
the process crashed (stack overflow) or froze the machine allocating
(the 34GB-swap reports). Guard distilled from DeusData#821: octree_insert now
carries a depth and stops at depth 26 or half_size < 1e-4f, folding the
body into the cell as a mass-weighted centroid aggregate
(body_index = -1). octree_repulse already clamps d to 0.01 before the
dx/d division, so folded coincident bodies get exactly zero force and
no NaN.
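
The zero-force claim can be illustrated with a small sketch. The inverse-square force law, the `strength` parameter and all `sketch_` names are assumptions made for illustration; only the 0.01 clamp on `d` before the `dx/d` division comes from the commit message above.

```c
/* Hypothetical names; only the 0.01 clamp value is taken from the commit text. */
#define SKETCH_MIN_DIST 0.01f

/* One axis of a repulsive force. `delta` is the displacement along that axis
 * and `d` the Euclidean distance between the two bodies, computed by the
 * caller. The inverse-square law is an assumed stand-in for the real one. */
static float sketch_axis_force(float delta, float d, float strength) {
    if (d < SKETCH_MIN_DIST) {
        d = SKETCH_MIN_DIST;  /* clamp first: without it, delta / d would be 0 / 0 */
    }
    float magnitude = strength / (d * d);
    return magnitude * (delta / d);  /* delta == 0 gives exactly 0, never NaN */
}
```

For two bodies folded onto the same position, `delta` and `d` are both zero, so the clamp turns what would be `0 / 0` into `0 / 0.01`, and the resulting force component is exactly zero.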

The default-cap raise bundled in DeusData#821 (DEFAULT_MAX_NODES and
GRAPH_RENDER_NODE_LIMIT 2000 -> 60000) is a UX policy change deferred
to its own discussion per review; HARD_MAX_NODES is raised to 200000 so
opt-in CBM_UI_MAX_RENDER_NODES users get the new ceiling.
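
How an opt-in override might interact with the two limits can be sketched as follows. The parsing rules and the `sketch_` names are assumptions, not the server's actual code; only the values 2000 and 200000 and the variable name `CBM_UI_MAX_RENDER_NODES` come from this thread.

```c
#include <errno.h>
#include <stdlib.h>

/* Values quoted in this thread; the names are hypothetical. */
#define SKETCH_DEFAULT_MAX_NODES 2000
#define SKETCH_HARD_MAX_NODES 200000

/* Resolve the effective node cap from an optional override string.
 * Assumed policy: unset, malformed, out-of-range or non-positive input
 * falls back to the default; larger values are clamped to the hard ceiling. */
static int sketch_resolve_max_nodes(const char *env_value) {
    if (env_value == NULL || *env_value == '\0') {
        return SKETCH_DEFAULT_MAX_NODES;
    }
    char *end = NULL;
    errno = 0;
    long v = strtol(env_value, &end, 10);
    if (errno != 0 || end == env_value || *end != '\0' || v <= 0) {
        return SKETCH_DEFAULT_MAX_NODES;
    }
    return v > SKETCH_HARD_MAX_NODES ? SKETCH_HARD_MAX_NODES : (int)v;
}
```

A caller would pass `getenv("CBM_UI_MAX_RENDER_NODES")` directly, since `getenv` returns `NULL` when the variable is unset.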

Guard test layout_coincident_nodes_bounded drives the public layout API
with same-file nodes whose distinct qualified names share one 32-bit
FNV-1a hash (bit-identical coincident positions on every platform), in
a fork+alarm child so the unfixed runaway cannot take down the runner.
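
The fork+alarm pattern mentioned above can be sketched like this for POSIX systems. The helper and workload names are hypothetical and this is not the repository's test code; it only shows how a child process with an alarm keeps a non-terminating workload from hanging the parent.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn() in a child process with a wall-clock limit.
 * Returns 1 if fn() returned 0 in time, 0 if it failed, was killed or
 * timed out, and -1 if the child could not be created or reaped. */
static int sketch_run_bounded(int (*fn)(void), unsigned seconds) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        alarm(seconds);            /* default SIGALRM action terminates the child */
        _exit(fn() == 0 ? 0 : 1);
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Example workloads: one that finishes, one that never does. */
static int sketch_quick(void) {
    return 0;
}

static int sketch_runaway(void) {
    volatile unsigned long n = 0;
    for (;;) {
        n++;                       /* stands in for the unfixed endless subdivision */
    }
    return 0;
}
```

Here `sketch_run_bounded(sketch_runaway, 1)` reports failure after about a second instead of blocking forever. An alarm bounds only time, so a harness that also needs to bound memory would have to add a separate limit.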

Refs DeusData#498, DeusData#726, DeusData#402

Co-authored-by: Ljove02 <135197334+Ljove02@users.noreply.github.com>
Signed-off-by: Martin Vogel <martin.vogel.tech@gmail.com>

Labels

  • bug: Something isn't working
  • priority/high: Needs near-term maintainer attention; high-impact bug, regression, safety issue, or release blocker.
  • stability/performance: Server crashes, OOM, hangs, high CPU/memory
  • ux/behavior: Display bugs, docs, adoption UX