fix: stop idle-reaper restart resurrection leaking dm volumes #48

Merged
markovejnovic merged 6 commits into main from fix/idle-reaper-restart-resurrection
Jul 1, 2026

@markovejnovic markovejnovic commented Jul 1, 2026

Contributor

No description provided.

A :permanent DynamicSupervisor child is restarted even after its intentional idle
{:stop, :normal}; init/1 then re-creates the dm resource that terminate/2 just
destroyed, resurrecting and leaking hyper-rw-<vm> volumes and image/layer
dm chains. Make Img.Mutable, Img.Server and Layer.Server :temporary so
idle teardown is final.
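
The restart behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `IdleWorker` and `RestartDemo` are hypothetical names, the idle timeout is shortened to 20 ms, and no dm resource is touched. It assumes only the standard `GenServer` and `DynamicSupervisor` APIs.

```elixir
defmodule IdleWorker do
  # Hypothetical stand-in for a server such as Img.Mutable. It stops itself
  # with :normal once it has been idle for `idle_ms` milliseconds.
  use GenServer, restart: :temporary

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # In the real servers this is where the external resource would be created.
    idle_ms = Keyword.fetch!(opts, :idle_ms)
    # The third element arms the idle timeout: :timeout is delivered if no
    # message arrives within idle_ms.
    {:ok, %{}, idle_ms}
  end

  @impl true
  def handle_info(:timeout, state), do: {:stop, :normal, state}
end

defmodule RestartDemo do
  # Starts one IdleWorker with the given restart value, waits for its idle
  # stop, and returns how many children the supervisor still considers active.
  def children_after_idle(restart) do
    # max_restarts is raised so the :permanent case does not trip the
    # supervisor's restart-intensity limit during the demo.
    {:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one, max_restarts: 1_000)
    spec = Supervisor.child_spec({IdleWorker, idle_ms: 20}, restart: restart)
    {:ok, pid} = DynamicSupervisor.start_child(sup, spec)
    ref = Process.monitor(pid)

    receive do
      {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
    after
      1_000 -> raise "worker did not idle out"
    end

    # Give the supervisor a moment to process the exit of its child.
    Process.sleep(50)
    %{active: active} = DynamicSupervisor.count_children(sup)
    :ok = DynamicSupervisor.stop(sup)
    active
  end
end
```

Calling `RestartDemo.children_after_idle(:permanent)` should report one active child, because the supervisor restarts the worker and `init/1` runs again, while `RestartDemo.children_after_idle(:temporary)` should report none.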

Add a per-node Registry (Hyper.Node.Img.MutableRegistry) keyed by vm_id and
name each Img.Mutable through it, giving a one-mutable-per-vm invariant and a
cheap, non-blocking list of live mutable vm_ids for the Reaper to consult.
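
As an illustration of naming a process through a unique-key Registry, here is a small sketch. `MutableSketch` and `MutableSketch.Registry` are hypothetical names standing in for Img.Mutable and Hyper.Node.Img.MutableRegistry; the actual implementation in the PR may differ.

```elixir
defmodule MutableSketch do
  # Hypothetical stand-in for Img.Mutable, registered under its vm_id.
  use GenServer, restart: :temporary

  @registry MutableSketch.Registry

  def registry, do: @registry

  def start_link(vm_id) do
    GenServer.start_link(__MODULE__, vm_id, name: via(vm_id))
  end

  # A :via tuple makes the Registry the name service for this process.
  def via(vm_id), do: {:via, Registry, {@registry, vm_id}}

  # Lists every registered key. Registry.select reads the registry's own
  # tables, so no message is sent to the registered processes.
  def active_vm_ids do
    Registry.select(@registry, [{{:"$1", :_, :_}, [], [:"$1"]}])
  end

  @impl true
  def init(vm_id), do: {:ok, vm_id}
end

{:ok, _} = Registry.start_link(keys: :unique, name: MutableSketch.registry())
{:ok, _} = MutableSketch.start_link("vm-a")
{:ok, _} = MutableSketch.start_link("vm-b")

# A second process for the same vm_id is refused by the unique registry.
{:error, {:already_started, _pid}} = MutableSketch.start_link("vm-a")

IO.inspect(Enum.sort(MutableSketch.active_vm_ids()))
```

With `keys: :unique`, the second `start_link("vm-a")` fails with `{:error, {:already_started, pid}}`, which is what enforces the one-per-vm invariant in this sketch.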

The reaper's liveness set was the union of VMSupervisor children and cluster Routing,
which omits a vm whose mutable layer is alive but whose VM is mid-boot (not yet
routed) or in its post-stop idle grace. Also union in Img.Mutable.active_vm_ids/0 so the
reaper never removes a hyper-rw volume that a live owner still holds.
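
The gap in the old liveness set can be shown with plain sets. This sketch uses hypothetical vm_ids and a hypothetical `LivenessSketch` module; the three lists stand in for the VMSupervisor children, the cluster Routing entries, and the result of Img.Mutable.active_vm_ids/0.

```elixir
defmodule LivenessSketch do
  # Unions any number of vm_id lists into a single MapSet.
  def live_set(sources) do
    Enum.reduce(sources, MapSet.new(), fn ids, acc ->
      MapSet.union(acc, MapSet.new(ids))
    end)
  end

  # Example inputs: "vm-3" has a live mutable layer but is still mid-boot,
  # so it is neither supervised-and-routed nor visible to the old set.
  def example do
    supervised = ["vm-1"]
    routed = ["vm-1", "vm-2"]
    mutable_owners = ["vm-2", "vm-3"]

    %{
      old: live_set([supervised, routed]),
      new: live_set([supervised, routed, mutable_owners])
    }
  end
end

%{old: old, new: new} = LivenessSketch.example()
IO.inspect(MapSet.member?(old, "vm-3"), label: "old set protects vm-3")
IO.inspect(MapSet.member?(new, "vm-3"), label: "new set protects vm-3")
```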

codecov Bot commented Jul 1, 2026


Codecov Report

❌ Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/hyper/node/img/mutable.ex | 33.33% | 2 Missing ⚠️ |
| lib/hyper/node/reaper.ex | 0.00% | 2 Missing ⚠️ |

github-actions Bot commented Jul 1, 2026


Test Results

377 tests  +6   376 ✅ +6   5s ⏱️ -1s
 67 suites +3     1 💤 ±0 
  2 files   ±0     0 ❌ ±0 

Results for commit c2ef242. ± Comparison against base commit e5e09ef.

♻️ This comment has been updated with latest results.

The restart: :temporary requirement on the refcounted layer servers is a
symptom of coupling external-resource lifetime to BEAM-process lifetime via
terminate/2. Add a TODO(arch) at the reaper sweep (the reconciler-in-waiting)
proposing reconciliation as the primary cleanup mechanism, plus a why-note at
each :temporary site pointing to it.
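
One way to picture reconciliation as the cleanup mechanism is a pure planning step that compares what exists on the node with what has a live owner. This is only a sketch of the idea named in the TODO, not the design the author has in mind: `ReconcileSketch.plan/2` is a hypothetical function, and listing or destroying real dm volumes is deliberately left out.

```elixir
defmodule ReconcileSketch do
  @prefix "hyper-rw-"

  # observed: volume names found on the node (hypothetical input).
  # live: MapSet of vm_ids that still have a live owner.
  # Names without the hyper-rw- prefix are ignored entirely.
  def plan(observed, live) do
    {keep, destroy} =
      observed
      |> Enum.filter(&String.starts_with?(&1, @prefix))
      |> Enum.split_with(fn name ->
        MapSet.member?(live, String.replace_prefix(name, @prefix, ""))
      end)

    %{keep: keep, destroy: destroy}
  end
end

plan = ReconcileSketch.plan(["hyper-rw-a", "hyper-rw-b", "unrelated"], MapSet.new(["a"]))
IO.inspect(plan)
```

Because the plan is derived from observed state rather than from a process's terminate/2 running, it would not depend on how or whether a given server exited.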
@markovejnovic markovejnovic merged commit c3a9766 into main Jul 1, 2026
6 checks passed
@markovejnovic markovejnovic deleted the fix/idle-reaper-restart-resurrection branch July 1, 2026 04:14