feat(node): periodic Reaper to GC orphaned VM resources#44

Merged
markovejnovic merged 1 commit into main from feat/reaper-orphan-gc
Jun 30, 2026
Conversation

@markovejnovic markovejnovic commented Jun 30, 2026

Contributor

No description provided.

codecov Bot commented Jun 30, 2026

Codecov Report

❌ Patch coverage is 7.31707% with 38 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lib/hyper/node/reaper.ex | 0.00% | 36 Missing ⚠️ |
| lib/hyper/node/fire_vmm/jailer.ex | 0.00% | 2 Missing ⚠️ |

github-actions Bot commented Jun 30, 2026

Test Results

324 tests  +14   323 ✅ +14   4s ⏱️ ±0s
 59 suites + 2     1 💤 ± 0 
  2 files   ± 0     0 ❌ ± 0 

Results for commit c455162. ± Comparison against base commit da96652.

♻️ This comment has been updated with latest results.

An unclean VM death (host OOM-kill, `:erlang.halt`, or a crash before the FireVMM
relaunch-time cleanup runs) can strand a firecracker cgroup leaf and a
`hyper-rw-*` thin volume. If that vm_id never boots again, nothing ever clears
them.

Add `Hyper.Node.Reaper`, a periodic liveness-aware GC started last under
`Hyper.Node`. Each tick it lists the cgroup leaves under the parent cgroup and
the `hyper-rw-*` dm volumes, then subtracts the VMs this node still runs (via
the VM supervisor and the routing CRDT). Only after a leaf shows up orphaned on
two consecutive ticks (`Plan.confirm/2`, so a VM mid-boot is never reaped) does
it remove the chroot/cgroup via the helper and the dm volume. Entirely
best-effort: every failure is logged and the next tick retries.

The decision logic is the pure `Hyper.Node.Reaper.Plan` (rw_ids/orphans/confirm),
covered by example + property tests; the GenServer is a thin shell around it. The
tick interval is a fixed constant -- the two-strike confirmation already makes
the exact value non-load-bearing, so nothing here is configurable.
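
Keeping the decision logic pure is what makes it cheap to property-test. A rough Python analogue is sketched below. The source only names `rw_ids`, `orphans`, and `confirm` (and gives `confirm` an arity of 2), so the argument shapes, the exact `hyper-rw-<id>` naming, and the set-intersection semantics of `confirm` are assumptions.

```python
import random
from typing import FrozenSet, Iterable

RW_PREFIX = "hyper-rw-"  # prefix from the commit message; the suffix format is assumed


def rw_ids(volume_names: Iterable[str]) -> FrozenSet[str]:
    """Extract vm ids from dm volume names, ignoring unrelated volumes."""
    return frozenset(
        name[len(RW_PREFIX):]
        for name in volume_names
        if name.startswith(RW_PREFIX) and len(name) > len(RW_PREFIX)
    )


def orphans(observed: Iterable[str], live: Iterable[str]) -> FrozenSet[str]:
    """Ids that still have resources but no VM running on this node."""
    return frozenset(observed) - frozenset(live)


def confirm(previous: Iterable[str], current: Iterable[str]) -> FrozenSet[str]:
    """Ids orphaned on both the previous and the current tick."""
    return frozenset(previous) & frozenset(current)


def check_properties(trials: int = 200, seed: int = 0) -> None:
    """Hand-rolled property check over random id sets."""
    rng = random.Random(seed)
    ids = [f"vm{i}" for i in range(8)]
    for _ in range(trials):
        observed = {i for i in ids if rng.random() < 0.5}
        live = {i for i in ids if rng.random() < 0.5}
        prev = {i for i in ids if rng.random() < 0.5}
        now = orphans(observed, live)
        assert now.isdisjoint(live)                # a live VM is never an orphan
        assert now <= frozenset(observed)          # nothing is invented
        assert confirm(prev, now) <= now           # confirmation only narrows
        assert confirm(set(), now) == frozenset()  # a first sighting never confirms
```

The last property is the reason the tick interval can stay a fixed constant in this model: whatever the interval, an id seen for the first time is never confirmed on that same tick.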

Also corrects `Jailer.cgroup_dir/1` to `<parent>/<id>` (no `<exec>` level): the
jailer (cgroup v2, `--parent-cgroup`) places firecracker directly under the
parent cgroup, confirmed via `/proc/<pid>/cgroup`. This also fixes the existing
FireVMM teardown (`daemon.ex`), which removes the same leaf, and is what lets the
Reaper enumerate leaves by listing the parent dir.
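
The layout change is easy to check against a `/proc/<pid>/cgroup` entry, which for cgroup v2 is a line of the form `0::<path>`. The Python sketch below is illustrative only: `cgroup_dir` mirrors the name `Jailer.cgroup_dir/1` but takes the parent explicitly, and the `/sys/fs/cgroup` mount point and the helper names are assumptions.

```python
from pathlib import PurePosixPath

CGROUP_ROOT = PurePosixPath("/sys/fs/cgroup")  # conventional cgroup v2 mount; assumed


def cgroup_dir(parent: str, vm_id: str) -> PurePosixPath:
    """Leaf directory as <parent>/<id>, with no <exec> level in between."""
    return CGROUP_ROOT / parent / vm_id


def v2_cgroup_path(proc_cgroup: str) -> PurePosixPath:
    """Return the path from the cgroup v2 entry ("0::<path>") of /proc/<pid>/cgroup."""
    for line in proc_cgroup.splitlines():
        if not line.strip():
            continue
        hierarchy, _controllers, path = line.split(":", 2)
        if hierarchy == "0":
            return PurePosixPath(path)
    raise ValueError("no cgroup v2 entry found")


def matches_layout(proc_cgroup: str, parent: str, vm_id: str) -> bool:
    """True when the process sits directly in <parent>/<id>."""
    relative = v2_cgroup_path(proc_cgroup).relative_to("/")
    return CGROUP_ROOT / relative == cgroup_dir(parent, vm_id)
```

With a hypothetical parent `hyper` and id `vm-1`, the content `0::/hyper/vm-1` matches, while a path with an extra level between parent and id does not.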

Extracted from the VM-boot branch (chore/get-a-vm-running).

markovejnovic force-pushed the feat/reaper-orphan-gc branch from dbca262 to c455162 on June 30, 2026 20:35
markovejnovic merged commit 272e2de into main on Jun 30, 2026
6 checks passed
markovejnovic deleted the feat/reaper-orphan-gc branch on June 30, 2026 20:43