57 commits
d5bffcc
feat(suidhelper): mix suidhelper.install wrapping cargo xtask install
markovejnovic Jun 25, 2026
9201730
docs: document full node requirements (postgres, dm targets, kvm, cgr…
markovejnovic Jun 25, 2026
b68dd5e
docs: replace em-dashes with ASCII hyphens in intro.md
markovejnovic Jun 25, 2026
e2253f6
fix(suidhelper): make mix suidhelper.install tty-safe
markovejnovic Jun 25, 2026
4ddcd4d
feat(suidhelper): source device binaries from config, drop caller --bin
markovejnovic Jun 25, 2026
f340571
docs: Postgres quickstart, optional helper config, install caveats
markovejnovic Jun 25, 2026
23a5883
docs: how to load device-mapper modules for the dm targets
markovejnovic Jun 25, 2026
b0c0b5f
docs: how to create the parent cgroup + delegate controllers
markovejnovic Jun 25, 2026
5761a4e
fix(node): start Layer before Budget.Supervisor
markovejnovic Jun 25, 2026
8dc61d3
fix: quiet benign ThinPool port exits and keyless OTLP export
markovejnovic Jun 25, 2026
1320700
fix(thin_pool): reclaim a stale dm pool on init
markovejnovic Jun 25, 2026
6ef56d2
fix(node): validate OCI loader tools at boot
markovejnovic Jun 25, 2026
06cc05b
deslop
markovejnovic Jun 25, 2026
c0685ae
fix: ignore benign port exits in the remaining trap_exit servers
markovejnovic Jun 25, 2026
7fa0480
fix(fire_vmm): child_spec key must be :id, not :vm_id
markovejnovic Jun 25, 2026
2ac546a
fix(scheduler): log why a candidate refused placement
markovejnovic Jun 25, 2026
68c8d4b
feat(node): reclaim orphaned dm/loop devices at boot
markovejnovic Jun 25, 2026
b6ce604
feat(suidhelper): add firecracker/jailer/uid_gid_range to config
markovejnovic Jun 25, 2026
8f671a3
feat(suidhelper): add jailer subcommand that execs the jailer as root
markovejnovic Jun 25, 2026
a5e098c
fix(suidhelper): fail closed on close_range error; _exit on jailer ex…
markovejnovic Jun 25, 2026
5f9f36b
refactor(fire_vmm): drive jailer through suidhelper; drop Provider
markovejnovic Jun 25, 2026
3a58b77
feat(mix): add firecracker.install task to fetch+configure firecracke…
markovejnovic Jun 25, 2026
a8743fc
docs: firecracker/jailer are operator-installed via mix firecracker.i…
markovejnovic Jun 26, 2026
5770f57
fix(config): node and helper read the same firecracker/jailer TOML keys
markovejnovic Jun 26, 2026
ffd46a4
feat(mix): firecracker.install prints the chown/chmod root commands
markovejnovic Jun 26, 2026
202141b
feat(fire_vmm): log jailer/firecracker output and real exit status
markovejnovic Jun 26, 2026
0ecccf5
feat(fire_vmm): surface the API readiness-probe failure reason
markovejnovic Jun 26, 2026
2d48872
feat(suidhelper): add chroot-jail grant-api to hand the API socket to…
markovejnovic Jun 26, 2026
85bc4a4
feat(fire_vmm): grant the API socket before probing so the controller…
markovejnovic Jun 26, 2026
71d6440
fix(hyper): generate alphanumeric vm ids (firecracker rejects _)
markovejnovic Jun 26, 2026
b3c2315
fix(fire_vmm): self-register per-VM names in init, not via a start name
markovejnovic Jun 26, 2026
4196153
fix(suidhelper): grant-api opens the jail root dir for node traversal
markovejnovic Jun 26, 2026
5284878
fix(suidhelper): resolve dm symlink to real node before rootfs open
markovejnovic Jun 26, 2026
f2dc209
feat(fire_vmm): log boot failures in the Configuring state
markovejnovic Jun 26, 2026
725e1ca
fix(suidhelper): cgroup.kill the leaf before removing it
markovejnovic Jun 26, 2026
8fd3817
fix(fire_vmm): guarantee firecracker death on teardown
markovejnovic Jun 26, 2026
09c1ea8
feat(node): periodic Reaper to GC orphaned VM resources
markovejnovic Jun 26, 2026
988b026
refactor(vm): extract Hyper.Vm.Id (type + generator)
markovejnovic Jun 26, 2026
7811ed6
test(suidhelper): tolerate shell-set PWD in jailer empty-env e2e
markovejnovic Jun 26, 2026
b5a0c16
style(node): wrap with_image_lease spec after Vm.Id.t() rename
markovejnovic Jun 26, 2026
dfb6078
refactor(config): make config.toml the single source of truth
markovejnovic Jun 26, 2026
727d08e
refactor(config): nest cgroup + uid_gid_range under [jails] table
markovejnovic Jun 26, 2026
32d84cc
feat(config): read node tool paths from [tools]
markovejnovic Jun 27, 2026
9b72969
feat(config): merge optional /etc/hyper/config.exs at runtime
markovejnovic Jun 27, 2026
46dfd96
docs: highlight cookbook code blocks via makeup_syntect
markovejnovic Jun 27, 2026
ed6ad5d
docs(cookbook): document [tools]/[jails], node tools, user config
markovejnovic Jun 27, 2026
293293b
Merge origin/main into chore/get-a-vm-running
markovejnovic Jun 27, 2026
20b7d78
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 28, 2026
59ddea3
docs(cookbook): drop intro.md changes, keep main's version
markovejnovic Jun 30, 2026
7286728
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
01196b8
fix(node): drop duplicate Dmsetup.list left by merge 7286728
markovejnovic Jun 30, 2026
adee689
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
69084bf
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
0ee0a4f
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
2122bb9
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
5f5f5c5
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
7f6954d
Merge remote-tracking branch 'origin/main' into chore/get-a-vm-running
markovejnovic Jun 30, 2026
20 changes: 20 additions & 0 deletions lib/hyper/cluster/routing.ex
@@ -28,6 +28,26 @@ defmodule Hyper.Cluster.Routing do
   @spec via(term()) :: {:via, module(), {atom(), term()}}
   def via(key), do: {:via, Horde.Registry, {@name, key}}

+  @doc """
+  Register the calling process under `key` from inside its own `init`.
+
+  Prefer this over starting a process with a `{:via, Horde.Registry, _}` name.
+  OTP's post-start name check (`gen:get_proc_name`) calls `whereis_name`
+  immediately after the synchronous `register`, but Horde materialises the name
+  into its local ETS only asynchronously, via the DeltaCRDT diff loop. Under
+  registry churn that read loses the race and OTP aborts startup with
+  `{:process_not_registered_via, Horde.Registry}`. Registering from within
+  `init` carries no such self-check, while leaving the name cluster-resolvable
+  through `via/1` once the diff propagates (callers already tolerate that lag).
+  """
+  @spec register_self(term()) :: :ok | {:error, {:already_registered, pid()}}
+  def register_self(key) do
+    case Horde.Registry.register(@name, key, nil) do
+      {:ok, _pid} -> :ok
+      {:error, {:already_registered, _pid}} = err -> err
+    end
+  end
+
   @doc "Which node currently runs `vm_id`? `nil` if unknown."
   @spec whereis(Hyper.Vm.Id.t()) :: node() | nil
   @decorate with_span("Hyper.Cluster.Routing.whereis", include: [:vm_id])
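Review note: the self-registration pattern that the `register_self/1` doc describes can be tried in isolation. This is a minimal sketch, not the PR's code. It swaps in Elixir's built-in local `Registry` for `Horde.Registry` (its `register/3` returns the same `{:ok, pid}` / `{:error, {:already_registered, pid}}` shapes that the diff matches on), and the module name `SelfRegisterSketch` and the key `"vm-1"` are hypothetical. A local `Registry` is synchronous, so this shows only the shape of registering from `init`, not the propagation lag.

```elixir
defmodule SelfRegisterSketch do
  use GenServer

  # Hypothetical registry name; the PR registers in a Horde.Registry instead.
  @registry SelfRegisterSketch.Registry

  # Started unnamed (and unlinked, so a declined start does not take down the caller).
  def start(key), do: GenServer.start(__MODULE__, key)

  def via(key), do: {:via, Registry, {@registry, key}}

  @impl true
  def init(key) do
    # Runs inside the new process, so the registry entry belongs to the server itself.
    case Registry.register(@registry, key, nil) do
      {:ok, _owner} -> {:ok, key}
      {:error, reason} -> {:stop, reason}
    end
  end

  @impl true
  def handle_call(:key, _from, key), do: {:reply, key, key}
end

{:ok, _} = Registry.start_link(keys: :unique, name: SelfRegisterSketch.Registry)
{:ok, pid} = SelfRegisterSketch.start("vm-1")

# The name resolves through the via tuple even though start/1 passed no name.
^pid = GenServer.whereis(SelfRegisterSketch.via("vm-1"))

# A second process claiming the same key declines to start.
{:error, {:already_registered, ^pid}} = SelfRegisterSketch.start("vm-1")
```

The sketch stops on a duplicate key, which matches what `Client` does in this PR; the top-level `FireVMM` supervisor returns `:ignore` instead.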
20 changes: 19 additions & 1 deletion lib/hyper/node.ex
@@ -161,7 +161,7 @@ defmodule Hyper.Node do
   @spec test_system :: :ok | {:error, term()}
   def test_system do
     with {:ok, _} <- Hyper.Cfg.Budget.load(),
-         :ok <- Hyper.Node.FireVMM.Provider.ensure_installed(),
+         :ok <- check_firecracker_bins(),
          :ok <- Hyper.Node.FireVMM.VmLinux.Provider.ensure_installed(),
          :ok <- Hyper.Node.Vmlinux.test_system(),
          :ok <- Hyper.Img.OciLoader.Umoci.ensure_installed(),
@@ -175,6 +175,24 @@ defmodule Hyper.Node do
     end
   end

+  @spec check_firecracker_bins ::
+          :ok
+          | {:error, {:firecracker_bin_missing | :jailer_bin_missing, Path.t()}}
+          | {:error, :firecracker_not_configured | :jailer_not_configured}
+  defp check_firecracker_bins do
+    with {:fc, {:ok, fc}} <- {:fc, Hyper.Cfg.Tools.firecracker_configured()},
+         {:jail, {:ok, jail}} <- {:jail, Hyper.Cfg.Tools.jailer_configured()} do
+      cond do
+        not Sys.Posix.executable?(fc) -> {:error, {:firecracker_bin_missing, fc}}
+        not Sys.Posix.executable?(jail) -> {:error, {:jailer_bin_missing, jail}}
+        true -> :ok
+      end
+    else
+      {:fc, :error} -> {:error, :firecracker_not_configured}
+      {:jail, :error} -> {:error, :jailer_not_configured}
+    end
+  end
+
   @spec check_helper_base(Path.t()) ::
           :ok | {:error, {:suid_helper_base_mismatch, Path.t(), Path.t()}}
   defp check_helper_base(base) do
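Review note: `check_firecracker_bins` tags each `with` clause so the `else` branch can tell which lookup failed. Both lookups return a bare `:error`, so without the tags the two failures would be indistinguishable. A minimal sketch of the same technique, under stated assumptions: `Map.fetch/2` stands in for the `Hyper.Cfg.Tools` lookups, `File.exists?/1` stands in for `Sys.Posix.executable?/1` (a project helper not shown in this diff), and `TaggedWithSketch` is a hypothetical name.

```elixir
defmodule TaggedWithSketch do
  # `tools` is a hypothetical map of configured tool paths.
  def check(tools) do
    with {:fc, {:ok, fc}} <- {:fc, Map.fetch(tools, :firecracker)},
         {:jail, {:ok, jail}} <- {:jail, Map.fetch(tools, :jailer)} do
      cond do
        # File.exists?/1 is a stand-in; the PR checks executability instead.
        not File.exists?(fc) -> {:error, {:firecracker_bin_missing, fc}}
        not File.exists?(jail) -> {:error, {:jailer_bin_missing, jail}}
        true -> :ok
      end
    else
      # The tag, not the bare :error, identifies the failing clause.
      {:fc, :error} -> {:error, :firecracker_not_configured}
      {:jail, :error} -> {:error, :jailer_not_configured}
    end
  end
end

IO.inspect(TaggedWithSketch.check(%{}))
IO.inspect(TaggedWithSketch.check(%{firecracker: "."}))
IO.inspect(TaggedWithSketch.check(%{firecracker: ".", jailer: "."}))
```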
33 changes: 21 additions & 12 deletions lib/hyper/node/fire_vmm.ex
@@ -47,14 +47,15 @@ defmodule Hyper.Node.FireVMM do

   @spec start_link(Opts.t()) :: Supervisor.on_start()
   def start_link(opts) do
-    Supervisor.start_link(__MODULE__, opts, name: via(opts.vm_id))
+    Supervisor.start_link(__MODULE__, opts)
   end

   @spec child_spec(Opts.t()) :: Supervisor.child_spec()
   def child_spec(opts) do
     # Keyed by VM id and :transient so a cleanly-stopped VM is not rebooted by
     # the node-level DynamicSupervisor.
     %{
-      vm_id: {__MODULE__, opts.vm_id},
+      id: {__MODULE__, opts.vm_id},
       start: {__MODULE__, :start_link, [opts]},
       type: :supervisor,
       restart: :transient
@@ -63,18 +64,26 @@ defmodule Hyper.Node.FireVMM do

   @impl true
   def init(opts) do
-    children = [
-      # Client must be registered before Core: Core starts the State machine,
-      # which calls Client.run while waiting for the daemon's API. Client depends
-      # only on vm_id (an independent peer), so it has no reverse dependency.
-      {Client, %Client.Opts{vm_id: opts.vm_id}},
-      {Core, opts}
-    ]
+    # Self-register the cluster routing entry here rather than via a start name;
+    # see `Hyper.Cluster.Routing.register_self/1`. A fresh random vm_id never
+    # collides, so `:already_registered` only happens against a stale dead
+    # incarnation - decline the start and let the supervisor retry clean.
+    case Hyper.Cluster.Routing.register_self({opts.vm_id, :supervisor}) do
+      :ok ->
+        children = [
+          # Client must be registered before Core: Core starts the State machine,
+          # which calls Client.run while waiting for the daemon's API. Client
+          # depends only on vm_id (an independent peer), so no reverse dependency.
+          {Client, %Client.Opts{vm_id: opts.vm_id}},
+          {Core, opts}
+        ]

-    Supervisor.init(children, strategy: :one_for_one)
-  end
+        Supervisor.init(children, strategy: :one_for_one)

-  defp via(vm_id), do: Hyper.Cluster.Routing.via({vm_id, :supervisor})
+      {:error, _} ->
+        :ignore
+    end
+  end

   @doc "Test whether the system can run firecracker VMMs."
   @spec test_system() :: :ok | {:error, term()}
31 changes: 21 additions & 10 deletions lib/hyper/node/fire_vmm/client.ex
@@ -55,15 +55,12 @@ defmodule Hyper.Node.FireVMM.Client do
     @type t :: %__MODULE__{socket_path: Path.t()}
   end

+  # Prod path (vm_id, no explicit name) starts unnamed and self-registers in
+  # `init` - see `Hyper.Cluster.Routing.register_self/1`. A `:name` override
+  # (test stand-ins) is honoured as a plain local name and skips registration.
   @spec start_link(Opts.t()) :: GenServer.on_start()
   def start_link(%Opts{} = opts) do
-    name =
-      case opts.name do
-        nil when not is_nil(opts.vm_id) -> via(opts.vm_id)
-        other -> other
-      end
-
-    GenServer.start_link(__MODULE__, opts, gen_opts(name))
+    GenServer.start_link(__MODULE__, opts, gen_opts(opts.name))
   end

   @spec via(Hyper.Vm.Id.t()) :: GenServer.name()
@@ -79,12 +76,26 @@ defmodule Hyper.Node.FireVMM.Client do
   end

   @impl true
-  @spec init(Opts.t()) :: {:ok, State.t()}
+  @spec init(Opts.t()) :: {:ok, State.t()} | {:stop, {:already_registered, pid()}}
   def init(%Opts{} = opts) do
-    socket_path = opts.socket_path || Jailer.host_socket(opts.vm_id)
-    {:ok, %State{socket_path: socket_path}}
+    with :ok <- register(opts) do
+      socket_path = opts.socket_path || Jailer.host_socket(opts.vm_id)
+      {:ok, %State{socket_path: socket_path}}
+    end
   end

+  # Register cluster-wide under {vm_id, :client} on the prod path. With an
+  # explicit name (test stand-in), the name is the local registration, so skip.
+  @spec register(Opts.t()) :: :ok | {:stop, {:already_registered, pid()}}
+  defp register(%Opts{name: nil, vm_id: vm_id}) when not is_nil(vm_id) do
+    case Hyper.Cluster.Routing.register_self({vm_id, :client}) do
+      :ok -> :ok
+      {:error, reason} -> {:stop, reason}
+    end
+  end
+
+  defp register(%Opts{}), do: :ok
+
   @impl true
   def handle_call({:run, op_fun}, _from, %State{socket_path: socket_path} = state) do
     {:reply, op_fun.(socket_path: socket_path), state}
10 changes: 7 additions & 3 deletions lib/hyper/node/fire_vmm/core.ex
@@ -18,8 +18,9 @@ defmodule Hyper.Node.FireVMM.Core do
     * firecracker crash -> the `Daemon` child exits; both restart; `Daemon`
       resets the stale jail and relaunches, and the fresh controller cold-boots.

-  `MuonTrap` kills the OS process when its port closes (teardown or BEAM death),
-  so no firecracker process outlives the supervisor.
+  `Daemon` guarantees firecracker is dead on teardown via the helper's
+  `cgroup.kill` (MuonTrap's port-close kill misses the setsid'd firecracker), so
+  no firecracker process outlives a graceful supervisor shutdown.
   """

   use Supervisor
@@ -28,9 +29,12 @@ defmodule Hyper.Node.FireVMM.Core do
   alias Hyper.Node.FireVMM.Daemon
   alias Hyper.Node.FireVMM.State

+  # Started unnamed: nothing resolves the core by name (it is addressed as a
+  # child of `Hyper.Node.FireVMM`), so it needs no registry entry - and avoids a
+  # needless racy Horde registration at startup.
   @spec start_link(FireVMM.Opts.t()) :: Supervisor.on_start()
   def start_link(opts) do
-    Supervisor.start_link(__MODULE__, opts, name: Hyper.Cluster.Routing.via({opts.vm_id, :core}))
+    Supervisor.start_link(__MODULE__, opts)
   end

   @impl true
123 changes: 98 additions & 25 deletions lib/hyper/node/fire_vmm/daemon.ex
@@ -3,26 +3,37 @@ defmodule Hyper.Node.FireVMM.Daemon do
   The jailed firecracker OS process for one microVM, as a static child of
   `Hyper.Node.FireVMM.Core`.

-  Lifecycle is supervisor-owned. On every (re)start it first resets any stale
-  jail left by a prior incarnation - the firecracker jailer refuses to reuse an
-  existing chroot - then builds the jailer command and runs it under
-  `MuonTrap.Daemon`, which kills the OS process when its port closes (controller
-  crash, container teardown, or BEAM death). So no firecracker process outlives
-  the supervisor, and `Core`'s `:one_for_all` restarting this child (e.g. after a
-  firecracker crash) cleanly cold-boots against a fresh jail.
-
-  The supervised process *is* the `MuonTrap.Daemon` - `start_link/1` does the
-  reset, then delegates and returns that pid.
+  A `trap_exit` GenServer that owns firecracker's lifetime end to end:
+
+    * on every (re)start it resets any stale jail left by a prior incarnation —
+      the firecracker jailer refuses to reuse an existing chroot — then launches
+      the jailer under a linked `MuonTrap.Daemon`. The supervised process is
+      `hyper-suidhelper jailer ...`, which `execve`s into the jailer (same pid).
+    * if firecracker exits, the linked `MuonTrap.Daemon` exits and this server
+      stops with that reason, so `Core`'s `:one_for_all` cold-boots the pair.
+    * on teardown it **guarantees firecracker is dead**: `MuonTrap`'s port-close
+      kills by process group, but the jailer `setsid`s firecracker into its own
+      session, so it escapes that kill and would leak (holding the cgroup, the
+      rootfs dm device, loop devices). `terminate/2` therefore runs the helper's
+      `cgroup.kill` teardown (`ChrootJail.remove`), which SIGKILLs the whole leaf
+      cgroup regardless of session. The same call on (re)start cleans up after a
+      prior incarnation the BEAM could not (a SIGKILL'd node leaves no
+      `terminate/2`); the periodic `Hyper.Node.Reaper` is the final backstop.
   """

+  use GenServer
+  use OpenTelemetryDecorator
+
   alias Hyper.Node.FireVMM.{Jailer, Opts}
   alias Hyper.SuidHelper
   alias Unit.Time

-  use OpenTelemetryDecorator
+  require Logger

   @shutdown_timeout Time.s(5)

+  defstruct [:opts, :muontrap]
+
   @spec child_spec(Opts.t()) :: Supervisor.child_spec()
   def child_spec(%Opts{} = opts) do
     %{
@@ -34,20 +45,82 @@ defmodule Hyper.Node.FireVMM.Daemon do
     }
   end

-  @doc """
-  Reset the VM's stale jail, then launch the jailer under `MuonTrap.Daemon` and
-  return its pid. Fails (so the supervisor retries) if the reset cannot run.
-  """
-  @spec start_link(Opts.t()) :: {:ok, pid()} | {:error, term()}
-  @decorate with_span("Hyper.Node.FireVMM.Daemon.start_link", include: [:id])
-  def start_link(%Opts{vm_id: id} = opts) do
-    with :ok <- SuidHelper.ChrootJail.remove(Jailer.chroot_dir(id), Jailer.cgroup_dir(id)) do
-      cmd = Jailer.command(opts)
-
-      case MuonTrap.Daemon.start_link(cmd.binary, cmd.args, []) do
-        {:ok, pid} -> {:ok, pid}
-        {:error, _} = err -> err
-      end
+  @spec start_link(Opts.t()) :: GenServer.on_start()
+  def start_link(%Opts{} = opts) do
+    GenServer.start_link(__MODULE__, opts)
+  end
+
+  @impl true
+  @decorate with_span("Hyper.Node.FireVMM.Daemon.init", include: [:id])
+  def init(%Opts{vm_id: id} = opts) do
+    # Trap exits so the linked MuonTrap's exit reaches `handle_info` (not a silent
+    # link kill) and so `terminate/2` runs on supervisor shutdown.
+    Process.flag(:trap_exit, true)
+
+    with :ok <- reset_stale_jail(id),
+         {:ok, muontrap} <- launch(opts) do
+      {:ok, %__MODULE__{opts: opts, muontrap: muontrap}}
+    else
+      {:error, reason} -> {:stop, reason}
+    end
+  end
+
+  # firecracker (the linked MuonTrap.Daemon) exited: stop with its reason so
+  # `Core`'s `:one_for_all` discards the controller too and cold-boots the pair.
+  @impl true
+  def handle_info({:EXIT, muontrap, reason}, %__MODULE__{muontrap: muontrap} = state) do
+    {:stop, reason, state}
+  end
+
+  def handle_info(_msg, state), do: {:noreply, state}
+
+  # Guarantee firecracker is dead and its jail cleared. MuonTrap cannot kill the
+  # setsid'd firecracker; the helper's `cgroup.kill` can. Best-effort: a failure
+  # here is logged, and the `Reaper` will retry, but it must not crash teardown.
+  @impl true
+  @decorate with_span("Hyper.Node.FireVMM.Daemon.terminate", include: [:id])
+  def terminate(_reason, %__MODULE__{opts: %Opts{vm_id: id}}) do
+    case clear_jail(id) do
+      :ok ->
+        :ok
+
+      {:error, reason} ->
+        Logger.error("vm #{id}: teardown failed to clear jail: #{inspect(reason)}")
+    end
+  end
+
+  @spec reset_stale_jail(Hyper.Vm.Id.t()) :: :ok | {:error, term()}
+  defp reset_stale_jail(id), do: clear_jail(id)
+
+  @spec clear_jail(Hyper.Vm.Id.t()) :: :ok | {:error, term()}
+  defp clear_jail(id) do
+    SuidHelper.ChrootJail.remove(Jailer.chroot_dir(id), Jailer.cgroup_dir(id))
+  end
+
+  @spec launch(Opts.t()) :: {:ok, pid()} | {:error, term()}
+  defp launch(%Opts{vm_id: id} = opts) do
+    cmd = Jailer.command(opts)
+
+    # Surface what the jailed process actually does: `log_output` routes the
+    # helper/jailer/firecracker stdout+stderr (guest serial console included)
+    # to the Logger, and `exit_status_to_reason` turns MuonTrap's opaque
+    # `:error_exit_status` into `{:firecracker_exited, status}` so a crash
+    # report names the real exit code instead of hiding it.
+    daemon_opts = [
+      log_output: :info,
+      log_prefix: "vm #{id} firecracker: ",
+      stderr_to_stdout: true,
+      exit_status_to_reason: &{:firecracker_exited, &1}
+    ]
+
+    case MuonTrap.Daemon.start_link(cmd.binary, cmd.args, daemon_opts) do
+      {:ok, pid} ->
+        Logger.info("vm #{id}: jailer launched under MuonTrap (#{inspect(pid)})")
+        {:ok, pid}
+
+      {:error, reason} = err ->
+        Logger.error("vm #{id}: jailer failed to launch: #{inspect(reason)}")
+        err
     end
   end
 end
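Review note: the new `Daemon` relies on a `trap_exit` GenServer that owns a linked worker. The worker's exit arrives as an `{:EXIT, pid, reason}` message, the server stops with that reason, and `terminate/2` still runs. A minimal sketch of that mechanism using only the standard library, under stated assumptions: `TrapExitSketch` is a hypothetical name, a plain `spawn_link`ed process stands in for `MuonTrap.Daemon`, and a message to a hypothetical `notify` pid stands in for the jail cleanup.

```elixir
defmodule TrapExitSketch do
  use GenServer

  # `notify` is a hypothetical pid that hears from terminate/2; it stands in
  # for the jail cleanup that the PR's Daemon performs there.
  def start(notify), do: GenServer.start(__MODULE__, notify)

  @impl true
  def init(notify) do
    # Without this flag the linked child's abnormal exit would kill the server
    # outright and terminate/2 would never run.
    Process.flag(:trap_exit, true)

    # Linked stand-in for MuonTrap.Daemon: exits abnormally when told to.
    child =
      spawn_link(fn ->
        receive do
          :crash -> exit({:firecracker_exited, 1})
        end
      end)

    {:ok, %{child: child, notify: notify}}
  end

  @impl true
  def handle_call(:child, _from, state), do: {:reply, state.child, state}

  # The child's exit arrives as a message; stop with the same reason.
  @impl true
  def handle_info({:EXIT, child, reason}, %{child: child} = state) do
    {:stop, reason, state}
  end

  def handle_info(_msg, state), do: {:noreply, state}

  @impl true
  def terminate(reason, %{notify: notify}) do
    send(notify, {:cleaned_up, reason})
    :ok
  end
end

{:ok, server} = TrapExitSketch.start(self())
ref = Process.monitor(server)
send(GenServer.call(server, :child), :crash)

receive do
  {:cleaned_up, reason} -> IO.inspect(reason, label: "terminate/2 ran with")
after
  1_000 -> raise "terminate/2 did not run"
end

receive do
  {:DOWN, ^ref, :process, ^server, reason} -> IO.inspect(reason, label: "server stopped with")
after
  1_000 -> raise "server did not stop"
end
```

As the moduledoc in the diff points out, `terminate/2` is not called when the BEAM itself is killed, which is why the PR also resets the jail on (re)start and adds the periodic `Reaper`.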