Background
elfuse interrupts a running vCPU (for the cross-process guest-signal transport
and for the per-iteration safety timeout) by sending a host signal to the vCPU
thread, whose handler calls hv_vcpus_exit():
- SIGUSR2: cross-process guest-signal doorbell. proc_send_guest_signal()
  (src/syscall/proc.c:358) writes the guest signum to /tmp/elfuse-sig-<pid>
  and sends host SIGUSR2; the receiver's handler
  guest_signal_transport_handler() (proc.c:965) sets g_external_guest_signal
  and calls hv_vcpus_exit(&g_timeout_vcpu, 1).
- SIGALRM: per-iteration safety timeout. alarm_handler() sets g_timed_out and
  calls hv_vcpus_exit(&g_timeout_vcpu, 1).
Because the signal is delivered to the vCPU thread while it is inside hv_vcpu_run, Apple HVF aborts the run with HV_EXIT_REASON_UNKNOWN (0x3)
instead of the clean HV_EXIT_REASON_CANCELED (0) that hv_vcpus_exit()
produces for a vCPU caught between runs. The run loop must therefore treat
UNKNOWN as a possible cancellation, which is ambiguous: a genuine hypervisor
fault could in principle also surface as UNKNOWN, so blindly resuming risks a
silent spin (raised by a cubic review on #76).
#76 (fix-cross-process-signal-el0) carries the EL0-preemption delivery fix and
routes UNKNOWN through the cancellation handler so the already-queued guest
signal is delivered instead of crashing the child. That keeps the ambiguity: the
run loop cannot tell our own preemption from a genuine fault, so it errs toward
resuming. This issue removes the ambiguity at its source rather than reasoning
about it after the fact.
Note: a real hv_vcpu_run API failure (non-HV_SUCCESS return) is already
caught by HV_CHECK_CTX (proc.c:1113) and crashes immediately. The only thing
that reaches the UNKNOWN branch is HV_SUCCESS + exit_reason == UNKNOWN,
i.e. our own hv_vcpus_exit landing mid-execution.
Goal
Deliver every self-directed hv_vcpus_exit from a thread other than the
vCPU thread, so hv_vcpu_run always returns a clean CANCELED. Once no
legitimate path can produce UNKNOWN, the run loop can treat any UNKNOWN as a
hard hypervisor failure and crash with diagnostics — no heuristic needed.
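The intended end state can be sketched as a small classification function. This is a minimal illustration, not elfuse's run loop: the enum and function names are hypothetical, the two numeric exit-reason values are the ones quoted in the background above, and the strict_unknown flag only exists here to contrast the post-#76 behaviour with the one this issue aims for.

```c
#include <stdbool.h>

/* Local stand-ins so the sketch compiles without Hypervisor.framework.
 * The numeric values mirror the ones quoted in the text. */
enum exit_reason {
    EXIT_CANCELED = 0,  /* HV_EXIT_REASON_CANCELED */
    EXIT_UNKNOWN  = 3   /* HV_EXIT_REASON_UNKNOWN  */
};

/* Hypothetical action codes for the run loop. */
enum run_action {
    ACTION_HANDLE_EXIT,      /* ordinary exit: decode it and continue      */
    ACTION_DELIVER_PENDING,  /* our own kick: drain queued signal/timeout  */
    ACTION_CRASH             /* report diagnostics and abort               */
};

/* api_ok         : hv_vcpu_run returned HV_SUCCESS
 * strict_unknown : false = post-#76 behaviour (UNKNOWN treated as a kick)
 *                  true  = goal of this issue (UNKNOWN is always fatal)  */
enum run_action classify_exit(bool api_ok, int reason, bool strict_unknown)
{
    if (!api_ok)
        return ACTION_CRASH;            /* the HV_CHECK_CTX path */
    if (reason == EXIT_CANCELED)
        return ACTION_DELIVER_PENDING;  /* clean off-thread cancellation */
    if (reason == EXIT_UNKNOWN)
        return strict_unknown ? ACTION_CRASH : ACTION_DELIVER_PENDING;
    return ACTION_HANDLE_EXIT;
}
```

With strict_unknown set, the UNKNOWN branch needs no knowledge of pending signals at all, which is the "no heuristic" property the goal describes.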
Why this is not blocked by an HVF constraint
The relevant HVF rules are:
- a VM (hv_vm_create) is per process (one per elfuse --fork-child);
- a vCPU is bound to the thread that created it (hv_vcpu_run must run on
  that thread);
- hv_vcpus_exit() is explicitly designed to be called from another thread
  to force a VMEXIT.
The helper thread lives in the same host process as the vCPU thread, so there
is no cross-process issue — this is the idiomatic HVF watchdog pattern. The
current code already calls hv_vcpus_exit() on a stored handle
(g_timeout_vcpu), proving the handle is usable off-thread; this change only
moves the call to a dedicated thread.
Work items
1. Signal-mask discipline. Block SIGUSR2 (and SIGALRM, see item 2) on
every thread via pthread_sigmask, vCPU threads and the main thread included,
so that only the dedicated helper thread consumes them, synchronously, through
sigwait. (sigwait expects the signals to be blocked in the caller too, so the
helper keeps the same mask rather than unblocking them.) Establish the mask
before any vCPU thread is created, at every thread-creation site (bootstrap,
CLONE_THREAD workers, fork-child bring-up), and re-establish it across
fork/exec so children do not inherit a stale mask or disposition. A single
missed site silently reintroduces UNKNOWN.
2. Move SIGALRM onto the same path. The "any UNKNOWN is abnormal"
invariant only holds if all signal-driven hv_vcpus_exit calls leave the
vCPU thread. Re-home the per-iteration timeout: either have the helper
thread own the timeout (e.g. a helper-owned timer + hv_vcpus_exit;
sigtimedwait would be the natural fit but does not appear to be available on
macOS) or replace alarm() with a mechanism that does not deliver SIGALRM to
the vCPU thread. Preserve the existing g_timed_out → CRASH_TIMEOUT
(exit 124) semantics and the guest ITIMER_REAL emulation that currently
shares alarm().
3. Live-vCPU registry. Replace the single g_timeout_vcpu global with a
per-process registry of all live vCPU handles (multi-threaded guests run
worker vCPUs, each on its own thread — g_timeout_vcpu is currently
last-writer-wins). The helper thread kicks the correct set on a transport
event; hv_vcpus_exit() accepts a vCPU array, so a single call can exit
all of them and let the signal/queue machinery sort out delivery. Register
on vCPU create, unregister on thread exit, guard with a lock.
4. Empirically verify CANCELED, not UNKNOWN. Confirm under stress that
helper-thread hv_vcpus_exit() against an actively-running vCPU yields CANCELED with zero UNKNOWN across many iterations (single- and
multi-threaded guests, and the cross-process fork case). The CANCELED-vs-
UNKNOWN split is supported by the code's own comments, but Apple HVF has
quirks — validate before making UNKNOWN fatal.
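The registry from item 3 could take roughly the following shape. Everything here is an assumption made for illustration: the names, the fixed MAX_VCPUS cap, and the vcpu_handle_t typedef standing in for hv_vcpu_t. fake_vcpus_exit() only mimics the array-plus-count shape the text describes for hv_vcpus_exit() and records what it was asked to kick.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_VCPUS 64                /* hypothetical cap, not from elfuse */
typedef uint64_t vcpu_handle_t;     /* stand-in for hv_vcpu_t            */

/* Stand-in for hv_vcpus_exit(vcpus, count); remembers the last count. */
static uint32_t g_last_kick_count;
static int fake_vcpus_exit(vcpu_handle_t *vcpus, uint32_t count)
{
    (void)vcpus;
    g_last_kick_count = count;
    return 0;
}

static pthread_mutex_t g_reg_lock = PTHREAD_MUTEX_INITIALIZER;
static vcpu_handle_t   g_reg[MAX_VCPUS];
static uint32_t        g_reg_count;

/* Called by a vCPU thread right after it creates its vCPU. */
int vcpu_registry_add(vcpu_handle_t v)
{
    int rc = -1;
    pthread_mutex_lock(&g_reg_lock);
    if (g_reg_count < MAX_VCPUS) {
        g_reg[g_reg_count++] = v;
        rc = 0;
    }
    pthread_mutex_unlock(&g_reg_lock);
    return rc;
}

/* Called by a vCPU thread before it destroys its vCPU and exits. */
int vcpu_registry_remove(vcpu_handle_t v)
{
    int rc = -1;
    pthread_mutex_lock(&g_reg_lock);
    for (uint32_t i = 0; i < g_reg_count; i++) {
        if (g_reg[i] == v) {
            g_reg[i] = g_reg[--g_reg_count];  /* order does not matter */
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&g_reg_lock);
    return rc;
}

/* Called by the helper thread on a transport event or timeout. The lock
 * is held across the kick so a handle cannot be unregistered and
 * destroyed while the call is in flight. Returns the number kicked. */
uint32_t vcpu_registry_kick_all(void)
{
    pthread_mutex_lock(&g_reg_lock);
    uint32_t n = g_reg_count;
    if (n > 0)
        fake_vcpus_exit(g_reg, n);
    pthread_mutex_unlock(&g_reg_lock);
    return n;
}
```

Kicking every live vCPU in one call trades a few spurious exits for simplicity; whether a narrower "correct set" is worth computing is left open here, as it is in item 3.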
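Acceptance criteria
- The run loop treats HV_EXIT_REASON_UNKNOWN as fatal
  (crash_report(CRASH_UNEXPECTED_EXIT)), matching the original pre-#76 else
  branch.
- test-fork passes 100% over a large batch (e.g. 200+ runs); no
  elfuse --fork-child orphans left behind.
- test-signal, test-signal-thread, test-mt-fork, and the timeout=0
  validation suite stay green.
- A stress harness records 0 HV_EXIT_REASON_UNKNOWN exits during heavy
  cross-process signalling.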
Risks / open questions
- Signal-mask plumbing touches every thread-creation and fork/exec path; a
  missed site is silent (reintroduces UNKNOWN), mitigated by item 4's stress
  check.
- Extra thread per guest process: small overhead, but interacts with the
  fork model (the helper must be re-created, not inherited, in each child).
- Does the helper-thread hv_vcpus_exit reliably interrupt a vCPU blocked in a
  host syscall issued from inside the HVC handler (e.g. nanosleep), or only
  one executing guest code? Confirm the cross-process wake still works for a
  child parked in a blocking host syscall.
Out of scope
- Any change to signal_deliver.
- The /tmp/elfuse-sig-<pid> file transport itself.