Route vCPU preemption signals through a dedicated sigwait thread to eliminate `HV_EXIT_REASON_UNKNOWN` #77

## Background

elfuse interrupts a running vCPU (for the cross-process guest-signal transport
and for the per-iteration safety timeout) by sending a host signal to the vCPU
thread, whose handler calls `hv_vcpus_exit()`:

- **SIGUSR2** — cross-process guest-signal doorbell. `proc_send_guest_signal()`
  (`src/syscall/proc.c:358`) writes the guest signum to `/tmp/elfuse-sig-<pid>`
  and sends host SIGUSR2; the receiver's handler
  `guest_signal_transport_handler()` (`proc.c:965`) sets `g_external_guest_signal`
  and calls `hv_vcpus_exit(&g_timeout_vcpu, 1)`.
- **SIGALRM** — per-iteration timeout. `alarm_handler()` sets `g_timed_out` and
  calls `hv_vcpus_exit(&g_timeout_vcpu, 1)`.

Because the signal is delivered **to the vCPU thread while it is inside
`hv_vcpu_run`**, Apple HVF aborts the run with `HV_EXIT_REASON_UNKNOWN` (0x3)
instead of the clean `HV_EXIT_REASON_CANCELED` (0) that `hv_vcpus_exit()`
produces for a vCPU caught between runs. The run loop must therefore treat
UNKNOWN as a possible cancellation, which is ambiguous: a genuine hypervisor
fault could in principle also surface as UNKNOWN, so blindly resuming risks a
silent spin (raised by a cubic review on #76).

#76 (`fix-cross-process-signal-el0`) carries the EL0-preemption delivery fix and
routes UNKNOWN through the cancellation handler so the already-queued guest
signal is delivered instead of crashing the child. That keeps the ambiguity: the
run loop cannot tell our own preemption from a genuine fault, so it errs toward
resuming. This issue removes the ambiguity at its source rather than reasoning
about it after the fact.

Note: a real `hv_vcpu_run` API failure (non-`HV_SUCCESS` return) is *already*
caught by `HV_CHECK_CTX` (`proc.c:1113`) and crashes immediately. The only thing
that reaches the UNKNOWN branch is `HV_SUCCESS` + `exit_reason == UNKNOWN`,
i.e. our own `hv_vcpus_exit` landing mid-execution.

## Goal

Deliver every self-directed `hv_vcpus_exit` from a thread **other than** the
vCPU thread, so `hv_vcpu_run` always returns a clean `CANCELED`. Once no
legitimate path can produce UNKNOWN, the run loop can treat **any** UNKNOWN as a
hard hypervisor failure and crash with diagnostics — no heuristic needed.
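Once that invariant holds, the exit handling after `HV_CHECK_CTX` reduces to a small table. The following is a minimal sketch, not elfuse's code. The local enum mirrors the arm64 `hv_exit_reason_t` ordering (CANCELED = 0 ... UNKNOWN = 3) so it builds without `<Hypervisor/Hypervisor.h>`. The `timed_out` and `guest_signal` parameters stand in for `g_timed_out` and `g_external_guest_signal`. `classify_exit`, the action names, and the choice to let a timeout take precedence over a pending guest signal are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the arm64 hv_exit_reason_t values (assumed ordering). */
enum exit_reason {
    EXIT_CANCELED = 0,
    EXIT_EXCEPTION = 1,
    EXIT_VTIMER_ACTIVATED = 2,
    EXIT_UNKNOWN = 3,
};

/* Hypothetical run-loop actions. */
enum exit_action {
    ACT_DELIVER_SIGNAL, /* CANCELED by the helper: a guest signal is queued */
    ACT_TIMEOUT,        /* CANCELED by the helper: per-iteration timeout fired */
    ACT_RESUME,         /* CANCELED with nothing pending (spurious kick) */
    ACT_HANDLE_TRAP,    /* EXCEPTION / VTIMER: the normal emulation path */
    ACT_FATAL,          /* UNKNOWN or out-of-range: crash with diagnostics */
};

/* Called only after hv_vcpu_run() returned HV_SUCCESS. */
enum exit_action classify_exit(uint32_t reason, bool timed_out, bool guest_signal)
{
    switch (reason) {
    case EXIT_CANCELED:
        if (timed_out)
            return ACT_TIMEOUT; /* g_timed_out -> CRASH_TIMEOUT (exit 124) */
        if (guest_signal)
            return ACT_DELIVER_SIGNAL;
        return ACT_RESUME;
    case EXIT_EXCEPTION:
    case EXIT_VTIMER_ACTIVATED:
        return ACT_HANDLE_TRAP;
    case EXIT_UNKNOWN:
    default:
        /* With every kick issued off-thread, no legitimate path yields this. */
        return ACT_FATAL;
    }
}
```

The point of the sketch is that `EXIT_UNKNOWN` shares a branch with out-of-range values: no flag inspection, no resume heuristic.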

### Why this is not blocked by an HVF constraint

The relevant HVF rules are:

- a **VM** (`hv_vm_create`) is per **process** (one per `elfuse --fork-child`);
- a **vCPU** is bound to the **thread** that created it (`hv_vcpu_run` must run on
  that thread);
- **`hv_vcpus_exit()` is explicitly designed to be called from another thread**
  to force a VMEXIT.

The helper thread lives in the *same* host process as the vCPU thread, so there
is no cross-process issue — this is the idiomatic HVF watchdog pattern. The
current code already calls `hv_vcpus_exit()` on a stored handle
(`g_timeout_vcpu`), proving the handle is usable off-thread; this change only
moves the call to a dedicated thread.
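The helper-thread pattern can be sketched with plain POSIX calls. This is a minimal sketch under stated assumptions:

- `kick_vcpus()` is a hypothetical stand-in for `hv_vcpus_exit()` that only counts calls, so the code runs without Hypervisor.framework.
- The two atomics play the roles of `g_timed_out` and `g_external_guest_signal`.
- `alarm()` is kept as the timeout source.
- Guest `ITIMER_REAL` bookkeeping is not shown.

Both signals are blocked on every thread, including the helper, as `sigwait()` requires. A process-directed signal from `kill()` or `alarm()` therefore stays pending until the helper dequeues it, and no handler ever runs on a vCPU thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for hv_vcpus_exit() over the live vCPUs. */
static atomic_int kicks;
static void kick_vcpus(void) { atomic_fetch_add(&kicks, 1); }

static atomic_bool timed_out;             /* plays the role of g_timed_out */
static atomic_bool external_guest_signal; /* plays the role of g_external_guest_signal */

/* The preemption signals: blocked everywhere, consumed only via sigwait(). */
static void preempt_sigset(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGUSR2);
    sigaddset(set, SIGALRM);
}

static void *sigwait_thread(void *unused)
{
    (void)unused;
    sigset_t set;
    preempt_sigset(&set);
    for (;;) {
        int sig;
        if (sigwait(&set, &sig) != 0)
            continue;
        if (sig == SIGALRM)
            atomic_store(&timed_out, true); /* run loop maps this to CRASH_TIMEOUT */
        else if (sig == SIGUSR2)
            atomic_store(&external_guest_signal, true);
        else
            continue;
        /* Publish the flag before kicking so a CANCELED exit always finds it set. */
        kick_vcpus();
    }
    return NULL;
}

/* Call once at bootstrap, before any vCPU thread exists: threads created
 * afterwards (the helper included) inherit the blocked mask. */
int start_sigwait_helper(pthread_t *tid)
{
    sigset_t set;
    preempt_sigset(&set);
    int rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    return rc != 0 ? rc : pthread_create(tid, NULL, sigwait_thread, NULL);
}

/* Cheap guard for the top of every vCPU thread: turns a missed mask site
 * into an assertion failure instead of a silent UNKNOWN exit. */
bool preempt_signals_blocked(void)
{
    sigset_t cur;
    if (pthread_sigmask(SIG_BLOCK, NULL, &cur) != 0)
        return false;
    return sigismember(&cur, SIGUSR2) == 1 && sigismember(&cur, SIGALRM) == 1;
}

/* Usage-example helper: poll a flag every 1 ms for up to limit_ms. */
static bool wait_flag(atomic_bool *flag, int limit_ms)
{
    struct timespec ms = {0, 1000000};
    for (int i = 0; i < limit_ms && !atomic_load(flag); i++)
        nanosleep(&ms, NULL);
    return atomic_load(flag);
}
```

Because the vCPU threads never run a handler, this shape also covers the SIGALRM re-homing without replacing `alarm()`. The timer still fires process-wide, but only the helper can accept it.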

## Work items

- [ ] **1. Signal-mask discipline.** Block SIGUSR2 (and SIGALRM, see item 2) with
      `pthread_sigmask` on every thread: the main thread, every vCPU thread, and
      the dedicated sigwait thread itself. `sigwait()` requires the awaited
      signals to be blocked and dequeues them synchronously, so no handler runs
      and the helper is the only thread that ever consumes them. Establish the
      mask before any vCPU thread is created, at every thread-creation site
      (bootstrap, `CLONE_THREAD` workers, fork-child bring-up), and re-establish
      it across `fork`/`exec` so children do not inherit a stale mask or
      disposition. A single missed site silently reintroduces UNKNOWN.

- [ ] **2. Move SIGALRM onto the same path.** The "any UNKNOWN is abnormal"
      invariant only holds if *all* signal-driven `hv_vcpus_exit` calls leave the
      vCPU thread. Re-home the per-iteration timeout: either have the helper
      thread own the timeout (e.g. `sigwait` on SIGALRM, or a timer, followed by
      `hv_vcpus_exit`; note that macOS does not provide `sigtimedwait`) or
      replace `alarm()` with a mechanism that does not deliver SIGALRM to the
      vCPU thread. Preserve the existing `g_timed_out` → `CRASH_TIMEOUT`
      (exit 124) semantics and the guest `ITIMER_REAL` emulation that currently
      shares `alarm()`.

- [ ] **3. Live-vCPU registry.** Replace the single `g_timeout_vcpu` global with a
      per-process registry of all live vCPU handles (multi-threaded guests run
      worker vCPUs, each on its own thread — `g_timeout_vcpu` is currently
      last-writer-wins). The helper thread kicks the correct set on a transport
      event; `hv_vcpus_exit()` accepts a vCPU array, so a single call can exit
      all of them and let the signal/queue machinery sort out delivery. Register
      on vCPU create, unregister on thread exit, guard with a lock.

- [ ] **4. Empirically verify CANCELED, not UNKNOWN.** Confirm under stress that
      helper-thread `hv_vcpus_exit()` against an actively-running vCPU yields
      `CANCELED` with zero UNKNOWN across many iterations (single- and
      multi-threaded guests, and the cross-process fork case). The
      CANCELED-vs-UNKNOWN split is supported by the code's own comments, but
      Apple HVF has quirks, so validate it before making UNKNOWN fatal.
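Item 3's live-vCPU registry can be sketched as a mutex-guarded dense array.

- `vcpu_handle_t` assumes HVF's arm64 `hv_vcpu_t` is a `uint64_t`.
- `MAX_VCPUS` is a hypothetical cap.
- `exit_vcpus()` is a hypothetical stand-in for `hv_vcpus_exit(vcpus, count)` that records the batch size.

The kick is issued while holding the lock. Assuming each vCPU thread unregisters before `hv_vcpu_destroy()`, no handle can disappear mid-call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vcpu_handle_t; /* assumed to match hv_vcpu_t on arm64 */

#define MAX_VCPUS 64 /* hypothetical cap */

static struct {
    pthread_mutex_t lock;
    vcpu_handle_t ids[MAX_VCPUS];
    uint32_t count;
} reg = {.lock = PTHREAD_MUTEX_INITIALIZER};

/* Call on the new vCPU thread right after hv_vcpu_create(). */
bool vcpu_register(vcpu_handle_t v)
{
    bool ok = false;
    pthread_mutex_lock(&reg.lock);
    if (reg.count < MAX_VCPUS) {
        reg.ids[reg.count++] = v;
        ok = true;
    }
    pthread_mutex_unlock(&reg.lock);
    return ok;
}

/* Call on thread exit, before hv_vcpu_destroy(); swap-remove keeps ids dense. */
void vcpu_unregister(vcpu_handle_t v)
{
    pthread_mutex_lock(&reg.lock);
    for (uint32_t i = 0; i < reg.count; i++) {
        if (reg.ids[i] == v) {
            reg.ids[i] = reg.ids[--reg.count];
            break;
        }
    }
    pthread_mutex_unlock(&reg.lock);
}

/* Hypothetical stand-in for hv_vcpus_exit(vcpus, count). */
static uint32_t last_kick_count;
static void exit_vcpus(const vcpu_handle_t *v, uint32_t n)
{
    (void)v;
    last_kick_count = n;
}

/* Helper-thread side: one batched kick for every live vCPU. */
uint32_t kick_all_vcpus(void)
{
    pthread_mutex_lock(&reg.lock);
    uint32_t n = reg.count;
    if (n > 0)
        exit_vcpus(reg.ids, n);
    pthread_mutex_unlock(&reg.lock);
    return n;
}
```

Kicking the whole set, rather than guessing which thread should take the signal, leaves target selection to the existing signal/queue machinery, as item 3 proposes.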

## Acceptance criteria

- With items 1–3 landed and item 4's stress check clean, the run loop treats
  `HV_EXIT_REASON_UNKNOWN` as fatal (`crash_report(CRASH_UNEXPECTED_EXIT)`),
  matching the original pre-#76 `else` branch.
- `test-fork` passes 100% over a large batch (e.g. 200+ runs); no
  `elfuse --fork-child` orphans left behind.
- `test-signal`, `test-signal-thread`, `test-mt-fork`, and the timeout=0
  validation suite stay green.
- A stress harness records 0 `HV_EXIT_REASON_UNKNOWN` exits during heavy
  cross-process signalling.
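To make the "0 UNKNOWN exits" criterion measurable, the run loop could tally every exit reason before classifying it. This is a hypothetical instrumentation sketch: `exit_tally_record`, `exit_tally_bad` and `exit_tally_dump` are illustrative names. The slot indices assume the arm64 `hv_exit_reason_t` values (0 CANCELED ... 3 UNKNOWN), with one extra slot for anything out of range.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Slots 0..3 = hv_exit_reason_t values; slot 4 catches out-of-range reasons. */
#define EXIT_SLOTS 5
static atomic_ulong exit_tally[EXIT_SLOTS];

/* Call once per successful hv_vcpu_run() return, from any vCPU thread. */
void exit_tally_record(uint32_t reason)
{
    uint32_t slot = reason < EXIT_SLOTS - 1 ? reason : EXIT_SLOTS - 1;
    atomic_fetch_add(&exit_tally[slot], 1);
}

/* Harness check: count of UNKNOWN plus unrecognized exits (must stay 0). */
unsigned long exit_tally_bad(void)
{
    return atomic_load(&exit_tally[3]) + atomic_load(&exit_tally[4]);
}

/* Print the per-reason counts, e.g. at process exit under the stress run. */
void exit_tally_dump(FILE *out)
{
    static const char *names[EXIT_SLOTS] = {
        "CANCELED", "EXCEPTION", "VTIMER_ACTIVATED", "UNKNOWN", "other"};
    for (int i = 0; i < EXIT_SLOTS; i++)
        fprintf(out, "%-16s %lu\n", names[i], atomic_load(&exit_tally[i]));
}
```

Counting CANCELED alongside UNKNOWN also shows whether the stress run actually exercised the kick path, rather than passing vacuously.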

## Risks / open questions

- Signal-mask plumbing touches every thread-creation and fork/exec path; a
  missed site is silent (reintroduces UNKNOWN) — mitigated by item 4's stress
  check.
- Extra thread per guest process: small overhead, but interacts with the
  fork model (the helper must be re-created, not inherited, in each child).
- Does the helper-thread `hv_vcpus_exit` reliably interrupt a vCPU blocked in a
  host syscall issued from inside the HVC handler (e.g. `nanosleep`), or only
  one executing guest code? Confirm the cross-process wake still works for a
  child parked in a blocking host syscall.
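On the last open question: `hv_vcpus_exit()` is not a signal, so it presumably cannot cut short a `nanosleep()` the vCPU thread is blocked in. With SIGUSR2 blocked on that thread, the signal no longer interrupts the sleep with `EINTR` either. One possible mitigation, sketched under assumptions:

- Blocking sleeps in the HVC handler can be rewritten as `poll()` on a hypothetical per-vCPU wake pipe (`struct vcpu_waker`).
- The helper writes to that pipe next to its kick.

After an early wake-up the handler returns to the run loop. Per the Background, a pending `hv_vcpus_exit()` on a vCPU between runs then yields a clean `CANCELED`. One pipe per vCPU is assumed, since a shared pipe would let one sleeper drain another's wake-up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-vCPU wake channel: the vCPU thread polls rd, the helper writes wr. */
struct vcpu_waker {
    int rd, wr;
};

int waker_init(struct vcpu_waker *w)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking: the helper never stalls on a full pipe, and draining stops at empty. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 || fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    w->rd = fds[0];
    w->wr = fds[1];
    return 0;
}

/* Helper-thread side, issued next to the hv_vcpus_exit() kick. */
void waker_kick(struct vcpu_waker *w)
{
    char b = 1;
    ssize_t n = write(w->wr, &b, 1);
    (void)n; /* EAGAIN only means a wake-up is already pending */
}

/* vCPU-thread side: replaces a plain nanosleep() in the HVC handler.
 * Returns true if woken early, false if the timeout elapsed. EINTR restarts
 * the full timeout; a real version would track an absolute deadline. */
bool waker_sleep(struct vcpu_waker *w, int timeout_ms)
{
    struct pollfd p = {.fd = w->rd, .events = POLLIN};
    int rc;
    do {
        rc = poll(&p, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return false; /* timeout (a poll error is treated the same here) */
    char buf[64];
    while (read(w->rd, buf, sizeof buf) > 0)
        ; /* drain so the next sleep blocks again */
    return true;
}
```

Whether this is needed at all depends on the answer item 4's stress run gives for a child parked in a blocking host syscall.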

## Out of scope

- The EL0-preemption signal-delivery fix (already on #76). This issue does not
  change `signal_deliver`.
- Any change to the `/tmp/elfuse-sig-<pid>` file transport itself.
