From 46c877c27fcdd7a52d3da189564d74df4d68ccd2 Mon Sep 17 00:00:00 2001
From: Joseph Albert Nefario
Date: Sat, 23 May 2026 13:26:28 +0300
Subject: [PATCH] tests: VM mode for kernel cells (aircrack-ng/rtl8812au on pinned kernel)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds a libvirt-VM execution mode so the kernel-side cells of the
regression matrix can run against the aircrack-ng/rtl8812au out-of-tree
driver on a pinned kernel — sidestepping the OOT-driver-vs-modern-kernel
breakage that kept the matrix from validating chipsets like RTL8814AU on
hosts running recent kernels.

Why VM: aircrack-ng/rtl8812au lags kernel API changes by 6-12 months
(timer_*, cfg80211 callback signatures with MLO link_id, etc.). On
kernel 6.15+ the OOT driver needs hand-patching to build, and
morrownr/USB-WiFi flags that mainline rtw88 is the recommended path from
6.14 onwards. But mainline rtw88_8814au currently fails to probe the
RTL8814AU on this lab's adapter (`failed to download firmware`,
`error -22`) — so for 8814 specifically, the OOT driver is the only
working kernel path.

Pinning a VM to Ubuntu 22.04 LTS (kernel 5.15) gives us a stable
platform where aircrack-ng's driver builds and loads cleanly without
ongoing patching, regardless of how the host kernel evolves.

Three pieces:

1. `tests/setup_vm.sh` — one-shot VM provisioner. Clones the local
   `jammy-base.qcow2` cloud image, generates a cloud-init seed (creates
   `dima` user with the caller's SSH key, NOPASSWD sudo, installs
   build-essential / dkms / linux-headers / iw / tcpdump /
   python3-scapy / aircrack-ng), `virt-install`s with a qemu-xhci USB
   controller for hot-plug, then runs `make dkms_install` of
   aircrack-ng/rtl8812au inside via cloud-init runcmd. ~5-10 min end to
   end. `--teardown` and `--status` subcommands included.

2. `tests/regress.py` refactor: introduces a `KernelHost` abstraction
   that owns every kernel-side operation (modprobe / sysfs reads / iw /
   tcpdump / scapy inject).
   Local mode = `subprocess.run`. VM mode = `ssh ... sudo` wrappers +
   `virsh attach-device` / `detach-device` for per-cell USB passthrough.
   New CLI flags `--vm-name` / `--vm-ssh` (env: DEVOURER_VM_NAME,
   DEVOURER_VM_SSH). When invoked under `sudo`, the script picks up
   SUDO_USER's SSH key so it can reach the VM without root having its
   own key provisioned.

3. Per-cell DUT routing: each cell calls `_ensure_dut_location` for each
   DUT, which (in VM mode) transitions the DUT between host and VM via
   virsh as needed. State always restored to "both DUTs on host" between
   cells via try/finally so a crashed cell doesn't poison the next one.
   Script start also has a `release_all_known_duts` pass to clean up any
   leftover-attached DUT from a previous aborted run.

Validation on trainer-arch (Arch host kernel 6.18, VM Ubuntu 22.04
kernel 5.15, 8812AU + 8814AU on USB hub):

| | TX = devourer | TX = kernel |
|---|---|---|
| RX = devourer | 0 hits / 4500 TX ✗ | 0 hits / 258 TX ✗ |
| RX = kernel | 4172 hits / 4500 TX ✓ | 229 hits / 259 TX ✓ |

- baseline ✓ (kernel-TX 8812 → kernel-RX 8814 inside VM, ~88% delivery)
- devourer-TX validation ✓ (devourer-TX 8812 on host → kernel-RX 8814
  in VM, ~93% delivery — confirms devourer's 8812AU TX really emits
  valid frames at the wire level)
- the two failing cells are pre-existing devourer 8814 RX TODO, not
  regressions

For comparison: the same hardware in local mode (PR #32 first run) got
1 hit on the devourer-TX→kernel-RX cell because mainline rtw88_8814au
couldn't probe the chip. The VM with aircrack-ng gives ~4000x the
signal.

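For reviewers, the local-vs-VM dispatch from piece 2 can be sketched
roughly as below. This is a simplified illustration, not the code in the
diff: `wrap_for_kernel_host` is a hypothetical standalone helper, the
real `KernelHost` also passes `StrictHostKeyChecking=accept-new` and an
optional `-i` key, and `user@vm.example` is a placeholder target.

```python
import shlex


def wrap_for_kernel_host(cmd, ssh_target=""):
    """Return the argv that runs `cmd` on the kernel host.

    Hypothetical helper for illustration only. An empty `ssh_target`
    means local mode (argv unchanged). Otherwise each argument is
    shell-quoted and joined into one remote `sudo ...` string, because
    ssh hands the remote side a single command line to a shell.
    """
    if not ssh_target:
        return list(cmd)
    remote = "sudo " + " ".join(shlex.quote(c) for c in cmd)
    return ["ssh", "-o", "BatchMode=yes", ssh_target, remote]


# Local mode: the command is passed through untouched.
local_argv = wrap_for_kernel_host(["iw", "dev", "wlan0", "set", "channel", "100"])

# VM mode: arguments containing spaces survive the remote shell intact.
remote_argv = wrap_for_kernel_host(["sh", "-c", "echo hi"], "user@vm.example")
print(local_argv)
print(remote_argv)
```

Quoting each argument before joining is what lets multi-word `sh -c`
payloads (such as a remote sysfs walk) reach the VM as one argument.
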
A few smaller fixes folded in:

- TX-count parser surfaces "Failed to send packet" failure count
  separately from the rate-limited `` print count (previously a
  misleading low number when sends were failing)
- `--no-baseline-abort` flag for partial-rig diagnostics
- `wait_for_wlan_iface` timeout bumped to 20s (kernel rebinds + VM
  passthrough enumeration can take 10s+)
- Kernel-TX cells `wait()` for inject_beacon to self-terminate instead
  of killing the ssh wrapper — captures the final "sent N frames" line
  (previously TX count showed 0 even though RX side received the frames)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 tests/README.md   | 187 +++++++----
 tests/regress.py  | 793 +++++++++++++++++++++++++++++-----------------
 tests/setup_vm.sh | 189 +++++++++++
 3 files changed, 820 insertions(+), 349 deletions(-)
 create mode 100755 tests/setup_vm.sh

diff --git a/tests/README.md b/tests/README.md
index cc45669..6b421ee 100644
--- a/tests/README.md
+++ b/tests/README.md
@@ -1,8 +1,8 @@
 # devourer regression test rig

 Cross-driver matrix test that compares this project's userspace stack
-against the kernel-driver baseline (aircrack-ng / mainline `rtw88`) for
-both TX and RX on plugged-in USB Wi-Fi adapters.
+against the kernel-driver baseline for both TX and RX on plugged-in USB
+Wi-Fi adapters.

 ```
  TX = devourer TX = kernel
@@ -13,95 +13,154 @@ RX = kernel does dvr emit valid baseline / rig sanity check
 Each cell injects/receives the canonical beacon (SA `57:42:75:05:d6:00`,
 matching `txdemo/main.cpp`) for `--duration` seconds and counts hits.
-The baseline cell runs first — if it fails the rig itself is broken
-(channel busy, antennas, kernel driver mismatch) and the remaining cells
-are skipped.
+The baseline cell runs first — if it fails the rig itself is broken and
+the remaining cells are skipped (override with `--no-baseline-abort`).

-## Prerequisites
+## Two run modes

-- 2 supported USB Wi-Fi adapters plugged into the same host
-- devourer built (`build/WiFiDriverDemo`, `build/WiFiDriverTxDemo`)
-- Kernel driver(s) for the adapter(s) installed and `modprobe`-able
-  (rtl8812au/rtl8814au from aircrack-ng or your distro's `rtw88` for
-  mainline). The script doesn't care which — it queries sysfs for whatever
-  is bound.
-- Python 3.9+ with `scapy` available (`pip install scapy` or your distro's
-  `python3-scapy`)
-- `iw`, `tcpdump`, `ip` on PATH
-- Passwordless `sudo`, or run the script directly as root
-- NetworkManager users: stop NM for the duration of the test, or
-  `nmcli device set managed no` on the test interfaces before
-  running. (NM will fight you for the monitor-mode wlan iface otherwise.)
+### Local mode

-The script does a preflight check and prints distro-agnostic install
-hints for anything missing.
-
-## Usage
+The kernel-side cells run against whatever driver is bound to the DUTs on
+the **host** (mainline `rtw88_*` or whatever's loaded). Cheap to set up
+but limited to drivers that build cleanly against the host kernel — that's
+a moving target as kernels evolve, especially for the out-of-tree
+`aircrack-ng/rtl8812au` driver.

 ```bash
 sudo python3 tests/regress.py --channel 100
 ```

-Auto-detects the first two supported adapters via sysfs. To pick
-specific ones:
+### VM mode (recommended)
+
+The kernel-side cells run inside a **pinned-kernel libvirt VM** that has
+the OOT `aircrack-ng/rtl8812au` driver built and loaded. DUTs are
+transferred between host and VM per cell via `virsh attach-device` /
+`detach-device`. The VM's kernel never moves so the driver never breaks.
+
+Provision the VM once with the included script (Ubuntu 22.04 LTS,
+kernel 5.15 — where aircrack-ng's driver builds without patches):

 ```bash
-sudo python3 tests/regress.py \
-  --tx-pid 0x8812 --rx-pid 0x8813 --channel 100 --duration 20
+sudo tests/setup_vm.sh  # provision; ~5-10 min
+sudo tests/setup_vm.sh --status  # show VM IP, ssh hint
 ```

-Output is a markdown table printed to stdout — paste into PR comments
-or save with `tee`:
+Then run the matrix in VM mode:
+
+```bash
+sudo python3 tests/regress.py --channel 100 \
+  --vm-name devourer-testrig \
+  --vm-ssh dima@
+```
+
+VM mode is what unblocks chipsets where the host kernel driver doesn't
+work — e.g. RTL8814AU, where mainline `rtw88_8814au` currently fails to
+probe on kernels 6.15+ (`failed to download firmware`, `error -22`), but
+`aircrack-ng/rtl8812au` claims it cleanly on the pinned kernel 5.15.
+
+## Prerequisites
+
+### On the host (both modes)
+
+- 2 supported USB Wi-Fi adapters plugged in
+- devourer built (`build/WiFiDriverDemo`, `build/WiFiDriverTxDemo`)
+- Python 3.9+ with `scapy` (`pip install scapy` or `python3-scapy`)
+- `iw`, `tcpdump`, `ip` on PATH
+- Passwordless `sudo`, or run directly as root
+
+### For local mode
+
+- Kernel driver(s) installed and `modprobe`-able for your DUTs (rtw88 or
+  aircrack-ng — script auto-detects whatever's bound via sysfs)
+- NetworkManager users: stop NM, or `nmcli device set managed no`
+  on the test interfaces
+
+### For VM mode (in addition)
+
+- `libvirtd` + `virsh` + `virt-install` on the host
+- `xorriso` (for the cloud-init seed ISO that `setup_vm.sh` generates)
+- An Ubuntu 22.04 cloud image at `/var/lib/libvirt/images/jammy-base.qcow2`
+  (download from )
+- Working USB hot-plug on libvirt (`xhci` controller; `setup_vm.sh` adds it)
+- The host user's SSH key in `~/.ssh/id_rsa.pub` (or set `SSH_PUBKEY=...`
+  before `setup_vm.sh`) — gets baked into the VM's `dima` user
+
+The script does a preflight check and prints distro-agnostic install
+hints for anything missing.
+
+## Output
+
+Markdown table to stdout, ready to paste into PR comments:

 ```
-## Regression matrix — channel 100, 2026-05-23 12:34:56
+## Regression matrix — channel 100, 2026-05-23 13:22:14
 - TX adapter: `0bda:8812` (RTL8812AU)
 - RX adapter: `0bda:8813` (RTL8814AU)
-- Cell duration: 15s
-- Pass threshold: ≥ 5 hits
+- Kernel host: VM devourer-testrig via dima@10.216.129.126
+- Cell duration: 10s
+- Pass threshold: ≥ 3 hits

 | | TX = devourer | TX = kernel |
 |---|---|---|
-| RX = devourer | 42 hits / 7500 TX / 15s ✓ | 35 hits / 7500 TX / 15s ✓ |
-| RX = kernel | 31 hits / 7500 TX / 15s ✓ | 47 hits / 7500 TX / 15s ✓ |
+| RX = devourer | 0 hits / 4500 TX ✗ | 0 hits / 258 TX ✗ |
+| RX = kernel | 4172 hits / 4500 TX ✓ | 229 hits / 259 TX ✓ |
 ```

-For debugging a specific cell that failed, re-run with `--keep-logs` —
-per-cell stdout/stderr logs are symlinked at
-`/tmp/devourer-regress-last/`.
-
-## Supported adapters
+Pass/fail per cell on hit-count threshold (default ≥ 1 — generous because
+air interference makes absolute counts unreliable). Bump for higher-
+confidence runs on a quieter channel.

-Listed in `SUPPORTED_DUTS` at the top of `regress.py`. Extend the dict
-to add new chipsets — the rest of the script is chipset-agnostic.
+For debugging a specific cell that failed, re-run with `--keep-logs` —
+per-cell stdout/stderr logs end up at `/tmp/devourer-regress-last/`.

-## Channel selection
+## CLI knobs

-The default `--channel 36` is a 5GHz channel that's typically quiet,
-which means hit counts will be low but stable. For high-confidence
-runs, pick a channel where your nearest AP is actively transmitting
-(check via `iw dev wlan0 scan | grep -E "freq|SSID"` on a separate
-device).
+- `--channel N` — Wi-Fi channel for both adapters (default 36; pick the
+  channel your nearest AP is on for guaranteed traffic)
+- `--duration SECONDS` — per-cell injection/measurement window (default 15)
+- `--pass-threshold N` — min hits to pass (default 1)
+- `--tx-pid 0xNNNN` / `--rx-pid 0xNNNN` — pick specific DUTs (defaults to
+  the first two auto-detected)
+- `--no-baseline-abort` — run all 4 cells even if kernel-kernel fails
+  (useful when one chipset has no working kernel driver on this rig)
+- `--vm-name NAME` / `--vm-ssh USER@HOST` — enter VM mode
+- `--keep-logs` — symlink the temp log dir at `/tmp/devourer-regress-last`

-## VM-readiness
+Environment variable equivalents: `DEVOURER_VM_NAME`, `DEVOURER_VM_SSH`.

-The kernel-cell shell-outs go through `run_kernel_cmd()` in `regress.py`.
-Today it's `subprocess.run` (local). To migrate the kernel side into a
-pinned-kernel VM — recommended once host-kernel upgrades start breaking
-the out-of-tree aircrack-ng driver — replace `run_kernel_cmd` with an
-`ssh user@trainer-vm sudo` wrapper and arrange USB hot-plug passthrough
-into the VM via libvirt (`virsh attach-device` with a `` USB
-spec). The matrix orchestrator doesn't need to change.
+## Supported DUTs

-## Known gaps
+Listed in `SUPPORTED_DUTS` at the top of `regress.py`. Extend the dict
+to add new chipsets — the rest of the script is chipset-agnostic.

-- Tests "signal of life", not throughput. Hit counts vary 5-20× run-over-
-  run depending on ambient RF — thresholds are deliberately generous.
-- Per-cell startup time is ~10s (devourer fwdl + warmup). 4 cells × ~25s
-  ≈ 100s per matrix run. Fine for manual runs, would be annoying for CI.
-- No support yet for >2 adapters. To extend, add a pairing loop in
+## Architecture notes
+
+- All kernel-side operations (modprobe / sysfs reads / `iw` / `tcpdump` /
+  scapy) go through one abstraction (`KernelHost`). Local mode runs them
+  via `subprocess.run`; VM mode wraps them in `ssh ... sudo`. Adding a
+  third backend (e.g. remote bare-metal box) is a new class.
+- DUT routing in VM mode uses `virsh attach-device` (USB hot-plug). The
+  matrix moves DUTs between host and VM per cell as needed, restoring all
+  DUTs to the host on exit so the next cell starts from a clean baseline.
+- `inject_beacon.py` is shipped to the VM via `scp` each run (small file)
+  and exits when its `--duration` elapses — orchestrator waits rather
+  than killing, so the final "sent N frames" line is captured.
+
+## Known limitations
+
+- Tests "signal of life", not throughput — air noise makes absolute
+  counts unreliable; pass-threshold is deliberately generous.
+- Per matrix run: ~100s in local mode, ~3-4 min in VM mode (USB hot-plug
+  adds ~5s per cell transition).
+- Two-adapter scope today. To extend to >2, add a pairing loop in
   `main()` that runs the 4-cell matrix per chipset pair.
-- Kernel TX side uses scapy at 500 fps. If your kernel driver's
-  injection rate is the bottleneck on a given chip, lower
-  `--interval` in `inject_beacon.py`.
+- VM mode assumes a single libvirt host running both `virsh` (locally)
+  and the VM. Pulling the VM onto a different host is a `--vm-ssh
+  user@vmhost` away on the kernel cell side, but `virsh attach-device`
+  still runs locally; if the VM is on a different host, run virsh there
+  (via your own wrapper).
+- Cell 4 (`devourer-TX → devourer-RX`) requires both DUTs to be on the
+  host and devourer-claimable simultaneously. Works fine, but means both
+  chipsets need working devourer RX — if one is RX-broken (e.g. current
+  RTL8814AU TODO), that cell will always show 0 hits regardless of TX.
diff --git a/tests/regress.py b/tests/regress.py index 46c56ac..9eb0a10 100755 --- a/tests/regress.py +++ b/tests/regress.py @@ -3,7 +3,7 @@ Runs a 4-cell test on a host with two compatible USB Wi-Fi adapters, comparing this project's userspace stack ("devourer") against the kernel driver -(aircrack-ng / mainline rtw88) for both TX and RX: +(mainline rtw88 or aircrack-ng/rtl8812au) for both TX and RX: TX = devourer TX = kernel RX = devourer [end-to-end dvr] [does dvr RX kernel-TX frame?] @@ -16,22 +16,33 @@ test duration. The baseline cell is run first; if it fails, the rig itself is broken (interference, channel, antennas) and the matrix is aborted. +Two run modes: + + * Local mode (default): kernel cells run against the host kernel driver, + devourer cells against host libusb. Both sides share the host. Cheap to + set up but limited to whichever drivers build cleanly against the host + kernel — that's a moving target as kernels evolve. + + * VM mode (--vm-name + --vm-ssh): kernel cells run inside a pinned-kernel + libvirt VM that has the OOT aircrack-ng driver built and loaded. The + VM's kernel never moves so the driver never breaks. DUTs are + transferred between host and VM per cell via virsh USB hot-plug. + Provision the VM with `tests/setup_vm.sh`. + Designed to be run manually after building devourer: cd /path/to/devourer && cmake --build build -j + # Local mode: sudo python3 tests/regress.py --channel 100 - -Supports any modern Linux distro: tool paths are resolved via `which`, wlan -interfaces are discovered via `iw dev`, and the kernel driver claiming each -DUT is read from sysfs (no hardcoded module names). NetworkManager users: -either stop NM for the duration, or `nmcli device set managed no` -on the test interfaces before running. - -VM-readiness: the kernel-cell shell-out goes through `run_kernel_cmd()`, -which today executes locally. 
To migrate the kernel-driver side into a -pinned-kernel VM (recommended once host kernel upgrades start breaking the -aircrack-ng/rtl8812au driver), point KERNEL_CELL_RUNNER at an `ssh` wrapper -and arrange USB passthrough to the VM via libvirt's USB hot-plug. + # VM mode (after tests/setup_vm.sh): + sudo python3 tests/regress.py --channel 100 \\ + --vm-name devourer-testrig --vm-ssh dima@10.216.129.126 + +Portability: tool paths resolved via `which`, wlan interfaces discovered via +`iw dev` (works for systemd `wlp*` and classic `wlan*`), kernel driver +claiming each DUT read from sysfs (no hardcoded module names). NetworkManager +users: stop NM for the duration of the test, or `nmcli device set +managed no` on the test interfaces. """ from __future__ import annotations @@ -40,6 +51,7 @@ import dataclasses import glob import os +import shlex import shutil import signal import subprocess @@ -66,7 +78,7 @@ "0bda:b811": "RTL8811AU", } -# Required external tools. Each entry: (binary, distro-agnostic install hint). +# Required external tools on the host. Each entry: (binary, install hint). REQUIRED_TOOLS = [ ("iw", "your distro's `iw` package"), ("tcpdump", "your distro's `tcpdump` package"), @@ -80,21 +92,252 @@ # --------------------------------------------------------------------------- -# Subprocess helpers — wrap shell-outs with structured output. +# Subprocess helpers. # --------------------------------------------------------------------------- def run(cmd: list[str], **kw) -> subprocess.CompletedProcess: - """Run a command synchronously, capturing output. Raises on non-zero - exit unless check=False is passed in **kw.""" + """Run a command synchronously on the host, capturing output.""" return subprocess.run(cmd, capture_output=True, text=True, **kw) -def run_kernel_cmd(cmd: list[str], **kw) -> subprocess.CompletedProcess: - """Kernel-cell shell-out. Today: same as `run` (local exec). 
When - migrating the kernel-driver side into a VM, wrap this with ssh / virsh - exec — every kernel-side call goes through here.""" - return run(cmd, **kw) +# --------------------------------------------------------------------------- +# KernelHost — abstracts "the machine that runs the kernel-driver side". +# +# In local mode the kernel host IS the local machine. In VM mode it's a +# libvirt guest reached via SSH. The orchestrator only sees this interface; +# everything kernel-side (modprobe, sysfs reads, iw, tcpdump, scapy) goes +# through it. +# --------------------------------------------------------------------------- + + +@dataclasses.dataclass +class KernelHost: + """One of two flavours. Use KernelHost.local() or KernelHost.via_ssh().""" + + # ssh target like "dima@10.216.129.126". Empty string for local execution. + ssh_target: str = "" + # libvirt domain name for USB passthrough. Empty for local mode (no DUT + # movement needed — DUTs already on the same machine). + vm_name: str = "" + + @classmethod + def local(cls) -> "KernelHost": + return cls(ssh_target="", vm_name="") + + @classmethod + def via_ssh(cls, ssh_target: str, vm_name: str) -> "KernelHost": + return cls(ssh_target=ssh_target, vm_name=vm_name) + + @property + def is_remote(self) -> bool: + return bool(self.ssh_target) + + # -- command execution --------------------------------------------------- + + def _ssh_prefix(self) -> list[str]: + """ssh argv with auth set up. 
When the script is launched via sudo, + use SUDO_USER's keys (root usually doesn't have keys provisioned + on the VM).""" + args = ["ssh", "-o", "BatchMode=yes", + "-o", "StrictHostKeyChecking=accept-new"] + sudo_user = os.environ.get("SUDO_USER") + if sudo_user: + sudo_home = os.path.expanduser(f"~{sudo_user}") + for keyname in ("id_rsa", "id_ed25519", "id_ecdsa"): + keypath = os.path.join(sudo_home, ".ssh", keyname) + if os.path.exists(keypath): + args += ["-i", keypath] + break + args.append(self.ssh_target) + return args + + def run(self, cmd: list[str], **kw) -> subprocess.CompletedProcess: + """Run a kernel-side command. Local: subprocess. Remote: ssh.""" + if self.is_remote: + wrapped = self._ssh_prefix() + [ + "sudo " + " ".join(shlex.quote(c) for c in cmd) + ] + return run(wrapped, **kw) + return run(cmd, **kw) + + def popen(self, cmd: list[str], **kw) -> subprocess.Popen: + """Spawn a long-running kernel-side process. Returns Popen.""" + if self.is_remote: + wrapped = self._ssh_prefix() + [ + "sudo " + " ".join(shlex.quote(c) for c in cmd) + ] + return subprocess.Popen(wrapped, start_new_session=True, **kw) + return subprocess.Popen(cmd, start_new_session=True, **kw) + + # -- file I/O across hosts ---------------------------------------------- + + def _scp_prefix(self) -> list[str]: + args = ["scp", "-o", "BatchMode=yes", + "-o", "StrictHostKeyChecking=accept-new"] + sudo_user = os.environ.get("SUDO_USER") + if sudo_user: + sudo_home = os.path.expanduser(f"~{sudo_user}") + for keyname in ("id_rsa", "id_ed25519", "id_ecdsa"): + keypath = os.path.join(sudo_home, ".ssh", keyname) + if os.path.exists(keypath): + args += ["-i", keypath] + break + return args + + def push_file(self, local_path: Path, remote_path: str) -> None: + """Copy a local file onto the kernel host. 
No-op in local mode.""" + if not self.is_remote: + return + scp_cmd = self._scp_prefix() + [ + str(local_path), f"{self.ssh_target}:{remote_path}", + ] + r = run(scp_cmd) + if r.returncode != 0: + raise RuntimeError(f"scp failed: {r.stderr.strip()}") + + def fetch_file(self, remote_path: str, local_path: Path) -> None: + """Copy a remote file back to local. No-op in local mode (same file).""" + if not self.is_remote: + return + scp_cmd = self._scp_prefix() + [ + f"{self.ssh_target}:{remote_path}", str(local_path), + ] + r = run(scp_cmd) + if r.returncode != 0: + local_path.write_text("") + + # -- DUT routing (VM passthrough) --------------------------------------- + + def _dut_already_attached(self, dut: "Dut") -> bool: + """True if dut is currently passed through to this VM.""" + r = run(["sudo", "virsh", "dumpxml", self.vm_name]) + if r.returncode != 0: + return False + want_v = f"" + want_p = f"" + return want_v in r.stdout and want_p in r.stdout + + def take_dut(self, dut: "Dut") -> None: + """Bring `dut` to this kernel host. Local: no-op (just unbind from + host kernel driver if any). VM: idempotent virsh attach-device.""" + if not self.is_remote: + return + if self._dut_already_attached(dut): + return + # Make sure the host kernel isn't holding it. + detach_from_host_kernel(dut) + xml = ( + "" + f"" + "" + ) + with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f: + f.write(xml) + xml_path = f.name + try: + r = run(["sudo", "virsh", "attach-device", self.vm_name, + xml_path, "--live"]) + if r.returncode != 0: + raise RuntimeError( + f"virsh attach-device {dut.vidpid} → {self.vm_name} " + f"failed: {(r.stderr or r.stdout).strip()}" + ) + finally: + os.unlink(xml_path) + + def release_all_known_duts(self, duts: list["Dut"]) -> None: + """At script start, ensure no DUTs are leftover-attached to the VM + from a previous run. 
Called once before the matrix.""" + if not self.is_remote: + return + for dut in duts: + if self._dut_already_attached(dut): + self.release_dut(dut) + time.sleep(2.0) + + def release_dut(self, dut: "Dut") -> None: + """Send `dut` back to the host. Local: no-op. VM: virsh detach.""" + if not self.is_remote: + return + xml = ( + "" + f"" + "" + ) + with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f: + f.write(xml) + xml_path = f.name + try: + run(["sudo", "virsh", "detach-device", self.vm_name, + xml_path, "--live"]) + # Don't raise on detach errors — device may already be gone. + finally: + os.unlink(xml_path) + + # -- wlan iface discovery ------------------------------------------------ + + def wait_for_wlan_iface(self, dut: "Dut", timeout: float = 20.0) -> str: + """Block until a wlan iface for `dut` appears on the kernel host.""" + end = time.monotonic() + timeout + while time.monotonic() < end: + iface = self._wlan_iface_for_dut(dut) + if iface: + return iface + time.sleep(0.5) + raise RuntimeError( + f"no wlan iface appeared for {dut.vidpid} on " + f"{self.ssh_target or 'local'} after {timeout}s — " + f"kernel driver may have failed to bind" + ) + + def _wlan_iface_for_dut(self, dut: "Dut") -> Optional[str]: + if not self.is_remote: + # Local mode: walk /sys/bus/usb/devices for the DUT, then + # look at /net/ for a wlan name. + for d in glob.glob("/sys/bus/usb/devices/*"): + try: + with open(f"{d}/idVendor") as f: + if f.read().strip() != dut.vid: + continue + with open(f"{d}/idProduct") as f: + if f.read().strip() != dut.pid: + continue + except (FileNotFoundError, PermissionError): + continue + net_dir = f"{d}:1.0/net" + if not os.path.isdir(net_dir): + return None + ifaces = os.listdir(net_dir) + return ifaces[0] if ifaces else None + return None + # Remote: ssh and iterate /sys/bus/usb/devices/ over there. 
+ r = self.run([ + "sh", "-c", + "for d in /sys/bus/usb/devices/*; do " + " [ -e \"$d/idVendor\" ] || continue; " + f" [ \"$(cat $d/idVendor)\" = \"{dut.vid}\" ] || continue; " + f" [ \"$(cat $d/idProduct)\" = \"{dut.pid}\" ] || continue; " + " ls \"$d:1.0/net\" 2>/dev/null | head -1; " + " break; " + "done", + ]) + out = (r.stdout or "").strip() + return out or None + + def iface_to_monitor(self, iface: str, channel: int) -> None: + """Put a wlan iface into monitor mode on a channel.""" + def _go(cmd): + r = self.run(cmd) + if r.returncode != 0: + raise RuntimeError( + f"{' '.join(cmd)} exit={r.returncode}: " + f"{(r.stderr or r.stdout).strip() or '(no stderr)'}" + ) + _go(["ip", "link", "set", iface, "down"]) + _go(["iw", "dev", iface, "set", "type", "monitor"]) + _go(["ip", "link", "set", iface, "up"]) + _go(["iw", "dev", iface, "set", "channel", str(channel)]) # --------------------------------------------------------------------------- @@ -102,10 +345,8 @@ def run_kernel_cmd(cmd: list[str], **kw) -> subprocess.CompletedProcess: # --------------------------------------------------------------------------- -def preflight(devourer_root: Path) -> None: - """Bail early if anything required is missing. 
Better to fail with one - actionable message than to crash 4 cells in with a cryptic Python - traceback.""" +def preflight(devourer_root: Path, kh: KernelHost) -> None: + """Bail early if anything required is missing.""" missing = [] for tool, hint in REQUIRED_TOOLS: if shutil.which(tool) is None: @@ -114,7 +355,8 @@ def preflight(devourer_root: Path) -> None: try: __import__(mod) except ImportError: - missing.append(f" - python module `{mod}` not importable (install: {hint})") + missing.append(f" - python module `{mod}` not importable " + f"(install: {hint})") for binary in ("WiFiDriverDemo", "WiFiDriverTxDemo"): if not (devourer_root / "build" / binary).is_file(): missing.append( @@ -124,24 +366,46 @@ def preflight(devourer_root: Path) -> None: ) if os.geteuid() != 0: missing.append( - " - this script needs root (modprobe / iw / tcpdump / sysfs writes). " + " - this script needs root on the host (modprobe / sysfs writes). " "Re-run with `sudo`." ) + if kh.is_remote: + # Sanity check ssh + sudo on the VM works. + r = kh.run(["true"]) + if r.returncode != 0: + missing.append( + f" - kernel host `{kh.ssh_target}` not reachable via " + f"`ssh ... sudo true`: {(r.stderr or r.stdout).strip()}" + ) + else: + for tool in ("iw", "tcpdump", "ip", "modprobe", "python3"): + r = kh.run(["which", tool]) + if r.returncode != 0: + missing.append( + f" - kernel-host `{kh.ssh_target}` is missing " + f"`{tool}` — install inside the VM." + ) + # virsh on local host (we drive virsh from here). + if shutil.which("virsh") is None: + missing.append( + " - VM mode needs `virsh` on the local host (libvirt-clients)." + ) if missing: sys.stderr.write("Prerequisites not met:\n" + "\n".join(missing) + "\n") sys.exit(2) # --------------------------------------------------------------------------- -# DUT discovery — find plugged-in adapters via sysfs. +# DUT discovery — find plugged-in adapters via sysfs (host side only; +# the VM sees DUTs only when we explicitly hand them over via virsh). 
# --------------------------------------------------------------------------- @dataclasses.dataclass class Dut: - sysfs_id: str # e.g. "1-14" (the USB device path component) - vid: str # e.g. "0bda" - pid: str # e.g. "8812" + sysfs_id: str # e.g. "1-14" (the USB device path component on host) + vid: str # lowercase hex, e.g. "0bda" + pid: str # lowercase hex, e.g. "8812" chipset: str # human-readable @property @@ -177,103 +441,47 @@ def discover_duts() -> list[Dut]: return duts -def kernel_driver_for_dut(dut: Dut) -> Optional[str]: - """Return the kernel driver name currently bound to dut, or None if - nothing is bound.""" +def host_kernel_driver_for_dut(dut: Dut) -> Optional[str]: + """Return the kernel driver currently bound to dut on the HOST. None if + nothing bound (e.g. because the DUT is currently passed through to a VM).""" link = f"/sys/bus/usb/devices/{dut.iface_id}/driver" if not os.path.islink(link): return None return os.path.basename(os.readlink(link)) -def wlan_iface_for_dut(dut: Dut) -> Optional[str]: - """If the kernel driver bound to dut has surfaced a net interface, return - its name (e.g. 'wlp0s20f0u14' or 'wlan0'). Otherwise None.""" - net_dir = f"/sys/bus/usb/devices/{dut.iface_id}/net" - if not os.path.isdir(net_dir): - return None - ifaces = os.listdir(net_dir) - return ifaces[0] if ifaces else None - - -# --------------------------------------------------------------------------- -# Driver-state orchestration — bind / unbind a DUT to its kernel driver. -# --------------------------------------------------------------------------- - - -def detach_from_kernel(dut: Dut) -> None: - """Unbind dut from whatever kernel driver claims it, so devourer can - claim_interface(). If nothing claims it, no-op.""" - drv = kernel_driver_for_dut(dut) +def detach_from_host_kernel(dut: Dut) -> None: + """Unbind dut from whatever host kernel driver claims it. 
No-op if + nothing claims it.""" + drv = host_kernel_driver_for_dut(dut) if drv is None: return try: with open(f"/sys/bus/usb/drivers/{drv}/unbind", "w") as f: f.write(dut.iface_id) except OSError as e: - sys.stderr.write(f"detach_from_kernel({dut.iface_id}): {e}\n") + sys.stderr.write(f"detach_from_host_kernel({dut.iface_id}): {e}\n") -def attach_to_kernel(dut: Dut) -> None: - """Re-bind dut to its kernel driver. Caller is responsible for ensuring - the right module is modprobe'd first.""" - # The kernel picks the right driver automatically based on the device's - # modalias when we write to .../drivers_probe. +def attach_to_host_kernel(dut: Dut) -> None: + """Re-bind dut to whatever host kernel driver matches it (via modalias).""" try: with open("/sys/bus/usb/drivers_probe", "w") as f: f.write(dut.iface_id) except OSError as e: - sys.stderr.write(f"attach_to_kernel({dut.iface_id}): {e}\n") - - -def wait_for_wlan_iface(dut: Dut, timeout: float = 15.0) -> str: - """Block until the kernel driver surfaces a wlan iface for dut. - - 15s default accommodates the typical fwdl + association timeline for - rtw88 / rtl8812au drivers; the iface usually appears in 2-4s but the - out-of-tree aircrack-ng driver can take up to 10s on first probe.""" - end = time.monotonic() + timeout - while time.monotonic() < end: - iface = wlan_iface_for_dut(dut) - if iface is not None: - return iface - time.sleep(0.25) - raise RuntimeError( - f"no wlan iface appeared for {dut.vidpid} after {timeout}s — kernel " - f"driver may have failed to bind (check `dmesg | tail -20` for " - f"firmware-download or probe errors)" - ) - - -def kernel_iface_to_monitor(iface: str, channel: int) -> None: - """Put a kernel wlan iface into monitor mode on the given channel. - Idempotent — safe to call repeatedly. 
Surfaces command stderr on failure - so 'permission denied' / 'No such device' / 'Operation not supported' - error paths are actionable.""" - def _run(cmd): - r = run_kernel_cmd(cmd) - if r.returncode != 0: - raise RuntimeError( - f"{' '.join(cmd)} exit={r.returncode}: " - f"{(r.stderr or r.stdout).strip() or '(no stderr)'}" - ) - _run(["ip", "link", "set", iface, "down"]) - _run(["iw", "dev", iface, "set", "type", "monitor"]) - _run(["ip", "link", "set", iface, "up"]) - # set channel requires the iface to be up. - _run(["iw", "dev", iface, "set", "channel", str(channel)]) + sys.stderr.write(f"attach_to_host_kernel({dut.iface_id}): {e}\n") # --------------------------------------------------------------------------- -# Cell implementations — each returns (hits, total_attempts). +# Cell results + scoring. # --------------------------------------------------------------------------- @dataclasses.dataclass class CellResult: - hits: int # frames matching CANONICAL_SA observed by the RX side - tx_attempts: int # frames the TX side reports having submitted - tx_failures: int # devourer-only: # of "Failed to send packet" errors + hits: int + tx_attempts: int + tx_failures: int duration_s: float notes: str = "" @@ -288,85 +496,81 @@ def fmt(self, threshold: int) -> str: return f"{self.hits} hits / {attempts_str} / {self.duration_s:.0f}s {mark}" -def _spawn_devourer_rx( - devourer_root: Path, dut: Dut, channel: int, log_path: Path -) -> subprocess.Popen: +# --------------------------------------------------------------------------- +# Process spawners — devourer cells (always local) + kernel cells (via kh). 
+# --------------------------------------------------------------------------- + + +def _devourer_env(dut: Dut, channel: int) -> dict[str, str]: env = os.environ.copy() env["DEVOURER_VID"] = f"0x{dut.vid}" env["DEVOURER_PID"] = f"0x{dut.pid}" env["DEVOURER_CHANNEL"] = str(channel) env["DEVOURER_USB_QUIET"] = "1" + return env + + +def _spawn_devourer_rx( + devourer_root: Path, dut: Dut, channel: int, log_path: Path +) -> subprocess.Popen: fh = open(log_path, "w") return subprocess.Popen( [str(devourer_root / "build" / "WiFiDriverDemo")], - env=env, - stdout=fh, - stderr=subprocess.STDOUT, - start_new_session=True, + env=_devourer_env(dut, channel), + stdout=fh, stderr=subprocess.STDOUT, start_new_session=True, ) def _spawn_devourer_tx( devourer_root: Path, dut: Dut, channel: int, log_path: Path ) -> subprocess.Popen: - env = os.environ.copy() - env["DEVOURER_VID"] = f"0x{dut.vid}" - env["DEVOURER_PID"] = f"0x{dut.pid}" - env["DEVOURER_CHANNEL"] = str(channel) - env["DEVOURER_USB_QUIET"] = "1" fh = open(log_path, "w") return subprocess.Popen( [str(devourer_root / "build" / "WiFiDriverTxDemo")], - env=env, - stdout=fh, - stderr=subprocess.STDOUT, - start_new_session=True, + env=_devourer_env(dut, channel), + stdout=fh, stderr=subprocess.STDOUT, start_new_session=True, ) def _spawn_kernel_rx( - iface: str, channel: int, log_path: Path + kh: KernelHost, iface: str, channel: int, log_path: Path ) -> subprocess.Popen: - kernel_iface_to_monitor(iface, channel) - # tcpdump -e shows the link-layer header (so we see SA); filter by SA. + """RX side: tcpdump on the kernel host's wlan iface, filter on + CANONICAL_SA. 
Both modes capture tcpdump's stdout in `log_path`: local mode + directly, VM mode through the process that `kh.popen` starts.""" + kh.iface_to_monitor(iface, channel) fh = open(log_path, "w") - return subprocess.Popen( - [ - "tcpdump", - "-i", - iface, - "-e", - "-nn", - "-l", - "ether", - "src", - CANONICAL_SA, - ], - stdout=fh, - stderr=subprocess.DEVNULL, - start_new_session=True, - ) + if kh.is_remote: + # Wrap in sh so tcpdump's stderr is dropped on the VM; stdout goes to fh. + cmd = ["sh", "-c", + f"tcpdump -i {shlex.quote(iface)} -e -nn -l " + f"ether src {CANONICAL_SA} 2>/dev/null"] + else: + cmd = ["tcpdump", "-i", iface, "-e", "-nn", "-l", + "ether", "src", CANONICAL_SA] + return kh.popen(cmd, stdout=fh, stderr=subprocess.DEVNULL) def _spawn_kernel_tx( - devourer_root: Path, iface: str, channel: int, duration: float, log_path: Path + kh: KernelHost, devourer_root: Path, iface: str, channel: int, + duration: float, log_path: Path, ) -> subprocess.Popen: - kernel_iface_to_monitor(iface, channel) - injector = devourer_root / "tests" / "inject_beacon.py" + """TX side: scapy injector that emits the canonical beacon. + Local mode runs tests/inject_beacon.py directly. VM mode scps it over + first, then ssh-runs it.""" + kh.iface_to_monitor(iface, channel) fh = open(log_path, "w") - return subprocess.Popen( - [ - sys.executable, - str(injector), - "--iface", - iface, - "--duration", - str(duration), - ], - stdout=fh, - stderr=subprocess.STDOUT, - start_new_session=True, - ) + injector = devourer_root / "tests" / "inject_beacon.py" + if kh.is_remote: + # Ship the injector to the VM (overwrites each run — fine for the + # tiny script).
+ kh.push_file(injector, "/tmp/inject_beacon.py") + cmd = ["python3", "/tmp/inject_beacon.py", + "--iface", iface, "--duration", str(duration)] + else: + cmd = [sys.executable, str(injector), + "--iface", iface, "--duration", str(duration)] + return kh.popen(cmd, stdout=fh, stderr=subprocess.STDOUT) def _terminate(proc: subprocess.Popen, grace: float = 2.0) -> None: @@ -383,14 +587,16 @@ def _terminate(proc: subprocess.Popen, grace: float = 2.0) -> None: proc.wait() +# --------------------------------------------------------------------------- +# Log parsers. +# --------------------------------------------------------------------------- + + def _count_devourer_rx_hits(log_path: Path) -> int: - """devourer RX hits show up as `...hits=N` lines, where - N is monotonically increasing per match. Take the max N seen.""" last = 0 try: for line in log_path.read_text().splitlines(): if "" in line: - # line format: "txdemo SA match: hits=N total_rx=M len=L" for tok in line.split(): if tok.startswith("hits="): try: @@ -403,16 +609,9 @@ def _count_devourer_rx_hits(log_path: Path) -> int: def _count_devourer_tx_attempts(log_path: Path) -> tuple[int, int]: - """Returns (max_tx_count, send_failures). - - WiFiDriverTxDemo rate-limits its `TX #N rc=X` prints - (first 10 + every 500th N). The max N seen estimates total attempts — - BUT when send_packet fails, the inner loop runs slower than expected - (libusb error path takes longer than success path) and N may never - reach the next 500 boundary, so the max N from prints undercounts - badly. Surface the failure count alongside so the user can see what's - going on. - """ + """Returns (max_tx_count_logged, send_failures). 
The print is rate-limited + so when sends fail often, max_tx_count_logged stays low — surface failure + count alongside so the picture is honest.""" last = 0 failures = 0 try: @@ -431,8 +630,6 @@ def _count_devourer_tx_attempts(log_path: Path) -> tuple[int, int]: def _count_tcpdump_hits(log_path: Path) -> int: - """tcpdump -e emits one line per packet. The filter narrows to SA matches, - so line count == hit count.""" try: return sum(1 for _ in log_path.read_text().splitlines()) except FileNotFoundError: @@ -450,6 +647,45 @@ def _count_kernel_tx_sent(log_path: Path) -> int: return 0 +# --------------------------------------------------------------------------- +# Cell orchestration. +# --------------------------------------------------------------------------- + + +def _ensure_dut_location( + dut: Dut, want_at_kernel_host: bool, kh: KernelHost +) -> None: + """Move `dut` to the right machine for the current cell. + + Local mode: nothing to physically move — DUT is always on the host — + but we do need to detach it from any kernel driver before devourer can + claim it. + + VM mode: when `want_at_kernel_host` is True, virsh attach-device to the + VM. When False, virsh detach-device back to the host, then unbind from + whatever host kernel driver claims it (so devourer can libusb-claim). + """ + if not kh.is_remote: + # Local mode: ALWAYS detach from kernel before devourer use. + if not want_at_kernel_host: + detach_from_host_kernel(dut) + else: + # Need kernel-bound on host. Ensure it's bound (re-probe). + attach_to_host_kernel(dut) + return + # VM mode. + if want_at_kernel_host: + kh.take_dut(dut) + # Give the VM kernel time to enumerate and bind. + time.sleep(3.0) + else: + kh.release_dut(dut) + # Wait for the device to reappear on the host bus. + time.sleep(2.0) + # Then detach from host kernel driver so devourer can libusb-claim it. 
+ detach_from_host_kernel(dut) + + def run_cell( devourer_root: Path, tx_dut: Dut, @@ -459,68 +695,60 @@ def run_cell( channel: int, duration: float, tmpdir: Path, + kh: KernelHost, ) -> CellResult: - """Run one matrix cell end-to-end. - - State contract: this function is responsible for the full bind/unbind - dance of its DUTs. On entry, it first re-attaches both DUTs to the - kernel (idempotent — works whether the previous cell left them bound - or not), so a crashed previous cell doesn't poison the next one's - starting state. On exit (always — try/finally), it re-attaches anything - it detached. - """ + """Run one matrix cell. State contract: always restore DUTs to a clean + baseline (host kernel-bound) on exit via try/finally.""" cell_id = f"tx-{tx_side}_rx-{rx_side}" tx_log = tmpdir / f"{cell_id}.tx.log" rx_log = tmpdir / f"{cell_id}.rx.log" - # Restore baseline: re-attach both DUTs to whatever kernel driver they - # match. Idempotent. The wait after attach gives the kernel time to - # finish its probe path before we either use it or detach it again. - attach_to_kernel(tx_dut) - attach_to_kernel(rx_dut) - time.sleep(1.5) - - # Then detach the ones that this cell needs devourer to own. - if tx_side == "devourer": - detach_from_kernel(tx_dut) - if rx_side == "devourer": - detach_from_kernel(rx_dut) + # Stage 1: route DUTs to their target machines for this cell. + _ensure_dut_location(tx_dut, want_at_kernel_host=(tx_side == "kernel"), kh=kh) + _ensure_dut_location(rx_dut, want_at_kernel_host=(rx_side == "kernel"), kh=kh) rx_proc: Optional[subprocess.Popen] = None tx_proc: Optional[subprocess.Popen] = None try: - # RX side starts first so it's listening when TX begins. + # Stage 2: bring up RX side first so it's listening when TX begins. if rx_side == "devourer": rx_proc = _spawn_devourer_rx(devourer_root, rx_dut, channel, rx_log) - # devourer needs ~5s for fwdl + channel set before it's actually RXing. + # devourer ~5s for fwdl + channel set. 
time.sleep(6.0) else: - rx_iface = wait_for_wlan_iface(rx_dut) - rx_proc = _spawn_kernel_rx(rx_iface, channel, rx_log) + rx_iface = kh.wait_for_wlan_iface(rx_dut) + rx_proc = _spawn_kernel_rx(kh, rx_iface, channel, rx_log) time.sleep(1.0) - # TX side. + # Stage 3: TX side. if tx_side == "devourer": tx_proc = _spawn_devourer_tx(devourer_root, tx_dut, channel, tx_log) - # txdemo also has a 5s warmup (sleep(5) before the TX loop). tx_warmup = 6.0 else: - tx_iface = wait_for_wlan_iface(tx_dut) + tx_iface = kh.wait_for_wlan_iface(tx_dut) tx_proc = _spawn_kernel_tx( - devourer_root, tx_iface, channel, duration, tx_log + kh, devourer_root, tx_iface, channel, duration, tx_log ) tx_warmup = 0.5 time.sleep(tx_warmup) measure_start = time.monotonic() - time.sleep(duration) + if tx_side == "kernel": + # inject_beacon.py self-terminates after its --duration. Wait for + # it instead of killing — otherwise we lose the final "sent N + # frames" line and the TX-count parser sees 0. Generous margin + # for ssh setup + scapy import. + try: + tx_proc.wait(timeout=duration + 15) + except subprocess.TimeoutExpired: + pass # fall through to _terminate + else: + # devourer TX loops forever — explicit kill after `duration`. + time.sleep(duration) measure_end = time.monotonic() - # Parse counts (do this BEFORE we terminate processes so they get a - # chance to flush their stdout buffers — terminate sends SIGINT which - # devourer traps and exits cleanly, flushing in the process). + # Stage 4: shut down, drain, parse. _terminate(tx_proc) - # Let any in-flight frames drain before we close the RX side. time.sleep(1.0) _terminate(rx_proc) @@ -541,21 +769,23 @@ def run_cell( duration_s=measure_end - measure_start, ) finally: - # Always clean up subprocesses + restore kernel binding, even if the - # cell raised. This prevents one cell's failure from poisoning the - # next cell's starting state. + # Restore clean baseline so the next cell starts from a known state. 
if tx_proc is not None and tx_proc.poll() is None: _terminate(tx_proc) if rx_proc is not None and rx_proc.poll() is None: _terminate(rx_proc) - if tx_side == "devourer": - attach_to_kernel(tx_dut) - if rx_side == "devourer": - attach_to_kernel(rx_dut) + # Pull DUTs back to the host (so the next cell can choose freely). + if kh.is_remote: + kh.release_dut(tx_dut) + kh.release_dut(rx_dut) + time.sleep(1.5) + # Re-attach to host kernel where applicable. + attach_to_host_kernel(tx_dut) + attach_to_host_kernel(rx_dut) # --------------------------------------------------------------------------- -# Matrix driver — runs the 4 cells in baseline-first order. +# Matrix. # --------------------------------------------------------------------------- @@ -567,18 +797,14 @@ def run_matrix( duration: float, threshold: int, tmpdir: Path, + kh: KernelHost, abort_on_baseline_fail: bool = True, ) -> dict[tuple[str, str], CellResult]: cells = [ - # Baseline first — if the rig itself is broken, the other cells' - # results are uninterpretable. Default is to abort the matrix when - # this fails; override with --no-baseline-abort for partial-rig - # diagnostics (e.g. one chipset has no working kernel driver but - # devourer-only cells are still worth running). - ("kernel", "kernel"), - ("devourer", "kernel"), - ("kernel", "devourer"), - ("devourer", "devourer"), + ("kernel", "kernel"), # baseline — rig sanity + ("devourer", "kernel"), # does devourer emit valid frames? + ("kernel", "devourer"), # does devourer RX a known-good frame? 
+ ("devourer", "devourer"), # end-to-end devourer ] results: dict[tuple[str, str], CellResult] = {} for tx_side, rx_side in cells: @@ -587,14 +813,8 @@ def run_matrix( flush=True) try: r = run_cell( - devourer_root, - tx_dut, - rx_dut, - tx_side, - rx_side, - channel, - duration, - tmpdir, + devourer_root, tx_dut, rx_dut, tx_side, rx_side, + channel, duration, tmpdir, kh, ) except Exception as e: print(f" ✗ cell crashed: {e}", flush=True) @@ -608,31 +828,24 @@ def run_matrix( and abort_on_baseline_fail ): print( - "BASELINE cell failed — the rig itself isn't moving frames. " - "Aborting remaining cells (channel busy? antennas? " - "wrong kernel driver?). Re-run with --no-baseline-abort " - "to attempt the remaining cells anyway.", - file=sys.stderr, - flush=True, + "BASELINE cell failed — the rig isn't moving frames. Aborting " + "remaining cells. Re-run with --no-baseline-abort to attempt " + "the rest anyway (useful when one chipset has no working " + "kernel driver on this rig).", + file=sys.stderr, flush=True, ) for remaining in cells[1:]: results[remaining] = CellResult( - hits=0, - tx_attempts=0, - tx_failures=0, - duration_s=0.0, - notes="skipped (baseline failed)", + hits=0, tx_attempts=0, tx_failures=0, + duration_s=0.0, notes="skipped (baseline failed)", ) break return results def emit_markdown( - tx_dut: Dut, - rx_dut: Dut, - channel: int, - duration: float, - threshold: int, + tx_dut: Dut, rx_dut: Dut, channel: int, duration: float, + threshold: int, kh: KernelHost, results: dict[tuple[str, str], CellResult], ) -> str: out = [] @@ -640,6 +853,8 @@ def emit_markdown( f"{time.strftime('%Y-%m-%d %H:%M:%S')}\n") out.append(f"- TX adapter: `{tx_dut.vidpid}` ({tx_dut.chipset})") out.append(f"- RX adapter: `{rx_dut.vidpid}` ({rx_dut.chipset})") + out.append(f"- Kernel host: " + f"{'VM ' + kh.vm_name + ' via ' + kh.ssh_target if kh.is_remote else 'local'}") out.append(f"- Cell duration: {duration:.0f}s") out.append(f"- Pass threshold: ≥ {threshold} hits\n") 
out.append("| | TX = devourer | TX = kernel |") @@ -648,10 +863,7 @@ def emit_markdown( cells = [] for tx_side in ("devourer", "kernel"): r = results.get((tx_side, rx_side)) - if r is None: - cells.append("—") - else: - cells.append(r.fmt(threshold)) + cells.append(r.fmt(threshold) if r else "—") out.append(f"| RX = {rx_side} | {cells[0]} | {cells[1]} |") return "\n".join(out) + "\n" @@ -666,32 +878,23 @@ def main(): description="Cross-driver regression matrix for devourer.", ) ap.add_argument( - "--devourer-root", - type=Path, + "--devourer-root", type=Path, default=Path(__file__).resolve().parent.parent, - help="repo root with build/WiFiDriverDemo + build/WiFiDriverTxDemo " - "(default: parent of this script)", + help="repo root with build/WiFiDriverDemo + build/WiFiDriverTxDemo", ) ap.add_argument( - "--channel", - type=int, - default=36, - help="Wi-Fi channel to test on (default 36 = 5GHz quiet, pick a busy " - "5GHz channel like 100 if your AP is there)", + "--channel", type=int, default=36, + help="Wi-Fi channel (default 36; pick a busy channel like 100 if your " + "AP is on it — higher hit counts mean less variance)", ) ap.add_argument( - "--duration", - type=float, - default=15.0, + "--duration", type=float, default=15.0, help="seconds each cell injects/receives (default 15)", ) ap.add_argument( - "--pass-threshold", - type=int, - default=1, + "--pass-threshold", type=int, default=1, help="min hits for a cell to pass (default 1 — generous because air " - "interference + short windows make absolute counts unreliable; " - "bump to 5-10 for higher-confidence runs on a quiet channel)", + "interference makes absolute counts unreliable)", ) ap.add_argument( "--tx-pid", @@ -702,31 +905,49 @@ def main(): help="USB PID hex of RX adapter (default: second auto-detected DUT)", ) ap.add_argument( - "--keep-logs", - action="store_true", + "--keep-logs", action="store_true", help="don't delete the per-cell log files after the run", ) ap.add_argument( - 
"--no-baseline-abort", - action="store_true", - help="run all 4 cells even if kernel-kernel baseline fails (useful " - "when one chipset has no working kernel driver but devourer-only " - "cells are still worth checking)", + "--no-baseline-abort", action="store_true", + help="run all 4 cells even if kernel-kernel baseline fails", + ) + ap.add_argument( + "--vm-name", + default=os.environ.get("DEVOURER_VM_NAME", ""), + help="libvirt domain to run kernel cells in (env: DEVOURER_VM_NAME). " + "If unset, kernel cells run locally on the host.", + ) + ap.add_argument( + "--vm-ssh", + default=os.environ.get("DEVOURER_VM_SSH", ""), + help="ssh target (user@host) for the VM (env: DEVOURER_VM_SSH). " + "Required if --vm-name is set.", ) args = ap.parse_args() - preflight(args.devourer_root) + if args.vm_name and not args.vm_ssh: + sys.stderr.write("--vm-name requires --vm-ssh\n") + sys.exit(2) + if args.vm_ssh and not args.vm_name: + sys.stderr.write("--vm-ssh requires --vm-name\n") + sys.exit(2) + + kh = (KernelHost.via_ssh(args.vm_ssh, args.vm_name) + if args.vm_name else KernelHost.local()) + + preflight(args.devourer_root, kh) duts = discover_duts() if len(duts) < 2: sys.stderr.write( - f"Need at least 2 supported DUTs plugged in, found {len(duts)}:\n" + f"Need at least 2 supported DUTs plugged in on host, found " + f"{len(duts)}:\n" ) for d in duts: sys.stderr.write(f" - {d.vidpid} ({d.chipset}) at {d.sysfs_id}\n") - sys.stderr.write( - "Plug another compatible adapter or extend SUPPORTED_DUTS table.\n" - ) + sys.stderr.write("Plug another compatible adapter or extend " + "SUPPORTED_DUTS table.\n") sys.exit(2) def pick(pid_arg, default_idx): @@ -746,25 +967,29 @@ def pick(pid_arg, default_idx): print(f"TX: {tx_dut.vidpid} ({tx_dut.chipset}) at {tx_dut.sysfs_id}") print(f"RX: {rx_dut.vidpid} ({rx_dut.chipset}) at {rx_dut.sysfs_id}") + print(f"Kernel host: " + f"{'VM ' + kh.vm_name + ' (' + kh.ssh_target + ')' if kh.is_remote else 'local'}") print(f"Channel: 
{args.channel} Duration/cell: {args.duration}s " f"Pass threshold: ≥{args.pass_threshold} hits\n") + # Clean baseline: pull any leftover DUTs back from the VM so we start + # the matrix with both DUTs on the host. + kh.release_all_known_duts([tx_dut, rx_dut]) + with tempfile.TemporaryDirectory(prefix="devourer-regress-") as td: tmpdir = Path(td) results = run_matrix( devourer_root=args.devourer_root, - tx_dut=tx_dut, - rx_dut=rx_dut, - channel=args.channel, - duration=args.duration, + tx_dut=tx_dut, rx_dut=rx_dut, + channel=args.channel, duration=args.duration, threshold=args.pass_threshold, - tmpdir=tmpdir, + tmpdir=tmpdir, kh=kh, abort_on_baseline_fail=not args.no_baseline_abort, ) print() md = emit_markdown( tx_dut, rx_dut, args.channel, args.duration, - args.pass_threshold, results, + args.pass_threshold, kh, results, ) print(md) if args.keep_logs: @@ -773,8 +998,6 @@ def pick(pid_arg, default_idx): kept.unlink() kept.symlink_to(tmpdir) print(f"(logs kept at {kept} — symlink, valid until next run)") - # Detach from cleanup by exiting before TemporaryDirectory wipes it. - # NOTE: this is sticky; the symlink will dangle next run. os._exit(0) diff --git a/tests/setup_vm.sh b/tests/setup_vm.sh new file mode 100755 index 0000000..d01f58c --- /dev/null +++ b/tests/setup_vm.sh @@ -0,0 +1,189 @@ +#!/usr/bin/env bash +# Provision a libvirt VM for kernel-side regression testing. +# +# Why a VM: aircrack-ng/rtl8812au (the out-of-tree driver with the best +# 8812/8814/8821 chipset coverage) lags newer kernels by 6-12 months. As of +# kernel 6.15+ the OOT driver hits API breakages (timer_*, cfg80211 callback +# signatures, etc.) that require ongoing patching. Putting the kernel-side +# tests in a pinned-kernel VM (Ubuntu 22.04 LTS = kernel 5.15) lets the host +# upgrade freely without breaking the test rig. 
+# +# Prerequisites on the host: +# - libvirtd + virsh +# - virt-install +# - xorriso (for cloud-init seed ISO) +# - jammy-base.qcow2 cloud image at /var/lib/libvirt/images/ (or set BASE) +# - SSH key for the calling user (defaults to ~/.ssh/id_rsa.pub) +# +# Usage: +# tests/setup_vm.sh # full provision (creates VM + waits for SSH) +# tests/setup_vm.sh --teardown # destroy + undefine the VM +# +# After provisioning: +# tests/setup_vm.sh --status # show VM IP, ssh hint, USB passthrough state + +set -euo pipefail + +VM_NAME="${VM_NAME:-devourer-testrig}" +VM_RAM_MB="${VM_RAM_MB:-2048}" +VM_VCPUS="${VM_VCPUS:-2}" +VM_DISK_GB="${VM_DISK_GB:-20}" +BASE_IMAGE="${BASE_IMAGE:-/var/lib/libvirt/images/jammy-base.qcow2}" +LIBVIRT_IMAGES="${LIBVIRT_IMAGES:-/var/lib/libvirt/images}" +SSH_PUBKEY="${SSH_PUBKEY:-$HOME/.ssh/id_rsa.pub}" +WORK_DIR="${WORK_DIR:-$HOME/devourer-testrig-setup}" + +cmd="${1:-provision}" + +vm_ip() { + sudo virsh domifaddr "$VM_NAME" 2>/dev/null \ + | awk '/ipv4/ {print $4}' | cut -d/ -f1 | head -1 +} + +case "$cmd" in + --teardown|teardown) + sudo virsh destroy "$VM_NAME" 2>/dev/null || true + sudo virsh undefine "$VM_NAME" --remove-all-storage 2>/dev/null || true + echo "destroyed VM $VM_NAME" + exit 0 + ;; + --status|status) + state=$(sudo virsh domstate "$VM_NAME" 2>/dev/null || echo "missing") + echo "VM: $VM_NAME ($state)" + ip=$(vm_ip) + echo "IP: ${ip:-(none — DHCP not assigned)}" + if [ -n "${ip:-}" ]; then + echo "SSH: ssh dima@$ip" + fi + echo "USB passthrough (current):" + sudo virsh dumpxml "$VM_NAME" 2>/dev/null \ + | grep -A2 '&2 + exit 1 +fi + +if [ ! -f "$BASE_IMAGE" ]; then + echo "BASE_IMAGE not found at $BASE_IMAGE" >&2 + echo "Download an Ubuntu 22.04 cloud image:" >&2 + echo " wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img" >&2 + echo " sudo mv jammy-server-cloudimg-amd64.img $BASE_IMAGE" >&2 + exit 1 +fi + +if sudo virsh dominfo "$VM_NAME" >/dev/null 2>&1; then + echo "VM $VM_NAME already exists. 
Use --teardown first, or set VM_NAME=other-name." >&2 + exit 1 +fi + +mkdir -p "$WORK_DIR" +cd "$WORK_DIR" + +cat > user-data < /etc/motd +EOF + +cat > meta-data </dev/null +sudo cp seed.iso "$LIBVIRT_IMAGES/${VM_NAME}-seed.iso" +sudo chown libvirt-qemu:libvirt-qemu "$LIBVIRT_IMAGES/${VM_NAME}-seed.iso" + +sudo qemu-img create -f qcow2 -F qcow2 \ + -b "$BASE_IMAGE" \ + "$LIBVIRT_IMAGES/${VM_NAME}.qcow2" "${VM_DISK_GB}G" >/dev/null +sudo chown libvirt-qemu:libvirt-qemu "$LIBVIRT_IMAGES/${VM_NAME}.qcow2" + +sudo virt-install \ + --name "$VM_NAME" \ + --memory "$VM_RAM_MB" \ + --vcpus "$VM_VCPUS" \ + --disk "path=$LIBVIRT_IMAGES/${VM_NAME}.qcow2,format=qcow2,bus=virtio" \ + --disk "path=$LIBVIRT_IMAGES/${VM_NAME}-seed.iso,device=cdrom" \ + --os-variant ubuntu22.04 \ + --network network=default,model=virtio \ + --controller usb,model=qemu-xhci,index=0 \ + --graphics none \ + --noautoconsole \ + --import >/dev/null + +echo "VM created: $VM_NAME" +echo "waiting for DHCP lease..." +for i in $(seq 1 30); do + ip=$(vm_ip) + if [ -n "$ip" ]; then + echo "got IP: $ip" + break + fi + sleep 3 +done + +if [ -z "${ip:-}" ]; then + echo "no IP after 90s; check 'virsh console $VM_NAME'" >&2 + exit 1 +fi + +echo "waiting for cloud-init to finish (installs aircrack-ng driver, ~5-10 min)..." +ssh -o StrictHostKeyChecking=accept-new -o ConnectTimeout=5 \ + -o UserKnownHostsFile=/dev/null \ + dima@"$ip" "cloud-init status --wait" 2>&1 | tail -3 + +echo +echo "=== VM ready ===" +echo "ssh dima@$ip" +echo +echo "Verify aircrack-ng driver:" +echo " ssh dima@$ip 'sudo modprobe 88XXau && lsmod | grep 88XXau'" +echo +echo "Hot-plug a DUT into the VM (example for 8814AU):" +echo " cat > /tmp/usb-8814.xml << 'XML'" +echo " " +echo " " +echo " " +echo " " +echo " " +echo " " +echo " XML" +echo " sudo virsh attach-device $VM_NAME /tmp/usb-8814.xml --live" +echo +echo "Teardown: $0 --teardown"
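
The hot-plug hint printed at the end of setup_vm.sh and the take_dut / release_dut steps that regress.py delegates to KernelHost both come down to handing virsh a <hostdev> description of the adapter. Below is a minimal Python sketch of that step, assuming the standard libvirt USB <hostdev> element matched by vendor/product ID. The helper names `usb_hostdev_xml` and `virsh_hotplug_cmd` are hypothetical and not part of this patch, the XML path is a placeholder, and the 0bda:8812 IDs are the example values from the Dut field comments. The sketch only builds the XML and the command line; it does not run virsh.

```python
import xml.etree.ElementTree as ET


def usb_hostdev_xml(vid: str, pid: str) -> str:
    """Build a libvirt <hostdev> element that matches a USB device by
    vendor/product ID.

    vid/pid are lowercase hex without the 0x prefix, the same convention
    the Dut dataclass uses. (Hypothetical helper, for illustration only.)
    """
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "usb", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "vendor", {"id": f"0x{vid}"})
    ET.SubElement(source, "product", {"id": f"0x{pid}"})
    return ET.tostring(hostdev, encoding="unicode")


def virsh_hotplug_cmd(vm_name: str, xml_path: str, attach: bool) -> list[str]:
    """Return the virsh argv that attaches the device described in xml_path
    to a running domain, or detaches it again so it returns to the host.
    (Hypothetical helper; the caller decides how to execute it.)"""
    verb = "attach-device" if attach else "detach-device"
    return ["sudo", "virsh", verb, vm_name, xml_path, "--live"]


if __name__ == "__main__":
    # Example values only: 0bda:8812 and the default VM_NAME from setup_vm.sh.
    print(usb_hostdev_xml("0bda", "8812"))
    print(" ".join(
        virsh_hotplug_cmd("devourer-testrig", "/tmp/usb-dut.xml", attach=True)
    ))
```

Matching by vendor/product ID rather than bus/device address keeps the description valid across re-enumeration, at the cost of being ambiguous when two identical adapters are plugged in.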