RTL8814AU: drop REG_CR=0 post-fwdl write that wedges bulk-OUT #49
Merged
FirmwareDownload_8814A's post-fwdl CPU kick zeroes REG_CR (0x0100) just after MCUFWDL=0x79. This clears all 8 enable bits in byte 0 (HCI TX DMA, HCI RX DMA, TXDMA, RXDMA, PROTOCOL, SCHEDULE, MACTXEN, MACRXEN). The later `REG_CR |= MACTXEN|MACRXEN` at HalModule.cpp:241 only re-sets bits 6+7 and leaves the DMA-enable bits 0..5 at zero, so the chip's TX/RX DMA engines never come up. Bulk-OUT URBs queue at EP 0x02, but the FIFO never drains. The URBs sit until libusb's 500 ms async timeout cancels them (-ENOENT), which produces the catastrophic submit-failure pattern reported in #36.

Kernel rtw88_8814au never writes REG_CR=0 during post-fwdl. The "byte-for-byte rtw88-mirror" comment block above this code was wrong about this specific address.

Bisected today by gating the 7 divergent post-fwdl writes individually behind env vars; only 0x0100 reproduces the wedge.

Verification:
- Local devourer-TX 12 s on 8814AU: 2203/2203 OK (was 0.4% completion)
- 8812AU + 8821AU sanity: unchanged (different code path)
- tests/regress.py --full-matrix: 8814 devourer-TX cells [2,4,6,8] now show no fail annotation (was 4700+ failures each)

The fix is sufficient for #36 but does not restore 8814AU on-air emission: the chips ACK URBs cleanly, but no frames hit air. That is a separate gate (TX descriptor or rate config) and out of scope here.

Closes #36.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
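The bit arithmetic behind the wedge can be made concrete with a small Python model of REG_CR byte 0. Devourer itself is C++; this model is not its code. It makes two assumptions. First, the enable bits occupy bits 0..7 in the order the commit message lists them, with bits 0..5 as the DMA/protocol/scheduler enables and bits 6 and 7 as MACTXEN/MACRXEN. Second, those six low enables were already set before the post-fwdl kick. `cr_after_post_fwdl` and `dma_engines_up` are hypothetical helpers.

```python
# REG_CR (0x0100) byte 0. Assumption: bit order follows the order of the
# names in the commit message (bits 0..5 = DMA/protocol/scheduler, 6..7 = MAC).
HCI_TXDMA_EN = 1 << 0
HCI_RXDMA_EN = 1 << 1
TXDMA_EN = 1 << 2
RXDMA_EN = 1 << 3
PROTOCOL_EN = 1 << 4
SCHEDULE_EN = 1 << 5
MACTXEN = 1 << 6
MACRXEN = 1 << 7

# Bits 0..5: the enables that the later 2-bit OR never restores.
DMA_ENABLE_MASK = (HCI_TXDMA_EN | HCI_RXDMA_EN | TXDMA_EN | RXDMA_EN
                   | PROTOCOL_EN | SCHEDULE_EN)


def cr_after_post_fwdl(cr_before: int, zero_cr: bool) -> int:
    """Return byte 0 of REG_CR after the post-fwdl sequence.

    zero_cr=True models the `REG_CR = 0` write this PR removes; the later
    `REG_CR |= MACTXEN | MACRXEN` is applied in both cases.
    """
    cr = cr_before & 0xFF
    if zero_cr:
        cr = 0x00                # the dropped write clears all 8 enables
    cr |= MACTXEN | MACRXEN      # only bits 6 and 7 come back
    return cr


def dma_engines_up(cr: int) -> bool:
    """True when every one of the bit 0..5 enables is set."""
    return (cr & DMA_ENABLE_MASK) == DMA_ENABLE_MASK


before = 0x3F  # assumption: all six low enables set before the CPU kick
for zero_cr in (True, False):
    cr = cr_after_post_fwdl(before, zero_cr)
    print(f"zero_cr={zero_cr}: REG_CR byte 0 = {cr:#04x}, DMA up = {dma_engines_up(cr)}")
# zero_cr=True: REG_CR byte 0 = 0xc0, DMA up = False
# zero_cr=False: REG_CR byte 0 = 0xff, DMA up = True
```

The point of the model is that `|=` can only add bits. Once the zeroing write has run, no later 2-bit OR can bring the bit 0..5 enables back.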
This was referenced May 26, 2026.

josephnef added a commit that referenced this pull request on May 26, 2026:
In RtlJaguarDevice::send_packet the SET_TX_DESC_*_8812 macros are
bit-identical to the SET_TX_DESC_*_8814A macros (verified against
hal/rtl8814a_xmit.h), so devourer can keep using the 8812 macro set
on 8814A. But a usbmon byte-diff against a working VM-passthrough
88XXau monitor-injection session (qemu USB-host-passthrough → VM
kernel 88XXau → bulk-OUT URBs back through host xhci) shows three
field-value mismatches on 8814A:
- Dword 0 bit 31: 8812 calls it OWN, 8814A calls it DISQSELSEQ.
  88XXau leaves bit 31 = 0 for monitor-injected frames; devourer's
  SET_TX_DESC_OWN_8812(..., 1) sets it to 1, which on 8814A means
  DISQSELSEQ=1 (disable queue-select-based sequence numbering).
- Dword 2 bits 24-29 (GID): 88XXau leaves it at 0 for injection;
  devourer writes 0x3F.
- Dword 4 bits 18-23 (DATA_RETRY_LIMIT): 88XXau leaves it at 0 for
  injection; devourer writes 12 (RETRY_LIMIT_ENABLE stays 1 in both).
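The usbmon byte-diff can be reproduced offline: unpack both captured descriptors into 32-bit dwords and XOR them. This sketch makes three assumptions. The descriptors are sequences of little-endian dwords. The descriptor is 40 bytes. The two descriptors are synthetic (all-zero except the three fields listed above), not real captures. `dword_diffs` and `bit_ranges` are hypothetical helpers. An XOR diff reports only the bits that actually differ, so DATA_RETRY_LIMIT=12 shows up as bits 20-21 rather than the whole 18-23 field.

```python
import struct


def dword_diffs(desc_a: bytes, desc_b: bytes):
    """Yield (dword_index, xor_mask) for each 32-bit dword that differs.

    Assumes both descriptors are sequences of little-endian dwords.
    """
    if len(desc_a) != len(desc_b) or len(desc_a) % 4:
        raise ValueError("descriptors must be equal length, multiple of 4 bytes")
    count = len(desc_a) // 4
    words_a = struct.unpack(f"<{count}I", desc_a)
    words_b = struct.unpack(f"<{count}I", desc_b)
    for index, (a, b) in enumerate(zip(words_a, words_b)):
        if a != b:
            yield index, a ^ b


def bit_ranges(mask: int):
    """Split a 32-bit mask into contiguous (low_bit, high_bit) runs."""
    runs, bit = [], 0
    while bit < 32:
        if (mask >> bit) & 1:
            low = bit
            while bit < 32 and (mask >> bit) & 1:
                bit += 1
            runs.append((low, bit - 1))
        else:
            bit += 1
    return runs


# Synthetic example: 10 dwords (40 bytes, assumed size), all zero in the
# reference; the "devourer" copy sets only the three mismatched fields.
reference = bytes(40)
words = [0] * 10
words[0] |= 1 << 31       # dword 0 bit 31: OWN (8812) / DISQSELSEQ (8814A)
words[2] |= 0x3F << 24    # dword 2 bits 24-29: GID = 0x3F
words[4] |= 12 << 18      # dword 4 bits 18-23: DATA_RETRY_LIMIT = 12
devourer = struct.pack("<10I", *words)

for index, mask in dword_diffs(reference, devourer):
    print(index, bit_ranges(mask))
# 0 [(31, 31)]
# 2 [(24, 29)]
# 4 [(20, 21)]
```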
Skip those writes on 8814A so the emitted descriptor byte-matches
aircrack-ng's reference monitor-injection format. Add a
DEVOURER_TX_LEGACY_8812_DESC=1 env-gate to restore the old behaviour
without rebuilding, in case anything downstream depends on it.
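The per-chip gating described above can be modeled as follows. This is a Python sketch, not devourer's C++ `send_packet`. `set_field` and `fill_8812_style_fields` are hypothetical stand-ins for the SET_TX_DESC_*_8812 macros. The field positions come from the three mismatches listed above. The sketch assumes the env-gate is active only when `DEVOURER_TX_LEGACY_8812_DESC` equals `"1"`, and that the descriptor is 40 bytes. RETRY_LIMIT_ENABLE is not modeled because its bit position is not given here.

```python
import os
import struct


def set_field(desc: bytearray, dword: int, low_bit: int, width: int, value: int) -> None:
    """Write `value` into bits [low_bit, low_bit + width) of a little-endian dword."""
    offset = dword * 4
    (word,) = struct.unpack_from("<I", desc, offset)
    mask = ((1 << width) - 1) << low_bit
    word = (word & ~mask & 0xFFFFFFFF) | ((value << low_bit) & mask)
    struct.pack_into("<I", desc, offset, word)


def legacy_desc_requested(env=os.environ) -> bool:
    # Assumption: only the exact value "1" enables the legacy behaviour.
    return env.get("DEVOURER_TX_LEGACY_8812_DESC") == "1"


def fill_8812_style_fields(desc: bytearray, is_8814a: bool, env=os.environ) -> None:
    """Apply the three 8812-style writes unless on 8814A without the legacy gate."""
    if is_8814a and not legacy_desc_requested(env):
        return  # leave all three fields at 0, matching the 88XXau reference
    set_field(desc, 0, 31, 1, 1)     # OWN on 8812, DISQSELSEQ on 8814A
    set_field(desc, 2, 24, 6, 0x3F)  # GID
    set_field(desc, 4, 18, 6, 12)    # DATA_RETRY_LIMIT


desc_8814a = bytearray(40)  # 40-byte descriptor size assumed for illustration
fill_8812_style_fields(desc_8814a, is_8814a=True, env={})
print(desc_8814a == bytearray(40))  # True: nothing written on 8814A by default

desc_8812 = bytearray(40)
fill_8812_style_fields(desc_8812, is_8814a=False, env={})
print(hex(struct.unpack_from("<I", desc_8812, 0)[0]))  # 0x80000000
```

Passing `env` explicitly keeps the gate testable without mutating the real process environment.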
This does NOT resolve #50 (8814AU on-air silence has a separate root
cause that vendor-control-write replay cannot reach — both sessions on
2026-05-26 ruled out 9 distinct hypotheses including a binary
URB-flag diff, see comment-4546974748). The change is purely about
descriptor correctness — aligning devourer's TX descriptor format
with the byte-level reference that the working kernel driver produces.
8812AU and 8821AU paths are bit-for-bit identical to current master
(is_8814a is false there and all writes fire as before). Smoke-tested
on the live bench:
8812AU: 760 submits / 760 complete / 0 fail
8814AU (new): 3572 submits / 3572 complete / 0 fail (vs current
master's behaviour, which is identical at libusb level
because devourer's descriptor differences from 88XXau
are no-ops at the bulk-OUT path post-PR-#49)
8814AU (DEVOURER_TX_LEGACY_8812_DESC=1): same as without env
Refs #50 (partial — descriptor alignment only, not the on-air gate).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
josephnef added a commit that referenced this pull request on May 30, 2026:
…#58)

## Summary

`tests/regress.py --channel` defaulted to `36` (5GHz UNII-1), and every matrix invocation in the `README.md` + `CLAUDE.md` examples used `--channel 100` (5GHz UNII-2-extended). This hid a long-standing fact: **devourer's 5GHz code path has broken cells for 8814 RX, 8821 TX, and 8821 RX that all pass at 2.4GHz**. The "RTL8814AU... RX solid" line in `CLAUDE.md` was correct AT 2.4GHz but appeared to contradict matrix output captured at 5GHz. That is why PR bodies #34, #42, and #49 all record "8814 RX devourer still broken", even though those cells work fine at ch6.

## What this changes

- `tests/regress.py`: default `--channel` → `6`. The help text spells out that 5GHz has known broken cells.
- `tests/README.md`: example invocations drop the explicit `--channel 100`. Adds a "Channel / band asymmetry" entry to Known Limitations.
- `CLAUDE.md`: regress.py examples drop `--channel 100`. Adds a paragraph explaining the band asymmetry.

## What this does NOT change

- The actual 5GHz code-path issues. These need a separate investigation; a follow-up PR will tackle 8814 RX at 5G and 8821 TX/RX at 5G.
- The persistent 8814AU TX gate: 0 hits at both bands; unchanged.
- The 8812AU code paths, which work at both bands.

## Empirical evidence: single-pair matrix at both bands

Master `9e5287e` post-PR-57, VM mode (`devourer-testrig` + `aircrack-ng/88XXau`), 12s per cell, `--no-baseline-abort`.
### TX=8812, RX=8814

| cell | ch100 | ch6 |
|---|---|---|
| kernel TX → kernel RX (baseline) | 292 ✓ | 339 ✓ |
| devourer TX → kernel RX | 4839 ✓ | 5279 ✓ |
| **kernel TX → devourer RX** | **0 ✗** | **300 ✓** |
| **devourer TX → devourer RX** | **0 ✗** | **5500 ✓** |

### TX=8821, RX=8812

| cell | ch100 | ch6 |
|---|---|---|
| kernel TX → kernel RX (baseline) | 108 ✓ | 336 ✓ |
| **devourer TX → kernel RX** | **0 ✗** | **5544 ✓** |
| kernel TX → devourer RX | 100 ✓ | 300 ✓ |
| **devourer TX → devourer RX** | **0 ✗ (105 fail)** | **5500 ✓** |

### TX=8812, RX=8821

| cell | ch100 (extrapolated from full-matrix) | ch6 |
|---|---|---|
| kernel TX → kernel RX (baseline) | 348 ✓ | 345 ✓ |
| devourer TX → kernel RX | 5517 ✓ | 5279 ✓ |
| **kernel TX → devourer RX** | **0 ✗** | **300 ✓** |
| **devourer TX → devourer RX** | **0 ✗** | **5200 ✓** |

### TX=8814, RX=anything (8814 TX gate, broken on both bands)

`0 hits` at both ch100 and ch6 for every cell where devourer TX is on 8814AU. This is a pre-existing gate, not addressed here. See kaeru cite `RTL8814AU libusb-userspace bulk-OUT does not produce on-air TX`.

## Why ch6 as default

- The OpenIPC long-range-video use case typically runs at 2.4GHz.
- Out-of-the-box matrix runs should pass for the chips that work; otherwise contributors get false-failure noise.
- The 5GHz issues are real but separate; the new help text + Known Limitations entry tell users how to surface them deliberately.
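A harness could pair the new default with band awareness along these lines. The sketch is written in Python, like `tests/regress.py`, but it is not that script's code. `band_for_channel`, `build_parser`, and the help wording are hypothetical. The channel-to-sub-band mapping uses the standard 2.4GHz and 5GHz U-NII 20 MHz channel ranges, labeled with this PR's naming.

```python
import argparse

# Cells reported broken at 5GHz in the Summary above.
KNOWN_5GHZ_BROKEN = "8814 RX, 8821 TX, 8821 RX"


def band_for_channel(channel: int) -> str:
    """Map a 20 MHz channel number to its band / U-NII sub-band."""
    if 1 <= channel <= 14:
        return "2.4GHz"
    if 36 <= channel <= 48:
        return "5GHz UNII-1"
    if 52 <= channel <= 64:
        return "5GHz UNII-2A"
    if 100 <= channel <= 144:
        return "5GHz UNII-2-extended"
    if 149 <= channel <= 165:
        return "5GHz UNII-3"
    raise ValueError(f"unsupported channel {channel}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="regress.py")
    parser.add_argument(
        "--channel", type=int, default=6,
        help="channel for the matrix (default 6, 2.4GHz); 5GHz channels have "
             f"known broken cells: {KNOWN_5GHZ_BROKEN}")
    return parser


for argv in ([], ["--channel", "100"]):
    channel = build_parser().parse_args(argv).channel
    print(channel, band_for_channel(channel))
# 6 2.4GHz
# 100 5GHz UNII-2-extended
```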
## Test plan

- [x] `python3 -c 'import tests.regress'` clean import
- [x] `python3 tests/regress.py --help` renders the new help text
- [x] Single-pair matrix at `--channel 6` runs end-to-end and passes for 8812/8821 chip combos (tables above)
- [x] Single-pair matrix at `--channel 100` reproduces the historical 5GHz broken cells (tables above)
- [x] `--full-matrix --channel 100` matches the prior PR bodies' tables (confirms the change doesn't alter 5GHz behavior; it only flips the default)

## Follow-up

A separate PR will investigate why devourer's 5GHz path is broken for 8814 RX / 8821 TX / 8821 RX. The likely cause is a band-switch register sequence missing somewhere in `RadioManagementModule::PHY_SwitchWirelessBand8812` or in the per-channel BB setup. Saved as kaeru cite `devourer 5GHz vs 2.4GHz cell asymmetry — matrix --channel 100 default hides working 2.4G state` for the next session.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `FirmwareDownload_8814A` was writing `REG_CR (0x0100) = 0` immediately after `MCUFWDL=0x79`. This clears all 8 enable bits in byte 0, including the DMA-enable bits (0..5).
- `REG_CR |= MACTXEN | MACRXEN` at `HalModule.cpp:241` is a 2-bit OR: it sets bits 6+7 but leaves bits 0..5 at zero. As a result, the chip's TX/RX DMA engines never come up. Bulk-OUT URBs queue at EP `0x02`, but the FIFO has no drain path. The URBs sit at the chip until libusb's 500 ms async timeout cancels them (-ENOENT), giving the catastrophic submit-failure pattern reported in #36 ("RTL8814AU: devourer TX degrades to LIBUSB_ERROR_IO after USB passthrough cycles").
- `rtw88_8814au` never writes `REG_CR=0` during post-fwdl. The "byte-for-byte rtw88-mirror" comment block above this code is wrong on this specific address.
- Bisected by gating the 7 divergent post-fwdl writes (`0x010d`, `0x0100`, `0x1330`, `0x0230`, `0x022c`, `REG_BCN_CTRL`, `0x0210`) individually behind env vars; only `0x0100` reproduces the wedge.

Scope
- Closes #36 (`LIBUSB_TRANSFER_TIMED_OUT` submit failures on devourer-TX 8814AU after USB cycling).
- `RTL8814AU devourer-RX` in the matrix is also still broken (cells 11/12/19/20/23/24 = 0 hits); this is pre-existing and unrelated.

Test plan
- `WiFiDriverTxDemo` 12 s on `0bda:8813`: 2203/2203 OK, 0 fail (was 815 submits / 575 fail = 0.4% completion on master).
- RTL8812AU `WiFiDriverTxDemo` sanity: 796/796/0, unchanged (different code path).
- RTL8821AU `WiFiDriverTxDemo` sanity: 991/991/0, unchanged (different code path).
- `sudo python3 tests/regress.py --full-matrix --channel 100 --vm-name devourer-testrig --vm-ssh josephnef@...` (the original repro of #36, "RTL8814AU: devourer TX degrades to LIBUSB_ERROR_IO after USB passthrough cycles"): 8814 devourer-TX cells `[2,4,6,8]` now show `0 hits / 4500 TX` with no `(N fail)` annotation, indicating `tx_failures == 0` per `regress.py:494-495`. Before the fix, each cell showed `(4700+ fail)`. 8812/8821 devourer-TX cells are unchanged (5927–6884 hits, identical to pre-fix).

🤖 Generated with Claude Code
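The bisection in the Summary gated the 7 divergent post-fwdl writes individually behind env vars. A Python sketch of that technique follows; it is not devourer's C++. The `DEVOURER_SKIP_<name>` variable scheme, `gated_write`, and `bisect_plan` are hypothetical. `REG_BCN_CTRL` stays symbolic because its numeric address is not given here. The plan shown skips exactly one write per run and looks for the run whose outcome changes. That is one possible way to apply the gates, not necessarily the one used.

```python
import os

# The 7 divergent post-fwdl writes named in the Summary.
DIVERGENT_WRITES = ["0x010d", "0x0100", "0x1330", "0x0230", "0x022c",
                    "REG_BCN_CTRL", "0x0210"]


def skip_var(name: str) -> str:
    """Hypothetical env var that disables one write, e.g. DEVOURER_SKIP_0X0100."""
    return "DEVOURER_SKIP_" + name.upper()


def gated_write(name: str, do_write, env=os.environ) -> bool:
    """Run do_write() unless its skip variable is "1"; return whether it ran."""
    if env.get(skip_var(name)) == "1":
        return False
    do_write()
    return True


def bisect_plan():
    """One environment per run, each skipping exactly one of the writes."""
    return [{skip_var(name): "1"} for name in DIVERGENT_WRITES]


# Replay the sequence with only the REG_CR (0x0100) write disabled.
performed = []
env = {skip_var("0x0100"): "1"}
for name in DIVERGENT_WRITES:
    gated_write(name, lambda n=name: performed.append(n), env)
print(performed)
# ['0x010d', '0x1330', '0x0230', '0x022c', 'REG_BCN_CTRL', '0x0210']
```

Gating every write through one helper keeps the production write sequence unchanged whenever no skip variable is set.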