Tracking outstanding items from ISSUES.md that have been investigated and scoped but not yet implemented. Each section is independently actionable.
1. Remove client-side firmware ledger; make device-side verify-flash authoritative
Severity: CORRECTNESS — the disk ledger can drift from reality.
The current crates/fbuild-deploy/src/firmware_ledger.rs records SHA256 hashes of firmware.bin / bootloader.bin / partitions.bin after each successful deploy and consults that record on the next deploy to short-circuit the flash. This is fast (no device round-trip) but wrong when anything else writes to flash between fbuild deploys:
- Manual esptool write-flash from a terminal
- Arduino IDE flash
- OTA update
- Another fbuild instance on a different machine using the same device
In any of those cases the ledger says "current" but the device runs a different image, and fbuild silently skips a needed deploy.
Fix: Delete the ledger entirely. The device-side verify-flash pre-check (already wired into the daemon's deploy handler; see ISSUES.md "Performance: Fast deploy via verify-then-skip") becomes the sole authoritative pre-check. It uses esptool's FLASH_MD5SUM stub command, so the device itself reports whether each region's MD5 matches the local image. Measured cost: ~6 s for a 2.4 MB ESP32-S3 image; the 76% speedup over a full re-flash is preserved.
Concrete steps:
- Delete crates/fbuild-deploy/src/firmware_ledger.rs
- Remove the firmware_ledger field from DaemonContext (crates/fbuild-daemon/src/context.rs)
- Remove the ledger pre-check and post-deploy record_deployment call from crates/fbuild-daemon/src/handlers/operations.rs
- Remove the compute_boot_parts_hashes helper added in Issue B4 (no longer needed, since verify-flash covers all 3 regions)
- Update fbuild-deploy/src/lib.rs to drop the firmware_ledger re-export
- Mark Issue B4 in ISSUES.md as superseded by device-side verify
Side effect: the deploy handler no longer needs to compute SHA256s of source files / build flags / boot+parts artefacts. That work disappears entirely. The verify-flash call (~6 s when device matches) replaces the ledger-skip path (~0 s when ledger says match).
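The verify-then-skip decision can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the function names are hypothetical, and it assumes the per-region MD5 digests reported by the device have already been parsed into a dict keyed by region name.

```python
import hashlib


def regions_needing_flash(local_images: dict[str, bytes],
                          device_md5: dict[str, str]) -> list[str]:
    """Return the names of regions whose local MD5 differs from the device's."""
    stale = []
    for name, data in local_images.items():
        local_digest = hashlib.md5(data).hexdigest()
        # A region the device did not report on is treated as a mismatch.
        if device_md5.get(name) != local_digest:
            stale.append(name)
    return stale


def should_skip_deploy(local_images: dict[str, bytes],
                       device_md5: dict[str, str]) -> bool:
    """Skip the flash only when every region matches what is on the device."""
    return not regions_needing_flash(local_images, device_md5)
```

Because the comparison is against what the device reports rather than against a record on disk, an out-of-band write (esptool, Arduino IDE, OTA) shows up as a mismatch instead of being silently missed.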
2. Structured error returns from send_op in fbuild-python (Issue F follow-up)
Severity: USABILITY — Python callers can't branch on failure modes.
send_op in crates/fbuild-python/src/lib.rs currently returns a bare bool and prints [fbuild] operation failed: ... / [fbuild] stderr: ... to stderr. Python consumers (FastLED, autoresearch) can only check "did it succeed?"; they cannot distinguish "port not found" from "build failed" from "timeout" programmatically.
Fix: Either return a result struct (OperationResult { success, message, exit_code, stdout, stderr }) or raise a typed Python exception (FbuildDeployError, FbuildBuildError, FbuildPortError). Breaking API change — schedule for the next minor version bump.
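To show what branching on failure modes could look like from the Python side, here is a pure-Python sketch of the typed-exception option. The three exception names and the OperationResult fields come from the proposal above; the FbuildError base class, the default field values, and the deploy_on_first_working_port helper are assumptions made only for illustration, and the real types would be defined by the Rust bindings.

```python
from dataclasses import dataclass


@dataclass
class OperationResult:
    success: bool
    message: str
    exit_code: int
    stdout: str = ""
    stderr: str = ""


class FbuildError(Exception):
    """Hypothetical shared base class; carries the full result for inspection."""

    def __init__(self, result: OperationResult):
        super().__init__(result.message)
        self.result = result


class FbuildBuildError(FbuildError):
    pass


class FbuildDeployError(FbuildError):
    pass


class FbuildPortError(FbuildError):
    pass


def deploy_on_first_working_port(send_op, ports):
    """Call send_op(port) per port, moving on only after a port failure."""
    last_error = None
    for port in ports:
        try:
            return send_op(port)
        except FbuildPortError as err:
            last_error = err  # port missing or busy: try the next candidate
    if last_error is None:
        raise ValueError("no ports given")
    raise last_error
```

With a bare bool this retry policy is not expressible, because a build failure and a missing port look identical to the caller. Here a build or deploy error propagates immediately and only port errors trigger the fallback.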
3. Native async API in fbuild-python
Severity: USABILITY — async callers pay for thread-executor wrapping.
Today the Python bindings expose only synchronous methods. FbuildSerialAdapter._run_in_thread wraps every call in a thread executor so async callers don't block their event loop, but this generates an "asyncio" warning under some configurations and adds latency.
Fix: Add AsyncSerialMonitor, AsyncDaemon, etc. that use PyO3's pyo3-asyncio (or the newer pyo3-async-runtimes) to expose async def methods callable directly under asyncio.run(...). Existing sync API stays for compatibility.
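The difference for callers can be illustrated with the standard library alone. This sketch does not use fbuild's bindings: blocking_read_line stands in for a synchronous binding call, read_line_via_thread approximates what a thread-executor wrapper such as _run_in_thread presumably does, and read_line_native stands in for a coroutine exposed directly by the extension. All three names are hypothetical.

```python
import asyncio
import time


def blocking_read_line() -> str:
    """Stand-in for a synchronous binding call that blocks the calling thread."""
    time.sleep(0.05)
    return "hello from device"


async def read_line_via_thread() -> str:
    # Hop to a worker thread so the event loop keeps running while the
    # synchronous call blocks. Every call pays for the thread hand-off.
    return await asyncio.to_thread(blocking_read_line)


async def read_line_native() -> str:
    # Stand-in for a native coroutine: awaited directly, no worker thread.
    await asyncio.sleep(0.05)
    return "hello from device"


async def main() -> list[str]:
    # Both styles are awaitable, so callers can mix them during a migration.
    return await asyncio.gather(read_line_via_thread(), read_line_native())


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the sync API alongside the async one means existing callers that rely on the thread wrapper continue to work unchanged.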
4. B5a hardening leftovers: SO_LINGER 0 + SetConsoleCtrlHandler
Severity: ROBUSTNESS — covered by listener-level B5a fix; these close remaining edge cases.
The listener-level B5a fix (SO_EXCLUSIVEADDRUSE on Windows + bind retry + stale-PID cleanup) is in. Two deeper hooks remain deferred:
- SO_LINGER 0 on accepted client sockets: axum's accept loop currently doesn't expose per-connection socket options. After a hard kill, the dangling CLOSE_WAIT state on accepted sockets can outlive the daemon. Setting SO_LINGER 0 on every accepted socket would force an immediate RST on close instead of the FIN/CLOSE_WAIT/TIME_WAIT dance. Requires hooking into the axum accept loop or using a custom IncomingStream.
- SetConsoleCtrlHandler on Windows: tokio::signal::ctrl_c() covers CTRL_C_EVENT but not CTRL_CLOSE_EVENT (window close), CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT. The daemon currently dies without running its graceful shutdown path on those events.
Regression test exists at crates/fbuild-daemon/tests/port_recovery.rs (run with --ignored).
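For reference, this is what "SO_LINGER 0" means at the socket level, shown with Python's standard socket module rather than the daemon's Rust/axum stack. It is only a sketch of the option itself, not of the accept-loop hook. The helper names are hypothetical, and the "ii" struct layout assumes a POSIX platform where struct linger is two ints (Windows uses two 16-bit fields).

```python
import socket
import struct

# POSIX layout of struct linger: { int l_onoff; int l_linger; }
_LINGER_FMT = "ii"


def set_linger_zero(sock: socket.socket) -> None:
    """Enable SO_LINGER with a zero timeout so close() resets the connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(_LINGER_FMT, 1, 0))


def get_linger(sock: socket.socket) -> tuple[int, int]:
    """Read back (l_onoff, l_linger) for inspection."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(_LINGER_FMT))
    return struct.unpack(_LINGER_FMT, raw)
```

In the daemon the same option would have to be applied to each accepted connection, which is why a hook into the accept loop is needed.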
5. espflash native library integration (replace esptool subprocess)
Severity: PERFORMANCE — could cut verify cost from ~6 s to ~1.5 s.
The current verify-flash pre-check shells out to esptool (Python). Cost breakdown for the ~5.9 s verify of a 2.4 MB image:
| Phase | Estimated cost |
| --- | --- |
| Python interpreter startup | ~1 s |
| Subprocess spawn + esptool init | ~0.5 s |
| Connect + reset chip into bootloader | ~1 s |
| SYNC handshake + stub flasher upload | ~1 s |
| Baud rate change | ~0.5 s |
| FLASH_MD5SUM execution (3 regions, 2.4 MB) | ~1 s |
| Cleanup + reset back to app | ~0.5 s |
The actual MD5 work is ~1 s; the rest is process / Python overhead.
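The arithmetic behind that statement, using the estimates from the table, is worked through below. The per-phase figures are rough estimates, so they add up to about 5.5 s, slightly under the ~5.9 s measured end to end.

```python
# Estimated seconds per phase, copied from the cost breakdown table.
phases = {
    "Python interpreter startup": 1.0,
    "Subprocess spawn + esptool init": 0.5,
    "Connect + reset chip into bootloader": 1.0,
    "SYNC handshake + stub flasher upload": 1.0,
    "Baud rate change": 0.5,
    "FLASH_MD5SUM execution": 1.0,
    "Cleanup + reset back to app": 0.5,
}

total = sum(phases.values())
md5_work = phases["FLASH_MD5SUM execution"]
overhead = total - md5_work          # everything that is not the MD5 itself
overhead_share = overhead / total    # fraction of the verify spent on overhead

print(f"total ~{total} s, overhead ~{overhead} s ({overhead_share:.0%})")
```

On these estimates roughly four fifths of the verify time is overhead around the MD5 computation rather than the computation itself.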
Alternative: the espflash crate (4.3.0) is a Rust-native ESP32 flasher protocol implementation maintained by ESP-RS. It exposes the SLIP protocol, stub flasher loading, and FLASH_MD5SUM natively. Add espflash = { version = "4", default-features = false, features = ["serialport"] } and reuse the daemon's existing serial port lease instead of spawning a subprocess.
Estimated win: ~5.9 s → ~1.5–2 s for verify. Subsequent verifies in the same session could reuse the loaded stub flasher and drop further to <1 s.
Effort: medium. The espflash library API is documented but not as stable as the CLI. Might need an adapter layer to convert between fbuild's Esp32Deployer and espflash's Flasher type. Worth a spike.
6. Sector-selective flashing (only write regions that differ)
Severity: PERFORMANCE — minor win, but trivial to add once #1 is done.
Currently, when verify-flash reports a mismatch, the daemon falls through to write-flash which writes all three regions (bootloader + partitions + firmware), even if only one differs. For the common case "only firmware changed," we waste ~1 s rewriting bootloader.bin and partitions.bin.
Fix: parse the verify-flash output (or run three separate verify calls with --diff) to determine which regions matched, then call write-flash with only the offset/file pairs for the mismatched regions. Saves ~1 s on the typical "firmware-only" deploy.
Should be tackled after #1 (ledger removal) so the flow is: device tells us what differs → we write only what differs → device verifies the write.
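The "write only what differs" step might look like the sketch below. The region table and the write_flash_pairs name are hypothetical. The offsets are typical ESP32-S3 values used purely as placeholders; the real ones would come from the build's partition layout.

```python
# Hypothetical region layout: (name, flash offset, image file).
# Offsets are typical ESP32-S3 defaults and are an assumption here.
REGIONS = [
    ("bootloader", 0x0, "bootloader.bin"),
    ("partitions", 0x8000, "partitions.bin"),
    ("firmware", 0x10000, "firmware.bin"),
]


def write_flash_pairs(matched: dict[str, bool]) -> list[str]:
    """Build the offset/file argument pairs for regions that did not verify."""
    args: list[str] = []
    for name, offset, path in REGIONS:
        # A region with no verify result is rewritten to stay on the safe side.
        if not matched.get(name, False):
            args += [hex(offset), path]
    return args
```

For the firmware-only case, `write_flash_pairs({"bootloader": True, "partitions": True, "firmware": False})` yields just the firmware pair, and an empty list means there is nothing to write.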
Cross-references