Outstanding work: ledger removal, send_op errors, async API, B5a hardening, espflash native, sector-selective flashing

Tracking outstanding items from `ISSUES.md` that have been investigated and scoped but not yet implemented. Each section is independently actionable, although #6 is best done after #1.

## 1. Remove client-side firmware ledger; make device-side `verify-flash` authoritative

**Severity:** CORRECTNESS — the disk ledger can drift from reality.

The current `crates/fbuild-deploy/src/firmware_ledger.rs` records SHA256 hashes of `firmware.bin` / `bootloader.bin` / `partitions.bin` after each successful deploy and consults that record on the next deploy to short-circuit the flash. This is fast (no device round-trip) but **wrong** when anything else writes to flash between fbuild deploys:

- Manual `esptool write-flash` from a terminal
- Arduino IDE flash
- OTA update
- Another fbuild instance on a different machine using the same device

In any of those cases the ledger says "current" but the device runs a different image, and fbuild silently skips a needed deploy.

**Fix:** Delete the ledger entirely. The device-side `verify-flash` pre-check (already wired into the daemon's deploy handler; see ISSUES.md "Performance: Fast deploy via verify-then-skip") becomes the **sole** authoritative pre-check. It uses esptool's `FLASH_MD5SUM` stub command, so the device tells us byte-for-byte whether each region matches. Measured cost: ~6 s for a 2.4 MB ESP32-S3 image; the 76% speedup over a full re-flash is preserved.

**Concrete steps:**

1. Delete `crates/fbuild-deploy/src/firmware_ledger.rs`
2. Remove the `firmware_ledger` field from `DaemonContext` (`crates/fbuild-daemon/src/context.rs`)
3. Remove the ledger pre-check and post-deploy `record_deployment` call from `crates/fbuild-daemon/src/handlers/operations.rs`
4. Remove the `compute_boot_parts_hashes` helper added in Issue B4 (no longer needed — `verify-flash` covers all 3 regions)
5. Update `fbuild-deploy/src/lib.rs` to drop the `firmware_ledger` re-export
6. Mark Issue B4 in `ISSUES.md` as **superseded by device-side verify**

**Side effect:** the deploy handler no longer needs to compute SHA256s of source files, build flags, or boot/partition artefacts; that work disappears entirely. The trade-off is that a no-change deploy now pays for the `verify-flash` round-trip (~6 s when the device matches) instead of the ledger-skip path (~0 s when the ledger reports a match).
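A minimal sketch of the ledger-free pre-check, assuming a hypothetical `DeviceVerifier` trait that wraps the `verify-flash` call and reports one match flag per region. `Region`, `PreCheck`, `pre_check` and the `FixedVerifier` test double are illustrative names, not existing fbuild types. Treating a verify error as "must flash" is also an assumption; it is the conservative choice, since skipping on error would bring back the silent-skip bug this section removes.

```rust
/// A flash region checked by the device-side pre-check (hypothetical type).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Region {
    name: &'static str,
    offset: u32,
}

/// Hypothetical seam over "ask the device whether each region matches".
/// In the daemon this would wrap the esptool `verify-flash` call (FLASH_MD5SUM).
trait DeviceVerifier {
    /// One entry per region: true if the device contents match the local file.
    fn verify(&self, regions: &[Region]) -> Result<Vec<bool>, String>;
}

#[derive(Debug, PartialEq)]
enum PreCheck {
    /// Every region matches on the device: skip the write.
    Skip,
    /// Something differs, or the answer cannot be trusted: write.
    Flash,
}

/// Ledger-free pre-check: the device's answer is the only input.
fn pre_check(verifier: &dyn DeviceVerifier, regions: &[Region]) -> PreCheck {
    match verifier.verify(regions) {
        Ok(matches) if matches.len() == regions.len() && matches.iter().all(|&m| m) => {
            PreCheck::Skip
        }
        // A mismatch, a short answer, or a verify error all fall through to flashing.
        _ => PreCheck::Flash,
    }
}

/// Test double returning a canned answer instead of talking to hardware.
struct FixedVerifier(Result<Vec<bool>, String>);

impl DeviceVerifier for FixedVerifier {
    fn verify(&self, _regions: &[Region]) -> Result<Vec<bool>, String> {
        self.0.clone()
    }
}
```

Because nothing is persisted between deploys, a manual `esptool` write, an OTA update, or another machine's deploy is picked up on the very next pre-check.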

---

## 2. Structured error returns from `send_op` in fbuild-python (Issue F follow-up)

**Severity:** USABILITY — Python callers can't branch on failure modes.

`send_op` in `crates/fbuild-python/src/lib.rs` currently returns a bare `bool` and prints `[fbuild] operation failed: ...` / `[fbuild] stderr: ...` to stderr. Python consumers (FastLED, autoresearch) can only check "did it succeed?"; they cannot distinguish "port not found" from "build failed" from "timeout" programmatically.

**Fix:** Either return a result struct (`OperationResult { success, message, exit_code, stdout, stderr }`) or raise a typed Python exception (`FbuildDeployError`, `FbuildBuildError`, `FbuildPortError`). Breaking API change — schedule for the next minor version bump.
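A sketch of the Rust side of both options, under stated assumptions. The `OperationResult` fields follow the list above; the extra `failure` field, the `FailureKind` enum and `into_py_error` are hypothetical additions for illustration. The PyO3 glue (a `#[pyclass]` for option 1, `pyo3::create_exception!` plus `PyErr` construction for option 2) is left out so the block compiles on its own. Timeouts have no dedicated exception in the proposal, so the sketch does not invent one.

```rust
/// Hypothetical failure categories, one per proposed Python exception class.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FailureKind {
    Port,
    Build,
    Deploy,
}

impl FailureKind {
    /// Python exception class name that would be raised for this category.
    fn python_exception(self) -> &'static str {
        match self {
            FailureKind::Port => "FbuildPortError",
            FailureKind::Build => "FbuildBuildError",
            FailureKind::Deploy => "FbuildDeployError",
        }
    }
}

/// Option 1: a structured result mirroring the proposed fields.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct OperationResult {
    success: bool,
    message: String,
    exit_code: Option<i32>,
    stdout: String,
    stderr: String,
    /// Assumed extra field: filled in by the daemon when `success` is false.
    failure: Option<FailureKind>,
}

impl OperationResult {
    /// Option 2: pass successes through, turn failures into (exception name, message).
    /// Unclassified failures fall back to FbuildDeployError, an arbitrary choice here.
    fn into_py_error(self) -> Result<OperationResult, (&'static str, String)> {
        if self.success {
            return Ok(self);
        }
        let kind = self.failure.unwrap_or(FailureKind::Deploy);
        Err((kind.python_exception(), self.message))
    }
}
```

Carrying the category as data from the daemon, rather than pattern-matching stderr text in the bindings, keeps the Python-facing classification stable if log wording changes.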

---

## 3. Native async API in fbuild-python

**Severity:** USABILITY — async callers pay for thread-executor wrapping.

Today the Python bindings expose only synchronous methods. `FbuildSerialAdapter._run_in_thread` wraps every call in a thread executor so async callers don't block their event loop, but this generates an `asyncio` warning under some configurations and adds latency.

**Fix:** Add `AsyncSerialMonitor`, `AsyncDaemon`, etc. that use PyO3's `pyo3-asyncio` (or the newer `pyo3-async-runtimes`) to expose awaitable methods that async code running under `asyncio.run(...)` can `await` directly. The existing sync API stays for compatibility.

---

## 4. B5a hardening leftovers: `SO_LINGER 0` + `SetConsoleCtrlHandler`

**Severity:** ROBUSTNESS — covered by listener-level B5a fix; these close remaining edge cases.

The listener-level B5a fix (`SO_EXCLUSIVEADDRUSE` on Windows + bind retry + stale-PID cleanup) is in. Two deeper hooks remain deferred:

- **`SO_LINGER 0` on accepted client sockets** — currently axum's accept loop doesn't expose per-connection socket options. After a hard kill, the dangling `CLOSE_WAIT` state on accepted sockets can outlive the daemon. Setting `SO_LINGER 0` on every accepted socket would force an immediate `RST` on close instead of the FIN/CLOSE_WAIT/TIME_WAIT dance. Requires hooking into the axum accept loop or using a custom `IncomingStream`.
- **`SetConsoleCtrlHandler` on Windows** — `tokio::signal::ctrl_c()` covers `CTRL_C_EVENT` but **not** `CTRL_CLOSE_EVENT` (window close), `CTRL_LOGOFF_EVENT`, or `CTRL_SHUTDOWN_EVENT`. The daemon currently dies without running its graceful shutdown path on those events.
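A sketch of the decision logic a `SetConsoleCtrlHandler` callback could run, kept platform-neutral so it compiles anywhere. Assumptions: the daemon's graceful-shutdown path can be triggered through an `AtomicBool` (hypothetical; the real trigger is not described here), and registration itself (calling `SetConsoleCtrlHandler` from `kernel32`, e.g. via the `windows-sys` crate, with an `extern "system"` wrapper around this function) is omitted. Windows calls handlers most-recently-registered first and stops at the first one returning TRUE, which is why `CTRL_C_EVENT` is passed through to whatever handler backs `tokio::signal::ctrl_c()`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Console control event codes as defined in the Windows SDK (wincon.h).
#[allow(dead_code)]
const CTRL_C_EVENT: u32 = 0;
const CTRL_CLOSE_EVENT: u32 = 2;
const CTRL_LOGOFF_EVENT: u32 = 5;
const CTRL_SHUTDOWN_EVENT: u32 = 6;

/// Platform-neutral body of a console control handler.
/// Requests graceful shutdown and returns true ("handled") only for the
/// events that tokio's ctrl_c() does not cover.
fn on_console_event(ctrl_type: u32, shutdown: &AtomicBool) -> bool {
    match ctrl_type {
        CTRL_CLOSE_EVENT | CTRL_LOGOFF_EVENT | CTRL_SHUTDOWN_EVENT => {
            shutdown.store(true, Ordering::SeqCst);
            true
        }
        // CTRL_C_EVENT (and anything else): return false so the next handler
        // in the chain still runs.
        _ => false,
    }
}
```

For the close, logoff and shutdown events Windows terminates the process once the handler returns TRUE, so a production handler would block until the shutdown path finishes (within the system's timeout) rather than return immediately after setting the flag.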

Regression test exists at `crates/fbuild-daemon/tests/port_recovery.rs` (run with `--ignored`).

---

## 5. `espflash` native library integration (replace `esptool` subprocess)

**Severity:** PERFORMANCE — could cut verify cost from ~6 s to ~1.5 s.

The current verify-flash pre-check shells out to `esptool` (Python). Cost breakdown for the ~5.9 s verify of a 2.4 MB image:

| Phase | Estimated cost |
|---|---|
| Python interpreter startup | ~1 s |
| Subprocess spawn + esptool init | ~0.5 s |
| Connect + reset chip into bootloader | ~1 s |
| SYNC handshake + stub flasher upload | ~1 s |
| Baud rate change | ~0.5 s |
| `FLASH_MD5SUM` execution (3 regions, 2.4 MB) | ~1 s |
| Cleanup + reset back to app | ~0.5 s |

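The actual MD5 work is **~1 s**; the rest is Python interpreter and subprocess startup plus esptool's per-invocation connect, sync, stub upload, baud change and reset sequence.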
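Adding up the table's own estimates, grouped by what an in-process flasher could do about each phase. The grouping is this sketch's reading of the section (process startup disappears, per-session phases can be amortized by reusing the stub, MD5 remains); the names are illustrative. Costs are kept in milliseconds so the arithmetic is exact.

```rust
/// Where a phase's cost goes if verify moves in-process.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Overhead {
    /// Python interpreter + subprocess startup.
    Process,
    /// Connect, sync/stub upload, baud change, reset: paid per flasher session.
    Session,
    /// The FLASH_MD5SUM work itself.
    Md5,
}

/// One row of the table above.
#[allow(dead_code)]
struct Phase {
    name: &'static str,
    cost_ms: u32,
    kind: Overhead,
}

const PHASES: [Phase; 7] = [
    Phase { name: "Python interpreter startup", cost_ms: 1000, kind: Overhead::Process },
    Phase { name: "Subprocess spawn + esptool init", cost_ms: 500, kind: Overhead::Process },
    Phase { name: "Connect + reset into bootloader", cost_ms: 1000, kind: Overhead::Session },
    Phase { name: "SYNC handshake + stub upload", cost_ms: 1000, kind: Overhead::Session },
    Phase { name: "Baud rate change", cost_ms: 500, kind: Overhead::Session },
    Phase { name: "FLASH_MD5SUM (3 regions, 2.4 MB)", cost_ms: 1000, kind: Overhead::Md5 },
    Phase { name: "Cleanup + reset back to app", cost_ms: 500, kind: Overhead::Session },
];

/// Sum of the estimates, optionally restricted to one category.
fn total_ms(kind: Option<Overhead>) -> u32 {
    PHASES
        .iter()
        .filter(|p| kind.map_or(true, |k| p.kind == k))
        .map(|p| p.cost_ms)
        .sum()
}
```

The estimates total ~5.5 s against the ~5.9 s measured: ~1.5 s process startup, ~3 s per-session protocol work, ~1 s MD5. On these numbers alone, dropping the process rows leaves ~4 s, so reaching ~1.5–2 s also assumes the per-session phases get substantially cheaper in-process; the <1 s figure for repeat verifies corresponds to only the MD5 row remaining.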

**Alternative:** the [`espflash`](https://crates.io/crates/espflash) crate (4.3.0) is a Rust-native ESP32 flasher protocol implementation maintained by ESP-RS. It exposes the SLIP protocol, stub flasher loading, and `FLASH_MD5SUM` natively. Add `espflash = { version = "4", default-features = false, features = ["serialport"] }` and reuse the daemon's existing serial port lease instead of spawning a subprocess.

**Estimated win:** ~5.9 s → ~1.5–2 s for verify. Subsequent verifies in the same session could reuse the loaded stub flasher and drop further to <1 s.

**Effort:** medium. The `espflash` library API is documented but not as stable as the CLI. Might need an adapter layer to convert between fbuild's `Esp32Deployer` and espflash's `Flasher` type. Worth a spike.

---

## 6. Sector-selective flashing (only write regions that differ)

**Severity:** PERFORMANCE — minor win, but trivial to add once #1 is done.

Currently, when `verify-flash` reports a mismatch, the daemon falls through to `write-flash`, which writes **all three regions** (bootloader + partitions + firmware) even if only one differs. For the common "only firmware changed" case, we waste ~1 s rewriting `bootloader.bin` and `partitions.bin`.

**Fix:** parse the `verify-flash` output (or run three separate verify calls with `--diff`) to determine which regions matched, then call `write-flash` with only the offset/file pairs for the **mismatched** regions. Saves ~1 s on the typical firmware-only deploy.
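A minimal sketch of building the reduced `write-flash` argument list once per-region verify results are known. `RegionStatus` and `selective_write_args` are hypothetical names; the offsets in the usage below (`0x0` / `0x8000` / `0x10000`) are an assumed typical ESP32-S3 layout, and global esptool options such as `--chip` and `--port` are left out. The subcommand is spelled `write-flash` as in this issue.

```rust
/// One flash region plus the outcome of its device-side verify (hypothetical type).
struct RegionStatus<'a> {
    offset: u32,
    file: &'a str,
    matches: bool,
}

/// Build `write-flash` arguments for mismatched regions only.
/// Returns None when every region already matches, i.e. nothing to write.
fn selective_write_args(regions: &[RegionStatus<'_>]) -> Option<Vec<String>> {
    let mut args = vec!["write-flash".to_string()];
    for r in regions.iter().filter(|r| !r.matches) {
        // esptool takes "<offset> <file>" pairs; {:#x} renders 0x10000 as "0x10000".
        args.push(format!("{:#x}", r.offset));
        args.push(r.file.to_string());
    }
    if args.len() == 1 { None } else { Some(args) }
}
```

Returning `None` rather than a bare `write-flash` keeps the all-match case identical to the skip path, so the caller never issues a write with no regions.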

Should be tackled **after** #1 (ledger removal) so the flow is: device tells us what differs → we write only what differs → device verifies the write.

---

## Cross-references

- ISSUES.md "Issue B4: Firmware ledger": to be marked SUPERSEDED by #1
- ISSUES.md "Performance: Fast deploy via verify-then-skip": pre-check is in place; #1 makes it the *only* check
- ISSUES.md "Issue F: send_op silently swallows errors": #2 is the structured-error follow-up
- ISSUES.md "Issue B5a: SO_REUSEADDR is a workaround": #4 is the hardening leftovers
