fix: close WAL torn-record window in Append(), handle torn tails in Replay() (PILOT-82) #2

Open
matthew-pilot wants to merge 2 commits into main from openclaw/pilot-82-20260528-024846

@matthew-pilot
Collaborator

What failed

In wal/wal.go, Append() wrote the 4-byte length prefix and the data payload as two separate Write() calls. A crash or power loss between them left a valid length on disk with no corresponding data (a torn tail). Replay() then hit io.ReadFull() on the missing data and returned a hard error, which could prevent the rendezvous server from starting after recovery.
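
For reviewers who want to reproduce the failure mode in isolation, here is a minimal sketch of how io.ReadFull reports a torn record. It is not the code in wal/wal.go: the helper names (`readRecord`, `decodeOne`, `isTornTail`) are hypothetical and big-endian length encoding is an assumption, since the byte order is not stated here. io.ReadFull returns io.EOF when no payload bytes follow the prefix and io.ErrUnexpectedEOF when only some of them do, so both have to be recognised.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// readRecord reads one [4-byte len][data] record from r.
// Big-endian is an assumption made for this sketch.
func readRecord(r io.Reader) ([]byte, error) {
	var hdr [4]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	data := make([]byte, binary.BigEndian.Uint32(hdr[:]))
	if _, err := io.ReadFull(r, data); err != nil {
		return nil, err // io.EOF or io.ErrUnexpectedEOF on a torn record
	}
	return data, nil
}

// decodeOne is a hypothetical convenience wrapper over an in-memory log.
func decodeOne(b []byte) ([]byte, error) {
	return readRecord(bytes.NewReader(b))
}

// isTornTail reports whether err is one of the two errors a short payload produces.
func isTornTail(err error) bool {
	return errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF)
}

func main() {
	// Prefix claims 5 bytes, but the crash happened before any payload was written.
	_, err := decodeOne([]byte{0, 0, 0, 5})
	fmt.Println(errors.Is(err, io.EOF)) // true

	// Prefix claims 5 bytes, but only 2 made it to disk.
	_, err = decodeOne([]byte{0, 0, 0, 5, 'h', 'i'})
	fmt.Println(errors.Is(err, io.ErrUnexpectedEOF)) // true
}
```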

Why this fix

  1. Append() now allocates a combined [4-byte len][data] buffer and writes it in a single Write() call, narrowing the crash window to one syscall boundary.
  2. Replay() now treats io.EOF / io.ErrUnexpectedEOF on the data read as a torn tail from a partial write: it stops cleanly and returns the entries replayed so far rather than failing the entire recovery.
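
The two changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the diff itself: `appendRecord`, `replay`, `encode`, and `replayBytes` are hypothetical names, the log is held in memory instead of a file, and big-endian length encoding is assumed. How the real Replay() treats a partially written length prefix is not described here, so the sketch simply surfaces that case as an error.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// appendRecord frames data as [4-byte len][data] and issues a single Write call.
func appendRecord(w io.Writer, data []byte) error {
	buf := make([]byte, 4+len(data))
	binary.BigEndian.PutUint32(buf[:4], uint32(len(data))) // byte order is an assumption
	copy(buf[4:], data)
	_, err := w.Write(buf)
	return err
}

// replay returns every complete record and stops quietly at a torn payload.
func replay(r io.Reader) ([][]byte, error) {
	var entries [][]byte
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(r, hdr[:]); err != nil {
			if errors.Is(err, io.EOF) {
				return entries, nil // clean end of log
			}
			return entries, err // partial prefix: behaviour not covered by this sketch
		}
		// A production reader would likely bound this length before allocating.
		data := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(r, data); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				break // torn tail: keep what was replayed so far
			}
			return entries, err
		}
		entries = append(entries, data)
	}
	return entries, nil
}

// encode builds an in-memory log from the given records.
func encode(records ...string) []byte {
	var buf bytes.Buffer
	for _, rec := range records {
		_ = appendRecord(&buf, []byte(rec)) // bytes.Buffer writes do not fail
	}
	return buf.Bytes()
}

// replayBytes replays an in-memory log and returns the records as strings.
func replayBytes(b []byte) ([]string, error) {
	raw, err := replay(bytes.NewReader(b))
	out := make([]string, 0, len(raw))
	for _, e := range raw {
		out = append(out, string(e))
	}
	return out, err
}

func main() {
	log := encode("alpha", "beta")
	torn := log[:len(log)-2] // simulate a crash partway through the last payload
	entries, err := replayBytes(torn)
	fmt.Println(entries, err) // [alpha] <nil>
}
```

Note that a single Write() can still be cut short by a crash, which is why the tolerant read in the replay loop matters even after the append path is changed.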

Verification

  • go build ./... — clean
  • go vet ./... — clean
  • go test ./... — all 19 packages pass, 0 regressions
  • New TestWALReplayTornTail confirms recovery from a truncated WAL
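
A check in the spirit of TestWALReplayTornTail could look like the sketch below. It is not the test added in wal/zz_wal_test.go: `tornTailCheck` and `countReplayable` are hypothetical helpers, the record format is reimplemented inline with an assumed big-endian length, and the torn tail is simulated by truncating a temporary file.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"os"
)

// countReplayable counts complete [4-byte len][data] records in the file at path,
// treating a short final payload as a torn tail instead of an error.
func countReplayable(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	n := 0
	for {
		var hdr [4]byte
		if _, err := io.ReadFull(f, hdr[:]); err != nil {
			if errors.Is(err, io.EOF) {
				return n, nil // clean end of log
			}
			return n, err
		}
		data := make([]byte, binary.BigEndian.Uint32(hdr[:]))
		if _, err := io.ReadFull(f, data); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
				return n, nil // torn tail
			}
			return n, err
		}
		n++
	}
}

// tornTailCheck writes records to a temp file, chops cut bytes off the end,
// and reports how many records are still replayable.
func tornTailCheck(records []string, cut int64) (int, error) {
	f, err := os.CreateTemp("", "wal-torn-*.log")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	var size int64
	for _, rec := range records {
		buf := make([]byte, 4+len(rec))
		binary.BigEndian.PutUint32(buf[:4], uint32(len(rec)))
		copy(buf[4:], rec)
		if _, err := f.Write(buf); err != nil {
			return 0, err
		}
		size += int64(len(buf))
	}
	if err := f.Truncate(size - cut); err != nil {
		return 0, err
	}
	return countReplayable(f.Name())
}

func main() {
	// Cutting 3 bytes leaves the third record with 2 of its 5 payload bytes.
	n, err := tornTailCheck([]string{"one", "two", "three"}, 3)
	fmt.Println(n, err) // 2 <nil>
}
```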

Scope

  • 1 file (wal/wal.go): +14/-6 functional lines
  • 1 test file (wal/zz_wal_test.go): +57 lines (new test)
  • Backward-compatible — no format change, existing WAL files replay identically

Closes PILOT-82

Matthew Pilot added 2 commits May 28, 2026 02:49
…Replay()

Append() previously wrote the 4-byte length prefix and the data payload
as two separate Write() calls — a crash between them left a valid length
on disk with no corresponding data. Replay() then hit io.ReadFull on the
missing data and returned a hard error, blocking recovery startup.

Changes:
- Append() now allocates a combined [4-byte len][data] buffer and writes
  it in a single Write() call, narrowing the crash window to one syscall
- Replay() now treats io.EOF / io.ErrUnexpectedEOF on the data read as a
  torn tail from a partial write, breaking cleanly and returning the
  entries replayed so far rather than failing the entire recovery

Backward-compatible with existing WAL files (format unchanged).

Closes PILOT-82

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
