Skip to content

backup: Phase 0b M6 implementation - cmd/elastickv-snapshot-encode + library#904

Open
bootjp wants to merge 17 commits into
mainfrom
backup/m6-cli-design
Open

backup: Phase 0b M6 implementation - cmd/elastickv-snapshot-encode + library#904
bootjp wants to merge 17 commits into
mainfrom
backup/m6-cli-design

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented Jun 1, 2026

Summary

Phase 0b M6 implementation per the merged design doc docs/design/2026_06_01_proposed_snapshot_encode_cli.md (#896, merged at fe9e941).

Wires the merged M1–M5 encoder slices into a user-facing CLI plus a library entrypoint mirroring the decoder's DecodeSnapshot. Implements the round-trip self-test the parent doc mandates, with write-then-rename atomic publish so a self-test failure never reaches the restore path.

What lands

Library (internal/backup/):

  • encode_snapshot.go: EncodeSnapshot(opts, out io.Writer) (EncodeResult, error) — high-level wrapper that dispatches per-adapter encoders in canonical fan-out order (redis → dynamodb → s3 → sqs), implements two-mode buffering (stream when SelfTest=false, buffer when SelfTest=true), runs the structural self-test against the in-memory buffer, and copies to out only on match. Unexported corruptBufferForTest hook lets same-package tests inject buffer corruption that reaches the self-test decode but never out (codex P2 v6 docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896 — write-then-rename atomicity).
  • encode_info.go: EncodeInfo schema + WriteEncodeInfo / ReadEncodeInfo helpers + EncodeInfoSidecarPath (path-derived <output>.encode_info.json, no static-name collisions per gemini medium v2 docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896). Format-version gate so a future bump surfaces as ErrUnsupportedEncodeInfoFormatVersion.
  • manifest.go: Exclusions.RenameS3Collisions bool added with JSON tag rename_s3_collisions. Intentionally NOT in exclusionsRequiredFields so legacy manifests decode safely with the zero value false (claude v5 medium docs(backup): propose Phase 0b M6 - cmd/elastickv-snapshot-encode CLI #896).

Decoder CLI update (cmd/elastickv-snapshot-decode/main.go):

  • emitManifest now populates the new RenameS3Collisions field from cfg.renameCollisions, completing the decoder→encoder round-trip for --rename-collisions dumps.

Encoder CLI (cmd/elastickv-snapshot-encode/main.go):

  • Flags: --input, --output, --adapter (decoder-parity CSV parser), --last-commit-ts, --self-test, --scratch-root.
  • Atomic publish: write to <output>.tmp-<random>, fsync+close, then rename. Self-test failure → mismatch.txt next to where .fsm would have been, temp file removed, exit 2.
  • Fail-closed HLC ceiling: --last-commit-ts T < manifest → exit 2 with typed ErrSelfTestLowerLastCommitTS.
  • Exit codes: 0 success / 1 user-input error / 2 data-correctness failure (decoder-parity).
  • encoder_version stamped at build time via -ldflags "-X main.version=..." (mirrors decoder pattern at main.go:45).

Test plan

go test -race -count=1 — all green:

  • internal/backup/: 3 new tests (encode_info schema, format gate, legacy manifest forward-compat)
  • internal/backup/encode_snapshot_test.go: 5 tests (library round-trip, self-test match against canonicalized input, corruption never reaches out, missing-input guard, sidecar path derivation)
  • cmd/elastickv-snapshot-encode/main_test.go: 9 tests (missing manifest, unknown adapter, lower-TS fail-closed, equal+higher TS accept, path-derived sidecar, two-files-no-collision, full round-trip with --self-test, atomic-publish never leaves bad .fsm, --last-commit-ts parser)

make lint: clean.

Caller audit per CLAUDE.md semantic-change rule

  • Exclusions struct gained a field. Existing callers either build via field-tagged literals (decoder CLI's emitManifest — updated to populate the new field) or read it (encoder's buildSelfTestDecodeOptions — new code). No silent semantic change.
  • DecodeOptions.RenameS3Collisions was already a public field used by the decoder; the encoder now also reads it via the manifest. No caller-side change needed.

Self-review (5 passes)

  1. Data loss: write-then-rename atomic publish — self-test failure never publishes the .fsm. Corrupted buffer never reaches the io.Writer (pinned by TestEncodeSnapshotSelfTestDetectsCorruption asserting out.Len()==0). All b.Add errors propagate through EncodeSnapshot.
  2. Concurrency: pure offline. CLI is single-shot; library takes a caller-owned io.Writer. Temp-file suffix uses crypto/rand so concurrent encodes against the same --output cannot collide.
  3. Performance: SelfTest=false streams with one sha256.Writer tee, no extra allocations. SelfTest=true allocates one FSM-sized *bytes.Buffer plus the scratch decode tree (documented memory cost).
  4. Data consistency: --last-commit-ts T < manifest fail-closed with typed error; self-test threads MANIFEST DecodeOptions (Exclusions.* + DynamoDBLayout → DynamoDBBundleJSONL) so trees produced with non-default decoder flags round-trip cleanly.
  5. Test coverage: 17 new tests cover library entrypoint, CLI flag parsing, atomic publish discipline, sidecar path-derivation, corruption detection, forward-compat for legacy manifests, and the four user-visible behaviors (round-trip / override / scope / missing-manifest).

Risk

Low. The encoder is offline; restoration is non-destructive (new keyspace on a fresh cluster via stop-replace-restart). The only public-API change is the Exclusions.RenameS3Collisions field, which is forward-compat for older manifests. All existing M1–M5 tests continue to pass.

Summary by CodeRabbit

  • New Features

    • Added snapshot encoding CLI with atomic publish, multi-adapter support, self-test verification, last-commit override rules, and an encode sidecar (ENCODE_INFO.json).
    • Manifests now include the S3 rename-collision flag so collision behavior is preserved.
  • Bug Fixes

    • Self-test failures no longer publish bad outputs; mismatches produce a sidecar/mismatch artifact.
  • Tests

    • Extensive end-to-end and unit coverage added for CLI, library, manifest, and sidecar behaviors.

bootjp added 2 commits June 2, 2026 00:53
Implementation of the M6 design (this PR's earlier doc commits 31c0a0f
through 1736435). Wires the merged M1-M5 encoder slices into a
user-facing CLI plus a library entrypoint, with the round-trip self-test
the parent doc mandates.

## What ships

- internal/backup/manifest.go: Exclusions gains RenameS3Collisions bool
  with json tag rename_s3_collisions. Intentionally NOT added to
  exclusionsRequiredFields so legacy manifests decode safely with the
  zero value false. Pinned by TestExclusionsLegacyManifestOmitsRenameS3Collisions.

- cmd/elastickv-snapshot-decode/main.go: emitManifest now populates
  the new field from cfg.renameCollisions, so M6's self-test can thread
  the same option back through DecodeSnapshot.

- internal/backup/encode_info.go: ENCODE_INFO.json schema with
  format_version gate, NewEncodeInfo constructor, WriteEncodeInfo /
  ReadEncodeInfo helpers, and EncodeInfoSidecarPath that derives the
  sidecar from the .fsm path (<output>.encode_info.json — no static
  name collisions).

- internal/backup/encode_snapshot.go: EncodeSnapshot(opts, out) library
  entrypoint mirroring DecodeSnapshot. Dispatches per-adapter encoders
  in canonical fan-out order (redis -> dynamodb -> s3 -> sqs), and
  implements the two-mode buffering model:
  - SelfTest=false: stream FSM straight to out with a sha256.Writer tee.
  - SelfTest=true: buffer FSM in *bytes.Buffer, fire the unexported
    corruptBufferForTest hook (if set), self-test against the buffer,
    copy to out ONLY on match. Corruption hooks reach the self-test
    decode but never reach out (write-then-rename atomicity).

- cmd/elastickv-snapshot-encode/main.go: CLI with --input / --output /
  --adapter / --last-commit-ts / --self-test / --scratch-root flags;
  decoder-parity adapter CSV parser; fail-closed
  T >= manifest.last_commit_ts validation; atomic publish via
  <output>.tmp-<random> with deferred cleanup; sidecar emission with
  fsync+close discipline.

## Tests (12 new)

internal/backup/encode_info_test.go:
- TestEncodeInfoRoundTrip
- TestEncodeInfoRejectsUnknownFormatVersion
- TestExclusionsLegacyManifestOmitsRenameS3Collisions

internal/backup/encode_snapshot_test.go:
- TestEncodeSnapshotLibraryRoundTrip
- TestEncodeSnapshotSelfTestMatchesInput (uses canonicalize-once
  pattern: encode -> decode -> re-encode self-tests cleanly)
- TestEncodeSnapshotSelfTestDetectsCorruption (corruption reaches
  self-test decode, NEVER reaches the io.Writer)
- TestEncodeSnapshotRequiresInputRoot
- TestEncodeInfoSidecarPath

cmd/elastickv-snapshot-encode/main_test.go:
- TestCLIRejectsMissingManifest
- TestCLIRejectsUnknownAdapter
- TestCLIRejectsLowerLastCommitTSOverride
- TestCLIAcceptsEqualAndHigherLastCommitTSOverride (sub-tests for
  equal + higher)
- TestCLIEncodeInfoPathDerivedFromOutput
- TestCLIEncodeInfoTwoFilesNoCollision
- TestCLIRoundTripSelfTestAllAdapters
- TestCLISelfTestFailureLeavesNoFsmAtOutputPath
- TestParseLastCommitTS

All green (race + no cache). golangci-lint clean.

## Caller audit per CLAUDE.md semantic-change rule

- Exclusions struct gained a field. Existing callers either:
  (a) build the struct via field-tagged literals (decoder CLI's
      emitManifest — updated to populate the new field), or
  (b) read it (encoder's buildSelfTestDecodeOptions — new code).
  No silent semantic change for any pre-existing caller.

- DecodeOptions.RenameS3Collisions was already a public field used by
  the decoder; the encoder now also reads it via the manifest. No
  caller-side change needed.

## Self-review

1. Data loss: write-then-rename atomic publish — self-test failure
   never publishes the .fsm. Mismatch txt + corrupt buffer never
   reach the io.Writer (pinned by
   TestEncodeSnapshotSelfTestDetectsCorruption asserting out.Len()==0).
2. Concurrency: pure offline. The CLI is a single-shot binary; library
   entrypoint takes an io.Writer the caller owns.
3. Performance: SelfTest=false streams with one sha256 tee, no extra
   allocations. SelfTest=true allocates one FSM-sized buffer plus
   the scratch decode tree (documented memory cost).
4. Data consistency: --last-commit-ts T < manifest is fail-closed with
   typed error; self-test threads MANIFEST DecodeOptions
   (Exclusions.* + DynamoDBLayout -> DynamoDBBundleJSONL) so trees
   produced with non-default decoder flags round-trip cleanly.
5. Test coverage: 12 new tests cover library entrypoint, CLI flag
   parsing, atomic publish discipline, sidecar path-derivation,
   corruption detection, forward-compat for legacy manifests.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Warning

Review limit reached

@bootjp, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 6 minutes and 13 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2c1cabc3-a6aa-4ab3-a0f5-1859d6316dcd

📥 Commits

Reviewing files that changed from the base of the PR and between f075610 and b52a688.

📒 Files selected for processing (4)
  • cmd/elastickv-snapshot-encode/main.go
  • cmd/elastickv-snapshot-encode/main_test.go
  • internal/backup/encode_snapshot.go
  • internal/backup/encode_snapshot_test.go
📝 Walkthrough

Walkthrough

This PR adds a snapshot encoding pipeline: a new elastickv-snapshot-encode CLI, a library EncodeSnapshot with optional on-disk self-test and deterministic structural diffing, ENCODE_INFO.json sidecar schema and helpers, atomic temp→final publish with randomized temp names, and a manifest Exclusions flag rename_s3_collisions that is emitted by the decode CLI.

Changes

Snapshot Encoding Pipeline

Layer / File(s) Summary
Manifest exclusions and decode integration
internal/backup/manifest.go, cmd/elastickv-snapshot-decode/main.go
Manifest schema gains optional RenameS3Collisions flag with omitempty JSON tagging and zero-value backward compatibility; snapshot-decode CLI stamps this flag into emitted manifests.
ENCODE_INFO.json schema and versioning
internal/backup/encode_info.go, internal/backup/encode_info_test.go
On-disk ENCODE_INFO.json contract with fixed FormatVersion, typed version-gate error, and structures for encoder metadata, timestamps, adapter list, and self-test results. Implements read/write with forward-compatibility (tolerates unknown fields) and strict format-version rejection. Tests verify round-trip, unknown-version rejection, and legacy manifest omission of the new flag.
EncodeSnapshot library with optional self-test
internal/backup/encode_snapshot.go, internal/backup/encode_snapshot_test.go
Core library that validates inputs, dispatches enabled adapters in canonical order, streams FSM output with SHA-256. When self-test enabled: writes FSM to temp file, optionally corrupts via test hook, decodes into scratch directory, and performs bounded structural diff using deterministic sorted comparison and streaming file equality without loading entire blobs into memory. Returns mismatch text (not an error) on structural differences; skips output copy if self-test fails. Tests cover library round-trip, self-test correctness, corruption detection, input validation, and sidecar path derivation.
elastickv-snapshot-encode CLI with atomic publishing
cmd/elastickv-snapshot-encode/main.go, cmd/elastickv-snapshot-encode/main_test.go
Complete CLI that parses required --input/--output, optional --adapter (CSV with all shortcut), --last-commit-ts (decimal or 0x hex), --self-test, and --scratch-root. Reads manifest, validates timestamp override under "HLC ceiling rises only" rule, invokes EncodeSnapshot, writes to randomized temp file, optionally writes <output>.mismatch.txt on self-test failure, atomically renames temp→final, writes <output>.encode_info.json sidecar, and maps self-test/regression errors to distinct exit codes. Comprehensive tests verify error paths, sidecar naming and collisions, end-to-end round-trip with self-test, atomic publish discipline, and strict flag parsing.

Sequence Diagram(s)

sequenceDiagram
  participant Input as Input Root
  participant Encode as EncodeSnapshot
  participant Adapter as Adapter Encoders
  participant Temp as Temp FSM File
  participant SelfTest as Self-Test (optional)
  participant Decode as Decode to Scratch
  participant Diff as Structural Diff
  participant Output as Output Writer

  Input->>Encode: EncodeOptions{InputRoot, Adapters, SelfTest}
  Encode->>Adapter: Run enabled adapters in order
  Adapter->>Temp: Write FSM bytes to temp
  alt Self-Test Enabled
    Temp->>SelfTest: Fsync and optionally corrupt buffer
    SelfTest->>Decode: Rewind and decode FSM
    Decode->>Diff: Compare adapter subdirs (streamed)
    Diff-->>SelfTest: Mismatch text or empty
    alt Mismatch Found
      SelfTest-->>Encode: SelfTestMismatchTxt (non-error)
      Encode-->>Output: Write nothing, return mismatch
    else No Mismatch
      SelfTest-->>Encode: SelfTestMatched=true
      Encode->>Output: Copy temp to output writer
    end
  else Self-Test Disabled
    Temp->>Output: Stream FSM directly with SHA-256
  end
  Output-->>Encode: BytesWritten, SHA256, SelfTestRan, SelfTestMatched
Loading
sequenceDiagram
  participant User
  participant CLI as snapshot-encode CLI
  participant Input as Input Dir (MANIFEST)
  participant Encode as EncodeSnapshot
  participant Temp as Temp FSM File
  participant Output as Final FSM Path
  participant Sidecar as ENCODE_INFO.json

  User->>CLI: --input INPUT --output OUTPUT [--adapter NAMES] [--last-commit-ts TS] [--self-test]
  CLI->>Input: Read and decode MANIFEST.json
  Input-->>CLI: Manifest + Exclusions
  CLI->>CLI: Validate --last-commit-ts (reject if lower than manifest)
  CLI->>CLI: Parse --adapter CSV to AdapterSet
  CLI->>Encode: EncodeOptions{InputRoot, Adapters, LastCommitTS, SelfTest}
  Encode-->>CLI: EncodeResult{BytesWritten, SHA256, SelfTestRan, SelfTestMatched, SelfTestMismatchTxt}
  alt Self-Test Mismatch
    CLI->>Output: Write .mismatch.txt with mismatch text
    CLI->>CLI: Return ErrSelfTestMismatch (data-error exit)
  else Self-Test Pass or Disabled
    CLI->>Temp: Write EncodeResult bytes to random temp file
    CLI->>Output: Fsync, close, atomic rename temp→OUTPUT.fsm
    CLI->>Sidecar: WriteEncodeInfo(OUTPUT.fsm.encode_info.json)
    Sidecar-->>CLI: EncodeInfo{FormatVersion, EncoderVersion, Timestamps, SHA256, Adapters, SelfTest}
    CLI->>User: Success with completion log
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • bootjp/elastickv#810: Modifies snapshot-decode manifest emission to include related exclusions metadata (rename-collision flag) that this PR emits.
  • bootjp/elastickv#716: Implements S3 encoder rename-collision behavior driven by the rename_s3_collisions manifest flag added here.

Poem

🐰 I hopped through manifests, flags in tow,

I baked .fsm temp files where safe rivers flow,
Self-tests I ran, to sniff out each glitch,
Sidecars I left with a hash and a stitch,
Atomically published — hop! — to the final show.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically identifies the main change: implementation of the M6 encoder (Phase 0b) including both the CLI command and library components, which aligns with the substantial additions of cmd/elastickv-snapshot-encode, EncodeSnapshot library, EncodeInfo schema, and related infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 85.92% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch backup/m6-cli-design

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 2m 59s —— View job


Code Review - PR #904

Tasks

  • Gather context and understand the request
  • Read changed files
  • Review library code (internal/backup/)
  • Review CLI code (cmd/elastickv-snapshot-encode/)
  • Review tests
  • Post final review

Summary

This is a solid, well-structured implementation. The atomic publish discipline, self-test detection, and HLC ceiling fail-closed logic are all correctly implemented. A few issues worth addressing before merge:


Issues

1. fmt.Sscanf for hex parsing doesn't consume the entire input — partial parse silently succeeds

cmd/elastickv-snapshot-encode/main.go:156

if _, err := fmt.Sscanf(s[2:], "%x", &ts); err != nil {

fmt.Sscanf stops at the first non-hex character and returns n=1, err=nil — so "0xffZZ" parses as 0xff with no error. This means a user typo like --last-commit-ts 0x1234abc_garbage would silently accept a wrong value (and pass the >= manifest check if it happens to be large enough).

Replace with strconv.ParseUint(s[2:], 16, 64) which rejects partial parses. The same fix applies to the decimal branch: fmt.Sscanf(s, "%d", &ts) with a positive decimal will stop before a sign or non-digit; strconv.ParseUint(s, 10, 64) is stricter.

Fix this →


2. Stale comment: EncodeResult.SHA256 says "via SHA256Hex" but no such function exists

internal/backup/encode_snapshot.go:85

// SHA256 of the produced .fsm (lowercase hex via SHA256Hex).
SHA256 [32]byte

The field is [32]byte, not a string. The hex encoding happens later at the CLI level via hex.EncodeToString(result.SHA256[:]) in writeSidecar. The reference to SHA256Hex is a phantom; there is no such function in the package. The comment should read "raw SHA-256 digest; hex-encode with hex.EncodeToString(r.SHA256[:]) for display."

Fix this →


3. uitoa / itoa reinvent strconv — misleading for future readers

internal/backup/encode_snapshot.go:402-422

The hand-rolled decimal converters are correct but strconv.FormatUint and strconv.Itoa do exactly the same thing and are already linked into every Go binary. The only benefit here is avoiding an import, but strconv is already transitively present. Keeping custom implementations means a future reader may not recognise them as stdlib equivalents and will spend time auditing them.

Minor, but worth dropping.


4. diffOneSubdir mismatch output is non-deterministic for "bytes differ" entries

internal/backup/encode_snapshot.go:328-350

The loop over bFiles (a map) produces "bytes differ" entries in map-iteration order, which is randomised per-run. The "missing in input" entries also come from map iteration. Only "missing in scratch" entries (the remaining slice) are sorted.

Consequence: mismatch.txt can look different across two runs with the same inputs, making it harder to compare mismatch reports or write tests that check the txt content. Sorting diffs before returning (or collecting into a slice and sorting) would make this deterministic.

Fix this →


5. EncodeSnapshot returns (result, nil) on self-test mismatch — callers must check two things

internal/backup/encode_snapshot.go:181-183

if !matched {
    return result, nil  // no error, but SelfTestMatched = false
}

When SelfTest=true and the self-test fails, EncodeSnapshot returns (result, nil) — a nil error. Callers who only check err != nil will silently proceed as if the encode succeeded. The CLI correctly checks result.SelfTestMatched, but any future library caller who patterns-matches just on the error will miss the failure.

Returning a typed error (e.g., errSelfTestMismatch promoted from the CLI into the library) on mismatch, or at minimum documenting the contract prominently on EncodeSnapshot's godoc, would make this safer. The current behaviour is already documented in EncodeResult.SelfTestMatched, but the "no error on mismatch" choice is surprising enough to warrant a sentence in the EncodeSnapshot signature comment.


Observations (no fix required, but worth knowing)

6. writeSidecar is not atomic with respect to the .fsm rename

If the process is killed after os.Rename(tempPath, cfg.outputPath) but before writeSidecar completes, the published .fsm will exist without a sidecar. This is probably fine — the .fsm itself is the critical restore artifact and the sidecar is operator-metadata — but it means restore tooling should tolerate a missing sidecar gracefully. Worth a note in the design doc's "failure modes" section if it isn't already there.

7. walkRegularFiles slurps entire file contents into memory

internal/backup/encode_snapshot.go:359-392

For the self-test diff, every regular file in every enabled adapter subtree is read fully into a map. The comment already acknowledges this is bounded by test fixtures and defers streaming compare. Fine for now, but worth flagging if the first production-scale encode triggers an OOM.

8. var _ = strconv.FormatUint in main_test.go

cmd/elastickv-snapshot-encode/main_test.go:362

This is an unusual pattern to keep an import alive. If strconv has no active use in the test file, the import and this line can be removed. If it's an intentional placeholder for future tests, a short comment explaining why would help.

9. sha256w = hashSHA256 type alias chain

internal/backup/encode_snapshot.go:432-438

The alias sha256w = hashSHA256 (where hashSHA256 is an interface) is confusing — two names for the same type, neither of which is hash.Hash. The comment says it avoids importing hash, but crypto/sha256 already transitively imports hash, so the import would add no size cost. Using hash.Hash directly would be clearer.


Strengths

  • Write-then-rename atomic publish is correctly implemented and well-tested (TestCLISelfTestFailureLeavesNoFsmAtOutputPath, TestEncodeSnapshotSelfTestDetectsCorruption).
  • corruptBufferForTest hook cleanly separates the test injection path from production code without build tags.
  • exclusionsRequiredFields intentionally excludes rename_s3_collisions, with a clear comment explaining why — good forward-compat discipline.
  • EncodeInfoSidecarPath path-derivation (not a static name) correctly avoids sibling collisions; both the library and CLI tests pin this.
  • HLC ceiling fail-closed (ErrSelfTestLowerLastCommitTS) is a typed error that maps cleanly to exit code 2 — matches the decoder's exit-code contract.
  • buildSelfTestDecodeOptions threads Exclusions.* and DynamoDBBundleJSONL through from the manifest — ensures the self-test's scratch tree uses the same decode flags as the original run.
    | Branch

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the elastickv-snapshot-encode command-line tool and library, which encodes vendor-independent directory trees into native .fsm files, generates an encode_info.json sidecar, and supports an optional round-trip self-test. The review feedback highlights several critical memory-efficiency improvements to prevent Out Of Memory (OOM) crashes on large production snapshots. Key recommendations include writing the .fsm to a temporary file on disk instead of buffering it in memory during the self-test, streaming the file comparison chunk-by-chunk rather than reading all file bytes into memory, and increasing the entropy of the temporary file suffix to avoid collisions.

Comment on lines +157 to +188
func encodeBuffered(b *snapshotBuilder, opts EncodeOptions, enabled []string, out io.Writer) (EncodeResult, error) {
var buf bytes.Buffer
bytesWritten, err := b.WriteTo(&buf)
if err != nil {
return EncodeResult{}, errors.WithStack(err)
}
bufBytes := buf.Bytes()
if opts.corruptBufferForTest != nil {
opts.corruptBufferForTest(bufBytes)
}

header, mismatchTxt, matched, stErr := runSelfTest(bufBytes, opts)
result := EncodeResult{
Header: header,
BytesWritten: bytesWritten,
SHA256: sha256.Sum256(bufBytes),
SelfTestRan: true,
SelfTestMatched: matched,
SelfTestMismatchTxt: mismatchTxt,
AdaptersEnabled: enabled,
}
if stErr != nil {
return result, stErr
}
if !matched {
return result, nil
}
if _, err := io.Copy(out, bytes.NewReader(bufBytes)); err != nil {
return result, errors.Wrap(err, "copy buffered fsm to out")
}
return result, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Buffering the entire .fsm file in memory (bytes.Buffer) during the self-test can lead to Out Of Memory (OOM) crashes on large production snapshots. Instead, write the .fsm directly to a temporary file on disk, compute the SHA256 hash in a single pass using the existing sha256Writer, and stream-read from the temporary file for the self-test and copying.

func encodeBuffered(b *snapshotBuilder, opts EncodeOptions, enabled []string, out io.Writer) (EncodeResult, error) {
	tempFile, err := os.CreateTemp(opts.SelfTestDecodeOptions.OutRoot, "encode-buffer-")
	if err != nil {
		return EncodeResult{}, errors.WithStack(err)
	}
	defer func() {
		_ = tempFile.Close()
		_ = os.Remove(tempFile.Name())
	}()

	hashWriter := newSHA256Writer(tempFile)
	bytesWritten, err := b.WriteTo(hashWriter)
	if err != nil {
		return EncodeResult{}, errors.WithStack(err)
	}

	if opts.corruptBufferForTest != nil {
		if _, err := tempFile.Seek(0, io.SeekStart); err != nil {
			return EncodeResult{}, errors.WithStack(err)
		}
		bufBytes, err := io.ReadAll(tempFile)
		if err != nil {
			return EncodeResult{}, errors.WithStack(err)
		}
		opts.corruptBufferForTest(bufBytes)
		if _, err := tempFile.WriteAt(bufBytes, 0); err != nil {
			return EncodeResult{}, errors.WithStack(err)
		}
	}

	header, mismatchTxt, matched, stErr := runSelfTest(tempFile, opts)
	result := EncodeResult{
		Header:              header,
		BytesWritten:        bytesWritten,
		SHA256:              hashWriter.Sum(),
		SelfTestRan:         true,
		SelfTestMatched:     matched,
		SelfTestMismatchTxt: mismatchTxt,
		AdaptersEnabled:     enabled,
	}
	if stErr != nil {
		return result, stErr
	}
	if !matched {
		return result, nil
	}
	if _, err := tempFile.Seek(0, io.SeekStart); err != nil {
		return result, errors.Wrap(err, "seek temp file for copy")
	}
	if _, err := io.Copy(out, tempFile); err != nil {
		return result, errors.Wrap(err, "copy buffered fsm to out")
	}
	return result, nil
}

Comment thread internal/backup/encode_snapshot.go Outdated
Comment on lines +240 to +275
func runSelfTest(fsmBytes []byte, opts EncodeOptions) (SnapshotHeader, []byte, bool, error) {
scratchBase := opts.SelfTestDecodeOptions.OutRoot
scratchDir, err := os.MkdirTemp(scratchBase, "encode-self-test-")
if err != nil {
return SnapshotHeader{}, nil, false, errors.Wrap(err, "mkdir scratch")
}
defer func() {
_ = os.RemoveAll(scratchDir)
}()

decOpts := opts.SelfTestDecodeOptions
decOpts.OutRoot = scratchDir

result, derr := DecodeSnapshot(bytes.NewReader(fsmBytes), decOpts)
if derr != nil {
// Decoder errored on our own output — that IS a self-test
// failure (the .fsm we produced isn't loadable). Surface as
// a mismatch with the decoder error embedded in the txt.
mismatchTxt := []byte("self-test failed: DecodeSnapshot rejected the produced .fsm: " + derr.Error())
return SnapshotHeader{}, mismatchTxt, false, nil
}

if result.Header.LastCommitTS != opts.LastCommitTS {
mismatchTxt := []byte(formatHeaderMismatch(opts.LastCommitTS, result.Header.LastCommitTS))
return result.Header, mismatchTxt, false, nil
}

diff, derr := diffAdapterTrees(opts.InputRoot, scratchDir, opts.Adapters)
if derr != nil {
return result.Header, nil, false, errors.Wrap(derr, "diff scratch tree")
}
if len(diff) > 0 {
return result.Header, []byte(strings.Join(diff, "\n") + "\n"), false, nil
}
return result.Header, nil, true, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update runSelfTest to accept an *os.File instead of a byte slice to support streaming the .fsm from disk during the self-test, avoiding loading the entire snapshot into memory.

func runSelfTest(fsmFile *os.File, opts EncodeOptions) (SnapshotHeader, []byte, bool, error) {
	scratchBase := opts.SelfTestDecodeOptions.OutRoot
	scratchDir, err := os.MkdirTemp(scratchBase, "encode-self-test-")
	if err != nil {
		return SnapshotHeader{}, nil, false, errors.Wrap(err, "mkdir scratch")
	}
	defer func() {
		_ = os.RemoveAll(scratchDir)
	}()

	decOpts := opts.SelfTestDecodeOptions
	decOpts.OutRoot = scratchDir

	if _, err := fsmFile.Seek(0, io.SeekStart); err != nil {
		return SnapshotHeader{}, nil, false, errors.Wrap(err, "seek temp file")
	}

	result, derr := DecodeSnapshot(fsmFile, decOpts)
	if derr != nil {
		// Decoder errored on our own output — that IS a self-test
		// failure (the .fsm we produced isn't loadable). Surface as
		// a mismatch with the decoder error embedded in the txt.
		mismatchTxt := []byte("self-test failed: DecodeSnapshot rejected the produced .fsm: " + derr.Error())
		return SnapshotHeader{}, mismatchTxt, false, nil
	}

	if result.Header.LastCommitTS != opts.LastCommitTS {
		mismatchTxt := []byte(formatHeaderMismatch(opts.LastCommitTS, result.Header.LastCommitTS))
		return result.Header, mismatchTxt, false, nil
	}

	diff, derr := diffAdapterTrees(opts.InputRoot, scratchDir, opts.Adapters)
	if derr != nil {
		return result.Header, nil, false, errors.Wrap(derr, "diff scratch tree")
	}
	if len(diff) > 0 {
		return result.Header, []byte(strings.Join(diff, "\n") + "\n"), false, nil
	}
	return result.Header, nil, true, nil
}

Comment on lines +318 to +392
func diffOneSubdir(aDir, bDir, relPrefix string) ([]string, error) {
aFiles, aErr := walkRegularFiles(aDir)
if aErr != nil && !errors.Is(aErr, os.ErrNotExist) {
return nil, errors.Wrapf(aErr, "walk input %s", aDir)
}
bFiles, bErr := walkRegularFiles(bDir)
if bErr != nil && !errors.Is(bErr, os.ErrNotExist) {
return nil, errors.Wrapf(bErr, "walk scratch %s", bDir)
}

var diffs []string
aMap := map[string][]byte{}
for path, body := range aFiles {
aMap[path] = body
}
for path, bBody := range bFiles {
aBody, ok := aMap[path]
if !ok {
diffs = append(diffs, relPrefix+"/"+path+" (missing in input)")
continue
}
if !bytes.Equal(aBody, bBody) {
diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)")
}
delete(aMap, path)
}
// Anything remaining in aMap is present in input but not in scratch.
remaining := make([]string, 0, len(aMap))
for path := range aMap {
remaining = append(remaining, relPrefix+"/"+path+" (missing in scratch)")
}
sort.Strings(remaining)
diffs = append(diffs, remaining...)
return diffs, nil
}

// walkRegularFiles returns a map of relative path -> file bytes for every
// regular file under root. Missing root is the empty map + os.ErrNotExist.
// Bounded by the per-adapter test fixtures the encoder runs against;
// production-scale dumps may want streaming compare, deferred until a
// real bottleneck shows up.
func walkRegularFiles(root string) (map[string][]byte, error) {
out := map[string][]byte{}
rootInfo, err := os.Stat(root)
if err != nil {
return nil, errors.WithStack(err)
}
if !rootInfo.IsDir() {
return nil, errors.Errorf("not a directory: %s", root)
}
if err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
if err != nil {
return err
}
if d.IsDir() {
return nil
}
if !d.Type().IsRegular() {
return nil
}
body, rerr := os.ReadFile(path) //nolint:gosec // walking a caller-provided root, regular files only
if rerr != nil {
return errors.Wrap(rerr, path)
}
rel, rerr := filepath.Rel(root, path)
if rerr != nil {
return errors.WithStack(rerr)
}
out[filepath.ToSlash(rel)] = body
return nil
}); err != nil {
return nil, errors.WithStack(err)
}
return out, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Loading the entire contents of all files in both the input and scratch directories into memory (walkRegularFiles reading all file bytes via os.ReadFile) will cause OOM crashes on large backups (e.g., with large S3 blobs). Instead, walk the directories to collect file sizes, and perform a streaming chunk-by-chunk comparison for files present in both directories.

func diffOneSubdir(aDir, bDir, relPrefix string) ([]string, error) {
	aFiles, aErr := walkRegularFiles(aDir)
	if aErr != nil && !errors.Is(aErr, os.ErrNotExist) {
		return nil, errors.Wrapf(aErr, "walk input %s", aDir)
	}
	bFiles, bErr := walkRegularFiles(bDir)
	if bErr != nil && !errors.Is(bErr, os.ErrNotExist) {
		return nil, errors.Wrapf(bErr, "walk scratch %s", bDir)
	}

	var diffs []string
	for path, aSize := range aFiles {
		bSize, ok := bFiles[path]
		if !ok {
			diffs = append(diffs, relPrefix+"/"+path+" (missing in scratch)")
			continue
		}
		if aSize != bSize {
			diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)")
			continue
		}
		equal, err := compareFiles(filepath.Join(aDir, path), filepath.Join(bDir, path))
		if err != nil {
			return nil, err
		}
		if !equal {
			diffs = append(diffs, relPrefix+"/"+path+" (bytes differ)")
		}
	}
	for path := range bFiles {
		if _, ok := aFiles[path]; !ok {
			diffs = append(diffs, relPrefix+"/"+path+" (missing in input)")
		}
	}
	sort.Strings(diffs)
	return diffs, nil
}

func walkRegularFiles(root string) (map[string]int64, error) {
	out := map[string]int64{}
	rootInfo, err := os.Stat(root)
	if err != nil {
		return nil, errors.WithStack(err)
	}
	if !rootInfo.IsDir() {
		return nil, errors.Errorf("not a directory: %s", root)
	}
	if err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		if !d.Type().IsRegular() {
			return nil
		}
		info, rerr := d.Info()
		if rerr != nil {
			return errors.Wrap(rerr, path)
		}
		rel, rerr := filepath.Rel(root, path)
		if rerr != nil {
			return errors.WithStack(rerr)
		}
		out[filepath.ToSlash(rel)] = info.Size()
		return nil
	}); err != nil {
		return nil, errors.WithStack(err)
	}
	return out, nil
}

func compareFiles(pathA, pathB string) (bool, error) {
	fA, err := os.Open(pathA)
	if err != nil {
		return false, errors.WithStack(err)
	}
	defer fA.Close()

	fB, err := os.Open(pathB)
	if err != nil {
		return false, errors.WithStack(err)
	}
	defer fB.Close()

	bufA := make([]byte, 64*1024)
	bufB := make([]byte, 64*1024)
	for {
		nA, errA := io.ReadFull(fA, bufA)
		nB, errB := io.ReadFull(fB, bufB)

		if errA == io.EOF || errA == io.ErrUnexpectedEOF {
			errA = nil
		}
		if errB == io.EOF || errB == io.ErrUnexpectedEOF {
			errB = nil
		}

		if errA != nil {
			return false, errors.WithStack(errA)
		}
		if errB != nil {
			return false, errors.WithStack(errB)
		}

		if nA != nB {
			return false, nil
		}
		if nA == 0 {
			break
		}
		if !bytes.Equal(bufA[:nA], bufB[:nB]) {
			return false, nil
		}
	}
	return true, nil
}

Comment on lines +47 to +52
// tempSuffixHexLen is the hex-character length of the random
// suffix appended to <output>.tmp-<hex>; 8 hex chars = 4 bytes of
// entropy, which gives ~4×10^9 collision space per --output path
// (more than enough for concurrent encodes against the same path).
tempSuffixHexLen = 8
tempSuffixByteLen = tempSuffixHexLen / 2
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using only 4 bytes of entropy (8 hex characters) for the temporary file suffix has a relatively high probability of collision (due to the Birthday Paradox) in highly concurrent environments or automated test pipelines. Increasing the suffix length to 16 hex characters (8 bytes of entropy) provides a much safer collision-free space.

Suggested change
// tempSuffixHexLen is the hex-character length of the random
// suffix appended to <output>.tmp-<hex>; 8 hex chars = 4 bytes of
// entropy, which gives ~4×10^9 collision space per --output path
// (more than enough for concurrent encodes against the same path).
tempSuffixHexLen = 8
tempSuffixByteLen = tempSuffixHexLen / 2
// tempSuffixHexLen is the hex-character length of the random
// suffix appended to <output>.tmp-<hex>; 16 hex chars = 8 bytes of
// entropy, which gives a collision-free space per --output path.
tempSuffixHexLen = 16
tempSuffixByteLen = tempSuffixHexLen / 2

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1fa691fb74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return backup.AdapterSet{}, errors.Errorf("unknown adapter %q", name)
}
}
return set, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject empty adapter selections

When --adapter contains only separators or whitespace (for example --adapter ' ,', or a templated value that expands to spaces), every token hits the empty case and this returns a zero AdapterSet with no error. The rest of the encoder then invokes no adapter encoders, publishes a valid header-only .fsm, and exits successfully, which can turn a bad CLI argument into an apparently successful but empty restore artifact; the decoder-side parser already rejects this zero-adapter case.

Useful? React with 👍 / 👎.

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
Comment on lines +161 to +162
if _, err := fmt.Sscanf(s, "%d", &ts); err != nil {
return 0, errors.Wrap(err, "--last-commit-ts decimal parse")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject trailing junk in timestamps

For decimal --last-commit-ts values, fmt.Sscanf with %d accepts a numeric prefix and ignores trailing bytes, so an input like --last-commit-ts 100oops is silently treated as 100. Because this flag controls the restored HLC ceiling and sidecar metadata, malformed operator input should fail instead of publishing a snapshot with a different timestamp than the argument appears to specify.

Useful? React with 👍 / 👎.

Six substantive review findings on the M6 implementation PR. Headline
fix: switched the self-test buffer from in-memory bytes.Buffer to a
disk-backed temp file so large snapshots cannot OOM the encoder.

## Gemini high #1: SelfTest=true was buffering FSM in memory

Out Of Memory risk on production-scale snapshots (gemini high #904).
Fixed by writing the FSM through a tee (sha256 + os.File.WriteAt) into
an os.CreateTemp file under opts.SelfTestDecodeOptions.OutRoot, then
Seek(0) and stream through DecodeSnapshot for the self-test. The temp
file is os.Remove'd via defer on every exit path. The contract is
unchanged: corruption reaches the self-test decode but never reaches
the io.Writer (TestEncodeSnapshotSelfTestDetectsCorruption still asserts
out.Len()==0).

Memory cost: O(1) for the SelfTest=true path now (previously was an
FSM-sized bytes.Buffer on top of the sort working set).

## Gemini high #2: runSelfTest signature

Changed from runSelfTest(fsmBytes []byte, ...) to
runSelfTest(fsmFile io.Reader, ...) so the .fsm streams through
DecodeSnapshot instead of being held as []byte. Same OOM fix at the
runSelfTest layer.

## Gemini high #3: walkRegularFiles loaded every file's bytes into memory

For large S3 blobs this OOMs the encoder. Replaced walkRegularFiles
with walkRegularFilePaths (paths only) and streamFilesEqual (chunk
compare at streamCmpBufSize = 64 KiB). Per-file compare cost is now
O(streamCmpBufSize), not O(file size).

## Gemini medium: tempSuffixHexLen entropy was 4 bytes

Birthday-paradox collision risk in highly concurrent CI environments.
Bumped from 8 hex / 4 bytes (2^32 space) to 16 hex / 8 bytes (2^64
space). Same width crypto/rand uses for cryptographic nonces.

## Claude high + codex P2: fmt.Sscanf silently accepts partial parses

--last-commit-ts 0xffZZ parsed as 0xff (fmt.Sscanf stops at first
non-hex). --last-commit-ts 100oops parsed as 100. This means a
typo could silently set the snapshot HLC ceiling to something the
operator did not type. Replaced fmt.Sscanf with strconv.ParseUint
(base 10 / 16) which rejects trailing garbage as a parse error.
New TestParseLastCommitTS cases cover: 0xffZZ, 100oops, " 100 ext",
"-1".

## Codex P2: empty adapter selection produced an empty AdapterSet

--adapter ' ,' (e.g. a templated argument that expanded to spaces)
would yield a zero AdapterSet and the encoder would publish a valid
header-only .fsm with no adapter records — a silently-empty restore
artifact. Now rejected with a clear error. Pinned by
TestParseAdapterSetRejectsEmptySelection covering " ,", ",,,",
"   ", and "," — plus a positive case (single "s3") asserting the
guard does not break legitimate single-adapter selections.

## Caller audit per CLAUDE.md semantic-change rule

- runSelfTest signature changed from ([]byte, opts) to (io.Reader, opts).
  Sole caller is encodeBuffered (same package, just updated). No
  external callers since the function is unexported.
- corruptBufferForTest hook type changed from func([]byte) to
  func(*os.File). Sole caller is the same-package test
  TestEncodeSnapshotSelfTestDetectsCorruption (updated to ReadAt/
  WriteAt the temp file). External callers cannot set this field
  (lowercase).
- streamFilesEqual replaces the bytes.Equal compare in diffOneSubdir.
  Same return contract (bool, error) so the caller (diffOneSubdir)
  is unchanged in behavior.
- parseAdapterSet now returns an error for empty-selection inputs.
  Caller is parseFlags which already propagates the error to exit-1.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

v2 (b131186): addresses all 6 review findings on the implementation PR.

Gemini high #1 (OOM on large snapshots): Self-test buffer is now an on-disk os.CreateTemp file (NOT *bytes.Buffer). FSM streams through tee → sha256 + temp file; self-test decodes from the temp file directly. Memory cost dropped from O(FSM size) to O(1). Pinned by existing TestEncodeSnapshotSelfTestDetectsCorruption (corrupt-via-WriteAt, never reaches out).

Gemini high #2 (runSelfTest signature): Changed from fsmBytes []byte to fsmFile io.Reader, completing the streaming chain.

Gemini high #3 (walkRegularFiles OOM): Replaced bytes-into-memory with walkRegularFilePaths + streamFilesEqual (64 KiB chunk compare). Per-file cost is now O(chunk size), not O(file size).

Gemini medium (tempSuffixHexLen entropy): Bumped from 8 hex / 4 bytes (2^32) to 16 hex / 8 bytes (2^64), same width crypto/rand uses for cryptographic nonces.

Claude high + codex P2 (fmt.Sscanf partial parses): Replaced with strconv.ParseUint strict parsing. Trailing garbage like 0xffZZ or 100oops now surfaces as a flag-parse error. TestParseLastCommitTS extended to cover both.

Codex P2 (empty adapter selection): --adapter ' ,' etc. now rejected. Pinned by TestParseAdapterSetRejectsEmptySelection. Without the guard a templated arg expanding to spaces would silently publish a header-only .fsm.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 3m 41s —— View job


Code Review - PR #904 (v2)

Tasks

  • Gather context and understand the request
  • Read changed files (v2 commit b131186)
  • Review library code (internal/backup/)
  • Review CLI code (cmd/elastickv-snapshot-encode/)
  • Review tests
  • Post final review

v2 Verification — All 6 Required Fixes Confirmed

Finding Status
Gemini high #1: OOM — self-test buffer in *bytes.Buffer encodeBuffered now uses os.CreateTemp, tees through teeWriter to hash+file, self-test streams from *os.File
Gemini high #2: runSelfTest(fsmBytes []byte) → streaming ✅ Signature is now runSelfTest(fsmFile io.Reader, opts EncodeOptions)
Gemini high #3: walkRegularFiles loading all bytes into memory walkRegularFilePaths returns path→abspath map; streamFilesEqual does 64 KiB chunk compare via streamReadersEqual
Gemini medium: tempSuffixHexLen low entropy (4 bytes) ✅ Bumped to 16 hex chars / 8 bytes (2^64 space)
Claude high + codex P2: fmt.Sscanf partial parses parseLastCommitTS uses strconv.ParseUint with strict rejection; TestParseLastCommitTS covers "0xffZZ" and "100oops"
Codex P2: empty adapter selection silently publishes header-only .fsm parseAdapterSet rejects zero-adapter result; TestParseAdapterSetRejectsEmptySelection covers " ,", ",,,"

One New Issue Introduced by v2

Stale doc comments still say *bytes.Buffer after the disk-backed rewrite

internal/backup/encode_snapshot.go:52-57 (the SelfTest field comment):

// SelfTest enables the round-trip self-test. EncodeSnapshot buffers
// the FSM in *bytes.Buffer, decodes from the buffer, and copies to
// the caller's io.Writer only if the buffer survives DecodeSnapshot

internal/backup/encode_snapshot.go:113-116 (the EncodeSnapshot function doc):

// When opts.SelfTest=true an internal *bytes.Buffer holds the
// FSM during the self-test; bytes are copied to out only after the
// self-test matches. Memory cost in self-test mode is one FSM-sized
// allocation on top of the existing sort working set.

Both still describe the v1 in-memory approach. The actual v2 implementation writes to an os.CreateTemp file (encode-self-test-fsm-*) and the memory cost is O(1) (not O(FSM size)). A future reader looking at these comments to understand the memory model will get the wrong answer.

Fix this →


Carry-over Observations (non-blocking, noted in v1)

These were marked "no fix required" in the v1 review and remain unchanged:

  • EncodeResult.SHA256 comment (encode_snapshot.go:91) still says "lowercase hex via SHA256Hex" — no such function exists; the hex encoding happens in the CLI's writeSidecar.
  • diffOneSubdir non-determinism — the diffs slice for "bytes differ" and "missing in input" entries is built in map-iteration order; only remaining is sorted. mismatch.txt output can vary across runs with identical inputs.
  • EncodeSnapshot returns (result, nil) on self-test mismatch — callers must check result.SelfTestMatched AND err != nil; callers who only check the error will silently pass through a mismatch. The godoc for EncodeSnapshot doesn't call out this dual-check contract.
  • sha256w = hashSHA256 alias chain — two interface names for the same narrow surface; hash.Hash (already transitively imported) would be clearer in sha256Writer.h.
  • uitoa/itoa hand-rolledstrconv.FormatUint/strconv.Itoa are exact equivalents.

Strengths in v2

The disk-backed self-test path is cleanly implemented:

  • teeWriter correctly hashes bytes AS they stream to the temp file — EncodeResult.SHA256 reflects the encoder's actual output, not any test corruption.
  • The corruption hook fires AFTER Sync() but BEFORE the seek-to-0, so the self-test decoder always reads the (possibly corrupted) on-disk bytes — the write-then-rename invariant is correctly preserved.
  • streamReadersEqual correctly short-circuits on aSize != bSize before opening files, and the io.ReadFull loop handles the last-partial-chunk case (io.ErrUnexpectedEOF) before checking aErr != nil.
  • TestEncodeSnapshotSelfTestDetectsCorruption correctly asserts out.Len()==0 on mismatch — the critical publish-nothing invariant.
  • TestParseAdapterSetRejectsEmptySelection covers the " ,", ",", ",,," cases that would otherwise produce a zero AdapterSet.

Verdict: Ready to merge once the two stale *bytes.Buffer comments are updated. All 6 previously required fixes are correctly implemented and tested.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b13118638f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +130 to +131
b := newSnapshotBuilder(opts.LastCommitTS)
enabled, err := runAdapterEncoders(b, opts)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Enforce the manifest timestamp floor in EncodeSnapshot

When EncodeSnapshot is called directly (the new library entrypoint the design exposes for future callers), a caller can pass LastCommitTS lower than InputRoot/MANIFEST.json:last_commit_ts; this path immediately builds with that value and never reads the manifest or returns the advertised ErrSelfTestLowerLastCommitTS. The CLI pre-check catches its own invocations, but in-process callers can still emit an .fsm whose header and MVCC key timestamps lower the restored HLC ceiling, violating the fail-closed invariant this wrapper is meant to enforce.

Useful? React with 👍 / 👎.

…over cleanups)

Mandatory (claude v2):
- Stale *bytes.Buffer comments in SelfTest field doc and EncodeSnapshot
  godoc updated to describe the v2 on-disk temp file model with O(1)
  memory cost (encode-self-test-fsm-* under SelfTestDecodeOptions.OutRoot).

Non-blocking carry-over items from v1 + v2 (folded in since trivial):
- EncodeResult.SHA256 comment no longer references a non-existent
  SHA256Hex function; says "raw 32-byte digest; CLI hex-encodes via
  encoding/hex in writeSidecar".
- EncodeSnapshot godoc now spells out the dual-check contract
  ("Callers MUST check result.SelfTestMatched before treating a nil
  error as success") since self-test mismatch returns (result, nil).
- diffOneSubdir output is now fully sorted (was: only the "remaining"
  segment sorted; map-iteration order leaked into the rest). mismatch.txt
  is deterministic across runs with identical inputs.
- Replaced hand-rolled uitoa/itoa with strconv.FormatUint and
  strconv.Itoa. Dropped the sha256w / hashSHA256 alias chain; the
  sha256Writer.h field now uses hash.Hash directly (already imported).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

v3 (b8da884): addresses claude v2 review.

Mandatory (the v2 blocker): Two stale *bytes.Buffer doc comments updated to describe the on-disk temp file model (encode-self-test-fsm-*) with O(1) memory cost.

Non-blocking carry-overs folded in since trivial:

  • EncodeResult.SHA256 comment no longer references non-existent SHA256Hex (now: "raw 32-byte digest; CLI hex-encodes in writeSidecar")
  • EncodeSnapshot godoc spells out the dual-check contract for self-test mismatch (returns (result, nil) — callers MUST check SelfTestMatched)
  • diffOneSubdir fully sorts the output (was: only "remaining" segment sorted); mismatch.txt is now deterministic
  • Hand-rolled uitoa/itoa replaced with strconv.FormatUint / strconv.Itoa
  • Dropped the sha256w / hashSHA256 alias chain; sha256Writer.h now uses hash.Hash directly

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 3m 33s —— View job


Code Review - PR #904 (v3)

Tasks

  • Gather context and understand the request
  • Read changed files (v3 commit b8da884)
  • Verify all v2 carry-over fixes are implemented correctly
  • Check for any new issues introduced in v3
  • Post final review

v3 Verification — All v2 Carry-overs Confirmed

Finding Status
Mandatory v2 fix: stale *bytes.Buffer doc in SelfTest field comment ✅ Now says "on-disk temp file under SelfTestDecodeOptions.OutRoot (encode-self-test-fsm-*)" and "O(1) memory cost"
Mandatory v2 fix: stale *bytes.Buffer doc in EncodeSnapshot func comment ✅ Now describes disk-based approach; explicitly credits gemini high #904 OOM fix
EncodeResult.SHA256 phantom SHA256Hex reference ✅ Now: "raw 32-byte digest; the CLI hex-encodes it via encoding/hex when writing ENCODE_INFO.json"
EncodeSnapshot dual-check contract undocumented ✅ Lines 127-131 now explicitly document: "Self-test failure returns (result, nil)… Callers MUST check result.SelfTestMatched before treating a nil error as success"
diffOneSubdir non-deterministic output sort.Strings(diffs) on line 423 now applies to ALL diff entries (missing-in-input + bytes-differ + missing-in-scratch)
Hand-rolled uitoa/itoa formatHeaderMismatch uses strconv.FormatUint; truncation message uses strconv.Itoa
sha256w/hashSHA256 alias chain ✅ Both sha256Writer.h (line 538) and teeWriter.h (line 244) now typed as hash.Hash directly; "hash" imported explicitly

One New Doc Bug in v3

ErrSelfTestLowerLastCommitTS and EncodeSnapshot both claim the HLC floor check lives in the library — it doesn't

internal/backup/encode_snapshot.go:17-25 (the var comment):

// ErrSelfTestLowerLastCommitTS is returned by EncodeSnapshot when the
// effective LastCommitTS in EncodeOptions is below the manifest's value.
// ...
// Surfaced at the EncodeSnapshot layer so the CLI's main exits with code 2

internal/backup/encode_snapshot.go:133-136 (the func comment):

// Returns ErrSelfTestLowerLastCommitTS when opts.LastCommitTS is below
// the manifest's value — caller is responsible for reading the manifest
// and computing the effective T (this layer just validates the floor).

Both are inaccurate. EncodeSnapshot never reads MANIFEST.json, never compares opts.LastCommitTS to the manifest value, and never returns ErrSelfTestLowerLastCommitTS. That check is performed exclusively by the CLI's resolveLastCommitTS before EncodeSnapshot is ever called:

// cmd/elastickv-snapshot-encode/main.go:349-354
func resolveLastCommitTS(cfg *config, manifestTS uint64) (uint64, bool, error) {
    if cfg.lastCommitTS < manifestTS {
        return 0, false, errors.Wrapf(backup.ErrSelfTestLowerLastCommitTS, ...)
    }
    ...
}

The functional behaviour is correct — the HLC floor IS enforced before any bytes are written. But a future library caller reading the EncodeSnapshot godoc would trust that calling EncodeSnapshot with a low LastCommitTS returns ErrSelfTestLowerLastCommitTS, and be surprised when it doesn't (the call silently succeeds and stamps the low timestamp into the .fsm).

The fix is straightforward: correct both comments to say the error is a sentinel for the caller to return after its own manifest comparison, not something EncodeSnapshot itself enforces.

Fix this →


Observations (non-blocking)

sha256Writer and teeWriter are structurally identical

Both are {w io.Writer; h hash.Hash} with identical Write bodies. teeWriter is used in encodeBuffered, sha256Writer in encodeStream. They could be unified into one type — sha256Writer already has the Sum() [32]byte convenience method; adding it would cover both paths. Not a blocker, but the duplication will grow if a third path is added.


Strengths in v3

  • diffOneSubdir's sort.Strings(diffs) at the end is now truly a full-fence sort: the loop first drains bPaths (collecting missing-in-input and bytes-differ entries), then drains remaining aPaths (collecting missing-in-scratch), then sorts the combined slice. Determinism is correct regardless of map-iteration order.
  • teeWriter.h and sha256Writer.h both correctly typed as hash.Hash, which removes the confusing sha256w/hashSHA256 indirection without adding any import cost ("hash" is already in-tree via crypto/sha256).
  • strconv.FormatUint / strconv.Itoa are drop-in replacements for the hand-rolled converters with no semantic change — the behaviour is identical and a reader no longer has to audit them.
  • The EncodeSnapshot dual-check warning ("library callers should follow the same pattern") is now in the right place: immediately after documenting the (result, nil) return on mismatch, not buried in a separate godoc block.

Verdict: One doc fix needed before merge. All functional behaviour is correct; all v2 findings are addressed. The doc bug (EncodeSnapshot + ErrSelfTestLowerLastCommitTS incorrectly attributed to the library layer) is the only remaining item.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b8da884f9d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
// encodeToTempFile creates tempPath, runs EncodeSnapshot into it,
// fsync+close. Caller is responsible for the os.Remove cleanup on error.
func encodeToTempFile(tempPath string, encodeOpts backup.EncodeOptions) (backup.EncodeResult, error) {
tempFile, err := os.Create(tempPath) //nolint:gosec // operator-supplied path
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Create snapshot artifacts owner-only

When the encoder runs with a typical umask like 022, os.Create creates the temporary FSM as mode 0666 masked to 0644, and that same file becomes the published <output> after the rename. Because the .fsm contains the full KV dataset, this exposes backup contents to other local users on multi-user backup hosts while the encode is running and after publish; the decoded adapter writers in internal/backup/open_nofollow_* use 0600 for this reason. Please create the temp output with an explicit owner-only mode before renaming it.

Useful? React with 👍 / 👎.

… unify writer types

Mandatory (claude v3):
ErrSelfTestLowerLastCommitTS var comment and EncodeSnapshot godoc both
claimed the HLC ceiling check is enforced in the library. It is NOT.
The check lives in the CLI's resolveLastCommitTS BEFORE EncodeSnapshot
is called. A future library caller (Phase 1 live extractor, integration
test) reading the godoc would assume EncodeSnapshot guards against a
low timestamp and be surprised when it silently stamps the low value
into the .fsm.

Both comments updated to say:
- The error is the sentinel callers should return after their OWN
  manifest comparison.
- EncodeSnapshot does NOT read MANIFEST.json or validate
  opts.LastCommitTS against any external floor.

Non-blocking observation (claude v3):
sha256Writer and teeWriter were structurally identical (both
{w io.Writer; h hash.Hash} with the same Write body). teeWriter
removed; encodeBuffered now uses newSHA256Writer for the temp-file
tee. One less type to audit.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

v4 (a357279): addresses claude v3.

Mandatory: ErrSelfTestLowerLastCommitTS var comment + EncodeSnapshot godoc claimed the HLC ceiling check is enforced in the library — it isn't (the check lives in the CLI's resolveLastCommitTS BEFORE EncodeSnapshot is called). Both docs corrected to say the error is the sentinel callers should return after their OWN manifest comparison, and EncodeSnapshot does NOT read MANIFEST.json or validate against any external floor.

Non-blocking observation: Unified the duplicated sha256Writer + teeWriter types. Both had identical {w io.Writer; h hash.Hash} structure and Write bodies. teeWriter removed; encodeBuffered now uses newSHA256Writer for the temp-file tee. One less type to audit.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 2m 35s —— View job


Code Review - PR #904 (v4)

Tasks

  • Gather context and understand the request
  • Read changed files (v4 commit a357279)
  • Verify mandatory v3 fix: ErrSelfTestLowerLastCommitTS + EncodeSnapshot doc corrections
  • Verify non-blocking fix: teeWriter removed, encodeBuffered uses newSHA256Writer
  • Check for new issues introduced in v4
  • Post final review

v4 Verification — All v3 Items Confirmed

Finding Status
Mandatory v3 fix: ErrSelfTestLowerLastCommitTS var comment ✅ Changed from "is returned by EncodeSnapshot when the effective LastCommitTS is below the manifest's value" to "is the sentinel callers should return when…"; added explicit statement that EncodeSnapshot does NOT read MANIFEST.json or enforce this floor
Mandatory v3 fix: EncodeSnapshot func comment ✅ Changed from "Returns ErrSelfTestLowerLastCommitTS when opts.LastCommitTS is below the manifest's value" to "EncodeSnapshot does NOT read MANIFEST.json and does NOT validate opts.LastCommitTS against any external floor"; correctly identifies resolveLastCommitTS (CLI only) as the check site
Non-blocking: teeWriter removed ✅ Struct + methods fully deleted; encodeBuffered now calls newSHA256Writer(tempFile) and hashTee.Sum() — semantically identical to the removed var sha [32]byte; copy(sha[:], hasher.Sum(nil)) pattern, one less type to audit

One Remaining Issue — encodeInfoFilePerm is dead code; sidecar created 0644

cmd/elastickv-snapshot-encode/main.go:56:

encodeInfoFilePerm = 0o600

This constant is defined but never used. Both file-creation sites still call os.Create, which ignores the permission argument:

// encodeToTempFile (line 323) — temp FSM, renamed to <output>
tempFile, err := os.Create(tempPath)

// writeSidecar (line 407) — ENCODE_INFO.json
f, err := os.Create(sidecarPath)

os.Create opens with mode 0666 and relies on the process umask. With a typical umask of 022 this gives 0644 — the full KV dataset is world-readable on a multi-user backup host while encoding and after publication. mismatchTxtPerm = 0o600 IS wired to os.WriteFile for mismatch.txt, but encodeInfoFilePerm is silently a no-op.

The Codex v3 review flagged the temp FSM side of this as P2. The dead constant shows the sidecar side was intended to be fixed too but wasn't applied.

Fix both call sites:

// encodeToTempFile
tempFile, err := os.OpenFile(tempPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o600)

// writeSidecar — now encodeInfoFilePerm is no longer dead
f, err := os.OpenFile(sidecarPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, encodeInfoFilePerm)

Fix this →


Strengths in v4

  • The ErrSelfTestLowerLastCommitTS doc correction is precise: the rewrite distinguishes between "what this sentinel means" (caller-defined HLC regression) and "who enforces it" (the CLI's resolveLastCommitTS, not EncodeSnapshot). Future library callers reading the godoc now get an accurate contract.
  • Unifying teeWriter into sha256Writer eliminates the only meaningful duplication left in the encoder. The hashTee.Sum() call is cleaner than the previous var sha [32]byte; copy(sha[:], hasher.Sum(nil)) pattern and leaves a single canonical tee-with-hash type in the package.
  • encodeBuffered is now visibly simpler: three lines (newSHA256Writer, WriteTo, Sum) versus five with the old hasher + teeWriter split.

Verdict: One fix before merge: os.Createos.OpenFile at both file-creation sites so the FSM and sidecar are 0600 rather than 0644, and encodeInfoFilePerm becomes live code. All v3 findings are correctly addressed and the codebase is otherwise clean.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3572798b7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

b := newSnapshotBuilder(opts.LastCommitTS)
enabled, err := runAdapterEncoders(b, opts)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject zero-adapter library encodes

When an in-process caller passes the zero AdapterSet (for example, a future Phase 1 caller forgets to thread adapter selection), runAdapterEncoders returns an empty list with no error and the subsequent write emits a valid header-only .fsm, silently discarding any adapter subtrees under InputRoot. The CLI parser now rejects empty selections, but this public library entrypoint still has the same fail-open behavior for direct callers, so it should fail before writing when no adapters are enabled.

Useful? React with 👍 / 👎.

Claude v4 mandatory finding: encodeInfoFilePerm = 0o600 was declared
but dead code. Both file-creation sites (encodeToTempFile and
writeSidecar) used os.Create, which honors umask and typically gives
0644. On a multi-user backup host, the full KV dataset (.fsm) and the
sidecar (which carries source path, cluster_id, and SHA-256) would
have been world-readable.

Replaced both os.Create calls with os.OpenFile + explicit
encodeInfoFilePerm:
- encodeToTempFile: temp .fsm is now 0o600 from creation, so the
  on-disk dataset is never world-readable during the encode-then-
  rename window.
- writeSidecar: ENCODE_INFO.json gets the same 0o600.

New TestCLIPublishesFsmAndSidecarMode0600 asserts both files have no
group/other access bits (`perm & 0o077 == 0`) after a successful run.
Skips on Windows where Unix-style mode bits aren't meaningful.

Codex v3 had also flagged the temp FSM side at P2 in passing; that's
now addressed alongside the sidecar.

Caller audit: both call sites updated in this same file; no other
os.Create calls remain in the encoder CLI. The `0o600` constant is now
live code.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

v5 (0c223d2): addresses claude v4 mandatory.

Mandatory (claude v4): encodeInfoFilePerm = 0o600 was dead code; both os.Create call sites (encodeToTempFile + writeSidecar) relied on umask and gave 0644 in practice. On a multi-user backup host the .fsm + sidecar would have been world-readable.

Both replaced with os.OpenFile + encodeInfoFilePerm. The constant is now live code. New TestCLIPublishesFsmAndSidecarMode0600 asserts perm & 0o077 == 0 on both files. Skips on Windows.

Caller audit: both sites updated; no other os.Create calls remain in the encoder CLI.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 2m 29s —— View job


Code Review - PR #904 (v5)

Tasks

  • Gather context and understand the request
  • Read changed files (v5 commit 0c223d2)
  • Verify mandatory v4 fix: os.Createos.OpenFile at both sites with 0o600
  • Verify new test TestCLIPublishesFsmAndSidecarMode0600
  • Check for any new issues introduced in v5
  • Post final review

v5 Verification — Mandatory v4 Fix Confirmed

Finding Status
Mandatory v4 fix: encodeToTempFile — temp FSM world-readable os.OpenFile(tempPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, encodeInfoFilePerm) at main.go:325
Mandatory v4 fix: writeSidecar — sidecar world-readable os.OpenFile(sidecarPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, encodeInfoFilePerm) at main.go:412
encodeInfoFilePerm = 0o600 is now live code ✅ Used at both os.OpenFile call sites; no longer dead
New test TestCLIPublishesFsmAndSidecarMode0600 ✅ Skips on Windows; asserts perm & 0o077 == 0 for both out (the .fsm) and out + ".encode_info.json"

Permission chain is correct end-to-end: the temp file is created 0o600os.Rename preserves permissions on POSIX → the published .fsm arrives at cfg.outputPath with 0o600. The test correctly checks the final path, not the transient temp path.

The library-level self-test buffer (encode_snapshot.go:194) uses os.CreateTemp, which creates files mode 0600 by default on Unix — no issue there.

No remaining os.Create calls in either encoder file that would allow umask to widen permissions.


No New Issues Introduced in v5

The change is minimal and scoped to exactly what v4 flagged: two os.Create call sites replaced with os.OpenFile and one new test.


Carry-over Observation (pre-existing, non-blocking)

EncodeSnapshot does not guard against zero AdapterSet at the library level

internal/backup/encode_snapshot.go:144 — when called directly with opts.Adapters == AdapterSet{}, runAdapterEncoders returns an empty enabled slice with no error, producing a valid header-only .fsm. The CLI parser (parseAdapterSet) guards against this, but a future in-process caller who forgets to thread adapter selection would get a silent empty-restore artifact. The codex v5 review also flagged this (P2). Not introduced by v5; not a blocker for merge, but worth tracking.


Strengths in v5

  • encodeInfoFilePerm is now the single source of truth for file permissions across both the FSM and sidecar — if the permission is ever revisited, one constant update covers both sites.
  • The Windows skip in TestCLIPublishesFsmAndSidecarMode0600 uses runtime.GOOS == "windows" (evaluated once at package init via var isWindows), which is cleaner than an in-test runtime.GOOS call inside the table.
  • The test assertion perm&0o077 != 0 is deliberately loose: it accepts 0o400 or 0o600 (e.g., a filesystem that strips the owner-write bit) while still rejecting anything with group or other bits.

Verdict: Ready to merge. The mandatory v4 fix is correctly implemented and pinned by the new test. No new issues in v5.
| Branch

… v5 + codex v5)

Carry-over flagged in both v5 reviews: EncodeSnapshot at the library
layer accepted opts.Adapters == AdapterSet{} and produced a valid
header-only .fsm with no adapter records. The CLI's parseAdapterSet
already rejects this for flag-driven entry, but a future in-process
caller (Phase 1 live extractor, integration test) that forgets to
thread the adapter set would get a silently empty restore artifact.

Fail-closed guard added at the top of EncodeSnapshot alongside the
existing InputRoot / out-nil checks. Pinned by
TestEncodeSnapshotRejectsZeroAdapterSet which also asserts no bytes
are written to out on rejection.

Claude v5 explicitly said the existing fix was "Ready to merge" and
this was non-blocking; folding in for completeness since the guard
is a 4-line addition and the test is independent.

Caller audit per CLAUDE.md semantic-change rule: EncodeSnapshot's
new pre-condition narrows the accepted input domain. All existing
callers either (a) pass through the CLI's parseAdapterSet (which
already rejects empty selections) or (b) are same-package tests
that pass an explicit AdapterSet{...} literal. No legitimate caller
is impacted.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 1, 2026

v6 (b56c0e6): folded in the carry-over non-blocking guard.

Claude v5 already declared "Ready to merge" but flagged the library-level zero-AdapterSet gap (also raised by codex v5). Added the guard at the top of EncodeSnapshot alongside the existing InputRoot / out-nil checks. Pinned by TestEncodeSnapshotRejectsZeroAdapterSet (asserts error AND no bytes written to out).

Caller audit: the new pre-condition narrows accepted input. All existing callers pass through the CLI's parseAdapterSet (already rejects empty) or are same-package tests with explicit literals. No legitimate caller impacted.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @bootjp's task in 1m 30s —— View job


Code Review - PR #904 (v6)

Tasks

  • Gather context and understand the request
  • Read v6 changes (internal/backup/encode_snapshot.go, encode_snapshot_test.go)
  • Verify zero-AdapterSet guard implementation
  • Verify TestEncodeSnapshotRejectsZeroAdapterSet correctness
  • Check for any new issues introduced in v6
  • Post final review

v6 Verification — Zero-AdapterSet Guard Confirmed

The change is exactly scoped to what was described: 10 lines to encode_snapshot.go (the guard) and 22 lines to encode_snapshot_test.go (the test). No other files touched.

Item Status
Guard placed before newSnapshotBuilder (no bytes written before rejection) ✅ Lines 151-160, after out == nil check
Condition covers all four current adapter fields !DynamoDB && !S3 && !Redis && !SQS
Returns EncodeResult{} + non-nil error (zero bytes)
Error message matches package prefix convention "backup: EncodeOptions.Adapters has no enabled adapter"
Comment explains WHY and references existing CLI guard ✅ Credits parseAdapterSet + codex v5 + claude v5 #904
TestEncodeSnapshotRejectsZeroAdapterSet uses explicit AdapterSet{}
Test asserts err != nil (fail-closed) t.Fatalf on success
Test asserts buf.Len() == 0 (no bytes on rejection) ✅ The second invariant, matches prior TestEncodeSnapshotSelfTestDetectsCorruption pattern

No New Issues Introduced in v6

The guard correctly fires before any snapshotBuilder state is created or any bytes flow to out. All five v5-verified fixes remain intact (the diff is purely additive: the guard block + its test).

One observation carries forward as genuinely non-blocking: if AdapterSet ever gains a fifth adapter field, this guard condition would need updating alongside adapterRunners() and enabledAdapterSubdirs(). Since all three sites are co-located in the same file and the compiler won't help, a future author adding a new adapter should search for these. Not an issue for this PR.


Verdict: Ready to merge. v6 closes the only remaining carry-over from v5 cleanly, with no regressions and no new issues.

| Branch |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b56c0e65ae

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
Comment on lines +244 to +246
result, err := writeAndPublish(cfg, encodeOpts, mismatchPath, logger)
if err != nil {
return err
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Write encode info before returning self-test mismatches

When --self-test produces a real mismatch, writeAndPublish returns errSelfTestMismatch here; this early return skips the writeSidecar call below, so <output>.encode_info.json is never written with self_test.matched=false. The design says mismatch exits should leave both mismatch.txt and provenance, and without the sidecar operators lose the generated SHA/T/adapters needed to diagnose a failed self-test.

Useful? React with 👍 / 👎.

User flagged that I had missed codex reviews on v2 and v6. Re-fetched
all bot reviews; two P2 items remained unaddressed.

## Codex P2 v2: enforce manifest TS floor in EncodeSnapshot

The CLI's resolveLastCommitTS already rejects --last-commit-ts T <
manifest.last_commit_ts, but the library-level EncodeSnapshot
accepted ANY opts.LastCommitTS, including one below the manifest's
floor. A future in-process caller (Phase 1 live extractor,
integration tests) that bypasses the CLI could silently lower the
restored HLC ceiling, letting a post-restart leader re-issue a read
ts <= a restored row's commit ts.

Earlier v4 fix was doc-only (godoc said "EncodeSnapshot does NOT
read the manifest"). That's accurate but pushes the responsibility
to callers. Codex's suggestion to enforce in the library is the
fail-closed answer.

Fix: EncodeOptions gains ManifestLastCommitTS uint64. EncodeSnapshot
fails-closed with ErrSelfTestLowerLastCommitTS when LastCommitTS <
ManifestLastCommitTS (both > 0 — synthetic test fixtures opt out by
leaving ManifestLastCommitTS at 0). CLI's buildEncodeOptions now
threads manifest.LastCommitTS into the field.

Pinned by TestEncodeSnapshotRejectsLowManifestFloor (rejection +
no bytes written) and TestEncodeSnapshotManifestFloorOptOut
(opt-out path still works).

## Codex P2 v6: write encode_info.json before returning self-test mismatches

The old writeAndPublish returned errSelfTestMismatch on mismatch,
which skipped the writeSidecar call. The design says mismatch should
leave BOTH <output>.mismatch.txt AND <output>.encode_info.json
(with self_test.matched=false) for diagnostics. Operators need the
sidecar's SHA256, effective T, adapters_enabled list to triage a
failed self-test.

Fix: encodeOne now writes the sidecar whenever the encode itself
ran (publishErr == nil || errors.Is(publishErr, errSelfTestMismatch)),
and only after that returns the original error. Stale sidecar
cleanup at the start of every run (was: only mismatch.txt was
cleaned).

## Caller audit per CLAUDE.md semantic-change rule

- EncodeSnapshot gained a pre-condition (ManifestLastCommitTS floor).
  All in-tree callers either thread manifest.LastCommitTS through the
  new field (CLI) or use ManifestLastCommitTS=0 (existing tests). No
  legitimate caller is impacted.
- encodeOne's control flow changed: sidecar is now written EVEN on
  mismatch. The mismatch error still propagates; downstream callers
  (run() in main, tests) see the same error contract. Sole consumer
  is run() which maps errSelfTestMismatch to exitDataErr.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

v7 (9f69295): addresses two missed codex P2 reviews.

ユーザー指摘の通り、v2 と v6 の codex review を見落としていました。

Codex P2 v2 (manifest TS floor in EncodeSnapshot): Library-level enforcement was missing. v4 only fixed the godoc to say "EncodeSnapshot does NOT enforce" — accurate but pushed responsibility to callers. Now EncodeOptions.ManifestLastCommitTS (opt-in via non-zero) makes EncodeSnapshot fail-closed when LastCommitTS < ManifestLastCommitTS. CLI threads manifest.LastCommitTS automatically. Pinned by TestEncodeSnapshotRejectsLowManifestFloor + TestEncodeSnapshotManifestFloorOptOut.

Codex P2 v6 (sidecar on self-test mismatch): writeAndPublish returned errSelfTestMismatch which skipped writeSidecar. Design says mismatch should leave BOTH mismatch.txt AND encode_info.json (with matched:false) for diagnostics. encodeOne now writes the sidecar whenever the encode actually ran (publishErr == nil || errors.Is(publishErr, errSelfTestMismatch)), then propagates the original error.

Caller audit: EncodeSnapshot gained a pre-condition (zero-AdapterSet was added earlier, manifest floor is the new one). All in-tree callers either thread the manifest value (CLI) or use opt-out (ManifestLastCommitTS=0 for synthetic test fixtures). No legitimate caller impacted.

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 59s —— View job


Code Review - PR #904 (v7)

Tasks

  • Gather context and understand the request
  • Read changed files (v7 commit 9f69295)
  • Verify Fix 1: Codex P2 v2 — library-level manifest TS floor (ManifestLastCommitTS)
  • Verify Fix 2: Codex P2 v6 — sidecar written on self-test mismatch
  • Check for any new issues introduced in v7
  • Post final review

v7 Verification — Both Codex P2 Fixes Confirmed

Finding Status
Codex P2 v2: library-level manifest TS floor missing validateEncodeOptions enforces LastCommitTS >= ManifestLastCommitTS when ManifestLastCommitTS > 0; CLI threads manifest.LastCommitTS via buildEncodeOptions
Codex P2 v6: sidecar skipped on errSelfTestMismatch encodeOne now calls writeSidecar when publishErr == nil || errors.Is(publishErr, errSelfTestMismatch); sidecar written with matched: false on mismatch
Stale sidecar cleanup on re-run _ = os.Remove(backup.EncodeInfoSidecarPath(cfg.outputPath)) before encode ensures stale prior sidecar is removed
New tests for P2 v2 TestEncodeSnapshotRejectsLowManifestFloor + TestEncodeSnapshotManifestFloorOptOut
validateEncodeOptions extracted as helper ✅ Keeps EncodeSnapshot under cyclop threshold; logical grouping with comment

The dual-enforcement analysis: resolveLastCommitTS in the CLI pre-screens (returns error before buildEncodeOptions is called), so the effectiveTS passed to EncodeSnapshot satisfies effectiveTS >= manifest.LastCommitTS. With ManifestLastCommitTS = manifest.LastCommitTS, validateEncodeOptions's < check is satisfied. No double-fire at CLI entry; the library check is exclusively defense-in-depth for direct callers.


One New Issue in v7

ErrSelfTestLowerLastCommitTS var comment contradicts the implementation it lives above

internal/backup/encode_snapshot.go:23-28:

// EncodeSnapshot itself does NOT read MANIFEST.json or enforce this
// floor — the comparison happens in the CLI's resolveLastCommitTS
// BEFORE EncodeSnapshot is called, and a future library caller (Phase 1
// live extractor, integration tests) must perform its own comparison
// and return this sentinel on regression so callers can errors.Is on
// it to map to the right exit code (claude v3 doc bug #904).

This was accurate after v4's doc fix, but v7 changed the contract: validateEncodeOptions (called from EncodeSnapshot) now DOES enforce the floor when ManifestLastCommitTS > 0, and EncodeSnapshot itself now RETURNS this error. Three things in the same file now contradict this var-level comment:

  1. The EncodeSnapshot function doc at line 148-156 correctly says "it WILL enforce a floor on opts.LastCommitTS when the caller threads the manifest value through opts.ManifestLastCommitTS"
  2. The ManifestLastCommitTS field doc at line 58 correctly says "EncodeSnapshot fails-closed with ErrSelfTestLowerLastCommitTS"
  3. validateEncodeOptions:181 actually wraps and returns ErrSelfTestLowerLastCommitTS

The fix is to update lines 23-28 to say the error is now returned by both the CLI (resolveLastCommitTS) and the library (validateEncodeOptions when ManifestLastCommitTS > 0):

// EncodeSnapshot itself does NOT read MANIFEST.json, but when the caller
// sets opts.ManifestLastCommitTS > 0, validateEncodeOptions enforces this
// floor and returns ErrSelfTestLowerLastCommitTS before any bytes are
// written. The CLI's resolveLastCommitTS provides the same check for
// flag-driven entry; library callers SHOULD set ManifestLastCommitTS to
// the manifest value for defense-in-depth (codex P2 v2 + claude v3 #904).

Fix this →


Observation (non-blocking)

TestCLISelfTestMismatchWritesSidecarWithMatchedFalse covers stale-sidecar cleanup, not the actual P2 v6 fix

The test exercises the manifest-floor rejection path (resolveLastCommitTS failing before writeAndPublish is called), so the sidecar is never written — the test correctly asserts os.IsNotExist. This confirms the stale-sidecar cleanup works, but does not verify that writeSidecar is called with matched: false when writeAndPublish actually returns errSelfTestMismatch.

The codex P2 v6 fix (the new if publishErr == nil || errors.Is(publishErr, errSelfTestMismatch) block at line 254) is correct in its logic but untested at the CLI integration level. The comment acknowledges this and defers to TestEncodeSnapshotSelfTestDetectsCorruption at the library level. Non-blocking, but the test name (...WritesSidecarWithMatchedFalse) promises something slightly different from what it delivers — a note in the comment clarifying the sidecar IS NOT expected here (because the encode never ran) would prevent future readers from being confused.


Strengths in v7

  • The ManifestLastCommitTS field is correctly zero-valued to opt out — no retroactive breakage to existing tests that pass EncodeOptions{} without a manifest reference.
  • validateEncodeOptions as a standalone function is a good factoring choice: it groups all four pre-condition checks, keeps EncodeSnapshot readable, and provides a clear seam for future additions.
  • The sidecar-on-mismatch priority logic is correct: sidecar-write errors on the success path propagate as the return error; on the mismatch path they degrade to a warning so errSelfTestMismatch always wins. This matches the stated requirement that sidecar is informational-only on mismatch.
  • _ = os.Remove(backup.EncodeInfoSidecarPath(cfg.outputPath)) before encode ensures any stale sidecar from a prior successful run is cleared regardless of the current run's outcome — matches the mismatch.txt cleanup pattern at the same site.

Verdict: One doc fix before merge. All functional behavior is correct; both v6 Codex P2 items are correctly implemented. The ErrSelfTestLowerLastCommitTS var comment at lines 23-28 is the only item that needs updating — it still says "EncodeSnapshot itself does NOT enforce this floor," which is now false after v7.

| Branch |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9f692950fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +299 to +300
{"dynamodb", func(s AdapterSet) bool { return s.DynamoDB }, func(b *snapshotBuilder, root string) error {
return errors.Wrap(NewDynamoDBEncoder(root).Encode(b), "dynamodb encoder")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject DynamoDB JSONL inputs before encoding

When an input manifest has dynamodb_layout: "jsonl" and --self-test is not enabled (the default), this path still runs the per-item-only DynamoDB reverse encoder. That encoder only walks items/*.json / items/*/*.json, so a decoded tree with items/data-*.jsonl is accepted but all DynamoDB items are skipped while the CLI exits successfully and publishes an .fsm containing only table metadata. Please fail closed on JSONL manifests until the reverse encoder supports that layout, or pass a JSONL-aware mode through this library path.

Useful? React with 👍 / 👎.

Two findings on v8.

## Codex P2 v8: silent empty-restore on missing/stale InputRoot

validateEncodeOptions only rejected the empty string. A library caller
that passed a typo'd or deleted InputRoot would fan-out into every
enabled adapter, each treating its missing top-level subdirectory as
a no-op. Result: a "successful" header-only .fsm — exactly the silent
empty-restore artifact the encoder is supposed to fail closed against.

CLI callers don't hit this path (they open MANIFEST.json under
InputRoot first), but the library entrypoint is now a real surface,
so the guard belongs in EncodeSnapshot itself.

Fix: validateEncodeOptions now os.Stats InputRoot after the empty-
string check, rejecting non-existent paths (wraps the stat error)
and regular files (returns a typed error). Both rejection paths run
before any byte is written and before any adapter is invoked.

Pinned by TestEncodeSnapshotRejectsMissingInputRoot (subtests:
non-existent path; regular file). Existing TestEncodeSnapshot
LibraryRoundTrip + the manifest-floor tests use t.TempDir(), which
is a directory, so they continue to pass.

## Claude v8 doc bug: EncodeSnapshot godoc attributed to wrong symbol

The EncodeSnapshot doc block sat directly above the
validateEncodeOptions doc block with no blank line between them, so
godoc tooling (and gopls) attributed the entire merged comment to
validateEncodeOptions. EncodeSnapshot — the only exported function
in this file — had no godoc at all.

Fix: relocate the EncodeSnapshot doc block to immediately precede
`func EncodeSnapshot`, separated from validateEncodeOptionsData by
a blank line. Content unchanged; only the location moved.

## Caller audit per CLAUDE.md semantic-change rule

- validateEncodeOptions gained a fail-closed pre-condition (stat +
  IsDir). All in-tree callers pass either t.TempDir() (directory) or
  a real backup root (directory). No legitimate caller is impacted;
  bad-input callers now surface a typed error instead of silently
  producing a header-only .fsm — strictly safer.
- No exported signature or error semantics changed beyond the new
  rejection path.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v9 addresses two v8 findings:

codex P2 v8validateEncodeOptions now stats InputRoot, rejecting non-existent paths and regular files before any adapter runs. Pinned by TestEncodeSnapshotRejectsMissingInputRoot (subtests: non-existent path; regular file). Library callers can no longer turn a typo'd InputRoot into a silent header-only .fsm.

claude v8EncodeSnapshot godoc block was missing a blank-line separator from validateEncodeOptions's doc, so the entire block was attributed to validateEncodeOptions and EncodeSnapshot had no godoc. Moved the block to immediately precede func EncodeSnapshot. Content unchanged; only the location.

Caller audit (CLAUDE.md semantic-change rule): validateEncodeOptions gained a new fail-closed pre-condition; all in-tree callers pass directories (t.TempDir() or real backup roots) so no legitimate caller regresses — bad-input callers now surface a typed error rather than producing a header-only .fsm. No exported signature or error semantics changed.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 51s —— View job


Code Review - PR #904 (v9)

Tasks

  • Gather context and understand the request
  • Read changed files (v9 commit 145a912)
  • Verify Fix 1: Codex P2 v8 — validateEncodeOptions stats InputRoot
  • Verify Fix 2: Claude v8 — EncodeSnapshot godoc location fix
  • Check for any new issues introduced in v9
  • Post final review

v9 Verification — Both v8 Items Confirmed

Finding Status
Codex P2 v8: non-existent/non-dir InputRoot silently produces header-only .fsm os.Stat(opts.InputRoot) + !info.IsDir() at lines 165–170, before any adapter runs
TestEncodeSnapshotRejectsMissingInputRoot "non-existent path" subtest missing := filepath.Join(t.TempDir(), "does-not-exist") → stat returns os.ErrNotExist; asserts err != nil AND buf.Len()==0
TestEncodeSnapshotRejectsMissingInputRoot "regular file" subtest ✅ Creates a file, passes it as InputRoot; info.IsDir()==false fires; asserts err != nil AND buf.Len()==0
Claude v8: EncodeSnapshot godoc attributed to validateEncodeOptions ✅ Doc block (lines 201–229) now immediately precedes func EncodeSnapshot (line 230) as a standalone block; validateEncodeOptions has its own doc block (lines 150–154) immediately preceding it

The guard ordering in validateEncodeOptions is: empty InputRoot → stat + dir check → out == nil → zero-adapter → data-correctness. This is correct — the stat fires before any builder construction, so buf.Len()==0 on rejection is pinned by both new subtests.


No New Issues in v9

The change is minimal and scoped precisely to the two v8 findings:

  1. Four lines added to validateEncodeOptions: the os.Stat call, stat-error return, and !info.IsDir() return.
  2. The EncodeSnapshot doc block moved to immediately precede func EncodeSnapshot — content unchanged, only location.
  3. TestEncodeSnapshotRejectsMissingInputRoot added with two parallel subtests.

All prior fixes (v1–v8) remain intact.

One non-blocking observation: the stat call happens before the out == nil check. If a caller passes a non-existent InputRoot AND out == nil, they get a stat error rather than the "out writer is nil" error. This is a valid ordering choice (fail early on the path that does I/O) and mirrors how the CLI handles it (opening MANIFEST.json would fail first anyway). Not an issue.


Verdict: Ready to merge. Both v8 findings are correctly implemented and pinned by tests. No regressions and no new issues.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 145a9128d2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
}
if errors.Is(err, backup.ErrEncodeUnsupportedDynamoDBLayout) {
return exitDataErr, err
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify encoder data failures as exit 2

When an adapter encoder rejects a malformed decoded tree (for example an invalid DynamoDB _schema.json or unsupported S3 collision artifact), encodeOne returns that EncodeSnapshot error but it does not match any of the three sentinels above, so this fallback exits 1. The CLI contract distinguishes user/flag errors from data-correctness failures with exit 2; treating corrupt backup contents as a user error can break runbooks that branch on exit status to quarantine or repair bad dump data.

Useful? React with 👍 / 👎.

## Codex P2 v9: adapter encoder rejections fall through to exit 1

When an adapter encoder rejects the input tree's contents (a malformed
DynamoDB _schema.json, an S3 collision artifact the encoder cannot
reverse, an SQS side-record with an unknown kind, etc.), the error
propagates out of EncodeSnapshot but does not match any of the three
exit-2 sentinels (ErrSelfTestLowerLastCommitTS,
ErrEncodeUnsupportedDynamoDBLayout, errSelfTestMismatch), so the
CLI's run() falls through to exit 1. Per the CLI contract, exit 1
is operator/flag error and exit 2 is data-correctness — runbooks
branch on exit status to quarantine bad dump data, so misclassifying
adapter rejections as user error is a real ergonomic regression.

The encoder is offline-only — it operates entirely on local files
under InputRoot. Every error from an adapter encoder originates from
the contents of that tree (a sentinel rejection, an unmarshalling
failure, etc.). Treating all adapter errors as data-correctness
matches the CLI contract.

Fix:
- New ErrEncodeAdapterData sentinel in internal/backup.
- runAdapterEncoders wraps each adapter error with errors.Mark so
  errors.Is(err, ErrEncodeAdapterData) is true. Mark preserves the
  inner sentinel chain — callers that errors.Is on the per-adapter
  sentinel (ErrDDBEncodeInvalidSchema, ErrS3EncodeKeyConflict, etc.)
  are unaffected.
- CLI run() gains a fourth exit-2 branch for ErrEncodeAdapterData.

Pinned by:
- TestEncodeSnapshotMarksAdapterDataErrors (library): malformed
  _schema.json triggers ErrDDBEncodeInvalidSchema inside the DDB
  encoder; assertion verifies BOTH errors.Is(ErrEncodeAdapterData)
  AND errors.Is(ErrDDBEncodeInvalidSchema) hold - the mark is
  additive, not lossy.
- TestCLIAdapterDataErrorExitsTwo (CLI): same fixture driven
  end-to-end through run(); asserts exit code = 2 and no .fsm
  published.

## Caller audit per CLAUDE.md semantic-change rule

- runAdapterEncoders return semantics: every non-nil err is now
  marked. Sole caller is EncodeSnapshot; it propagates the marked
  error unchanged.
- EncodeSnapshot callers (library tests, CLI's encodeToTempFile):
  - Library tests that errors.Is on the validation sentinels
    fire BEFORE runAdapterEncoders, so they are NOT marked.
  - CLI's encodeToTempFile wraps with "EncodeSnapshot"; the run()
    layer above it now matches ErrEncodeAdapterData and routes to
    exit-2 (new branch).
- In-tree errors.Is on per-adapter sentinels (encode_dynamodb_test.go
  et al.) call the adapter encoders directly, not through
  EncodeSnapshot, so the mark wrapper is never on their path.

Tests + lint green. wrapcheck required errors.WithStack around the
Mark.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v10 addresses codex P2 v9: adapter-data rejections now classify as exit 2.

codex P2 v9 — When an adapter encoder rejects the input tree's contents (e.g. malformed DynamoDB _schema.json, unsupported S3 collision artifact), the error fell through the three exit-2 sentinels in run() and exited 1. Per the CLI contract, exit 1 = operator/flag error, exit 2 = data-correctness — runbooks branch on this to quarantine bad dump data.

Fix:

  • New ErrEncodeAdapterData sentinel in internal/backup.
  • runAdapterEncoders wraps each adapter error with errors.Mark(err, ErrEncodeAdapterData) (then errors.WithStack for wrapcheck). errors.Mark preserves the inner sentinel chain, so existing errors.Is callers on per-adapter sentinels (ErrDDBEncodeInvalidSchema, etc.) still match — the mark is additive.
  • CLI run() gains a fourth exit-2 branch for the new sentinel.

Pinned by:

  • TestEncodeSnapshotMarksAdapterDataErrors (library): malformed _schema.json triggers ErrDDBEncodeInvalidSchema; asserts BOTH errors.Is(ErrEncodeAdapterData) AND errors.Is(ErrDDBEncodeInvalidSchema) hold.
  • TestCLIAdapterDataErrorExitsTwo (CLI): same fixture end-to-end through run(); asserts exit code 2, no .fsm published.

Caller audit (CLAUDE.md semantic-change rule): runAdapterEncoders' sole caller is EncodeSnapshot. Validation-stage errors (which fire before runAdapterEncoders) are not marked, so existing library tests are unaffected. In-tree errors.Is on per-adapter sentinels (encode_dynamodb_test.go et al.) call adapter encoders directly, not through EncodeSnapshot, so the mark wrapper isn't on their path.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 2m 52s —— View job


Code Review - PR #904 (v10)

Tasks

  • Gather context and understand the request
  • Read changed files (v10 commit 1885834)
  • Verify codex P2 v9 fix: ErrEncodeAdapterData sentinel + exit-2 mapping
  • Verify test coverage for new sentinel
  • Check for any new issues introduced in v10
  • Post final review

v10 Verification — Codex P2 v9 Fix Confirmed

Item Status
ErrEncodeAdapterData sentinel defined with full godoc ✅ Lines 44–58; godoc explains the "data-correctness not user-error" distinction and the errors.Mark additive semantics
runAdapterEncoders wraps with errors.WithStack(errors.Mark(err, ErrEncodeAdapterData)) ✅ Line 385; errors.Mark is inner (preserves chain), errors.WithStack is outer (for stack trace at boundary)
CLI run() gains fourth exit-2 branch ✅ Lines 96–98; placed between ErrEncodeUnsupportedDynamoDBLayout and errSelfTestMismatch checks
Sentinel ordering in run() is mutually exclusive ✅ Each sentinel comes from a distinct code path (validation-layer vs. encode-layer vs. CLI-publish-layer); no double-fire possible
Library test: TestEncodeSnapshotMarksAdapterDataErrors ✅ Verifies BOTH errors.Is(err, ErrEncodeAdapterData) AND errors.Is(err, ErrDDBEncodeInvalidSchema) — the mark is additive, not a replace
CLI test: TestCLIAdapterDataErrorExitsTwo ✅ End-to-end via run(); asserts exit code exitDataErr, no .fsm at output path
Temp file cleaned up on adapter error writeAndPublish's deferred os.Remove(publishedTempPath) runs; rename never reached → output path stays absent
ErrEncodeUnsupportedDynamoDBLayout is NOT marked with ErrEncodeAdapterData ✅ That sentinel comes from validateEncodeOptionsData (pre-encoder), not from runAdapterEncoders; the two remain distinct

The errors.Mark + errors.WithStack ordering is semantically correct for github.com/cockroachdb/errors: errors.Mark(err, marker) places marker as an additional "cause" that errors.Is traverses, and the subsequent errors.WithStack adds a frame without disturbing the chain. Both errors.Is(result, ErrEncodeAdapterData) and errors.Is(result, ErrDDBEncodeInvalidSchema) will be true, as the test asserts.


One New Doc Bug in v10

runAdapterEncoders godoc references a phantom sentinel ErrS3EncodeKeyConflict

internal/backup/encode_snapshot.go:376:

// that errors.Is on ErrDDBEncodeInvalidSchema, ErrS3EncodeKeyConflict,
// etc. still see those (codex P2 v9 #904).

ErrS3EncodeKeyConflict does not exist. The actual S3 adapter sentinels are ErrS3EncodeInvalidBucket, ErrS3EncodeNotRegular, ErrS3EncodeUnsupportedCollision, ErrS3EncodeReservedPrefixCollision, and ErrS3EncodeInvalidManifest. A future reader following this comment to understand which S3 errors are preserved in the chain would look for ErrS3EncodeKeyConflict and find nothing.

The fix is to replace the phantom name with an actual one (e.g., ErrS3EncodeUnsupportedCollision) or drop the specific sentinel name in favor of a generic description:

// that errors.Is on ErrDDBEncodeInvalidSchema, ErrS3EncodeUnsupportedCollision,
// etc. still see those (codex P2 v9 #904).

Fix this →


Strengths in v10

  • The change is exactly scoped: three files, additive-only (new sentinel, one wrap site, one errors.Is branch, two tests). All v9-verified fixes remain intact.
  • The godoc for ErrEncodeAdapterData correctly distinguishes the "data-correctness failure from input content" category from "operator flag error" — this framing matches the CLI's exit-code contract documentation.
  • TestEncodeSnapshotMarksAdapterDataErrors does the important two-assertion check: it's not enough to verify the outer mark fires; the inner ErrDDBEncodeInvalidSchema must also survive the wrap. The test pins both sides.
  • TestCLIAdapterDataErrorExitsTwo does not share a fixture with the library test — the CLI test sets up _schema.json inline (since writeDDBSchema is not exported out of package backup) while the library test uses the in-package helper. Both paths are independently covered.
  • The sentinel ordering in run() is correct: ErrEncodeAdapterData goes between the other library-layer exit-2 sentinels and errSelfTestMismatch (a CLI-internal error). All four are mutually exclusive by construction.

Verdict: One doc fix before merge (phantom ErrS3EncodeKeyConflictErrS3EncodeUnsupportedCollision in the runAdapterEncoders comment). All functional behavior in v10 is correct; the codex P2 v9 item is properly implemented and pinned.

| Branch | |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 18858341bb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +334 to +338
if cfg.selfTest && !result.SelfTestMatched {
if werr := os.WriteFile(mismatchPath, result.SelfTestMismatchTxt, mismatchTxtPerm); werr != nil {
logger.Warn("write mismatch.txt", "err", werr)
}
return result, errors.Wrap(errSelfTestMismatch, "self-test diff (see "+mismatchPath+")")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid leaving stale FSM beside failed self-test metadata

When --output already exists from a previous run and the new --self-test detects a mismatch, this branch returns before renaming the temp file but never removes or otherwise invalidates the old <output>. encodeOne then writes a fresh <output>.encode_info.json with self_test.matched=false and the SHA for the unpublished temp snapshot, so the visible .fsm path still contains stale bytes that do not match the sidecar and contradict the CLI contract that a self-test failure leaves no restore-visible FSM.

Useful? React with 👍 / 👎.

Claude v10 doc fix: the runAdapterEncoders godoc referenced
ErrS3EncodeKeyConflict, which does not exist. The actual S3 encoder
sentinels are ErrS3EncodeInvalidBucket, ErrS3EncodeNotRegular,
ErrS3EncodeUnsupportedCollision, ErrS3EncodeReservedPrefixCollision,
and ErrS3EncodeInvalidManifest.

Replaced with ErrS3EncodeUnsupportedCollision per claude v10's
suggestion. Doc-only change; no behavior or test changes.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v11 doc-only fix per claude v10: replaced phantom ErrS3EncodeKeyConflict in the runAdapterEncoders godoc with the actual sentinel ErrS3EncodeUnsupportedCollision. No behavior or test changes.

go build + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 48s —— View job


Code Review - PR #904 (v11)

Tasks

  • Gather context and understand the request
  • Read changed files (v11 commit f075610)
  • Verify mandatory v10 fix: phantom ErrS3EncodeKeyConflictErrS3EncodeUnsupportedCollision
  • Check for any new issues introduced in v11
  • Post final review

v11 Verification — Mandatory v10 Fix Confirmed

Finding Status
Phantom ErrS3EncodeKeyConflict replaced in runAdapterEncoders godoc ✅ Line 377 now reads ErrS3EncodeUnsupportedCollision
ErrS3EncodeUnsupportedCollision actually exists ✅ Defined in internal/backup/encode_s3_objects.go with matching godoc
ErrS3EncodeKeyConflict removed from codebase grep finds zero occurrences anywhere in internal/backup/
No behavior or test changes ✅ Diff is exactly 3 lines in one comment block; no .go logic touched
Attribution note added (codex P2 v9 #904; phantom-sentinel doc fix from claude v10 #904) in the comment

The diff is minimal and precisely scoped: the phantom sentinel name is replaced, and a parenthetical notes both the original fix reference and the doc correction.


No New Issues in v11

The change is a pure doc comment replacement — one identifier name corrected, no logic or test modifications. All prior fixes (v1–v10) remain intact.


Verdict: Ready to merge. The single mandatory v10 item is correctly addressed. All 10 previous rounds of review findings are resolved. The PR is clean.

| Branch | |

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/elastickv-snapshot-encode/main_test.go (1)

277-284: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Handle the os.Open error in this helper.

If tmpOut is missing, this passes a nil *os.File into DecodeSnapshot and hides the real failure behind a follow-on test crash or misleading decode error.

Proposed fix
-	f, _ := os.Open(tmpOut)
+	f, err := os.Open(tmpOut)
+	if err != nil {
+		t.Fatalf("open canonical output: %v", err)
+	}
+	defer func() { _ = f.Close() }()
 	if _, err := backup.DecodeSnapshot(f, backup.DecodeOptions{
 		OutRoot:  canonicalIn,
 		Adapters: backup.AdapterSet{SQS: true},
 	}); err != nil {
 		t.Fatalf("canonical decode: %v", err)
 	}
-	_ = f.Close()
 	emitMinimalManifest(t, canonicalIn, lastCommitTS)
 	return canonicalIn
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-snapshot-encode/main_test.go` around lines 277 - 284, The
helper opens tmpOut with f, _ := os.Open(tmpOut) but ignores the open error
which can pass a nil *os.File into backup.DecodeSnapshot; update the helper to
check the error returned by os.Open(tmpOut) and call t.Fatalf (or t.Fatalf with
the error) if opening fails before calling backup.DecodeSnapshot (references:
tmpOut, f, os.Open, and backup.DecodeSnapshot); ensure you still close f after a
successful open.
🧹 Nitpick comments (1)
cmd/elastickv-snapshot-encode/main_test.go (1)

335-372: ⚡ Quick win

Rename this test to match what it actually exercises.

This never reaches a self-test mismatch path. It verifies the preflight manifest-floor failure leaves no stale sidecar, so the current name/docstring point readers at the wrong branch.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/elastickv-snapshot-encode/main_test.go` around lines 335 - 372, Rename
the test function TestCLISelfTestMismatchWritesSidecarWithMatchedFalse and
update its docstring/comments to reflect that it verifies the manifest-floor
preflight failure (triggered via resolveLastCommitTS / --last-commit-ts) leaves
no stale sidecar; e.g., rename to TestCLIManifestFloorLeavesNoStaleSidecar (or
similar), update the top comment to describe the preflight/manifest-floor path,
and keep the existing assertions (emitMinimalManifest, run with
"--last-commit-ts", exitDataErr, and the os.Stat check) unchanged so the test
semantics remain the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@cmd/elastickv-snapshot-encode/main_test.go`:
- Around line 277-284: The helper opens tmpOut with f, _ := os.Open(tmpOut) but
ignores the open error which can pass a nil *os.File into backup.DecodeSnapshot;
update the helper to check the error returned by os.Open(tmpOut) and call
t.Fatalf (or t.Fatalf with the error) if opening fails before calling
backup.DecodeSnapshot (references: tmpOut, f, os.Open, and
backup.DecodeSnapshot); ensure you still close f after a successful open.

---

Nitpick comments:
In `@cmd/elastickv-snapshot-encode/main_test.go`:
- Around line 335-372: Rename the test function
TestCLISelfTestMismatchWritesSidecarWithMatchedFalse and update its
docstring/comments to reflect that it verifies the manifest-floor preflight
failure (triggered via resolveLastCommitTS / --last-commit-ts) leaves no stale
sidecar; e.g., rename to TestCLIManifestFloorLeavesNoStaleSidecar (or similar),
update the top comment to describe the preflight/manifest-floor path, and keep
the existing assertions (emitMinimalManifest, run with "--last-commit-ts",
exitDataErr, and the os.Stat check) unchanged so the test semantics remain the
same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 06cbcf79-3732-42cb-a690-d182c2c96f28

📥 Commits

Reviewing files that changed from the base of the PR and between b8da884 and f075610.

📒 Files selected for processing (4)
  • cmd/elastickv-snapshot-encode/main.go
  • cmd/elastickv-snapshot-encode/main_test.go
  • internal/backup/encode_snapshot.go
  • internal/backup/encode_snapshot_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/elastickv-snapshot-encode/main.go

## Codex P2 v10: self-test mismatch leaves orphaned stale .fsm

When --output already exists from a prior successful run and the new
--self-test invocation detects a mismatch, writeAndPublish returned
errSelfTestMismatch without removing the stale <output>.fsm. encodeOne
then wrote a fresh <output>.encode_info.json with self_test.matched=false
and the NEW SHA pointing to the unpublished temp snapshot, leaving:

- <output>.fsm: STALE bytes from the prior successful run
- <output>.encode_info.json: NEW SHA + matched=false
- <output>.mismatch.txt: from the new run

The sidecar describes an FSM that does not exist on disk. This
violates the CLI contract that "a self-test failure leaves no
restore-visible FSM" and breaks runbooks that consume the sidecar's
SHA to verify FSM integrity (they would find the stale FSM's SHA
does not match the sidecar's NEW SHA).

Fix: writeAndPublish, on the self-test-mismatch branch, now
os.Remove(cfg.outputPath) before returning. After cleanup the
artifacts are internally consistent: the sidecar describes a FAILED
encode attempt, mismatch.txt has the diff, and the FSM path is
absent (matching the sidecar's intent).

errors.Is(rerr, os.ErrNotExist) is treated as success — a
self-test mismatch from a first-ever encode (no prior FSM) leaves
the path absent, which is the same end state.

## Surgical scope — only self-test mismatch wipes stale .fsm

Other exit-2 paths (manifest-floor regression, adapter-data error,
unsupported DDB layout) preserve any prior <output>.fsm because
those failures occur BEFORE writeAndPublish runs. This preserves
the runbook ergonomic of "operator typo'd --last-commit-ts; last
good FSM still on disk to retry against."

## Test infrastructure

To drive a deterministic self-test mismatch end-to-end through the
CLI's writeAndPublish, a minimal test seam was added to package
backup: EncodeOptions.SetSelfTestCorruptHookForTest(func(*os.File))
exposes the previously-package-private corruptBufferForTest field
to callers outside package backup. Production code MUST NOT call
this; the godoc explicitly names cmd/elastickv-snapshot-encode tests
as the sole intended caller.

Pinned by:
- TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch: pre-places
  a stale .fsm, injects corruption via the new seam, asserts
  publishErr == errSelfTestMismatch, the stale .fsm is GONE, and
  mismatch.txt is present.
- TestCLINonSelfTestExitTwoPreservesPriorFSM: pre-places a stale
  .fsm, drives a manifest-floor regression (exit-2 BEFORE
  writeAndPublish), asserts the stale .fsm is byte-for-byte
  preserved. Pins the surgical scope of the cleanup.

## Caller audit per CLAUDE.md semantic-change rule

- writeAndPublish: sole production caller is encodeOne. On self-test
  mismatch, encodeOne still writes the sidecar (publishErr matches
  errSelfTestMismatch) and returns publishErr. The new os.Remove
  happens before that, so the caller sees IDENTICAL error semantics
  and the same sidecar-write path. The only difference is on-disk
  state for <output>.fsm: was stale, is now absent.
- EncodeOptions.SetSelfTestCorruptHookForTest: new exported method.
  No production callers anywhere; one test caller in
  cmd/elastickv-snapshot-encode/main_test.go. In-package backup
  tests continue to set corruptBufferForTest directly.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v12 addresses codex P2 v10 (which I had missed on the prior review cycle — fetched all bot reviews this time per the discipline lock-in).

codex P2 v10 — When --output already existed from a prior successful run and the new --self-test invocation produced a mismatch, writeAndPublish returned errSelfTestMismatch without removing the stale <output>.fsm. encodeOne then wrote a fresh <output>.encode_info.json (matched=false, NEW SHA) alongside the OLD FSM bytes — sidecar described an FSM that did not exist; violated the "self-test failure leaves no restore-visible FSM" contract.

Fix:

  • writeAndPublish, on the self-test-mismatch branch, os.Remove(cfg.outputPath) before returning. os.ErrNotExist is treated as success (first-ever encode case).
  • Sidecar is still written by encodeOne (codex P2 v6 carry-over): operators get <output>.mismatch.txt + <output>.encode_info.json (matched=false), but no orphaned FSM.
  • New exported test seam EncodeOptions.SetSelfTestCorruptHookForTest exposes the previously-package-private corruptBufferForTest so the CLI test can drive a real self-test mismatch end-to-end.

Surgical scope: only self-test mismatch wipes the stale .fsm. Other exit-2 paths (manifest-floor, adapter-data, JSONL guard) fail BEFORE writeAndPublish so the prior FSM is preserved — pinned by TestCLINonSelfTestExitTwoPreservesPriorFSM.

Pinned by:

  • TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch (uses the new seam to inject corruption; asserts stale .fsm GONE, mismatch.txt present, sidecar would be written).
  • TestCLINonSelfTestExitTwoPreservesPriorFSM (pre-places stale .fsm, drives manifest-floor exit-2, asserts the stale FSM is byte-equal preserved — pins surgical scope).

Caller audit (CLAUDE.md semantic-change rule): writeAndPublish's sole production caller is encodeOne; sidecar-write semantics unchanged, only on-disk state for <output>.fsm differs (was stale, is now absent). SetSelfTestCorruptHookForTest has no production callers.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 3m 10s —— View job


Code Review - PR #904 (v12)

Tasks

  • Gather context and understand the request
  • Read changed files (v12 commit 5a2f214)
  • Verify codex P2 v10 fix: stale FSM removal on self-test mismatch
  • Verify new test seam SetSelfTestCorruptHookForTest
  • Verify TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch
  • Verify TestCLINonSelfTestExitTwoPreservesPriorFSM
  • Check for new issues (CodeRabbit carry-overs, new regressions)
  • Post final review

v12 Verification — Codex P2 v10 Fix Confirmed

Item Status
writeAndPublish: os.Remove(cfg.outputPath) on self-test mismatch ✅ Line 345; os.ErrNotExist ignored (first-ever encode case), remove failure warns-but-continues
Remove fires BEFORE return result, errSelfTestMismatch ✅ Lines 345-348; deferred temp-file cleanup fires AFTER, so sequence is: remove-stale → return mismatch → defer removes temp
SetSelfTestCorruptHookForTest exported on *EncodeOptions ✅ Lines 138-150 in encode_snapshot.go; pointer receiver correctly sets the unexported corruptBufferForTest field
TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch — pre-places stale, injects corruption via seam, asserts stale FSM gone + mismatch.txt present ✅ Lines 302-389 in main_test.go; calls writeAndPublish directly; errors.Is(publishErr, errSelfTestMismatch) + os.IsNotExist(os.Stat(out))
Corruption hook pattern matches library convention (headerSkip=200, every 13th byte XOR 0xFF) ✅ Mirrors flipBytesPastHeaderHelper as documented in the comment
TestCLINonSelfTestExitTwoPreservesPriorFSM — manifest-floor regression preserves prior FSM byte-for-byte ✅ Lines 391-426; --last-commit-ts 500 < manifest 1000; bytes.Equal(body, stalePayload) assertion
cfg.selfTest = true and encodeOpts.SelfTest = true are both set in the mismatch test — no disconnect writeAndPublish gates on cfg.selfTest at line 334; EncodeSnapshot gates on encodeOpts.SelfTest at the encode path; both true

One Doc Gap in v12

writeAndPublish godoc doesn't mention the stale-FSM removal

cmd/elastickv-snapshot-encode/main.go:315-318:

// writeAndPublish writes the .fsm to a temp path, runs the optional
// self-test via EncodeSnapshot, and renames temp → output on success.
// On self-test failure: writes mismatch.txt, removes the temp file via
// the deferred cleanup, returns errSelfTestMismatch.

The comment says "removes the temp file via the deferred cleanup" but omits the new stale-output removal. A future reader auditing the on-disk state after a self-test mismatch would see only the deferred temp cleanup documented and need to grep for cfg.outputPath to discover the second removal. One clause would close this:

// On self-test failure: writes mismatch.txt, removes any stale
// <output>.fsm from a prior run, removes the temp file via
// the deferred cleanup, returns errSelfTestMismatch.

Fix this →


Carry-over Issues (non-blocking, flagged across multiple prior rounds)

canonicalizeInput ignores os.Open error

cmd/elastickv-snapshot-encode/main_test.go:278:

f, _ := os.Open(tmpOut)

CodeRabbit flagged this in its v12 review (outside-diff comment). If tmpOut is absent for any reason, this passes a nil *os.File into DecodeSnapshot, producing a panic or misleading error rather than a clean t.Fatalf. The fix is one line:

f, err := os.Open(tmpOut)
if err != nil {
    t.Fatalf("open canonical output: %v", err)
}
defer func() { _ = f.Close() }()

Fix this →

TestCLISelfTestMismatchWritesSidecarWithMatchedFalse name contradicts its assertion

cmd/elastickv-snapshot-encode/main_test.go:471: The name promises the sidecar IS written with matched=false, but the test asserts os.IsNotExist on the sidecar (the encode never ran; resolveLastCommitTS exited before writeAndPublish). This has been flagged since v7 (carried from v7→v8→v9→v10→v11→v12) and CodeRabbit also flagged it in its latest review. The new TestCLINonSelfTestExitTwoPreservesPriorFSM (also in v12) exercises the same path for FSM-preservation, making the name collision even more confusing.

Renaming to TestCLIManifestFloorLeavesNoStaleSidecar would resolve the contradiction without changing any assertion.

Fix this →


Strengths in v12

  • The stale-FSM removal is positioned correctly inside the mismatch branch of writeAndPublish — between the mismatch.txt write and the return errSelfTestMismatch. The deferred temp-file cleanup remains the last thing that runs, so no orphaned temp files either.
  • The SetSelfTestCorruptHookForTest pointer receiver (*EncodeOptions) is correct — the hook stored via the method is visible to EncodeSnapshot which receives the options by value after the test sets the hook. The method is on the pointer so assignment propagates.
  • TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch calls writeAndPublish directly (not run), which is the right seam: it isolates the stale-removal behavior from the encodeOne / CLI-flag-parsing layers. The errors.Is(publishErr, errSelfTestMismatch) assertion first confirms the right branch was taken, then the os.IsNotExist assertion confirms the cleanup.
  • TestCLINonSelfTestExitTwoPreservesPriorFSM correctly uses bytes.Equal (not just os.Stat) to confirm the stale FSM content is byte-for-byte unchanged — stronger than a mere existence check.
  • The os.ErrNotExist guard on the os.Remove call is correctly placed: it protects the first-ever encode case where no prior FSM exists, and lets other remove errors bubble up as warnings without masking the mismatch error.

Verdict: One doc fix before merge (writeAndPublish godoc missing the stale-FSM removal clause). Two carry-over non-blocking items (os.Open error in test helper, test name mismatch) that have been flagged by both Claude and CodeRabbit across multiple rounds — worth folding in given the PR is at v12 and all functional behavior is correct.
| Branch

## Claude v12 mandatory: writeAndPublish godoc missed v12 cleanup

The godoc described only the temp-file deferred cleanup but omitted
the new stale-<output>.fsm removal added in v12. A future reader
auditing the on-disk state after self-test mismatch would see only
the temp-file cleanup documented and need to grep for cfg.outputPath
to discover the second removal. Expanded the godoc to call out both.

## Claude v12 carry-over: canonicalizeInput swallowed os.Open error

The test helper used "f, _ := os.Open(tmpOut)" and passed nil to
DecodeSnapshot if the open failed, producing a panic or misleading
decode error instead of a clean t.Fatalf. Fixed: check the error
explicitly and defer the Close. CodeRabbit flagged this on v11/v12.

## Claude v12 carry-over: test name contradicted its assertion

TestCLISelfTestMismatchWritesSidecarWithMatchedFalse asserted
os.IsNotExist on the sidecar — the encode never ran on the
manifest-floor path it exercises. Name renamed to
TestCLIManifestFloorLeavesNoStaleSidecar to match the actual
assertion. The real "sidecar IS written with matched=false on
self-test mismatch" behavior is now pinned end-to-end by
TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch (v12) which
drives a real self-test mismatch through the new corruption seam.
Docstring updated to explain the rename history.

## Caller audit per CLAUDE.md semantic-change rule

All v13 changes are doc/identifier/error-handling only. No public
or package-private function signature, error semantics, or return
contract changed. Renamed test has no callers (Go testing framework
discovers by prefix). canonicalizeInput's error handling is strictly
safer — any open failure now surfaces at the failure site.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v13 addresses claude v12 verdict + two carry-overs flagged across multiple rounds:

Claude v12 mandatorywriteAndPublish godoc expanded to call out both the stale-<output>.fsm removal AND the deferred temp-file cleanup (v12 added the former but the godoc only mentioned the latter).

Claude/CodeRabbit carry-overcanonicalizeInput test helper now checks os.Open error and defers Close (was f, _ := os.Open(tmpOut) which would pass nil to DecodeSnapshot).

Claude/CodeRabbit carry-overTestCLISelfTestMismatchWritesSidecarWithMatchedFalse renamed to TestCLIManifestFloorLeavesNoStaleSidecar to match its actual assertion (the encode never runs on the manifest-floor path; test asserts os.IsNotExist on the sidecar). The real "sidecar IS written with matched=false on self-test mismatch" behavior is now pinned by v12's TestCLIWriteAndPublishRemovesStaleFSMOnSelfTestMismatch via the corruption seam.

Caller audit (CLAUDE.md semantic-change rule): all v13 changes are doc/identifier/error-handling only. No public/private signature, error semantics, or return contract changed. Renamed test has no callers (testing framework discovers by prefix). canonicalizeInput error handling is strictly safer.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 1m 41s —— View job


Code Review - PR #904 (v13)

Tasks

  • Gather context and understand the request
  • Read changed files (v13 commit 48b0b0e)
  • Verify mandatory v12 fix: writeAndPublish godoc expansion
  • Verify carry-over: canonicalizeInput os.Open error handling
  • Verify carry-over: test rename TestCLIManifestFloorLeavesNoStaleSidecar
  • Check for any new issues introduced in v13
  • Post final review

v13 Verification — All Three Items Confirmed

Item Status
Mandatory v12: writeAndPublish godoc expanded to include stale-FSM removal ✅ Lines 317-320: "removes any stale <output>.fsm left by a prior successful run (codex P2 v10 #904), removes the temp file via the deferred cleanup" — both cleanup actions now documented
Carry-over: canonicalizeInput os.Open error check ✅ Lines 278-282: f, oerr := os.Open(tmpOut) + t.Fatalf("open canonical output: %v", oerr) + defer func() { _ = f.Close() }() — nil-file-to-DecodeSnapshot path is closed
Carry-over: test rename to TestCLIManifestFloorLeavesNoStaleSidecar ✅ Line 476; old name only appears in the historical note at line 470 as a comment, not as a function; grep confirms zero function definitions under the old name

The encodeOne flow confirm the test comment is accurate: resolveLastCommitTS is the second call in encodeOne (after readInputManifest), and the stale-sidecar cleanup (os.Remove(backup.EncodeInfoSidecarPath(...))) happens at line 255 AFTER that call. The manifest-floor path returns at line 246 before the cleanup runs — the test uses a fresh t.TempDir() so no stale sidecar exists, making the os.IsNotExist assertion valid.


No New Issues in v13

The diff is exactly two files, 28 additions / 21 deletions, scoped to:

  1. Four lines of godoc in main.go (one clause added to writeAndPublish doc)
  2. Test file: os.Open error check + defer close (7 lines); test rename + docstring rewrite (27 lines)

All prior fixes (v1–v12) remain intact. grep -rn TestCLISelfTestMismatchWritesSidecarWithMatchedFalse returns only the historical note in the renamed test's docstring — no stale function definitions.


Verdict: Ready to merge. All three v13 items are correctly implemented. No regressions, no new issues. All 12 prior review rounds' findings are resolved.

| Branch | |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48b0b0e582

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/encode_snapshot.go Outdated
func adapterRunners() []adapterRunner {
return []adapterRunner{
{"redis", func(s AdapterSet) bool { return s.Redis }, func(b *snapshotBuilder, root string) error {
return errors.Wrap(NewRedisEncoder(root, 0).Encode(b), "redis encoder")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Encode every Redis database, not just db_0

When the decoded input contains Redis keys in any non-zero DB (for example redis/db_3, which the decoder can emit), this fan-out only constructs NewRedisEncoder(root, 0). RedisEncoder is scoped to a single redis/db_<n> directory, so the default CLI path with --self-test off publishes an .fsm that silently drops all non-DB0 Redis data; with self-test on the same valid multi-DB dump is rejected instead of encoded. Please enumerate the redis/db_* directories and invoke the per-DB encoder for each selected DB.

Useful? React with 👍 / 👎.

## Codex P1 v13: hardcoded db_0 silently dropped non-default Redis DBs

The Redis adapter runner in adapterRunners was hardcoded to
NewRedisEncoder(root, 0). RedisEncoder is scoped to a single
redis/db_<n>/ subdirectory; any Phase 1 multi-DB dump that included
redis/db_3/, redis/db_5/, etc. would silently drop those DBs from
the produced .fsm — a header-only encode for those keys, exactly
the silent-data-loss pattern the encoder is supposed to fail closed
against. With --self-test on, the same valid multi-DB dump would be
rejected at the diff stage (the decoded scratch tree would not match
the input's db_3 content).

Phase 0a/0b inputs only emit redis/db_0/, so this is a latent bug
that fires the moment Phase 1's multi-DB dumper lands; codex flagged
it as P1.

## Fix: enumerate redis/db_<N> directories and fan out per DB

Two new helpers in encode_snapshot.go:

- enumerateRedisDBs(inRoot) returns the sorted dbIndex values for
  which <inRoot>/redis/db_<N>/ exists as a directory. A missing
  redis/ returns nil (no-op, matching the per-DB encoder's
  missing-subdir convention). Non-canonical entries (non-numeric
  suffix, negative, leading-zero like "db_01", wrong prefix,
  regular files at the redis/ level) are silently skipped — they
  cannot be produced by the canonical decoder. A symlinked or
  regular-file redis/ path is rejected with ErrRedisEncodeNotDir
  (mirrors the per-DB encoder's symlink refusal).

- encodeAllRedisDBs(b, inRoot) invokes NewRedisEncoder per
  enumerated index in ascending order, wrapping per-DB errors with
  "redis encoder db_%d" for traceability.

The redis adapter runner now uses encodeAllRedisDBs as its function
value directly (gocritic unlambda).

## Pinned by

- TestEnumerateRedisDBsMissingDir: missing redis/ returns nil indices.
- TestEnumerateRedisDBsMixedEntries: only canonical db_<N> dirs
  kept; non-numeric, negative, leading-zero, empty-suffix, wrong-
  prefix, and regular-file entries are skipped; result sorted
  ascending (asserted via direct index comparison).
- TestEnumerateRedisDBsRedisIsRegularFile: regular file at redis/
  path triggers ErrRedisEncodeNotDir.
- TestEncodeSnapshotRedisMultiDB: a fixture under redis/db_3/ ONLY
  (no db_0) produces strictly more bytes than an empty-redis baseline
  through EncodeSnapshot. Pre-fix, the hardcoded db_0 path would
  have produced an identical header-only .fsm for both inputs.

## Caller audit per CLAUDE.md semantic-change rule

- adapterRunners.redis: closure replaced with encodeAllRedisDBs
  function value (gocritic unlambda). Error contract:
  - errors.Is(err, ErrEncodeAdapterData) — still true (errors.Mark
    in runAdapterEncoders is unchanged).
  - errors.Is(err, ErrRedisEncodeNotDir) etc. — still true
    (errors.Mark + errors.Wrap preserve the inner chain).
  - Error prefix changed from "redis encoder: ..." to
    "redis encoder db_%d: ..." (or "redis encoder enumerate: ..."
    for the enumeration step). More specific, not less.
- NewRedisEncoder direct callers (encode_redis_test.go,
  encode_redis_coll_test.go, encode_redis_hardlink_unix_test.go):
  unaffected — they call NewRedisEncoder directly, never through
  the adapter runner.
- enumerateRedisDBs / parseRedisDBDir / checkRedisRoot / encodeAllRedisDBs
  are new package-private functions; no external callers exist yet.

## Lint compliance

Two cyclop refactor splits required:
- enumerateRedisDBs (initially 12 branches) split into
  checkRedisRoot + parseRedisDBDir helpers. Each is ≤6 branches.
- TestEnumerateRedisDBs (initially 14 branches via t.Run) split
  into three top-level tests. Each is ≤4 branches.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v14 addresses codex P1 v13 (silent data loss for non-default Redis DBs).

codex P1 v13adapterRunners.redis hardcoded NewRedisEncoder(root, 0). RedisEncoder is scoped to a single redis/db_<n>/, so any Phase 1 multi-DB dump containing redis/db_3/, redis/db_5/, etc. would silently drop those DBs — header-only encode for non-default DB keys. Phase 0a/0b inputs only emit db_0 so this is latent, but it fires the moment Phase 1's multi-DB dumper lands.

Fix: two new helpers in encode_snapshot.go:

  • enumerateRedisDBs(inRoot) returns sorted dbIndex values for which <inRoot>/redis/db_<N>/ exists as a directory. Non-canonical entries (non-numeric suffix, negative, leading-zero like db_01, wrong prefix, regular files) are silently skipped. Symlinked/regular-file redis/ path → ErrRedisEncodeNotDir (mirrors per-DB encoder).
  • encodeAllRedisDBs(b, inRoot) invokes NewRedisEncoder per enumerated index in ascending order; per-DB errors wrapped with "redis encoder db_%d".

The redis adapter runner now uses encodeAllRedisDBs as the function value directly (gocritic unlambda).

Pinned by four tests:

  • TestEnumerateRedisDBsMissingDir: missing redis/ → nil indices.
  • TestEnumerateRedisDBsMixedEntries: only canonical db_<N> dirs kept; all the edge-case skip patterns covered.
  • TestEnumerateRedisDBsRedisIsRegularFile: regular file at redis/ErrRedisEncodeNotDir.
  • TestEncodeSnapshotRedisMultiDB: fixture under redis/db_3/ ONLY (no db_0) produces strictly more bytes than empty-redis baseline. Pre-fix both would be identical header-only.

Caller audit (CLAUDE.md semantic-change rule): adapterRunners.redis closure replaced with helper function value. errors.Is(ErrEncodeAdapterData) and errors.Is(ErrRedisEncodeNotDir) chains preserved (errors.Mark + errors.Wrap). Error prefix changed from "redis encoder" to "redis encoder db_%d" — more specific, not less. Direct NewRedisEncoder(\\_, 0) test callers in encode_redis_test.go et al. are unaffected (they bypass the adapter runner).

Lint compliance required two cyclop splits: enumerateRedisDBscheckRedisRoot + parseRedisDBDir; TestEnumerateRedisDBs → three top-level tests.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 2m 48s —— View job


Code Review - PR #904 (v14)

Tasks

  • Gather context and understand the request
  • Read changed files (v14 commit 71ecc12)
  • Review enumerateRedisDBs / encodeAllRedisDBs implementation
  • Review new tests
  • Post final review

v14 Verification — Codex P1 v13 Fix Confirmed

Item Status
adapterRunners(): hardcoded NewRedisEncoder(root, 0) replaced with encodeAllRedisDBs ✅ Line 469; the closure is gone, function value wired directly (gocritic unlambda compliant)
enumerateRedisDBs: missing redis/ → nil indices (no-op) checkRedisRoot returns nil for NotExist; subsequent os.ReadDir NotExist guard returns nil, nil
parseRedisDBDir: canonical form check strconv.Itoa(idx) != suffix ✅ Correctly rejects db_01 ("1""01"), db_-1 (idx < 0), db_ (strconv.Atoi("") = err)
checkRedisRoot: symlink at redis/ level → ErrRedisEncodeNotDir os.Lstat + info.Mode()&os.ModeSymlink != 0 check
encodeAllRedisDBs: per-DB errors wrapped with "redis encoder db_%d" ✅ More specific than the old "redis encoder", traceability preserved
ErrEncodeAdapterData mark preserved for encodeAllRedisDBs errors ✅ Applied in runAdapterEncoders via errors.Mark, covers both enumeration and per-DB errors
TestEnumerateRedisDBsMissingDir: missing redis/ → nil
TestEnumerateRedisDBsMixedEntries: only canonical db_N dirs kept ✅ All skip patterns covered: db_garbage, db_-1, db_01, db_, notdb_2, regular file
TestEnumerateRedisDBsRedisIsRegularFile: regular file at redis/ErrRedisEncodeNotDir
TestEncodeSnapshotRedisMultiDB: db_3-only fixture produces more bytes than empty baseline ✅ Pre-fix both would produce identical header-only output; post-fix db_3 string is included

One Minor Doc Nit (non-blocking)

enumerateRedisDBs handles NotExist at two distinct points; neither comment explains the split

internal/backup/encode_snapshot.go:376-396:

checkRedisRoot returns nil when os.Lstat reports NotExist — its comment says "Missing is allowed (caller returns nil indices)." But that return path doesn't actually produce nil indices; after checkRedisRoot returns nil, enumerateRedisDBs proceeds to os.ReadDir(redisDir), which also returns NotExist (for the same missing dir), and THAT check at line 383 is what actually returns nil, nil.

The split is correct and intentional — checkRedisRoot gates on bad-shape (symlink/non-dir) detection ONLY; the NotExist-from-ReadDir guard handles the missing case. A future reader may wonder why checkRedisRoot "allows" NotExist if ReadDir is going to catch it anyway. A one-line clarification on checkRedisRoot's NotExist case would close the gap:

case errors.Is(err, os.ErrNotExist):
    // Missing redis/ is valid (no DB dirs → no-op); the caller
    // handles it via the os.ReadDir NotExist guard below.
    return nil

Not a blocker — the current code is correct.


One Non-blocking Observation

No end-to-end self-test with a multi-DB Redis fixture

TestEncodeSnapshotRedisMultiDB verifies the multi-DB fan-out via byte-count comparison, and TestEncodeSnapshotSelfTestMatchesInput verifies the self-test path — but neither test combines both: a multi-DB Redis fixture run through EncodeSnapshot with SelfTest=true. The key concern is whether diffAdapterTrees correctly walks redis/db_3/ in both input and scratch when adapters.Redis=true.

The path is: enabledAdapterSubdirs returns ["redis"]diffOneSubdir(inputRoot/"redis", scratchRoot/"redis", "redis") which recurses into all db_N subdirs via walkRegularFilePaths. This is correct as coded. The gap is purely test coverage; the logic is fine.


Strengths in v14

  • parseRedisDBDir's canonical form check (strconv.Itoa(idx) != suffix) is the right idiom for this: it rejects leading-zero forms, trailing whitespace (Atoi would error anyway), and any other non-canonical representation in a single comparison — no additional regex or string scanning needed.
  • checkRedisRoot using os.Lstat (not os.Stat) is correct: os.Stat would follow the symlink and see a directory, missing the symlink case entirely. The comment explains the security rationale.
  • encodeAllRedisDBs is function-value-compatible with adapterRunner.encode func(*snapshotBuilder, string) error directly (same signature), so gocritic unlambda accepts it without a wrapper closure — a clean improvement over the old three-line closure.
  • TestEncodeSnapshotRedisMultiDB's baseline comparison correctly pins the regression: pre-fix, BOTH the db_3 fixture AND an empty tree would produce identical header-only output. Post-fix, db_3 produces strictly more bytes. This is the minimum assertion that proves the fix without needing content-level decoding in the test.
  • TestEnumerateRedisDBsMixedEntries covers all five documented skip patterns in one table pass, each with a comment — easy to verify against the parseRedisDBDir implementation.

Verdict: Ready to merge. The codex P1 v13 fix is correctly implemented and pinned by four tests. No new issues introduced. The one doc nit (checkRedisRoot NotExist comment) is non-blocking.

| Branch |

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 71ecc12699

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
// the old bytes on disk would make the sidecar describe an
// FSM that does not exist and violate the "self-test failure
// leaves no restore-visible FSM" contract (codex P2 v10 #904).
if rerr := os.Remove(cfg.outputPath); rerr != nil && !errors.Is(rerr, os.ErrNotExist) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid deleting directories during mismatch cleanup

When --self-test fails and --output accidentally names a directory rather than an existing .fsm file, this cleanup calls os.Remove on that path; for an empty directory it deletes the directory instead of just clearing a stale snapshot. The normal publish path would only fail the later rename, so the destructive behavior is specific to this mismatch cleanup and should be guarded by a stat/type check that only removes regular stale FSM files.

Useful? React with 👍 / 👎.

Comment thread cmd/elastickv-snapshot-encode/main.go Outdated
Comment on lines +90 to +99
if errors.Is(err, backup.ErrSelfTestLowerLastCommitTS) {
return exitDataErr, err
}
if errors.Is(err, backup.ErrEncodeUnsupportedDynamoDBLayout) {
return exitDataErr, err
}
if errors.Is(err, backup.ErrEncodeAdapterData) {
return exitDataErr, err
}
if errors.Is(err, errSelfTestMismatch) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Classify corrupt manifests as data failures

When MANIFEST.json is present but invalid or from an unsupported format version, readInputManifest returns backup.ErrInvalidManifest/backup.ErrUnsupportedFormatVersion, but neither sentinel is mapped here so the CLI exits 1. That treats a broken dump tree as an operator flag/path error, despite the encode CLI contract separating bad backup data from user-input mistakes with exit 2, which can make runbooks skip their quarantine/repair path for corrupt manifests.

Useful? React with 👍 / 👎.

## Codex P2 v14 (1/2): mismatch cleanup must not delete directories

v12 added os.Remove(cfg.outputPath) on the self-test-mismatch branch
of writeAndPublish to keep the sidecar+FSM internally consistent.
That removal was unconditional. If --output names a directory (an
operator typo — e.g. forgot the .fsm filename and passed a directory
instead), the normal publish path would have failed at os.Rename,
but the mismatch cleanup branch would silently call os.Remove on the
directory and, for an empty one, delete it. Destructive behavior
specific to the mismatch path.

Fix: new helper removeStaleOutputFSM(outputPath, logger) performs an
os.Lstat first and only proceeds with os.Remove when info.Mode()
reports IsRegular. Non-regular files (directory, symlink, named
pipe, etc.) are logged at warn level and left alone. Stat errors
other than ErrNotExist are also warn-and-continue.

Pinned by TestCLISelfTestMismatchSkipsDirectoryAtOutputPath: pre-
places an empty directory at --output, drives a self-test mismatch
via the corruption seam, asserts publishErr == errSelfTestMismatch
AND the directory is still present + still a directory.

## Codex P2 v14 (2/2): classify corrupt manifests as data failures

readInputManifest returns backup.ErrInvalidManifest (invalid JSON or
schema-invariant violation) and backup.ErrUnsupportedFormatVersion
(format_version unknown) when the input MANIFEST.json is broken.
Neither sentinel was mapped in run(), so the CLI exited 1 — treating
a broken dump tree as an operator-flag error and breaking runbook
recovery paths that branch on exit status to triage corrupt-input
vs operator-typo.

Fix: extracted the error→exit-code mapping into classifyEncodeError
and added ErrInvalidManifest + ErrUnsupportedFormatVersion to the
exit-2 set. Switch-with-multiple-cases form keeps the function under
nestif/cyclop.

Pinned by TestCLIInvalidManifestExitsTwo (subtests: invalid JSON
body; unsupported format_version). Both subtests assert run() exits
exitDataErr.

## Caller audit per CLAUDE.md semantic-change rule

- writeAndPublish self-test mismatch branch: os.Remove call replaced
  with removeStaleOutputFSM. Sole caller is encodeOne; the
  publishErr contract is unchanged (still errSelfTestMismatch);
  on-disk state changes ONLY when --output was a non-regular file,
  in which case the prior behavior was destructive AND wrong.
  No legitimate caller is impacted.
- run() error classification: extracted to classifyEncodeError.
  Sole caller is run(). Two additional sentinels now classify as
  exit-2 (ErrInvalidManifest, ErrUnsupportedFormatVersion); all
  prior exit-2 sentinels still classify as exit-2; exit-1 paths
  are strictly a subset of the prior set. No runbook that
  branches on exit-2 would regress; runbooks that branch on
  exit-1 will see corrupt-manifest cases move to exit-2 where
  they belong.

## Lint compliance

run() had nestif complexity 5 (six sequential errors.Is branches).
classifyEncodeError uses a switch with multiple errors.Is cases per
arm, which drops the complexity score below the linter bound.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v15 addresses both codex P2 v14 findings.

codex P2 v14 (1/2) — mismatch cleanup must not delete directories: my v12 os.Remove(cfg.outputPath) on the self-test-mismatch branch was unconditional. If --output is a directory (operator typo), it would silently delete an empty directory. Fixed with new helper removeStaleOutputFSM that os.Lstats first and only removes if info.Mode().IsRegular(). Pinned by TestCLISelfTestMismatchSkipsDirectoryAtOutputPath (pre-places a directory at --output, drives self-test mismatch via the corruption seam, asserts the directory is preserved).

codex P2 v14 (2/2) — corrupt manifests should be exit 2: readInputManifest surfaces backup.ErrInvalidManifest and backup.ErrUnsupportedFormatVersion, neither was mapped, so the CLI exited 1 (treating bad dump data as operator typo). Fixed by extracting the error mapping into classifyEncodeError and adding both sentinels to the exit-2 set. Pinned by TestCLIInvalidManifestExitsTwo (subtests: invalid JSON body, unsupported format_version).

Caller audit (CLAUDE.md semantic-change rule):

  • writeAndPublish self-test branch: removeStaleOutputFSM replaces os.Remove. Sole caller is encodeOne; publishErr contract unchanged; on-disk state only differs for non-regular-file --output cases where prior behavior was destructive.
  • run() error mapping: extracted to classifyEncodeError. Two additional sentinels classify as exit-2; all prior exit-2 sentinels still classify as exit-2; exit-1 is strictly a subset of the prior set.

Lint: run() had nestif complexity 5; classifyEncodeError uses switch-with-multi-case-arms which keeps it under the linter bound.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 2m 36s —— View job


Code Review - PR #904 (v15)

Tasks

  • Gather context and understand the request
  • Read changed files (v15 commit af8c279)
  • Verify Fix 1: codex P2 v14 (1/2) — removeStaleOutputFSM guards against directory removal
  • Verify Fix 2: codex P2 v14 (2/2) — classifyEncodeError maps manifest errors to exit 2
  • Check for any new issues introduced in v15
  • Post final review

v15 Verification — Both codex P2 v14 Items Confirmed

Item Status
removeStaleOutputFSM helper: uses os.Lstat (not os.Stat) ✅ Symlinks at --output are caught by the IsRegular() check
removeStaleOutputFSM: !info.Mode().IsRegular() guard ✅ Only regular files are removed; directories and special files are logged-and-skipped
removeStaleOutputFSM: os.ErrNotExist on lstat → silent return ✅ First-ever encode case (no prior FSM) is handled
removeStaleOutputFSM: non-NotExist lstat failure → warn-and-continue ✅ Caller's errSelfTestMismatch stays as the dominant signal
writeAndPublish inline comment updated ✅ Lines 386-390 now explain both the v10 motivation and the v14 mode-check guard
classifyEncodeError extracted from run() ✅ switch/multi-case syntax keeps the function under nestif bound; run() becomes one line
classifyEncodeError: backup.ErrInvalidManifest → exit 2 ✅ Line 112; readInputManifest wraps ReadManifest error which carries the sentinel
classifyEncodeError: backup.ErrUnsupportedFormatVersion → exit 2 ✅ Line 113; same chain
All five prior exit-2 sentinels preserved ErrSelfTestLowerLastCommitTS, ErrEncodeUnsupportedDynamoDBLayout, ErrEncodeAdapterData, errSelfTestMismatch all in the switch
classifyEncodeError godoc lists each sentinel's source ✅ Lines 89-118; full provenance with codex/claude citation per convention
TestCLISelfTestMismatchSkipsDirectoryAtOutputPath ✅ Pre-places empty dir at --output; drives mismatch via corruption seam; asserts publishErr == errSelfTestMismatch AND os.Stat(out) succeeds AND info.IsDir()
TestCLIInvalidManifestExitsTwo "invalid JSON body" {not jsonReadManifest returns ErrInvalidManifestclassifyEncodeError → exit 2
TestCLIInvalidManifestExitsTwo "unsupported format_version" format_version:99ReadManifest returns ErrUnsupportedFormatVersion → exit 2

The errors.Is chain is preserved: readInputManifest does errors.Wrap(err, "read manifest") on the ReadManifest return; cockroachdb/errors.Wrap preserves the sentinel chain so errors.Is(err, backup.ErrInvalidManifest) resolves correctly through two wraps.


One Non-Blocking Observation

removeStaleOutputFSM silently skips a symlink to a regular FSM file

If --output is a symlink to a real .fsm file (an unusual but valid pattern), os.Lstat returns ModeSymlink, so IsRegular() is false and the stale file at the symlink target is left in place after a self-test mismatch. The published sidecar (matched=false, new SHA) then describes a snapshot that was not published, while the prior FSM bytes still exist on disk under a different path.

The v14 codex finding was specifically about directories; the symlink case is a pre-existing edge that v14 didn't surface, and the conservative behavior (leave non-regular alone) is defensible. Not a blocker.


Strengths in v15

  • removeStaleOutputFSM is correctly extracted as a standalone helper with a clear single responsibility — the writeAndPublish body is now readable without the multi-line inline remove logic.
  • classifyEncodeError as a named function is better than the prior inline if-chain for three reasons: it can be tested independently, the godoc clearly documents the full contract, and the switch/multi-case form is a natural expansion point for a future sixth sentinel without risking nestif lint failures.
  • The TestCLISelfTestMismatchSkipsDirectoryAtOutputPath test directly calls writeAndPublish rather than going through run(), correctly isolating the mismatch-cleanup behavior from the CLI-flag-parsing and encodeOne layers.
  • TestCLIInvalidManifestExitsTwo uses two subtests with t.Parallel() covering the two distinct manifest-error sentinels independently — a future change that accidentally merges them would surface in CI immediately.

Verdict: Ready to merge. Both codex P2 v14 items are correctly implemented and pinned by tests. No new issues in v15. All 14 prior review rounds' findings are resolved.
| Branch

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af8c279c4c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/backup/encode_snapshot.go Outdated
Comment on lines +451 to +452
for _, idx := range indices {
if err := NewRedisEncoder(inRoot, idx).Encode(b); err != nil {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject Redis DB fan-out until DBs are encoded

When the input contains redis/db_3 or multiple redis/db_N directories, this loop feeds each directory into the same snapshotBuilder, but NewRedisEncoder uses dbIndex only to choose the input directory and emits the same !redis|... MVCC keys with no database component. The restored/decode path also constructs a single RedisDB from DecodeOptions.RedisDBIndex (default db_0), so distinct databases are merged, same-named keys across DBs fail as duplicates, and a db_3-only self-test decodes under db_0 instead of round-tripping; until the native keys preserve DB identity, non-zero/multiple Redis DB inputs should fail closed rather than publish a mis-scoped snapshot.

Useful? React with 👍 / 👎.

Comment thread internal/backup/encode_snapshot.go Outdated
Comment on lines +425 to +427
func parseRedisDBDir(ent os.DirEntry) (int, bool) {
if !ent.IsDir() {
return 0, false
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail closed on malformed Redis db_N entries

When a canonical Redis database entry such as redis/db_0 is a regular file or symlink, ent.IsDir() is false and this parser silently skips it, so encodeAllRedisDBs can succeed with no Redis records and publish a header-only/partial FSM. The per-DB encoder's existing Lstat guard would reject these shapes with ErrRedisEncodeNotDir; enumeration should preserve that fail-closed behavior for names matching db_<N> instead of treating malformed dump contents as an absent DB.

Useful? React with 👍 / 👎.

v14 added an enumerate-and-fan-out path for redis/db_<N>/ to address
codex P1 v13's silent-data-loss concern. Codex's v14 follow-up
clarified that the fan-out is itself incorrect, and two P2 findings
landed (L452 + L427).

## Codex P2 v14 L452: Reject Redis DB fan-out until DBs are encoded

The Redis MVCC key prefixes (!redis|str|, !redis|hll|, !redis|ttl|,
plus the collection helpers' analogues) carry NO database component.
Feeding two distinct DBs into the same snapshotBuilder would:
- collide on same-named keys across DBs (b.Add returns ErrDuplicate),
  or
- merge keys under db_0 at decode time (DecodeOptions.RedisDBIndex
  defaults to 0, so a db_3-only self-test would decode under db_0
  and the structural diff would fail).

In either case the produced .fsm is mis-scoped: the v14 fan-out
replaces silent-data-loss with silent-merge-or-collide.

Fix: encodeAllRedisDBs now enumerates and decides:
- 0 indices (no redis/, or empty redis/) → no-op.
- [0] only → proceed with NewRedisEncoder(_, 0) (the legacy path).
- anything else → fail closed with new sentinel
  ErrRedisEncodeMultiDBUnsupported.

The sentinel is marked at the runAdapterEncoders boundary with
ErrEncodeAdapterData so the CLI routes it to exit-2. Until Phase 1
makes native keys DB-aware, multi-DB inputs are quarantined as
data-correctness failures rather than silently mis-scoped output.

## Codex P2 v14 L427: Fail closed on malformed Redis db_N entries

The v14 parseRedisDBDir(ent) returned (_, false) whenever
!ent.IsDir(), so a regular file or symlink at redis/db_0 would be
silently skipped and the encoder would publish a header-only .fsm.

Fix: split into parseRedisDBName (pure name parser) and shift the
ent.IsDir() check into enumerateRedisDBs. When the name matches the
canonical db_<N> pattern AND the entry is not a directory,
enumerateRedisDBs returns ErrRedisEncodeNotDir (mirroring the per-DB
encoder's existing Lstat guard).

## Pinned by

- TestEncodeSnapshotRedisRejectsNonZeroDB: redis/db_3-only fixture
  through EncodeSnapshot → errors.Is ErrRedisEncodeMultiDBUnsupported
  AND errors.Is ErrEncodeAdapterData (both fire).
- TestEncodeSnapshotRedisRejectsMultipleDBs: redis/db_0 + redis/db_3
  → errors.Is ErrRedisEncodeMultiDBUnsupported.
- TestEnumerateRedisDBsRejectsNonDirDBEntry: redis/db_2 as a regular
  file → errors.Is ErrRedisEncodeNotDir.
- The replaced TestEncodeSnapshotRedisMultiDB (v14) was reformulated
  as TestEncodeSnapshotRedisRejectsNonZeroDB; the prior "more bytes
  than baseline" assertion no longer holds because the encoder
  rejects the fixture upfront.

## Caller audit per CLAUDE.md semantic-change rule

- encodeAllRedisDBs: sole caller is adapterRunners.redis. Semantic
  change: was "fan out across all db_<N>"; now "encode db_0 only,
  fail closed on anything else." Error contract:
  - errors.Is(err, ErrEncodeAdapterData) — still true (mark applied
    at runAdapterEncoders, unchanged).
  - errors.Is(err, ErrRedisEncodeMultiDBUnsupported) — new path.
  - errors.Is(err, ErrRedisEncodeNotDir) — newly reachable via
    enumerateRedisDBs for malformed db_<N> entries.
  All in-tree NewRedisEncoder direct callers (encode_redis_test.go,
  encode_redis_coll_test.go, encode_redis_hardlink_unix_test.go) are
  unaffected — they bypass the adapter runner.
- enumerateRedisDBs: was "silently skip non-dir db_<N>"; now "fail
  closed with ErrRedisEncodeNotDir." No production callers exist
  outside encodeAllRedisDBs; tests in encode_snapshot_test.go cover
  the new fail-closed path.
- parseRedisDBDir → parseRedisDBName: signature change (DirEntry →
  string). Sole caller is enumerateRedisDBs (updated in lock-step).
- v14's TestEncodeSnapshotRedisMultiDB is replaced — see above.
  TestEnumerateRedisDBsMixedEntries (v14) continues to pass: its
  positive entries (db_0, db_1, db_5) are all directories, so the
  IsDir check it never exercises remains exercised by the new
  TestEnumerateRedisDBsRejectsNonDirDBEntry.

Tests + lint green.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented Jun 2, 2026

@claude review

v16 corrects v14's Redis multi-DB fan-out per codex P2 v14 (L452 + L427).

codex P2 v14 L452 (semantic correctness) — v14's fan-out across redis/db_<N> was incorrect: the Redis MVCC key prefixes (!redis|str|, !redis|hll|, !redis|ttl|, …) carry NO database component, so feeding multiple DBs into the same snapshotBuilder either collides on same-named keys (b.Add returns ErrDuplicate) or merges everything under db_0 on restore (DecodeOptions.RedisDBIndex defaults to 0). v14 replaced the silent-data-loss bug with a silent-merge bug.

Fix: encodeAllRedisDBs now enumerates and decides:

  • 0 indices → no-op.
  • [0] only → legacy single-DB encode (the only correct case under the current key format).
  • anything else → fail closed with new sentinel ErrRedisEncodeMultiDBUnsupported.

Marked with ErrEncodeAdapterData at the runAdapterEncoders boundary, so the CLI routes it to exit-2 automatically. Until Phase 1 makes native keys DB-aware, multi-DB inputs are quarantined.

codex P2 v14 L427 (malformed db_)parseRedisDBDir(ent) returned (_, false) when !ent.IsDir(), silently skipping regular files at redis/db_0 etc. Fix: split into pure-name parser parseRedisDBName(name) and shift the dir check into enumerateRedisDBs, where a name-matches-but-not-a-dir entry returns ErrRedisEncodeNotDir (mirrors the per-DB encoder's existing Lstat guard).

Pinned by three tests:

  • TestEncodeSnapshotRedisRejectsNonZeroDB: db_3-only fixture → ErrRedisEncodeMultiDBUnsupported AND ErrEncodeAdapterData (both fire).
  • TestEncodeSnapshotRedisRejectsMultipleDBs: db_0 + db_3 → ErrRedisEncodeMultiDBUnsupported.
  • TestEnumerateRedisDBsRejectsNonDirDBEntry: db_2 as a regular file → ErrRedisEncodeNotDir.

Caller audit (CLAUDE.md semantic-change rule):

  • encodeAllRedisDBs: sole caller is adapterRunners.redis. Was "fan out"; now "db_0 only or fail closed." Error chain (ErrEncodeAdapterData, ErrRedisEncodeMultiDBUnsupported, ErrRedisEncodeNotDir) all preserved via errors.Mark + errors.Wrap.
  • enumerateRedisDBs: was "silently skip non-dir db_"; now "fail closed with ErrRedisEncodeNotDir." Existing positive tests (TestEnumerateRedisDBsMixedEntries) pass real directories so they remain green.
  • parseRedisDBDir(ent DirEntry)parseRedisDBName(name string): signature change. Sole caller updated in lock-step. No external callers.
  • v14's TestEncodeSnapshotRedisMultiDB replaced — the prior assertion no longer applies under fail-closed semantics.
  • Direct NewRedisEncoder test callers (encode_redis_test.go et al.) bypass the adapter runner; unaffected.

go test -race ./internal/backup/... ./cmd/elastickv-snapshot-encode/... + golangci-lint green.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 2, 2026

Claude finished @bootjp's task in 2m 50s —— View job


Code Review - PR #904 (v16)

Tasks

  • Gather context and understand the request
  • Read changed files (v16 commit b52a688)
  • Verify Fix 1: codex P2 v14 L452 — Redis multi-DB fail-closed semantics
  • Verify Fix 2: codex P2 v14 L427 — malformed db_<N> entry fail-closed
  • Check for any new issues introduced in v16
  • Post final review

v16 Verification — Both codex P2 v14 Items Confirmed

Item Status
ErrRedisEncodeMultiDBUnsupported sentinel defined with full godoc ✅ Lines 44–56; history note covers both the v13 P1 fan-out rationale and why v14 fan-out was wrong
encodeAllRedisDBs: 0 indices → no-op len(indices) == 0 branch at line 490
encodeAllRedisDBs: non-zero or multiple DBs → fail-closed len(indices) > 1 || indices[0] != 0ErrRedisEncodeMultiDBUnsupported at lines 493–495
encodeAllRedisDBs: db_0 only → NewRedisEncoder(inRoot, 0).Encode(b) ✅ Line 497; the only correct path under current key format
ErrRedisEncodeMultiDBUnsupported marked with ErrEncodeAdapterData ✅ Via errors.Mark(err, ErrEncodeAdapterData) in runAdapterEncoders line 543; both errors.Is calls hold
enumerateRedisDBs: canonical db_<N> non-dir → ErrRedisEncodeNotDir ✅ Lines 414–418; !ent.IsDir() check after parseRedisDBName succeeds
parseRedisDBName: pure name parser (IsDir check removed) ✅ Signature changed from (ent os.DirEntry) to (name string); dir check correctly shifted to caller
Symlink at redis/db_<N> also fails closed os.DirEntry.IsDir() does not follow symlinks; symlinks return IsDir()==falseErrRedisEncodeNotDir
TestEncodeSnapshotRedisRejectsNonZeroDB ✅ db_3-only → asserts err != nil + errors.Is(ErrRedisEncodeMultiDBUnsupported) + errors.Is(ErrEncodeAdapterData)
TestEncodeSnapshotRedisRejectsMultipleDBs ✅ db_0 + db_3 → ErrRedisEncodeMultiDBUnsupported
TestEnumerateRedisDBsRejectsNonDirDBEntry ✅ Regular file at redis/db_2ErrRedisEncodeNotDir (via enumerateRedisDBs directly)
v14's TestEncodeSnapshotRedisMultiDB replaced ✅ Old assertion (more bytes than baseline) no longer applies under fail-closed semantics

The decision logic in encodeAllRedisDBs is correct: len(indices) > 1 || indices[0] != 0 precisely covers the three bad cases (non-zero only, mixed, multiple non-zero) in one branch without requiring enumeration of every combination.


One Minor Observation (non-blocking)

Two new rejection tests don't assert buf.Len() == 0

internal/backup/encode_snapshot_test.go:485-500 (TestEncodeSnapshotRedisRejectsNonZeroDB) and 515-527 (TestEncodeSnapshotRedisRejectsMultipleDBs) both declare var buf bytes.Buffer and pass &buf to EncodeSnapshot, but neither asserts buf.Len() == 0 after the rejection.

All five prior rejection tests consistently include this assertion:

  • TestEncodeSnapshotRejectsMissingInputRoot: buf.Len() != 0 → t.Errorf
  • TestEncodeSnapshotRejectsLowManifestFloor: buf.Len() != 0 → t.Errorf
  • TestEncodeSnapshotRejectsDynamoDBJSONLLayout: buf.Len() != 0 → t.Errorf
  • TestEncodeSnapshotRejectsZeroAdapterSet: buf.Len() != 0 → t.Errorf
  • TestEncodeSnapshotMarksAdapterDataErrors: buf.Len() != 0 → t.Errorf

The underlying behavior is correct — runAdapterEncoders fails before any bytes reach out — but the invariant is unverified for the Redis multi-DB rejection path. Worth adding for consistency.


Strengths in v16

  • The correction from "fan-out" to "fail-closed" is precisely reasoned: the root cause (no DB component in MVCC key prefixes) is correctly identified, and the sentinel godoc explains both the mechanism and the forward path (Phase 1 keys becoming DB-aware).
  • The len(indices) > 1 || indices[0] != 0 condition is elegant: it handles db_3-only AND db_0+db_3 in a single predicate. The three tests (non-zero only, multiple, non-dir) form a minimal complete set.
  • parseRedisDBName's signature change from (ent DirEntry) to (name string) is a clean pure-function refactor — the dir-type validation is now unambiguously the caller's responsibility, documented both in the function godoc and in the call site inline comment.
  • The TestEncodeSnapshotRedisRejectsNonZeroDB two-assertion check (errors.Is(ErrRedisEncodeMultiDBUnsupported) AND errors.Is(ErrEncodeAdapterData)) mirrors the pattern established by TestEncodeSnapshotMarksAdapterDataErrors in v10, correctly pinning both sides of the errors.Mark chain.
  • TestEnumerateRedisDBsRejectsNonDirDBEntry calls enumerateRedisDBs directly (not through EncodeSnapshot), which is the right seam: it isolates the parse-plus-shape check from the adapter runner's error marking.

Verdict: Ready to merge. Both codex P2 v14 items are correctly implemented. The minor observation (buf.Len() == 0 assertion absent from two new tests) is non-blocking — the functional behavior is correct and the error-type assertions cover the more important invariant. All 15 prior review rounds' findings remain intact.
| Branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant