fix(buffers): disk buffers v2 reopen buffer after restart crash loop#25570
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
7b0865a to
db4960b
Compare
This comment has been minimized.
This comment has been minimized.
3614c5a to
5209ae6
Compare
db4960b to
e3cc97c
Compare
e3cc97c to
f6542c0
Compare
3cce5aa to
84df4d7
Compare
f6542c0 to
61e1f3e
Compare
6cf0d05 to
597b732
Compare
84df4d7 to
c066e34
Compare
597b732 to
b3c5011
Compare
8ff610c to
4811fb8
Compare
05d79f8 to
fe305fa
Compare
4811fb8 to
85cfa4b
Compare
85cfa4b to
d40777e
Compare
fe305fa to
17992e7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17992e7285
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| data_file_path = data_file_path.to_string_lossy().as_ref(), | ||
| "Reader resume data file is missing; it was fully acknowledged before deletion. Advancing past it." | ||
| ); | ||
| self.ledger.increment_acked_reader_file_id(); |
There was a problem hiding this comment.
Repair ledger accounting before skipping missing files
When a crash happens after delete_completed_data_file unlinks a data file but before its mmap changes are durably flushed, the restored ledger can still include that file's bytes and last-read record state even though the file is gone. This branch only advances the reader file id, so the skipped file's size remains in total_buffer_size and its record ids may be treated as gaps; after the remaining files drain, the buffer can still appear non-empty/full and the reader can wait for a next writer file instead of reaching an empty state. The recovery path needs to make the same accounting durable before unlink, or explicitly repair the skipped file's ledger state here.
Useful? React with 👍 / 👎.
17992e7 to
6dfab2e
Compare
d40777e to
7ecc846
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6dfab2e1a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| data_file_path = data_file_path.to_string_lossy().as_ref(), | ||
| "Reader resume data file is missing; it was fully acknowledged before deletion. Advancing past it." | ||
| ); | ||
| self.ledger.increment_acked_reader_file_id(); |
There was a problem hiding this comment.
Do not assume missing reader files were acknowledged
This branch also matches buffers affected by the creation-side durability bug this patch calls out: before this change (and still on non-Unix, where sync_directory is a no-op), a crash after the ledger/file contents were flushed but before the new data file's directory entry was durable can leave reader_file_id < writer_file_id with the reader's data file missing even though none of its records were delivered. Incrementing the acked reader file id here silently skips those unread records (and leaves the ledger's byte/record accounting inconsistent), so the recovery path should distinguish/repair the deleted-after-ack case rather than treating every missing file below the writer as fully acknowledged.
Useful? React with 👍 / 👎.
7ecc846 to
b4576c4
Compare
6dfab2e to
80aa8c9
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80aa8c93f1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let data_file_mmap = self | ||
| .ledger | ||
| .filesystem() |
There was a problem hiding this comment.
Recheck reader position before mmaping writer file
When recovery skips a missing reader file here, ensure_ready_for_read() can advance reader_file_id until it equals writer_file_id and open the writer's current file. Because the outer while condition is not rechecked before this mmap, a crash after the writer rolled to a freshly created but still-empty next data file still reopens into open_mmap_readable() on that empty writer file, which fails instead of falling back to the slow path. Recheck the reader/writer IDs after ensure_ready_for_read() and break once they are equal.
Useful? React with 👍 / 👎.
80aa8c9 to
cf9f640
Compare
We have observed in Antithesis scenario `vector_to_vector_e2e_disk` that the disk buffer can be driven to a state where Vector will crash loop without recovery. This happens when the reader is set to resume from a data file that no longer exists and will not be created. Strictly, the disk buffer reader is meant to chase the writer. This happens because the deletion of completed data files is not atomic. When a reader finishes an acks a data file `delete_completed_data_file`: * unlinks the .dat file * advances the reader id in the ledger * fsyncs the ledger If Vector crashes after the unlink but before the ledger is fsync'ed this will cause the next instance of Vector to attempt to read a .dat that will never appear, crashing said instance. Mostly. OS page cache complicates this some, as the newly introduced `reopen_recovers_with_reader_resume_data_file_is_missing` discusses. REF SMPTNG-749
This commit fixes the crash-loop by carefully re-arranging when the ledger is updated and ensuring that directory metadata is also fsynced. This later is important as ledger still updates after the unlink takes place. If a file is gone we can assume that it has been read iff the ledger points to a new dat file, meaning that the data was previously read entirely and readers may skip the 'stale' reference. I have left this skip event as a warn! but considering how likely this is in practice in the event of node termination, for example, it might rightly be a debug. Overflow of dat file indexes remains a concern but 1. I would like to demonstrate this in an antithesis run and 2. that belongs in a separate commit.
b4576c4 to
f538c36
Compare
cf9f640 to
0e78f49
Compare

Summary
We have observed in Antithesis scenario
vector_to_vector_e2e_diskthatthe disk buffer can be driven to a state where Vector will crash loop
without recovery. This happens when the reader is set to resume from a
data file that no longer exists and will not be created.
Strictly, the disk buffer reader is meant to chase the writer. This
happens because the deletion of completed data files is not atomic. When
a reader finishes an acks a data file
delete_completed_data_file:If Vector crashes after the unlink but before the ledger is fsync'ed
this will cause the next instance of Vector to attempt to read a .dat
that will never appear, crashing said instance. Mostly. OS page cache
complicates this some, as the newly introduced
reopen_recovers_with_reader_resume_data_file_is_missingdiscusses.Vector configuration
How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.