fix: confirm durable writes before advancing indexer state#784
Open
damilolaedwards wants to merge 1 commit into
Open
fix: confirm durable writes before advancing indexer state#784damilolaedwards wants to merge 1 commit into
damilolaedwards wants to merge 1 commit into
Conversation
Three indexer paths advanced a cursor or dropped their in-memory copy of data before confirming the durable write, and only logged the write error, so a transient failure lost the data with no retry. - The system-contract transaction matcher discarded the persistMatches error and advanced MatchHeight regardless, so a failed range was skipped and its request rows kept an empty tx hash forever. Persist the matches and the height together and roll the height back on error. - The bid cache removed bids from the cache before writing them, losing the flushed window on a transient db error. Write first, drop the cached copies only after the write succeeds. - Epoch finalization advanced lastFinalizedEpoch and freed the block bodies even when the blockdb write failed, after the unfinalized_blocks rows had already been deleted, leaving the bodies unrecoverable. Trigger a resync of the epoch on a blockdb write failure instead of advancing. Add tests for the matcher height rollback and the bid-cache write ordering.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three indexer paths mutated durable-facing state (a cursor, or the in-memory copy of data) before the corresponding database or blockdb write was confirmed, and only logged the write error. On a transient failure the data was then lost with no retry. Each is fixed by confirming the write first and propagating the error.
Transaction matcher drops a match on a persist error
runTransactionMatcheradvancedMatchHeightunconditionally and discarded thepersistMatchesreturn value, returning only the state-persist result. On sqlite a failed statement does not abort the transaction, so the tx committed with the match missing but the height advanced; on postgres the tx rolled back but the in-memory height did not. Either way the failed range was never revisited and its request rows kept an empty tx hash. The matches and the advanced height are now persisted together in one transaction, and the in-memory height is rolled back if that transaction fails, so the range is retried.Bid cache loses the flushed window on a db error
checkAndFlushandflushAllremoved bids from the cache (and advancedminSlot) before callingInsertBids; on error they logged and returned with the bids already gone, and they cannot be re-added. The write now happens first, and the cached copies are dropped only after it succeeds.Finalization can leave a block body unrecoverable on a blockdb write error
finalizeEpochdeletes theunfinalized_blocksrows in the committed finalization transaction, then writes the block bodies to the blockdb with only-logged errors, then advanceslastFinalizedEpochand frees the bodies from the cache regardless of the write result. If the blockdb write failed, the body then existed nowhere. On a blockdb write failure the epoch now returns without advancing, which routes it through the same resync path already used for a failed finalization transaction, leaving the cached bodies in place for the resync to rewrite.Tests
Both fail without their fix. The finalization change has no unit test (that path needs a full indexer) and reuses the existing resync behavior.
go build,go vetandgofmtare clean.