Ensure fork suppressor only triggers for includable seals #8470
jordanschalm merged 5 commits into master from …
Conversation
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
📝 Walkthrough: ExecForkSuppressor replaces per-block seal sets with per-block sets of IncorporatedResult IDs, defers includability checks to the wrapped mempool at query time, adds height-based pruning, and reworks conflict detection to translate IDs to includable seals, persist fork evidence, and invoke a callback when forks are detected.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Suppressor as ExecForkSuppressor
participant Pool as WrappedMempool
participant Store as Persistence
participant CB as ForkCallback
Suppressor->>Pool: Add(seal) — forward to wrapped pool
Pool-->>Suppressor: Accept/Reject
Suppressor->>Suppressor: record IncorporatedResult ID in sealsForBlock and byHeight
Suppressor->>Pool: Get(query for block) — request candidate seals
alt multiple IncorporatedResult IDs present
note right of Suppressor: translate IDs -> candidate seals
Suppressor->>Pool: Get(seal by ID) per-ID to check includability
Pool-->>Suppressor: return includable seals
end
Suppressor->>Suppressor: filterConflictingSeals(includable seals)
alt conflict detected
Suppressor->>Store: persist fork evidence
Store-->>Suppressor: ack
Suppressor->>CB: execForkDetected callback
end
```
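The heart of this flow is the Get-time translation step: the suppressor keeps only IncorporatedResult IDs per block and asks the wrapped pool, at query time, which of them are includable. The following is a minimal Go sketch of that idea; the names (`sealsForBlock`, `wrappedPool`, `includableSealsFor`) mirror the walkthrough but the types are pared down and are not the actual flow-go implementation.

```go
package consensus

import "sync"

type Identifier [32]byte

// IncorporatedResultSeal stands in for flow-go's mempool entity;
// only what this sketch needs is modeled.
type IncorporatedResultSeal interface {
	ID() Identifier
}

// wrappedPool models the underlying mempool, whose Get only returns
// seals that are includable (i.e. backed by 2+ execution receipts).
type wrappedPool interface {
	Get(id Identifier) (IncorporatedResultSeal, bool)
}

type execForkSuppressor struct {
	mu sync.RWMutex
	// Per-block sets of IncorporatedResult IDs. No seal values are cached,
	// so includability is always re-checked against the wrapped pool.
	sealsForBlock map[Identifier]map[Identifier]struct{}
	wrapped       wrappedPool
}

// includableSealsFor translates the stored IDs for a block into includable
// seals by querying the wrapped pool; under-attested candidates drop out.
func (s *execForkSuppressor) includableSealsFor(blockID Identifier) []IncorporatedResultSeal {
	s.mu.RLock()
	defer s.mu.RUnlock()

	var seals []IncorporatedResultSeal
	for id := range s.sealsForBlock[blockID] {
		if seal, ok := s.wrapped.Get(id); ok {
			seals = append(seals, seal)
		}
	}
	return seals
}
```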
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
module/mempool/consensus/exec_fork_suppressor_test.go (1)
237-241: Add a regression case for “non-includable conflicting seal” on `Get`.

These updates verify extra `Get` invocations, but they don’t explicitly assert the core fix path. Please add a case where a conflicting seal exists in `sealsForBlock` but `wrappedMempool.Get(conflictingID)` returns `false`, and verify `OnExecFork` is not triggered.

Also applies to: 278-281
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor_test.go` around lines 237 - 241, Add a regression sub-case that simulates a non-includable conflicting seal by stubbing wrappedMempool.Get to return (nil, false) for the conflicting seal ID and ensuring OnExecFork is not invoked; specifically, in the test around the existing wrappedMempool.On("Get", conflictingSeal.IncorporatedResultID()) calls, add an alternative expectation where wrappedMempool.On("Get", conflictingSeal.IncorporatedResultID()).Return(nil, false).Once() (or use the mock's Once/Times appropriate) for the path where conflictingSeal appears in sealsForBlock but is not present in the mempool, then assert that the ExecForkSuppressor (or the mock handling OnExecFork) does not receive an OnExecFork call for that block/height. Ensure you add the same case for the second location noted (lines ~278-281) so both test paths cover the regression.
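To make the requested regression case concrete, here is a self-contained Go sketch of the scenario. It uses a hand-rolled fake instead of the repository's generated mocks, and every name in it (`fakePool`, `suppressorGet`, the seal IDs) is hypothetical; the real test would use the existing `wrappedMempool` fixtures.

```go
package consensus_test

import "testing"

type id = [32]byte

// fakePool stands in for the wrapped mempool: Get succeeds only for
// seals backed by 2+ execution receipts ("includable").
type fakePool struct {
	includable map[id]bool
}

func (p *fakePool) Get(sealID id) (id, bool) {
	if p.includable[sealID] {
		return sealID, true
	}
	return id{}, false
}

// suppressorGet mimics the fixed Get: a conflicting candidate only counts
// if the wrapped pool confirms it is includable.
func suppressorGet(p *fakePool, requested id, candidates []id, onExecFork func()) (id, bool) {
	requestedSeal, ok := p.Get(requested)
	if !ok {
		return id{}, false
	}
	for _, c := range candidates {
		if c == requested {
			continue
		}
		if _, includable := p.Get(c); includable {
			onExecFork() // two includable, conflicting seals: a real fork
			return id{}, false
		}
	}
	return requestedSeal, true
}

func TestNonIncludableConflictDoesNotTriggerFork(t *testing.T) {
	good, bad := id{1}, id{2}
	// "bad" is tracked for the block but has only one receipt
	pool := &fakePool{includable: map[id]bool{good: true}}
	forkDetected := false

	_, ok := suppressorGet(pool, good, []id{good, bad}, func() { forkDetected = true })
	if !ok {
		t.Fatal("includable seal should be returned")
	}
	if forkDetected {
		t.Fatal("safety mechanism must not fire for a non-includable conflict")
	}
}
```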
📒 Files selected for processing (2)
- module/mempool/consensus/exec_fork_suppressor.go
- module/mempool/consensus/exec_fork_suppressor_test.go
I agree with the direction of your fix. However, I find that the change adds too much subtle complexity that can easily lead to further bugs being introduced during future maintenance.
I think a conceptual subtlety with the current code is the following:
flow-go/module/mempool/consensus/exec_fork_suppressor.go, lines 129 to 140 (at commit 75c573a)
This documentation is ambiguous at best, and misleading at worst given the current implementation:
- When I read that the "underlying mempool NEVER ejects seals" it is very intuitive to assume: If I put a seal into the mempool, then I can retrieve it from the mempool.
- And while we do not eject seals from the mempool in the strict sense, I feel we are practically behaving in an unintuitive way:
- if a seal is hidden because it has only one EN committing to it, this is functionally indistinguishable from the seal being ejected (unless a second EN commits to the result). And this is exactly the detail that most people will likely intuitively get wrong.
We also didn't understand the source of the error while firefighting. So to me, this indicates that there is more to this issue than just fixing the edge case (potentially increasing the complexity further).
Atm, I am not clear what the best approach is. On the one hand, I'd prefer to have more intuitive code with lower maintenance risk. Likely, verification and sealing (incl. extensive checking mode) is not going to be done anytime soon, so we'll be running with those disaster prevention mechanisms for quite some time. On the other hand, they are just substitutes for a mature solution that will eventually hopefully no longer be needed, and we are really tight on dev time right now. So this might be tech debt worth taking on.
Will experiment a couple hours, at least expanding on the documentation.
Update
@jordanschalm PR (largely documentation) and some additional guardrails: #8472 (merged)
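The "hidden vs. ejected" subtlety described in the comment above can be made concrete with a toy Go model (all names here are hypothetical, not flow-go types): the pool never ejects a stored seal, yet `Get` only returns seals backed by two or more receipts, so a seal with a single receipt is observationally indistinguishable from one that was ejected.

```go
package main

import "fmt"

// pool is a toy model: seals are stored permanently, but Get hides any
// seal with fewer than two execution receipts.
type pool struct {
	stored   map[string]bool
	receipts map[string]int
}

func (p *pool) Add(sealID string)        { p.stored[sealID] = true }
func (p *pool) AddReceipt(sealID string) { p.receipts[sealID]++ }

// Get returns true only for stored seals with 2+ receipts.
func (p *pool) Get(sealID string) bool {
	return p.stored[sealID] && p.receipts[sealID] >= 2
}

func main() {
	p := &pool{stored: map[string]bool{}, receipts: map[string]int{}}
	p.Add("seal-A")
	p.AddReceipt("seal-A")       // one EN commits
	fmt.Println(p.Get("seal-A")) // false: stored, but hidden ("looks ejected")
	p.AddReceipt("seal-A")       // a second EN commits
	fmt.Println(p.Get("seal-A")) // true: now includable
}
```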
Codecov Report: ❌ Patch coverage is …
…objects, so there is no cached seal value to accidentally read without re-validating through the wrapped pool; the type name and its doc block make the superset semantics explicit.
the index `sealSet` stores IDs of incorporated results only, not seal …
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
module/mempool/consensus/exec_fork_suppressor.go (2)
320-352: ⚠️ Potential issue | 🟡 Minor

`lowestHeight` not updated after pruning non-empty state.

When `sealsForBlock` is non-empty, the method prunes entries but doesn't update `lowestHeight`. This causes:

- Suboptimal early rejection in `Add()` (line 145): seals below the pruned threshold still pass through to the wrapped pool
- Incorrect range calculation for the optimization at line 339 in subsequent prune calls
The wrapped pool enforces correctness, but the local state becomes inconsistent.
Proposed fix

```diff
 	} else {
 		for h := s.lowestHeight; h < height; h++ {
 			s.removeByHeight(h)
 		}
 	}
+	s.lowestHeight = height
 	return nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 320 - 352, PruneUpToHeight currently prunes entries when s.sealsForBlock is non-empty but never updates s.lowestHeight, leaving local state inconsistent; after acquiring the lock and performing the removals via s.removeByHeight (the branches iterating s.byHeight or from s.lowestHeight), set s.lowestHeight = height (ensure this happens while the mutex is held) so subsequent Add() and future PruneUpToHeight() calls use the updated lowestHeight for correct early rejection and range calculations.
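For context, here is a self-contained Go reconstruction of the pruning logic with the suggested fix applied. The struct is pared down to what the sketch needs and does not match the actual flow-go definitions; the early-return guard is also an assumption of this sketch.

```go
package consensus

import "sync"

type suppressor struct {
	mutex         sync.Mutex
	lowestHeight  uint64
	byHeight      map[uint64]map[[32]byte]struct{}  // height -> block IDs
	sealsForBlock map[[32]byte]map[[32]byte]struct{} // block ID -> result IDs
}

// removeByHeight drops all tracked blocks at height h from both indices.
func (s *suppressor) removeByHeight(h uint64) {
	for blockID := range s.byHeight[h] {
		delete(s.sealsForBlock, blockID)
	}
	delete(s.byHeight, h)
}

func (s *suppressor) PruneUpToHeight(height uint64) error {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	if height < s.lowestHeight {
		return nil // nothing below lowestHeight is tracked (sketch assumption)
	}
	if len(s.sealsForBlock) > 0 {
		// Optimization from the original code: if the index holds fewer
		// heights than the range to prune, iterate the index instead.
		if uint64(len(s.byHeight)) < height-s.lowestHeight {
			for h := range s.byHeight {
				if h < height {
					s.removeByHeight(h)
				}
			}
		} else {
			for h := s.lowestHeight; h < height; h++ {
				s.removeByHeight(h)
			}
		}
	}
	s.lowestHeight = height // the reviewer's fix: keep local state consistent
	return nil
}
```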
311-318: ⚠️ Potential issue | 🟡 Minor

`byHeight` index should be cleared for consistency.

The `Clear()` method resets `sealsForBlock` but doesn't clear `byHeight`. While this may not cause immediate issues when called during fork detection (since `execForkDetected` prevents further `Add()` calls), the interface contract promises to remove all entities. For completeness and to prevent stale state if `Clear()` is called in other contexts, `byHeight` should also be reset.

Proposed fix

```diff
 func (s *ExecForkSuppressor) Clear() {
 	s.mutex.Lock()
 	defer s.mutex.Unlock()
 	s.sealsForBlock = make(map[flow.Identifier]potentiallySealableResults)
+	s.byHeight = make(map[uint64]map[flow.Identifier]struct{})
 	s.seals.Clear()
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 311 - 318, Clear() currently resets s.sealsForBlock and calls s.seals.Clear() but leaves the byHeight index populated; update ExecForkSuppressor.Clear to also reset the byHeight map (e.g., s.byHeight = make(map[<keyType>]<valueType>) or the existing zeroing pattern used elsewhere) while holding s.mutex so all internal state (sealsForBlock, byHeight, and s.seals) is cleared consistently to honor the “remove all entities” contract.
🧹 Nitpick comments (1)
module/mempool/consensus/exec_fork_suppressor.go (1)
336-337: Minor documentation typo.

"range to prune" appears twice in the comment.

Proposed fix

```diff
-	// Optimization: if there are less height in the index than the height range to prune,
-	// range to prune, then just go through each seal.
+	// Optimization: if there are fewer heights in the index than the height range to prune,
+	// then just iterate through each indexed height.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 336 - 337, Fix the duplicated phrase in the comment in exec_fork_suppressor.go: replace the line "// Optimization: if there are less height in the index than the height range to prune, // range to prune, then just go through each seal." with a single corrected sentence (also fix grammar) such as "// Optimization: if there are fewer heights in the index than the height range to prune, then just go through each seal." so the duplicate "range to prune" is removed and wording is clearer.
📒 Files selected for processing (4)
- module/mempool/consensus/exec_fork_suppressor.go
- module/mempool/consensus/incorporated_result_seals.go
- module/mempool/incorporated_result_seals.go
- module/mempool/stdmap/incorporated_result_seals.go
✅ Files skipped from review due to trivial changes (2)
- module/mempool/stdmap/incorporated_result_seals.go
- module/mempool/consensus/incorporated_result_seals.go
Thanks for figuring out what the problem was. Great work 🙇
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
module/mempool/consensus/exec_fork_suppressor.go (2)
309-341: ⚠️ Potential issue | 🟠 Major

`lowestHeight` is not updated after pruning in the normal case.

When `sealsForBlock` is non-empty, the pruning loop executes but `lowestHeight` is never updated to `height`. This causes:

- Future `PruneUpToHeight` calls to iterate over already-pruned heights unnecessarily
- The `Add` method's height check (line 150) uses a stale `lowestHeight`, potentially allowing seals at heights that should have been rejected

Proposed fix to update lowestHeight after pruning

```diff
 	} else {
 		for h := s.lowestHeight; h < height; h++ {
 			s.removeByHeight(h)
 		}
 	}
+	s.lowestHeight = height
 	return nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 309 - 341, The PruneUpToHeight method (ExecForkSuppressor) never updates s.lowestHeight when s.sealsForBlock is non-empty, leaving a stale lowestHeight after you remove heights; after the pruning loop(s) (both the uint64(len(s.byHeight)) branch and the for h := s.lowestHeight branch) set s.lowestHeight = height while still holding s.mutex so future PruneUpToHeight and Add calls use the correct lowestHeight; ensure this assignment happens before the function returns.
300-307: ⚠️ Potential issue | 🟡 Minor

`byHeight` is not cleared, leading to inconsistent state.

`Clear()` resets `sealsForBlock` but leaves `byHeight` intact. This creates an inconsistency: `byHeight` maps heights to block IDs that no longer exist in `sealsForBlock`. While this may not cause immediate failures (the system halts on fork detection), it could cause issues if the state is later inspected or if the component is reused.

Proposed fix to reset all internal state

```diff
 func (s *ExecForkSuppressor) Clear() {
 	s.mutex.Lock()
 	defer s.mutex.Unlock()
 	s.sealsForBlock = make(map[flow.Identifier]potentiallySealableResults)
+	s.byHeight = make(map[uint64]map[flow.Identifier]struct{})
+	s.lowestHeight = 0
 	s.seals.Clear()
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 300 - 307, Clear() currently resets s.sealsForBlock and s.seals but leaves s.byHeight intact, causing stale mappings; update ExecForkSuppressor.Clear to also reset s.byHeight to an empty map (same concrete type as declared on the struct) while holding the mutex so all internal state is consistently cleared (i.e., add s.byHeight = make(<the byHeight map type>) before unlocking).
🧹 Nitpick comments (1)
module/mempool/consensus/exec_fork_suppressor.go (1)
237-242: Consider defensive check for missing `sealsForBlock` entry.

Line 237 assumes `sealsForBlock[seal.Seal.BlockID]` will always exist when the underlying pool returns a seal. While this is the expected invariant (per comments in `Remove`), if `irIDs` is nil, `len(irIDs) == 0` silently falls through to the expensive multi-seal path instead of detecting the inconsistency.

For consistency with the fatal check in `Remove()` (lines 272-275), consider adding a similar guard:

Proposed defensive check

```diff
 irIDs := s.sealsForBlock[seal.Seal.BlockID]
+if irIDs == nil {
+	s.mutex.RUnlock()
+	s.log.Fatal().Msg("inconsistent state detected: seal not in secondary index")
+}
 if len(irIDs) == 1 {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@module/mempool/consensus/exec_fork_suppressor.go` around lines 237 - 242, Add a defensive existence check for s.sealsForBlock[seal.Seal.BlockID] before relying on its length: in the function that reads irIDs (where the snippet uses irIDs := s.sealsForBlock[seal.Seal.BlockID]), verify the map lookup succeeded (e.g., via the comma-ok form) and if not, release the lock (s.mutex.RUnlock()) and fail consistently with the same fatal/panic behavior used in Remove() to surface the invariant breach; ensure this uses the same log/panic style and mentions the BlockID so the inconsistent state is clearly reported.
durkmurder left a comment
Great job, really like the documentation changes.
Context
In this recent incident, sealing halted because a consensus safety mechanism intended to prevent sealing inconsistent results was triggered. The mechanism was triggered for a forked result which had one EN and one VN attesting to it.
However, we require two ENs attesting to a result in order to seal it. Conceptually the safety mechanism should only trigger if we have two inconsistent includable seals; in this case it triggered with one includable seal (the correct result, which 4/5 ENs attested to) and one almost-includable seal (fork result, which 1/5 ENs and one VN attested to).
The "includable" property is enforced by the
IncorporatedResultSealmempool: it only returns seals for which there are two execution receipts.The reason for this behaviour is that the two read methods of the
ExecForkSuppressorbehave differently:Allonly checks seals returned from the underlyingIncorporatedResultSealmempool.Getcompares the requested seal (from the underlyingIncorporatedResultSealmempool) to an internal cache stored locally in theExecForkSuppressor. This internal cache is necessary becauseGetqueries a single seal by ID, but we need to compare that seal to all the other seals. However, the internal cache includes all seals ever added to the mempool, including those for which there are less than two execution receipts. So, theGetfunction has different behaviour and will more liberally trigger the consensus safety mechanism.The codepath that triggered the consensus safety mechanism during the incident used
Get, which is why it triggered with only one EN producing a forked result/receipt.Changes
This PR aims to make minimal changes to the existing logic so that
ExecForkSuppressoronly triggers the safety mechanism for includable seals:All(already works as desired)Getnow filters the internal cache by checking whether the underlyingIncorporatedResultSealmempool would return the seal. If the mempool returns the seal by ID, that indicates that there are 2 execution receipts for the seal.Summary by CodeRabbit
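As a minimal Go sketch of that conflict rule, under assumed names (`candidate`, `execForkDetected`): the alarm should fire only when at least two includable seals for the same block commit to different final states, so an under-attested forked result can never trigger it on its own.

```go
package consensus

// candidate models one seal candidate for a block; includable means the
// underlying IncorporatedResultSeal mempool would return it by ID, i.e.
// two or more execution receipts back it. Names are hypothetical.
type candidate struct {
	finalState [32]byte
	includable bool
}

// execForkDetected reports whether two includable candidates for the same
// block commit to different final states. Under-attested candidates are
// skipped, which is the behaviour this PR restores for Get.
func execForkDetected(candidates []candidate) bool {
	var first *[32]byte
	for i := range candidates {
		c := candidates[i]
		if !c.includable {
			continue // one EN + one VN is not enough to raise the alarm
		}
		if first == nil {
			first = &c.finalState
			continue
		}
		if *first != c.finalState {
			return true // two includable, inconsistent seals: a genuine fork
		}
	}
	return false
}
```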