fix(purger): preserve partial-batch progress in error log on mid-loop failure#55
Open
endrju19 wants to merge 1 commit into
Open
fix(purger): preserve partial-batch progress in error log on mid-loop failure#55endrju19 wants to merge 1 commit into
endrju19 wants to merge 1 commit into
Conversation
… failure OutboxPurger.tick() purges in a do-while loop where each iteration runs an independent transaction. When iteration N throws, batches 0..N-1 are already durably committed -- but the previous error log emitted only the bare exception with no count, leaving an operator paged on "Outbox purge failed" no way to tell whether 0 or 9000 entries were purged before the failure without querying the database directly. Hoist totalDeleted and batches out of the try block so the catch handler can include them in the error log. No behavioural change beyond the log message format -- the catch is correctly placed at the scheduler task boundary and must stay broad to prevent ScheduledExecutorService from cancelling future ticks. Test infrastructure: swap testRuntimeOnly(slf4j-simple) for testImplementation(logback-classic) in okapi-core so the new test can capture ILoggingEvent via ListAppender. Logback remains the sole SLF4J binding on the test classpath -- no multi-binding warning -- and production stays free of any logging backend. The new test asserts on the logger's typed argument array (Int batches, Int totalDeleted), not on the formatted message text, so future log-wording tweaks won't break the regression check.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
OutboxPurger.tick()purges DELIVERED entries in ado { ... } whileloop where each iteration runs an independent transaction. When iteration N throws (transient DB error, lock timeout, etc.), iterations 0..N-1 committed in their own transactions and their deletes are durable — but the previous error log emitted only the bare exception with no count, so an operator paged on "Outbox purge failed" had no way to tell whether 0 or 9000 entries were purged before the failure without inspecting the database directly.Why this matters
"Outbox purge failed after 7 batches (700 entries purged this tick), will retry at next scheduled interval"instead of just"Outbox purge failed, ...". Tells them at a glance whether the failure was early (real outage) or late (transient hiccup after most of the work was done).config.intervallater will re-issue the purge from a fresh cutoff. If N-1 batches succeeded, the next tick has N-1 fewer rows to scan. Logging the count makes this visible without DB inspection.What changed
OutboxPurger.kt— hoisttotalDeletedandbatchesout of thetryblock so thecatchhandler can include them as logger arguments. No behavioural change beyond the log message format — the broad catch is correctly placed at the scheduler-task boundary (must stay broad to preventScheduledExecutorServicefrom cancelling future ticks).okapi-core/build.gradle.kts— swaptestRuntimeOnly(libs.slf4jSimple)fortestImplementation(libs.logbackClassic). Logback is now the sole SLF4J binding on the test classpath (no multi-binding warning) and is required by the newListAppendercapture in the test. Production code stays free of any logging backend.OutboxPurgerTest.kt— new test"error log preserves partial batch progress on mid-loop failure". Asserts on the typed argument array (Int batches,Int totalDeleted) attached to the capturedILoggingEvent, not on the formatted message text — so future log-wording tweaks won't break the regression check.How this was identified
Multi-agent review of #54 (
pr-review-toolkit:silent-failure-hunter) flagged this as MEDIUM. #54's body explicitly defers the logging improvement to a separate follow-up; this PR is that follow-up.Out of scope
OutboxScheduler.tick()catch — no batch loop, no partial-progress information to lose.okapi-micrometer; this PR is logging-only.Compatibility with #54
Branches off
main(wheretransactionRunneris still nullable with elvis fallback). When #54 lands, this commit rebases trivially — the hoisted-counter pattern is structurally identical regardless of whether the runner is nullable.Test plan
./gradlew :okapi-core:test— green./gradlew ktlintCheck— green./gradlew build— full build green (all tests, all modules)