Fix broker stuck in SYNCHRONIZING on DB error during rollback#4995
Open
johha
previously approved these changes
Apr 8, 2026
When a service broker update job fails and attempts to revert the
broker state, a database connection failure could cause the job to
crash without properly handling the original error. This left the
broker stuck in SYNCHRONIZING state with a FAILED job.
This change wraps the state rollback operation in error handling to
catch database errors and allow the original exception to be raised
and the job to be retried properly.
Changes:
- app/jobs/v3/services/update_broker_job.rb: Add error handling around
ServiceBroker.where().update() call in rescue block to gracefully
handle database disconnections during state rollback
- spec/unit/jobs/v3/services/update_broker_job_spec.rb: Add test case
for database disconnect during state rollback
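
The commit message above can be illustrated with a minimal sketch. It is not the actual UpdateBrokerJob code: FakeDatabaseDisconnectError, FakeBrokerStore and SketchUpdateBrokerJob are hypothetical in-memory stand-ins, and the real job updates ServiceBroker rows through the database. The point it shows is that a failure inside the rollback is rescued separately, so the original exception is the one that reaches the caller.

```ruby
# Hypothetical stand-in for a database connection error.
class FakeDatabaseDisconnectError < StandardError; end

# Hypothetical in-memory broker record; the real job updates ServiceBroker rows.
class FakeBrokerStore
  attr_reader :state
  attr_accessor :fail_on_update

  def initialize(state)
    @state = state
    @fail_on_update = false
  end

  def update_state(new_state)
    raise FakeDatabaseDisconnectError, 'connection lost' if fail_on_update

    @state = new_state
  end
end

# Illustrative job: not the real UpdateBrokerJob.
class SketchUpdateBrokerJob
  attr_reader :rollback_errors

  def initialize(store, previous_state)
    @store = store
    @previous_state = previous_state
    @rollback_errors = []
  end

  def perform
    @store.update_state('SYNCHRONIZING')
    yield # the broker update work happens here
    @store.update_state('AVAILABLE')
  rescue StandardError => e
    rollback_broker_state
    raise e # surface the original error, not the rollback error
  end

  private

  # A database error during rollback is recorded instead of propagating,
  # so it cannot replace the original exception.
  def rollback_broker_state
    @store.update_state(@previous_state)
  rescue FakeDatabaseDisconnectError => e
    @rollback_errors << e.message
  end
end
```

In this sketch, if the database is unavailable during rollback the broker is still left in SYNCHRONIZING; the job only fails with the right error so that it can be retried.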
johha
previously approved these changes
Apr 9, 2026
When UpdateBrokerJob exhausts retries and transitions to FAILED,
invoke recover_from_failure to revert broker from SYNCHRONIZING
back to previous state. This ensures brokers don't remain stuck
when jobs fail during extended database outages.
Changes:
- UpdateBrokerJob: Add recover_from_failure method with conditional
WHERE clause to safely revert SYNCHRONIZING brokers
- PollableJobWrapper: Call recover_from_failure hook in failure method
- Extract rollback_broker_state and destroy_update_request helpers
for cleaner error handling
- Add tests for recovery hook behavior and edge cases
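
A rough sketch of the recovery hook described above, under stated assumptions: FakeBrokerTable, SketchBrokerJob and SketchJobWrapper are hypothetical names, the table is an in-memory hash rather than a real database, and the respond_to? check in the wrapper is an illustrative choice, not necessarily what PollableJobWrapper does. It shows why the conditional WHERE clause matters: the revert only applies to a broker that is still in SYNCHRONIZING.

```ruby
# Hypothetical in-memory stand-in for the brokers table.
class FakeBrokerTable
  def initialize
    @rows = {}
  end

  def insert(guid, state)
    @rows[guid] = state
  end

  def state_of(guid)
    @rows[guid]
  end

  # Mimics "UPDATE ... SET state = new_state WHERE guid = ? AND state = ?".
  # Returns the number of rows changed.
  def update_where(guid:, expected_state:, new_state:)
    return 0 unless @rows[guid] == expected_state

    @rows[guid] = new_state
    1
  end
end

# Illustrative job exposing the recovery hook.
class SketchBrokerJob
  def initialize(table, guid, previous_state)
    @table = table
    @guid = guid
    @previous_state = previous_state
  end

  # Reverts the broker only if it is still SYNCHRONIZING, so a newer
  # state written by another operation is left untouched.
  def recover_from_failure
    @table.update_where(guid: @guid,
                        expected_state: 'SYNCHRONIZING',
                        new_state: @previous_state)
  end
end

# Illustrative wrapper: when the job ends up FAILED, it gives the wrapped
# job a chance to clean up.
class SketchJobWrapper
  def initialize(handler)
    @handler = handler
  end

  def failure
    @handler.recover_from_failure if @handler.respond_to?(:recover_from_failure)
  end
end

table = FakeBrokerTable.new
table.insert('broker-1', 'SYNCHRONIZING')
SketchJobWrapper.new(SketchBrokerJob.new(table, 'broker-1', 'AVAILABLE')).failure
puts table.state_of('broker-1')
```

Returning the changed-row count from the conditional update lets a caller tell whether the revert actually happened, which can be useful for logging.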
When a service broker update job fails, a database connection failure during state rollback could leave the broker permanently stuck in SYNCHRONIZING state with a FAILED job. This PR adds comprehensive error handling and a recovery hook to ensure brokers are reverted to their previous state even when database outages occur during job failure.
Changes:
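a) app/jobs/v3/services/update_broker_job.rb:
- Add error handling around the state rollback in the rescue block
- Add a recover_from_failure hook that runs when the job transitions to FAILED
- Use a conditional update (only brokers still in SYNCHRONIZING) to prevent overwriting newer broker states
b) app/jobs/pollable_job_wrapper.rb:
- Call the recover_from_failure hook in the failure method
c) spec/unit/jobs/v3/services/update_broker_job_spec.rb:
d) spec/unit/jobs/pollable_job_wrapper_spec.rb: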
A short explanation of the proposed change:
This PR adds multi-layered error handling for broker state management during job failures:
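- The state rollback in the rescue block is wrapped in error handling, so a database error there no longer masks the original exception
- A recover_from_failure hook runs when the job transitions to FAILED
- Only SYNCHRONIZING brokers are reverted, protecting against overwriting newer states
An explanation of the use cases your change solves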
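This change solves a critical production issue where service brokers become permanently stuck in SYNCHRONIZING state. This occurs when:
1. A broker update job fails and attempts to revert the broker to its previous state (e.g. AVAILABLE)
2. A database connection failure occurs during that rollback
3. The broker is left in SYNCHRONIZING state with a FAILED job, requiring manual intervention
With this change, when the job transitions to FAILED, the recover_from_failure hook is invoked by PollableJobWrapper, which attempts to revert the broker state. The conditional WHERE clause prevents overwriting any newer state that may have been set, ensuring safe recovery.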
I have reviewed the contributing guide
I have viewed, signed, and submitted the Contributor License Agreement
I have made this pull request to the main branch
I have run all the unit tests using bundle exec rake
I have run CF Acceptance Tests