Skip to content

[fix][broker] Handle missing replicator during snapshot request processing#25266

Open
Denovo1998 wants to merge 1 commit intoapache:masterfrom
Denovo1998:snapshot_request_when_replicator_removed_concurrently
Open

[fix][broker] Handle missing replicator during snapshot request processing#25266
Denovo1998 wants to merge 1 commit intoapache:masterfrom
Denovo1998:snapshot_request_when_replicator_removed_concurrently

Conversation

@Denovo1998
Copy link
Contributor

@Denovo1998 Denovo1998 commented Feb 26, 2026

Fixes #xyz

Main Issue: #xyz

PIP: #xyz

Motivation

A race condition can happen when a replicated subscription snapshot request is being processed while the corresponding replicator is removed concurrently. In this case, topic.getReplicators().get(...) may return null, which can lead to an exception path instead of graceful handling. This affects broker reliability under concurrent replication policy updates and marker handling.

Modifications

  • Added a null-check in ReplicatedSubscriptionsController#receivedSnapshotRequest for the source-cluster replicator.
  • When the replicator is missing, the broker now logs a warning with topic, snapshot id, and source cluster, then safely returns.
  • Kept the existing reconnect path (startReplProducers) for non-null but disconnected replicators.
  • Added ReplicatedSubscriptionsControllerTest#testSnapshotRequestWhenReplicatorRemovedConcurrentlyDoesNotThrow.
  • The test deterministically reproduces the race by blocking replicators.get(...), removing the replicator through PersistentTopic#removeReplicator, then resuming marker handling.
  • The test verifies replicator termination and cursor cleanup are invoked through the production removal path, and confirms marker handling completes without throwing.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: Denovo1998#22

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.63%. Comparing base (24eba10) to head (7ceb553).
⚠️ Report is 20 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #25266       +/-   ##
=============================================
+ Coverage     37.65%   72.63%   +34.97%     
- Complexity    13351    34417    +21066     
=============================================
  Files          1902     1959       +57     
  Lines        151237   155442     +4205     
  Branches      17238    17739      +501     
=============================================
+ Hits          56954   112900    +55946     
+ Misses        86577    33541    -53036     
- Partials       7706     9001     +1295     
Flag Coverage Δ
inttests 25.60% <0.00%> (-0.29%) ⬇️
systests 22.43% <0.00%> (-0.02%) ⬇️
unittests 73.60% <100.00%> (+39.26%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../persistent/ReplicatedSubscriptionsController.java 75.00% <100.00%> (+75.00%) ⬆️

... and 1420 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants