Skip to content

[fix][broker]system topic was created with different partitions acrossing clusters after enabled namespace-level replication#25312

Open
poorbarcode wants to merge 5 commits intoapache:masterfrom
poorbarcode:fix/repl_change_events_conflict
Open

[fix][broker]system topic was created with different partitions acrossing clusters after enabled namespace-level replication#25312
poorbarcode wants to merge 5 commits intoapache:masterfrom
poorbarcode:fix/repl_change_events_conflict

Conversation

@poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Mar 11, 2026

Motivation

The situation that replication does not work as expected.

  • cluster: c1
    • namespace: public/default

The system topic __change_events has already been created as a non-partitioned topic; in other words, the topic named __change_events has no partition.

  • cluster: c2, has the following configurations
    • allowAutoTopicCreationType=partitioned
    • defaultNumPartitions=1

These two configurations mean that the new topic will be created as a partitioned topic and contain 1 partition.

If we enable replication for the namespace between two clusters, the following issue will happen

  • The cluster c2 will create a partitioned topic __change_events , and it will have a partition like this __change_events-partition-0
  • The cluster c1 will start replication for the non-partitioned topic __change_events, which triggers a non-partitioned topic creation on the c2 cluster.
  • c2 will also start a replication to c1 since it has a partitioned topic which has a partition named __change_events-partition-0, it triggers a partitioned creation on c1 cluster.

Eventually, both clusters will hold both partitioned topics and non-partitioned topics as follows, but they have the same name.

  • admin/partitioned-toipic: {partitions: 1}
  • managed-ledger/persistent: [__change_events-partition-0, __change_events]

Because this is a system topic, it will lead to fragmentation and cause serious problems

Modifications

  • Create a topic with the same type as the remote side, if the topic has already been created on the remote side.
  • The change only affects if the following conditions match
    • Enable replication
    • The topic creation was triggered by consumer/producer's starting
    • The remote side has already created the topic
    • broker.conf -> isCreateTopicToRemoteClusterForReplication is true
    • enabled topic auto-creation

This PR will not cause such an infinite loop: the local cluster queries the remote cluster for the topic type and partition count, and on the remote side, the request is handled by handlePartitionMetadataRequest. When determining the auto-creation policy, the remote cluster would otherwise query the local cluster again. However, during the initial query ("the local cluster queries the remote cluster for the topic type and partition count"), the request is already limited to existing topics only, so the remote cluster will not execute the logic for fetching the auto-creation policy.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 4.2.0 milestone Mar 11, 2026
@poorbarcode poorbarcode self-assigned this Mar 11, 2026
@poorbarcode poorbarcode added type/bug The PR fixed a bug or issue reported a bug ready-to-test release/4.1.4 release/4.0.10 labels Mar 11, 2026
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 11, 2026
@poorbarcode poorbarcode changed the title Fix/repl change events conflict [fix][broker]system topic was created with different partitions after enabled namespace-level replication Mar 11, 2026
@poorbarcode poorbarcode changed the title [fix][broker]system topic was created with different partitions after enabled namespace-level replication [fix][broker]system topic was created with different partitions acrossing clusters after enabled namespace-level replication Mar 11, 2026
@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 60.93750% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.70%. Comparing base (f1aac3d) to head (037a5a4).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
...rg/apache/pulsar/broker/service/BrokerService.java 60.93% 16 Missing and 9 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #25312      +/-   ##
============================================
+ Coverage     72.63%   72.70%   +0.07%     
+ Complexity    34632    34236     -396     
============================================
  Files          1967     1954      -13     
  Lines        156326   154769    -1557     
  Branches      17812    17713      -99     
============================================
- Hits         113542   112521    -1021     
+ Misses        33713    33184     -529     
+ Partials       9071     9064       -7     
Flag Coverage Δ
inttests 25.73% <12.50%> (+0.12%) ⬆️
systests 22.48% <10.93%> (+0.23%) ⬆️
unittests 73.68% <60.93%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...rg/apache/pulsar/broker/service/BrokerService.java 82.65% <60.93%> (-1.15%) ⬇️

... and 164 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@poorbarcode poorbarcode requested a review from coderzc March 13, 2026 06:24
Comment on lines +800 to +803
PersistentTopic persistentTopic1 = (PersistentTopic) broker1.getTopic(tp, false).join().get();
assertTrue(persistentTopic1.getReplicators().isEmpty());
PersistentTopic persistentTopic2 = (PersistentTopic) broker1.getTopic(tp, false).join().get();
assertTrue(persistentTopic2.getReplicators().isEmpty());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PersistentTopic persistentTopic1 = (PersistentTopic) broker1.getTopic(tp, false).join().get();
assertTrue(persistentTopic1.getReplicators().isEmpty());
PersistentTopic persistentTopic2 = (PersistentTopic) broker1.getTopic(tp, false).join().get();
assertTrue(persistentTopic2.getReplicators().isEmpty());
PersistentTopic persistentTopic1 = (PersistentTopic) broker1.getTopic(tp, false).join().get();
assertTrue(persistentTopic1.getReplicators().isEmpty());
PersistentTopic persistentTopic2 = (PersistentTopic) broker2.getTopic(tp, false).join().get();
assertTrue(persistentTopic2.getReplicators().isEmpty());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified

if (actEx instanceof PulsarClientException.NotFoundException
| actEx instanceof PulsarClientException.TopicDoesNotExistException
| actEx instanceof PulsarAdminException.NotFoundException) {
future.complete(TopicExistsInfo.newTopicNotExists());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is 3+ clusters, this branch returns TopicExistsInfo.newTopicNotExists() as soon as the first remote cluster says the topic is missing, the later clusters are never consulted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, take the first one as the standard. Since the second cluster has already been compared with the first cluster when creating the topic, we can assume that the old cluster is consistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-not-needed Your PR changes do not impact docs ready-to-test release/4.0.10 release/4.1.4 type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants