PHOENIX-7870 :- Per-HA-group poller futures and url1/url2 alternation in GetClusterRoleRecordUtil by lokiore · Pull Request #2490 · apache/phoenix

lokiore · 2026-05-28T22:19:45Z

What changes were proposed in this pull request?

Two correctness fixes in GetClusterRoleRecordUtil's non-active CRR poller infrastructure:

Bug 1: Per-HA-group future tracking. The previous implementation kept a single static volatile ScheduledFuture<?> pollerFuture field that every schedulePoller(...) invocation overwrote, regardless of haGroupName. When the active-CRR detection branch later cancelled pollerFuture, it cancelled whichever future had been scheduled most recently, which could belong to a different HA group than the one whose lambda was running. The field is replaced with a ConcurrentHashMap<String, ScheduledFuture<?>> futureMap keyed by haGroupName. The pre-existing schedulerMap is handled symmetrically: it is now final, and the HA group's entry is removed from it on the active-CRR cancel path.
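
The per-group bookkeeping can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class name, the cancelPoller helper, the shared scheduler argument, and the choice to cancel a previously registered future for the same group are all assumptions made for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one ScheduledFuture per HA group instead of one shared field.
final class PerGroupPollerSketch {
    // Keyed by haGroupName, so cancelling one group cannot touch another group's future.
    static final ConcurrentMap<String, ScheduledFuture<?>> FUTURE_MAP =
        new ConcurrentHashMap<>();

    // Assumed signature; the real schedulePoller takes url1/url2 and more arguments.
    static void schedulePoller(ScheduledExecutorService scheduler,
                               String haGroupName, Runnable task) {
        ScheduledFuture<?> future =
            scheduler.scheduleWithFixedDelay(task, 0, 1, TimeUnit.SECONDS);
        // Assumption: a re-schedule for the same group replaces and cancels the old future.
        ScheduledFuture<?> previous = FUTURE_MAP.put(haGroupName, future);
        if (previous != null) {
            previous.cancel(false);
        }
    }

    // Cancels only the future registered for this HA group and drops its map entry.
    static boolean cancelPoller(String haGroupName) {
        ScheduledFuture<?> future = FUTURE_MAP.remove(haGroupName);
        return future != null && future.cancel(false);
    }
}
```

With this shape, scheduling pollers for two groups and then calling cancelPoller for the first leaves the second group's future running, which is the invariant Bug 1 violated.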

Bug 2: url1/url2 alternation each tick. The previous implementation pinned each scheduled poller to the single URL passed in at schedule time. If that cluster's RegionServer Endpoint became transiently unreachable, the poller could not observe the peer cluster's CRR while the outage lasted, even after the peer became Active. The poller now alternates between url1 and url2 each tick (even ticks → url1, odd ticks → url2). A failed tick still increments the counter, so alternation continues uninterrupted on the next iteration.
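
The alternation rule can be illustrated with a small sketch. The name selectUrlForTick is taken from the test names in this PR, but its signature here is assumed; pollOnce, the fetch callback, and the null-on-failure convention are hypothetical and exist only to show that a failed tick still advances the counter.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical sketch of even/odd URL alternation for a poller tick.
final class UrlAlternationSketch {
    // Even ticks pick url1, odd ticks pick url2 (assumed signature).
    static String selectUrlForTick(long tick, String url1, String url2) {
        return (tick & 1L) == 0 ? url1 : url2;
    }

    // One poll attempt. The counter is advanced before the fetch, so a failure
    // on this tick still moves the next tick to the other URL.
    static String pollOnce(AtomicLong tickCounter, String url1, String url2,
                           Function<String, String> fetch) {
        long tick = tickCounter.getAndIncrement();
        String url = selectUrlForTick(tick, url1, url2);
        try {
            return fetch.apply(url);
        } catch (RuntimeException e) {
            // Swallowed for the sketch; a real poller would log and retry next tick.
            return null;
        }
    }
}
```

If the fetch against url1 keeps failing, successive pollOnce calls still visit url1, url2, url1, url2, so the peer's record is observed on every second tick.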

Method signatures updated: fetchClusterRoleRecord(url1, url2, primaryUrl, haGroupName, ...) and schedulePoller(url1, url2, haGroupName, ...). Call sites in HighAvailabilityGroup.getClusterRoleRecordFromEndpoint now pass both URLs explicitly while preserving the existing per-call-site primary-URL ordering.

JIRA: https://issues.apache.org/jira/browse/PHOENIX-7870

Why are the changes needed?

The bugs surface in deployments where multiple HA groups are configured in the same JVM, or where one of the two clusters' RegionServer Endpoints experiences a transient outage. Bug 1 can cancel an unrelated HA group's poller, silently stopping that group's recovery loop. Bug 2 can stall non-active CRR detection indefinitely if the polled URL's cluster is the one having issues, even when the peer cluster has already become Active.

Does this PR introduce any user-facing change?

No

The only signature changes are on a package-private internal method (schedulePoller) and on the public utility entry point fetchClusterRoleRecord, which is consumed only by HighAvailabilityGroup (within phoenix-core-client). External consumers do not call these directly.

How was this patch tested?

New unit test class GetClusterRoleRecordUtilTest (4 tests, all PASS):

testSelectUrlForTickAlternates — verifies even/odd alternation across the first six ticks
testSelectUrlForTickHandlesLargeTickValues — guards against sign issues at large tick values including Long.MAX_VALUE
testFutureMapIsolatesEntriesPerHaGroup — verifies distinct HA groups produce distinct future-map entries (Bug 1 invariant)
testCancelOneHaGroupDoesNotCancelOthers — verifies cancelling one HA group's poller leaves peers untouched (Bug 1 behavioural invariant)
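
The sign issue that the large-tick test guards against can be shown with a short sketch. This does not claim to be how the PR computes parity; the class and method names are hypothetical. It only illustrates why a `% 2 == 1` check misclassifies odd negative values, which a long counter reaches once it wraps past Long.MAX_VALUE.

```java
// Hypothetical sketch: parity checks on a long tick counter that may wrap around.
final class TickParitySketch {
    // Naive check: Java's % keeps the sign of the dividend, so odd negative
    // ticks yield -1 and are reported as "not odd".
    static boolean isOddNaive(long tick) {
        return tick % 2 == 1;
    }

    // Sign-safe check: the lowest bit is 1 for every odd value in two's complement.
    static boolean isOdd(long tick) {
        return (tick & 1L) != 0;
    }
}
```

Both checks agree at Long.MAX_VALUE, which is odd. They differ at Long.MIN_VALUE + 1, the second value after the counter wraps, where only the bit-mask version reports an odd tick.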

Local commands run on PHOENIX-7562-feature-new HEAD:

mvn install -DskipTests                                                # full repo install — BUILD SUCCESS
mvn -pl phoenix-core-client compile                                    # prod-only compile — BUILD SUCCESS
mvn -pl phoenix-core test -Dtest=GetClusterRoleRecordUtilTest          # Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

… in GetClusterRoleRecordUtil

Bug 1: Per-HA-group future tracking

Replaces the single static volatile pollerFuture field (which was overwritten on every schedulePoller call regardless of haGroupName, so cancelling one HA group's poller would target whichever future was scheduled most recently, possibly belonging to a different HA group) with a ConcurrentHashMap<String, ScheduledFuture<?>> keyed by haGroupName. Symmetric handling for the existing schedulerMap (now also removed from the map on the active-CRR cancel path).

Bug 2: url1/url2 alternation each tick

Replaces the single-URL poller (which would stall progress if its target cluster's RegionServer Endpoint became transiently unreachable while the peer cluster held the Active role) with even/odd-tick alternation between url1 and url2.

Method signatures updated: fetchClusterRoleRecord and schedulePoller now accept both URLs explicitly.

Generated-by: Claude Code (Opus 4.7)

lokiore force-pushed the PHOENIX-7870-poller-bug-fixes branch from 1782804 to 13b6500 on May 28, 2026 22:25

lokiore wants to merge 1 commit into apache:PHOENIX-7562-feature-new from lokiore:PHOENIX-7870-poller-bug-fixes
