fix: widen ActorSelection SendConsistencySpec timeout to 60s #3090
Closed
He-Pin wants to merge 1 commit into
Motivation: The "must be able to send messages with actorSelection concurrently preserving order" test flakes on CI. 4 sender actors each drive 1000 round trips via ActorSelection (4004 messages total), and every message is wrapped in ActorSelectionMessage and incurs path traversal overhead on the remote side. On CI with -Dpekko.test.timefactor=2, within(30.seconds) dilates to 60s, which is insufficient under load: 3 of 4 senders complete, but the 4th times out waiting for the final success2 message.

Modification: Bump within(30.seconds) to within(60.seconds) at line 219 for the ActorSelection test only. The ActorRef variant (line 179) is unchanged because it passes within the original budget.

Result: Up to 120s of dilated wall-clock headroom on CI for the ActorSelection variant; still fails fast on genuine deadlocks.

Tests:
- sbt "remote / Test / compile": passes
- CI Check / Test will exercise the artery variants

References: Refs #3041 (previous timeout widening from 10s to 30s)
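For reviewers checking the budget arithmetic, here is a minimal sketch of how a time factor scales a within(...) timeout. The object DilationSketch and the helper dilate are hypothetical names used only for illustration; they are not the TestKit implementation, and the sketch assumes the factor is applied as a plain multiplication.

```scala
import scala.concurrent.duration._

object DilationSketch {
  // Hypothetical helper (not the TestKit API): scale a timeout by the
  // configured time factor, assuming plain multiplication on nanoseconds.
  def dilate(timeout: FiniteDuration, timeFactor: Double): FiniteDuration =
    Duration.fromNanos((timeout.toNanos * timeFactor).round)

  // Budgets discussed in this PR, with -Dpekko.test.timefactor=2.
  val oldBudget: FiniteDuration = dilate(30.seconds, 2.0) // 60 seconds
  val newBudget: FiniteDuration = dilate(60.seconds, 2.0) // 120 seconds
}
```

With a factor of 2, the original 30s budget gives 60s of wall-clock time and the widened 60s budget gives 120s, matching the figures quoted above.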
Member
The test is now failing in Scala 3.3.
Member
Author
Closing in favor of #3092, which fixes the root cause. The timeout widening only masks the symptom. The actual issue is that all ActorSelection messages were routed to a single outbound lane; #3092 distributes them across lanes based on their target path hash instead, eliminating the structural bottleneck.
He-Pin added a commit that referenced this pull request Jun 19, 2026
… by target path

Motivation: With a multi-lane artery config (outbound-lanes > 1), all ActorSelection messages were routed to the same outbound queue because selectQueue used the anchor's UID (root guardian, always 0) as the distribution key: math.abs(0 % N) = 0 for any N. This concentrated all ActorSelection traffic on a single lane, creating a throughput bottleneck.

Modification: Handle ActorSelectionMessage in a dedicated case that distributes across lanes based on the hash of the target path elements instead of the anchor's UID. PriorityMessage ActorSelection (cluster heartbeats) continues to use the control queue. Uses (hash & Int.MaxValue) to guard against Integer.MIN_VALUE producing a negative queue index.

Result: ActorSelection messages are distributed across all outbound lanes by target path. Per-path message ordering is preserved (same path → same lane). PriorityMessage routing and all other message types are unaffected.

Tests:
- sbt "remote / Test / compile": passes
- sbt "remote / Test / testOnly *ActorSelectionQueueDistributionSpec": 5/5
- CI will exercise the artery variants

References: Refs #3041, supersedes #3090
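To make the lane arithmetic in this commit message concrete, here is a small sketch contrasting the two distribution keys. It is not the code from #3092: the object LaneSelectionSketch and its methods are hypothetical, and using the hashCode of a Seq[String] as the "target path elements hash" is an assumption made for illustration.

```scala
object LaneSelectionSketch {
  // Key described as the old behaviour: the anchor's UID. For the root
  // guardian the UID is always 0, so every message lands on lane 0.
  def laneByAnchorUid(anchorUid: Int, lanes: Int): Int =
    math.abs(anchorUid % lanes)

  // Masking with Int.MaxValue clears the sign bit, so the index is never
  // negative. math.abs(hash) would not help for Int.MinValue, because
  // math.abs(Int.MinValue) is still Int.MinValue.
  def laneByHash(hash: Int, lanes: Int): Int =
    (hash & Int.MaxValue) % lanes

  // Assumed key for the new behaviour: a hash of the target path elements.
  // The same path always hashes to the same lane, which keeps per-path order.
  def laneByPath(elements: Seq[String], lanes: Int): Int =
    laneByHash(elements.hashCode, lanes)
}
```

For example, LaneSelectionSketch.laneByAnchorUid(0, 4) is 0 for any lane count, while LaneSelectionSketch.laneByPath(Seq("user", "echo"), 4) depends on the path and always yields an index from 0 to 3.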
Motivation
The "must be able to send messages with actorSelection concurrently preserving order" test flakes on CI. 4 sender actors each drive 1000 round trips via ActorSelection (4004 messages total), and every message is wrapped in ActorSelectionMessage and incurs path traversal overhead on the remote side. On CI with -Dpekko.test.timefactor=2, within(30.seconds) dilates to 60s, which is insufficient under load: 3 of 4 senders complete, but the 4th times out waiting for the final success2 message.

Investigation confirmed no recent code change caused this:
- LazyDispatch (ba4e950) has zero impact
- GraphInterpreter pendingFinalization (a505843) is a hot-path optimization, no behavioral change

Modification
Bump within(30.seconds) → within(60.seconds) at line 219 for the ActorSelection test only. The ActorRef variant (line 179) is unchanged because it passes within the original budget.

Result
Up to 120s of dilated wall-clock headroom on CI for the ActorSelection variant; still fails fast on genuine deadlocks.

Tests
- sbt "remote / Test / compile": passes

References
Refs #3041 (previous timeout widening from 10s to 30s)