
fix(artery): distribute ActorSelection messages across lanes by target path#3098

Closed
He-Pin wants to merge 1 commit into main from fix/remote-send-consistency-flaky

@He-Pin He-Pin commented Jun 19, 2026

Member

Motivation

The "must be able to send messages with actorSelection concurrently preserving order" test flakes on CI: 4 sender actors each drive 1000 round-trips via ActorSelection (4004 messages total). With a multi-lane Artery configuration (outbound-lanes > 1), all ActorSelection messages are routed to the same outbound queue because selectQueue uses the anchor's UID as the distribution key:

OrdinaryQueueIndex + (math.abs(r.path.uid % outboundLanes))

The anchor for ActorSelection is the root guardian (RootActorPath), whose UID is always 0 (ActorCell.undefinedUid). So math.abs(0 % N) = 0 for any N — all ActorSelection traffic concentrates on lane 0 while other lanes sit idle.
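
To make the arithmetic concrete, here is a minimal sketch of the old index computation in isolation. The object name, the `laneForUid` helper, and the value chosen for `OrdinaryQueueIndex` are assumptions for illustration only; this is not the actual `Association` code.

```scala
object AnchorUidLaneDemo {
  // Assumed value for illustration; the real constant is defined in Artery.
  val OrdinaryQueueIndex: Int = 2

  // Mirrors the expression quoted above: the lane is derived from the UID only.
  def laneForUid(uid: Int, outboundLanes: Int): Int =
    OrdinaryQueueIndex + math.abs(uid % outboundLanes)

  def main(args: Array[String]): Unit = {
    val rootGuardianUid = 0 // the anchor UID seen for ActorSelection sends
    for (lanes <- 1 to 4)
      println(s"outboundLanes=$lanes -> queue index ${laneForUid(rootGuardianUid, lanes)}")
    // Every line reports OrdinaryQueueIndex, i.e. the first ordinary lane.
  }
}
```

Whatever the lane count, a UID of 0 yields offset 0, so only the first ordinary queue is ever selected.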

Similarly, inbound lane partitioning uses the wire recipient's UID (root guardian = 0), concentrating all inbound ActorSelection processing on a single inbound lane.

This is not a recent regression — the selectQueue logic has been unchanged since the Pekko fork from Akka.

Modification

Outbound (Association.send): Add a dedicated case sel: ActorSelectionMessage that computes the queue index from the selection's target path elements hash instead of the anchor's UID:

case sel: ActorSelectionMessage =>
  sel.msg match {
    case _: PriorityMessage =>
      // cluster heartbeats stay on control queue
      controlQueue.offer(outboundEnvelope)
    case _ =>
      val queueIndex =
        if (outboundLanes == 1) OrdinaryQueueIndex
        else OrdinaryQueueIndex + ((sel.elements.hashCode() & Int.MaxValue) % outboundLanes)
      queues(queueIndex).offer(outboundEnvelope)
  }

Inbound (ArteryTransport.inboundLanePartitioner): Parse the ActorSelectionMessage's target path from the envelope byte buffer and use it as the destination hash key, distributing inbound ActorSelection processing across all inbound lanes.

ActorSelection messages that wrap a PriorityMessage (used by cluster heartbeats) continue to go through the control queue unchanged.

Result

ActorSelection messages are distributed across all outbound and inbound lanes based on target path, eliminating the single-lane throughput bottleneck while preserving per-path message ordering (same target path → same hash → same lane).
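
The ordering argument can be illustrated with a small simulation. This is a sketch under stated assumptions: path elements are modelled as plain `Seq[String]` values standing in for `sel.elements`, the lanes are simple in-memory FIFO queues, and `laneFor` and `distribute` are hypothetical helpers, not Pekko APIs.

```scala
import scala.collection.mutable

object SelectionLaneSketch {
  type Msg = (Seq[String], Int) // (target path elements, sequence number)

  // Same masking as in the PR: clear the sign bit, then take the modulo.
  def laneFor(elements: Seq[String], lanes: Int): Int =
    if (lanes == 1) 0 else (elements.hashCode() & Int.MaxValue) % lanes

  // Offers each message to the FIFO queue chosen by its target path.
  def distribute(msgs: Seq[Msg], lanes: Int): Vector[mutable.Queue[Msg]] = {
    val queues = Vector.fill(lanes)(mutable.Queue.empty[Msg])
    msgs.foreach { case m @ (path, _) => queues(laneFor(path, lanes)).enqueue(m) }
    queues
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical target paths; the names are made up for the example.
    val paths = Seq(Seq("user", "echoA"), Seq("user", "echoB"), Seq("user", "echoC"))
    // Interleave messages so that each path sees sequence numbers 1..5 in order.
    val msgs: Seq[Msg] = for (n <- 1 to 5; p <- paths) yield (p, n)
    distribute(msgs, 3).zipWithIndex.foreach { case (q, i) =>
      val shown = q.map { case (p, n) => p.mkString("/") + "#" + n }.mkString(", ")
      println(s"lane $i: $shown")
    }
  }
}
```

Because the lane depends only on the path elements, every message for a given path lands in one queue and keeps its relative order there. Messages for different paths may share a lane or not, depending on their hashes.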

Tests

sbt -Dpekko.test.timefactor=2 "remote / Test / testOnly *SendConsistency*"
→ 24/24 passed (all 6 spec variants: Upd/Tcp/TlsTcp × 1-lane/3-lanes)

sbt "remote / mimaReportBinaryIssues" → clean

References

Fixes #3089

@He-Pin He-Pin marked this pull request as draft June 19, 2026 22:48

He-Pin commented Jun 19, 2026

Member Author

Superseded by #3092, which uses more optimized CodedInputStream-based protobuf parsing and includes test coverage.

@He-Pin He-Pin closed this Jun 19, 2026

Development

Successfully merging this pull request may close these issues.

AbstractRemoteSendConsistencySpec actorSelection preserving order test now flaky