test: isolate SendConsistency specs into separate JVM forks #3099

Closed
He-Pin wants to merge 2 commits into main from fix/ci-remote-send-consistency-fork


He-Pin commented Jun 19, 2026

Member

Motivation

ArteryTlsTcpSendConsistencyWithOneLaneSpec flakes on CI because it creates 2 ActorSystems with TLS-TCP transport and runs 1000 round-trip message exchanges via ActorSelection. When other test classes run in the same forked JVM (the remote module uses a single fork), lingering ActorSystem threads from previous tests consume CPU and compete with the TLS operations, causing the test to exceed its 60-second timeout (30s × timefactor=2).

Even with Test / parallelExecution := false, test classes share the same JVM and their ActorSystem cleanup threads (TLS connections, scheduler threads, dispatcher pools) overlap.

Modification

Add Tests.Group configuration to the remote module that partitions *SendConsistency* test classes into their own SubProcess (forked JVM), while keeping all other remote tests in a shared "other" group:

Test / testGrouping := {
  val allTests = (Test / definedTests).value
  val (sendConsistencyTests, otherTests) =
    allTests.partition(_.name.contains("SendConsistency"))
  val defaultForkOptions = ForkOptions()
    .withRunJVMOptions((Test / javaOptions).value.toVector)
  val otherGroup = Tests.Group("other", otherTests, Tests.SubProcess(defaultForkOptions))
  val sendConsistencyGroups = sendConsistencyTests.map { t =>
    Tests.Group(t.name, Seq(t), Tests.SubProcess(defaultForkOptions))
  }
  otherGroup +: sendConsistencyGroups
}

Each of the 6 SendConsistency specs gets a clean JVM with no thread contention from previously-run test classes.

Result

SendConsistency specs run in isolated JVM forks, eliminating cross-test thread contention that caused the 1-lane TLS-TCP variant to flake. Other remote tests continue to share a single fork (no additional overhead).

Tests

sbt -Dpekko.test.timefactor=2 "remote / Test / testOnly *ArteryTlsTcpSendConsistencyWithOneLaneSpec": 4/4 passed

sbt "show remote / Test / testGrouping": verified 6 SendConsistency groups + 1 "other" group

References

Refs #3089

He-Pin marked this pull request as draft June 19, 2026 22:48
Motivation:
PekkoBuild.scala sets workingDirectory to the project root for all
forked test groups because some tests depend on the Pekko root being
the working directory. The ForkOptions() created here from scratch did
not carry that setting and defaulted to the module directory (remote/),
which could break tests that rely on the project root as their working
directory.

Modification:
Add .withWorkingDirectory(Some(new File(System.getProperty("user.dir"))))
to the ForkOptions, matching PekkoBuild.scala:225-237.
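
Taken together with the grouping from the first commit, the resulting setting might look roughly like the sketch below. It assumes the fragment sits in the remote module's settings and that sbt is launched from the Pekko root, so that user.dir resolves to the project root. The value name forkOptions is illustrative, and the actual code in PekkoBuild.scala may be organized differently.

```scala
// Sketch only: build-definition fragment for the remote module (sbt 1.x DSL).
Test / testGrouping := {
  val allTests = (Test / definedTests).value

  // Split test classes by name: SendConsistency specs vs. everything else.
  val (sendConsistencyTests, otherTests) =
    allTests.partition(_.name.contains("SendConsistency"))

  // Fork options built from scratch must set the working directory explicitly.
  // Assumption: user.dir is the project root because sbt was started there.
  val forkOptions = ForkOptions()
    .withRunJVMOptions((Test / javaOptions).value.toVector)
    .withWorkingDirectory(Some(new File(System.getProperty("user.dir"))))

  // All remaining tests share one group, mirroring the original "other" group.
  val otherGroup = Tests.Group("other", otherTests, Tests.SubProcess(forkOptions))

  // One group per SendConsistency spec, each run as its own sub-process.
  val sendConsistencyGroups = sendConsistencyTests.map { t =>
    Tests.Group(t.name, Seq(t), Tests.SubProcess(forkOptions))
  }

  otherGroup +: sendConsistencyGroups
}
```

Reading the root from user.dir follows the commit text. It only yields the project root when the build is started from that directory.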

Result:
Forked SendConsistency test JVMs use the correct project root working
directory, consistent with all other test groups.

Tests:
sbt "show remote / Test / testGrouping": verified workingDirectory
  is set correctly on all groups

References:
Refs #3099
He-Pin commented Jun 20, 2026

Member Author

Closing: fork isolation helps with JVM-internal contention, but the 1-lane SendConsistency flakiness is caused by CPU contention at the CI host level (confirmed: the same test also fails on origin/main without this change). The lane distribution fix in #3092 is the more impactful change for the multi-lane variants.

He-Pin closed this Jun 20, 2026