test: isolate SendConsistency specs into separate JVM forks #3099
Closed
He-Pin wants to merge 2 commits into
Motivation:
ArteryTlsTcpSendConsistencyWithOneLaneSpec flakes on CI because it
creates 2 ActorSystems with TLS-TCP transport and runs 1000 round-trip
message exchanges via ActorSelection. When other test classes run in
the same forked JVM, lingering ActorSystem threads from previous tests
consume CPU and compete with the TLS operations, causing the test to
exceed its 60-second timeout (30s × timefactor=2).
Modification:
Add Tests.Group configuration to the remote module that partitions
SendConsistency test classes into their own SubProcess (forked JVM),
while keeping all other remote tests in a shared "other" group. Each
SendConsistency spec gets a clean JVM with no thread contention from
previously-run test classes.
Result:
SendConsistency specs run in isolated JVM forks, eliminating cross-test
thread contention that caused the 1-lane TLS-TCP variant to flake.
Tests:
sbt -Dpekko.test.timefactor=2 "remote / Test / testOnly *ArteryTlsTcpSendConsistencyWithOneLaneSpec" (4/4 passed)
sbt "show remote / Test / testGrouping" (verified 6 SendConsistency
groups + 1 "other" group)
References:
Refs #3089
Motivation:
PekkoBuild.scala sets workingDirectory to the project root for all
forked test groups because some tests depend on the Pekko root being
the working dir. The ForkOptions() created here from scratch missed
this setting, defaulting to the module directory (remote/) which could
break tests that rely on the project root as their working directory.
Modification:
Add .withWorkingDirectory(Some(new File(System.getProperty("user.dir"))))
to the ForkOptions, matching PekkoBuild.scala:225-237.
Result:
Forked SendConsistency test JVMs use the correct project root working
directory, consistent with all other test groups.
Tests:
sbt "show remote / Test / testGrouping" — verified workingDirectory
is set correctly on all groups
References:
Refs #3099
Closing: fork isolation helps with JVM-internal contention, but the 1-lane SendConsistency flakiness is caused by CI host-level CPU contention (confirmed: the same test also fails on origin/main without this change). The lane distribution fix in #3092 is the more impactful change for the multi-lane variants.
Motivation
ArteryTlsTcpSendConsistencyWithOneLaneSpec flakes on CI because it creates 2 ActorSystems with TLS-TCP transport and runs 1000 round-trip message exchanges via ActorSelection. When other test classes run in the same forked JVM (the remote module uses a single fork), lingering ActorSystem threads from previous tests consume CPU and compete with the TLS operations, causing the test to exceed its 60-second timeout (30s × timefactor=2).
Even with Test / parallelExecution := false, test classes share the same JVM and their ActorSystem cleanup threads (TLS connections, scheduler threads, dispatcher pools) overlap.
Modification
Add Tests.Group configuration to the remote module that partitions *SendConsistency* test classes into their own SubProcess (forked JVM), while keeping all other remote tests in a shared "other" group.
Each of the 6 SendConsistency specs gets a clean JVM with no thread contention from previously-run test classes.
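The diff itself is not reproduced on this page, so the following is only a rough sbt sketch of what such a grouping could look like, not the author's actual change. It assumes the setting lives in the remote module's build definition, that matching on the substring "SendConsistency" in the test class name is enough to select the specs, and that each isolated group is named after its test class (a hypothetical choice). The working-directory setting mirrors the one described in the second commit.

```scala
// Sketch only: assumed to sit in the settings of the remote module.
Test / testGrouping := {
  val allTests = (Test / definedTests).value

  // Assumption: a plain substring match selects the SendConsistency specs.
  val (isolated, others) = allTests.partition(_.name.contains("SendConsistency"))

  // ForkOptions built from scratch does not pick up the working directory
  // configured elsewhere in the build, so it is set explicitly here.
  val forkOptions = ForkOptions()
    .withWorkingDirectory(Some(new java.io.File(System.getProperty("user.dir"))))

  // One forked JVM per SendConsistency spec (group name is hypothetical).
  val isolatedGroups = isolated.map { test =>
    Tests.Group(test.name, Seq(test), Tests.SubProcess(forkOptions))
  }

  // All remaining remote tests keep sharing a single fork.
  val otherGroup = Tests.Group("other", others, Tests.SubProcess(forkOptions))

  isolatedGroups :+ otherGroup
}
```

Running sbt "show remote / Test / testGrouping" lists the resulting groups, which is how the commit message reports verifying the 6 SendConsistency groups plus the "other" group.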
Result
SendConsistency specs run in isolated JVM forks, eliminating cross-test thread contention that caused the 1-lane TLS-TCP variant to flake. Other remote tests continue to share a single fork (no additional overhead).
Tests
References
Refs #3089