Fix Dataflow test timeouts under JIT stress: add timeouts to SpinWait.SpinUntil calls #125677
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/ConcurrentTests.cs
@copilot how did you identify these tests as the likely point of hang - just code analysis?
Pull request overview
This PR updates System.Threading.Tasks.Dataflow.Tests to avoid indefinite hangs under JIT stress by ensuring previously-unbounded SpinWait.SpinUntil(...) loops now have a bounded timeout and fail with an assertion instead of timing out the entire test run.
Changes:
- Add a 30s timeout to previously unbounded `SpinWait.SpinUntil(...)` waits in several tests/helpers.
- Wrap these waits with `Assert.True(...)` so failures surface as test failures rather than Helix/work-item timeouts.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/ConcurrentTests.cs` | Adds 30s timeouts + assertions to multiple `SpinWait.SpinUntil` calls in block-construction helpers to prevent indefinite spinning. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformBlockTests.cs` | Adds 30s timeout + assertion around the `OutputCount == 10` spin wait in `TestCount`. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformManyBlockTests.cs` | Adds 30s timeout + assertion around the `OutputCount == 10` spin wait in `TestCount`. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformManyBlockTests.IAsyncEnumerable.cs` | Adds 30s timeout + assertion around the `OutputCount == 10` spin wait in `TestCountAsyncEnumerable`. |
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Yes, purely code analysis: I scanned all
@copilot why were spins running indefinitely in the first place? How can jitstress (which makes things slower and changes scheduling) turn "finishes in reasonable time" into "never"? Is the idea that there may be an underlying test/product issue, and this change is at least a guess at how to keep the test job as a whole from hanging so we can see more?
The spins aren't "logically" indefinite — the dataflow blocks will eventually process the posted messages. The issue is a resource starvation feedback loop specific to tight spin-waits under JIT stress:
So yes: there's no product correctness bug in Dataflow itself. The test design assumption ("this condition will be met quickly") breaks under extreme JIT stress because "quickly" degrades enough to exceed the outer timeout. This change makes individual tests fail with a clear assertion rather than taking the entire work item down, which at least gives us visibility into which specific test is lagging under stress.
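For illustration only, here is a minimal sketch of the difference between the unbounded and bounded forms of `SpinWait.SpinUntil`. It is not the PR's actual diff: the class `BoundedSpinSketch`, the method `ProduceAndWait`, and the counter standing in for a block's `OutputCount` are all hypothetical, and the xUnit `Assert.True` is replaced by a plain exception so the sample runs as a console program.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSpinSketch
{
    // Same 30 s value the PR describes; this local constant is a stand-in
    // for the shared test constant, not the real helper.
    public const int SpinTimeoutMs = 30_000;

    // Hypothetical stand-in for a dataflow block's OutputCount.
    private static int s_outputCount;

    public static bool ProduceAndWait(int expected, int timeoutMs)
    {
        Volatile.Write(ref s_outputCount, 0);

        // Background work that eventually satisfies the condition,
        // playing the role of the thread pool processing messages.
        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < expected; i++)
            {
                Thread.Sleep(5);
                Interlocked.Increment(ref s_outputCount);
            }
        });

        // Unbounded form (void overload, never gives up):
        //   SpinWait.SpinUntil(() => Volatile.Read(ref s_outputCount) == expected);
        // Bounded form: returns false if the timeout elapses first.
        bool reached = SpinWait.SpinUntil(
            () => Volatile.Read(ref s_outputCount) == expected, timeoutMs);

        producer.Wait();
        return reached;
    }

    public static void Main()
    {
        bool reached = ProduceAndWait(10, SpinTimeoutMs);

        // In an xUnit test this check would be Assert.True(reached).
        if (!reached)
            throw new TimeoutException("Condition was not met within the timeout.");

        Console.WriteLine("Condition met within the timeout.");
    }
}
```

The point of the bounded overload is that a slow run produces a `false` return value that the test can assert on, instead of a wait that only ends when the outer work-item timeout kills the process.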
@copilot should these tests be spinning at all, since it holds a core? Maybe something like spin for a while, then sleep for a bit, then spin ...? Don't change it, just wondering if that would help. (I know SpinWait itself already has its own policies of this kind.)
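As a rough sketch of the "spin for a while, then sleep for a bit, then spin" idea raised above: the helper below is hypothetical and is not part of this PR, which keeps `SpinWait.SpinUntil` and only adds a timeout. The name `SpinThenSleepUntil` and the `spinMs` and `sleepMs` defaults are assumptions chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class PollingSketch
{
    // Hypothetical helper: alternate a short bounded spin with a sleep
    // until the condition holds or roughly timeoutMs has elapsed.
    public static bool SpinThenSleepUntil(
        Func<bool> condition, int timeoutMs, int spinMs = 5, int sleepMs = 15)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            // Short bounded spin: cheap when the condition is about to hold.
            if (SpinWait.SpinUntil(condition, spinMs))
                return true;

            // Give up once the overall budget is spent. The total wait can
            // overshoot timeoutMs by up to one spin plus one sleep.
            if (elapsed.ElapsedMilliseconds >= timeoutMs)
                return false;

            // Release the core so worker threads can make progress.
            Thread.Sleep(sleepMs);
        }
    }

    private static int s_count;

    public static void Main()
    {
        Task worker = Task.Run(() =>
        {
            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(5);
                Interlocked.Increment(ref s_count);
            }
        });

        bool ok = SpinThenSleepUntil(() => Volatile.Read(ref s_count) == 10, timeoutMs: 30_000);
        worker.Wait();
        Console.WriteLine(ok ? "condition met" : "timed out");
    }
}
```

Whether this would help under JIT stress is an open question in the thread; the sleep trades a little latency for less CPU contention with the threads doing the real work.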
Tagging subscribers to this area: @agocke, @VSadov |
Pull request overview
This PR updates System.Threading.Tasks.Dataflow.Tests to avoid CI work-item timeouts under JIT stress by ensuring all SpinWait.SpinUntil usage is bounded and will fail tests cleanly instead of hanging indefinitely.
Changes:
- Added a shared `DataflowTestHelpers.SpinTimeoutMs` constant (30s) for spin-wait timeouts.
- Updated previously unbounded `SpinWait.SpinUntil(...)` calls to use the timeout and assert success (`Assert.True(...)`).
- Applied the updates across the affected Transform* count tests and concurrency helper constructors.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/DataflowTestHelper.cs` | Introduces `SpinTimeoutMs` constant used by spin-wait-based tests. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/ConcurrentTests.cs` | Bounds spin-waits in helper constructors to prevent indefinite hangs. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformBlockTests.cs` | Adds bounded spin-wait + assertion in `TestCount`. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformManyBlockTests.cs` | Adds bounded spin-wait + assertion in `TestCount`. |
| `src/libraries/System.Threading.Tasks.Dataflow/tests/Dataflow/TransformManyBlockTests.IAsyncEnumerable.cs` | Adds bounded spin-wait + assertion in `TestCountAsyncEnumerable`. |
The root problem is that even with that back-off, under heavy JIT stress the thread pool threads processing the dataflow messages were themselves so slow that by the time |
`System.Threading.Tasks.Dataflow.Tests` was timing out under `jitstress2-jitstressregs` CI because 9 `SpinWait.SpinUntil()` calls had no timeout. Under JIT stress, every method re-JITs repeatedly with extra register stress (~5–10× slower execution), creating a resource starvation feedback loop: the busy-loop nature of an unbounded `SpinWait` competes for CPU with the thread pool threads that process dataflow messages, compounding enough to outlast the Helix work-item wall-clock timeout and kill the entire process. Note that `SpinWait` already has its own internal back-off policy (spin → yield → `Sleep(0)` → `Sleep(1)`), but even with that back-off the total elapsed time exceeded the work-item timeout under this level of stress. There is no Dataflow correctness bug; the test design assumption ("this condition will be met quickly") simply breaks under extreme JIT stress.
Description
- Added a timeout to the 9 `SpinWait.SpinUntil()` calls across 4 test files
- Wrapped each call in `Assert.True(...)` so individual tests fail with a clear assertion rather than taking down the entire work item
- Added a `DataflowTestHelpers.SpinTimeoutMs` constant in `DataflowTestHelper.cs` to avoid magic number duplication

Files changed: `ConcurrentTests.cs`, `TransformBlockTests.cs`, `TransformManyBlockTests.cs`, `TransformManyBlockTests.IAsyncEnumerable.cs`
Before:
After:
Where `DataflowTestHelpers.SpinTimeoutMs = 30_000` is a shared constant defined in `DataflowTestHelper.cs`.