
Protect first C# diagnostic computation from cancellation to fix bimodal allocation in speedometer #13011

Draft
ToddGrun wants to merge 1 commit into main from dev/toddgrun/BiModalDiagnosticPerformance

Conversation

ToddGrun (Contributor) commented Apr 5, 2026

*** In draft mode until I get several speedometer runs indicating this addresses the issue ***

The RazorEditingTests.CompletionInCohostingForComponents speedometer test exhibits bimodal CLR_BytesAllocated_NonDevenv behavior in the html completion scenario, and has since the test's creation. Most speedometer sessions don't experience any bad runs, but sometimes 1 or 2 bad runs occur with a marked increase in non-devenv allocations. SDK insertions seemingly hit this hard, with all five runs experiencing the poor allocation behavior.

Inspection of the traces reveals a large number of operation cancelled exceptions in the bad runs, causing partially completed diagnostic requests to be tossed. AI analysis indicated that Roslyn's IncrementalMemberEditAnalyzer has a heavy performance cost until it has completed once for a file. With the test rapidly typing and committing completions, the diagnostic requests get cancelled before any can complete.

*** AI's description of this change ***
The root cause is a bistable race condition in Roslyn's IncrementalMemberEditAnalyzer. This analyzer caches the last successfully analyzed document in a single WeakReference<Document?> field. When cached, subsequent analyses are incremental (~50ms). When not cached, every analysis is a full recomputation (~1-3s). The problem: full analyses are slow enough to be cancelled by the next incoming diagnostic request, preventing the cache from ever being set — a self-reinforcing failure mode.
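To make the failure mode concrete, here is a simplified sketch of the caching shape described above (names and helpers are invented; this is not Roslyn's actual code). The key point is that the cache is only written after a fully successful pass, so a cancellation during the slow path sends the next caller right back to that same slow path:

```csharp
// Hypothetical sketch of the analyzer's single-entry document cache.
private WeakReference<Document?> _lastAnalyzedDocument = new(target: null);

public async Task AnalyzeMemberEditsAsync(Document document, CancellationToken cancellationToken)
{
    _lastAnalyzedDocument.TryGetTarget(out var lastDocument);

    if (lastDocument is not null)
    {
        // Fast path (~50ms): diff against the previously analyzed document.
        await AnalyzeIncrementallyAsync(lastDocument, document, cancellationToken).ConfigureAwait(false);
    }
    else
    {
        // Slow path (~1-3s): full recomputation. A cancellation anywhere in
        // here throws before the cache write below ever happens, so the next
        // request starts over on this same slow path.
        await AnalyzeFullyAsync(document, cancellationToken).ConfigureAwait(false);
    }

    // Only reached on success; cancellation skips this line.
    _lastAnalyzedDocument = new WeakReference<Document?>(document);
}
```

Under rapid-fire requests that each cancel their predecessor, the write on the last line is never reached, which is the self-reinforcing failure mode described above.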

Razor bypasses Roslyn's pull diagnostics infrastructure (which has its own protections) and calls DiagnosticAnalyzerService.GetDiagnosticsForSpanAsync directly with the caller's cancellation token. The VS LSP client cancels in-flight diagnostic requests whenever a new request is issued, creating the rapid cancellation cycle.

The fix uses AsyncLazy to ensure the first diagnostic computation per document runs with CancellationToken.None, allowing it to complete and bootstrap the analyzer. Subsequent requests pass through with normal cancellation semantics since the analyzer is now fast. A double-checked locking pattern with a volatile field guards the single-entry bootstrap cache. The initiator returns the bootstrap result directly; concurrent callers await the shared computation then make their own fresh call. On cancellation, the bootstrap is preserved (the factory continues running). On fault, it is cleared under a ReferenceEquals identity guard to avoid clobbering a different document's bootstrap.
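The shape of the fix can be sketched roughly as follows (a hypothetical illustration assuming a Roslyn-style AsyncLazy; method and field names are invented and this is not the exact change):

```csharp
// Hypothetical sketch of the bootstrap guard described above.
private volatile AsyncLazy<ImmutableArray<DiagnosticData>>? _bootstrapLazy;
private Document? _bootstrapDocument;
private readonly object _gate = new();

public async Task<ImmutableArray<DiagnosticData>> GetDiagnosticsForSpanAsync(
    Document document, TextSpan span, CancellationToken cancellationToken)
{
    var lazy = _bootstrapLazy;
    var isInitiator = false;

    if (lazy is null)
    {
        lock (_gate)
        {
            // Double-checked locking: only one caller creates the bootstrap.
            if (_bootstrapLazy is null)
            {
                _bootstrapDocument = document;

                // The factory ignores the caller's token and computes with
                // CancellationToken.None, so the first computation cannot be
                // cancelled out from under the analyzer.
                _bootstrapLazy = AsyncLazy.Create(
                    _ => ComputeDiagnosticsAsync(document, span, CancellationToken.None));
                isInitiator = true;
            }

            lazy = _bootstrapLazy;
        }
    }

    try
    {
        // The await honors the caller's token, but the underlying computation
        // keeps running if this particular caller is cancelled.
        var result = await lazy.GetValueAsync(cancellationToken).ConfigureAwait(false);
        if (isInitiator)
            return result;
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        lock (_gate)
        {
            // On fault, clear the bootstrap only if it still belongs to this
            // document, so a different document's bootstrap isn't clobbered.
            if (ReferenceEquals(_bootstrapDocument, document))
            {
                _bootstrapLazy = null;
                _bootstrapDocument = null;
            }
        }

        throw;
    }

    // Concurrent callers: now that the analyzer's cache is warm, make a fresh
    // call with normal cancellation semantics.
    return await ComputeDiagnosticsAsync(document, span, cancellationToken).ConfigureAwait(false);
}
```

Note how the `catch` filter excludes OperationCanceledException: a cancelled awaiter propagates its cancellation but leaves the shared bootstrap computation in place, matching the "bootstrap is preserved on cancellation" behavior described above.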

*** Speedometer graph from the last several days of runs ***
(graph image attached)

ToddGrun (Contributor, Author) commented Apr 6, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

ToddGrun (Contributor, Author) commented Apr 7, 2026

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
