Fix GC live lock issue in rapid UI switching #10856

Merged
jonathanpeppers merged 4 commits into main from copilot/fix-gc-live-lock-issue-again on Mar 13, 2026

Contributor

Copilot AI commented Feb 24, 2026

Fixes a GC live lock that occurs when an app goes through rapid UI section switching, causing a large number of Java proxy objects to be allocated in a short time.

Root Cause

When the GRef count exceeds the critical threshold (~46,800), GC.Collect() is triggered repeatedly but cannot release enough GRefs because:

  • Java proxy objects have finalizers that release GRefs, but those finalizers run on the Finalizer Thread
  • Every Full GC triggers a Gen2GcCallback which runs ArrayPool<T>.Shared trimming on the Finalizer Thread
  • The Finalizer Thread is busy with pool trimming, so Java proxy finalizers can't run → GRefs don't get released → GRef count remains critical → triggers another Full GC → cycle repeats

Fix

In AndroidRuntime.CreateGlobalReference, added GC.WaitForPendingFinalizers() before GC.Collect() when the GRef threshold is reached (src/Mono.Android/Android.Runtime/AndroidRuntime.cs).

This ensures the finalizer queue (including Java proxy object finalizers) drains before the GC runs, giving GRefs a chance to be released and breaking the live lock cycle.

This code path is only triggered when GRefs are already at the critical threshold (>= gref_gc_threshold), so the synchronous wait is acceptable — the app is already in a degraded state at that point and GC.Collect() was already being called.
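
The ordering described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual AndroidRuntime.CreateGlobalReference code: FakeProxy, GrefSketch, the counter, and the hard-coded Threshold value are all hypothetical stand-ins, and only GC.WaitForPendingFinalizers() and GC.Collect() are real .NET APIs. It assumes each proxy holds one reference that is released only by its finalizer.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for a Java proxy object: it holds one "GRef" that is
// released only when its finalizer runs on the finalizer thread.
sealed class FakeProxy
{
    public FakeProxy () => GrefSketch.OnReferenceCreated ();
    ~FakeProxy () => GrefSketch.OnReferenceReleased ();
}

static class GrefSketch
{
    // Assumed value for illustration only; the real threshold is computed by the runtime.
    public const int Threshold = 46_800;

    static int outstanding;

    public static int Outstanding => Volatile.Read (ref outstanding);

    // True once the outstanding count has reached the threshold.
    public static bool ShouldCollect (int count, int threshold) => count >= threshold;

    public static void OnReferenceReleased () => Interlocked.Decrement (ref outstanding);

    public static void OnReferenceCreated ()
    {
        int count = Interlocked.Increment (ref outstanding);
        if (!ShouldCollect (count, Threshold))
            return;

        // Let already-queued finalizers run first so they can release
        // references, then force the full collection.
        GC.WaitForPendingFinalizers ();
        GC.Collect ();
    }
}

static class Program
{
    static void Main ()
    {
        // Allocate more short-lived proxies than the threshold allows.
        for (int i = 0; i < 100_000; i++)
            new FakeProxy ();

        Console.WriteLine ($"Outstanding references after churn: {GrefSketch.Outstanding}");
    }
}
```

In this sketch the wait happens on the allocating thread, which mirrors the trade-off noted above: the caller blocks briefly, but only once the count is already at the threshold.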

Original prompt

This section details the original issue you should resolve

<issue_title>GC live lock</issue_title>
<issue_description>### Android framework version

net9.0-android, net10.0-android (Preview)

Affected platform version

.NET 9, .NET 10

Description

We have an application that sometimes ends up in a vicious GC cycle live lock. The bug happens when the app goes through rapid UI section switching which results in a large number of Java proxy objects being allocated in a short time. What happens next is the following:

  • GRef exhaustion: The number of Java Global References (GRefs) exceeds 90% of the system threshold (approximately > 46,800). Every subsequent GRef allocation (essentially any access to Java objects) triggers a Full GC.
  • Full GC triggers a relatively expensive GC Bridge, where garbage collection is synchronized across both .NET and Java. This involves complex logic to reconcile references from one managed heap to the other; roughly 22k objects (give or take) enter this process.
  • The GC bridge clears almost nothing, so the GRef count remains critical. The system gets stuck in a loop of constant garbage collection.
  • Finalizer Bottleneck: The lack of memory recovery is partly because Java objects have finalizers, all of which run on the Finalizer Thread.
  • The Finalizer Thread is preoccupied because every Full GC triggers a Gen2GcCallback, which runs trimming of various ArrayPool<T>.Shared array pools.

This creates a vicious cycle from which the application cannot recover.

Steps to Reproduce

See above. We don't have a repro app, but we do have GC dumps and sampling profiler traces from the app.

Did you find any workaround?

No, but I believe it may potentially help to call GC.WaitForPendingFinalizers() before GC.Collect() when the GRef threshold is reached here:

Relevant log output

</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>

@jonathanpeppers
Member

There is a memory usage issue in dotnet/dotnet/main on Linux, so Copilot can't run right now:

     0>MSBUILD : error MSB4166: Child node "4" exited prematurely. Shutting down. Diagnostic information may be found in files in "/tmp/MSBuildTempZhEi6q/" and will be named MSBuild_*.failure.txt. This location can be changed by setting the MSBUILDDEBUGPATH environment variable to a different directory.

@jonathanpeppers
Member

@copilot try again, you encountered an error

… GRef threshold is reached

Co-authored-by: jonathanpeppers <840039+jonathanpeppers@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 5, 2026

@copilot try again, you encountered an error

Done — commit 59ad351 adds GC.WaitForPendingFinalizers() before GC.Collect() in AndroidRuntime.CreateGlobalReference when the GRef threshold is reached. This lets Java proxy object finalizers drain (releasing their GRefs) before the next full GC, helping break the vicious cycle described in the issue.

Copilot AI changed the title from "[WIP] Fix GC live lock issue in rapid UI switching" to "Fix GC live lock issue in rapid UI switching" on Mar 5, 2026
@jonathanpeppers jonathanpeppers marked this pull request as ready for review March 10, 2026 16:58
Copilot AI review requested due to automatic review settings March 10, 2026 16:58
Contributor

Copilot AI left a comment

Pull request overview

Addresses a GC live-lock scenario in .NET for Android when rapid UI switching causes a surge of Java proxy allocations and Global References (GRefs), leading to repeated full GCs that can’t make progress because finalizers aren’t draining.

Changes:

  • When the GRef “critical” threshold is reached in global reference creation, wait for pending finalizers before forcing a full GC.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

@jonathanpeppers jonathanpeppers left a comment

One test failure we can ignore:

Request did not throw cancellation exception; threw: System.AggregateException: AggregateException_ctor_DefaultMessage (Connection failure)
---> System.Net.Http.HttpRequestException: Connection failure
---> Java.Net.ConnectException: Failed to connect to /10.255.255.1:80

In general, I think we should merge this and try it in .NET 11.

@jonathanpeppers jonathanpeppers merged commit 38e5789 into main Mar 13, 2026
5 of 6 checks passed
@jonathanpeppers jonathanpeppers deleted the copilot/fix-gc-live-lock-issue-again branch March 13, 2026 18:00