Split gc.cpp to multiple files based on the functionality - part 1 #125703

Open
janvorli wants to merge 43 commits into dotnet:main from janvorli:gc-split

Conversation

@janvorli
Member

This PR splits the monolithic gc.cpp (~57,000 lines) into 19 smaller files organized by functionality, reducing gc.cpp to ~8,300 lines of shared infrastructure. The new files are #included at the end of gc.cpp, preserving the existing single-compilation-unit build model for this first step; no CMakeLists.txt changes were made.
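
A minimal sketch of the include-based layout described above, under stated assumptions: `align_up_sketch` and `allocation_size_sketch` are hypothetical names that do not come from this PR, and the two files are collapsed into one listing so that it compiles on its own. It is meant to illustrate why textual inclusion keeps the code building unchanged: a helper with internal linkage defined in gc.cpp remains visible to code that now lives in a fragment such as allocation.cpp.

```cpp
// Single-listing sketch of the include-based layout; all names are hypothetical.
// In the real tree the two marked sections would be separate files.
#include <cassert>
#include <cstddef>

// ---- would stay in gc.cpp: shared infrastructure ----

// Internal linkage: this helper is visible only inside its translation unit.
static std::size_t align_up_sketch (std::size_t size, std::size_t alignment)
{
    // The bit trick below is only valid for power-of-two alignments.
    assert ((alignment != 0) && ((alignment & (alignment - 1)) == 0));
    return (size + alignment - 1) & ~(alignment - 1);
}

// ---- would move to allocation.cpp, pulled in by an #include at the end of gc.cpp ----

// No header declaration is needed for align_up_sketch: after preprocessing,
// this function sits in the same translation unit as the helper above.
std::size_t allocation_size_sketch (std::size_t requested)
{
    return align_up_sketch (requested, 8);
}
```

With this layout, `allocation_size_sketch (13)` rounds up to 16. If the fragment were compiled as its own translation unit instead, a `static` helper like this one would no longer be visible to it and would need, for example, a declaration in a shared header.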

New files

| File | Lines | Scope |
| --- | --- | --- |
| allocation.cpp | 4,337 | Allocation helpers, allocate_small/large, SOH/LOH/POH allocation |
| background.cpp | 3,964 | Background GC (concurrent mark, sweep, write watch support) |
| card_table.cpp | 1,810 | Card/brick table management, card bundling |
| collect.cpp | 1,511 | Collection entry points, garbage_collect, do_post_gc |
| diagnostics.cpp | 1,615 | Heap verification, verify_heap, ETW/diag walking |
| dynamic_heap_count.cpp | 1,280 | Dynamic heap count (server GC heap scaling) |
| dynamic_tuning.cpp | 2,441 | GC tuning heuristics, generation budgets, smoothing |
| finalization.cpp | 520 | Finalization queue management |
| init.cpp | 1,350 | GC initialization (gc_heap::initialize_gc) |
| interface.cpp | 2,261 | GCHeap interface implementation (public API surface) |
| mark_phase.cpp | 3,728 | Mark phase, mark stack, pinned object handling |
| memory.cpp | 420 | Virtual memory commit/decommit, address space management |
| no_gc.cpp | 827 | No-GC region support |
| plan_phase.cpp | 7,374 | Plan phase, plug/gap processing, generation planning |
| region_allocator.cpp | 421 | Region allocator (regions mode) |
| region_free_list.cpp | 420 | Region free list management |
| regions_segments.cpp | 2,052 | Segment/region lifecycle, segment mapping table |
| relocate_compact.cpp | 1,977 | Relocate and compact phases |
| sweep.cpp | 527 | Sweep phase, free list building |

Design decisions

  • Single compilation unit preserved: All new files are #included from gc.cpp, so gcwks.cpp and gcsvr.cpp continue to compile everything as one TU. No build system changes required.
  • #ifdef guards maintained: Each moved method retains its original preprocessor guards (BACKGROUND_GC, USE_REGIONS, MULTIPLE_HEAPS, etc.).
  • Remaining in gc.cpp: Shared globals, macros, general-purpose helpers, and methods that don't clearly belong to one functional category stay in gc.cpp.
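
A small sketch of what "guards maintained" means in practice, assuming a hypothetical `heap_sketch` type and `background_mark_id` method (neither appears in the PR). The macros are defined inline here only so the listing compiles on its own; in a real build they would come from the build configuration. The point is that a method moved into a fragment file carries its outer and nested guards with it, so every configuration sees the same code as before the move.

```cpp
#include <cassert>

// Stand-ins for build-configuration macros; normally set by the build system.
#define BACKGROUND_GC
#define MULTIPLE_HEAPS

// Hypothetical miniature of a heap class; not the real gc_heap.
struct heap_sketch
{
#ifdef MULTIPLE_HEAPS
    int heap_number = 3;
#endif // MULTIPLE_HEAPS

#ifdef BACKGROUND_GC
    int background_mark_id () const;
#endif // BACKGROUND_GC
};

// ---- would move to background.cpp; the guards travel with the method ----
#ifdef BACKGROUND_GC
int heap_sketch::background_mark_id () const
{
#ifdef MULTIPLE_HEAPS
    return heap_number;   // multiple-heap build: one id per heap
#else
    return 0;             // single-heap build
#endif // MULTIPLE_HEAPS
}
#endif // BACKGROUND_GC
```

Dropping the outer guard during a move would make the definition fail to compile in builds without BACKGROUND_GC, because the declaration would be absent there; keeping the guards verbatim avoids that class of breakage.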

What's NOT changed

  • No functional/behavioral changes — this is a pure code reorganization
  • No changes to headers (gcpriv.h, gc.h, etc.)
  • No changes to the build system (CMakeLists.txt)
  • All existing #ifdef nesting is preserved

These changes were made by Copilot CLI under my supervision and review. The next step will be to compile each of the files separately.

@janvorli janvorli added this to the 11.0.0 milestone Mar 18, 2026
@janvorli janvorli self-assigned this Mar 18, 2026
Copilot AI review requested due to automatic review settings March 18, 2026 00:54
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @agocke, @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This pull request appears to refactor the CoreCLR GC implementation by moving substantial logic into new .cpp files (which are then included from gc.cpp), separating concerns like sweeping, regions, no-GC mode, memory commit/decommit accounting, collection, initialization, and finalization.

Changes:

  • Adds new GC implementation files for sweeping, regions (allocator/free list), no-GC region behavior, memory commit/decommit, collection, initialization, and finalization.
  • Updates GC internals by removing an unused VERIFY_HEAP method declaration from gcpriv.h.
  • (Implied by file structure) Consolidates GC implementation via #include-based composition of these new .cpp units.

Reviewed changes

Copilot reviewed 15 out of 21 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/coreclr/gc/sweep.cpp | Adds sweep/free-list building and related helpers (incl. FEATURE_BASICFREEZE RO sweep). |
| src/coreclr/gc/region_free_list.cpp | Adds region free-list management for USE_REGIONS. |
| src/coreclr/gc/region_allocator.cpp | Adds region allocation/deallocation logic for USE_REGIONS. |
| src/coreclr/gc/no_gc.cpp | Adds no-GC region support logic, including callback scheduling and region-extension helpers. |
| src/coreclr/gc/memory.cpp | Adds commit/decommit accounting and region decommit logic. |
| src/coreclr/gc/init.cpp | Adds GC initialization logic (including region range reservation/init under USE_REGIONS). |
| src/coreclr/gc/finalization.cpp | Adds finalization queue implementation and finalizer work scheduling. |
| src/coreclr/gc/collect.cpp | Adds GC collection logic (including region ephemeral/gc range computation under USE_REGIONS). |
| src/coreclr/gc/gcpriv.h | Removes set_batch_mark_array_bits declaration under VERIFY_HEAP. |

#ifndef USE_REGIONS
if ((settings.condemned_generation == max_generation) && ro_segments_in_range)
{
    heap_segment* seg = generation_start_segment (generation_of (max_generation));;
Comment on lines +6 to +17
dprintf(1, ("[no_gc_callback] calling enable_no_gc_callback with callback_threshold = %llu\n", callback_threshold));
enable_no_gc_region_callback_status status = enable_no_gc_region_callback_status::succeed;
suspend_EE();
{
    if (!current_no_gc_region_info.started)
    {
        status = enable_no_gc_region_callback_status::not_started;
    }
    else if (current_no_gc_region_info.callback != nullptr)
    {
        status = enable_no_gc_region_callback_status::already_registered;
    }
Comment on lines +62 to +67
    status = insufficient_budget;
}
if (dd_new_allocation (hp->dynamic_data_of (loh_generation)) <= (ptrdiff_t)loh_withheld_budget)
{
    dprintf(1, ("[no_gc_callback] failed because of running out of loh budget= %llu\n", loh_withheld_budget));
    status = insufficient_budget;
Comment on lines +326 to +348
*start = alloc;
*end = alloc + alloc_size;
ret = (alloc != NULL);

gc_etw_segment_type segment_type;

if (gen_num == loh_generation)
{
    segment_type = gc_etw_segment_large_object_heap;
}
else if (gen_num == poh_generation)
{
    segment_type = gc_etw_segment_pinned_object_heap;
}
else
{
    segment_type = gc_etw_segment_small_object_heap;
}

FIRE_EVENT(GCCreateSegment_V1, (alloc + sizeof (aligned_plug_and_gap)),
                               size - sizeof (aligned_plug_and_gap),
                               segment_type);

{
    uint32_t next_size = get_num_units (next_val);
    free_block_size += next_size;
    region_end += next_size;
Comment on lines +248 to +260
uint32_t* busy_block;
uint32_t* free_block;
if (direction == 1)
{
    busy_block = current_index;
    free_block = current_index + num_units;
}
else
{
    busy_block = current_index - num_units;
    free_block = current_index - current_num_units;
}

@MichalStrehovsky
Member

If there is interest in keeping git blame working, it might be possible to commit this in a way that preserves it: https://devblogs.microsoft.com/oldnewthing/20190916-00/?p=102892

A repo admin might be necessary to actually merge that because a squash would likely undo it.

@janvorli
Member Author

If there is interest in keeping git blame working, it might be possible to commit this in a way that preserves it: https://devblogs.microsoft.com/oldnewthing/20190916-00/?p=102892

That sounds very interesting. I'll give that a try.

@jkoritzinsky
Member

If necessary for perf reasons (i.e., lost optimizations that LTCG doesn’t figure out when you try to compile separately in the next PR), you can use the UNITY_BUILD feature in CMake to force all of the files to be included in one C++ file in Release builds, maintaining the current experience while providing a better dev story.

@janvorli
Member Author

@MichalStrehovsky it seems that git blame still works without any special handling; you just need to pass it the -C -C option. Copilot says the GitHub UI doesn't use that option though, so it would only work from the command line. I'm not sure whether the approach from the article you shared would work in GitHub either, but I'll give it a try anyway. By the way, blame in the GitHub UI never worked for gc.cpp, as it always times out because the file is so large. So I wonder whether it would bring any benefit even if I followed the article.

@janvorli
Member Author

Hmm, I take it back, git blame -C -C doesn't work.

janvorli and others added 15 commits March 18, 2026 13:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
janvorli and others added 20 commits March 18, 2026 13:56
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split the monolithic gc.cpp (~57,000 lines) into 19 smaller files
organized by functionality, reducing gc.cpp to ~8,300 lines of
core infrastructure. New files are #included at the end of gc.cpp,
preserving the single-compilation-unit build model.

Uses rename-per-branch technique to preserve git blame history
across the split.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gc.cpp split uses rename-per-branch to preserve blame history.
Adding the trim commits to .git-blame-ignore-revs lets git blame
(and GitHub's web UI) see through them to the original gc.cpp authors.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 18, 2026 13:41
Contributor

Copilot AI left a comment

Pull request overview

This PR appears to split previously monolithic GC implementation code into multiple focused .cpp files under src/coreclr/gc/ (e.g., sweep/regions/no-GC/memory/init/finalization/collect) and updates .git-blame-ignore-revs to keep git blame attribution useful after the mechanical refactor.

Changes:

  • Extract GC implementation areas into new compilation-unit fragments (included from gc.cpp), such as sweeping, region allocation/free-list management, no-GC region logic, memory commit/decommit accounting, GC init, finalization, and collection logic.
  • Add region allocator/free-list implementations behind USE_REGIONS.
  • Add blame-ignore entries for the mechanical “split gc.cpp” trimming commits.

Reviewed changes

Copilot reviewed 15 out of 21 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/coreclr/gc/sweep.cpp | New split-out sweeping/free-list threading logic (plus RO segment sweeping under BASICFREEZE). |
| src/coreclr/gc/region_free_list.cpp | New region free-list bookkeeping, ordering, and sorting utilities (USE_REGIONS). |
| src/coreclr/gc/region_allocator.cpp | New region allocator implementation (USE_REGIONS). Contains correctness issues noted in PR comments. |
| src/coreclr/gc/no_gc.cpp | New split-out no-GC region and callback logic. |
| src/coreclr/gc/memory.cpp | New split-out commit/decommit accounting and decommit stepping logic (incl. regions). |
| src/coreclr/gc/init.cpp | New split-out GC initialization logic (incl. regions initial reservation). |
| src/coreclr/gc/finalization.cpp | New split-out finalization queue logic and finalizer work scheduling. |
| src/coreclr/gc/collect.cpp | New split-out GC collection driver logic and region ephemeral-range computation. |
| .git-blame-ignore-revs | Adds mechanical split/trim commits to blame-ignore list to preserve attribution. |

Comment on lines +325 to +328
alloc = allocate (num_units, direction, fn);
*start = alloc;
*end = alloc + alloc_size;
ret = (alloc != NULL);
Comment on lines +345 to +347
FIRE_EVENT(GCCreateSegment_V1, (alloc + sizeof (aligned_plug_and_gap)),
                               size - sizeof (aligned_plug_and_gap),
                               segment_type);
{
    uint32_t next_size = get_num_units (next_val);
    free_block_size += next_size;
    region_end += next_size;
uint32_t current_val = *current_index;
assert (!is_unit_memory_free (current_val));

dprintf (REGIONS_LOG, ("----DEL %d (%u units)-----", (*current_index - *region_map_left_start), current_val));
Comment on lines +248 to +255
uint32_t* busy_block;
uint32_t* free_block;
if (direction == 1)
{
    busy_block = current_index;
    free_block = current_index + num_units;
}
else
Comment on lines +61 to +66
    dprintf(1, ("[no_gc_callback] failed because of running out of soh budget= %llu\n", soh_withheld_budget));
    status = insufficient_budget;
}
if (dd_new_allocation (hp->dynamic_data_of (loh_generation)) <= (ptrdiff_t)loh_withheld_budget)
{
    dprintf(1, ("[no_gc_callback] failed because of running out of loh budget= %llu\n", loh_withheld_budget));
#ifndef USE_REGIONS
if ((settings.condemned_generation == max_generation) && ro_segments_in_range)
{
    heap_segment* seg = generation_start_segment (generation_of (max_generation));;
@janvorli
Member Author

Copilot CLI has reworked the PR based on the article mentioned above. It also added a .git-blame-ignore-revs file that lets GitHub blame work properly by ignoring the changes in this PR.

For the best results with local git blame, git config diff.algorithm histogram can be used. The default diff algorithm does not reliably detect all moves.

@janvorli
Member Author

It requires git version >= 2.40 (older git doesn't seem to use the algorithm)

@janvorli
Member Author

And this PR needs to be merged without squashing. Otherwise the tracking would get lost.

@jkoritzinsky jkoritzinsky added the NO-SQUASH The PR should not be squashed label Mar 18, 2026
@janvorli janvorli requested review from VSadov and jkotas March 18, 2026 15:44

Labels

area-GC-coreclr NO-SQUASH The PR should not be squashed
