Split gc.cpp to multiple files based on the functionality - part 1#125703
Split gc.cpp to multiple files based on the functionality - part 1#125703janvorli wants to merge 43 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @agocke, @dotnet/gc |
There was a problem hiding this comment.
Pull request overview
This pull request appears to refactor CoreCLR GC implementation by moving substantial logic into new .cpp compilation units (which are then included from gc.cpp), separating concerns like sweeping, regions, no-GC mode, memory commit/decommit accounting, collection, initialization, and finalization.
Changes:
- Adds new GC implementation files for sweeping, regions (allocator/free list), no-GC region behavior, memory commit/decommit, collection, initialization, and finalization.
- Updates GC internals by removing an unused
VERIFY_HEAPmethod declaration fromgcpriv.h. - (Implied by file structure) Consolidates GC implementation via
#include-based composition of these new.cppunits.
Reviewed changes
Copilot reviewed 15 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/gc/sweep.cpp | Adds sweep/free-list building and related helpers (incl. FEATURE_BASICFREEZE RO sweep). |
| src/coreclr/gc/region_free_list.cpp | Adds region free-list management for USE_REGIONS. |
| src/coreclr/gc/region_allocator.cpp | Adds region allocation/deallocation logic for USE_REGIONS. |
| src/coreclr/gc/no_gc.cpp | Adds no-GC region support logic, including callback scheduling and region-extension helpers. |
| src/coreclr/gc/memory.cpp | Adds commit/decommit accounting and region decommit logic. |
| src/coreclr/gc/init.cpp | Adds GC initialization logic (including region range reservation/init under USE_REGIONS). |
| src/coreclr/gc/finalization.cpp | Adds finalization queue implementation and finalizer work scheduling. |
| src/coreclr/gc/collect.cpp | Adds GC collection logic (including region ephemeral/gc range computation under USE_REGIONS). |
| src/coreclr/gc/gcpriv.h | Removes set_batch_mark_array_bits declaration under VERIFY_HEAP. |
You can also share your feedback on Copilot code review. Take the survey.
| #ifndef USE_REGIONS | ||
| if ((settings.condemned_generation == max_generation) && ro_segments_in_range) | ||
| { | ||
| heap_segment* seg = generation_start_segment (generation_of (max_generation));; |
| dprintf(1, ("[no_gc_callback] calling enable_no_gc_callback with callback_threshold = %llu\n", callback_threshold)); | ||
| enable_no_gc_region_callback_status status = enable_no_gc_region_callback_status::succeed; | ||
| suspend_EE(); | ||
| { | ||
| if (!current_no_gc_region_info.started) | ||
| { | ||
| status = enable_no_gc_region_callback_status::not_started; | ||
| } | ||
| else if (current_no_gc_region_info.callback != nullptr) | ||
| { | ||
| status = enable_no_gc_region_callback_status::already_registered; | ||
| } |
| status = insufficient_budget; | ||
| } | ||
| if (dd_new_allocation (hp->dynamic_data_of (loh_generation)) <= (ptrdiff_t)loh_withheld_budget) | ||
| { | ||
| dprintf(1, ("[no_gc_callback] failed because of running out of loh budget= %llu\n", loh_withheld_budget)); | ||
| status = insufficient_budget; |
| *start = alloc; | ||
| *end = alloc + alloc_size; | ||
| ret = (alloc != NULL); | ||
|
|
||
| gc_etw_segment_type segment_type; | ||
|
|
||
| if (gen_num == loh_generation) | ||
| { | ||
| segment_type = gc_etw_segment_large_object_heap; | ||
| } | ||
| else if (gen_num == poh_generation) | ||
| { | ||
| segment_type = gc_etw_segment_pinned_object_heap; | ||
| } | ||
| else | ||
| { | ||
| segment_type = gc_etw_segment_small_object_heap; | ||
| } | ||
|
|
||
| FIRE_EVENT(GCCreateSegment_V1, (alloc + sizeof (aligned_plug_and_gap)), | ||
| size - sizeof (aligned_plug_and_gap), | ||
| segment_type); | ||
|
|
| { | ||
| uint32_t next_size = get_num_units (next_val); | ||
| free_block_size += next_size; | ||
| region_end += next_size; |
| uint32_t* busy_block; | ||
| uint32_t* free_block; | ||
| if (direction == 1) | ||
| { | ||
| busy_block = current_index; | ||
| free_block = current_index + num_units; | ||
| } | ||
| else | ||
| { | ||
| busy_block = current_index - num_units; | ||
| free_block = current_index - current_num_units; | ||
| } | ||
|
|
|
If there is interest in keeping A repo admin might be necessary to actually merge that because a squash would likely undo it. |
That sounds very interesting. I'll give that a try. |
|
If necessary for perf reasons (ie lost optimizations that LTCG doesn’t figure out when you try to compile separately in the next PR), you can use the UNITY_BUILD feature in CMake to force all of the files to be included in one C++ file in Release builds to maintain the current experience while providing a better dev story. |
|
@MichalStrehovsky it seems that git blame still works without any special handling. You just need to pass it the |
|
Hmm, I take it back, the git -C -C doesn't work. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split the monolithic gc.cpp (~57,000 lines) into 19 smaller files organized by functionality, reducing gc.cpp to ~8,300 lines of core infrastructure. New files are #included at the end of gc.cpp, preserving the single-compilation-unit build model. Uses rename-per-branch technique to preserve git blame history across the split. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gc.cpp split uses rename-per-branch to preserve blame history. Adding the trim commits to .git-blame-ignore-revs lets git blame (and GitHub's web UI) see through them to the original gc.cpp authors. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR appears to split previously monolithic GC implementation code into multiple focused .cpp files under src/coreclr/gc/ (e.g., sweep/regions/no-GC/memory/init/finalization/collect) and updates .git-blame-ignore-revs to keep git blame attribution useful after the mechanical refactor.
Changes:
- Extract GC implementation areas into new compilation-unit fragments (included from
gc.cpp), such as sweeping, region allocation/free-list management, no-GC region logic, memory commit/decommit accounting, GC init, finalization, and collection logic. - Add region allocator/free-list implementations behind
USE_REGIONS. - Add blame-ignore entries for the mechanical “split gc.cpp” trimming commits.
Reviewed changes
Copilot reviewed 15 out of 21 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/gc/sweep.cpp | New split-out sweeping/free-list threading logic (plus RO segment sweeping under BASICFREEZE). |
| src/coreclr/gc/region_free_list.cpp | New region free-list bookkeeping, ordering, and sorting utilities (USE_REGIONS). |
| src/coreclr/gc/region_allocator.cpp | New region allocator implementation (USE_REGIONS). Contains correctness issues noted in PR comments. |
| src/coreclr/gc/no_gc.cpp | New split-out no-GC region and callback logic. |
| src/coreclr/gc/memory.cpp | New split-out commit/decommit accounting and decommit stepping logic (incl. regions). |
| src/coreclr/gc/init.cpp | New split-out GC initialization logic (incl. regions initial reservation). |
| src/coreclr/gc/finalization.cpp | New split-out finalization queue logic and finalizer work scheduling. |
| src/coreclr/gc/collect.cpp | New split-out GC collection driver logic and region ephemeral-range computation. |
| .git-blame-ignore-revs | Adds mechanical split/trim commits to blame-ignore list to preserve attribution. |
You can also share your feedback on Copilot code review. Take the survey.
| alloc = allocate (num_units, direction, fn); | ||
| *start = alloc; | ||
| *end = alloc + alloc_size; | ||
| ret = (alloc != NULL); |
| FIRE_EVENT(GCCreateSegment_V1, (alloc + sizeof (aligned_plug_and_gap)), | ||
| size - sizeof (aligned_plug_and_gap), | ||
| segment_type); |
| { | ||
| uint32_t next_size = get_num_units (next_val); | ||
| free_block_size += next_size; | ||
| region_end += next_size; |
| uint32_t current_val = *current_index; | ||
| assert (!is_unit_memory_free (current_val)); | ||
|
|
||
| dprintf (REGIONS_LOG, ("----DEL %d (%u units)-----", (*current_index - *region_map_left_start), current_val)); |
| uint32_t* busy_block; | ||
| uint32_t* free_block; | ||
| if (direction == 1) | ||
| { | ||
| busy_block = current_index; | ||
| free_block = current_index + num_units; | ||
| } | ||
| else |
| dprintf(1, ("[no_gc_callback] failed because of running out of soh budget= %llu\n", soh_withheld_budget)); | ||
| status = insufficient_budget; | ||
| } | ||
| if (dd_new_allocation (hp->dynamic_data_of (loh_generation)) <= (ptrdiff_t)loh_withheld_budget) | ||
| { | ||
| dprintf(1, ("[no_gc_callback] failed because of running out of loh budget= %llu\n", loh_withheld_budget)); |
| dprintf(1, ("[no_gc_callback] failed because of running out of soh budget= %llu\n", soh_withheld_budget)); | ||
| status = insufficient_budget; | ||
| } | ||
| if (dd_new_allocation (hp->dynamic_data_of (loh_generation)) <= (ptrdiff_t)loh_withheld_budget) | ||
| { | ||
| dprintf(1, ("[no_gc_callback] failed because of running out of loh budget= %llu\n", loh_withheld_budget)); |
| #ifndef USE_REGIONS | ||
| if ((settings.condemned_generation == max_generation) && ro_segments_in_range) | ||
| { | ||
| heap_segment* seg = generation_start_segment (generation_of (max_generation));; |
|
Copilot cli has reworked the PR based on the article mentioned. It also added a To make local git blame work the best, |
|
It requires git version >= 2.40 (older git doesn't seem to use the algorithm) |
|
And this PR needs to be merged without squashing. Otherwise the tracking would get lost |
This PR splits the monolithic gc.cpp (~57,000 lines) into 19 smaller files organized by functionality, reducing gc.cpp to ~8,300 lines of shared infrastructure. The new files are #included at the end of gc.cpp, preserving the
existing single-compilation-unit build model for this first step - no CMakeLists.txt changes were made.
New files
allocation.cppallocate_small/large, SOH/LOH/POH allocationbackground.cppcard_table.cppcollect.cppgarbage_collect,do_post_gcdiagnostics.cppverify_heap, ETW/diag walkingdynamic_heap_count.cppdynamic_tuning.cppfinalization.cppinit.cppgc_heap::initialize_gc)interface.cppGCHeapinterface implementation (public API surface)mark_phase.cppmemory.cppno_gc.cppplan_phase.cppregion_allocator.cppregion_free_list.cppregions_segments.cpprelocate_compact.cppsweep.cppDesign decisions
- Remaining in gc.cpp: Shared globals, macros, general-purpose helpers, and methods that don't clearly belong to one functional category stay in gc.cpp.
What's NOT changed
These changes were made by copilot cli with my supervision and reviewing. The next step will be to actually compile each of the files separately.