Skip to content

disagg: adapt GC merge fan-in after S3 read failures#10800

Merged
ti-chi-bot[bot] merged 5 commits intopingcap:masterfrom
JaySon-Huang:issue-10794-followup-impl
Apr 14, 2026
Merged

disagg: adapt GC merge fan-in after S3 read failures#10800
ti-chi-bot[bot] merged 5 commits intopingcap:masterfrom
JaySon-Huang:issue-10794-followup-impl

Conversation

@JaySon-Huang
Copy link
Copy Markdown
Contributor

@JaySon-Huang JaySon-Huang commented Apr 14, 2026

What problem does this PR solve?

Issue Number: close #10798, ref #10794

Problem Summary:

Background GC merges can still fan in too many segments after bounded S3 stream retries are exhausted. That keeps a single GC merge exposed to long remote-read windows, while medium forward seeks still prefer draining the existing stream instead of reopening from the target offset.

What is changed and how it works?

33e66a4a07 disagg: tighten S3 forward seek reopen threshold
bd070ca5fe disagg: classify retry-exhausted S3 reads
7cbe1147ae disagg: adapt GC merge fan-in after S3 errors
085cb158be disagg: clarify S3 retry exhaustion flow
1f87ade691 disagg: narrow GC cap adjustment triggers
  • tighten the S3 forward-seek reopen threshold from 128 KiB to 64 KiB and update the boundary tests
  • let S3RandomAccessFile::read and S3RandomAccessFile::seek throw ErrorCodes::S3_ERROR after the bounded stream retry budget is exhausted, with doc comments for the public APIs and concise comments around the retry-exhaustion branch
  • add a table-level gc_mergeable_segments_cap so background GC merge fan-in is capped and can be reduced after S3_ERROR
  • only reduce the cap when gcTrySegmentMerge fails with S3_ERROR, and recover the cap only after checkSegmentUpdate succeeds
  • add failpoints and unit tests for cap limiting, S3-error-triggered reduction, recovery, and non-S3 failures

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Background garbage collection now dynamically adjusts merge concurrency based on S3 error conditions, automatically recovering when operations succeed.
    • S3 operations now throw explicit error codes when retry budgets are exhausted, improving error handling clarity.
  • Chores

    • Adjusted S3 stream forwarding threshold from 128KB to 64KB for optimized performance.

@ti-chi-bot ti-chi-bot bot added the release-note-none Denotes a PR that doesn't merit a release note. label Apr 14, 2026
@pantheon-ai
Copy link
Copy Markdown

pantheon-ai bot commented Apr 14, 2026

Review Complete

Findings: 0 issues
Posted: 0
Duplicates/Skipped: 0

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 14, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

This PR introduces adaptive capacity management for DeltaMerge GC merges and bounded retry exhaustion handling for S3 reads. When S3 errors occur during background GC, the mergeable segments cap reduces incrementally; on successful merges, it recovers. S3 stream retry failures now throw ErrorCodes::S3_ERROR instead of retrying indefinitely, preventing GC thread stalls.

Changes

Cohort / File(s) Summary
Failpoint Registration
dbms/src/Common/FailPoint.cpp
Registered two new failpoint identifiers for testing S3 and generic GC merge errors.
GC Capacity Management
dbms/src/Storages/DeltaMerge/DeltaMergeStore.h, dbms/src/Storages/DeltaMerge/DeltaMergeStore_InternalBg.cpp
Added atomic gc_mergeable_segments_cap with default/min/recover-step constants. Implemented reduceGcMergeableSegmentsCap() and recoverGcMergeableSegmentsCap() to adaptively adjust merge parallelism; cap halves on S3 errors and increments on success. Integrated failpoint hooks and cap enforcement in merge retry logic.
S3 Error Handling
dbms/src/Storages/S3/S3RandomAccessFile.h, dbms/src/Storages/S3/S3RandomAccessFile.cpp
Added S3_ERROR error code and throwRetryExhaustedError() method to bound stream-side retries. Lowered s3_forward_seek_reopen_threshold from 128 KB to 64 KB. Modified retry logic to throw S3_ERROR when bounded retry budget is exhausted instead of retrying forever.
Test Coverage
dbms/src/Storages/DeltaMerge/tests/gtest_dm_store_background.cpp, dbms/src/Storages/S3/tests/gtest_s3file.cpp
Added three new GC cap tests: MergeableSegmentsCapLimitsFanIn, S3ErrorReducesCapAndSuccessfulMergeRecoversIt, NonS3ErrorDoesNotReduceCap. Updated S3 file tests to assert exception throwing on retry exhaustion and adjusted forward-seek thresholds.

Sequence Diagram(s)

sequenceDiagram
    participant GC as DeltaMerge GC Thread
    participant Merge as gcTrySegmentMerge()
    participant S3 as S3RandomAccessFile
    participant Store as DeltaMergeStore

    GC->>Merge: onSyncGc() - attempt merge
    Merge->>S3: read()/seek() with bounded retries
    alt S3 succeeds
        S3-->>Merge: data/offset
        Merge->>Store: recoverGcMergeableSegmentsCap()
        Store->>Store: cap += recover_step (up to default)
        Merge-->>GC: success
    else S3 retry exhausted
        S3->>S3: throwRetryExhaustedError()
        S3-->>Merge: Exception(S3_ERROR)
        Merge->>GC: Exception propagates to onSyncGc
        GC->>Merge: catch S3_ERROR
        GC->>Store: reduceGcMergeableSegmentsCap()
        Store->>Store: cap = cap/2 (down to min)
        GC->>GC: rethrow, retry later with lower cap
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/XL

Suggested reviewers

  • JinheLin
  • Lloyd-Pottiger
  • CalvinNeo

Poem

🐰 A cap that shrinks when S3 fails,
Then grows again when merge prevails,
No infinite loops stall our way,
Bounded retries save the day! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main change: adapting GC merge fan-in after S3 read failures.
Linked Issues check ✅ Passed The PR directly addresses issue #10794 by implementing all required objectives: bounded S3 retry handling, cap-based GC fan-in limiting, error-specific cap recovery logic, and comprehensive unit tests.
Out of Scope Changes check ✅ Passed All changes are focused on resolving issue #10794: S3 retry bounds, GC merge cap management, error handling, and related tests. No unrelated modifications detected.
Description check ✅ Passed PR description includes all required sections: problem statement with issue numbers, detailed change explanation with commit messages, completed checklist showing unit tests included, and acknowledgment of side effects and documentation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@pantheon-ai pantheon-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Code looks good. No issues found.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Apr 14, 2026
@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

/cherry-pick release-nextgen-20251011

@ti-chi-bot
Copy link
Copy Markdown
Member

@JaySon-Huang: once the present PR merges, I will cherry-pick it on top of release-nextgen-20251011 in the new PR and assign it to you.

Details

In response to this:

/cherry-pick release-nextgen-20251011

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@JaySon-Huang
Copy link
Copy Markdown
Contributor Author

/cherry-pick release-nextgen-202603

@ti-chi-bot
Copy link
Copy Markdown
Member

@JaySon-Huang: once the present PR merges, I will cherry-pick it on top of release-nextgen-202603 in the new PR and assign it to you.

Details

In response to this:

/cherry-pick release-nextgen-202603

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CalvinNeo, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 14, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 14, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-14 04:33:13.973872342 +0000 UTC m=+1449199.179232399: ☑️ agreed by CalvinNeo.
  • 2026-04-14 05:14:13.1598933 +0000 UTC m=+1451658.365253347: ☑️ agreed by JinheLin.

@ti-chi-bot ti-chi-bot bot merged commit 45331bf into pingcap:master Apr 14, 2026
10 checks passed
@ti-chi-bot
Copy link
Copy Markdown
Member

@JaySon-Huang: new pull request created to branch release-nextgen-202603: #10803.

Details

In response to this:

/cherry-pick release-nextgen-202603

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Copy Markdown
Member

@JaySon-Huang: new pull request created to branch release-nextgen-20251011: #10804.

Details

In response to this:

/cherry-pick release-nextgen-20251011

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

ti-chi-bot bot pushed a commit that referenced this pull request Apr 14, 2026
ref #10794, close #10798

33e66a4 disagg: tighten S3 forward seek reopen threshold
bd070ca disagg: classify retry-exhausted S3 reads
7cbe114 disagg: adapt GC merge fan-in after S3 errors
085cb15 disagg: clarify S3 retry exhaustion flow
1f87ade disagg: narrow GC cap adjustment triggers

Co-authored-by: JaySon-Huang <tshent@qq.com>
ti-chi-bot bot pushed a commit that referenced this pull request Apr 14, 2026
ref #10794, close #10798

33e66a4 disagg: tighten S3 forward seek reopen threshold
bd070ca disagg: classify retry-exhausted S3 reads
7cbe114 disagg: adapt GC merge fan-in after S3 errors
085cb15 disagg: clarify S3 retry exhaustion flow
1f87ade disagg: narrow GC cap adjustment triggers

Co-authored-by: JaySon-Huang <tshent@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

S3RandomAccessFile still spends excessive time on medium seeks and long GC merges under persistent OSS timeouts

4 participants