Skip to content

disagg: bound stream retries and reopen large forward seeks (#10795)#10797

Merged
ti-chi-bot[bot] merged 4 commits intopingcap:release-nextgen-202603from
ti-chi-bot:cherry-pick-10795-to-release-nextgen-202603
Apr 13, 2026
Merged

disagg: bound stream retries and reopen large forward seeks (#10795)#10797
ti-chi-bot[bot] merged 4 commits intopingcap:release-nextgen-202603from
ti-chi-bot:cherry-pick-10795-to-release-nextgen-202603

Conversation

@ti-chi-bot
Copy link
Copy Markdown
Member

This is an automated cherry-pick of #10795

What problem does this PR solve?

Issue Number: close #10794

Problem Summary:

S3RandomAccessFile could retry forever after a retryable stream-side read() / ignore() failure. In the reported OSS case, repeated 206 range responses could keep ending with stream errors, and the unbounded outer retry loop could stall DeltaMerge GC. Forward seek was also still draining the current HTTP body with ignore(), which is fragile for large skips.

What is changed and how it works?

S3RandomAccessFile: bound stream-side retries and normalize initialize retry state
S3RandomAccessFile: use 128 KiB threshold for forward seek reopen

The change does two things:

  • bounds stream-side retries in read() / seek() to max_retry = 3 total attempts, while keeping cur_retry scoped to per-initialize GetObject failures only
  • adds reopenAt(...) so backward seek and stream retry recovery reopen from a committed offset with a fresh initialize budget
  • changes forward seek to use a 128 KiB threshold: small skips still drain the current stream, while large skips reopen from the target offset directly
  • fixes accounting so reopen-based forward seek still records logical seek bytes, but does not charge skipped bytes into remote read bytes
  • extends MockS3Client to observe GetObject count and last range so the threshold split can be asserted in unit tests

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

Summary by CodeRabbit

  • New Features

    • Forward seeks that jump far now reopen the remote stream at the target, avoiding draining data.
  • Bug Fixes

    • Improved and bounded retry behavior for remote file read/seek operations; retry budget resets on reopen.
    • Seek/reopen logic now avoids charging skipped bytes as remote reads.
  • Observability

    • Added a profiling metric to track reopen-on-seek events.
  • Tests

    • Added tests covering retry, reopen-on-seek, and reopen failure scenarios.
    • Exposed test hooks to observe/reset remote GET call counts and ranges.
  • Chores

    • Added a failpoint to inject seek failures for testing.

@ti-chi-bot ti-chi-bot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-nextgen-202603 labels Apr 13, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • release-8.5
  • release-7.5
  • release-8.1

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bbe5c45c-0537-4489-9458-6b013f2c06e9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@JaySon-Huang
Copy link
Copy Markdown
Contributor

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Apr 13, 2026
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 13, 2026
@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 13, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-04-13 07:32:46.671486249 +0000 UTC m=+1373571.876846305: ☑️ agreed by JaySon-Huang.
  • 2026-04-13 07:32:58.42560235 +0000 UTC m=+1373583.630962408: ☑️ agreed by JinheLin.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 13, 2026

@kolafish: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Copy Markdown
Contributor

ti-chi-bot bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin, kolafish

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JaySon-Huang
Copy link
Copy Markdown
Contributor

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 13, 2026
@ti-chi-bot ti-chi-bot bot merged commit 6812cd6 into pingcap:release-nextgen-202603 Apr 13, 2026
5 checks passed
@ti-chi-bot ti-chi-bot bot deleted the cherry-pick-10795-to-release-nextgen-202603 branch April 13, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. type/cherry-pick-for-release-nextgen-202603

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants