disagg: bound stream retries and reopen large forward seeks (#10795) #10797
Approved by: JaySon-Huang, JinheLin, kolafish
Merged commit 6812cd6 into pingcap:release-nextgen-202603
This is an automated cherry-pick of #10795
What problem does this PR solve?
Issue Number: close #10794
Problem Summary:
`S3RandomAccessFile` could retry forever after a retryable stream-side `read()`/`ignore()` failure. In the reported OSS case, repeated `206` range responses could keep ending with stream errors, and the unbounded outer retry loop could stall DeltaMerge GC. Forward seek was also still draining the current HTTP body with `ignore()`, which is fragile for large skips.
What is changed and how it works?
The change does two things:
- Bound `read()`/`seek()` to `max_retry = 3` total attempts, while keeping `cur_retry` scoped to per-`initialize` `GetObject` failures only
- Add `reopenAt(...)` so backward seek and stream retry recovery reopen from a committed offset with a fresh initialize budget
- Add a `128 KiB` threshold for forward seeks: small skips still drain the current stream, while large skips reopen from the target offset directly
- Extend `MockS3Client` to observe `GetObject` count and last range so the threshold split can be asserted in unit tests
Check List
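For readers who want the two behaviours from "What is changed and how it works?" in concrete form, here is a minimal sketch. It is not the TiFlash implementation: `planSeek`, `readWithBoundedRetries`, `SeekPlan`, `kMaxRetry`, and `kReopenThreshold` are hypothetical names, only the values `3` and `128 KiB` come from the description, and treating a skip of exactly `128 KiB` as "small" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Values taken from the PR description; the names are hypothetical.
constexpr std::size_t kMaxRetry = 3;                  // total attempts, not extra retries
constexpr std::size_t kReopenThreshold = 128 * 1024;  // 128 KiB

enum class SeekPlan
{
    DrainCurrentStream, // skip bytes on the already-open HTTP body
    ReopenAtTarget,     // issue a fresh ranged GetObject at the target offset
};

// Decide how to move the read position from `current` to `target`.
inline SeekPlan planSeek(std::size_t current, std::size_t target)
{
    // Backward seeks cannot be served by the current stream, so they reopen.
    if (target < current)
        return SeekPlan::ReopenAtTarget;
    // Assumption: the boundary is inclusive, so exactly 128 KiB still drains.
    return (target - current <= kReopenThreshold) ? SeekPlan::DrainCurrentStream
                                                  : SeekPlan::ReopenAtTarget;
}

// Run `try_read` at most kMaxRetry times in total. After each failed attempt
// except the last, call `reopen_at_committed_offset` so that the next attempt
// starts from a fresh stream. Returns true as soon as one attempt succeeds.
inline bool readWithBoundedRetries(
    const std::function<bool()> & try_read,
    const std::function<void()> & reopen_at_committed_offset)
{
    for (std::size_t attempt = 1; attempt <= kMaxRetry; ++attempt)
    {
        if (try_read())
            return true;
        if (attempt < kMaxRetry)
            reopen_at_committed_offset();
    }
    return false; // budget exhausted: surface the error instead of looping forever
}
```

The point of the sketch is that the outer loop has a fixed budget, so a stream that keeps failing after a successful `206` response ends in an error rather than an endless retry, and that the decision to drain or reopen depends only on the seek direction and distance.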
Tests
Side effects
Documentation
Release note