branch-4.0: [opt](s3) Skip S3 listing for deterministic file paths using HEAD requests #60414 #60911

Open
github-actions[bot] wants to merge 1 commit into branch-4.0 from auto-pick-60414-branch-4.0
@github-actions
Contributor

Cherry-picked from #60414

…uests (#60414)

## Summary

- For S3 paths without wildcards (`*`, `?`, `[...]`), use HEAD requests
instead of ListObjectsV2 to avoid requiring `s3:ListBucket` permission
- Brace patterns like `{1..10}` are expanded to concrete file paths and
verified individually with HEAD requests
- This enables loading data from S3 when only `s3:GetObject` permission
is granted
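
The wildcard check described in the first bullet can be sketched roughly as follows. This is an illustration only, not the code from `S3Util.java`: the class name `DeterministicPathSketch` is hypothetical, and the sketch assumes that any occurrence of `*`, `?`, or `[` makes a path non-deterministic, without handling escaped characters.

```java
final class DeterministicPathSketch {
    private DeterministicPathSketch() {
    }

    /**
     * Returns true when the path contains none of the glob wildcards
     * '*', '?' or '['. Brace patterns such as {1..3} are still treated as
     * deterministic because they can be expanded without listing the bucket.
     * Assumption: escape sequences are not considered in this sketch.
     */
    static boolean isDeterministicPattern(String path) {
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '*' || c == '?' || c == '[') {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `s3://bucket/data/file{1..3}.csv` counts as deterministic, while `s3://bucket/data/*.csv` does not.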

## Motivation

S3 `ListBucket` permission is often more restricted than `GetObject` in
enterprise environments. When users specify exact file paths or
deterministic patterns like `file{1..3}.csv`, listing is unnecessary
since the file names can be determined from the input.

## Changes

| File | Description |
|------|-------------|
| `S3Util.java` | Added `isDeterministicPattern()` to detect paths without wildcards, and `expandBracePatterns()` to expand brace patterns to concrete paths |
| `S3ObjStorage.java` | Modified `globListInternal()` to use HEAD requests for deterministic paths |
| `S3UtilTest.java` | Added unit tests for new utility methods |
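
To make the brace expansion concrete, here is a minimal sketch of how a pattern could be expanded into concrete paths. It is not the implementation in `S3Util.java`. The class name `BraceExpansionSketch` is hypothetical, and the supported syntax is an assumption: ascending or descending integer ranges such as `{1..3}` and comma lists such as `{a,b}`, with no nested braces and no zero padding.

```java
import java.util.ArrayList;
import java.util.List;

final class BraceExpansionSketch {
    private BraceExpansionSketch() {
    }

    /** Expands every {..} group in the pattern, left to right. */
    static List<String> expandBracePatterns(String pattern) {
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : pattern.indexOf('}', open);
        if (open < 0 || close < 0) {
            // No complete brace group left: the pattern is already concrete.
            List<String> single = new ArrayList<>();
            single.add(pattern);
            return single;
        }
        String prefix = pattern.substring(0, open);
        String body = pattern.substring(open + 1, close);
        // Expand the remainder first, then combine with each alternative.
        List<String> tails = expandBracePatterns(pattern.substring(close + 1));
        List<String> result = new ArrayList<>();
        for (String alternative : alternatives(body)) {
            for (String tail : tails) {
                result.add(prefix + alternative + tail);
            }
        }
        return result;
    }

    /** Turns "1..3" into [1, 2, 3] and "a,b" into [a, b]. */
    private static List<String> alternatives(String body) {
        List<String> out = new ArrayList<>();
        int dots = body.indexOf("..");
        if (dots > 0) {
            try {
                int start = Integer.parseInt(body.substring(0, dots));
                int end = Integer.parseInt(body.substring(dots + 2));
                int step = start <= end ? 1 : -1;
                for (int v = start; v != end + step; v += step) {
                    out.add(Integer.toString(v));
                }
                return out;
            } catch (NumberFormatException e) {
                // Not a numeric range; fall through and treat it as a comma list.
            }
        }
        for (String alternative : body.split(",", -1)) {
            out.add(alternative);
        }
        return out;
    }
}
```

Under these assumptions, `expandBracePatterns("s3://bucket/data/file{1..3}.csv")` yields `file1.csv`, `file2.csv`, and `file3.csv` under `s3://bucket/data/`, each of which can then be checked with its own HEAD request.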

## Examples

| Path | Deterministic? | Behavior |
|------|----------------|----------|
| `s3://bucket/data/file.csv` | ✅ Yes | Single HEAD request |
| `s3://bucket/data/file{1..3}.csv` | ✅ Yes | 3 HEAD requests |
| `s3://bucket/data/*.csv` | ❌ No | Falls back to LIST |
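
The decision in the table above can be sketched as a small dispatch routine. This is a hedged illustration rather than the actual `globListInternal()` change in `S3ObjStorage.java`: the class `GlobDispatchSketch` and its parameters `expand`, `headExists`, and `listAndMatch` are hypothetical stand-ins for brace expansion, an S3 HEAD request, and a ListObjectsV2-based glob match. The sketch also assumes that a path whose HEAD check fails is simply omitted from the result, which the description does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class GlobDispatchSketch {
    private GlobDispatchSketch() {
    }

    /**
     * Resolves a path to the objects it refers to.
     *
     * @param path         user-supplied S3 path, possibly with braces or wildcards
     * @param expand       hypothetical brace expansion (pattern to concrete paths)
     * @param headExists   hypothetical HEAD check (needs only s3:GetObject)
     * @param listAndMatch hypothetical LIST plus glob match (needs s3:ListBucket)
     */
    static List<String> resolve(String path,
                                Function<String, List<String>> expand,
                                Predicate<String> headExists,
                                Function<String, List<String>> listAndMatch) {
        boolean hasWildcard = path.indexOf('*') >= 0
                || path.indexOf('?') >= 0
                || path.indexOf('[') >= 0;
        if (hasWildcard) {
            // Non-deterministic pattern: fall back to listing.
            return listAndMatch.apply(path);
        }
        // Deterministic pattern: one HEAD request per expanded path.
        List<String> found = new ArrayList<>();
        for (String candidate : expand.apply(path)) {
            if (headExists.test(candidate)) {
                found.add(candidate);
            }
        }
        return found;
    }
}
```

Passing the S3 operations in as functions keeps the sketch independent of any particular SDK. In real code they would wrap the storage client's HEAD and list calls.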

## Test Plan

- [x] Added unit tests for `isDeterministicPattern()`
- [x] Added unit tests for `expandBracePatterns()`
- [ ] Manual testing with S3 TVF and Broker Load

🤖 Generated with [Claude Code](https://claude.ai/code)
@github-actions github-actions bot requested a review from yiguolei as a code owner February 28, 2026 08:03
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Feb 28, 2026
@hello-stephen
Contributor

run buildall
