Resolve Parquet shard count via bucket index to optimize storage calls #7648

Open
SungJin1212 wants to merge 1 commit into cortexproject:master from SungJin1212:parquet-shard-count-from-bucket-index

Conversation

@SungJin1212

Member

What this PR does:
This PR updates the Parquet shard resolution logic to use the bucket index, which reduces the number of object storage calls.

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

…e calls

Signed-off-by: SungJin1212 <tjdwls1201@gmail.com>
@SungJin1212 SungJin1212 force-pushed the parquet-shard-count-from-bucket-index branch from f7b225e to 0fdab66 Compare June 26, 2026 07:53
shardCounts := make(map[string]int, len(blockIDs))

if p.bucketIndexEnabled {
	idx, err := bucketindex.ReadIndex(ctx, p.indexBucket, p.userID, p.limits, p.logger)

Contributor

Should this be cached with some TTL instead? Alternatively, we could have a separate goroutine sync the bucket index periodically rather than resolving it at query time.

Contributor

If we use bucketindex.Loader, it has built-in caching.

2 participants