WIP/adapter: bypass catalog_transact_inner for storage usage updates (25x speedup) #35436
Draft
aljoscha wants to merge 14 commits into MaterializeInc:main from
Conversation
Documenting the finding that storage_usage_update blocks the coordinator main loop for seconds in large environments, and a proposal to bypass catalog_transact_inner since the durable ID allocator it uses is unused. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace PROMPT.md with storage-usage coordinator blocking investigation instructions. Create storage-usage-log.md for recording session findings. Modeled after the ddl-perf prompt/log workflow: each milestone gets committed, re-read PROMPT.md after context compaction, update log and design doc as findings emerge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Baseline measurements (optimized build, --storage-usage-collection-interval-sec=10s):
- ~2.4k shards: ~51ms/cycle
- ~5k shards: ~150ms/cycle
- ~10k shards: ~499ms/cycle
Scaling is linear with shard count. Also updated setup notes: always use --optimized builds for measurements (debug builds are too slow for reliable cycle timing), and the collection interval is a startup flag, not an ALTER SYSTEM SET variable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added timing logs to storage_usage_update and catalog::transact to identify the dominant cost of coordinator blocking during storage usage collection. Results at 10k shards (optimized build):
- transact_inner op loop: ~430ms (91%)
- Open transaction: ~10ms (2%)
- Persist commit: ~6ms (1.3%)
- Group commit: ~15ms (3%)
- Oracle: ~2ms (0.4%)
The fix (bypassing catalog_transact_inner) should reduce the total from ~475ms to ~17ms. Updated design doc and log with findings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
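A breakdown like the one above can be gathered with simple `Instant`-based phase timers. The sketch below is a hypothetical stand-in (`PhaseTimer` is not Materialize's actual instrumentation) showing how wall-clock time can be attributed to named phases and reported as percentages:

```rust
use std::time::{Duration, Instant};

// Hypothetical helper for attributing wall-clock time to named phases,
// in the spirit of the timing logs added to storage_usage_update.
struct PhaseTimer {
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimer {
    fn new() -> Self {
        PhaseTimer { phases: Vec::new() }
    }

    // Run `f`, recording how long it took under `name`.
    fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        self.phases.push((name, start.elapsed()));
        out
    }

    // Report each phase as a percentage of the total recorded time.
    fn report(&self) -> Vec<(&'static str, f64)> {
        let total: Duration = self.phases.iter().map(|(_, d)| *d).sum();
        self.phases
            .iter()
            .map(|(n, d)| (*n, 100.0 * d.as_secs_f64() / total.as_secs_f64()))
            .collect()
    }
}

fn main() {
    let mut t = PhaseTimer::new();
    // Stand-in workloads; the real phases are the op loop, persist commit, etc.
    t.time("op loop", || std::thread::sleep(Duration::from_millis(40)));
    t.time("group commit", || std::thread::sleep(Duration::from_millis(10)));
    for (name, pct) in t.report() {
        println!("{name}: {pct:.1}%");
    }
}
```

Percentages computed this way only cover time spent inside timed closures; untimed gaps between phases would make the reported shares overstate their true fraction of the cycle.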
…peedup) Rewrote storage_usage_update to bypass catalog_transact_inner entirely. Instead of creating N Op::WeirdStorageUsageUpdates ops and routing them through the catalog transaction machinery (which incurs a durable persist transaction just to bump an unused id allocator), we now:
1. Get a write timestamp from the oracle (unchanged, ~2ms)
2. Pack N BuiltinTableUpdate rows directly via pack_storage_usage_update
3. Submit via builtin_table_update().execute() (group commit, ~18ms)
The durable STORAGE_USAGE_ID_ALLOC_KEY allocator is replaced with a cheap local counter (storage_usage_next_id on the Coordinator). All rows within one collection batch share the same id so that consumers can identify which rows were collected together.
Results at 10k shards (optimized build):
- Before: ~499ms per collection cycle
- After: ~20ms per collection cycle
- Speedup: 25x
The Op::WeirdStorageUsageUpdates variant and durable id allocator are now dead code and can be cleaned up in a follow-up. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
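The shape of the new path can be sketched outside the codebase. The types below (`ShardUsage`, `UsageRow`, `pack_usage_batch`) are hypothetical stand-ins for the real BuiltinTableUpdate machinery; what they illustrate is packing every row of one collection cycle directly, under a single batch id drawn from an in-memory counter, rather than routing N ops through a catalog transaction:

```rust
// Stand-in types; the real path packs BuiltinTableUpdate rows and submits
// them through group commit.
#[derive(Debug, Clone, PartialEq)]
struct ShardUsage {
    shard: String,
    size_bytes: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct UsageRow {
    collection_id: u64, // shared by every row in one collection batch
    shard: String,
    size_bytes: u64,
}

// Pack all rows for one collection cycle directly, assigning them a single
// id from a cheap in-memory counter instead of a durable allocator.
fn pack_usage_batch(next_id: &mut u64, usages: &[ShardUsage]) -> Vec<UsageRow> {
    let collection_id = *next_id;
    *next_id += 1;
    usages
        .iter()
        .map(|u| UsageRow {
            collection_id,
            shard: u.shard.clone(),
            size_bytes: u.size_bytes,
        })
        .collect()
}

fn main() {
    let mut next_id = 0u64;
    let usages = vec![
        ShardUsage { shard: "s1".into(), size_bytes: 1024 },
        ShardUsage { shard: "s2".into(), size_bytes: 2048 },
    ];
    // One batch, one id, one write: no per-row catalog ops.
    let batch = pack_usage_batch(&mut next_id, &usages);
    assert!(batch.iter().all(|r| r.collection_id == 0));
    println!("{batch:?}");
}
```

Because the per-batch work is a counter bump plus a row-packing loop, the cost scales with row construction alone, consistent with the op loop (not persist I/O) having been 91% of the old cycle time.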
…fix results Added measured cost breakdown (the op loop is 91% of the cost, not persist I/O), scaling data across shard counts, prototype fix results (25x speedup at 10k shards), and remaining cleanup items. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove Op::WeirdStorageUsageUpdates, its transact_op match arm, the weird_builtin_table_update return path, allocate_storage_usage_ids, STORAGE_USAGE_ID_ALLOC_KEY, and the diagnostic info! log from storage_usage_update. All of these became dead code when the storage usage path was changed to bypass catalog_transact_inner and pack builtin table rows directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove storage_usage IdAllocKey entries from catalog test snapshots to match the removal of storage usage code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
368e078 to 67e53b9
Rewrote storage_usage_update to bypass catalog_transact_inner entirely.
Instead of creating N Op::WeirdStorageUsageUpdates ops and routing them
through the catalog transaction machinery (which incurs a durable persist
transaction just to bump an unused id allocator), we now:

1. Get a write timestamp from the oracle (unchanged, ~2ms)
2. Pack N BuiltinTableUpdate rows directly via pack_storage_usage_update
3. Submit via builtin_table_update().execute() (group commit, ~18ms)
The durable STORAGE_USAGE_ID_ALLOC_KEY allocator is replaced with a cheap
local counter (storage_usage_next_id on the Coordinator). The id column in
mz_storage_usage_by_shard is unused by any view or query.
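The cost difference between the two allocation strategies can be modeled with a toy trait. `IdAlloc` and both implementations below are hypothetical, not the real STORAGE_USAGE_ID_ALLOC_KEY code path; the durable variant merely counts simulated writes where the real allocator pays a persist-backed catalog transaction:

```rust
// Toy model of the two allocation strategies.
trait IdAlloc {
    fn next(&mut self) -> u64;
}

// Durable-style: every allocation is "written" before it is handed out.
struct DurableAlloc {
    stored: u64,
    writes: u64, // counts simulated durable writes
}

impl IdAlloc for DurableAlloc {
    fn next(&mut self) -> u64 {
        self.writes += 1; // stand-in for the persist transaction
        let id = self.stored;
        self.stored += 1;
        id
    }
}

// Local counter, like storage_usage_next_id on the Coordinator: a plain
// field bump with no durable write. This assumes, per the PR description,
// that the id column is unused by any view or query and only needs to
// group rows collected in the same batch.
struct LocalAlloc {
    next: u64,
}

impl IdAlloc for LocalAlloc {
    fn next(&mut self) -> u64 {
        let id = self.next;
        self.next += 1;
        id
    }
}

fn main() {
    let mut durable = DurableAlloc { stored: 0, writes: 0 };
    let mut local = LocalAlloc { next: 0 };
    for _ in 0..3 {
        durable.next();
        local.next();
    }
    // Three collection cycles cost three durable writes under the old
    // scheme and zero under the local counter.
    assert_eq!(durable.writes, 3);
    println!("durable writes avoided: {}", durable.writes);
}
```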
Results at 10k shards (optimized build):

- Before: ~499ms per collection cycle
- After: ~20ms per collection cycle
- Speedup: 25x
The Op::WeirdStorageUsageUpdates variant and durable id allocator are now
dead code and can be cleaned up in a follow-up.