WIP/adapter: bypass catalog_transact_inner for storage usage updates (25x speedup) #35436

Draft
aljoscha wants to merge 14 commits into MaterializeInc:main from aljoscha:design-storage-usage-blocking

Conversation

@aljoscha
Contributor

Rewrote storage_usage_update to bypass catalog_transact_inner entirely.
Instead of creating N Op::WeirdStorageUsageUpdates ops and routing them
through the catalog transaction machinery (which incurs a durable persist
transaction just to bump an unused id allocator), we now:

  1. Get a write timestamp from the oracle (unchanged, ~2ms)
  2. Pack N BuiltinTableUpdate rows directly via pack_storage_usage_update
  3. Submit via builtin_table_update().execute() (group commit, ~18ms)

The durable STORAGE_USAGE_ID_ALLOC_KEY allocator is replaced with a cheap
local counter (storage_usage_next_id on the Coordinator). The id column in
mz_storage_usage_by_shard is unused by any view or query.

Results at 10k shards (optimized build):

  • Before: ~499ms per collection cycle
  • After: ~20ms per collection cycle
  • Speedup: 25x

The Op::WeirdStorageUsageUpdates variant and durable id allocator are now
dead code and can be cleaned up in a follow-up.
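The three numbered steps above can be sketched as follows. This is a simplified, assumed model: the `Coordinator` struct, `StorageUsageUpdate` row type, and `pack_storage_usage_updates` helper here are illustrative stand-ins for the real Materialize types, showing only how a local in-memory counter replaces the durable `STORAGE_USAGE_ID_ALLOC_KEY` allocator.

```rust
/// Stand-in for a builtin-table row destined for mz_storage_usage_by_shard.
#[derive(Debug, PartialEq)]
struct StorageUsageUpdate {
    id: u64,
    shard_id: String,
    size_bytes: u64,
}

struct Coordinator {
    /// Cheap local counter; bumping it needs no durable persist transaction.
    storage_usage_next_id: u64,
}

impl Coordinator {
    /// Pack one row per shard, advancing only the local counter.
    /// (In the real patch, the resulting rows are then submitted via
    /// builtin_table_update().execute() as a single group commit.)
    fn pack_storage_usage_updates(
        &mut self,
        shard_sizes: &[(String, u64)],
    ) -> Vec<StorageUsageUpdate> {
        shard_sizes
            .iter()
            .map(|(shard_id, size_bytes)| {
                let id = self.storage_usage_next_id;
                self.storage_usage_next_id += 1;
                StorageUsageUpdate {
                    id,
                    shard_id: shard_id.clone(),
                    size_bytes: *size_bytes,
                }
            })
            .collect()
    }
}
```

Because the id column is unused by any view or query, the counter only needs to be locally monotonic, not durable across restarts.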

aljoscha and others added 6 commits March 3, 2026 15:35
Documenting the finding that storage_usage_update blocks the
coordinator main loop for seconds in large environments, and a
proposal to bypass catalog_transact_inner since the durable ID
allocator it uses is unused.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace PROMPT.md with storage-usage coordinator blocking investigation
instructions. Create storage-usage-log.md for recording session findings.

Modeled after the ddl-perf prompt/log workflow: each milestone gets committed,
re-read PROMPT.md after context compaction, update log and design doc as
findings emerge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Baseline measurements (optimized build, --storage-usage-collection-interval-sec=10s):
- ~2.4k shards: ~51ms/cycle
- ~5k shards: ~150ms/cycle
- ~10k shards: ~499ms/cycle

Scaling is linear with shard count. Also updated setup notes:
always use --optimized builds for measurements (debug builds are too
slow for reliable cycle timing), and the collection interval is a
startup flag, not an ALTER SYSTEM SET variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added timing logs to storage_usage_update and catalog::transact to
identify the dominant cost of coordinator blocking during storage
usage collection.

Results at 10k shards (optimized build):
- transact_inner op loop: ~430ms (91%)
- Open transaction: ~10ms (2%)
- Persist commit: ~6ms (1.3%)
- Group commit: ~15ms (3%)
- Oracle: ~2ms (0.4%)

The fix (bypassing catalog_transact_inner) should reduce total from
~475ms to ~17ms. Updated design doc and log with findings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
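The breakdown above came from ad-hoc timing logs added to storage_usage_update and catalog::transact. A minimal sketch of that kind of instrumentation (this helper is an assumption, not the actual diff; the real patch would use tracing's info! rather than eprintln!):

```rust
use std::time::Instant;

/// Run a closure and return (result, elapsed milliseconds), logging the
/// duration so each phase of the coordinator path can be measured separately.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let result = f();
    let elapsed_ms = start.elapsed().as_millis();
    // eprintln! keeps the sketch dependency-free; swap in tracing::info! in
    // real coordinator code.
    eprintln!("storage_usage_update: {label} took {elapsed_ms}ms");
    (result, elapsed_ms)
}
```

Wrapping each phase (oracle fetch, transact_inner op loop, persist commit, group commit) in such a helper is what yields the per-phase percentages quoted above.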
@github-actions

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute:, storage:, adapter:, sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

aljoscha and others added 8 commits March 11, 2026 18:01
…peedup)

Rewrote storage_usage_update to bypass catalog_transact_inner entirely.
Instead of creating N Op::WeirdStorageUsageUpdates ops and routing them
through the catalog transaction machinery (which incurs a durable persist
transaction just to bump an unused id allocator), we now:

1. Get a write timestamp from the oracle (unchanged, ~2ms)
2. Pack N BuiltinTableUpdate rows directly via pack_storage_usage_update
3. Submit via builtin_table_update().execute() (group commit, ~18ms)

The durable STORAGE_USAGE_ID_ALLOC_KEY allocator is replaced with a cheap
local counter (storage_usage_next_id on the Coordinator). All rows within
one collection batch share the same id so that consumers can identify which
rows were collected together.

Results at 10k shards (optimized build):
- Before: ~499ms per collection cycle
- After:  ~20ms per collection cycle
- Speedup: 25x

The Op::WeirdStorageUsageUpdates variant and durable id allocator are now
dead code and can be cleaned up in a follow-up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
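This revision of the commit changes the id semantics: instead of a fresh id per row, every row in one collection batch shares a single id. A minimal sketch of that variant (the method name next_batch_id is hypothetical; only the Coordinator field name comes from the commit message):

```rust
struct Coordinator {
    /// Local counter, bumped once per collection cycle rather than per row.
    storage_usage_next_id: u64,
}

impl Coordinator {
    /// Allocate a single batch id that every row packed in this collection
    /// cycle will carry, so consumers can group rows collected together.
    fn next_batch_id(&mut self) -> u64 {
        let id = self.storage_usage_next_id;
        self.storage_usage_next_id += 1;
        id
    }
}
```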
…fix results

Added measured cost breakdown (the op loop is 91% of the cost, not
persist I/O), scaling data across shard counts, prototype fix results
(25x speedup at 10k shards), and remaining cleanup items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove Op::WeirdStorageUsageUpdates, its transact_op match arm, the
weird_builtin_table_update return path, allocate_storage_usage_ids,
STORAGE_USAGE_ID_ALLOC_KEY, and the diagnostic info! log from
storage_usage_update. All of these became dead code when the storage
usage path was changed to bypass catalog_transact_inner and pack
builtin table rows directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove storage_usage IdAllocKey entries from catalog test snapshots
to match the removal of storage usage code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@aljoscha aljoscha force-pushed the design-storage-usage-blocking branch from 368e078 to 67e53b9 on March 11, 2026 17:02
@aljoscha aljoscha changed the title from "WIP/adapter: don't allocate expensive IDs for storage-usage updates" to "WIP/adapter: bypass catalog_transact_inner for storage usage updates (25x speedup)" Mar 11, 2026