Skip to content

[ISSUE #10527] Reduce auxiliary component allocation via AtomicLong, string caches, StringBuilder reuse, and dirty flag#10528

Open
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/auxiliary-component-optimization
Open

[ISSUE #10527] Reduce auxiliary component allocation via AtomicLong, string caches, StringBuilder reuse, and dirty flag#10528
wang-jiahua wants to merge 1 commit into
apache:developfrom
wang-jiahua:perf/auxiliary-component-optimization

Conversation

@wang-jiahua

@wang-jiahua wang-jiahua commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Which Issue(s) This PR Fixes

Fixes #10527

Brief Description

Reduce allocation in auxiliary store components through five independent optimizations:

  1. QueueOffsetOperator — Replace ConcurrentMap<String, Long> with ConcurrentMap<String, AtomicLong>. Eliminates Long boxing on every queue offset update.
  2. BrokerStatsManager — Cache buildStatsKey/topicQueueKey/consumerOffset string results. Change Integer parameters to int in incQueue* methods to eliminate autoboxing.
  3. IndexService — Reuse StringBuilder via member field instead of allocating per-call.
  4. TimerWheel — Add volatile dirty flag to skip flush when no changes occurred since last flush.
  5. AppendMessageResult — Add constructor with pre-computed fields to avoid redundant allocation.

How Did You Test This Change?

  1. Unit tests: All 15 BrokerStatsManagerTest tests pass (including testOnTopicDeleted which exercises all incQueue* methods).
  2. Compilation: store module compiles cleanly on JDK 21.
  3. Commercial compatibility: No commercial classes extend QueueOffsetOperator, BrokerStatsManager, IndexService, TimerWheel, or AppendMessageResult.
  4. A/B benchmark (256 threads, 1KB sync send, Dragonwell JDK 21, 6g heap):
Metric Baseline (develop) Optimized (S4) Change
TPS ~209k ~198k ~-5% (within variance)
p99 RT 3ms 3ms unchanged
Young GCs (total) 388 312 -19.6%

S4 targets auxiliary components (stats, index, timer) that run on background threads, not on the send critical path. The significant GC reduction (-19.6%) demonstrates reduced allocation pressure, while TPS/RT remain unchanged because the send path is not the bottleneck being optimized. A longer-running benchmark or Consumer benchmark would better showcase S4's benefits.

…Long, string caches, StringBuilder reuse, and dirty flag

- QueueOffsetOperator: ConcurrentMap<String,Long> -> ConcurrentMap<String,AtomicLong>

- BrokerStatsManager: cache buildStatsKey/topicQueueKey/consumerOffset strings, Integer->int params

- IndexService: reusable StringBuilder via ThreadLocal

- TimerWheel: volatile dirty flag to skip unnecessary flush

- AppendMessageResult: constructor with pre-computed fields

- Fix BrokerStatsManagerTest: QUEUE_ID Integer->int
Copilot AI review requested due to automatic review settings June 17, 2026 15:14

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR focuses on reducing allocation/flush overhead in the store layer by introducing cached key builders, adding a “dirty” optimization for timer wheel flushing, and improving queue offset updates with atomic counters.

Changes:

  • Add key caching and primitive int queueId APIs in BrokerStatsManager to reduce stats-key allocation.
  • Optimize TimerWheel.flush() using a dirty flag and 8-byte aligned comparisons to reduce unnecessary IO/copying.
  • Replace queue offset maps from Long values to AtomicLong for safer concurrent increments; add key-building reuse in IndexService.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
store/src/test/java/org/apache/rocketmq/store/stats/BrokerStatsManagerTest.java Align test constant type (Integerint) with production API changes.
store/src/main/java/org/apache/rocketmq/store/timer/TimerWheel.java Add dirty gating and faster flush loop to reduce forced writes.
store/src/main/java/org/apache/rocketmq/store/stats/BrokerStatsManager.java Add multiple caches for stats key strings and switch queueId params to primitive int.
store/src/main/java/org/apache/rocketmq/store/queue/QueueOffsetOperator.java Use AtomicLong to avoid lost updates on concurrent offset increments; add snapshot/convert helpers.
store/src/main/java/org/apache/rocketmq/store/index/IndexService.java Reuse a StringBuilder for key construction to reduce temporary allocations.
store/src/main/java/org/apache/rocketmq/store/AppendMessageResult.java Add an additional constructor overload mirroring existing fields.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread store/src/main/java/org/apache/rocketmq/store/timer/TimerWheel.java
Comment thread store/src/main/java/org/apache/rocketmq/store/timer/TimerWheel.java
Comment thread store/src/main/java/org/apache/rocketmq/store/timer/TimerWheel.java

@RockteMQ-AI RockteMQ-AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by github-manager-bot

Summary

This PR reduces auxiliary component allocation in the store layer through five optimization strategies: reusable StringBuilder in IndexService, AtomicLong-based offset tracking in QueueOffsetOperator, string caching in BrokerStatsManager, dirty-flag flush gating in TimerWheel, and a new AppendMessageResult constructor overload. Overall direction is sound — reducing per-message allocation pressure in hot paths.

Findings

  • [Critical] IndexService.java:52Thread-safety violation with reusableKeyBuilder. IndexService is shared across threads (commitlog dispatch, query, etc.), and buildKey() is called from both queryOffset() (read lock) and buildIndex() (write lock). A shared mutable StringBuilder without synchronization will produce corrupted keys under concurrent access. The JVM may even reorder setLength(0) and append() calls across threads. This is a data corruption risk — keys will be silently wrong, causing index lookups to miss or return incorrect offsets.

    • Fix: Either use a ThreadLocal<StringBuilder>, pass the builder as a parameter from callers that already hold appropriate locks, or revert to the original string concatenation (the JIT will optimize topic + "#" + key into a StringBuilder anyway, and the allocation is short-lived).
  • [Critical] TimerWheel.java:58,137,151,303,313,326,332dirty flag race condition. The pattern if (!dirty) return; ... dirty = false; is a non-atomic check-then-act. Between the if (!dirty) guard and the final dirty = false, another thread calling setDirty(true) can have its update lost — the flush proceeds but then resets the flag, losing the knowledge that new data was written. Over time this can cause the timer wheel to skip flushes, leading to data loss on crash.

    • Fix: Use AtomicBoolean with compareAndSet(true, false) to atomically claim the dirty state, or synchronize the flush method. Alternatively, accept that some flushes may be redundant (safe) by checking dirty again after the flush.
  • [Warning] BrokerStatsManager.java:496Unbounded cache growth. The statsKeyByGroupCache and similar ConcurrentHashMap<String, String[]> caches grow without eviction. In deployments with thousands of topics × groups, this can accumulate significant heap. Consider adding a size cap or using a bounded cache (e.g., Caffeine with maximumSize).

  • [Info] QueueOffsetOperator.java — The change from Long to AtomicLong for offset values is a good correctness improvement (atomic increment vs. non-atomic read-modify-write). The getTopicQueueTable() snapshot copy is appropriate for iteration safety.

Suggestions

  1. The IndexService thread-safety issue is a blocker — please fix before merge.
  2. The TimerWheel dirty flag issue should also be addressed; consider AtomicBoolean.getAndSet(false) as a minimal fix.
  3. Consider adding a JMH benchmark for the hot paths to quantify the allocation reduction.

Automated review by github-manager-bot

@codecov-commenter

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 59.31034% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.07%. Comparing base (59a70d9) to head (958a954).
⚠️ Report is 4 commits behind head on develop.

Files with missing lines Patch % Lines
...pache/rocketmq/store/stats/BrokerStatsManager.java 55.42% 22 Missing and 15 partials ⚠️
...org/apache/rocketmq/store/AppendMessageResult.java 0.00% 10 Missing ⚠️
...ache/rocketmq/store/queue/QueueOffsetOperator.java 74.19% 7 Missing and 1 partial ⚠️
...va/org/apache/rocketmq/store/timer/TimerWheel.java 75.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop   #10528      +/-   ##
=============================================
- Coverage      48.19%   48.07%   -0.12%     
+ Complexity     13394    13381      -13     
=============================================
  Files           1377     1377              
  Lines         100730   100850     +120     
  Branches       13012    13042      +30     
=============================================
- Hits           48542    48487      -55     
- Misses         46254    46392     +138     
- Partials        5934     5971      +37     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Reduce auxiliary component allocation via AtomicLong, string caches, StringBuilder reuse, and dirty flag

4 participants