Skip to content

feat(backup_request): add rate-limited backup request policy (#3228)#3229

Open
feng-y wants to merge 1 commit intoapache:masterfrom
feng-y:feat/rate-limited-backup-request
Open

feat(backup_request): add rate-limited backup request policy (#3228)#3229
feng-y wants to merge 1 commit intoapache:masterfrom
feng-y:feat/rate-limited-backup-request

Conversation

@feng-y
Copy link
Contributor

@feng-y feng-y commented Feb 25, 2026

Summary

Add ratio-based rate limiting for backup requests to prevent backup request storms under high QPS or downstream latency spikes.

Design

  • BackupRequestPolicy interface unchanged (ABI stable)
  • BackupRateLimiter: standalone statistics module tracking backup/total ratio within a sliding time window using bvar::Window<bvar::Adder> + relaxed atomics
  • RateLimitedBackupPolicy: internal implementation composing BackupRateLimiter, hidden in .cpp
  • CreateRateLimitedBackupPolicy(): public factory function in header (validates inputs, returns NULL on error)
  • ChannelOptions.backup_request_max_ratio: per-channel configuration
    • Priority: backup_request_policy > backup_request_max_ratio > backup_request_ms
    • -1 (default): use global gflag FLAGS_backup_request_max_ratio
    • 0: explicitly disabled
    • (0, 1]: rate limiting enabled (1.0 means no limit)
  • Ownership: Channel tracks via _owns_backup_policy bool; user-provided policy is never deleted
  • 3 gflags with validators: backup_request_max_ratio, backup_request_ratio_window_size_s, backup_request_ratio_update_interval_s

Edge cases handled

  • User-provided backup_request_policy takes precedence (with LOG(WARNING) if max_ratio also set)
  • max_ratio == 1.0 acts as no-limit (special-cased since ratio cannot exceed 1.0)
  • max_ratio > 1.0 clamped to 1.0 with LOG(WARNING)
  • max_ratio == 0 explicitly disables rate limiting even when global gflag is set
  • Channel re-Init correctly cleans up previously created internal policy
  • Idle channel: window returns 0 after inactivity → ratio defaults to 0 (backups allowed), no stale all-time fallback
  • Factory returns NULL on invalid params with LOG(ERROR); Channel checks NULL before setting ownership

Closes #3228

Usage

// Simple: just set the ratio, Channel handles the rest
brpc::ChannelOptions options;
options.backup_request_ms = 100;
options.backup_request_max_ratio = 0.3;  // limit backups to 30% of total
channel.Init("...", &options);

// Advanced: create policy manually for full control
auto* policy = brpc::CreateRateLimitedBackupPolicy(100, 0.3, 10, 5);
options.backup_request_policy = policy;  // user manages lifetime

Test plan

  • Test 1: Policy created when max_ratio > 0
  • Test 2: No policy when max_ratio = -1
  • Test 3: max_ratio = 0 explicitly disables even when global gflag set
  • Test 4: max_ratio = 1.0 creates policy (allows all backups)
  • Test 5: max_ratio > 1.0 clamped to 1.0, policy still created
  • Test 6: Custom policy preserved, not replaced
  • Test 7: No policy when backup_request_ms < 0
  • Test 8: Functional — 40 requests, backup count within upper bound (RAII gflag restore)

🤖 Generated with Claude Code

@feng-y
Copy link
Contributor Author

feng-y commented Feb 25, 2026

@copilot review this PR

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a rate-limiting mechanism for backup requests, aiming to prevent “backup request storms” by suppressing backups when the backup/total ratio exceeds a configurable threshold over a sliding time window.

Changes:

  • Introduces ChannelOptions.backup_request_max_ratio and creates an internal rate-limited BackupRequestPolicy when enabled.
  • Adds global gflags for ratio threshold, window size, and ratio cache update interval, backed by bvar::Window + cumulative counters for cold start.
  • Adds a unit test covering policy creation/precedence plus a timing-based functional check.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/brpc_channel_unittest.cpp Adds coverage for rate-limited backup request behavior and configuration precedence.
src/brpc/channel.h Adds per-channel configuration and internal ownership for an auto-created backup policy.
src/brpc/channel.cpp Creates and wires an internal rate-limited backup policy during channel initialization.
src/brpc/backup_request_policy.h Refactors BackupRequestPolicy into a concrete base with optional rate limiting support.
src/brpc/backup_request_policy.cpp Implements rate limiter, gflags, and default policy behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/brpc/backup_request_policy.cpp:120

  • When compare_exchange_strong fails on line 105, the code continues to use the potentially stale ratio value loaded on line 102, even though another thread may have updated _cached_ratio on line 116. While the comment acknowledges this is "best-effort rate limiting," consider reloading _cached_ratio after a failed compare_exchange to reduce the window of staleness, especially in high-concurrency scenarios where the stale value could lead to temporarily incorrect rate limiting decisions.
    bool DoBackup(const Controller* /*controller*/) const override {
        const int64_t now_us = butil::cpuwide_time_us();
        int64_t last_us = _last_update_us.load(butil::memory_order_relaxed);
        double ratio = _cached_ratio.load(butil::memory_order_relaxed);

        if (now_us - last_us >= _update_interval_us) {
            if (_last_update_us.compare_exchange_strong(
                    last_us, now_us, butil::memory_order_relaxed)) {
                int64_t total = _total_window.get_value();
                int64_t backup = _backup_window.get_value();
                // Fall back to cumulative counts when the window has no
                // sampled data yet (cold-start within the first few seconds).
                if (total <= 0) {
                    total = _total_count.get_value();
                    backup = _backup_count.get_value();
                }
                ratio = (total > 0) ? static_cast<double>(backup) / total : 0.0;
                _cached_ratio.store(ratio, butil::memory_order_relaxed);
            }
        }

        return ratio < _max_backup_ratio;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch 3 times, most recently from 7b50fdc to e6db035 Compare February 25, 2026 11:19
@feng-y feng-y requested a review from Copilot February 25, 2026 11:22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

test/brpc_channel_unittest.cpp:2856

  • The comment says “ratio refreshes on every DoBackup call” but with FLAGS_backup_request_ratio_update_interval_s = 1 the ratio cache can only refresh at most once per second. This makes the test rationale misleading when debugging failures; please adjust the comment to match the actual behavior.
    // Test 5: Functional — rate limiting reduces backup requests.
    // Use short update interval so ratio refreshes on every DoBackup call.
    // Save/restore gflags outside the inner scope so cleanup always runs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch 3 times, most recently from 4ed6666 to e614e33 Compare February 25, 2026 11:43
@feng-y feng-y requested a review from Copilot February 25, 2026 12:10
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (4)

src/brpc/channel.cpp:284

  • Consider checking if CreateRateLimitedBackupPolicy returns NULL before setting _owns_backup_policy to true. While the current guards (line 278 checks max_ratio > 0, validators ensure gflags are valid) should prevent NULL returns in practice, defensive programming suggests verifying the return value and only setting _owns_backup_policy if the policy was successfully created. This would make the code more robust against future changes to CreateRateLimitedBackupPolicy or the validators.
        _options.backup_request_policy = CreateRateLimitedBackupPolicy(
            _options.backup_request_ms, max_ratio,
            FLAGS_backup_request_ratio_window_size_s,
            FLAGS_backup_request_ratio_update_interval_s);
        _owns_backup_policy = true;

test/brpc_channel_unittest.cpp:2868

  • The gflag FLAGS_backup_request_max_ratio is saved and restored but never actually modified in this test. The test uses opt.backup_request_max_ratio = 0.3 (per-channel config) instead of the global gflag. The save/restore of saved_ratio at lines 2862 and 2868 is unnecessary and could be removed for clarity.
        const double saved_ratio = brpc::FLAGS_backup_request_max_ratio;
        brpc::FLAGS_backup_request_ratio_update_interval_s = 1;
        brpc::FLAGS_backup_request_ratio_window_size_s = 2;
        BRPC_SCOPE_EXIT {
            brpc::FLAGS_backup_request_ratio_update_interval_s = saved_interval;
            brpc::FLAGS_backup_request_ratio_window_size_s = saved_window;
            brpc::FLAGS_backup_request_max_ratio = saved_ratio;

src/brpc/backup_request_policy.h:48

  • The documentation states "The caller owns the returned pointer" but this is inconsistent with actual usage in channel.cpp where the Channel owns the policy when created via backup_request_max_ratio. Consider clarifying: "The caller owns the returned pointer and must delete it when done. However, when Channel creates this policy internally via backup_request_max_ratio, Channel manages the lifetime automatically."
// The caller owns the returned pointer.

test/brpc_channel_unittest.cpp:2853

  • Missing test coverage for backup_request_max_ratio = 0 (explicitly disabled). While -1 means "use gflag default", 0 should explicitly disable rate limiting even if the gflag is set to enable it. Consider adding a test case that sets the gflag to a positive value, then creates a channel with opt.backup_request_max_ratio = 0, and verifies that no policy is created.
    // Test 4: No policy when backup_request_ms < 0 (backup disabled)
    {
        brpc::ChannelOptions opt;
        opt.backup_request_ms = -1;
        opt.backup_request_max_ratio = 0.3;
        brpc::Channel channel;
        ASSERT_EQ(0, channel.Init(_ep, &opt));
        ASSERT_TRUE(channel.options().backup_request_policy == NULL);
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y requested a review from Copilot February 25, 2026 12:18
@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from e614e33 to 27fcaee Compare February 25, 2026 12:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (3)

test/brpc_channel_unittest.cpp:2831

  • Missing test coverage: The test doesn't cover the edge case where backup_request_max_ratio = 0. According to the PR description, 0 should explicitly disable rate limiting. A test case similar to Test 2 but with max_ratio = 0 instead of -1 would verify this behavior and ensure both forms of disabling work correctly.
    // Test 2: No policy when ratio is disabled (-1)
    {
        brpc::ChannelOptions opt;
        opt.backup_request_ms = 10;
        opt.backup_request_max_ratio = -1;
        brpc::Channel channel;
        ASSERT_EQ(0, channel.Init(_ep, &opt));
        ASSERT_TRUE(channel.options().backup_request_policy == NULL);
    }

test/brpc_channel_unittest.cpp:2831

  • Missing test coverage: The test doesn't cover the special case where backup_request_max_ratio = 1.0. According to the implementation (backup_request_policy.cpp:106-107), values >= 1.0 are treated as "no limit" since the ratio cannot exceed 1.0. A test should verify that setting max_ratio to 1.0 allows all backup requests, similar to having no rate limiting.
    // Test 2: No policy when ratio is disabled (-1)
    {
        brpc::ChannelOptions opt;
        opt.backup_request_ms = 10;
        opt.backup_request_max_ratio = -1;
        brpc::Channel channel;
        ASSERT_EQ(0, channel.Init(_ep, &opt));
        ASSERT_TRUE(channel.options().backup_request_policy == NULL);
    }

test/brpc_channel_unittest.cpp:2831

  • Missing test coverage: The test doesn't cover the case where backup_request_max_ratio > 1.0. According to channel.cpp:273-276, values > 1.0 should be clamped to 1.0 with a warning. A test should verify this clamping behavior works correctly and that the policy is still created with the clamped value.
    // Test 2: No policy when ratio is disabled (-1)
    {
        brpc::ChannelOptions opt;
        opt.backup_request_ms = 10;
        opt.backup_request_max_ratio = -1;
        brpc::Channel channel;
        ASSERT_EQ(0, channel.Init(_ep, &opt));
        ASSERT_TRUE(channel.options().backup_request_policy == NULL);
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from 27fcaee to abe44a3 Compare February 25, 2026 12:28
@feng-y feng-y requested a review from Copilot February 25, 2026 12:32
@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch 2 times, most recently from d9a2e0d to efd2f73 Compare February 25, 2026 12:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from efd2f73 to 95af5c8 Compare February 25, 2026 12:47
@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from 95af5c8 to 2c97312 Compare February 25, 2026 12:58
@feng-y feng-y requested a review from Copilot February 25, 2026 13:07
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from 2c97312 to 00aa7f1 Compare February 25, 2026 13:25
@feng-y feng-y requested a review from Copilot February 25, 2026 13:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 124 to 127
bvar::Adder<int64_t> _total_count;
bvar::Adder<int64_t> _backup_count;
bvar::Window<bvar::Adder<int64_t>> _total_window;
bvar::Window<bvar::Adder<int64_t>> _backup_window;
Copy link

Copilot AI Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each BackupRateLimiter instance constructs two bvar::Adder reducers wrapped by bvar::Window, which schedules two sampler tasks that run periodically in the global sampler thread. With many channels/policies this adds O(#policies) per-second sampling overhead. If high channel counts are expected, consider documenting this cost and/or using a lighter-weight sliding-window counter implementation without per-instance samplers.

Copilot uses AI. Check for mistakes.
@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from 00aa7f1 to 56fa347 Compare February 25, 2026 13:58
…3228)

Add ratio-based rate limiting for backup requests to prevent backup
request storms under high QPS or downstream latency spikes.

Design:
- BackupRequestPolicy interface unchanged (ABI stable)
- BackupRateLimiter: standalone statistics module tracking backup/total
  ratio within a sliding time window using bvar counters
- RateLimitedBackupPolicy: internal implementation composing
  BackupRateLimiter, hidden in .cpp
- CreateRateLimitedBackupPolicy() factory function in header
- ChannelOptions.backup_request_max_ratio: per-channel configuration
  Priority: backup_request_policy > backup_request_max_ratio > backup_request_ms
- Channel auto-creates internal policy when max_ratio > 0 and no user
  policy; uses max_ratio > 0 as ownership marker for cleanup
- 3 gflags with validators: backup_request_max_ratio,
  backup_request_ratio_window_size_s, backup_request_ratio_update_interval_s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@feng-y feng-y force-pushed the feat/rate-limited-backup-request branch from 56fa347 to 10ed5b0 Compare February 25, 2026 14:23
@feng-y feng-y requested a review from Copilot February 25, 2026 14:31
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/brpc/backup_request_policy.cpp:90

  • Member initialization order in the constructor doesn't match the declaration order in the class. According to C++ standards, members are initialized in the order they are declared in the class, not the order in the initializer list. Currently, the constructor initializer list has _total_window and _backup_window being initialized before _total_count and _backup_count (which they reference), but in the class declaration (lines 127-136), _total_count and _backup_count are declared before the window members. This works correctly because the declaration order is correct, but the initializer list order should match to avoid confusion and potential warnings from some compilers with -Wreorder.
    BackupRateLimiter(double max_backup_ratio,
                      int window_size_seconds,
                      int update_interval_seconds)
        : _max_backup_ratio(max_backup_ratio)
        , _update_interval_us(update_interval_seconds * 1000000LL)
        , _total_window(&_total_count, window_size_seconds)
        , _backup_window(&_backup_count, window_size_seconds)
        , _cached_ratio(0.0)
        , _last_update_us(0) {
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] 为 backup request 增加比例限制机制

2 participants