feat(backup_request): add rate-limited backup request policy (#3228)#3229
feat(backup_request): add rate-limited backup request policy (#3228)#3229feng-y wants to merge 1 commit intoapache:masterfrom
Conversation
|
@copilot review this PR |
There was a problem hiding this comment.
Pull request overview
Adds a rate-limiting mechanism for backup requests, aiming to prevent “backup request storms” by suppressing backups when the backup/total ratio exceeds a configurable threshold over a sliding time window.
Changes:
- Introduces
ChannelOptions.backup_request_max_ratioand creates an internal rate-limitedBackupRequestPolicywhen enabled. - Adds global gflags for ratio threshold, window size, and ratio cache update interval, backed by
bvar::Window+ cumulative counters for cold start. - Adds a unit test covering policy creation/precedence plus a timing-based functional check.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/brpc_channel_unittest.cpp | Adds coverage for rate-limited backup request behavior and configuration precedence. |
| src/brpc/channel.h | Adds per-channel configuration and internal ownership for an auto-created backup policy. |
| src/brpc/channel.cpp | Creates and wires an internal rate-limited backup policy during channel initialization. |
| src/brpc/backup_request_policy.h | Refactors BackupRequestPolicy into a concrete base with optional rate limiting support. |
| src/brpc/backup_request_policy.cpp | Implements rate limiter, gflags, and default policy behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/brpc/backup_request_policy.cpp:120
- When compare_exchange_strong fails on line 105, the code continues to use the potentially stale
ratiovalue loaded on line 102, even though another thread may have updated_cached_ratioon line 116. While the comment acknowledges this is "best-effort rate limiting," consider reloading_cached_ratioafter a failed compare_exchange to reduce the window of staleness, especially in high-concurrency scenarios where the stale value could lead to temporarily incorrect rate limiting decisions.
bool DoBackup(const Controller* /*controller*/) const override {
const int64_t now_us = butil::cpuwide_time_us();
int64_t last_us = _last_update_us.load(butil::memory_order_relaxed);
double ratio = _cached_ratio.load(butil::memory_order_relaxed);
if (now_us - last_us >= _update_interval_us) {
if (_last_update_us.compare_exchange_strong(
last_us, now_us, butil::memory_order_relaxed)) {
int64_t total = _total_window.get_value();
int64_t backup = _backup_window.get_value();
// Fall back to cumulative counts when the window has no
// sampled data yet (cold-start within the first few seconds).
if (total <= 0) {
total = _total_count.get_value();
backup = _backup_count.get_value();
}
ratio = (total > 0) ? static_cast<double>(backup) / total : 0.0;
_cached_ratio.store(ratio, butil::memory_order_relaxed);
}
}
return ratio < _max_backup_ratio;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
7b50fdc to
e6db035
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
test/brpc_channel_unittest.cpp:2856
- The comment says “ratio refreshes on every DoBackup call” but with
FLAGS_backup_request_ratio_update_interval_s = 1the ratio cache can only refresh at most once per second. This makes the test rationale misleading when debugging failures; please adjust the comment to match the actual behavior.
// Test 5: Functional — rate limiting reduces backup requests.
// Use short update interval so ratio refreshes on every DoBackup call.
// Save/restore gflags outside the inner scope so cleanup always runs.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4ed6666 to
e614e33
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (4)
src/brpc/channel.cpp:284
- Consider checking if CreateRateLimitedBackupPolicy returns NULL before setting _owns_backup_policy to true. While the current guards (line 278 checks max_ratio > 0, validators ensure gflags are valid) should prevent NULL returns in practice, defensive programming suggests verifying the return value and only setting _owns_backup_policy if the policy was successfully created. This would make the code more robust against future changes to CreateRateLimitedBackupPolicy or the validators.
_options.backup_request_policy = CreateRateLimitedBackupPolicy(
_options.backup_request_ms, max_ratio,
FLAGS_backup_request_ratio_window_size_s,
FLAGS_backup_request_ratio_update_interval_s);
_owns_backup_policy = true;
test/brpc_channel_unittest.cpp:2868
- The gflag FLAGS_backup_request_max_ratio is saved and restored but never actually modified in this test. The test uses opt.backup_request_max_ratio = 0.3 (per-channel config) instead of the global gflag. The save/restore of saved_ratio at lines 2862 and 2868 is unnecessary and could be removed for clarity.
const double saved_ratio = brpc::FLAGS_backup_request_max_ratio;
brpc::FLAGS_backup_request_ratio_update_interval_s = 1;
brpc::FLAGS_backup_request_ratio_window_size_s = 2;
BRPC_SCOPE_EXIT {
brpc::FLAGS_backup_request_ratio_update_interval_s = saved_interval;
brpc::FLAGS_backup_request_ratio_window_size_s = saved_window;
brpc::FLAGS_backup_request_max_ratio = saved_ratio;
src/brpc/backup_request_policy.h:48
- The documentation states "The caller owns the returned pointer" but this is inconsistent with actual usage in channel.cpp where the Channel owns the policy when created via backup_request_max_ratio. Consider clarifying: "The caller owns the returned pointer and must delete it when done. However, when Channel creates this policy internally via backup_request_max_ratio, Channel manages the lifetime automatically."
// The caller owns the returned pointer.
test/brpc_channel_unittest.cpp:2853
- Missing test coverage for backup_request_max_ratio = 0 (explicitly disabled). While -1 means "use gflag default", 0 should explicitly disable rate limiting even if the gflag is set to enable it. Consider adding a test case that sets the gflag to a positive value, then creates a channel with opt.backup_request_max_ratio = 0, and verifies that no policy is created.
// Test 4: No policy when backup_request_ms < 0 (backup disabled)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = -1;
opt.backup_request_max_ratio = 0.3;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e614e33 to
27fcaee
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the edge case where
backup_request_max_ratio = 0. According to the PR description, 0 should explicitly disable rate limiting. A test case similar to Test 2 but withmax_ratio = 0instead of-1would verify this behavior and ensure both forms of disabling work correctly.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the special case where
backup_request_max_ratio = 1.0. According to the implementation (backup_request_policy.cpp:106-107), values >= 1.0 are treated as "no limit" since the ratio cannot exceed 1.0. A test should verify that setting max_ratio to 1.0 allows all backup requests, similar to having no rate limiting.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
test/brpc_channel_unittest.cpp:2831
- Missing test coverage: The test doesn't cover the case where
backup_request_max_ratio > 1.0. According to channel.cpp:273-276, values > 1.0 should be clamped to 1.0 with a warning. A test should verify this clamping behavior works correctly and that the policy is still created with the clamped value.
// Test 2: No policy when ratio is disabled (-1)
{
brpc::ChannelOptions opt;
opt.backup_request_ms = 10;
opt.backup_request_max_ratio = -1;
brpc::Channel channel;
ASSERT_EQ(0, channel.Init(_ep, &opt));
ASSERT_TRUE(channel.options().backup_request_policy == NULL);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
27fcaee to
abe44a3
Compare
d9a2e0d to
efd2f73
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
efd2f73 to
95af5c8
Compare
95af5c8 to
2c97312
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2c97312 to
00aa7f1
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/brpc/backup_request_policy.cpp
Outdated
| bvar::Adder<int64_t> _total_count; | ||
| bvar::Adder<int64_t> _backup_count; | ||
| bvar::Window<bvar::Adder<int64_t>> _total_window; | ||
| bvar::Window<bvar::Adder<int64_t>> _backup_window; |
There was a problem hiding this comment.
Each BackupRateLimiter instance constructs two bvar::Adder reducers wrapped by bvar::Window, which schedules two sampler tasks that run periodically in the global sampler thread. With many channels/policies this adds O(#policies) per-second sampling overhead. If high channel counts are expected, consider documenting this cost and/or using a lighter-weight sliding-window counter implementation without per-instance samplers.
00aa7f1 to
56fa347
Compare
…3228) Add ratio-based rate limiting for backup requests to prevent backup request storms under high QPS or downstream latency spikes. Design: - BackupRequestPolicy interface unchanged (ABI stable) - BackupRateLimiter: standalone statistics module tracking backup/total ratio within a sliding time window using bvar counters - RateLimitedBackupPolicy: internal implementation composing BackupRateLimiter, hidden in .cpp - CreateRateLimitedBackupPolicy() factory function in header - ChannelOptions.backup_request_max_ratio: per-channel configuration Priority: backup_request_policy > backup_request_max_ratio > backup_request_ms - Channel auto-creates internal policy when max_ratio > 0 and no user policy; uses max_ratio > 0 as ownership marker for cleanup - 3 gflags with validators: backup_request_max_ratio, backup_request_ratio_window_size_s, backup_request_ratio_update_interval_s Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
56fa347 to
10ed5b0
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/brpc/backup_request_policy.cpp:90
- Member initialization order in the constructor doesn't match the declaration order in the class. According to C++ standards, members are initialized in the order they are declared in the class, not the order in the initializer list. Currently, the constructor initializer list has _total_window and _backup_window being initialized before _total_count and _backup_count (which they reference), but in the class declaration (lines 127-136), _total_count and _backup_count are declared before the window members. This works correctly because the declaration order is correct, but the initializer list order should match to avoid confusion and potential warnings from some compilers with -Wreorder.
BackupRateLimiter(double max_backup_ratio,
int window_size_seconds,
int update_interval_seconds)
: _max_backup_ratio(max_backup_ratio)
, _update_interval_us(update_interval_seconds * 1000000LL)
, _total_window(&_total_count, window_size_seconds)
, _backup_window(&_backup_count, window_size_seconds)
, _cached_ratio(0.0)
, _last_update_us(0) {
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Add ratio-based rate limiting for backup requests to prevent backup request storms under high QPS or downstream latency spikes.
Design
BackupRequestPolicyinterface unchanged (ABI stable)BackupRateLimiter: standalone statistics module tracking backup/total ratio within a sliding time window usingbvar::Window<bvar::Adder>+ relaxed atomicsRateLimitedBackupPolicy: internal implementation composingBackupRateLimiter, hidden in.cppCreateRateLimitedBackupPolicy(): public factory function in header (validates inputs, returns NULL on error)ChannelOptions.backup_request_max_ratio: per-channel configurationbackup_request_policy>backup_request_max_ratio>backup_request_ms-1(default): use global gflagFLAGS_backup_request_max_ratio0: explicitly disabled(0, 1]: rate limiting enabled (1.0means no limit)_owns_backup_policybool; user-provided policy is never deletedbackup_request_max_ratio,backup_request_ratio_window_size_s,backup_request_ratio_update_interval_sEdge cases handled
backup_request_policytakes precedence (withLOG(WARNING)ifmax_ratioalso set)max_ratio == 1.0acts as no-limit (special-cased since ratio cannot exceed 1.0)max_ratio > 1.0clamped to 1.0 withLOG(WARNING)max_ratio == 0explicitly disables rate limiting even when global gflag is setLOG(ERROR); Channel checks NULL before setting ownershipCloses #3228
Usage
Test plan
max_ratio > 0max_ratio = -1max_ratio = 0explicitly disables even when global gflag setmax_ratio = 1.0creates policy (allows all backups)max_ratio > 1.0clamped to 1.0, policy still createdbackup_request_ms < 0🤖 Generated with Claude Code