Skip to content

Feat:2395 Per-key rate-limit overrides: configure distinct rate, burst, and period for specific keys/subgraphs; applied at runtime.#2409

Open
rajendersaini wants to merge 9 commits intowundergraph:mainfrom
rajendersaini:feat_2395_rate_limiter_override_local_test
Open

Feat:2395 Per-key rate-limit overrides: configure distinct rate, burst, and period for specific keys/subgraphs; applied at runtime.#2409
rajendersaini wants to merge 9 commits intowundergraph:mainfrom
rajendersaini:feat_2395_rate_limiter_override_local_test

Conversation

@rajendersaini
Copy link

@rajendersaini rajendersaini commented Dec 11, 2025

Here's the trimmed-down version:


Rate Limiter: Per-Key Overrides for simple_strategy

Problem

The existing rate limiter applies a single global limit to all traffic. This is too coarse for API monetisation use cases where different clients need different limits based on their subscription tier.

Example: A SaaS platform with free (5 req/10s), pro (60 req/min), and enterprise (1000 req/min) tiers had no way to express this before this change.


Solution

Adds an overrides map to simple_strategy. Each entry maps a fully-qualified rate-limit key to a custom rate, burst, and period. The router resolves the key per-request via key_suffix_expression, looks it up in the overrides map, and applies the override if found — otherwise the global default is used.


Configuration Example

rate_limit:
  enabled: true
  key_suffix_expression: "request.auth.claims.client_id != nil ? 'id_' + request.auth.claims.client_id : ''"
  storage:
    urls:
      - 'redis://localhost:6379/0'
    key_prefix: 'cosmo_rate_limit'
  simple_strategy:
    rate: 5        # default: free tier
    burst: 5
    period: 10s
    reject_exceeding_requests: true
    reject_status_code: 429
    overrides:
      'cosmo_rate_limit:id_pro_client':
        rate: 60
        burst: 60
        period: 1m
      'cosmo_rate_limit:id_enterprise_client':
        rate: 1000
        burst: 1000
        period: 1m

A JWT with "client_id": "pro_client" gets 60 req/min; everyone else gets the global default — no code changes, purely config-driven.


Design Decisions & Trade-offs

Decision Rationale
Exact-key match only Simple and predictable; avoids ambiguity of wildcard matching
Key format mirrors Redis key Easy to audit via redis-cli; no separate translation step
Overrides compiled at startup Conversion happens once, keeping the hot path allocation-free
No dynamic reload Requires a router restart; dynamic propagation can be added later

Testing

Scenario Config Expected
Paying client 1 req / 10s override Req 1 → 200, Req 2 → 429
Pro client 2 req / min override Req 1-2 → 200, Req 3 → 429
Free user Global 5 req / 10s Req 1-5 → 200, Req 6 → 429

Unit tests in ratelimiter_test.go cover exact-match override, fallback to global limit, and JWT claim extraction. No behavioural change when overrides is omitted.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

Walkthrough

Adds per-key rate-limit overrides: new config types and schema, conversion helper in graph server, override storage and lookup in the rate limiter (including new interface and generateKey return), and tests updated to exercise suffix and override behavior.

Changes

Cohort / File(s) Summary
Configuration
router/pkg/config/config.go, router/pkg/config/config.schema.json
Added RateLimitSimpleOverride and Overrides map[string]RateLimitSimpleOverride to RateLimitSimpleStrategy; schema extended with overrides property requiring rate, burst, and period (duration ≥ 1s).
Rate limiter core
router/core/ratelimiter.go
Introduced public RateLimitOverride and internal redisLimiter interface; CosmoRateLimiter now uses redisLimiter, stores keyOverrides map[string]redis_rate.Limit, and CosmoRateLimiterOptions gained Overrides. generateKey now returns (key, suffix, error); added limitFor to pick per-key overrides or default limits; updated limiter construction to apply overrides.
Rate limiter tests
router/core/ratelimiter_test.go
Updated tests for new generateKey signature (capture suffix) and added/adjusted cases validating suffix extraction, override lookups, default fallback, and end-to-end override behavior (including Redis-backed scenarios).
Graph server integration
router/core/graph_server.go
Added rateLimitOverridesFromConfig helper to convert config RateLimitSimpleOverride entries to runtime RateLimitOverride/redis_rate.Limit and wired overrides into Cosmo rate limiter construction (Redis path).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to: generateKey signature change propagation and error-return shape (ensure callers accept the extra suffix).
  • Verify correctness of override key trimming/mapping when converting config to runtime limits (rateLimitOverridesFromConfig).
  • Review limitFor logic and interactions between default limits and per-key overrides (burst/period handling).
  • Inspect tests for coverage of override/no-override and error paths.
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding per-key rate-limit overrides with configurable rate, burst, and period for specific keys/subgraphs applied at runtime, which aligns with the changeset across all modified files.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

You can customize the high-level summary generated by CodeRabbit.

Configure the reviews.high_level_summary_instructions setting to provide custom instructions for generating the high-level summary.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a53bb35 and 1b08158.

📒 Files selected for processing (5)
  • router/core/graph_server.go (3 hunks)
  • router/core/ratelimiter.go (4 hunks)
  • router/core/ratelimiter_test.go (2 hunks)
  • router/pkg/config/config.go (2 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/pkg/config/config.go
  • router/core/graph_server.go
  • router/pkg/config/config.schema.json
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router/pkg/config/config.go
  • router/core/graph_server.go
📚 Learning: 2025-11-12T18:38:46.619Z
Learnt from: StarpTech
Repo: wundergraph/cosmo PR: 2327
File: graphqlmetrics/core/metrics_service.go:435-442
Timestamp: 2025-11-12T18:38:46.619Z
Learning: In graphqlmetrics/core/metrics_service.go, the opGuardCache TTL (30 days) is intentionally shorter than the gql_metrics_operations database retention period (90 days). This design reduces memory usage while relying on database constraints to prevent duplicate inserts after cache expiration.

Applied to files:

  • router/core/graph_server.go
📚 Learning: 2025-07-21T15:06:36.664Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/config/config.schema.json:1637-1644
Timestamp: 2025-07-21T15:06:36.664Z
Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-07-21T14:46:34.879Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/authentication/jwks_token_decoder.go:80-106
Timestamp: 2025-07-21T14:46:34.879Z
Learning: In the Cosmo router project, required field validation for JWKS configuration (Secret, Algorithm, KeyId) is handled at the JSON schema level in config.schema.json rather than through runtime validation in the Go code at router/pkg/authentication/jwks_token_decoder.go.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-06-30T20:39:02.387Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.387Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router/core/ratelimiter_test.go
📚 Learning: 2025-08-26T17:19:28.178Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.178Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.

Applied to files:

  • router/core/ratelimiter_test.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.23+ minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/ratelimiter.go
📚 Learning: 2025-09-24T12:54:00.765Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2222
File: router-tests/websocket_test.go:2238-2302
Timestamp: 2025-09-24T12:54:00.765Z
Learning: The wundergraph/cosmo project uses Go 1.25 (Go 1.25 minimum), so fmt.Appendf and other newer Go standard library functions are available and can be used without compatibility concerns.

Applied to files:

  • router/core/ratelimiter.go
🧬 Code graph analysis (3)
router/core/graph_server.go (4)
router/pkg/config/config.go (1)
  • RateLimitSimpleOverride (566-570)
router/core/ratelimiter.go (1)
  • RateLimitOverride (89-93)
router/pkg/logging/logging.go (1)
  • BufferedLogger (129-132)
router/internal/graphqlmetrics/prometheus_exporter.go (1)
  • PrometheusMetricsExporter (14-16)
router/core/ratelimiter_test.go (1)
router/core/ratelimiter.go (3)
  • NewCosmoRateLimiter (38-72)
  • CosmoRateLimiterOptions (27-36)
  • RateLimitOverride (89-93)
router/core/ratelimiter.go (3)
router/internal/expr/expr.go (1)
  • Context (35-39)
router/internal/expr/resolvers.go (1)
  • ResolveStringExpression (22-34)
demo/pkg/subgraphs/family/subgraph/employees.go (1)
  • String (8-10)
🔇 Additional comments (14)
router/core/graph_server.go (3)

1422-1422: Wire-up looks correct.

The overrides are properly converted and passed to the rate limiter options during initialization.


537-542: LGTM! Formatting changes only.

The alignment changes to the graphMux struct fields have no semantic impact.


117-130: LGTM! Clean conversion function with appropriate nil handling.

The function correctly converts config-level overrides to runtime overrides with proper map pre-allocation and nil return for empty inputs. The configuration schema validates all override fields—rate and burst must be integers ≥ 1, and period must be a string (parsed to time.Duration), with all three fields required—making the direct field copy safe.

router/pkg/config/config.go (3)

556-564: LGTM! Clean addition of override support.

The Overrides field properly extends RateLimitSimpleStrategy to support per-key rate limit configuration with appropriate YAML/JSON tags.


362-382: LGTM! Formatting changes only.

The alignment adjustments to EngineDebugConfiguration struct tags improve readability with no functional impact.


566-570: LGTM! Struct definition is clean and appropriate.

The RateLimitSimpleOverride struct correctly defines the per-key override configuration with appropriate field types. Configuration schema validation in config.schema.json enforces constraints: rate and burst require minimum values of 1, and period is a required string field, preventing invalid override values at the configuration level.

router/core/ratelimiter.go (4)

38-72: LGTM! Constructor properly handles override initialization.

The override map is correctly trimmed of whitespace keys, converted to redis_rate.Limit values, and stored in the limiter. Empty keys are properly skipped.


85-93: LGTM! Interface and override type are well-designed.

The redisLimiter interface enables clean unit testing with fakes, and RateLimitOverride mirrors the redis_rate.Limit structure appropriately.


119-136: LGTM! Key generation correctly returns both key and suffix.

The updated signature enables tests to verify suffix extraction independently. The suffix represents the raw expression result before being appended to the base key.


138-148: LGTM! Override lookup with fallback is correctly implemented.

The method properly checks the override map first and falls back to the context's default rate limit options.

router/core/ratelimiter_test.go (4)

43-157: LGTM! Existing tests properly updated for new signature.

All test cases correctly capture and assert both the key and suffix values from the updated generateKey return signature. Coverage includes header-derived, claim-derived, and fallback scenarios.


160-248: LGTM! Override tests provide good coverage.

The tests correctly verify:

  • Override is applied when full key matches (including prefix)
  • Fallback to defaults when override key doesn't match
  • Claim-based suffix extraction with overrides

250-282: LGTM! End-to-end test validates override behavior with Redis.

The test correctly verifies that:

  • Override limits (Rate: 1, Burst: 1) take precedence over defaults (Rate: 5, Burst: 5)
  • First request succeeds (uses burst allowance)
  • Second request is rate-limited

Using miniredis.RunT(t) ensures proper cleanup and test isolation.


284-297: LGTM! Clean test double implementation.

The fakeLimiter properly implements the redisLimiter interface and captures all parameters for test assertions.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/pkg/config/config.schema.json (1)

1988-1993: Add description to the override period field for consistency.

The period field in the override object (lines 1988-1993) is missing the description present in the parent simple_strategy.period (lines 1954-1958). Including this description would improve user clarity about the expected format and supported units.

Apply this diff to add the description:

                  "period": {
                    "type": "string",
+                   "description": "The period of time over which the rate limit is enforced. The period is specified as a string with a number and a unit, e.g. 10ms, 1s, 1m, 1h. The supported units are 'ms', 's', 'm', 'h'.",
                    "duration": {
                      "minimum": "1s"
                    }
                  }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b08158 and cc81d94.

📒 Files selected for processing (1)
  • router/pkg/config/config.schema.json (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-07-21T15:06:36.664Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/config/config.schema.json:1637-1644
Timestamp: 2025-07-21T15:06:36.664Z
Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-07-21T14:46:34.879Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2067
File: router/pkg/authentication/jwks_token_decoder.go:80-106
Timestamp: 2025-07-21T14:46:34.879Z
Learning: In the Cosmo router project, required field validation for JWKS configuration (Secret, Algorithm, KeyId) is handled at the JSON schema level in config.schema.json rather than through runtime validation in the Go code at router/pkg/authentication/jwks_token_decoder.go.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-06-30T20:39:02.387Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.387Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-07-30T09:29:46.660Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2090
File: router/pkg/config/config.schema.json:0-0
Timestamp: 2025-07-30T09:29:46.660Z
Learning: The "operation_name_trim_limit" configuration property in router/pkg/config/config.schema.json should be placed at the security level as a sibling to complexity_limits, not inside the complexity_limits object.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router/pkg/config/config.schema.json

@rajendersaini
Copy link
Author

Example config
local-rate-limit.config.yaml

@rajendersaini
Copy link
Author

End to end test file
test-rate-limit.sh

@rajendersaini rajendersaini changed the title Feat:2395 rate limiter override local test Feat:2395 Per-key rate-limit overrides: configure distinct rate, burst, and period for specific keys/subgraphs; applied at runtime. Dec 12, 2025
@rajendersaini
Copy link
Author

wundergraph/cosmo-docs#211 - docs

@github-actions
Copy link

github-actions bot commented Jan 1, 2026

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 1, 2026
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jan 16, 2026
@StarpTech StarpTech reopened this Feb 8, 2026
@github-actions github-actions bot removed the Stale label Feb 9, 2026
@rajbayer
Copy link

@StarpTech we are looking fwd for this MR to get approved.

@endigma
Copy link
Member

endigma commented Feb 18, 2026

Hi @rajbayer, is there an issue you can link for the problem this is solving?

I'd encourage you to read through our CONTRIBUTING.md as well.

In particular, we encourage contributors to create an issue to discuss their proposed changes ahead of time, or in some simple cases include a good PR description describing the issue, the proposed solution, alternatives, etc.

Without this, it's difficult to determine the intention of the PR and if the proposed solution is the best way to solve the underlying issue/use case.

@rajbayer
Copy link

rajbayer commented Feb 18, 2026

@endigma - I did open issue for this long ago - #2395. There were also some issues already there for this as well . #1996. But seems like no traction on this issue.

@endigma
Copy link
Member

endigma commented Feb 24, 2026

@rajendersaini I see, perhaps you could expand the PR description with an example of using the new feature and the problems it solves? If we add new features, it's best if we internally have a good understanding of the approach, tradeoffs, considerations that went into merging it, even if it comes from an OSS contributor.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Mar 11, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 15, 2026

CodeRabbit chat interactions are restricted to organization members for this repository. Ask an organization member to interact with CodeRabbit, or set chat.allow_non_org_members: true in your configuration.

@rajendersaini rajendersaini requested a review from SkArchon as a code owner March 15, 2026 17:32
@rajendersaini
Copy link
Author

rajendersaini commented Mar 15, 2026

@endigma Thanks for taking the time to review this. I've updated the PR description with a detailed example, design rationale, and trade-off notes as requested. The corresponding docs update is linked and ready to merge alongside this.

A few points worth highlighting:

  • Zero architectural changes — the override lookup is a single map read bolted onto the existing limitFor path; no new middleware, no new Redis data structures, no behavioural change when overrides is omitted.
  • No impact on existing users — the field is optional (omitempty), so current configs continue to work identically.
  • Small surface area — the entire feature is ~50 lines of production Go plus config/schema additions, with full unit and integration test coverage.

Despite being a small change, per-key overrides unlock a commonly requested capability — tiered rate limiting by client identity — without requiring operators to deploy separate router instances or external proxy rules. Happy to address any further feedback.

- Add overrides map to simple_strategy config for per-key rate/burst/period
- Validate override values (rate, burst, period > 0) at startup
- Add additionalProperties:false to schema override objects
- Revert unrelated whitespace reformatting in config.go and graph_server.go
- Add unit tests for override validation, matching, fallback, and e2e
@github-actions github-actions bot removed the Stale label Mar 16, 2026
@rajendersaini
Copy link
Author

@endigma - did you see the update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants