Feat/ai rate limiting standard headers #13049
Open
iakuf wants to merge 4 commits into apache:master from
added 4 commits
February 26, 2026 22:10
When using openai-compatible provider with Anthropic-format endpoints (e.g. DeepSeek's /anthropic/v1/messages), the response returns input_tokens/output_tokens instead of prompt_tokens/completion_tokens. This patch adds fallback support for both field names in both streaming and non-streaming paths, so token usage statistics work correctly regardless of which format the upstream LLM returns. Fixes token stats being 0 when proxying to Anthropic-compatible endpoints.
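The fallback described in this commit message can be sketched as follows. This is an illustrative Python sketch, not the plugin's Lua code: the function name `extract_token_usage` is hypothetical, and summing the two counts when `total_tokens` is absent is an assumption rather than something the commit states.

```python
from typing import Any, Dict, Mapping


def extract_token_usage(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Normalize a usage object that uses either OpenAI-style or
    Anthropic-style field names (hypothetical helper)."""

    def first_present(*keys: str) -> int:
        # Return the first field that exists and is not None, else 0.
        for key in keys:
            value = usage.get(key)
            if value is not None:
                return int(value)
        return 0

    prompt = first_present("prompt_tokens", "input_tokens")
    completion = first_present("completion_tokens", "output_tokens")

    # Assumption: when the upstream omits total_tokens, use the sum.
    total = usage.get("total_tokens")
    if total is None:
        total = prompt + completion

    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": int(total),
    }
```

Checking the OpenAI-style names first keeps existing behaviour unchanged for upstreams that already return them; the Anthropic-style names are only consulted as a fallback.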
…uter-compatible rate-limit headers
PR: feat(ai-rate-limiting): add `standard_headers` option for OpenAI/OpenRouter-compatible rate-limit headers
Summary
Add a `standard_headers` boolean option to the `ai-rate-limiting` plugin.
When enabled, the plugin emits rate-limit response headers that follow the
OpenAI / OpenRouter convention, allowing IDE extensions (Cursor, Continue, etc.)
to detect quota exhaustion and apply automatic back-off without any custom
client-side configuration.
Issue / Motivation
The current `ai-rate-limiting` plugin outputs headers in the format `X-AI-RateLimit-*-{instance_name}`.
This format is APISIX-specific and not recognized by popular AI IDE extensions
such as Cursor and Continue. These tools look for the OpenAI/OpenRouter
standard headers, of the form `X-RateLimit-*-Tokens`.
Without these headers, IDE extensions cannot detect that they are being
rate-limited and will keep retrying immediately, causing a poor developer
experience and wasting quota.
Changes
`apisix/plugins/ai-rate-limiting.lua`
- Added a `standard_headers` field to the JSON Schema (boolean, default `false`).
- In `transform_limit_conf()`: when `standard_headers` is `true`, the `limit_header`, `remaining_header`, and `reset_header` fields passed to `limit-count` are set to the standard names with a suffix derived from `limit_strategy`:

| `limit_strategy` | Header suffix |
| --- | --- |
| `total_tokens` | `Tokens` |
| `prompt_tokens` | `PromptTokens` |
| `completion_tokens` | `CompletionTokens` |

- When `standard_headers` is `false` (default), the original `X-AI-RateLimit-*-{instance_name}` headers are used, so the change is fully backward compatible.
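The header-name selection described above can be sketched like this. It is a minimal Python illustration, not the Lua in `transform_limit_conf()`: the function name `rate_limit_header_names` is hypothetical, and expanding the `*` in the header patterns to `Limit`, `Remaining`, and `Reset` is an assumption inferred from the three field names.

```python
from typing import Dict, Optional

# Suffix per limit_strategy, as listed in the PR description.
SUFFIX_BY_STRATEGY = {
    "total_tokens": "Tokens",
    "prompt_tokens": "PromptTokens",
    "completion_tokens": "CompletionTokens",
}


def rate_limit_header_names(
    limit_strategy: str,
    standard_headers: bool = False,
    instance_name: Optional[str] = None,
) -> Dict[str, str]:
    """Pick the three header names handed to limit-count (hypothetical helper)."""
    if standard_headers:
        # OpenAI/OpenRouter-style names, suffix derived from the strategy.
        prefix = "X-RateLimit"
        suffix = SUFFIX_BY_STRATEGY[limit_strategy]
    else:
        # Legacy APISIX-specific names, suffixed with the instance name.
        prefix = "X-AI-RateLimit"
        suffix = str(instance_name)

    # Assumption: the "*" segment is Limit / Remaining / Reset.
    return {
        "limit_header": f"{prefix}-Limit-{suffix}",
        "remaining_header": f"{prefix}-Remaining-{suffix}",
        "reset_header": f"{prefix}-Reset-{suffix}",
    }
```

For example, `rate_limit_header_names("prompt_tokens", standard_headers=True)` yields names ending in `-PromptTokens`, while leaving `standard_headers` at its default keeps the instance-scoped legacy names.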
New / updated files
- `apisix/plugins/ai-rate-limiting.lua`
- `t/plugin/ai-rate-limiting-standard-headers.t`
- `docs/en/latest/plugins/ai-rate-limiting.md`
Test Cases
The new test file `t/plugin/ai-rate-limiting-standard-headers.t` covers:
- `standard_headers: true` is accepted by `check_schema`.
- `standard_headers` defaults to `false`.
- `standard_headers: true` returns all three `X-RateLimit-*-Tokens` headers with numeric values.
- `X-RateLimit-Remaining-Tokens: 0`.
- `prompt_tokens` suffix: `limit_strategy: prompt_tokens` produces `X-RateLimit-*-PromptTokens` headers.
- `completion_tokens` suffix: `limit_strategy: completion_tokens` produces `X-RateLimit-*-CompletionTokens` headers.
- `standard_headers: false` still produces the legacy `X-AI-RateLimit-*-{instance_name}` headers.
Running the tests locally
Documentation
See `docs/en/latest/plugins/ai-rate-limiting-standard-headers-patch.md` for the
parameter reference table, configuration example, and sample response headers.
Checklist
- `standard_headers` defaults to `false`
- `CHANGELOG` entry (to be added before merge)
Related