feat: meter Gemini thinking tokens and grounding requests by Aaryan-Dadu · Pull Request #3178 · HeyPuter/puter

Aaryan-Dadu · 2026-05-28T09:53:11Z

Summary

- Thinking tokens: Split out of the standard completion tokens so that each part is billed at its own model-specific rate.
- Grounding requests: Added flat-fee metering for Google Search by tracking grounding_metadata across both streaming and non-streaming responses.
- Pricing updates: Corrected stale rates for Gemini 2.5 Flash output, cached tokens, thinking tokens, and grounding requests.

Test

- All pre-existing tests pass.
- Added 5 unit tests covering these changes.
- Updated 4 pre-existing test assertions to include thinking_tokens: 0 and grounding_requests: 0 in the expected usage shapes.

CLAassistant · 2026-05-28T09:53:20Z

All committers have signed the CLA.

ProgrammerIn-wonderland · 2026-05-29T18:48:34Z

are thinking tokens not already included in output tokens in the usage object?

Aaryan-Dadu · 2026-05-29T19:02:54Z

> are thinking tokens not already included in output tokens in the usage object?

Yes, they are already included, but we split them out because they are billed at different rates, like this: thinking_rate * thinking_tokens + standard_rate * (completion_tokens - thinking_tokens)
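The formula above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names `splitCompletionTokens` and `outputCost`, the interfaces, and the sample rates are all hypothetical, and the clamping step is an added assumption to keep a malformed usage object from producing negative counts.

```typescript
// Hypothetical per-token rates in the provider's cost units (illustrative only).
interface TokenRates {
    completion_tokens: number;
    thinking_tokens: number;
}

// Usage after the split: completion_tokens no longer contains thinking tokens.
interface SplitUsage {
    completion_tokens: number;
    thinking_tokens: number;
}

// The reported completion total already includes thinking tokens,
// so subtract them to avoid billing the same tokens twice.
function splitCompletionTokens(completionTotal: number, reasoningTokens: number): SplitUsage {
    // Assumption: clamp to [0, completionTotal] so neither count goes negative.
    const thinking = Math.min(Math.max(reasoningTokens, 0), completionTotal);
    return {
        completion_tokens: completionTotal - thinking,
        thinking_tokens: thinking,
    };
}

// thinking_rate * thinking_tokens + standard_rate * (completion_tokens - thinking_tokens)
function outputCost(usage: SplitUsage, rates: TokenRates): number {
    return rates.thinking_tokens * usage.thinking_tokens
        + rates.completion_tokens * usage.completion_tokens;
}

const usage = splitCompletionTokens(1000, 400);
const cost = outputCost(usage, { completion_tokens: 250, thinking_tokens: 300 });
console.log(usage, cost);
```

With 1000 reported completion tokens of which 400 are thinking tokens, 600 are billed at the standard rate and 400 at the thinking rate.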

Copilot

Pull request overview

This PR updates Gemini metering in the AI chat driver to correctly account for Gemini “thinking” tokens (billed at a distinct rate) and to add flat-fee metering for grounded Google Search requests by detecting grounding_metadata in both streaming and non-streaming responses.

Changes:

Split Gemini reasoning_tokens (“thinking tokens”) out of completion_tokens and meter each at its own model-specific rate.
Add grounding_requests usage metering (1 per response when grounding_metadata is present) for streaming and non-streaming Gemini completions.
Update Gemini model pricing entries to include thinking_tokens and grounding_requests rates (and refresh some existing token rates).
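
The "1 per response" grounding rule could look roughly like the sketch below. It is only an illustration under assumptions: `countGroundingRequests` is a hypothetical helper, and the `ExtraContent` shape places `grounding_metadata` directly on `extra_content`, whereas the real nesting in the Gemini response is not shown in this review.

```typescript
// Assumed shape: grounding metadata sits directly on extra_content.
// The actual nesting inside extra_content may differ.
interface ExtraContent {
    grounding_metadata?: unknown;
}

interface Choice {
    message?: { extra_content?: ExtraContent };
}

// Returns 1 when the response was grounded, 0 otherwise.
// Non-streaming: metadata is read from choices[0].message.extra_content.
// Streaming: metadata is read from the extra_content accumulated by the stream handler.
function countGroundingRequests(
    choices: Choice[] | undefined,
    streamedExtra?: ExtraContent,
): number {
    const nonStream = choices?.[0]?.message?.extra_content?.grounding_metadata;
    const stream = streamedExtra?.grounding_metadata;
    // Flat fee: a grounded response counts once, however much metadata it carries.
    return nonStream != null || stream != null ? 1 : 0;
}

console.log(countGroundingRequests([{ message: { extra_content: { grounding_metadata: {} } } }]));
console.log(countGroundingRequests(undefined, { grounding_metadata: {} }));
console.log(countGroundingRequests([{ message: {} }]));
```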

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `src/backend/drivers/ai-chat/utils/OpenAIUtil.js` | Captures streamed `extra_content` and forwards it into the usage calculator for provider-specific metering. |
| `src/backend/drivers/ai-chat/providers/gemini/models.ts` | Adds/updates Gemini cost keys for `thinking_tokens` and `grounding_requests` (and adjusts some stale token rates). |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.ts` | Implements Gemini-specific usage shaping: cached token exclusion, thinking token split, and grounding request detection. |
| `src/backend/drivers/ai-chat/providers/gemini/GeminiChatProvider.test.ts` | Updates expected usage shapes and adds unit tests for thinking-token and grounding-request metering (streaming + non-streaming). |

+                // Cast to access Gemini-specific extras passed alongside usage:
+                // - choices: non-stream grounding metadata lives in choices[0].message.extra_content
+                // - extra_content: streaming grounding metadata accumulated by the stream handler
+                const { usage, choices, extra_content } = args as {


@@ -285,6 +286,7 @@ export const create_chat_stream_handler =
                // Apps have to choose to handle extra_content themselves, it doesn't seem like theres a way we can do it in a backwards
                // compatible fashion since most streaming apps will handle chat history by continuously updating content themselves
                // This doesn't present us a chance to add in an extra object for gemini's chat continuing features
+                last_extra_content = choice.delta.extra_content;
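
A plain assignment like the one above keeps whatever chunk arrived last, so a later `extra_content` chunk without `grounding_metadata` would replace an earlier one that carried it. One possible guard is sketched below; `keepGroundedExtra` is a hypothetical helper, and the `ExtraContent` shape (with `grounding_metadata` as a top-level key) is an assumption rather than the handler's actual types.

```typescript
// Assumed shape: grounding_metadata is a top-level key on extra_content.
interface ExtraContent {
    grounding_metadata?: unknown;
    [key: string]: unknown;
}

// Decide which extra_content to keep as stream deltas arrive.
// An earlier chunk that carried grounding_metadata is never replaced
// by a later chunk that lacks it.
function keepGroundedExtra(
    previous: ExtraContent | undefined,
    incoming: ExtraContent | undefined,
): ExtraContent | undefined {
    if (incoming == null) return previous;
    const prevGrounded = previous?.grounding_metadata != null;
    const nextGrounded = incoming.grounding_metadata != null;
    if (prevGrounded && !nextGrounded) return previous;
    return incoming;
}

// Simulated stream: a grounded chunk, an ungrounded chunk, then a delta with no extras.
const deltas: (ExtraContent | undefined)[] = [
    { grounding_metadata: { queries: ["puter"] } },
    { other: true },
    undefined,
];

let lastExtraContent: ExtraContent | undefined;
for (const delta of deltas) {
    lastExtraContent = keepGroundedExtra(lastExtraContent, delta);
}
console.log(lastExtraContent);
```

In this run the grounded chunk survives to the end of the stream, so the response would still be metered as one grounding request.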


Salazareo · 2026-06-10T00:04:58Z

@ProgrammerIn-wonderland is this mergeable?

Salazareo

code itself looks ok, but the prices are a bit off; I'm gonna fix those and merge this

thanks for the contribution!

Salazareo · 2026-06-20T01:28:17Z

            prompt_tokens: 30,
-            completion_tokens: 250,
-            cached_tokens: 3,
+            completion_tokens: 100,


these look off; pretty sure they're still 250

https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-flash

Salazareo · 2026-06-20T01:30:24Z

            completion_tokens: 300,
+            thinking_tokens: 300,
            cached_tokens: 5,
+            grounding_requests: 1_400_000,


these are cheaper on some of these models

Pricing corrections (verified against ai.google.dev/gemini-api/docs/pricing):
- gemini-2.5-flash output is $2.50/M, not $1.00/M: restore completion_tokens to 250 and bill thinking_tokens at the same output rate (250). The previous 100 under-billed output ~60%, and this is the provider's default model.
- gemini-2.5-flash cache read is $0.03/M: restore cached_tokens to 3 (the 7.5 value over-billed).
- Grounding with Google Search is $35 / 1,000 requests for Gemini 2.x models and $14 / 1,000 for 3.x. Set grounding_requests to 3_500_000 for gemini-2.0-flash, gemini-2.5-flash, gemini-2.5-flash-lite and gemini-2.5-pro; 3.x models keep 1_400_000.

Streaming robustness:
- In create_chat_stream_handler, don't let a later extra_content chunk without grounding_metadata overwrite an earlier one that carried it, so grounding requests are still metered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
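
The figures in this commit message are consistent with one cost unit being 1e-8 USD: $2.50 per million tokens maps to 250 per token, and $35 per 1,000 requests maps to 3_500_000 per request. That unit is inferred from the numbers here, not stated in the PR, and the helpers below (`perTokenUnits`, `perRequestUnits`) are hypothetical names for a quick sanity check of the arithmetic.

```typescript
// Assumption: one cost unit is 1e-8 USD, inferred from 250 <-> $2.50 per 1M tokens.
const UNITS_PER_USD = 100_000_000;

// Strip floating-point noise while keeping fractional rates such as 7.5.
const clean = (n: number): number => Number(n.toFixed(6));

// Published price in USD per 1,000,000 tokens -> cost units per token.
function perTokenUnits(usdPerMillionTokens: number): number {
    return clean((usdPerMillionTokens * UNITS_PER_USD) / 1_000_000);
}

// Published price in USD per 1,000 requests -> cost units per request.
function perRequestUnits(usdPerThousandRequests: number): number {
    return clean((usdPerThousandRequests * UNITS_PER_USD) / 1_000);
}

console.log(perTokenUnits(2.5));   // gemini-2.5-flash output
console.log(perTokenUnits(0.03));  // gemini-2.5-flash cache read
console.log(perRequestUnits(35));  // grounding, Gemini 2.x
console.log(perRequestUnits(14));  // grounding, Gemini 3.x

// Share of output revenue lost when 100 is charged instead of 250.
const underBilled = 1 - 100 / 250;
console.log(underBilled);
```

Under this assumed unit, the four conversions give 250, 3, 3_500_000 and 1_400_000, matching the values named above, and the under-billing ratio comes out to 0.6.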

Aaryan-Dadu mentioned this pull request May 28, 2026

Investigate & possible fix metering for gemini models search and caching #3132

Closed

ProgrammerIn-wonderland self-assigned this May 28, 2026

Salazareo requested a review from Copilot June 7, 2026 01:33

Copilot started reviewing on behalf of Salazareo June 7, 2026 01:33

Copilot AI reviewed Jun 7, 2026

Salazareo approved these changes Jun 20, 2026

Salazareo merged commit c6d6402 into HeyPuter:main Jun 20, 2026
4 checks passed

Salazareo merged 2 commits into HeyPuter:main from Aaryan-Dadu:feat/3132

5 participants
