Skip to content

Conversation

@snimu
Copy link
Contributor

@snimu snimu commented Feb 11, 2026

Description

Quick fix for missing reasoning content in GLM 4.7-Flash.

Normalize reasoning from provider-specific fields (reasoning_content, reasoning) into tags in content, and strip them on outbound requests to providers that use separate reasoning fields.

  • New verifiers/utils/reasoning_utils.py with pure utility functions
  • Inbound: extract reasoning from responses, prepend as tags
  • Outbound: strip tags from assistant messages for providers that use reasoning_content/reasoning fields
  • Auto-detect reasoning format from first response
  • Accept reasoning-only responses in validation (no EmptyModelResponseError)

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Note

Medium Risk
Touches core request/response handling and validation for model calls; behavior changes depend on provider response shape and could affect prompts or error handling if detection/stripping misfires.

Overview
Normalizes provider-specific reasoning fields into a consistent <think>...</think> prefix on inbound chat responses, so models that return reasoning_content/reasoning no longer drop reasoning.

Adds reasoning_format handling to Environment: strips <think> blocks from outbound assistant messages for providers that expect separate reasoning fields, auto-detects the format from the first response, and relaxes response validation to accept reasoning-only replies. Includes a new reasoning_utils module and comprehensive unit/integration tests covering extraction, normalization, stripping, and provider message prep.

Written by Cursor Bugbot for commit fcf95fa. This will update automatically on new commits. Configure here.

Normalize reasoning from provider-specific fields (reasoning_content,
reasoning) into <think> tags in content, and strip them on outbound
requests to providers that use separate reasoning fields.

- New verifiers/utils/reasoning_utils.py with pure utility functions
- Inbound: extract reasoning from responses, prepend as <think> tags
- Outbound: strip <think> tags from assistant messages for providers
  that use reasoning_content/reasoning fields
- Auto-detect reasoning format from first response
- Accept reasoning-only responses in validation (no EmptyModelResponseError)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

f"Model returned {len(response.choices)} choices, expected 1"
)
if isinstance(response.choices[0], Choice):
if not (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think tags not stripped for interleaved rollout path

Medium Severity

prepare_messages_for_provider is called in get_model_response_with_messages to strip <think> tags from assistant messages before sending to providers that use separate reasoning fields, but get_model_response_with_tokens (used when interleaved_rollouts=True and trajectory is non-empty) sends prompt directly as messages without any such stripping. In multi-turn interleaved rollouts with providers using reasoning_content/reasoning, assistant messages will still contain <think> tags the provider doesn't expect.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant