Add configurable reasoning preservation for openai completions #268
Conversation
force-pushed from a94e839 to 1be0ff3
CHANGELOG.md (Outdated)

## Unreleased

- (OpenAI Chat) - Configurable reasoning history via `keepHistoryReasoning` (model-level, default: prune)
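For reference, a model-level entry could look roughly like this (a hypothetical EDN sketch; only `keepHistoryReasoning` comes from this PR, the surrounding shape is illustrative):

```clojure
;; Hypothetical sketch of a model-level config entry; the key layout is assumed.
{:models
 {"glm-4.7" {:keepHistoryReasoning true}}} ; preserve reasoning across turns
```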
I wonder if this default for all openai-chat is a good idea, or if we should have it enabled only for the google provider in config.clj. Also, shouldn't this be a flag by provider rather than by model?
Based on what I've found in OpenCode and the OpenAI SDK, it is the default for every model.
- OpenCode transforms `<think>` blocks into messages of type reasoning, and that type is ignored by the OpenAI SDK, so it never reaches the LLM.
- Delta reasoning messages are usually persisted for one turn and then thrown away (as you can see in the GLM docs below, first vs second diagram).

It shouldn't be by provider because you might want to configure it only for, e.g., GLM-4.7 (second diagram here -> https://docs.z.ai/guides/capabilities/thinking-mode).
ok, let's keep an eye on whether that affects any openai-chat model. I still think it's a bit of a dangerous move, but if the OpenAI SDK is really doing that it may be ok.
@ericdallo If you approve I will merge it tomorrow after fixing the integration tests.
force-pushed from 8e3e3f2 to 776c92f
force-pushed from c2d8af8 to 446bbb1
Standard behavior: discard reasoning before last user message (based on OpenAI SDK). Model-level config enables preservation (e.g. GLM-4.7 with preserved thinking).
Changes:
- prune-history: add keep-history-reasoning parameter
- Tests: cover both pruning and preservation
- Docs: add keepHistoryReasoning to model schema
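For illustration, the pruning described in that commit might look roughly like this (a minimal sketch: the message shape, helper names, and exact `prune-history` signature are assumptions, not ECA's actual code):

```clojure
;; Sketch only: assumes message maps like
;; {:role "user", :content [{:type :text, ...} {:type :reasoning, ...}]}.
(defn- reasoning-part? [part]
  (= :reasoning (:type part)))

(defn- last-user-idx [messages]
  (->> (map-indexed vector messages)
       (keep (fn [[i m]] (when (= "user" (:role m)) i)))
       last))

(defn prune-history
  "Drops reasoning parts from every message before the last user message,
  unless keep-history-reasoning? is truthy (e.g. GLM-style preserved thinking)."
  [messages keep-history-reasoning?]
  (if keep-history-reasoning?
    messages
    (let [cut (or (last-user-idx messages) 0)]
      (vec (map-indexed
             (fn [i m]
               (if (< i cut)
                 (update m :content #(vec (remove reasoning-part? %)))
                 m))
             messages)))))
```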
Introduce more granular control for reasoning retention in requests:
- "all" (default, send everything)
- "turn" (current turn only)
- "off" (discard all)

Both delta reasoning (reasoning_content) and think-tag reasoning are handled uniformly. DB storage is unaffected: reasoning is always persisted for UI display; this setting only controls what gets sent back to the model.
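The three modes could then layer on the same helpers (again a hedged sketch reusing the functions above; only the mode names and their semantics come from the commit message):

```clojure
;; "all" / "turn" / "off" as described above. Stored history (and the UI)
;; keeps its reasoning; only the outgoing request is shaped here.
(defn retain-reasoning [messages mode]
  (case mode
    "all"  messages                                  ; default: send everything
    "off"  (mapv #(update % :content
                          (fn [parts] (vec (remove reasoning-part? parts))))
                 messages)
    "turn" (prune-history messages false)))          ; keep current turn only
```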
force-pushed from 446bbb1 to 4a4d990
ericdallo
left a comment
yeah that looks better!
I added an entry to the changelog under the Unreleased section.
@ericdallo Subjectively, the behavior seems better now. Since it's difficult to evaluate, I went through the OpenAI SDK with OpenCode via Opus, and it found that they discard it as well. I couldn't find any documentation suggesting we should do otherwise by default.