Add near-complete Codex VSCode Support, full OAI Responses bridge #3
michaelw9999 wants to merge 2 commits into
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details

14 pytest tests, E2E tested with async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174
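Two of the compatibility steps listed above (dropping non-function tools and merging developer/system messages into position 0) can be sketched in isolation. This is a minimal illustration, not the PR's code: `chat_msg`, `tool_def`, `merge_system_messages`, `keep_function_tools`, and the blank-line separator between merged messages are all hypothetical, and the real server operates on JSON objects rather than plain structs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal message type (the real code works on JSON).
struct chat_msg {
    std::string role;
    std::string content;
};

// Hypothetical tool descriptor; only the type matters for filtering.
struct tool_def {
    std::string type;
    std::string name;
};

// Join all system/developer messages into one system message at position 0,
// keeping every other message in its original order.
std::vector<chat_msg> merge_system_messages(const std::vector<chat_msg> & in) {
    std::string merged;
    std::vector<chat_msg> rest;
    for (const auto & m : in) {
        if (m.role == "system" || m.role == "developer") {
            if (!merged.empty()) {
                merged += "\n\n"; // assumed separator
            }
            merged += m.content;
        } else {
            rest.push_back(m);
        }
    }
    std::vector<chat_msg> out;
    if (!merged.empty()) {
        out.push_back({"system", merged});
    }
    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}

// Keep only tools whose type is "function"; built-in types such as
// web_search or code_interpreter are skipped.
std::vector<tool_def> keep_function_tools(const std::vector<tool_def> & tools) {
    std::vector<tool_def> out;
    for (const auto & t : tools) {
        if (t.type == "function") {
            out.push_back(t);
        }
    }
    return out;
}
```

Merging rather than rejecting keeps templates that only accept a leading system message usable when a client sends a developer message mid-conversation.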
Cherry-pick of ggml-org#20819 by European-tech.

Persist context checkpoints in a companion .checkpoints file alongside slot saves. Without this, restoring a slot for hybrid/recurrent models triggers full prompt reprocessing (23 s for 26K tokens on Qwen3.5-27B). With checkpoint persistence, restore takes 75 ms.

Binary format with magic 0x4C4C4350 ("LLCP"), versioned, backward compatible (old saves without companion file load normally).
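A versioned header with a magic number, as described above, could look roughly like the following sketch. Only the magic value comes from the commit message; the field layout, the little-endian byte order, the version number `1`, and the helper names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

constexpr uint32_t CKPT_MAGIC   = 0x4C4C4350; // "LLCP", from the commit message
constexpr uint32_t CKPT_VERSION = 1;          // assumed version number

// Write a 32-bit value in a fixed little-endian order (an assumption).
static void write_u32(std::ostream & os, uint32_t v) {
    const char b[4] = {
        char(v & 0xFF), char((v >> 8) & 0xFF),
        char((v >> 16) & 0xFF), char((v >> 24) & 0xFF),
    };
    os.write(b, 4);
}

// Read a 32-bit little-endian value; returns false on a short read.
static bool read_u32(std::istream & is, uint32_t & v) {
    unsigned char b[4];
    if (!is.read(reinterpret_cast<char *>(b), 4)) {
        return false;
    }
    v = uint32_t(b[0]) | (uint32_t(b[1]) << 8) |
        (uint32_t(b[2]) << 16) | (uint32_t(b[3]) << 24);
    return true;
}

// Hypothetical header layout: magic, version, number of checkpoints.
void write_ckpt_header(std::ostream & os, uint32_t n_checkpoints) {
    write_u32(os, CKPT_MAGIC);
    write_u32(os, CKPT_VERSION);
    write_u32(os, n_checkpoints);
}

// Returns false for a truncated file, a wrong magic, or a version newer
// than this reader understands, so the caller can fall back to a plain load.
bool read_ckpt_header(std::istream & is, uint32_t & n_checkpoints) {
    uint32_t magic = 0, version = 0;
    if (!read_u32(is, magic) || magic != CKPT_MAGIC) {
        return false;
    }
    if (!read_u32(is, version) || version == 0 || version > CKPT_VERSION) {
        return false;
    }
    return read_u32(is, n_checkpoints);
}
```

Treating any header failure as "no checkpoints available" matches the backward-compatibility note: a save without a usable companion file simply loads without them.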
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d72b0819db
    {"sequence_number", seq_num++},
    {"output_index", output_idx++},
    {"item", json {
        {"id", oai_resp_fc_item_id},
Emit a fresh function-call item id for each added tool call
server_task_result_cmpl_partial::update() only assigns state.oai_resp_fc_item_id after snapshotting state into the chunk fields, so to_json_oaicompat_resp() can emit response.output_item.added with {"id": oai_resp_fc_item_id} from the previous value (often empty on the first streamed tool call). This makes streamed response.function_call_arguments.delta.item_id/final output_item.done.item.id inconsistent with the announced item, which breaks clients that stitch function-call argument deltas by item_id.
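The ordering problem can be reproduced with a reduced sketch. The structs and function names below are hypothetical stand-ins, with only the field name `oai_resp_fc_item_id` taken from the review; the actual `update()` handles far more state.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the streaming state and the per-chunk result.
struct stream_state {
    std::string oai_resp_fc_item_id;
};

struct partial_chunk {
    std::string oai_resp_fc_item_id;
};

// Order described in the review: the chunk copies the state first, so it
// sees the previous id (empty on the first streamed tool call).
partial_chunk update_snapshot_first(stream_state & state, const std::string & new_id) {
    partial_chunk chunk;
    chunk.oai_resp_fc_item_id = state.oai_resp_fc_item_id; // stale snapshot
    state.oai_resp_fc_item_id = new_id;
    return chunk;
}

// Suggested order: assign the fresh id first, then snapshot, so the
// announced item and its later deltas share the same id.
partial_chunk update_assign_first(stream_state & state, const std::string & new_id) {
    state.oai_resp_fc_item_id = new_id;
    partial_chunk chunk;
    chunk.oai_resp_fc_item_id = state.oai_resp_fc_item_id;
    return chunk;
}
```

With the second ordering, a client that groups argument deltas by `item_id` sees the same id in `response.output_item.added` and in every subsequent event for that tool call.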
    if (checkpoints.empty()) {
        return true;
    }
Remove stale checkpoint sidecar when no checkpoints exist
When checkpoints is empty, slot_checkpoints_save() returns without touching <filepath>.checkpoints, so reusing the same save filename can leave an old sidecar file behind. A later restore will then load stale checkpoint metadata for a different KV snapshot, which can trigger invalid recurrent-state restore attempts or unnecessary full prompt reprocessing.
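One way to address this is to delete the sidecar on the empty path instead of returning early. The sketch below assumes C++17 `std::filesystem`; `checkpoint` and `slot_checkpoints_save_sketch` are hypothetical simplifications, and the real function writes a structured binary format rather than raw bytes.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical checkpoint payload; the real type carries more metadata.
struct checkpoint {
    std::vector<char> data;
};

// Save checkpoints next to the slot file, or remove a leftover sidecar
// when there is nothing to save.
bool slot_checkpoints_save_sketch(const std::string & filepath,
                                  const std::vector<checkpoint> & checkpoints) {
    const std::string sidecar = filepath + ".checkpoints";
    if (checkpoints.empty()) {
        std::error_code ec;
        // remove() is not an error if the file does not exist
        std::filesystem::remove(sidecar, ec);
        return !ec;
    }
    std::ofstream out(sidecar, std::ios::binary | std::ios::trunc);
    if (!out) {
        return false;
    }
    for (const auto & c : checkpoints) {
        out.write(c.data.data(), static_cast<std::streamsize>(c.data.size()));
    }
    return out.good();
}
```

Removing the file keeps the invariant that a sidecar, when present, always belongs to the KV snapshot saved under the same name.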
This brings in automatic compaction, `web_search` and `file_search`, and is super easy to configure, for example:

For the automatic compaction to work, you must set `model_context_window` and `model_auto_compact_token_limit`. Summary boxes and clickable diffs with the undo button usually need `model_supports_reasoning_summaries = true` and `model_reasoning_summary = "auto"`.

Just install tavily (the shell command is `tvly`) and `rg`, or any other preferred web search MCP or file search/locator tool; it will wrap it through the shell and integrate it more natively and intuitively. If left out, it will hide these tools from the model.