fix(chat): Lazy boot skill sandboxes and simplify session flow#170
Merged
Conversation
Defer sandbox boot until tool execution so turns that only load skill instructions or read host-backed skill references avoid startup cost. Preserve sandbox reuse metadata after lazy boot, and only use host-backed skill reads before any sandbox state exists, falling through to the sandbox for missing or already-materialized skill files.

Fixes GH-112

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
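The lazy-boot behavior described above can be sketched as a memoized acquirer: boot happens on first tool execution, never for instruction-only turns. This is an illustrative sketch, not the PR's real API; `LazySandbox` and `bootSandbox` are invented names.

```typescript
// Hypothetical sketch of lazy sandbox acquisition. A turn that never
// calls acquire() never pays the boot cost.
type Sandbox = { id: string };

class LazySandbox {
  private cached: Sandbox | null = null;
  boots = 0; // exposed only for illustration: counts real boots

  constructor(private bootSandbox: () => Sandbox) {}

  acquire(): Sandbox {
    if (this.cached === null) {
      this.cached = this.bootSandbox(); // first tool execution boots
      this.boots += 1;
    }
    return this.cached; // later calls reuse the same sandbox
  }
}
```

Repeated `acquire()` calls return the identical sandbox, which is what lets reuse metadata survive across tool calls in a turn.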
Pass the sandbox acquired during keepalive setup into tool executor initialization so the first tool call does not resync sandbox files a second time. Add a unit regression that proves the initial bash execution only performs one sandbox file sync. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
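A minimal sketch of the fix above, with invented names (`createToolExecutor`, `syncSkillFiles`): handing the keepalive-acquired sandbox to the executor means the first tool call skips a second file sync.

```typescript
// Illustrative only: an executor that syncs skill files once, unless it
// is handed an already-synced sandbox from keepalive setup.
type Sandbox = { id: string; synced: boolean };

let syncCount = 0;
function syncSkillFiles(sandbox: Sandbox): void {
  syncCount += 1;
  sandbox.synced = true;
}

function createToolExecutor(preAcquired?: Sandbox) {
  // Reuse the sandbox acquired during keepalive setup when provided.
  let sandbox: Sandbox | null = preAcquired ?? null;
  return {
    runBash(_cmd: string): string {
      if (sandbox === null) {
        sandbox = { id: "sbx-fresh", synced: false };
      }
      if (!sandbox.synced) syncSkillFiles(sandbox); // at most one sync
      return `ran in ${sandbox.id}`;
    },
  };
}
```

The unit regression in the PR proves the same property: one sync for the initial bash execution, zero when the sandbox arrived pre-synced.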
Split sandbox lifecycle, skill sync, eval shim, and error classification into focused sandbox modules while keeping the executor surface unchanged. Flatten the session manager control flow so sandbox acquisition and bash execution read top-down instead of mixing state resets, retry branches, and temporary policy handling in one block. Preserve existing sandbox lifecycle behavior while making the remaining complexity live in smaller, easier-to-read files. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Run sandbox readiness checks before returning cached tool executors so each tool execution still extends keepalive and can recreate expired sandboxes. Add focused executor regressions for per-execution keepalive extension and cached sandbox recovery to keep the refactor aligned with the previous runtime behavior. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
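The readiness-check contract above can be illustrated with a toy executor (all names here are hypothetical): every execution extends keepalive, and a dead sandbox is recreated before reuse.

```typescript
// Hedged sketch of per-execution readiness checks on a cached executor.
type Sandbox = { id: string; alive: boolean };

function makeExecutor(create: () => Sandbox) {
  let sandbox = create();
  let keepaliveExtensions = 0;
  return {
    get keepaliveExtensions() { return keepaliveExtensions; },
    exec(_cmd: string): string {
      if (!sandbox.alive) sandbox = create(); // recover an expired sandbox
      keepaliveExtensions += 1;               // each execution extends keepalive
      return sandbox.id;
    },
    kill() { sandbox.alive = false; },        // simulates expiry for tests
  };
}
```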
Tighten the prompt and sandbox tool descriptions so ordinary conversational turns reply directly instead of reaching for sandbox-backed tools by default. Keep the lazy sandbox path intact while making bash, file, and attach-file usage read as opt-in for real workspace tasks rather than generic evidence gathering. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Restore the previous sandbox tool guidance and add temporary structured logs at the actual sandbox boot entry points so we can see what requested a sandbox in a live turn. Also make the local agent instructions explicit that logs, spans, and other monitoring output are not behavior contracts and should not be mocked or asserted in ordinary tests. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Add temporary structured logs alongside sandbox status emission so live turns show whether the visible sandbox status came from fresh runtime boot or snapshot boot. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Read sandbox reuse metadata from the live sandbox executor on error returns instead of relying on the lazy workspace callback to update cached locals. This keeps executor-backed sandbox turns from dropping reuse state after the sandbox has already booted. Add a regression covering an executor-backed sandbox boot followed by an error so cross-turn reuse metadata stays intact. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
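The error-path fix above amounts to preferring the executor's live view over a possibly stale local. A sketch under invented names (`reuseMetaOnError`, `currentSandboxId`):

```typescript
// Hypothetical illustration: after a lazy boot, a cached local may still
// say "no sandbox" even though the executor is holding a live one.
type ReuseMeta = { sandboxId: string | null };

function reuseMetaOnError(
  executor: { currentSandboxId: string | null },
  cachedLocal: ReuseMeta,
): ReuseMeta {
  // Prefer the executor's live state so cross-turn reuse survives errors.
  return executor.currentSandboxId !== null
    ? { sandboxId: executor.currentSandboxId }
    : cachedLocal;
}
```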
Keep loadSkillFromHost module-private because it is only used by the load skill tool itself. This trims one exported helper without changing behavior. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Wrap the raw Vercel sandbox before exposing it through SandboxExecutor.createSandbox. This keeps the lazy workspace path pinned to the narrower SandboxWorkspace contract and avoids leaking the SDK instance directly. Update the sandbox executor tests to assert delegation through the workspace interface instead of raw object identity. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
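The narrowing wrapper described above is a standard adapter move; this sketch uses invented shapes (the real `SandboxWorkspace` contract is in the PR, but the fields here are illustrative):

```typescript
// Sketch: expose only the narrow workspace contract, never the raw SDK
// object, so tools cannot depend on SDK internals.
interface SandboxWorkspace {
  readFile(path: string): string;
  writeFile(path: string, contents: string): void;
}

type RawSdkSandbox = {
  files: Map<string, string>;
  internalHandle: symbol; // stands in for SDK internals tools must not see
};

function wrapSandbox(raw: RawSdkSandbox): SandboxWorkspace {
  return {
    readFile: (path) => raw.files.get(path) ?? "",
    writeFile: (path, contents) => { raw.files.set(path, contents); },
  };
}
```

Tests can then assert delegation through the interface, as the PR does, instead of comparing raw object identity.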
Invalidate the lazily cached sandbox workspace when the executor switches to a different sandbox. This keeps workspace-backed tools aligned with sandbox recovery during the same turn instead of reusing a stale adapter. Add a regression that exercises a workspace call, sandbox replacement through the executor path, and a second workspace call in the same turn. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 459109e.
Capture the lazy workspace cache key from the created workspace instead of the executor's mutable current sandbox id. This keeps concurrent sandbox replacement from leaving respond.ts with a stale workspace bound to the wrong sandbox. Add a regression that starts a workspace-backed tool, replaces the sandbox through the executor path before the workspace resolves, and verifies the next workspace call refreshes correctly. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
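The cache-key fix above can be sketched as follows (names invented): the cached workspace is validated against the id it was created with, compared to the executor's current id, so concurrent replacement forces a refresh.

```typescript
// Hedged sketch of keying the lazy workspace cache by the workspace's
// own sandbox id rather than trusting a previously captured executor id.
type Workspace = { sandboxId: string };

function makeWorkspaceCache(executor: { currentSandboxId: string }) {
  let cached: Workspace | null = null;
  return {
    get(): Workspace {
      // Reuse only if the cached workspace still matches the live sandbox.
      if (cached !== null && cached.sandboxId === executor.currentSandboxId) {
        return cached;
      }
      cached = { sandboxId: executor.currentSandboxId };
      return cached;
    },
  };
}
```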
Avoid re-uploading skill files every time the current turn reuses the same sandbox. Cached in-memory sandboxes are already synced, so use a cheap workspace reachability probe instead and keep full syncs for fresh boots and sandbox-id restores. Keep the stopped-sandbox recovery path covered with focused regressions so cached tool executors still recreate dead sandboxes before reuse. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
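The sync-strategy decision above reduces to a small predicate. This is a sketch with invented names and flags, not the PR's actual signature:

```typescript
// Illustrative: cached in-memory sandboxes get a cheap reachability probe;
// fresh boots and sandbox-id restores get a full skill-file sync.
type SyncDecision = "probe" | "full-sync";

function chooseSyncStrategy(opts: {
  cachedInMemory: boolean;  // sandbox object still held by this process
  restoredFromId: boolean;  // reattached to a sandbox via a stored id
}): SyncDecision {
  // Only a sandbox this process already synced can skip the full sync.
  return opts.cachedInMemory && !opts.restoredFromId ? "probe" : "full-sync";
}
```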
Keep eval sandboxes deterministic by returning an empty object for unhandled gh api routes instead of falling through to the host gh binary. Cover the behavior by executing the generated stub script directly so future route handling changes keep the same contract. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
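The stub contract above, sketched in TypeScript rather than the generated shell script, with a made-up handled route for illustration:

```typescript
// Hedged sketch of the deterministic `gh api` shim contract: unhandled
// routes return an empty JSON object, never the host gh binary's output.
function ghApiStub(route: string): string {
  const handled: Record<string, string> = {
    // hypothetical handled route, purely illustrative
    "repos/acme/widgets/pulls/170": JSON.stringify({ state: "merged" }),
  };
  return handled[route] ?? "{}"; // no fall-through to the host binary
}
```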

Lazy boot sandbox-backed skill tooling and simplify the sandbox session flow.
This keeps the original GH-112 behavior change: sandbox creation is deferred until a tool actually needs it, and host-backed skill reads stay on the host only before a sandbox exists. It also fixes the first tool execution path to reuse the sandbox acquired during tool executor initialization, so the first `bash`, `readFile`, or `writeFile` call does not immediately reacquire the sandbox and resync skills.

The follow-up refactor moves sandbox lifecycle, skill sync, the eval `gh` shim, and sandbox error classification into focused modules. The goal here is not more abstraction; it is to keep the executor surface small and make the remaining snapshot, reuse, and temporary network-policy logic readable enough to change safely.

I considered leaving the lifecycle code in one file and only patching the sandbox reuse bug, but the old handler had become hard to reason about because dispatch, lifecycle, syncing, and error wrapping were mixed together. Splitting the code along those existing responsibilities keeps the interfaces tight without changing the public sandbox executor contract.
Validated with targeted Vitest coverage for lazy boot, sandbox execution, respond error paths, and progressive loading, plus package typecheck and oxlint.
Fixes GH-112
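The host-read gating described in the summary can be sketched as a small routing predicate (names invented, not the PR's API): host-backed reads are used only before any sandbox state exists, with everything else going through the sandbox.

```typescript
// Hedged sketch: route a skill-file read to the host or the sandbox.
type ReadSource = "host" | "sandbox";

function chooseSkillReadSource(opts: {
  sandboxExists: boolean;            // any sandbox state for this session
  fileOnHost: boolean;               // skill reference available host-side
}): ReadSource {
  // Host reads are allowed only before a sandbox exists; after that,
  // missing or already-materialized files fall through to the sandbox.
  return !opts.sandboxExists && opts.fileOnHost ? "host" : "sandbox";
}
```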