
fix(chat): Lazy boot skill sandboxes and simplify session flow#170

Merged
dcramer merged 14 commits into main from codex/lazy-sandbox-skill-loading on Apr 9, 2026


Conversation


dcramer (Member) commented Apr 9, 2026

Lazy boot sandbox-backed skill tooling and simplify the sandbox session flow.

This keeps the original GH-112 behavior change: sandbox creation is deferred until a tool actually needs it, and host-backed skill reads stay on the host only before a sandbox exists. It also fixes the first tool execution path to reuse the sandbox acquired during tool executor initialization so the first bash, readFile, or writeFile call does not immediately reacquire the sandbox and resync skills.

The follow-up refactor moves sandbox lifecycle, skill sync, the eval gh shim, and sandbox error classification into focused modules. The goal here is not more abstraction. It is to keep the executor surface small and make the remaining snapshot, reuse, and temporary network-policy logic readable enough to change safely.

I considered leaving the lifecycle code in one file and only patching the sandbox reuse bug, but the old handler had become hard to reason about because dispatch, lifecycle, syncing, and error wrapping were mixed together. Splitting the code along those existing responsibilities keeps the interfaces tight without changing the public sandbox executor contract.

Validated with targeted Vitest coverage for lazy boot, sandbox execution, respond error paths, and progressive loading, plus package typecheck and oxlint.

Fixes GH-112

Defer sandbox boot until tool execution so turns that only load skill instructions or read host-backed skill references avoid startup cost.

Preserve sandbox reuse metadata after lazy boot and only use host-backed skill reads before any sandbox state exists, falling through to sandbox for missing or already-materialized skill files.

Fixes GH-112
Co-Authored-By: GPT-5 Codex <noreply@openai.com>
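The deferral described above can be sketched as follows. This is an illustrative reconstruction, not the project's actual API: `SandboxWorkspace`, `bootSandbox`, and `createToolExecutor` are hypothetical names standing in for the real executor types.

```typescript
// Hypothetical sketch of the lazy-boot pattern: no sandbox exists until a
// tool actually needs one, and skill reads stay host-backed until then.
interface SandboxWorkspace {
  id: string;
  run(cmd: string): string;
}

let bootCount = 0;

function bootSandbox(): SandboxWorkspace {
  bootCount += 1; // expensive in the real system: VM boot plus skill sync
  return { id: `sbx-${bootCount}`, run: (cmd) => `ran: ${cmd}` };
}

function createToolExecutor() {
  let sandbox: SandboxWorkspace | null = null;
  const acquire = () => (sandbox ??= bootSandbox());
  return {
    // Host-backed skill reads stay on the host until a sandbox exists.
    readSkill: (name: string) =>
      sandbox ? sandbox.run(`cat ${name}`) : `host-read: ${name}`,
    bash: (cmd: string) => acquire().run(cmd),
    hasSandbox: () => sandbox !== null,
  };
}

const executor = createToolExecutor();
const before = executor.readSkill("notes.md"); // no sandbox yet
executor.bash("ls");                           // first boot happens here
executor.bash("pwd");                          // reuses the same sandbox
```

A turn that only loads skill instructions never triggers `bootSandbox`, which is the startup cost the fix avoids.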

vercel bot commented Apr 9, 2026

The latest updates on your projects:

Project: junior-docs — Deployment: Ready — Actions: Preview, Comment — Updated (UTC): Apr 9, 2026 6:58pm


Pass the sandbox acquired during keepalive setup into tool executor initialization so the first tool call does not resync sandbox files a second time.

Add a unit regression that proves the initial bash execution only performs one sandbox file sync.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
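The double-sync fix above amounts to threading the already-acquired sandbox into executor initialization. A minimal sketch, with hypothetical names (`syncSkillFiles`, `acquireSandbox`, `initToolExecutor` are not the real identifiers):

```typescript
// Sketch: reuse the sandbox acquired during keepalive setup so executor
// initialization does not sync skill files a second time.
let syncCount = 0;

interface Sandbox {
  id: string;
}

function syncSkillFiles(_sandbox: Sandbox): void {
  syncCount += 1; // uploads skill files; should happen once per boot
}

function acquireSandbox(): Sandbox {
  const sandbox = { id: "sbx-1" };
  syncSkillFiles(sandbox);
  return sandbox;
}

// Before the fix the executor acquired (and re-synced) its own sandbox;
// after, the keepalive-acquired sandbox is passed in and reused as-is.
function initToolExecutor(existing?: Sandbox) {
  const sandbox = existing ?? acquireSandbox();
  return { bash: (cmd: string) => `${sandbox.id}: ${cmd}` };
}

const fromKeepalive = acquireSandbox();          // one sync here
const executor2 = initToolExecutor(fromKeepalive);
const out = executor2.bash("ls");                // no second sync
```

The unit regression mentioned above would assert exactly this: one sync for the whole boot-then-bash sequence.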
Split sandbox lifecycle, skill sync, eval shim, and error classification into focused sandbox modules while keeping the executor surface unchanged.

Flatten the session manager control flow so sandbox acquisition and bash execution read top-down instead of mixing state resets, retry branches, and temporary policy handling in one block.

Preserve existing sandbox lifecycle behavior while making the remaining complexity live in smaller, easier-to-read files.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
dcramer changed the title from "fix(chat): Lazy boot sandbox for skill tools" to "fix(chat): Lazy boot skill sandboxes and simplify session flow" on Apr 9, 2026
Run sandbox readiness checks before returning cached tool executors so each tool execution still extends keepalive and can recreate expired sandboxes.

Add focused executor regressions for per-execution keepalive extension and cached sandbox recovery to keep the refactor aligned with the previous runtime behavior.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
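The readiness check can be sketched roughly like this; the shapes and names (`LiveSandbox`, `getExecutor`) are invented for illustration:

```typescript
// Sketch: a cached executor is only handed back after verifying its
// sandbox is still alive, recreating expired sandboxes and extending
// keepalive on every tool execution.
interface LiveSandbox {
  id: string;
  stopped: boolean;
  keepaliveExtensions: number;
}

let nextId = 0;
const makeSandbox = (): LiveSandbox => ({
  id: `sbx-${++nextId}`,
  stopped: false,
  keepaliveExtensions: 0,
});

let cached: LiveSandbox | null = null;

function getExecutor(): LiveSandbox {
  // Recreate expired sandboxes instead of returning a dead one.
  if (!cached || cached.stopped) cached = makeSandbox();
  // Each tool execution still extends the keepalive window.
  cached.keepaliveExtensions += 1;
  return cached;
}

const first = getExecutor();
const second = getExecutor(); // same sandbox, keepalive extended again
first.stopped = true;         // simulate sandbox expiry
const third = getExecutor();  // recreated fresh
```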
Tighten the prompt and sandbox tool descriptions so ordinary conversational turns reply directly instead of reaching for sandbox-backed tools by default.

Keep the lazy sandbox path intact while making bash, file, and attach-file usage read as opt-in for real workspace tasks rather than generic evidence gathering.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Restore the previous sandbox tool guidance and add temporary structured logs at the actual sandbox boot entry points so we can see what requested a sandbox in a live turn.

Also make the local agent instructions explicit that logs, spans, and other monitoring output are not behavior contracts and should not be mocked or asserted in ordinary tests.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Add temporary structured logs alongside sandbox status emission so live turns show whether the visible sandbox status came from fresh runtime boot or snapshot boot.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Read sandbox reuse metadata from the live sandbox executor on error returns instead of relying on the lazy workspace callback to update cached locals. This keeps executor-backed sandbox turns from dropping reuse state after the sandbox has already booted.

Add a regression covering an executor-backed sandbox boot followed by an error so cross-turn reuse metadata stays intact.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Keep loadSkillFromHost module-private because it is only used by the load skill tool itself. This trims one exported helper without changing behavior.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Wrap the raw Vercel sandbox before exposing it through SandboxExecutor.createSandbox. This keeps the lazy workspace path pinned to the narrower SandboxWorkspace contract and avoids leaking the SDK instance directly.

Update the sandbox executor tests to assert delegation through the workspace interface instead of raw object identity.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
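The wrapping described above is a plain adapter: expose only the narrow contract and keep the SDK surface private. A sketch with hypothetical stand-in types (`RawSdkSandbox` is not the real Vercel SDK shape):

```typescript
// Sketch: wrap the raw SDK sandbox in the narrower SandboxWorkspace
// contract so callers never see the full SDK surface.
interface RawSdkSandbox {
  sandboxId: string;
  runCommand(cmd: string): string;
  stop(): void;       // broad SDK surface we do not want to leak
  snapshot(): string;
}

interface SandboxWorkspace {
  id: string;
  exec(cmd: string): string; // the only operations tools may use
}

function wrapSandbox(raw: RawSdkSandbox): SandboxWorkspace {
  return {
    id: raw.sandboxId,
    exec: (cmd) => raw.runCommand(cmd),
  };
}

const raw: RawSdkSandbox = {
  sandboxId: "sbx-42",
  runCommand: (cmd) => `ok: ${cmd}`,
  stop: () => {},
  snapshot: () => "snap",
};

const workspace = wrapSandbox(raw);
const result = workspace.exec("ls");
```

Tests then assert delegation through `exec` rather than raw object identity, exactly as the commit message describes.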
Invalidate the lazily cached sandbox workspace when the executor switches to a different sandbox. This keeps workspace-backed tools aligned with sandbox recovery during the same turn instead of reusing a stale adapter.

Add a regression that exercises a workspace call, sandbox replacement through the executor path, and a second workspace call in the same turn.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
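The invalidation above boils down to comparing the cached adapter's sandbox id against the executor's current one before reuse. A minimal sketch (names invented):

```typescript
// Sketch: drop the lazily cached workspace adapter when the executor
// has switched to a different sandbox, e.g. after mid-turn recovery.
interface CachedWorkspace {
  sandboxId: string;
}

let currentSandboxId = "sbx-a";
let cachedWorkspace: CachedWorkspace | null = null;

function getWorkspace(): CachedWorkspace {
  // A stale adapter points at a sandbox the executor no longer uses.
  if (cachedWorkspace && cachedWorkspace.sandboxId !== currentSandboxId) {
    cachedWorkspace = null;
  }
  cachedWorkspace ??= { sandboxId: currentSandboxId };
  return cachedWorkspace;
}

const ws1 = getWorkspace();  // bound to sbx-a
currentSandboxId = "sbx-b";  // executor recovered into a new sandbox
const ws2 = getWorkspace();  // stale adapter discarded, rebound to sbx-b
```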

cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 459109e.

Capture the lazy workspace cache key from the created workspace instead of the executor's mutable current sandbox id. This keeps concurrent sandbox replacement from leaving respond.ts with a stale workspace bound to the wrong sandbox.

Add a regression that starts a workspace-backed tool, replaces the sandbox through the executor path before the workspace resolves, and verifies the next workspace call refreshes correctly.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
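The race fixed here is subtle: if the cache key is read from the executor's mutable current id *after* creation finishes, a concurrent replacement can mislabel a stale workspace. The sketch below simulates the interleaving synchronously via a callback; all names are hypothetical.

```typescript
// Sketch: key the workspace cache off the workspace that was actually
// created, not the executor's mutable current sandbox id.
interface RacedWorkspace {
  sandboxId: string;
}

let liveId = "sbx-1";
let cacheKey: string | null = null;
let cachedRaced: RacedWorkspace | null = null;

// `interleave` runs mid-creation, standing in for a concurrent sandbox
// replacement through the executor path while the workspace resolves.
function createRacedWorkspace(interleave?: () => void): RacedWorkspace {
  const boundId = liveId; // bound at creation time
  interleave?.();
  return { sandboxId: boundId };
}

function getRacedWorkspace(interleave?: () => void): RacedWorkspace {
  if (cachedRaced && cacheKey === liveId) return cachedRaced;
  const ws = createRacedWorkspace(interleave);
  cachedRaced = ws;
  cacheKey = ws.sandboxId; // from the created workspace, not the live id
  return ws;
}

const staleWs = getRacedWorkspace(() => {
  liveId = "sbx-2"; // replacement races in before the workspace resolves
});
const freshWs = getRacedWorkspace(); // key "sbx-1" !== live "sbx-2" → refresh
```

Keying off `ws.sandboxId` makes the mismatch visible on the next call, so the stale adapter is refreshed rather than served again.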
Avoid re-uploading skill files every time the current turn reuses the same sandbox. Cached in-memory sandboxes are already synced, so use a cheap workspace reachability probe instead and keep full syncs for fresh boots and sandbox-id restores.

Keep the stopped-sandbox recovery path covered with focused regressions so cached tool executors still recreate dead sandboxes before reuse.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
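The probe-vs-sync split can be sketched as a single decision on where the sandbox came from; origins and names here are illustrative, not the project's actual enum:

```typescript
// Sketch: cached in-memory sandboxes get a cheap reachability probe,
// while fresh boots and sandbox-id restores still do a full skill sync.
let fullSyncs = 0;
let probes = 0;

type Origin = "cached" | "fresh-boot" | "id-restore";

function prepareSandbox(origin: Origin, reachable: boolean): "ready" | "resync" {
  if (origin === "cached") {
    probes += 1; // cheap: e.g. run a trivial command in the sandbox
    return reachable ? "ready" : "resync"; // dead sandbox → recover + resync
  }
  fullSyncs += 1; // expensive: upload all skill files
  return "ready";
}

const r1 = prepareSandbox("fresh-boot", true); // full sync
const r2 = prepareSandbox("cached", true);     // probe only, already synced
const r3 = prepareSandbox("cached", false);    // probe fails → resync path
```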
Keep eval sandboxes deterministic by returning an empty object for unhandled gh api routes instead of falling through to the host gh binary.

Cover the behavior by executing the generated stub script directly so future route handling changes keep the same contract.

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
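The deterministic shim boils down to a closed lookup table with an empty-object default, so eval runs can never fall through to the host `gh` binary. A sketch with an invented route (the real stub is a generated script; this only models its contract):

```typescript
// Sketch: deterministic `gh api` stub for eval sandboxes. Known routes
// return canned JSON; anything unhandled returns "{}" rather than
// delegating to the host gh binary.
const cannedRoutes: Record<string, unknown> = {
  // Hypothetical route for illustration.
  "repos/acme/widgets": { name: "widgets", private: false },
};

function ghApiStub(route: string): string {
  // Deterministic by construction: no network, no host binary fallback.
  return JSON.stringify(cannedRoutes[route] ?? {});
}

const known = ghApiStub("repos/acme/widgets");
const unknown = ghApiStub("repos/acme/does-not-exist");
```

Executing the generated stub directly, as the commit describes, pins this contract so future route handling keeps the same default.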
dcramer merged commit e5ae954 into main on Apr 9, 2026
14 checks passed
dcramer deleted the codex/lazy-sandbox-skill-loading branch on April 9, 2026 at 19:59


Development

Successfully merging this pull request may close these issues.

Lazy-boot sandbox instead of always initializing on startup
