fix(chat): Lazy boot skill sandboxes and simplify session flow#170
Merged
Conversation
Defer sandbox boot until tool execution so turns that only load skill instructions or read host-backed skill references avoid startup cost. Preserve sandbox reuse metadata after lazy boot, and only use host-backed skill reads before any sandbox state exists, falling through to the sandbox for missing or already-materialized skill files.

Fixes GH-112

Co-Authored-By: GPT-5 Codex <noreply@openai.com>
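The lazy-boot behavior described above can be sketched as a memoized acquirer: boot happens on first tool execution, never for instruction-only turns. This is an illustrative sketch, not the PR's real API; `LazySandbox` and `bootSandbox` are invented names.

```typescript
// Hypothetical sketch of lazy sandbox acquisition. A turn that never
// calls acquire() never pays the boot cost.
type Sandbox = { id: string };

class LazySandbox {
  private cached: Sandbox | null = null;
  boots = 0; // exposed only for illustration: counts real boots

  constructor(private bootSandbox: () => Sandbox) {}

  acquire(): Sandbox {
    if (this.cached === null) {
      this.cached = this.bootSandbox(); // first tool execution boots
      this.boots += 1;
    }
    return this.cached; // later calls reuse the same sandbox
  }
}
```

Repeated `acquire()` calls return the identical sandbox, which is what lets reuse metadata survive across tool calls in a turn.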
Pass the sandbox acquired during keepalive setup into tool executor initialization so the first tool call does not resync sandbox files a second time. Add a unit regression that proves the initial bash execution only performs one sandbox file sync. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
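A minimal sketch of the fix above, with invented names (`createToolExecutor`, `syncSkillFiles`): handing the keepalive-acquired sandbox to the executor means the first tool call skips a second file sync.

```typescript
// Illustrative only: an executor that syncs skill files once, unless it
// is handed an already-synced sandbox from keepalive setup.
type Sandbox = { id: string; synced: boolean };

let syncCount = 0;
function syncSkillFiles(sandbox: Sandbox): void {
  syncCount += 1;
  sandbox.synced = true;
}

function createToolExecutor(preAcquired?: Sandbox) {
  // Reuse the sandbox acquired during keepalive setup when provided.
  let sandbox: Sandbox | null = preAcquired ?? null;
  return {
    runBash(_cmd: string): string {
      if (sandbox === null) {
        sandbox = { id: "sbx-fresh", synced: false };
      }
      if (!sandbox.synced) syncSkillFiles(sandbox); // at most one sync
      return `ran in ${sandbox.id}`;
    },
  };
}
```

The unit regression in the PR proves the same property: one sync for the initial bash execution, zero when the sandbox arrived pre-synced.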
Split sandbox lifecycle, skill sync, eval shim, and error classification into focused sandbox modules while keeping the executor surface unchanged. Flatten the session manager control flow so sandbox acquisition and bash execution read top-down instead of mixing state resets, retry branches, and temporary policy handling in one block. Preserve existing sandbox lifecycle behavior while making the remaining complexity live in smaller, easier-to-read files. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Run sandbox readiness checks before returning cached tool executors so each tool execution still extends keepalive and can recreate expired sandboxes. Add focused executor regressions for per-execution keepalive extension and cached sandbox recovery to keep the refactor aligned with the previous runtime behavior. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
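The readiness-check contract above can be illustrated with a toy executor (all names here are hypothetical): every execution extends keepalive, and a dead sandbox is recreated before reuse.

```typescript
// Hedged sketch of per-execution readiness checks on a cached executor.
type Sandbox = { id: string; alive: boolean };

function makeExecutor(create: () => Sandbox) {
  let sandbox = create();
  let keepaliveExtensions = 0;
  return {
    get keepaliveExtensions() { return keepaliveExtensions; },
    exec(_cmd: string): string {
      if (!sandbox.alive) sandbox = create(); // recover an expired sandbox
      keepaliveExtensions += 1;               // each execution extends keepalive
      return sandbox.id;
    },
    kill() { sandbox.alive = false; },        // simulates expiry for tests
  };
}
```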
Tighten the prompt and sandbox tool descriptions so ordinary conversational turns reply directly instead of reaching for sandbox-backed tools by default. Keep the lazy sandbox path intact while making bash, file, and attach-file usage read as opt-in for real workspace tasks rather than generic evidence gathering. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Restore the previous sandbox tool guidance and add temporary structured logs at the actual sandbox boot entry points so we can see what requested a sandbox in a live turn. Also make the local agent instructions explicit that logs, spans, and other monitoring output are not behavior contracts and should not be mocked or asserted in ordinary tests. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Add temporary structured logs alongside sandbox status emission so live turns show whether the visible sandbox status came from fresh runtime boot or snapshot boot. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Read sandbox reuse metadata from the live sandbox executor on error returns instead of relying on the lazy workspace callback to update cached locals. This keeps executor-backed sandbox turns from dropping reuse state after the sandbox has already booted. Add a regression covering an executor-backed sandbox boot followed by an error so cross-turn reuse metadata stays intact. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
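The error-path fix above amounts to preferring the executor's live view over a possibly stale local. A sketch under invented names (`reuseMetaOnError`, `currentSandboxId`):

```typescript
// Hypothetical illustration: after a lazy boot, a cached local may still
// say "no sandbox" even though the executor is holding a live one.
type ReuseMeta = { sandboxId: string | null };

function reuseMetaOnError(
  executor: { currentSandboxId: string | null },
  cachedLocal: ReuseMeta,
): ReuseMeta {
  // Prefer the executor's live state so cross-turn reuse survives errors.
  return executor.currentSandboxId !== null
    ? { sandboxId: executor.currentSandboxId }
    : cachedLocal;
}
```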
Keep loadSkillFromHost module-private because it is only used by the load skill tool itself. This trims one exported helper without changing behavior. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Wrap the raw Vercel sandbox before exposing it through SandboxExecutor.createSandbox. This keeps the lazy workspace path pinned to the narrower SandboxWorkspace contract and avoids leaking the SDK instance directly. Update the sandbox executor tests to assert delegation through the workspace interface instead of raw object identity. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
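The narrowing wrapper described above is a standard adapter move; this sketch uses invented shapes (the real `SandboxWorkspace` contract is in the PR, but the fields here are illustrative):

```typescript
// Sketch: expose only the narrow workspace contract, never the raw SDK
// object, so tools cannot depend on SDK internals.
interface SandboxWorkspace {
  readFile(path: string): string;
  writeFile(path: string, contents: string): void;
}

type RawSdkSandbox = {
  files: Map<string, string>;
  internalHandle: symbol; // stands in for SDK internals tools must not see
};

function wrapSandbox(raw: RawSdkSandbox): SandboxWorkspace {
  return {
    readFile: (path) => raw.files.get(path) ?? "",
    writeFile: (path, contents) => { raw.files.set(path, contents); },
  };
}
```

Tests can then assert delegation through the interface, as the PR does, instead of comparing raw object identity.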
Invalidate the lazily cached sandbox workspace when the executor switches to a different sandbox. This keeps workspace-backed tools aligned with sandbox recovery during the same turn instead of reusing a stale adapter. Add a regression that exercises a workspace call, sandbox replacement through the executor path, and a second workspace call in the same turn. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 459109e.
Capture the lazy workspace cache key from the created workspace instead of the executor's mutable current sandbox id. This keeps concurrent sandbox replacement from leaving respond.ts with a stale workspace bound to the wrong sandbox. Add a regression that starts a workspace-backed tool, replaces the sandbox through the executor path before the workspace resolves, and verifies the next workspace call refreshes correctly. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
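The cache-key fix above can be sketched as follows (names invented): the cached workspace is validated against the id it was created with, compared to the executor's current id, so concurrent replacement forces a refresh.

```typescript
// Hedged sketch of keying the lazy workspace cache by the workspace's
// own sandbox id rather than trusting a previously captured executor id.
type Workspace = { sandboxId: string };

function makeWorkspaceCache(executor: { currentSandboxId: string }) {
  let cached: Workspace | null = null;
  return {
    get(): Workspace {
      // Reuse only if the cached workspace still matches the live sandbox.
      if (cached !== null && cached.sandboxId === executor.currentSandboxId) {
        return cached;
      }
      cached = { sandboxId: executor.currentSandboxId };
      return cached;
    },
  };
}
```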
Avoid re-uploading skill files every time the current turn reuses the same sandbox. Cached in-memory sandboxes are already synced, so use a cheap workspace reachability probe instead and keep full syncs for fresh boots and sandbox-id restores. Keep the stopped-sandbox recovery path covered with focused regressions so cached tool executors still recreate dead sandboxes before reuse. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
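The sync-strategy decision above reduces to a small predicate. This is a sketch with invented names and flags, not the PR's actual signature:

```typescript
// Illustrative: cached in-memory sandboxes get a cheap reachability probe;
// fresh boots and sandbox-id restores get a full skill-file sync.
type SyncDecision = "probe" | "full-sync";

function chooseSyncStrategy(opts: {
  cachedInMemory: boolean;  // sandbox object still held by this process
  restoredFromId: boolean;  // reattached to a sandbox via a stored id
}): SyncDecision {
  // Only a sandbox this process already synced can skip the full sync.
  return opts.cachedInMemory && !opts.restoredFromId ? "probe" : "full-sync";
}
```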
Keep eval sandboxes deterministic by returning an empty object for unhandled gh api routes instead of falling through to the host gh binary. Cover the behavior by executing the generated stub script directly so future route handling changes keep the same contract. Co-Authored-By: GPT-5 Codex <noreply@openai.com>
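The stub contract above, sketched in TypeScript rather than the generated shell script, with a made-up handled route for illustration:

```typescript
// Hedged sketch of the deterministic `gh api` shim contract: unhandled
// routes return an empty JSON object, never the host gh binary's output.
function ghApiStub(route: string): string {
  const handled: Record<string, string> = {
    // hypothetical handled route, purely illustrative
    "repos/acme/widgets/pulls/170": JSON.stringify({ state: "merged" }),
  };
  return handled[route] ?? "{}"; // no fall-through to the host binary
}
```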

Lazy boot sandbox-backed skill tooling and simplify the sandbox session flow.
This keeps the original GH-112 behavior change: sandbox creation is deferred until a tool actually needs it, and host-backed skill reads stay on the host only before a sandbox exists. It also fixes the first tool execution path to reuse the sandbox acquired during tool executor initialization, so the first `bash`, `readFile`, or `writeFile` call does not immediately reacquire the sandbox and resync skills.

The follow-up refactor moves sandbox lifecycle, skill sync, the eval `gh` shim, and sandbox error classification into focused modules. The goal here is not more abstraction; it is to keep the executor surface small and make the remaining snapshot, reuse, and temporary network-policy logic readable enough to change safely.

I considered leaving the lifecycle code in one file and only patching the sandbox reuse bug, but the old handler had become hard to reason about because dispatch, lifecycle, syncing, and error wrapping were mixed together. Splitting the code along those existing responsibilities keeps the interfaces tight without changing the public sandbox executor contract.
Validated with targeted Vitest coverage for lazy boot, sandbox execution, respond error paths, and progressive loading, plus package typecheck and oxlint.
Fixes GH-112
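The host-read gating described in the summary can be sketched as a small routing predicate (names invented, not the PR's API): host-backed reads are used only before any sandbox state exists, with everything else going through the sandbox.

```typescript
// Hedged sketch: route a skill-file read to the host or the sandbox.
type ReadSource = "host" | "sandbox";

function chooseSkillReadSource(opts: {
  sandboxExists: boolean;            // any sandbox state for this session
  fileOnHost: boolean;               // skill reference available host-side
}): ReadSource {
  // Host reads are allowed only before a sandbox exists; after that,
  // missing or already-materialized files fall through to the sandbox.
  return !opts.sandboxExists && opts.fileOnHost ? "host" : "sandbox";
}
```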