Bridge offline sandboxes to Verifiers interception#1507

Open: xeophon wants to merge 11 commits into v1-sandbox-runtime-scalability from codex/offline-sandbox-host-programs
Conversation

@xeophon (Member) commented May 31, 2026

Summary

  • replace the host-side program.sandbox=false approach with a sandbox-local interception bridge on 127.0.0.1:13131
  • forward sandbox HTTP requests back into the existing Verifiers rollout endpoint, so /v1/..., /vf/tools, /vf/user, and /vf/stop all reuse the normal interception path
  • keep program.post_setup as a sandbox-only phase after setup and before rollout state/channel setup, so offline images can install an agent first and then write bridge-aware runtime config
  • remove the endpoint tunnel mutation and host command VF_SANDBOX_ID path from the final tree

Validation

  • uv run ruff check --fix .
  • uv run ruff format
  • uv run pytest tests/test_v1_runtime_lifecycle.py
  • uv run pytest tests/test_v1_harbor_cli.py
  • focused real-sandbox bridge coverage: callable tool interception and MCP proxy communication both pass

Note: the pre-existing local uv.lock change is still not included.

Note

Bridge offline sandboxes to Verifiers endpoint interception via in-sandbox HTTP proxy

  • Introduces sandbox_interception_bridge in sandbox_utils.py, an async context manager that uploads and starts a lightweight HTTP proxy inside the sandbox, rewrites state endpoint URLs to a local bridge address for the duration, and tears down on exit.
  • Adds run_sandbox_bridge_forwarder and forward_sandbox_bridge_request to poll the sandbox request directory and relay HTTP requests to the host endpoint server, returning responses back into the sandbox.
  • Adds post_setup field to ProgramConfig and related utilities, executed after setup and before state input is uploaded, allowing additional in-sandbox preparation steps.
  • Reworks Runtime.teardown so failed sandbox deletions retain their leases and keep the client and runtime live for retry; only successful deletions remove leases and close the client.
  • Risk: the bridge introduces an async forwarder loop and file-based request/response passing inside the sandbox, adding latency and a new failure mode if the proxy or forwarder stalls.
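The file-based request/response passing described in the bullets above can be illustrated with a minimal sketch. It is not the PR's `run_sandbox_bridge_forwarder`; the directory layout, the file naming, and the `forward_pending_requests` and `echo_handler` names are assumptions made for illustration, and a local temporary directory stands in for the sandbox filesystem.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def forward_pending_requests(
    request_dir: Path,
    response_dir: Path,
    handler: Callable[[dict], dict],
) -> int:
    """Relay every pending request file once and return how many were handled."""
    handled = 0
    for request_path in sorted(request_dir.glob("*.json")):
        request = json.loads(request_path.read_bytes())
        response = handler(request)
        # Write under a temporary name, then rename, so a reader polling
        # the response directory never observes a partially written file.
        tmp_path = response_dir / (request_path.name + ".tmp")
        tmp_path.write_text(json.dumps(response))
        tmp_path.replace(response_dir / request_path.name)
        request_path.unlink()
        handled += 1
    return handled


def echo_handler(request: dict) -> dict:
    # Hypothetical stand-in for the HTTP call to the host endpoint server.
    return {"status": 200, "path": request["path"]}


with tempfile.TemporaryDirectory() as root:
    req_dir, resp_dir = Path(root, "requests"), Path(root, "responses")
    req_dir.mkdir()
    resp_dir.mkdir()
    (req_dir / "0001.json").write_text(json.dumps({"path": "/v1/chat/completions"}))
    handled_count = forward_pending_requests(req_dir, resp_dir, echo_handler)
    relayed = json.loads((resp_dir / "0001.json").read_text())
```

A real forwarder would call this kind of function in a polling loop, which is where the added latency and the stall risk noted in the last bullet come from.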

Macroscope summarized 86802ee.


Note

High Risk
Changes sandbox networking, endpoint URLs seen by programs, and teardown/registry behavior for failed deletes—security- and reliability-sensitive execution paths.

Overview
Adds program.post_setup, a sandbox-only command phase that runs after setup and before rollout state is uploaded, so agents can be installed first and then configured for the bridge.

Sandboxed command and Python programs now run inside sandbox_interception_bridge: an in-sandbox HTTP proxy on 127.0.0.1:13131 with host-side forwarding into the normal rollout interception server. Harness passes endpoint and defers prepare_program until the bridge is active, so OPENAI_BASE_URL (and related env) inside setup/command point at the local bridge URL instead of relying on Endpoint(use_tunnel=...).

Runtime teardown keeps sandbox leases and the registered runtime when sandbox delete fails after retries, allowing a later teardown to succeed. Docs and tests cover ordering, bridge env, and delete retry behavior.
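The teardown behavior described above (keep a lease when its delete fails, so a later teardown can retry) might look roughly like the following. This is a sketch under assumptions: `teardown_leases`, the lease dictionary, and the `flaky_delete` callback are hypothetical names, not the PR's `Runtime.teardown`.

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def teardown_leases(
    leases: Dict[str, str],
    delete: Callable[[str], Awaitable[None]],
) -> bool:
    """Delete every leased sandbox, keeping leases whose deletion failed.

    Returns True only when no leases remain, which is the point at which
    a caller could safely close its client.
    """
    for lease_id in list(leases):
        try:
            await delete(lease_id)
        except Exception:
            continue  # keep the lease so a later teardown can retry
        del leases[lease_id]
    return not leases


async def demo_teardown() -> tuple:
    attempts: Dict[str, int] = {}

    async def flaky_delete(lease_id: str) -> None:
        # Hypothetical delete that fails once for one sandbox.
        attempts[lease_id] = attempts.get(lease_id, 0) + 1
        if lease_id == "sbx-b" and attempts[lease_id] == 1:
            raise RuntimeError("delete failed")

    leases = {"sbx-a": "lease-a", "sbx-b": "lease-b"}
    first = await teardown_leases(leases, flaky_delete)
    remaining_after_first = dict(leases)
    second = await teardown_leases(leases, flaky_delete)
    return first, remaining_after_first, second, dict(leases)


teardown_result = asyncio.run(demo_teardown())
```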

Reviewed by Cursor Bugbot for commit 6cfcd27. Bugbot is set up for automated code reviews on this repo.

Fixes VER-108

Stack

  1. v1-sandbox-runtime-scalability
  2. Bridge offline sandboxes to Verifiers interception #1507 👈 current

Comment thread verifiers/v1/harness.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f702dd6af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread verifiers/v1/utils/program_utils.py Outdated
@macroscopeapp (Bot) commented May 31, 2026

Approvability

Verdict: Needs human review

2 blocking correctness issues found. This PR introduces a new sandbox interception bridge feature with ~230+ lines of new runtime code. Multiple unresolved review comments identify substantive bugs including state restoration issues and incorrect bridge activation conditions that warrant human review before merging.


Comment thread verifiers/v1/harness.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 175a1a2fc9


Comment thread verifiers/v1/harness.py Outdated
Comment thread verifiers/v1/utils/sandbox_utils.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c6ee8c7c8


Comment thread verifiers/v1/harness.py Outdated
Comment thread verifiers/v1/harness.py
@xeophon changed the title from "Support host-side agents for offline sandboxes" to "Bridge offline sandboxes to Verifiers interception" on May 31, 2026
Comment thread verifiers/v1/utils/program_utils.py
Comment thread verifiers/v1/utils/sandbox_utils.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e72b8e85e5


if workdir:
await lease.client.execute_command(
lease.id, f"mkdir -p {shlex.quote(workdir)}"
async with sandbox_interception_bridge(lease, endpoint, state):


P2: Avoid forcing the bridge for commands that do not use it

When a sandboxed command runs through the normal harness path, run_intercepted_program has already registered an endpoint, so this unconditionally starts the bridge for every sandbox command. start_sandbox_bridge_proxy() then requires python3 or python inside the sandbox before the user command runs; a simple command using a custom non-Python image (for example an Alpine/utility image with no Python and no MCP/model calls) now fails during bridge startup even though the command itself does not need endpoint forwarding. Only start the bridge when the program/setup actually needs the intercepted endpoint, or keep a fallback path for non-Python sandboxes.
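One way to start the bridge only when it is needed is to enter its context manager conditionally. The sketch below assumes a boolean `needs_endpoint` decision is available from somewhere (how to compute it is not covered by the review comment), and `fake_bridge` is a hypothetical stand-in for `sandbox_interception_bridge(lease, endpoint, state)`.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager


@asynccontextmanager
async def fake_bridge(events: list):
    # Hypothetical stand-in for the real bridge context manager.
    events.append("bridge started")
    try:
        yield
    finally:
        events.append("bridge stopped")


async def run_command(needs_endpoint: bool) -> list:
    """Run a command, entering the bridge context only when it is required."""
    events: list = []
    async with AsyncExitStack() as stack:
        if needs_endpoint:
            await stack.enter_async_context(fake_bridge(events))
        events.append("command ran")
    return events


with_bridge = asyncio.run(run_command(True))
without_bridge = asyncio.run(run_command(False))
```

With this shape, an image that has no Python interpreter never reaches the bridge startup step unless the program actually asks for the intercepted endpoint.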


@xeophon force-pushed the codex/offline-sandbox-host-programs branch from e72b8e8 to c2bc229 on May 31, 2026 20:19
Comment thread docs/byo-harness.md
Comment thread verifiers/v1/utils/sandbox_utils.py Outdated
Comment thread verifiers/v1/utils/sandbox_utils.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2bc229ee4


Comment thread verifiers/v1/utils/sandbox_utils.py Outdated
Comment on lines +490 to +491
state["endpoint_root_url"] = f"http://127.0.0.1:{BRIDGE_PORT}/rollout/{rollout_key}"
state["endpoint_base_url"] = f"{state['endpoint_root_url']}/v1"


P2: Keep host setup handlers on the host endpoint

When a sandboxed rollout has any regular harness/taskset @setup handler that uses state.get_client() or endpoint_root_url, this rewrites the shared state to the sandbox-local bridge before runtime.setup_rollout(...) runs. Those host-side setup handlers then try to connect to 127.0.0.1:13131 on the host, but the bridge proxy is listening inside the sandbox, so endpoint calls that worked before fail; scope the bridge URL override to the sandbox program setup/command path rather than all setup handlers.
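A possible way to scope the override is to give the sandbox program its own view of the state instead of mutating the shared one. The sketch below reuses the URL shape from the quoted lines; `bridge_urls`, `program_view`, and the host port 8000 are assumptions for illustration only.

```python
BRIDGE_PORT = 13131  # port named in the PR summary


def bridge_urls(rollout_key: str) -> dict:
    """Build the sandbox-local endpoint URLs for one rollout."""
    root = f"http://127.0.0.1:{BRIDGE_PORT}/rollout/{rollout_key}"
    return {"endpoint_root_url": root, "endpoint_base_url": f"{root}/v1"}


def program_view(state: dict, rollout_key: str) -> dict:
    """Return a copy of state with bridge URLs, leaving the shared state untouched."""
    return {**state, **bridge_urls(rollout_key)}


# Hypothetical host-side state, as host setup handlers would see it.
host_state = {
    "endpoint_root_url": "http://127.0.0.1:8000/rollout/abc",
    "endpoint_base_url": "http://127.0.0.1:8000/rollout/abc/v1",
}
sandbox_state = program_view(host_state, "abc")
```

Host-side handlers keep reading `host_state`, while only the sandbox program setup and command path would receive `sandbox_state`.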


@xeophon changed the base branch from main to v1-sandbox-runtime-scalability on June 2, 2026 19:30
Comment thread verifiers/v1/utils/sandbox_utils.py Outdated
raw_request = await lease.read_file(request_path)
if hasattr(raw_request, "content"):
    raw_request = raw_request.content
request = json.loads(str(raw_request))


🟡 Medium utils/sandbox_utils.py:679

When raw_request is bytes (after .content extraction on line 678), calling str(raw_request) on line 679 produces a string like "b'{...}'" rather than decoding the bytes. This causes json.loads to throw a JSONDecodeError or parse incorrectly. Consider decoding bytes explicitly with .decode('utf-8') instead of using str().

-    request = json.loads(str(raw_request))
+    if isinstance(raw_request, bytes):
+        raw_request = raw_request.decode('utf-8')
+    request = json.loads(raw_request)

Evidence trail:
verifiers/v1/utils/sandbox_utils.py lines 670-683 (REVIEWED_COMMIT): forward_sandbox_bridge_request function showing lines 677-679 where .content (bytes) is extracted then passed through str() to json.loads. verifiers/v1/utils/sandbox_utils.py line 275 and 129: read_file returns object. verifiers/v1/utils/sandbox_utils.py line 316-329: call_sandbox_client passes through arbitrary return types. Python docs: str(bytes_object) returns repr-like "b'...'" not decoded text; json.loads accepts bytes directly since Python 3.6.
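The difference the comment describes is easy to reproduce in isolation. The payload below is made up for the demonstration; only the standard-library behavior of `str()`, `bytes.decode()`, and `json.loads` is being shown.

```python
import json

raw_request = b'{"method": "POST", "path": "/v1/chat/completions"}'

# str() on bytes yields the repr ("b'{...}'"), not the decoded text.
stringified = str(raw_request)
try:
    json.loads(stringified)
    str_parse_failed = False
except json.JSONDecodeError:
    str_parse_failed = True

# Decoding explicitly works, and json.loads also accepts bytes directly.
decoded = json.loads(raw_request.decode("utf-8"))
direct = json.loads(raw_request)
```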

Comment thread verifiers/v1/utils/sandbox_utils.py Outdated
Comment on lines +618 to +621
if isinstance(original_root, str):
    state["endpoint_root_url"] = original_root
if isinstance(original_base, str):
    state["endpoint_base_url"] = original_base


🟠 High utils/sandbox_utils.py:618

When state["endpoint_root_url"] or state["endpoint_base_url"] are None (or absent) on entry, the finally block at lines 618-621 fails to restore them. The isinstance(original_root, str) checks skip restoration when the original value is None, leaving the temporary bridge URLs (http://127.0.0.1:{BRIDGE_PORT}/...) in state after the context manager exits. Consider unconditionally restoring the original values so callers see consistent state.

-        if isinstance(original_root, str):
-            state["endpoint_root_url"] = original_root
-        if isinstance(original_base, str):
-            state["endpoint_base_url"] = original_base
+        state["endpoint_root_url"] = original_root
+        state["endpoint_base_url"] = original_base

Evidence trail:
verifiers/v1/utils/sandbox_utils.py lines 560-561 (original values captured via state.get, can be None), lines 604-607 (temporary bridge URLs written to state), lines 618-621 (isinstance(original_root, str) guard skips restoration when original is None)
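A restoration pattern that handles both `None` and absent keys is sketched below. It goes slightly further than the suggested diff, which would write `None` back for a key that was absent on entry; whether that distinction matters for the real state object is an assumption. `override_keys` and `_MISSING` are hypothetical names.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel that distinguishes "absent" from "None"


@contextmanager
def override_keys(state: dict, overrides: dict):
    """Temporarily set keys in state, restoring the exact prior condition on exit."""
    saved = {key: state.get(key, _MISSING) for key in overrides}
    state.update(overrides)
    try:
        yield state
    finally:
        for key, value in saved.items():
            if value is _MISSING:
                state.pop(key, None)  # key did not exist before: remove it
            else:
                state[key] = value  # restores None as faithfully as a string


state = {"endpoint_root_url": None}
with override_keys(
    state,
    {
        "endpoint_root_url": "http://127.0.0.1:13131/rollout/abc",
        "endpoint_base_url": "http://127.0.0.1:13131/rollout/abc/v1",
    },
):
    inside = dict(state)
after = dict(state)
```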


if __name__ == "__main__":
    main()
"""


Bridge proxy source embedded as raw string literal

Low Severity

The BRIDGE_PROXY_SOURCE is an ~90-line Python HTTP server embedded as a raw string literal inside sandbox_utils.py. This makes the proxy code invisible to linters, type checkers, and test tooling. A separate .py file read via importlib.resources (the same pattern already used in this module for other resource files) would allow the proxy code to be linted, tested, and maintained independently.


Reviewed by Cursor Bugbot for commit f189a3d.
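The `importlib.resources` approach mentioned in the comment could look like the following. The package and file names (`demo_bridge_pkg`, `bridge_proxy.py`) are invented for the demonstration, and a throwaway package is created on disk so the snippet runs on its own; `importlib.resources.files` requires Python 3.9 or newer.

```python
import importlib.resources
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a throwaway package that ships the proxy as a real .py file.
    package_dir = Path(root, "demo_bridge_pkg")
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "bridge_proxy.py").write_text("def main():\n    return 'proxy'\n")

    sys.path.insert(0, root)
    try:
        # Read the proxy source as package data instead of a string literal.
        proxy_source = (
            importlib.resources.files("demo_bridge_pkg")
            .joinpath("bridge_proxy.py")
            .read_text(encoding="utf-8")
        )
    finally:
        sys.path.remove(root)
        sys.modules.pop("demo_bridge_pkg", None)

# Because the source lives in its own file, ordinary tooling can check it.
compiled_proxy = compile(proxy_source, "bridge_proxy.py", "exec")
```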

return await call_sandbox_client(
    self.client.read_file,
    sandbox_id=self.id,
    file_path=path,


Duplicate keyword args passed to read_file client

Low Severity

SandboxLease.read_file passes both file_path=path and path=path to call_sandbox_client. When the underlying client method accepts **kwargs, both keys are forwarded. If the client implementation interprets file_path and path as distinct parameters (e.g., a path vs. a remote file path), this could silently produce incorrect behavior. Only one canonical name matching the SandboxClient protocol (path) is needed; the fallback file_path turns the call into a guessing game.


Reviewed by Cursor Bugbot for commit f189a3d.
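The ambiguity raised in this comment can be shown with two hypothetical client functions. Neither is the real `SandboxClient`, and how `call_sandbox_client` filters keyword arguments is not shown in the conversation, so this only illustrates plain Python call semantics.

```python
def permissive_read_file(sandbox_id: str, **kwargs) -> dict:
    # Hypothetical client that accepts anything: it receives both names.
    return kwargs


def strict_read_file(sandbox_id: str, path: str) -> str:
    # Hypothetical client that follows a single canonical parameter name.
    return f"{sandbox_id}:{path}"


both = permissive_read_file(sandbox_id="sbx", file_path="/tmp/a", path="/tmp/a")

try:
    strict_read_file(sandbox_id="sbx", file_path="/tmp/a", path="/tmp/a")
    strict_rejected = False
except TypeError:
    # An unexpected keyword argument is rejected outright.
    strict_rejected = True

canonical = strict_read_file(sandbox_id="sbx", path="/tmp/a")
```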

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cfcd27763


Comment on lines +661 to +662
if pending:
    await asyncio.gather(*pending)


P2: Cancel bridge requests when the sandbox command exits

If the sandboxed program exits while a bridge request is still in flight (for example it starts a background model/tool call, or the host endpoint stalls), stopping the bridge waits for all pending forwarders instead of cancelling them. Since each forwarder uses an unbounded httpx.Timeout(None), rollout cleanup can hang indefinitely after the command has already finished; the stop path should cancel or bound these pending bridge tasks.
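A bounded stop path could give in-flight forwarders a short grace period and then cancel whatever is still running. The sketch below is one way to do that with `asyncio.wait`; the `stop_pending` name and the grace period value are assumptions, and sleeping tasks stand in for real forwarders.

```python
import asyncio


async def stop_pending(pending: set, grace_seconds: float) -> int:
    """Wait briefly for in-flight tasks, then cancel stragglers.

    Returns the number of tasks that had to be cancelled.
    """
    if not pending:
        return 0
    _, still_running = await asyncio.wait(pending, timeout=grace_seconds)
    for task in still_running:
        task.cancel()
    # Await the cancelled tasks so their cancellation is fully processed.
    await asyncio.gather(*still_running, return_exceptions=True)
    return len(still_running)


async def demo_stop() -> tuple:
    fast = asyncio.create_task(asyncio.sleep(0))
    stalled = asyncio.create_task(asyncio.sleep(3600))  # simulates a hung request
    cancelled = await stop_pending({fast, stalled}, grace_seconds=0.05)
    return cancelled, fast.done(), stalled.cancelled()


stop_result = asyncio.run(demo_stop())
```

Setting a finite request timeout on each forwarder would be a complementary fix, since it bounds the stall even before the stop path runs.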


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit 86802ee.

    f"http://127.0.0.1:{endpoint.server.port}{path}",
    headers=headers,
    content=body,
)


New httpx client created per forwarded bridge request

Low Severity

forward_sandbox_bridge_request creates a fresh httpx.AsyncClient (and thus a fresh connection pool and TCP connection) for every single intercepted request. Since the destination is always the same local interception server (127.0.0.1:{port}), a single shared client would avoid repeated connection setup overhead, especially during multi-turn rollouts with many model calls.


Reviewed by Cursor Bugbot for commit 86802ee.
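Sharing one client across forwarded requests can be sketched with a small wrapper that creates the client lazily and reuses it. To keep the example free of third-party imports, the client is injected through a factory and `FakeClient` stands in for it; in practice the factory could return an `httpx.AsyncClient`, which exposes matching `post` and `aclose` methods. `SharedClientForwarder` and the port in the URLs are hypothetical.

```python
import asyncio
from typing import Any, Callable, Optional


class SharedClientForwarder:
    """Lazily create one client and reuse it for every forwarded request."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self._client: Optional[Any] = None

    async def post(self, url: str, **kwargs: Any) -> Any:
        if self._client is None:
            self._client = self._client_factory()
        return await self._client.post(url, **kwargs)

    async def aclose(self) -> None:
        if self._client is not None:
            await self._client.aclose()
            self._client = None


class FakeClient:
    """Stand-in client that counts how many instances were created."""

    created = 0

    def __init__(self) -> None:
        FakeClient.created += 1

    async def post(self, url: str, **kwargs: Any) -> int:
        return 200

    async def aclose(self) -> None:
        pass


async def demo_forwarding() -> tuple:
    forwarder = SharedClientForwarder(FakeClient)
    statuses = [
        await forwarder.post(f"http://127.0.0.1:8000/v1/call/{i}") for i in range(3)
    ]
    await forwarder.aclose()
    return statuses, FakeClient.created


forwarding_result = asyncio.run(demo_forwarding())
```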
