Bridge offline sandboxes to Verifiers interception#1507
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4f702dd6af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ApprovabilityVerdict: Needs human review 2 blocking correctness issues found. This PR introduces a new sandbox interception bridge feature with ~230+ lines of new runtime code. Multiple unresolved review comments identify substantive bugs including state restoration issues and incorrect bridge activation conditions that warrant human review before merging. You can customize Macroscope's approvability policy. Learn more. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 175a1a2fc9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1c6ee8c7c8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e72b8e85e5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if workdir: | ||
| await lease.client.execute_command( | ||
| lease.id, f"mkdir -p {shlex.quote(workdir)}" | ||
| async with sandbox_interception_bridge(lease, endpoint, state): |
There was a problem hiding this comment.
Avoid forcing the bridge for commands that do not use it
When a sandboxed command runs through the normal harness path, run_intercepted_program has already registered an endpoint, so this unconditionally starts the bridge for every sandbox command. start_sandbox_bridge_proxy() then requires python3 or python inside the sandbox before the user command runs; a simple command using a custom non-Python image (for example an Alpine/utility image with no Python and no MCP/model calls) now fails during bridge startup even though the command itself does not need endpoint forwarding. Only start the bridge when the program/setup actually needs the intercepted endpoint, or keep a fallback path for non-Python sandboxes.
Useful? React with 👍 / 👎.
e72b8e8 to
c2bc229
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2bc229ee4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| state["endpoint_root_url"] = f"http://127.0.0.1:{BRIDGE_PORT}/rollout/{rollout_key}" | ||
| state["endpoint_base_url"] = f"{state['endpoint_root_url']}/v1" |
There was a problem hiding this comment.
Keep host setup handlers on the host endpoint
When a sandboxed rollout has any regular harness/taskset @setup handler that uses state.get_client() or endpoint_root_url, this rewrites the shared state to the sandbox-local bridge before runtime.setup_rollout(...) runs. Those host-side setup handlers then try to connect to 127.0.0.1:13131 on the host, but the bridge proxy is listening inside the sandbox, so endpoint calls that worked before fail; scope the bridge URL override to the sandbox program setup/command path rather than all setup handlers.
Useful? React with 👍 / 👎.
| raw_request = await lease.read_file(request_path) | ||
| if hasattr(raw_request, "content"): | ||
| raw_request = raw_request.content | ||
| request = json.loads(str(raw_request)) |
There was a problem hiding this comment.
🟡 Medium utils/sandbox_utils.py:679
When raw_request is bytes (after .content extraction on line 678), calling str(raw_request) on line 679 produces a string like "b'{...}'" rather than decoding the bytes. This causes json.loads to throw a JSONDecodeError or parse incorrectly. Consider decoding bytes explicitly with .decode('utf-8') instead of using str().
- request = json.loads(str(raw_request))
+ if isinstance(raw_request, bytes):
+ raw_request = raw_request.decode('utf-8')
+ request = json.loads(raw_request)🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file verifiers/v1/utils/sandbox_utils.py around line 679:
When `raw_request` is `bytes` (after `.content` extraction on line 678), calling `str(raw_request)` on line 679 produces a string like `"b'{...}'"` rather than decoding the bytes. This causes `json.loads` to throw a `JSONDecodeError` or parse incorrectly. Consider decoding bytes explicitly with `.decode('utf-8')` instead of using `str()`.
Evidence trail:
verifiers/v1/utils/sandbox_utils.py lines 670-683 (REVIEWED_COMMIT): forward_sandbox_bridge_request function showing lines 677-679 where .content (bytes) is extracted then passed through str() to json.loads. verifiers/v1/utils/sandbox_utils.py line 275 and 129: read_file returns object. verifiers/v1/utils/sandbox_utils.py line 316-329: call_sandbox_client passes through arbitrary return types. Python docs: str(bytes_object) returns repr-like "b'...'" not decoded text; json.loads accepts bytes directly since Python 3.6.
…into codex/offline-sandbox-host-programs
| if isinstance(original_root, str): | ||
| state["endpoint_root_url"] = original_root | ||
| if isinstance(original_base, str): | ||
| state["endpoint_base_url"] = original_base |
There was a problem hiding this comment.
🟠 High utils/sandbox_utils.py:618
When state["endpoint_root_url"] or state["endpoint_base_url"] are None (or absent) on entry, the finally block at lines 618-621 fails to restore them. The isinstance(original_root, str) checks skip restoration when the original value is None, leaving the temporary bridge URLs (http://127.0.0.1:{BRIDGE_PORT}/...) in state after the context manager exits. Consider unconditionally restoring the original values so callers see consistent state.
- if isinstance(original_root, str):
- state["endpoint_root_url"] = original_root
- if isinstance(original_base, str):
- state["endpoint_base_url"] = original_base
+ state["endpoint_root_url"] = original_root
+ state["endpoint_base_url"] = original_base🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file verifiers/v1/utils/sandbox_utils.py around lines 618-621:
When `state["endpoint_root_url"]` or `state["endpoint_base_url"]` are `None` (or absent) on entry, the `finally` block at lines 618-621 fails to restore them. The `isinstance(original_root, str)` checks skip restoration when the original value is `None`, leaving the temporary bridge URLs (`http://127.0.0.1:{BRIDGE_PORT}/...`) in state after the context manager exits. Consider unconditionally restoring the original values so callers see consistent state.
Evidence trail:
verifiers/v1/utils/sandbox_utils.py lines 560-561 (original values captured via state.get, can be None), lines 604-607 (temporary bridge URLs written to state), lines 618-621 (isinstance(original_root, str) guard skips restoration when original is None)
|
|
||
| if __name__ == "__main__": | ||
| main() | ||
| """ |
There was a problem hiding this comment.
Bridge proxy source embedded as raw string literal
Low Severity
The BRIDGE_PROXY_SOURCE is an ~90-line Python HTTP server embedded as a raw string literal inside sandbox_utils.py. This makes the proxy code invisible to linters, type checkers, and test tooling. A separate .py file read via importlib.resources (the same pattern already used in this module for other resource files) would allow the proxy code to be linted, tested, and maintained independently.
Reviewed by Cursor Bugbot for commit f189a3d. Configure here.
| return await call_sandbox_client( | ||
| self.client.read_file, | ||
| sandbox_id=self.id, | ||
| file_path=path, |
There was a problem hiding this comment.
Duplicate keyword args passed to read_file client
Low Severity
SandboxLease.read_file passes both file_path=path and path=path to call_sandbox_client. When the underlying client method accepts **kwargs, both keys are forwarded. If the client implementation interprets file_path and path as distinct parameters (e.g., a path vs. a remote file path), this could silently produce incorrect behavior. Only one canonical name matching the SandboxClient protocol (path) is needed; the fallback file_path turns the call into a guessing game.
Reviewed by Cursor Bugbot for commit f189a3d. Configure here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6cfcd27763
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if pending: | ||
| await asyncio.gather(*pending) |
There was a problem hiding this comment.
Cancel bridge requests when the sandbox command exits
If the sandboxed program exits while a bridge request is still in flight (for example it starts a background model/tool call, or the host endpoint stalls), stopping the bridge waits for all pending forwarders instead of cancelling them. Since each forwarder uses an unbounded httpx.Timeout(None), rollout cleanup can hang indefinitely after the command has already finished; the stop path should cancel or bound these pending bridge tasks.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 86802ee. Configure here.
| f"http://127.0.0.1:{endpoint.server.port}{path}", | ||
| headers=headers, | ||
| content=body, | ||
| ) |
There was a problem hiding this comment.
New httpx client created per forwarded bridge request
Low Severity
forward_sandbox_bridge_request creates a fresh httpx.AsyncClient (and thus a fresh connection pool and TCP connection) for every single intercepted request. Since the destination is always the same local interception server (127.0.0.1:{port}), a single shared client would avoid repeated connection setup overhead, especially during multi-turn rollouts with many model calls.
Reviewed by Cursor Bugbot for commit 86802ee. Configure here.


Summary
program.sandbox=falseapproach with a sandbox-local interception bridge on127.0.0.1:13131/v1/...,/vf/tools,/vf/user, and/vf/stopall reuse the normal interception pathprogram.post_setupas a sandbox-only phase aftersetupand before rollout state/channel setup, so offline images can install an agent first and then write bridge-aware runtime configVF_SANDBOX_IDpath from the final treeValidation
uv run ruff check --fix .uv run ruff formatuv run pytest tests/test_v1_runtime_lifecycle.pyuv run pytest tests/test_v1_harbor_cli.pyNote: the pre-existing local
uv.lockchange is still not included.Note
Bridge offline sandboxes to Verifier endpoint interception via in-sandbox HTTP proxy
sandbox_interception_bridgein sandbox_utils.py, an async context manager that uploads and starts a lightweight HTTP proxy inside the sandbox, rewrites state endpoint URLs to a local bridge address for the duration, and tears down on exit.run_sandbox_bridge_forwarderandforward_sandbox_bridge_requestto poll the sandbox request directory and relay HTTP requests to the host endpoint server, returning responses back into the sandbox.post_setupfield toProgramConfigand related utilities, executed aftersetupand before state input is uploaded, allowing additional in-sandbox preparation steps.Runtime.teardownso failed sandbox deletions retain their leases and keep the client and runtime live for retry; only successful deletions remove leases and close the client.Macroscope summarized 86802ee.
Note
High Risk
Changes sandbox networking, endpoint URLs seen by programs, and teardown/registry behavior for failed deletes—security- and reliability-sensitive execution paths.
Overview
Adds
program.post_setup, a sandbox-only command phase that runs aftersetupand before rollout state is uploaded, so agents can be installed first and then configured for the bridge.Sandboxed command and Python programs now run inside
sandbox_interception_bridge: an in-sandbox HTTP proxy on127.0.0.1:13131with host-side forwarding into the normal rollout interception server.Harnesspassesendpointand defersprepare_programuntil the bridge is active, soOPENAI_BASE_URL(and related env) inside setup/command point at the local bridge URL instead of relying onEndpoint(use_tunnel=...).Runtime teardown keeps sandbox leases and the registered runtime when sandbox delete fails after retries, allowing a later teardown to succeed. Docs and tests cover ordering, bridge env, and delete retry behavior.
Reviewed by Cursor Bugbot for commit 6cfcd27. Bugbot is set up for automated code reviews on this repo. Configure here.
Fixes VER-108
Stack
v1-sandbox-runtime-scalability