Bridge offline sandboxes to Verifiers interception by xeophon · Pull Request #1507 · PrimeIntellect-ai/verifiers

xeophon · 2026-05-31T17:56:24Z

Summary

replace the host-side program.sandbox=false approach with a sandbox-local interception bridge on 127.0.0.1:13131
forward sandbox HTTP requests back into the existing Verifiers rollout endpoint, so /v1/..., /vf/tools, /vf/user, and /vf/stop all reuse the normal interception path
keep program.post_setup as a sandbox-only phase after setup and before rollout state/channel setup, so offline images can install an agent first and then write bridge-aware runtime config
remove the endpoint tunnel mutation and host command VF_SANDBOX_ID path from the final tree

Validation

uv run ruff check --fix .
uv run ruff format
uv run pytest tests/test_v1_runtime_lifecycle.py
uv run pytest tests/test_v1_harbor_cli.py
focused real-sandbox bridge coverage: callable tool interception and MCP proxy communication both pass

Note: the pre-existing local uv.lock change is still not included.

Note

Bridge offline sandboxes to Verifier endpoint interception via in-sandbox HTTP proxy

Introduces sandbox_interception_bridge in sandbox_utils.py, an async context manager that uploads and starts a lightweight HTTP proxy inside the sandbox, rewrites state endpoint URLs to a local bridge address for the duration, and tears down on exit.
Adds run_sandbox_bridge_forwarder and forward_sandbox_bridge_request to poll the sandbox request directory and relay HTTP requests to the host endpoint server, returning responses back into the sandbox.
Adds post_setup field to ProgramConfig and related utilities, executed after setup and before state input is uploaded, allowing additional in-sandbox preparation steps.
Reworks Runtime.teardown so failed sandbox deletions retain their leases and keep the client and runtime live for retry; only successful deletions remove leases and close the client.
Risk: the bridge introduces an async forwarder loop and file-based request/response passing inside the sandbox, adding latency and a new failure mode if the proxy or forwarder stalls.

^{Macroscope summarized 86802ee.}

Note

High Risk
Changes sandbox networking, endpoint URLs seen by programs, and teardown/registry behavior for failed deletes—security- and reliability-sensitive execution paths.

Overview
Adds program.post_setup, a sandbox-only command phase that runs after setup and before rollout state is uploaded, so agents can be installed first and then configured for the bridge.

Sandboxed command and Python programs now run inside sandbox_interception_bridge: an in-sandbox HTTP proxy on 127.0.0.1:13131 with host-side forwarding into the normal rollout interception server. Harness passes endpoint and defers prepare_program until the bridge is active, so OPENAI_BASE_URL (and related env) inside setup/command point at the local bridge URL instead of relying on Endpoint(use_tunnel=...).

Runtime teardown keeps sandbox leases and the registered runtime when sandbox delete fails after retries, allowing a later teardown to succeed. Docs and tests cover ordering, bridge env, and delete retry behavior.

^{Reviewed by Cursor Bugbot for commit 6cfcd27. Bugbot is set up for automated code reviews on this repo. Configure here.}

Fixes VER-108

Stack

v1-sandbox-runtime-scalability
Bridge offline sandboxes to Verifiers interception #1507 👈 current

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f702dd6af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

macroscopeapp · 2026-05-31T17:59:49Z

Approvability

Verdict: Needs human review

2 blocking correctness issues found. This PR introduces a new sandbox interception bridge feature with ~230+ lines of new runtime code. Multiple unresolved review comments identify substantive bugs including state restoration issues and incorrect bridge activation conditions that warrant human review before merging.

^{You can customize Macroscope's approvability policy. Learn more.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 175a1a2fc9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c6ee8c7c8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e72b8e85e5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-31T20:19:34Z

-        if workdir:
-            await lease.client.execute_command(
-                lease.id, f"mkdir -p {shlex.quote(workdir)}"
+        async with sandbox_interception_bridge(lease, endpoint, state):


Avoid forcing the bridge for commands that do not use it

When a sandboxed command runs through the normal harness path, run_intercepted_program has already registered an endpoint, so this unconditionally starts the bridge for every sandbox command. start_sandbox_bridge_proxy() then requires python3 or python inside the sandbox before the user command runs; a simple command using a custom non-Python image (for example an Alpine/utility image with no Python and no MCP/model calls) now fails during bridge startup even though the command itself does not need endpoint forwarding. Only start the bridge when the program/setup actually needs the intercepted endpoint, or keep a fallback path for non-Python sandboxes.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2bc229ee4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-31T20:23:27Z

+    state["endpoint_root_url"] = f"http://127.0.0.1:{BRIDGE_PORT}/rollout/{rollout_key}"
+    state["endpoint_base_url"] = f"{state['endpoint_root_url']}/v1"


Keep host setup handlers on the host endpoint

When a sandboxed rollout has any regular harness/taskset @setup handler that uses state.get_client() or endpoint_root_url, this rewrites the shared state to the sandbox-local bridge before runtime.setup_rollout(...) runs. Those host-side setup handlers then try to connect to 127.0.0.1:13131 on the host, but the bridge proxy is listening inside the sandbox, so endpoint calls that worked before fail; scope the bridge URL override to the sandbox program setup/command path rather than all setup handlers.

Useful? React with 👍 / 👎.

macroscopeapp · 2026-06-02T19:33:34Z

+    raw_request = await lease.read_file(request_path)
+    if hasattr(raw_request, "content"):
+        raw_request = raw_request.content
+    request = json.loads(str(raw_request))


🟡 Medium utils/sandbox_utils.py:679

When raw_request is bytes (after .content extraction on line 678), calling str(raw_request) on line 679 produces a string like "b'{...}'" rather than decoding the bytes. This causes json.loads to throw a JSONDecodeError or parse incorrectly. Consider decoding bytes explicitly with .decode('utf-8') instead of using str().

- request = json.loads(str(raw_request)) + if isinstance(raw_request, bytes): + raw_request = raw_request.decode('utf-8') + request = json.loads(raw_request)

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file verifiers/v1/utils/sandbox_utils.py around line 679: When `raw_request` is `bytes` (after `.content` extraction on line 678), calling `str(raw_request)` on line 679 produces a string like `"b'{...}'"` rather than decoding the bytes. This causes `json.loads` to throw a `JSONDecodeError` or parse incorrectly. Consider decoding bytes explicitly with `.decode('utf-8')` instead of using `str()`. Evidence trail: verifiers/v1/utils/sandbox_utils.py lines 670-683 (REVIEWED_COMMIT): forward_sandbox_bridge_request function showing lines 677-679 where .content (bytes) is extracted then passed through str() to json.loads. verifiers/v1/utils/sandbox_utils.py line 275 and 129: read_file returns object. verifiers/v1/utils/sandbox_utils.py line 316-329: call_sandbox_client passes through arbitrary return types. Python docs: str(bytes_object) returns repr-like "b'...'" not decoded text; json.loads accepts bytes directly since Python 3.6.

…into codex/offline-sandbox-host-programs

macroscopeapp · 2026-06-02T20:01:16Z

+        if isinstance(original_root, str):
+            state["endpoint_root_url"] = original_root
+        if isinstance(original_base, str):
+            state["endpoint_base_url"] = original_base


🟠 High utils/sandbox_utils.py:618

When state["endpoint_root_url"] or state["endpoint_base_url"] are None (or absent) on entry, the finally block at lines 618-621 fails to restore them. The isinstance(original_root, str) checks skip restoration when the original value is None, leaving the temporary bridge URLs (http://127.0.0.1:{BRIDGE_PORT}/...) in state after the context manager exits. Consider unconditionally restoring the original values so callers see consistent state.

- if isinstance(original_root, str): - state["endpoint_root_url"] = original_root - if isinstance(original_base, str): - state["endpoint_base_url"] = original_base + state["endpoint_root_url"] = original_root + state["endpoint_base_url"] = original_base

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:

In file verifiers/v1/utils/sandbox_utils.py around lines 618-621: When `state["endpoint_root_url"]` or `state["endpoint_base_url"]` are `None` (or absent) on entry, the `finally` block at lines 618-621 fails to restore them. The `isinstance(original_root, str)` checks skip restoration when the original value is `None`, leaving the temporary bridge URLs (`http://127.0.0.1:{BRIDGE_PORT}/...`) in state after the context manager exits. Consider unconditionally restoring the original values so callers see consistent state. Evidence trail: verifiers/v1/utils/sandbox_utils.py lines 560-561 (original values captured via state.get, can be None), lines 604-607 (temporary bridge URLs written to state), lines 618-621 (isinstance(original_root, str) guard skips restoration when original is None)

cursor · 2026-06-02T20:06:10Z

+
+if __name__ == "__main__":
+    main()
+"""


Bridge proxy source embedded as raw string literal

Low Severity

The BRIDGE_PROXY_SOURCE is an ~90-line Python HTTP server embedded as a raw string literal inside sandbox_utils.py. This makes the proxy code invisible to linters, type checkers, and test tooling. A separate .py file read via importlib.resources (the same pattern already used in this module for other resource files) would allow the proxy code to be linted, tested, and maintained independently.

^{Reviewed by Cursor Bugbot for commit f189a3d. Configure here.}

cursor · 2026-06-02T20:06:10Z

+        return await call_sandbox_client(
            self.client.read_file,
            sandbox_id=self.id,
+            file_path=path,


Duplicate keyword args passed to read_file client

Low Severity

SandboxLease.read_file passes both file_path=path and path=path to call_sandbox_client. When the underlying client method accepts **kwargs, both keys are forwarded. If the client implementation interprets file_path and path as distinct parameters (e.g., a path vs. a remote file path), this could silently produce incorrect behavior. Only one canonical name matching the SandboxClient protocol (path) is needed; the fallback file_path turns the call into a guessing game.

^{Reviewed by Cursor Bugbot for commit f189a3d. Configure here.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cfcd27763

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T20:17:43Z

+    if pending:
+        await asyncio.gather(*pending)


Cancel bridge requests when the sandbox command exits

If the sandboxed program exits while a bridge request is still in flight (for example it starts a background model/tool call, or the host endpoint stalls), stopping the bridge waits for all pending forwarders instead of cancelling them. Since each forwarder uses an unbounded httpx.Timeout(None), rollout cleanup can hang indefinitely after the command has already finished; the stop path should cancel or bound these pending bridge tasks.

Useful? React with 👍 / 👎.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 86802ee. Configure here.}

cursor · 2026-06-02T20:32:55Z

+                f"http://127.0.0.1:{endpoint.server.port}{path}",
+                headers=headers,
+                content=body,
+            )


New httpx client created per forwarded bridge request

Low Severity

forward_sandbox_bridge_request creates a fresh httpx.AsyncClient (and thus a fresh connection pool and TCP connection) for every single intercepted request. Since the destination is always the same local interception server (127.0.0.1:{port}), a single shared client would avoid repeated connection setup overhead, especially during multi-turn rollouts with many model calls.

^{Reviewed by Cursor Bugbot for commit 86802ee. Configure here.}

Support host-side agents for offline sandboxes

4f702dd

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/harness.py Outdated

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/utils/program_utils.py Outdated

Reject local sandbox setup without primary sandbox

175a1a2

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/harness.py Outdated

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/harness.py Outdated

Fix host sandbox endpoint routing

1c6ee8c

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/utils/sandbox_utils.py Outdated

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/harness.py Outdated

macroscopeapp Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/harness.py

xeophon changed the title ~~Support host-side agents for offline sandboxes~~ Bridge offline sandboxes to Verifiers interception May 31, 2026

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread verifiers/v1/utils/program_utils.py

Comment thread verifiers/v1/utils/sandbox_utils.py

Replace offline host agents with sandbox bridge

c2bc229

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

xeophon force-pushed the codex/offline-sandbox-host-programs branch from e72b8e8 to c2bc229 Compare May 31, 2026 20:19

cursor Bot reviewed May 31, 2026

View reviewed changes

Comment thread docs/byo-harness.md

Comment thread verifiers/v1/utils/sandbox_utils.py Outdated

Comment thread verifiers/v1/utils/sandbox_utils.py Outdated

chatgpt-codex-connector Bot reviewed May 31, 2026

View reviewed changes

xeophon added 2 commits June 1, 2026 11:26

Fix sandbox program test double

ae5f525

Merge sandbox runtime refactor into bridge

e38ad0a

xeophon changed the base branch from main to v1-sandbox-runtime-scalability June 2, 2026 19:30

macroscopeapp Bot reviewed Jun 2, 2026

View reviewed changes

xeophon added 3 commits June 2, 2026 21:35

Merge latest sandbox lifecycle updates

c11bf99

Merge remote-tracking branch 'origin/v1-sandbox-runtime-scalability' …

89983df

…into codex/offline-sandbox-host-programs

Simplify sandbox bridge implementation

f189a3d

macroscopeapp Bot reviewed Jun 2, 2026

View reviewed changes

cursor Bot reviewed Jun 2, 2026

View reviewed changes

Trim offline bridge diff

6cfcd27

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

Simplify sandbox bridge control flow

86802ee

cursor Bot reviewed Jun 2, 2026

View reviewed changes

		state["endpoint_root_url"] = f"http://127.0.0.1:{BRIDGE_PORT}/rollout/{rollout_key}"
		state["endpoint_base_url"] = f"{state['endpoint_root_url']}/v1"

Conversation

xeophon commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation

Bridge offline sandboxes to Verifier endpoint interception via in-sandbox HTTP proxy

Stack

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

macroscopeapp Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Bridge proxy source embedded as raw string literal

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Duplicate keyword args passed to read_file client

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

New httpx client created per forwarded bridge request

Uh oh!

Reviewers

Assignees

xeophon commented May 31, 2026 •

edited

Loading

macroscopeapp Bot commented May 31, 2026 •

edited

Loading

Duplicate keyword args passed to `read_file` client