feat(agents-runtime): Sandbox primitive + native (Seatbelt/bwrap) and E2B remote providers#4369
Draft
msfstef wants to merge 18 commits into
Introduce the Sandbox interface and an unrestrictedSandbox provider as the
plumbing for host-isolation work. No default behavior change — all built-in
entities (Horton, Worker) explicitly construct unrestrictedSandbox so they
behave identically to before. See plans/sandbox-design.md.
- New: packages/agents-runtime/src/sandbox/{types,unrestricted}.ts and the
public /sandbox subpath aggregator.
- Tool factories (bash, read, write, edit, fetch_url) now take Sandbox
instead of a workingDirectory string. They delegate FS/exec/fetch to it.
- bash no longer forwards process.env to children; only a scrubbed env
(PATH/HOME/USER/LANG/TERM) is passed. Closes env-var exfil via "echo $KEY".
- bash description string stops claiming a sandbox that wasn't there.
- read/write/edit add realpath-based path resolution via resolveSafePath
to block symlink-escape from the workspace.
- The standalone fetchUrlTool export is removed; callers must construct
via createFetchUrlTool(sandbox).
- Horton/Worker construct unrestrictedSandbox per wake and dispose in a
finally block. Conformance tests updated to the new signatures.
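The env scrubbing described in the list above can be sketched as follows. This is a minimal illustration, not the runtime's actual code: `scrubEnv` and `ALLOWED_ENV_KEYS` are hypothetical names, and only the five variable names come from the commit message.

```typescript
// Hypothetical helper: the allowlist mirrors the variables named in the commit
// message; the function name and shape are illustrative assumptions.
const ALLOWED_ENV_KEYS = ["PATH", "HOME", "USER", "LANG", "TERM"] as const;

function scrubEnv(parent: Record<string, string | undefined>): Record<string, string> {
  const child: Record<string, string> = {};
  for (const key of ALLOWED_ENV_KEYS) {
    const value = parent[key];
    // Copy only allowlisted keys that are actually set in the parent.
    if (value !== undefined) child[key] = value;
  }
  return child;
}

// Anything outside the allowlist (API keys, tokens) never reaches the child.
const scrubbed = scrubEnv({
  PATH: "/usr/bin",
  HOME: "/home/dev",
  ANTHROPIC_API_KEY: "sk-test",
});
```

In Node, the result would typically be passed as the `env` option of `child_process.spawn`, so the child starts from the allowlist rather than from the parent's full environment.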
Tests: 48 new tests across sandbox-unrestricted, sandbox-tool-refactor,
and sandbox-tool-symlink-safety. Existing test suites for bash/write/edit
updated for the new tool signatures. Full agents-runtime suite + agents
suite green; typecheck clean across runtime, agents, and conformance-tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (PR 6b)
Adds nativeSandbox(), a Sandbox provider that wraps Anthropic's
sandbox-runtime library to enforce host isolation through OS-level
primitives (Seatbelt on macOS, bubblewrap on Linux/WSL2).
Architecture:
- New dependency: @anthropic-ai/sandbox-runtime@0.0.52 (Apache-2.0, pinned).
- src/sandbox/native.ts: implements Sandbox over SandboxManager. Translates
our config (workingDirectory, allowedHosts, extraReadPaths) into the
library's config shape so customers never see the library's API.
- Lazy initialization: SandboxManager is only set up on the first exec()
call. readFile / writeFile / mkdir / fetch are enforced at the TS layer
(path canonicalization + deny overlay; hostname allowlist for fetch).
No proxy startup cost for handlers that don't spawn subprocesses.
- Refcount + single-instance enforcement: one workingDirectory can be
actively exec'd through the OS sandbox at a time in one Node process.
Concurrent exec from a conflicting workingDirectory throws
SandboxError({kind: 'unavailable'}).
- Default deny overlay covers ~/.ssh, ~/.aws, ~/.config/{gcloud,op,gh},
~/.kube, ~/.docker, ~/.netrc, ~/.npmrc, ~/.pgpass, ~/.huggingface,
and ~/Library/Application Support. Documented as incomplete in
plans/sandbox-design.md §5.2; the v2 fix is a curated read-allowlist.
- name: 'native:macos-seatbelt' on Darwin, 'native:linux-bwrap-only'
elsewhere — makes the bwrap-only Linux limitation legible in logs.
- Throws SandboxError({kind: 'unavailable'}) on unsupported platforms
(Windows) with an actionable error pointing to unrestrictedSandbox or
remoteSandbox.
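As a rough sketch of the TS-layer deny overlay listed above, assuming the input path has already been canonicalized (for example via realpath). `DENY_SUFFIXES` and `isDeniedPath` are hypothetical names; only the directory list is taken from the commit message.

```typescript
// Paths relative to the user's home directory, as listed in the commit message.
const DENY_SUFFIXES = [
  ".ssh", ".aws", ".config/gcloud", ".config/op", ".config/gh",
  ".kube", ".docker", ".netrc", ".npmrc", ".pgpass", ".huggingface",
  "Library/Application Support",
];

// Assumption: canonicalPath is absolute and symlink-free (already realpath'd).
function isDeniedPath(canonicalPath: string, home: string): boolean {
  const root = home.endsWith("/") ? home.slice(0, -1) : home;
  return DENY_SUFFIXES.some((suffix) => {
    const denied = `${root}/${suffix}`;
    // Match the entry itself or anything beneath it, but not siblings that
    // merely share a prefix (e.g. ~/.sshfoo).
    return canonicalPath === denied || canonicalPath.startsWith(`${denied}/`);
  });
}
```

Matching on a trailing `/` boundary rather than a bare prefix is the design point: a plain `startsWith(denied)` would also block unrelated paths such as `~/.sshfoo`.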
Tests (test/sandbox-native.test.ts):
- Identity, FS policy (deny overlay, allowed reads/writes), fetch policy.
- Lifecycle: re-construction after dispose, concurrent-exec rejection.
- Real OS sandbox integration tests (skipped on unsupported platforms):
basic echo, /etc/sudoers blocked, writes inside cwd allowed.
No default change for Horton/Worker — they still use unrestrictedSandbox.
PR 6d will flip the default and add the Horton home-as-cwd fix.
Also: write-tool test updated to compare canonical (realpath-resolved)
paths in readSet, matching PR 6a's symlink-safety semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds remoteSandbox(), a Sandbox provider that delegates host isolation to a
remote workspace (microVM/container) at a SaaS provider. v1 ships an E2B
adapter; additional providers (Vercel, Daytona) are mechanical to add via
the RemoteSandboxClient interface.
Architecture:
- src/sandbox/remote/types.ts: RemoteSandboxClient interface — the narrow
contract each provider adapter implements (exec, readFile, writeFile,
mkdir, kill).
- src/sandbox/remote/e2b.ts: createE2BClient and adaptE2B. Dynamically
imports the 'e2b' package so it remains an *optional peer dependency*.
Customers using the remote provider install e2b separately; no install
cost for everyone else.
- src/sandbox/remote.ts: provider-neutral remoteSandbox factory and the
RemoteSandbox class implementing the Sandbox interface. FS paths are
VM-rooted (default cwd '/work'). Writes outside the working directory
are rejected at the TS layer. dispose() calls client.kill() once;
subsequent operations throw SandboxError({kind:'runtime'}).
The 'client' opt accepts a pre-constructed RemoteSandboxClient, used by
tests (a fake client tracks all calls and serves an in-memory FS) and by
customers who want to wrap the provider SDK with retry/observability
before handing it to us.
sandbox.fetch() runs in the host Node process with a TS-level hostname
allowlist — *not* inside the VM. Documented caveat: to route outbound
traffic through the VM, use sandbox.exec('curl ...'). v1.1 may add a
VM-routed fetch.
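The "writes outside the working directory are rejected at the TS layer" rule for VM-rooted paths can be illustrated with a purely lexical check. This is a sketch under assumptions: `normalizePosix` and `resolveInWorkdir` are hypothetical helpers, and a plain `Error` stands in for whatever SandboxError the real adapter throws.

```typescript
// Collapse ".", ".." and duplicate slashes in a POSIX path without touching the FS.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Resolve target against cwd and reject anything that lands outside cwd.
function resolveInWorkdir(cwd: string, target: string): string {
  const root = normalizePosix(cwd);
  const absolute = target.startsWith("/") ? target : `${root}/${target}`;
  const resolved = normalizePosix(absolute);
  const prefix = root === "/" ? "/" : `${root}/`;
  if (resolved !== root && !resolved.startsWith(prefix)) {
    throw new Error(`write outside working directory: ${resolved}`);
  }
  return resolved;
}
```

Because the check is lexical, it does not see symlinks that exist inside the VM; it only guarantees that the path handed to the client is textually inside the working directory (default `/work`).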
Tests (test/sandbox-remote.test.ts, 9 cases):
- Identity (name reflects provider).
- exec delegation with default + override cwd.
- writeFile/readFile roundtrip; writeFile outside cwd rejected.
- mkdir delegation, including recursive walk.
- fetch hostname allowlist rejection.
- dispose calls kill exactly once even on repeat.
- Unknown provider name throws SandboxError({kind:'unavailable'}).
No real e2b account/SDK is needed for the test suite — all tests use the
in-memory fake client.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR 6d)
Wires the native sandbox in as the default for built-in entities (Horton,
Worker) on macOS and Linux.
Behavior change: LLM-driven bash/read/write/edit/fetch_url tools now run
inside Seatbelt (macOS) or bubblewrap (Linux) by default, with the
env-scrubbing + symlink-safety from PR 6a and the default deny overlay from
PR 6b.
- New: src/sandbox/default.ts with the chooseDefaultSandbox(workingDirectory,
env?) helper. Picks nativeSandbox when SandboxManager.isSupportedPlatform()
returns true; otherwise unrestrictedSandbox.
- Panic-revert env switch: ELECTRIC_AGENTS_UNRESTRICTED=1 (also accepts
'true'/'yes'/'on', case-insensitive) forces unrestrictedSandbox on any
platform. Documented as the emergency lever when the native engine
misbehaves; not promoted in customer-facing docs.
- Horton and Worker handlers replace their direct unrestrictedSandbox
construction with a chooseDefaultSandbox call. No other change to the
handler logic; the try/finally dispose pattern from PR 6a stays.
Tests (test/sandbox-default.test.ts, 5 cases):
- Native chosen on supported platforms.
- ELECTRIC_AGENTS_UNRESTRICTED=1 forces unrestricted.
- Case-insensitive truthy values (true, yes, on) all force unrestricted.
- Unrestricted picked when isNativeSupported() returns false (Windows shape
via the testing override).
- ELECTRIC_AGENTS_UNRESTRICTED=0 does NOT trigger the panic switch.
The agents-desktop home-as-cwd fix (main.ts:1939 'app.getPath(home)'
fallback) is deferred to a separate, smaller desktop PR; it's a UX change
with its own implications.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
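The selection rule in this commit (panic switch first, then platform support) could look roughly like this. `isPanicSwitchOn` and `chooseSandboxKind` are hypothetical stand-ins that return a label instead of constructing a real sandbox; the accepted values are the ones listed in the commit message.

```typescript
type SandboxChoice = "native" | "unrestricted";

// Accepts 1/true/yes/on, case-insensitively; anything else (including "0") is off.
function isPanicSwitchOn(raw: string | undefined): boolean {
  if (raw === undefined) return false;
  return ["1", "true", "yes", "on"].includes(raw.toLowerCase());
}

// nativeSupported stands in for the platform probe the commit describes.
function chooseSandboxKind(
  env: Record<string, string | undefined>,
  nativeSupported: boolean,
): SandboxChoice {
  if (isPanicSwitchOn(env.ELECTRIC_AGENTS_UNRESTRICTED)) return "unrestricted";
  return nativeSupported ? "native" : "unrestricted";
}
```

Taking `env` as a parameter rather than reading `process.env` directly keeps the decision easy to test, which matches the `env?` argument on the helper's signature.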
…egatives
Closes two test-coverage gaps that surfaced during PR 6a-6d review.
sandbox-conformance.test.ts (20 cases):
- Parameterizes a single set of scenarios over unrestricted, native (real
OS sandbox, gated by SandboxManager.isSupportedPlatform), and remote
(driven by an in-memory fake matching RemoteSandboxClient).
- Asserts the cross-provider contract: writeFile+readFile roundtrip, exec
returns an exitCode, dispose is safe, name/workingDirectory exposed,
readFile ENOENT propagates.
- Encodes the *deliberate* semantic difference: writeFile outside cwd
rejects for native/remote (policy-bearing providers) but succeeds for
unrestricted (which delegates to node:fs; path security lives in the tool
layer's resolveSafePath helper).
- Symlink-escape sub-suite for non-remote providers documents that
unrestricted does not block symlinks at the sandbox layer (tool layer
handles it) while native does.
sandbox-native-os.test.ts (5 cases, real OS sandbox only):
- bash does not inherit arbitrary parent env vars (closes the
__SANDBOX_OS_TEST_SECRET__ exfil path via the OS sandbox).
- bash cannot write outside the working directory at the OS level.
- bash cannot follow a symlink whose target is in the default deny overlay.
Comments explicitly note the v1 denylist's limitation: symlinks to
arbitrary /tmp paths *are* readable (option 1 in plans/sandbox-design.md
§5.2); only paths inside the deny set are blocked. v2 read-allowlist would
change this.
- bash with no allowedHosts cannot reach the network (verifies
https://1.1.1.1 is refused).
- readFile through the TS adapter denies known credential paths under home
(~/.ssh, ~/.aws, ~/.config/gcloud).
Coverage gap honest-status after this commit:
- remoteSandbox against the real E2B SDK is still untested (needs an
account). adaptE2B's type translations could drift without us noticing.
- Linux bwrap path is not exercised in CI by this machine (macOS dev env).
- Horton/Worker full integration through a fake LLM is blocked by the
pre-existing better-sqlite3 missing-module error in packages/agents.
Test totals: 78 sandbox tests, all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The e2b peer dep added in PR 6c was missing from the lockfile (an earlier checkout reverted the install change). This commit lands the lock entries for e2b@2.21.0 and its transitive deps so a fresh pnpm install resolves consistently. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four integration adjustments after rebasing on origin/main:
- Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was a
characterization test from #4354 that documents pre-fix symlink-escape
behavior with an explicit "update when realpath resolution lands" note.
PR 6a's resolveSafePath helper is that fix; the file's expectations are now
contradicted by sandbox-tool-symlink-safety.test.ts.
- Trim packages/agents-runtime/test/bash-tool.test.ts: the two
characterization tests from #4354 that documented the bash env-leak bug are
removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has the
corresponding assertion ('does not forward arbitrary process.env to
children'). The first test in the file (cwd + HOME exposure) stays.
- Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new
createFetchUrlTool(sandbox, opts) signature. The assertions still hold for
unrestrictedSandbox (NetPolicy SSRF protection is deferred); the test is now
explicit about that scope.
- Remove the @ts-expect-error directive on the dynamic e2b import in
src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves the
package and the directive is unused.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c6a9ffc to
91303cc
Compare
Codecov Report
@@ Coverage Diff @@
## main #4369 +/- ##
==========================================
+ Coverage 55.84% 60.63% +4.78%
==========================================
Files 245 302 +57
Lines 24847 30248 +5401
Branches 6878 8137 +1259
==========================================
+ Hits 13876 18340 +4464
- Misses 10957 11891 +934
- Partials 14 17 +3
…proxy
Previously sandbox.fetch() on nativeSandbox enforced its hostname policy
via a manual `Set<string>.has(url.hostname)` exact-match check that
duplicated (badly) what `@anthropic-ai/sandbox-runtime`'s HTTP proxy
already does for subprocess egress.
The two pathways now share one policy enforcer (the library's proxy)
with consistent semantics:
- Wildcard patterns (e.g. `*.example.com`)
- IP canonicalization (e.g. `2852039166` → `169.254.169.254`)
- Denied-domains taking precedence over allowed
- Control-character host rejection
- IPv6 zone-ID payload rejection
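To make the IP canonicalization item concrete: an exact-match `Set.has(url.hostname)` check compares raw strings, so a decimal-encoded host such as `2852039166` is never recognised as the metadata address. The sketch below covers only that one decimal form; `canonicalizeDecimalHost` is a hypothetical helper, and the library's own canonicalization may handle more encodings.

```typescript
// Convert a pure-decimal IPv4 host (one unsigned 32-bit integer) to dotted-quad.
// Anything that is not an in-range decimal integer is returned unchanged.
function canonicalizeDecimalHost(host: string): string {
  if (!/^\d+$/.test(host)) return host;
  const n = Number(host);
  if (!Number.isInteger(n) || n > 0xffffffff) return host;
  // Unsigned shifts keep the high byte positive for values >= 2^31.
  return [n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff].join(".");
}

const metadata = canonicalizeDecimalHost("2852039166"); // "169.254.169.254"
```

Running policy checks on the canonical form means a denylist or allowlist entry for `169.254.169.254` applies no matter how the caller spelled the address.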
Implementation:
- Add `undici` as a direct dependency for `ProxyAgent`.
- On `ensureInitialized`, read `SandboxManager.getProxyPort()` and build
a `ProxyAgent('http://127.0.0.1:PORT')` dispatcher.
- `sandbox.fetch()` passes `{ dispatcher }` to global fetch so undici
routes the request through the same proxy that gates `bash`-emitted
egress.
- A 403 with `x-srt-denied` header or undici proxy-refusal error is
translated to `SandboxError({kind: 'policy'})` so callers still see a
consistent policy-rejection shape.
- `dispatcher.close()` runs in `dispose()` to release sockets.
Linux Unix-socket gap documented: `getLinuxHttpSocketPath()` returns
a Unix socket on Linux which `ProxyAgent` does not consume directly.
For now sandbox.fetch on Linux falls back to direct (non-proxy)
network access. exec-driven egress on Linux still routes through the
proxy correctly via the bind-mounted unix socket. A custom undici
dispatcher targeting the unix socket would close this; tracked for a
follow-up.
Tests (test/sandbox-native-proxy-fetch.test.ts, 3 cases):
- Allowed host through the proxy reaches a local HTTP server.
- Disallowed host is refused with SandboxError({kind:'policy'}).
- Wildcard patterns in allowedHosts (e.g. '*.example.com') are accepted
by the library's validator and config — confirming we no longer
shadow the matcher with our own naïve exact-match check.
Existing tests in sandbox-native.test.ts now exercise the proxy
rejection path end-to-end (a real undici/proxy round-trip), not a
synthetic Set.has() check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e kill
Two CI-surfaced bugs:
1. **nativeSandbox crashed on Linux runners without `bubblewrap` installed.**
`SandboxManager.isSupportedPlatform()` returns true on any Linux but
the actual `initialize()` call throws when bwrap isn't on PATH. Tests
gated on `isSupportedPlatform()` ran on the Linux runner and exploded
instead of skipping.
Fix: in the nativeSandbox factory and in chooseDefaultSandbox, call
`SandboxManager.checkDependencies()` and surface a missing dependency
as `SandboxError({kind: 'unavailable'})` *before* `initialize()`.
The test gates (sandbox-native, sandbox-native-os,
sandbox-native-proxy-fetch, sandbox-conformance, sandbox-default)
also use `checkDependencies()` so they skip cleanly on hosts where
the runtime tools aren't installed.
chooseDefaultSandbox now falls back to unrestrictedSandbox on a
Linux host without bwrap rather than throwing — keeps the "default
to native, panic to unrestricted" contract from PR 6d intact even
when the native engine is unusable.
2. **timeoutMs test hung on Linux until vitest's 5s default fired.**
`spawn('sh', ['-c', 'sleep 5'])` then `child.kill('SIGTERM')` kills
`sh` immediately but leaves `sleep` orphaned, still holding the
stdio pipes — the `close` event waits for the grandchild to finish
naturally. macOS happened to terminate the tree differently so the
bug only surfaced on the Ubuntu runner.
Fix: spawn with `detached: true` to create a new process group, then
send the signal to `-pid` so the whole tree dies. SIGTERM first,
escalating to SIGKILL after 500ms if anything is still hanging on.
Applied symmetrically in unrestricted.ts and native.ts.
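The SIGTERM-then-SIGKILL escalation against a process group might be structured like the sketch below. The signalling and timer functions are injected so the logic can be exercised without spawning anything; `terminateTree`, `safeKill`, and the parameter names are hypothetical, while the negative-pid convention and the 500ms grace period come from the commit message.

```typescript
type Signal = "SIGTERM" | "SIGKILL";
type KillFn = (pid: number, signal: Signal) => void;
type Schedule = (fn: () => void, ms: number) => unknown;

// Ignore failures: the group may already be gone by the time we signal it.
function safeKill(kill: KillFn, pid: number, signal: Signal): void {
  try {
    kill(pid, signal);
  } catch {
    /* process group no longer exists */
  }
}

// A negative pid addresses the whole process group, which only works when the
// child was spawned with detached: true (so it leads its own group).
function terminateTree(
  pid: number,
  kill: KillFn,
  isAlive: () => boolean,
  schedule: Schedule,
  graceMs = 500,
): void {
  safeKill(kill, -pid, "SIGTERM");
  schedule(() => {
    if (isAlive()) safeKill(kill, -pid, "SIGKILL");
  }, graceMs);
}

// Demonstration with fakes: record signals and run the timer callback by hand.
const sent: Array<[number, Signal]> = [];
const pending: Array<() => void> = [];
terminateTree(4242, (p, s) => { sent.push([p, s]); }, () => true, (fn) => { pending.push(fn); });
pending.forEach((fn) => fn());
```

In a Node process the injected functions would plausibly be `process.kill` and `setTimeout`, with `isAlive` derived from the child's `exitCode` and `signalCode` both still being `null`.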
Also: adds the missing changeset entry (`Check Changeset` CI failure).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The factory's eager `checkDependencies()` (added in the previous commit to surface 'unavailable' clearly to users) means that ALL tests in sandbox-native.test.ts crash on a host without bubblewrap, not just the ones that actually exec under the OS sandbox. The CI Linux runner exposed this — the inner `identity`, `filesystem policy`, `fetch policy`, and `lifecycle` describes were previously running on the lazy assumption and need the outer gate now. Coverage of the same TS-policy assertions remains on unsupported hosts via sandbox-conformance.test.ts's unrestricted + fake-remote providers, which are gated per-provider. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ AbortSignal
Additive interface changes: no rename of exec, no namespacing. Brings the
adapter contract in line with the May 2026 industry lowest common denominator
(LCD) across Vercel/Cloudflare/E2B/Daytona/ComputeSDK, so future adapters and
tools can rely on a broader filesystem surface.
- types.ts: add readdir/exists/remove/stat to Sandbox, DirEntry/FileStat,
signal?:AbortSignal on SandboxExecOpts.
- unrestricted/native: implement new methods. AbortSignal escalates SIGTERM
then SIGKILL through the existing kill-tree path. FS errors normalized to
SandboxError('runtime') at the adapter boundary so conformance assertions
are stable across providers.
- remote + RemoteSandboxClient: extend contract; E2B adapter prefers
files.list/exists/remove/getInfo when available and falls back to shell
commands (BusyBox/GNU stat compatible) for older SDK shapes.
- write tool: switch read-before-write existence probe to sandbox.exists()
rather than ENOENT detection on readFile.
- conformance: scenarios for exists/stat/readdir/remove/remove-recursive and
an exec(AbortSignal) abort case (skipped semantically for the in-memory
remote fake).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review pass on the previous commit flagged that throwing on policy-denied
paths in exists() drifts from the 2026 LCD semantics shared by Vercel,
Cloudflare, and E2B: they all treat exists() as a safe-probe that returns
false in both the missing and denied cases. Flipping native to match;
unrestricted has no policy boundary so its behavior is unchanged.
Also adds SandboxExecResult.aborted so callers can disambiguate caller-side
AbortSignal cancellation from naturally-delivered signals and from timeoutMs
expiry; the OS signal field is unreliable for that purpose under musl / on
Alpine builds.
E2B shell fallbacks hardened: readdir now uses `find -print0` to be
newline-safe and preserves the file/dir/symlink distinction via `%y`; stat()
validates a 3-field output structure before parsing instead of unioning two
mutually-incompatible formats; failure paths synthesize classified errno
codes (ENOENT/EACCES/EIO) so SandboxError messages are never blank.
Conformance gains four cases: stat on missing path, remove on missing path,
remove on non-empty dir without recursive, and a pre-aborted exec that
returns immediately. The mid-flight abort test now asserts the new aborted
boolean. Remote (fake) is skipIf'd on abort cases since the fake client has
no abort plumbing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
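The safe-probe behaviour of exists() described in this commit can be sketched as a wrapper that turns a policy denial into `false`. The `SandboxError` class here is a simplified stand-in (the real constructor shape may differ), the probe is synchronous for brevity, and rethrowing non-policy errors is an assumption rather than something the commit states.

```typescript
type SandboxErrorKind = "policy" | "runtime" | "unavailable";

// Simplified stand-in for the runtime's error type.
class SandboxError extends Error {
  constructor(readonly kind: SandboxErrorKind, message: string) {
    super(message);
    this.name = "SandboxError";
  }
}

function isPolicyDenial(err: unknown): boolean {
  return err instanceof Error && (err as Partial<SandboxError>).kind === "policy";
}

// probe() returns true/false for present/missing and throws on denial.
// A policy denial is reported as "does not exist" instead of an exception.
function safeExists(probe: () => boolean): boolean {
  try {
    return probe();
  } catch (err) {
    if (isPolicyDenial(err)) return false;
    throw err; // assumption: unexpected failures still surface to the caller
  }
}
```

The practical effect is that callers such as the write tool's read-before-write check can branch on a boolean without needing a try/catch around every probe.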
…licy)
Two interface additions that line the Sandbox contract up with the 2026
LCD (Vercel/Cloudflare/E2B publish both as first-class methods).
- types.ts: NetworkPolicy discriminated union (allow-all/deny-all/allowlist),
getUrl({port, protocol}) and updateNetworkPolicy(policy) on Sandbox.
- unrestricted: getUrl returns loopback URL; updateNetworkPolicy records
policy without enforcement (documented no-op — unrestricted has no
boundary).
- native: getUrl returns loopback (Seatbelt + bwrap both leave 127.0.0.1
reachable from inside). updateNetworkPolicy wires SandboxManager
.updateConfig() — the library API confirmed at sandbox-manager.d.ts:36
exposes mid-session reconfiguration of the MITM proxy's allowedDomains.
NativeSandboxOpts.allowedHosts is preserved but deprecated in favor of
initialNetworkPolicy; the latter wins when both supplied.
- remote: TS-side allowedHosts Set replaced with a NetworkPolicy state
machine; updateNetworkPolicy reconfigures the host-process gate and
logs once that VM-side egress is not reconfigured (E2B doesn't expose
the necessary API). getUrl delegates to a new optional client.getUrl()
hook so the contract remains pluggable.
- RemoteSandboxClient: add optional getUrl({port, protocol}); falls back
to SandboxError('unavailable') when absent.
- conformance: add ProviderCapabilities descriptor (supportsAbort/
supportsRealGetUrl/enforcesNetworkPolicy) so per-provider quirks become
declarative instead of name-string branching. Two new scenarios:
getUrl returns a port-bearing URL (or rejects unavailable), and
updateNetworkPolicy(deny-all) flips subsequent fetch to policy-rejected.
Abort-skip predicates migrated to capability checks.
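A discriminated union makes the three policy modes exhaustive at compile time. In the sketch below the `mode` discriminant and its three values follow the commit text; the `hosts` field, the `NetworkGate` class, and exact-match comparison are illustrative assumptions.

```typescript
type NetworkPolicy =
  | { mode: "allow-all" }
  | { mode: "deny-all" }
  | { mode: "allowlist"; hosts: string[] };

function isHostAllowed(policy: NetworkPolicy, hostname: string): boolean {
  switch (policy.mode) {
    case "allow-all":
      return true;
    case "deny-all":
      return false;
    case "allowlist":
      // Exact, case-insensitive match; wildcard handling is out of scope here.
      return policy.hosts.some((h) => h.toLowerCase() === hostname.toLowerCase());
  }
}

// Holds the current policy so it can be swapped mid-session.
class NetworkGate {
  private policy: NetworkPolicy;
  constructor(initial: NetworkPolicy) {
    this.policy = initial;
  }
  updateNetworkPolicy(next: NetworkPolicy): void {
    this.policy = next;
  }
  check(hostname: string): boolean {
    return isHostAllowed(this.policy, hostname);
  }
}

const gate = new NetworkGate({ mode: "allowlist", hosts: ["api.example.com"] });
const before = gate.check("api.example.com");
gate.updateNetworkPolicy({ mode: "deny-all" });
const after = gate.check("api.example.com");
```

This mirrors the conformance scenario in which `updateNetworkPolicy(deny-all)` flips a previously allowed fetch to policy-rejected.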
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A fourth sandbox provider that runs each Sandbox instance inside a
dedicated Docker container. Targets local development and self-hosted
deployments where neither macOS Seatbelt nor a paid microVM provider is
appropriate. Built on dockerode (added as an optional peer dependency).
Hardening
---------
The HostConfig is hardcoded and not overrideable from DockerSandboxOpts:
- CapDrop: ['ALL'], CapAdd: [] — no caps means mount/chroot/su fail
- SecurityOpt: ['no-new-privileges:true']
- Privileged: false
- PidsLimit / Memory / NanoCpus — sensible defaults, opt-out via opts.resources
- Ulimits: nofile=2048, nproc=1024
- IpcMode: 'none'
- AutoRemove: true (ephemeral)
- NetworkMode: 'none' by default; switches to 'bridge' only when an
allowlist or exposedPorts is set
- Default image pinned by digest (node:20-alpine multi-arch manifest)
- extraMounts entries enforce readOnly: true at the type level and
reject any hostPath matching /docker\.sock$/ at runtime so the docker
socket cannot be exposed back to sandboxed code.
ReadonlyRootfs is intentionally *not* enabled by default — dockerode's
putArchive operates at the storage-driver layer and gets rejected on a
read-only rootfs even when the target is a tmpfs/volume. Operators who
want it must drive all writes via sandbox.exec.
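For orientation, the hardening list above maps onto a Docker Engine `HostConfig` object roughly as follows. The field names are the Engine API's; the resource limits are omitted because the commit does not state their values, the equal soft/hard ulimits are an assumption, and `ExtraMount`, `assertMountAllowed`, and `toBind` are hypothetical helpers.

```typescript
// Fixed HostConfig values taken from the hardening list; resource limits
// (PidsLimit / Memory / NanoCpus) are left out since their defaults are not given.
const HARDENED_HOST_CONFIG = {
  CapDrop: ["ALL"],
  CapAdd: [] as string[],
  SecurityOpt: ["no-new-privileges:true"],
  Privileged: false,
  Ulimits: [
    { Name: "nofile", Soft: 2048, Hard: 2048 }, // assumption: soft == hard
    { Name: "nproc", Soft: 1024, Hard: 1024 },
  ],
  IpcMode: "none",
  AutoRemove: true,
  NetworkMode: "none",
};

// readOnly is the literal type `true`, so a writable mount fails to type-check.
interface ExtraMount {
  hostPath: string;
  containerPath: string;
  readOnly: true;
}

// Runtime guard: never hand the Docker socket to sandboxed code.
function assertMountAllowed(mount: ExtraMount): ExtraMount {
  if (/docker\.sock$/.test(mount.hostPath)) {
    throw new Error(`refusing to mount ${mount.hostPath}`);
  }
  return mount;
}

// Docker bind syntax: host:container:ro
function toBind(mount: ExtraMount): string {
  const checked = assertMountAllowed(mount);
  return `${checked.hostPath}:${checked.containerPath}:ro`;
}
```

Encoding `readOnly: true` as a literal type covers the compile-time half of the rule, while the regex covers callers that bypass the type checker.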
Filesystem
----------
src/sandbox/docker/fs.ts provides putFile/getFile via dockerode's
putArchive/getArchive (small in-memory tar writer, no extra dep) and
exec-based readdir/exists/remove/stat. readdir does three POSIX
`find -type X -print0` passes so it works on both GNU find and BusyBox
(alpine) — newline-safe via NUL delimiting.
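The NUL-delimited parsing mentioned above can be sketched like this, assuming one `find` pass per entry type whose stdout is a sequence of NUL-terminated paths. `parseFindPass`, `mergePasses`, and the `DirEntry` shape are hypothetical.

```typescript
type EntryType = "file" | "dir" | "symlink";

interface DirEntry {
  name: string;
  type: EntryType;
}

// Split one pass's stdout on NUL so names containing newlines stay intact.
function parseFindPass(stdout: string, type: EntryType): DirEntry[] {
  return stdout
    .split("\0")
    .filter((path) => path.length > 0) // drop the empty tail after the last NUL
    .map((path) => ({ name: path.slice(path.lastIndexOf("/") + 1), type }));
}

// Combine the three passes into one listing, sorted by name for stable output.
function mergePasses(files: string, dirs: string, symlinks: string): DirEntry[] {
  return [
    ...parseFindPass(files, "file"),
    ...parseFindPass(dirs, "dir"),
    ...parseFindPass(symlinks, "symlink"),
  ].sort((a, b) => a.name.localeCompare(b.name));
}

const listing = mergePasses("/work/a\nb.txt\0/work/c.txt\0", "/work/src\0", "");
```

Tagging each pass with its type avoids depending on a format directive to report the entry kind, which is what lets the same approach run against both GNU and BusyBox `find`.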
Network
-------
src/sandbox/docker/proxy.ts ships a minimal HTTP/HTTPS forward proxy
(~150 LoC, node:http + CONNECT) with a dynamic allowlist that
updateNetworkPolicy mutates in place. Container env is set with
HTTP_PROXY=http://host.docker.internal:<port> so tools that respect
proxy env vars (curl, python requests, undici with ProxyAgent, browser
clients) route through it. Programs that bypass HTTP_PROXY (raw TCP,
Node's built-in fetch without explicit setGlobalDispatcher) leak
through Docker's NAT — documented gap; v2 needs a sidecar netns / nft
filter for full sealing.
Lifecycle
---------
One long-lived container per Sandbox instance (PID 1 = sleep keepalive).
Each exec uses container.exec() with stream demux. timeoutMs and
AbortSignal both wire to a kill-everything-but-PID-1 helper that
enumerates /proc and SIGKILLs to side-step a dockerode stream-close
race. dispose() removes the container even if AutoRemove already fired.
Testing
-------
- test/helpers/docker-probe.ts: top-level await isDockerAvailable() so
describe.skipIf works at import time. Tests skip cleanly with a
warning when the daemon is unreachable.
- test/sandbox-docker.test.ts: 13 integration tests against real
Docker — roundtrip, hardening (caps, docker.sock, mount, chroot),
timeouts, AbortSignal, port forwarding, policy enforcement at the
proxy boundary, leftover-container sweep. Runs in <10s on a warm
machine.
- test/sandbox-docker-smoke.test.ts: ad-hoc smoke probes that exercise
CapEff/CapPrm/CapBnd ≡ 0, container /etc/passwd isolation, /Users
invisibility, raw-CONNECT proxy allow vs deny.
All 132 sandbox tests pass on macOS + OrbStack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cker
Add `KNOWN_ADAPTERS` const to the public sandbox barrel and assert in the
conformance suite that every slug is exercised by exactly one provider —
adding a new adapter without registering it in the conformance suite
will now fail CI.
Wire the docker adapter into the cross-provider conformance loop with
a `dockerAvailable`-gated `enabled` flag. The describe gating relies on
the existing top-level await probe at `test/helpers/docker-probe.ts`
(skips clean on machines without Docker, no CI workflow change needed —
Ubuntu runners have Docker pre-installed, macOS doesn't and skips
gracefully).
Replace remaining provider-name string equality checks (`provider.name
=== 'remote (fake)'`, `=== 'unrestricted'`) with two declarative axes
on `ProviderFactory`:
- `adapter: KnownAdapter` — KNOWN_ADAPTERS slug, used by the symlink
test branch and the unrestricted-no-policy branch
- `outsideKind: 'host-tempdir' | 'etc-passwd'` — controls which path
the "writeFile outside the working directory" probe uses, replacing
the brittle name match
73 conformance scenarios now run against the docker provider (in ~12s
on a warm machine, OrbStack on M-series). All 151 sandbox tests pass.
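A minimal sketch of the registry-plus-coverage assertion described in this commit. The `ProviderFactory` fields `adapter` and `outsideKind` follow the commit text; the exact slug strings, the `name` and `enabled` fields as typed here, and `assertEveryAdapterCovered` are assumptions.

```typescript
// Assumed slugs; the real KNOWN_ADAPTERS list lives in the public sandbox barrel.
const KNOWN_ADAPTERS = ["unrestricted", "native", "remote", "docker"] as const;
type KnownAdapter = (typeof KNOWN_ADAPTERS)[number];

interface ProviderFactory {
  name: string;
  adapter: KnownAdapter;
  outsideKind: "host-tempdir" | "etc-passwd";
  enabled: boolean;
}

// Fails when a slug has no provider, or more than one, in the conformance list.
function assertEveryAdapterCovered(providers: ProviderFactory[]): void {
  for (const slug of KNOWN_ADAPTERS) {
    const count = providers.filter((p) => p.adapter === slug).length;
    if (count !== 1) {
      throw new Error(`adapter "${slug}" is exercised by ${count} providers, expected exactly 1`);
    }
  }
}

// Illustrative registrations; which provider uses which outsideKind is assumed.
const providers: ProviderFactory[] = [
  { name: "unrestricted", adapter: "unrestricted", outsideKind: "host-tempdir", enabled: true },
  { name: "native", adapter: "native", outsideKind: "host-tempdir", enabled: true },
  { name: "remote (fake)", adapter: "remote", outsideKind: "etc-passwd", enabled: true },
  { name: "docker", adapter: "docker", outsideKind: "etc-passwd", enabled: false },
];
assertEveryAdapterCovered(providers);
```

Coverage is checked against registration rather than against `enabled`, so a provider that is skipped on a given machine still counts as wired into the suite.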
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…low-all
Address the high-severity findings from the final security review.
Docker read-side policy (R1)
----------------------------
readdir, stat, and readFile now call assertReadable() to enforce the
workingDirectory boundary, matching native's behavior. Previously
docker.readdir('/etc') would silently succeed and enumerate the
container's filesystem — only writeFile / mkdir / remove were gated.
exists() also goes through the new isReadable() helper but returns
false on denial (safe-probe semantics, consistent with the rest of
the adapter set).
Proxy SSRF guards (R2)
----------------------
The docker allowlist proxy now refuses CONNECT and plain-HTTP requests
to literal private addresses (RFC 1918: 10/8, 172.16/12, 192.168/16; RFC 6598
CGNAT: 100.64/10), loopback (127/8), unspecified (0/8), link-local + cloud
metadata (169.254/16, including 169.254.169.254 on AWS/GCP), IPv6 loopback
(::1), and IPv6 link-local / unique-local (fe80::/10, fc00::/7). The guard runs after
the hostname allowlist and rejects regardless of the policy decision —
even if a user explicitly allows 169.254.169.254 they cannot reach it.
Plain-HTTP proxying now overrides the caller-supplied Host header with
the target's authority, so an attacker can no longer split an
allowlisted absolute URL from a different vhost via the Host header.
proxy-authorization and proxy-connection hop-by-hop headers are also
stripped before forwarding.
Port bindings bind to 127.0.0.1 only (was 0.0.0.0) so dev-machine
sandboxes don't expose services across the LAN.
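The literal-address guard described in this section might look like the following for IPv4. It is a sketch only: `parseIPv4`, `isBlockedIPv4`, and `proxyDecision` are hypothetical names, the IPv6 ranges are not covered, and hostnames that resolve to private addresses are outside what a literal check can catch.

```typescript
// Parse a dotted-quad literal; returns null for anything else (e.g. hostnames).
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// True for the IPv4 ranges listed in the commit message.
function isBlockedIPv4(host: string): boolean {
  const ip = parseIPv4(host);
  if (ip === null) return false;
  const [a, b] = ip;
  return (
    a === 0 ||                          // 0/8
    a === 10 ||                         // 10/8
    a === 127 ||                        // loopback
    (a === 100 && b >= 64 && b <= 127) || // 100.64/10 CGNAT
    (a === 169 && b === 254) ||         // link-local + cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16/12
    (a === 192 && b === 168)            // 192.168/16
  );
}

// The guard runs after the allowlist and overrides an explicit allow.
function proxyDecision(host: string, allowed: Set<string>): "allow" | "deny" {
  if (!allowed.has(host)) return "deny";
  return isBlockedIPv4(host) ? "deny" : "allow";
}
```

Running the guard after the allowlist, and letting it win, is what produces the behaviour in the smoke test: an explicitly allowed `169.254.169.254` is still refused.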
Native allow-all (correctness)
------------------------------
The upstream @anthropic-ai/sandbox-runtime config validator rejects
bare '*' in network.allowedDomains as "too broad". Our previous
policyToAllowedDomains returned ['*'] for mode:'allow-all', which would
have silently broken init. Throw SandboxError('unavailable') with a
pointer to unrestrictedSandbox instead.
New conformance — sandbox-docker.test.ts:
- "read-side methods enforce the working directory boundary": asserts
readFile / readdir / stat throw policy and exists() returns false
for /etc paths.
New smoke — sandbox-docker-smoke.test.ts: even when explicitly allowed,
CONNECT to 169.254.169.254, 127.0.0.1, and 10.0.0.1 all return 403.
149 + 2 new tests pass; conformance still green across all 4 adapters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The underlying SandboxManager from @anthropic-ai/sandbox-runtime is a
process-global singleton: two nativeSandbox instances with different
working directories conflict and throw SandboxError('unavailable'). The
agents-runtime hosts many agent entities concurrently, each with its own
working directory, so this constraint is incompatible with the product.
dockerSandbox now covers the strong-isolation use case (no singleton,
multi-instance safe). unrestrictedSandbox + tool-layer policy (env
scrubbing, symlink resolution, fetch SSRF guards) covers the dev case.
- Delete src/sandbox/native.ts and the three native test files.
- Drop 'native' from KNOWN_ADAPTERS; drop nativeSandbox /
NativeSandboxOpts / ChooseDefaultSandboxOpts exports.
- Simplify chooseDefaultSandbox to always return unrestrictedSandbox.
Remove the ELECTRIC_AGENTS_UNRESTRICTED env var — it only existed to
revert from native to unrestricted, which is now the default.
- Drop the native provider entry from the conformance suite; the
KNOWN_ADAPTERS round-trip assertion now covers unrestricted/remote/docker.
- Drop @anthropic-ai/sandbox-runtime from dependencies; regenerate
pnpm-lock.yaml.
- Update sandbox-design.md and the changeset to reflect the new lineup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds the `Sandbox` primitive to the agents runtime and ships three providers:
- `unrestrictedSandbox`: explicit pass-through over `node:fs` / `child_process`. The name is the warning.
- `nativeSandbox`: wraps `@anthropic-ai/sandbox-runtime` (Apache-2.0) for Seatbelt on macOS and bubblewrap on Linux/WSL2. Default deny overlay for `~/.ssh`, `~/.aws`, `~/.config/{gcloud,op,gh}`, `~/.kube`, `~/.docker`, `~/.netrc`, `~/.npmrc`, `~/.pgpass`, `~/.huggingface`, `~/Library/Application Support`. Lazy-initialized: pure FS/fetch policy without spinning up proxy servers.
- `remoteSandbox({provider: 'e2b'})`: adapter for E2B's npm SDK, loaded as an optional peer dependency. The `RemoteSandboxClient` interface makes it mechanical to add Vercel/Daytona/etc.
Built-in entities (Horton, Worker) now default to native sandboxing on macOS/Linux via a new `chooseDefaultSandbox(workingDirectory, env?)` helper. `ELECTRIC_AGENTS_UNRESTRICTED=1` is the documented panic-revert env switch.
Folded into PR 6a are three behavior-relevant security fixes:
- bash no longer forwards `process.env` to children (closes `$ANTHROPIC_API_KEY` exfil)
Design doc: `plans/sandbox-design.md`. Investigation that motivated this: `plans/sandboxing-investigation.md`.

Commits in this branch
- `2acaa5695` (PR 6a): `Sandbox` interface + `unrestrictedSandbox` + tool refactor + env scrub + symlink fixes
- `1bd193cab` (PR 6b): `nativeSandbox` via `@anthropic-ai/sandbox-runtime`
- `7da404428` (PR 6c): `remoteSandbox` + E2B adapter
- `ed0a231e8` (PR 6d): `chooseDefaultSandbox` + Horton/Worker default to native
- `c6a9ffc54`: Cross-provider conformance suite + real-OS negative tests for `nativeSandbox`

What this primitive is and is not
Targets host isolation for LLM-driven tool calls (escape of cwd, env-var exfil, arbitrary network egress, symlink traversal). Explicitly does not address prompt-injection-driven misuse of legitimate tools; that's a separate `ToolGate` primitive on its own schedule (see `plans/sandboxing-investigation.md`).
Documented gaps (in §10 of the design doc):
- `'native:linux-bwrap-only'` makes this legible in logs. Future `nativeSandboxStrong` tier with Codex-derived helper is the escalation.
- `sandbox-exec` is officially deprecated; lazy-init + startup smoke test catches profile drift.
- `sandbox.fetch()` on `remoteSandbox` runs in the host Node process, not inside the VM. To route through the VM, use `sandbox.exec('curl ...')`.

Test plan
Local (this branch, macOS): 78 sandbox tests green, including 5 real-OS Seatbelt negative tests (`sandbox-native-os.test.ts`) verifying env scrubbing, network deny, write blocking, deny-overlay symlink blocking. 20 cross-provider conformance tests pin the contract across providers. typecheck clean on `agents-runtime`, `agents`, `agents-server-conformance-tests`.
- `remoteSandbox({provider: 'e2b'})` against a real E2B account (the `adaptE2B` translation isn't unit-tested against the real SDK)
- `better-sqlite3` builds via `pnpm approve-builds` so `packages/agents` test suite runs (pre-existing pnpm 10 issue, blocks horton/worker integration tests)
- `ELECTRIC_AGENTS_UNRESTRICTED=1` in dev loop falls back to unrestricted as expected
- `drizzle-orm/postgres-js` import in `runtime-dsl.test.ts` if you want that suite green too (not from this branch)

Known unrelated test failures on `main`
- `packages/agents-runtime/test/runtime-dsl.test.ts`: fails to import `drizzle-orm/postgres-js` from agents-server. Pre-existing.
- `packages/agents/test/*`: fails on `better-sqlite3` missing native module. Pre-existing pnpm 10 build-script gating; not introduced by this branch.
Both confirmed pre-existing via `git stash && pnpm test && git stash pop`.
🤖 Generated with Claude Code