feat(agents-runtime): Sandbox primitive + native (Seatbelt/bwrap) and E2B remote providers#4369

Draft
msfstef wants to merge 18 commits into main from msfstef/agent-sandboxing-1

Conversation

Contributor

@msfstef msfstef commented May 20, 2026

Summary

Adds the Sandbox primitive to the agents runtime and ships three providers:

  • unrestrictedSandbox — explicit pass-through over node:fs / child_process. The name is the warning.
  • nativeSandbox — wraps @anthropic-ai/sandbox-runtime (Apache-2.0) for Seatbelt on macOS and bubblewrap on Linux/WSL2. Default deny overlay for ~/.ssh, ~/.aws, ~/.config/{gcloud,op,gh}, ~/.kube, ~/.docker, ~/.netrc, ~/.npmrc, ~/.pgpass, ~/.huggingface, ~/Library/Application Support. Lazy-initialized: pure FS/fetch policy without spinning up proxy servers.
  • remoteSandbox({provider: 'e2b'}) — adapter for E2B's npm SDK, loaded as an optional peer dependency. The RemoteSandboxClient interface makes it mechanical to add Vercel/Daytona/etc.

Built-in entities (Horton, Worker) now default to native sandboxing on macOS/Linux via a new chooseDefaultSandbox(workingDirectory, env?) helper. ELECTRIC_AGENTS_UNRESTRICTED=1 is the documented panic-revert env switch.

Folded into PR 6a are three behavior-relevant security fixes:

  • bash no longer forwards process.env to children (closes $ANTHROPIC_API_KEY exfil)
  • bash tool description corrected (no longer claims to be sandboxed)
  • read/write/edit reject symlink escapes from the workspace at the tool layer
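
The symlink-escape rejection can be pictured as a containment check on canonical paths. The following is a minimal sketch, not the PR's resolveSafePath: `isInsideWorkspace` is a hypothetical name, and it assumes both arguments are POSIX-style absolute paths whose symlinks were already resolved (for example with fs.realpath) before the check runs.

```typescript
// Hypothetical helper: decides whether an already-canonicalized path stays
// inside the workspace root. Assumes POSIX-style absolute paths with symlinks
// resolved beforehand; it performs no filesystem access itself.
function isInsideWorkspace(canonicalRoot: string, canonicalTarget: string): boolean {
  // Normalize the root so "/work" and "/work/" behave the same.
  const root = canonicalRoot.endsWith('/') ? canonicalRoot.slice(0, -1) : canonicalRoot
  if (canonicalTarget === root) return true
  // Require a separator after the root so "/work-evil" does not match "/work".
  return canonicalTarget.startsWith(root + '/')
}
```

The separator check is the part that is easy to get wrong: a bare prefix comparison would accept sibling directories that merely share a name prefix with the workspace.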

Design doc: plans/sandbox-design.md. Investigation that motivated this: plans/sandboxing-investigation.md.

Commits in this branch

  • 2acaa5695 — PR 6a: Sandbox interface + unrestrictedSandbox + tool refactor + env scrub + symlink fixes
  • 1bd193cab — PR 6b: nativeSandbox via @anthropic-ai/sandbox-runtime
  • 7da404428 — PR 6c: remoteSandbox + E2B adapter
  • ed0a231e8 — PR 6d: chooseDefaultSandbox + Horton/Worker default to native
  • c6a9ffc54 — Cross-provider conformance suite + real-OS negative tests for nativeSandbox

What this primitive is and is not

Targets host isolation for LLM-driven tool calls (escape of cwd, env-var exfil, arbitrary network egress, symlink traversal). Explicitly does not address prompt-injection-driven misuse of legitimate tools — that's a separate ToolGate primitive on its own schedule (see plans/sandboxing-investigation.md).

Documented gaps (in §10 of the design doc):

  • Linux is bwrap-only (no Landlock/seccomp). Provider name 'native:linux-bwrap-only' makes this legible in logs. Future nativeSandboxStrong tier with Codex-derived helper is the escalation.
  • v1 reads use a curated denylist (option 1); v2 will tighten to a read-allowlist.
  • macOS sandbox-exec is officially deprecated; lazy-init + startup smoke test catches profile drift.
  • sandbox.fetch() on remoteSandbox runs in the host Node process, not inside the VM. To route through the VM, use sandbox.exec('curl ...').

Test plan

Local (this branch, macOS): 78 sandbox tests green, including 5 real-OS Seatbelt negative tests (sandbox-native-os.test.ts) verifying env scrubbing, network deny, write blocking, deny-overlay symlink blocking. 20 cross-provider conformance tests pin the contract across providers. typecheck clean on agents-runtime, agents, agents-server-conformance-tests.

  • CI matrix runs Linux bwrap path (this branch hasn't been exercised on Linux in dev)
  • Manual smoke test of remoteSandbox({provider: 'e2b'}) against a real E2B account (the adaptE2B translation isn't unit-tested against the real SDK)
  • Approve better-sqlite3 builds via pnpm approve-builds so packages/agents test suite runs (pre-existing pnpm 10 issue, blocks horton/worker integration tests)
  • Verify Horton with ELECTRIC_AGENTS_UNRESTRICTED=1 in dev loop falls back to unrestricted as expected
  • Resolve the pre-existing drizzle-orm/postgres-js import in runtime-dsl.test.ts if you want that suite green too (not from this branch)

Known unrelated test failures on main

  • packages/agents-runtime/test/runtime-dsl.test.ts — fails to import drizzle-orm/postgres-js from agents-server. Pre-existing.
  • packages/agents/test/* — fails on better-sqlite3 missing native module. Pre-existing pnpm 10 build-script gating; not introduced by this branch.

Both confirmed pre-existing via git stash && pnpm test && git stash pop.

🤖 Generated with Claude Code

@msfstef msfstef self-assigned this May 20, 2026
msfstef and others added 7 commits May 20, 2026 11:50
Introduce the Sandbox interface and an unrestrictedSandbox provider as the
plumbing for host-isolation work. No default behavior change — all built-in
entities (Horton, Worker) explicitly construct unrestrictedSandbox so they
behave identically to before. See plans/sandbox-design.md.

- New: packages/agents-runtime/src/sandbox/{types,unrestricted}.ts and the
  public /sandbox subpath aggregator.
- Tool factories (bash, read, write, edit, fetch_url) now take Sandbox
  instead of a workingDirectory string. They delegate FS/exec/fetch to it.
- bash no longer forwards process.env to children. Scrubbed env
  (PATH/HOME/USER/LANG/TERM) only. Closes env-var exfil via "echo \$KEY".
- bash description string stops claiming a sandbox that wasn't there.
- read/write/edit add realpath-based path resolution via resolveSafePath
  to block symlink-escape from the workspace.
- The standalone fetchUrlTool export is removed; callers must construct
  via createFetchUrlTool(sandbox).
- Horton/Worker construct unrestrictedSandbox per wake and dispose in a
  finally block. Conformance tests updated to the new signatures.
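
The env scrub described above amounts to an allowlist copy. A minimal sketch, assuming the five variable names from this commit message are the complete allowlist; `scrubEnv` and `ALLOWED_ENV_KEYS` are hypothetical names, not the ones used in the branch.

```typescript
// Variable names come from the commit message; the identifiers are hypothetical.
const ALLOWED_ENV_KEYS = ['PATH', 'HOME', 'USER', 'LANG', 'TERM'] as const

// Builds the child environment from scratch instead of deleting known-bad
// keys, so a secret added to the parent later is excluded by default.
function scrubEnv(parent: Record<string, string | undefined>): Record<string, string> {
  const child: Record<string, string> = {}
  for (const key of ALLOWED_ENV_KEYS) {
    const value = parent[key]
    if (value !== undefined) child[key] = value
  }
  return child
}
```

A caller would pass the result as the `env` option when spawning, for example `spawn('sh', ['-c', cmd], { env: scrubEnv(process.env) })`.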

Tests: 48 new tests across sandbox-unrestricted, sandbox-tool-refactor,
and sandbox-tool-symlink-safety. Existing test suites for bash/write/edit
updated for the new tool signatures. Full agents-runtime suite + agents
suite green; typecheck clean across runtime, agents, and conformance-tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (PR 6b)

Adds nativeSandbox(), a Sandbox provider that wraps Anthropic's
sandbox-runtime library to enforce host isolation through OS-level
primitives (Seatbelt on macOS, bubblewrap on Linux/WSL2).

Architecture:
- New dependency: @anthropic-ai/sandbox-runtime@0.0.52 (Apache-2.0, pinned).
- src/sandbox/native.ts: implements Sandbox over SandboxManager. Translates
  our config (workingDirectory, allowedHosts, extraReadPaths) into the
  library's config shape so customers never see the library's API.
- Lazy initialization: SandboxManager is only set up on the first exec()
  call. readFile / writeFile / mkdir / fetch are enforced at the TS layer
  (path canonicalization + deny overlay; hostname allowlist for fetch).
  No proxy startup cost for handlers that don't spawn subprocesses.
- Refcount + single-instance enforcement: one workingDirectory can be
  actively exec'd through the OS sandbox at a time in one Node process.
  Concurrent exec from a conflicting workingDirectory throws
  SandboxError({kind: 'unavailable'}).
- Default deny overlay covers ~/.ssh, ~/.aws, ~/.config/{gcloud,op,gh},
  ~/.kube, ~/.docker, ~/.netrc, ~/.npmrc, ~/.pgpass, ~/.huggingface,
  and ~/Library/Application Support. Documented as incomplete in
  plans/sandbox-design.md §5.2; the v2 fix is a curated read-allowlist.
- name: 'native:macos-seatbelt' on Darwin, 'native:linux-bwrap-only'
  elsewhere — makes the bwrap-only Linux limitation legible in logs.
- Throws SandboxError({kind: 'unavailable'}) on unsupported platforms
  (Windows) with an actionable error pointing to unrestrictedSandbox or
  remoteSandbox.
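
As an illustration of the deny overlay, the check below tests a canonical path against the home-relative entries listed above. It is a sketch under assumptions: the function name is hypothetical, paths are POSIX-style, and the caller has already canonicalized the path.

```typescript
// Home-relative entries taken from the default deny overlay listed above.
const DEFAULT_DENY_OVERLAY = [
  '.ssh', '.aws', '.config/gcloud', '.config/op', '.config/gh',
  '.kube', '.docker', '.netrc', '.npmrc', '.pgpass', '.huggingface',
  'Library/Application Support',
]

// Hypothetical helper: true when the path is a denied entry or lives under one.
function isDeniedByOverlay(home: string, canonicalPath: string): boolean {
  const base = home.endsWith('/') ? home.slice(0, -1) : home
  return DEFAULT_DENY_OVERLAY.some((entry) => {
    const denied = `${base}/${entry}`
    // Match the entry itself or anything below it, but not "~/.sshx".
    return canonicalPath === denied || canonicalPath.startsWith(denied + '/')
  })
}
```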

Tests (test/sandbox-native.test.ts):
- Identity, FS policy (deny overlay, allowed reads/writes), fetch policy.
- Lifecycle: re-construction after dispose, concurrent-exec rejection.
- Real OS sandbox integration tests (skipped on unsupported platforms):
  basic echo, /etc/sudoers blocked, writes inside cwd allowed.

No default change for Horton/Worker — they still use unrestrictedSandbox.
PR 6d will flip the default and add the Horton home-as-cwd fix.

Also: write-tool test updated to compare canonical (realpath-resolved)
paths in readSet, matching PR 6a's symlink-safety semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds remoteSandbox(), a Sandbox provider that delegates host isolation to a
remote workspace (microVM/container) at a SaaS provider. v1 ships an E2B
adapter; additional providers (Vercel, Daytona) are mechanical to add via
the RemoteSandboxClient interface.

Architecture:
- src/sandbox/remote/types.ts: RemoteSandboxClient interface — the narrow
  contract each provider adapter implements (exec, readFile, writeFile,
  mkdir, kill).
- src/sandbox/remote/e2b.ts: createE2BClient and adaptE2B. Dynamically
  imports the 'e2b' package so it remains an *optional peer dependency*.
  Customers using the remote provider install e2b separately; no install
  cost for everyone else.
- src/sandbox/remote.ts: provider-neutral remoteSandbox factory and the
  RemoteSandbox class implementing the Sandbox interface. FS paths are
  VM-rooted (default cwd '/work'). Writes outside the working directory
  are rejected at the TS layer. dispose() calls client.kill() once;
  subsequent operations throw SandboxError({kind:'runtime'}).

The 'client' opt accepts a pre-constructed RemoteSandboxClient, used by
tests (a fake client tracks all calls and serves an in-memory FS) and by
customers who want to wrap the provider SDK with retry/observability
before handing it to us.

sandbox.fetch() runs in the host Node process with a TS-level hostname
allowlist — *not* inside the VM. Documented caveat: to route outbound
traffic through the VM, use sandbox.exec('curl ...'). v1.1 may add a
VM-routed fetch.

Tests (test/sandbox-remote.test.ts, 9 cases):
- Identity (name reflects provider).
- exec delegation with default + override cwd.
- writeFile/readFile roundtrip; writeFile outside cwd rejected.
- mkdir delegation, including recursive walk.
- fetch hostname allowlist rejection.
- dispose calls kill exactly once even on repeat.
- Unknown provider name throws SandboxError({kind:'unavailable'}).

No real e2b account/SDK is needed for the test suite — all tests use the
in-memory fake client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(PR 6d)

Wires the native sandbox in as the default for built-in entities (Horton,
Worker) on macOS and Linux. Behavior change: LLM-driven bash/read/write/
edit/fetch_url tools now run inside Seatbelt (macOS) or bubblewrap (Linux)
by default, with the env-scrubbing + symlink-safety from PR 6a and the
default deny overlay from PR 6b.

- New: src/sandbox/default.ts — chooseDefaultSandbox(workingDirectory, env?)
  helper. Picks nativeSandbox when SandboxManager.isSupportedPlatform()
  returns true; otherwise unrestrictedSandbox.
- Panic-revert env switch: ELECTRIC_AGENTS_UNRESTRICTED=1 (also accepts
  'true'/'yes'/'on', case-insensitive) forces unrestrictedSandbox on any
  platform. Documented as the emergency lever when the native engine
  misbehaves; not promoted in customer-facing docs.
- Horton and Worker handlers replace their direct unrestrictedSandbox
  construction with a chooseDefaultSandbox call. No other change to the
  handler logic; the try/finally dispose pattern from PR 6a stays.
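
The selection logic above can be sketched as two small functions. This is an assumption-laden illustration rather than the code in src/sandbox/default.ts: the function names are hypothetical, the result is a plain string instead of a Sandbox instance, and trimming whitespace is an added assumption.

```typescript
type SandboxChoice = 'native' | 'unrestricted'

// Accepts '1', 'true', 'yes', 'on' in any letter case, per the commit message.
// Trimming surrounding whitespace is an assumption of this sketch.
function isPanicSwitchOn(env: Record<string, string | undefined>): boolean {
  const raw = env.ELECTRIC_AGENTS_UNRESTRICTED
  if (raw === undefined) return false
  return ['1', 'true', 'yes', 'on'].includes(raw.trim().toLowerCase())
}

// Hypothetical stand-in for chooseDefaultSandbox: the panic switch wins on any
// platform; otherwise native is used only where it is supported.
function chooseProviderName(
  env: Record<string, string | undefined>,
  nativeSupported: boolean,
): SandboxChoice {
  if (isPanicSwitchOn(env)) return 'unrestricted'
  return nativeSupported ? 'native' : 'unrestricted'
}
```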

Tests (test/sandbox-default.test.ts, 5 cases):
- Native chosen on supported platforms.
- ELECTRIC_AGENTS_UNRESTRICTED=1 forces unrestricted.
- Case-insensitive truthy values (true, yes, on) all force unrestricted.
- Unrestricted picked when isNativeSupported() returns false (Windows
  shape via the testing override).
- ELECTRIC_AGENTS_UNRESTRICTED=0 does NOT trigger the panic switch.

The agents-desktop home-as-cwd fix (main.ts:1939 'app.getPath(home)'
fallback) is deferred to a separate, smaller desktop PR — it's a UX
change with its own implications.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egatives

Closes two test-coverage gaps that surfaced during PR 6a-6d review.

sandbox-conformance.test.ts (20 cases):
- Parameterizes a single set of scenarios over unrestricted, native (real
  OS sandbox, gated by SandboxManager.isSupportedPlatform), and remote
  (driven by an in-memory fake matching RemoteSandboxClient).
- Asserts the cross-provider contract: writeFile+readFile roundtrip, exec
  returns an exitCode, dispose is safe, name/workingDirectory exposed,
  readFile ENOENT propagates.
- Encodes the *deliberate* semantic difference: writeFile outside cwd
  rejects for native/remote (policy-bearing providers) but succeeds for
  unrestricted (which delegates to node:fs — path security lives in the
  tool layer's resolveSafePath helper).
- Symlink-escape sub-suite for non-remote providers documents that
  unrestricted does not block symlinks at the sandbox layer (tool layer
  handles it) while native does.

sandbox-native-os.test.ts (5 cases, real OS sandbox only):
- bash does not inherit arbitrary parent env vars (closes the
  __SANDBOX_OS_TEST_SECRET__ exfil path via the OS sandbox).
- bash cannot write outside the working directory at the OS level.
- bash cannot follow a symlink whose target is in the default deny
  overlay. Comments explicitly note the v1 denylist's limitation:
  symlinks to arbitrary /tmp paths *are* readable (option 1 in
  plans/sandbox-design.md §5.2); only paths inside the deny set are
  blocked. v2 read-allowlist would change this.
- bash with no allowedHosts cannot reach the network (verifies
  https://1.1.1.1 is refused).
- readFile through the TS adapter denies known credential paths under
  home (~/.ssh, ~/.aws, ~/.config/gcloud).

Remaining coverage gaps after this commit:
- remoteSandbox against the real E2B SDK is still untested (needs an
  account). adaptE2B's type translations could drift without us noticing.
- The Linux bwrap path has not been exercised on this machine (macOS dev env).
- Horton/Worker full integration through a fake LLM is blocked by the
  pre-existing better-sqlite3 missing-module error in packages/agents.

Test totals: 78 sandbox tests, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The e2b peer dep added in PR 6c was missing from the lockfile (an earlier
checkout reverted the install change). This commit lands the lock entries
for e2b@2.21.0 and its transitive deps so a fresh pnpm install resolves
consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four integration adjustments after rebasing on origin/main:

- Delete packages/agents-runtime/test/tool-path-symlink.test.ts. This was
  a characterization test from #4354 that documents pre-fix symlink-escape
  behavior with an explicit "update when realpath resolution lands" note.
  PR 6a's resolveSafePath helper is that fix; the file's expectations are
  now contradicted by sandbox-tool-symlink-safety.test.ts.

- Trim packages/agents-runtime/test/bash-tool.test.ts: the two
  characterization tests from #4354 that documented the bash env-leak bug
  are removed. PR 6a fixed that bug; sandbox-tool-refactor.test.ts has
  the corresponding assertion ('does not forward arbitrary process.env to
  children'). The first test in the file (cwd + HOME exposure) stays.

- Migrate packages/agents-runtime/test/fetch-url-ssrf.test.ts to the new
  createFetchUrlTool(sandbox, opts) signature. The assertions still hold
  for unrestrictedSandbox (NetPolicy SSRF protection is deferred); the
  test is now explicit about that scope.

- Remove the @ts-expect-error directive on the dynamic e2b import in
  src/sandbox/remote/e2b.ts. With e2b now in the lockfile, TS resolves
  the package and the directive is unused.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@msfstef msfstef force-pushed the msfstef/agent-sandboxing-1 branch from c6a9ffc to 91303cc Compare May 20, 2026 09:02

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 76.18504% with 417 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.63%. Comparing base (1ab43f5) to head (4beddcf).
⚠️ Report is 1 commit behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/agents-runtime/src/sandbox/remote/e2b.ts | 0.64% | 153 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/docker.ts | 85.14% | 86 Missing ⚠️ |
| ...ackages/agents-runtime/src/sandbox/docker/proxy.ts | 55.08% | 75 Missing ⚠️ |
| ...ackages/agents-runtime/src/sandbox/unrestricted.ts | 84.83% | 32 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/docker/fs.ts | 88.84% | 31 Missing ⚠️ |
| packages/agents-runtime/src/sandbox/remote.ts | 90.43% | 20 Missing ⚠️ |
| ...ckages/agents-runtime/src/sandbox/docker/loader.ts | 72.72% | 9 Missing ⚠️ |
| packages/agents-runtime/src/tools/bash.ts | 75.00% | 6 Missing ⚠️ |
| packages/agents/src/agents/worker.ts | 55.55% | 4 Missing ⚠️ |
| packages/agents/src/agents/horton.ts | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4369      +/-   ##
==========================================
+ Coverage   55.84%   60.63%   +4.78%     
==========================================
  Files         245      302      +57     
  Lines       24847    30248    +5401     
  Branches     6878     8137    +1259     
==========================================
+ Hits        13876    18340    +4464     
- Misses      10957    11891     +934     
- Partials       14       17       +3     
| Flag | Coverage Δ | |
|---|---|---|
| packages/agents | 71.05% <66.66%> (+0.12%) | ⬆️ |
| packages/agents-mcp | 77.54% <ø> (?) | |
| packages/agents-runtime | 80.47% <76.26%> (-0.81%) | ⬇️ |
| packages/agents-server | 73.93% <ø> (ø) | |
| packages/agents-server-ui | 6.66% <ø> (ø) | |
| packages/electric-ax | 42.61% <ø> (ø) | |
| packages/experimental | 87.73% <ø> (?) | |
| packages/react-hooks | 86.48% <ø> (?) | |
| packages/start | 82.83% <ø> (?) | |
| packages/typescript-client | 94.39% <ø> (?) | |
| packages/y-electric | 56.05% <ø> (?) | |
| typescript | 60.63% <76.18%> (+4.78%) | ⬆️ |
| unit-tests | 60.63% <76.18%> (+4.78%) | ⬆️ |

msfstef and others added 11 commits May 20, 2026 12:33
…proxy

Previously sandbox.fetch() on nativeSandbox enforced its hostname policy
via a manual `Set<string>.has(url.hostname)` exact-match check that
duplicated (badly) what `@anthropic-ai/sandbox-runtime`'s HTTP proxy
already does for subprocess egress.

The two pathways now share one policy enforcer (the library's proxy)
with consistent semantics:
- Wildcard patterns (e.g. `*.example.com`)
- IP canonicalization (e.g. `2852039166` → `169.254.169.254`)
- Denied-domains taking precedence over allowed
- Control-character host rejection
- IPv6 zone-ID payload rejection
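
To make the IP canonicalization item concrete: a host written as a single decimal integer is just the four address octets packed into 32 bits. The sketch below only illustrates that arithmetic. It is not the library's implementation, the function name is hypothetical, and it deliberately ignores hex, octal, and shortened dotted forms.

```typescript
// Illustrative only: converts a purely decimal host such as "2852039166"
// into dotted-quad form. Anything else is returned unchanged.
function canonicalizeDecimalHost(host: string): string {
  if (!/^\d+$/.test(host)) return host
  const value = Number(host)
  // Reject values that do not fit in an unsigned 32-bit address.
  if (!Number.isInteger(value) || value > 0xffffffff) return host
  const octets = [
    Math.floor(value / 0x1000000) % 256,
    Math.floor(value / 0x10000) % 256,
    Math.floor(value / 0x100) % 256,
    value % 256,
  ]
  return octets.join('.')
}

console.log(canonicalizeDecimalHost('2852039166')) // prints "169.254.169.254"
```

An exact-match allowlist never sees `169.254.169.254` in the first form, which is why a matcher that canonicalizes before comparing is the safer enforcement point.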

Implementation:
- Add `undici` as a direct dependency for `ProxyAgent`.
- On `ensureInitialized`, read `SandboxManager.getProxyPort()` and build
  a `ProxyAgent('http://127.0.0.1:PORT')` dispatcher.
- `sandbox.fetch()` passes `{ dispatcher }` to global fetch so undici
  routes the request through the same proxy that gates `bash`-emitted
  egress.
- A 403 with `x-srt-denied` header or undici proxy-refusal error is
  translated to `SandboxError({kind: 'policy'})` so callers still see a
  consistent policy-rejection shape.
- `dispatcher.close()` runs in `dispose()` to release sockets.

Linux Unix-socket gap documented: `getLinuxHttpSocketPath()` returns
a Unix socket on Linux which `ProxyAgent` does not consume directly.
For now sandbox.fetch on Linux falls back to direct (non-proxy)
network access. exec-driven egress on Linux still routes through the
proxy correctly via the bind-mounted unix socket. A custom undici
dispatcher targeting the unix socket would close this; tracked for a
follow-up.

Tests (test/sandbox-native-proxy-fetch.test.ts, 3 cases):
- Allowed host through the proxy reaches a local HTTP server.
- Disallowed host is refused with SandboxError({kind:'policy'}).
- Wildcard patterns in allowedHosts (e.g. '*.example.com') are accepted
  by the library's validator and config — confirming we no longer
  shadow the matcher with our own naïve exact-match check.

Existing tests in sandbox-native.test.ts now exercise the proxy
rejection path end-to-end (a real undici/proxy round-trip), not a
synthetic Set.has() check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e kill

Two CI-surfaced bugs:

1. **nativeSandbox crashed on Linux runners without `bubblewrap` installed.**
   `SandboxManager.isSupportedPlatform()` returns true on any Linux but
   the actual `initialize()` call throws when bwrap isn't on PATH. Tests
   gated on `isSupportedPlatform()` ran on the Linux runner and exploded
   instead of skipping.

   Fix: in the nativeSandbox factory and in chooseDefaultSandbox, call
   `SandboxManager.checkDependencies()` and surface a missing dependency
   as `SandboxError({kind: 'unavailable'})` *before* `initialize()`.
   The test gates (sandbox-native, sandbox-native-os,
   sandbox-native-proxy-fetch, sandbox-conformance, sandbox-default)
   also use `checkDependencies()` so they skip cleanly on hosts where
   the runtime tools aren't installed.

   chooseDefaultSandbox now falls back to unrestrictedSandbox on a
   Linux host without bwrap rather than throwing — keeps the "default
   to native, panic to unrestricted" contract from PR 6d intact even
   when the native engine is unusable.

2. **timeoutMs test hung on Linux until vitest's 5s default fired.**
   `spawn('sh', ['-c', 'sleep 5'])` then `child.kill('SIGTERM')` kills
   `sh` immediately but leaves `sleep` orphaned, still holding the
   stdio pipes — the `close` event waits for the grandchild to finish
   naturally. macOS happened to terminate the tree differently so the
   bug only surfaced on the Ubuntu runner.

   Fix: spawn with `detached: true` to create a new process group, then
   send the signal to `-pid` so the whole tree dies. SIGTERM first,
   escalating to SIGKILL after 500ms if anything is still hanging on.
   Applied symmetrically in unrestricted.ts and native.ts.
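
The process-group fix for bug 2 can be sketched as follows. This is a minimal illustration assuming a POSIX host with `sh` on PATH; `runWithTimeout` and its result shape are hypothetical and do not mirror the Sandbox exec signature.

```typescript
import { spawn } from 'node:child_process'

interface RunResult {
  exitCode: number | null
  signal: string | null
  timedOut: boolean
  stdout: string
}

// Hypothetical helper: runs a shell command in its own process group and
// kills the whole group on timeout (SIGTERM, then SIGKILL after graceMs).
function runWithTimeout(command: string, timeoutMs: number, graceMs = 500): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // detached: true makes the child a process-group leader on POSIX.
    const child = spawn('sh', ['-c', command], {
      detached: true,
      stdio: ['ignore', 'pipe', 'pipe'],
    })
    let stdout = ''
    let timedOut = false
    let escalation: ReturnType<typeof setTimeout> | undefined

    const killGroup = (sig: 'SIGTERM' | 'SIGKILL'): void => {
      if (child.pid === undefined) return
      try {
        // A negative pid targets the group, so grandchildren die as well.
        process.kill(-child.pid, sig)
      } catch {
        // The group has already exited; nothing left to signal.
      }
    }

    const timer = setTimeout(() => {
      timedOut = true
      killGroup('SIGTERM')
      escalation = setTimeout(() => killGroup('SIGKILL'), graceMs)
    }, timeoutMs)

    const cleanup = (): void => {
      clearTimeout(timer)
      if (escalation !== undefined) clearTimeout(escalation)
    }

    child.stdout?.on('data', (chunk) => { stdout += String(chunk) })
    child.stderr?.resume() // drain stderr so the pipe cannot fill up
    child.on('error', (err) => { cleanup(); reject(err) })
    // 'close' fires only after the stdio pipes are released, which is why an
    // orphaned grandchild holding them used to stall the promise.
    child.on('close', (exitCode, signal) => {
      cleanup()
      resolve({ exitCode, signal, timedOut, stdout })
    })
  })
}
```

Signalling only the direct child would leave `sleep` holding the pipes; signalling the group removes every holder at once, so `close` fires promptly.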

Also: adds the missing changeset entry (`Check Changeset` CI failure).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The factory's eager `checkDependencies()` (added in the previous commit
to surface 'unavailable' clearly to users) means that ALL tests in
sandbox-native.test.ts crash on a host without bubblewrap, not just the
ones that actually exec under the OS sandbox. The CI Linux runner
exposed this — the inner `identity`, `filesystem policy`, `fetch
policy`, and `lifecycle` describes were previously running on the lazy
assumption and need the outer gate now.

Coverage of the same TS-policy assertions remains on unsupported hosts
via sandbox-conformance.test.ts's unrestricted + fake-remote providers,
which are gated per-provider.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ AbortSignal

Additive interface changes — no rename of exec, no namespacing. Brings the
adapter contract in line with the May 2026 industry lowest common denominator
(LCD: Vercel/Cloudflare/E2B/Daytona/ComputeSDK) so future adapters and tools can rely on a broader
filesystem surface.

- types.ts: add readdir/exists/remove/stat to Sandbox, DirEntry/FileStat,
  signal?:AbortSignal on SandboxExecOpts.
- unrestricted/native: implement new methods. AbortSignal escalates SIGTERM
  then SIGKILL through the existing kill-tree path. FS errors normalized to
  SandboxError('runtime') at the adapter boundary so conformance assertions
  are stable across providers.
- remote + RemoteSandboxClient: extend contract; E2B adapter prefers
  files.list/exists/remove/getInfo when available and falls back to shell
  commands (BusyBox/GNU stat compatible) for older SDK shapes.
- write tool: switch read-before-write existence probe to sandbox.exists()
  rather than ENOENT detection on readFile.
- conformance: scenarios for exists/stat/readdir/remove/remove-recursive and
  an exec(AbortSignal) abort case (skipped semantically for the in-memory
  remote fake).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review pass on the previous commit flagged that throwing on policy-denied
paths in exists() drifts from the 2026 LCD semantics shared by Vercel,
Cloudflare, and E2B — they all treat exists() as a safe-probe that returns
false in both the missing and denied cases. Flipping native to match;
unrestricted has no policy boundary so its behavior is unchanged.

Also adds SandboxExecResult.aborted so callers can disambiguate caller-side
AbortSignal cancellation from naturally-delivered signals and from timeoutMs
expiry — the OS signal field is unreliable for that purpose under musl /
on Alpine builds.

E2B shell fallbacks hardened: readdir now uses `find -print0` to be
newline-safe and preserves the file/dir/symlink distinction via `%y`;
stat() validates a 3-field output structure before parsing instead of
unioning two mutually-incompatible formats; failure paths synthesize
classified errno codes (ENOENT/EACCES/EIO) so SandboxError messages are
never blank.

Conformance gains four cases: stat on missing path, remove on missing
path, remove on non-empty dir without recursive, and a pre-aborted exec
that returns immediately. The mid-flight abort test now asserts the new
aborted boolean. Remote (fake) is skipIf'd on abort cases since the fake
client has no abort plumbing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…licy)

Two interface additions that line the Sandbox contract up with the 2026
LCD (Vercel/Cloudflare/E2B publish both as first-class methods).

- types.ts: NetworkPolicy discriminated union (allow-all/deny-all/allowlist),
  getUrl({port, protocol}) and updateNetworkPolicy(policy) on Sandbox.
- unrestricted: getUrl returns loopback URL; updateNetworkPolicy records
  policy without enforcement (documented no-op — unrestricted has no
  boundary).
- native: getUrl returns loopback (Seatbelt + bwrap both leave 127.0.0.1
  reachable from inside). updateNetworkPolicy wires SandboxManager
  .updateConfig() — the library API confirmed at sandbox-manager.d.ts:36
  exposes mid-session reconfiguration of the MITM proxy's allowedDomains.
  NativeSandboxOpts.allowedHosts is preserved but deprecated in favor of
  initialNetworkPolicy; the latter wins when both supplied.
- remote: TS-side allowedHosts Set replaced with a NetworkPolicy state
  machine; updateNetworkPolicy reconfigures the host-process gate and
  logs once that VM-side egress is not reconfigured (E2B doesn't expose
  the necessary API). getUrl delegates to a new optional client.getUrl()
  hook so the contract remains pluggable.
- RemoteSandboxClient: add optional getUrl({port, protocol}); falls back
  to SandboxError('unavailable') when absent.
- conformance: add ProviderCapabilities descriptor (supportsAbort/
  supportsRealGetUrl/enforcesNetworkPolicy) so per-provider quirks become
  declarative instead of name-string branching. Two new scenarios:
  getUrl returns a port-bearing URL (or rejects unavailable), and
  updateNetworkPolicy(deny-all) flips subsequent fetch to policy-rejected.
  Abort-skip predicates migrated to capability checks.
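
A discriminated union like the one described for NetworkPolicy might look roughly like this. The three variant names come from the commit message; the `kind` and `hosts` field names, the `HostGate` class, and the wildcard rule are assumptions made for this sketch.

```typescript
// Variant names come from the commit message; field names are assumptions.
type NetworkPolicy =
  | { kind: 'allow-all' }
  | { kind: 'deny-all' }
  | { kind: 'allowlist'; hosts: string[] }

function isHostAllowed(policy: NetworkPolicy, hostname: string): boolean {
  switch (policy.kind) {
    case 'allow-all':
      return true
    case 'deny-all':
      return false
    case 'allowlist': {
      const host = hostname.toLowerCase()
      return policy.hosts.some((pattern) => {
        const p = pattern.toLowerCase()
        // "*.example.com" matches subdomains only, not "example.com" itself.
        if (p.startsWith('*.')) return host.endsWith(p.slice(1))
        return host === p
      })
    }
  }
}

// Hypothetical host-process gate whose policy can be swapped mid-session.
class HostGate {
  constructor(private policy: NetworkPolicy) {}
  updateNetworkPolicy(policy: NetworkPolicy): void {
    this.policy = policy
  }
  check(hostname: string): boolean {
    return isHostAllowed(this.policy, hostname)
  }
}
```

Because the gate reads the policy on every check, the conformance scenario above (switch to deny-all, then expect the next fetch to be rejected) needs no restart of the sandbox.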

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A fourth sandbox provider that runs each Sandbox instance inside a
dedicated Docker container. Targets local development and self-hosted
deployments where neither macOS Seatbelt nor a paid microVM provider is
appropriate. Built on dockerode (added as an optional peer dependency).

Hardening
---------
The HostConfig is hardcoded and not overrideable from DockerSandboxOpts:

  - CapDrop: ['ALL'], CapAdd: []   — no caps means mount/chroot/su fail
  - SecurityOpt: ['no-new-privileges:true']
  - Privileged: false
  - PidsLimit / Memory / NanoCpus  — sensible defaults, opt-out via opts.resources
  - Ulimits: nofile=2048, nproc=1024
  - IpcMode: 'none'
  - AutoRemove: true (ephemeral)
  - NetworkMode: 'none' by default; switches to 'bridge' only when an
    allowlist or exposedPorts is set
  - Default image pinned by digest (node:20-alpine multi-arch manifest)
  - extraMounts entries enforce readOnly: true at the type level and
    reject any hostPath matching /docker\.sock$/ at runtime so the docker
    socket cannot be exposed back to sandboxed code.
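
For orientation, the fixed settings above could be written down as a Docker Engine API HostConfig fragment plus a mount guard. This is a sketch under assumptions: the constant and helper names are hypothetical, the container path field is invented for illustration, and the resource limits (PidsLimit, Memory, NanoCpus) are omitted because their defaults are not stated here.

```typescript
// Values mirror the list above; the shape follows the Docker Engine API.
const HARDENED_HOST_CONFIG = {
  CapDrop: ['ALL'],
  CapAdd: [] as string[],
  SecurityOpt: ['no-new-privileges:true'],
  Privileged: false,
  IpcMode: 'none',
  AutoRemove: true,
  NetworkMode: 'none', // switched to 'bridge' only for allowlist/exposedPorts
  Ulimits: [
    { Name: 'nofile', Soft: 2048, Hard: 2048 },
    { Name: 'nproc', Soft: 1024, Hard: 1024 },
  ],
}

// readOnly is typed as the literal `true`, so a writable mount fails to compile.
interface ExtraMount {
  hostPath: string
  containerPath: string // hypothetical field name
  readOnly: true
}

// Hypothetical helper: renders a bind string, refusing the Docker socket.
function toReadOnlyBind(mount: ExtraMount): string {
  if (/docker\.sock$/.test(mount.hostPath)) {
    throw new Error(`refusing to mount the Docker socket: ${mount.hostPath}`)
  }
  return `${mount.hostPath}:${mount.containerPath}:ro`
}
```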

ReadonlyRootfs is intentionally *not* enabled by default — dockerode's
putArchive operates at the storage-driver layer and gets rejected on a
read-only rootfs even when the target is a tmpfs/volume. Operators who
want it must drive all writes via sandbox.exec.

Filesystem
----------
src/sandbox/docker/fs.ts provides putFile/getFile via dockerode's
putArchive/getArchive (small in-memory tar writer, no extra dep) and
exec-based readdir/exists/remove/stat. readdir does three POSIX
`find -type X -print0` passes so it works on both GNU find and BusyBox
(alpine) — newline-safe via NUL delimiting.
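
The NUL-delimited parsing behind the three-pass readdir can be illustrated like this. The entry shape and function names are assumptions for the sketch; only the `-print0` idea and the one-pass-per-type approach come from the text above.

```typescript
type EntryType = 'file' | 'dir' | 'symlink'

// Assumed shape; the PR's DirEntry may differ.
interface DirEntry {
  name: string
  type: EntryType
}

// Splits `find ... -print0` output. NUL cannot appear in a path, so names
// containing newlines or spaces survive intact.
function parsePrint0(output: string): string[] {
  return output.split('\0').filter((part) => part.length > 0)
}

// Merges the raw outputs of the three passes (-type f, -type d, -type l).
function mergeFindPasses(dir: string, passes: Record<EntryType, string>): DirEntry[] {
  const prefix = dir.endsWith('/') ? dir : dir + '/'
  const entries: DirEntry[] = []
  for (const type of ['file', 'dir', 'symlink'] as EntryType[]) {
    for (const path of parsePrint0(passes[type])) {
      // Skips the listed directory itself, which find reports in the dir pass.
      if (!path.startsWith(prefix)) continue
      entries.push({ name: path.slice(prefix.length), type })
    }
  }
  return entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
}
```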

Network
-------
src/sandbox/docker/proxy.ts ships a minimal HTTP/HTTPS forward proxy
(~150 LoC, node:http + CONNECT) with a dynamic allowlist that
updateNetworkPolicy mutates in place. Container env is set with
HTTP_PROXY=http://host.docker.internal:<port> so tools that respect
proxy env vars (curl, python requests, undici with ProxyAgent, browser
clients) route through it. Programs that bypass HTTP_PROXY (raw TCP,
Node's built-in fetch without explicit setGlobalDispatcher) leak
through Docker's NAT — documented gap; v2 needs a sidecar netns / nft
filter for full sealing.

Lifecycle
---------
One long-lived container per Sandbox instance (PID 1 = sleep keepalive).
Each exec uses container.exec() with stream demux. timeoutMs and
AbortSignal both wire to a kill-everything-but-PID-1 helper that
enumerates /proc and SIGKILLs to side-step a dockerode stream-close
race. dispose() removes the container even if AutoRemove already fired.

Testing
-------
- test/helpers/docker-probe.ts: top-level await isDockerAvailable() so
  describe.skipIf works at import time. Tests skip cleanly with a
  warning when the daemon is unreachable.
- test/sandbox-docker.test.ts: 13 integration tests against real
  Docker — roundtrip, hardening (caps, docker.sock, mount, chroot),
  timeouts, AbortSignal, port forwarding, policy enforcement at the
  proxy boundary, leftover-container sweep. Runs in <10s on a warm
  machine.
- test/sandbox-docker-smoke.test.ts: ad-hoc smoke probes that exercise
  CapEff/CapPrm/CapBnd ≡ 0, container /etc/passwd isolation, /Users
  invisibility, raw-CONNECT proxy allow vs deny.

All 132 sandbox tests pass on macOS + OrbStack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cker

Add `KNOWN_ADAPTERS` const to the public sandbox barrel and assert in the
conformance suite that every slug is exercised by exactly one provider —
adding a new adapter without registering it in the conformance suite
will now fail CI.

Wire the docker adapter into the cross-provider conformance loop with
a `dockerAvailable`-gated `enabled` flag. The describe gating relies on
the existing top-level await probe at `test/helpers/docker-probe.ts`
(skips clean on machines without Docker, no CI workflow change needed —
Ubuntu runners have Docker pre-installed, macOS doesn't and skips
gracefully).

Replace remaining provider-name string equality checks (`provider.name
=== 'remote (fake)'`, `=== 'unrestricted'`) with two declarative axes
on `ProviderFactory`:
  - `adapter: KnownAdapter` — KNOWN_ADAPTERS slug, used by the symlink
    test branch and the unrestricted-no-policy branch
  - `outsideKind: 'host-tempdir' | 'etc-passwd'` — controls which path
    the "writeFile outside the working directory" probe uses, replacing
    the brittle name match
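
One way the `outsideKind` axis could drive the probe, as a sketch only.
The function name, the probe file name, and the mapping of
`'etc-passwd'` to the literal path `/etc/passwd` are assumptions drawn
from the axis names, not confirmed details of the suite.

```typescript
import * as os from 'node:os'
import * as path from 'node:path'

type OutsideKind = 'host-tempdir' | 'etc-passwd'

// Hypothetical: choose the target of the "writeFile outside the working
// directory" probe from the declarative axis instead of the provider name.
export function outsideProbePath(kind: OutsideKind, hostTmp: string = os.tmpdir()): string {
  switch (kind) {
    case 'host-tempdir':
      // Providers that see the host filesystem probe a host temp file.
      return path.join(hostTmp, 'sandbox-outside-probe.txt')
    case 'etc-passwd':
      // Providers with their own filesystem probe a well-known system path.
      return '/etc/passwd'
  }
}
```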

73 conformance scenarios now run against the docker provider (in ~12s
on a warm machine, OrbStack on M-series). All 151 sandbox tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…low-all

Address the high-severity findings from the final security review.

Docker read-side policy (R1)
----------------------------
readdir, stat, and readFile now call assertReadable() to enforce the
workingDirectory boundary, matching native's behavior. Previously
docker.readdir('/etc') would silently succeed and enumerate the
container's filesystem — only writeFile / mkdir / remove were gated.
exists() also goes through the new isReadable() helper but returns
false on denial (safe-probe semantics, consistent with the rest of
the adapter set).
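
A rough sketch of the throwing and safe-probe pair, assuming a purely
lexical containment check on POSIX paths. `PolicyError` is a
hypothetical stand-in for the runtime's SandboxError, and the real
helpers may take different arguments.

```typescript
import * as path from 'node:path'

// Hypothetical stand-in for the runtime's policy error.
export class PolicyError extends Error {
  readonly kind = 'policy'
}

// True when `target` resolves to the working directory or a path beneath it.
export function isReadable(workingDirectory: string, target: string): boolean {
  const root = path.posix.resolve(workingDirectory)
  const resolved = path.posix.resolve(root, target)
  if (resolved === root) return true
  // Append a separator so "/work" does not match "/workspace".
  const prefix = root.endsWith('/') ? root : root + '/'
  return resolved.startsWith(prefix)
}

// Throwing variant, as used by readdir / stat / readFile in the text above.
export function assertReadable(workingDirectory: string, target: string): void {
  if (!isReadable(workingDirectory, target)) {
    throw new PolicyError(`read outside working directory: ${target}`)
  }
}
```

Under this sketch, exists() would return `isReadable(...) && <actual
check>`. The check does not follow symlinks, so it would need to be
paired with symlink resolution to be a real boundary.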

Proxy SSRF guards (R2)
----------------------
The docker allowlist proxy now refuses CONNECT and plain-HTTP requests
to literal RFC 1918 (10/8, 172.16/12, 192.168/16), CGNAT (100.64/10),
loopback (127/8), unspecified (0/8), and link-local (169.254/16, which
includes the 169.254.169.254 AWS/GCP metadata endpoint) addresses, as
well as IPv6 loopback (::1) and IPv6 link-local / unique-local (fe80::,
fc::, fd::). The guard runs after the hostname allowlist and rejects
regardless of the policy decision: even if a user explicitly allows
169.254.169.254, they cannot reach it.
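
A minimal sketch of a literal-address guard along these lines. The
function name `isBlockedLiteral` is hypothetical, and the IPv6 checks
assume the standard fe80::/10 and fc00::/7 prefixes, which may not match
the proxy's exact matching rules.

```typescript
import { isIP } from 'node:net'

// IPv4 CIDR blocks named in the text above: [base address, prefix length].
const BLOCKED_V4: ReadonlyArray<readonly [string, number]> = [
  ['0.0.0.0', 8],
  ['10.0.0.0', 8],
  ['100.64.0.0', 10],
  ['127.0.0.0', 8],
  ['169.254.0.0', 16],
  ['172.16.0.0', 12],
  ['192.168.0.0', 16],
]

function v4ToInt(ip: string): number {
  return ip.split('.').reduce((acc, octet) => acc * 256 + Number(octet), 0)
}

function inCidr(ip: string, base: string, bits: number): boolean {
  const size = 2 ** (32 - bits)
  return Math.floor(v4ToInt(ip) / size) === Math.floor(v4ToInt(base) / size)
}

// True when `host` is a literal IP in one of the refused ranges.
// Hostnames return false here; they are left to the hostname allowlist.
export function isBlockedLiteral(host: string): boolean {
  const bare = host.replace(/^\[|\]$/g, '').toLowerCase()
  const family = isIP(bare)
  if (family === 4) {
    return BLOCKED_V4.some(([base, bits]) => inCidr(bare, base, bits))
  }
  if (family === 6) {
    if (bare === '::1') return true
    // Assumed prefixes: fe80::/10 (link-local) and fc00::/7 (unique-local).
    return /^fe[89ab][0-9a-f]:/.test(bare) || /^f[cd][0-9a-f]{2}:/.test(bare)
  }
  return false
}
```

The sketch does not normalize expanded or IPv4-mapped IPv6 forms (for
example `::ffff:127.0.0.1`), which a production guard would also need
to handle.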

Plain-HTTP proxying now overrides the caller-supplied Host header with
the target's authority, so an attacker can no longer pair an
allowlisted absolute URL with a different vhost in the Host header.
The proxy-authorization and proxy-connection hop-by-hop headers are
also stripped before forwarding.
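
The header handling could look roughly like this. It is a sketch with
hypothetical names (`HeaderMap`, `rewriteProxyHeaders`), and it assumes
headers arrive as a plain name-to-value map.

```typescript
type HeaderMap = Record<string, string | string[] | undefined>

// Hop-by-hop headers named in the text above.
const STRIPPED = new Set(['proxy-authorization', 'proxy-connection'])

// Returns a copy of `headers` with lowercased names, the two proxy headers
// removed, and Host forced to the authority of the absolute target URL.
export function rewriteProxyHeaders(headers: HeaderMap, target: URL): HeaderMap {
  const out: HeaderMap = {}
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase()
    if (lower === 'host' || STRIPPED.has(lower)) continue
    out[lower] = value
  }
  // URL.host includes the port when it is not the scheme default.
  out['host'] = target.host
  return out
}
```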

Port bindings bind to 127.0.0.1 only (was 0.0.0.0) so dev-machine
sandboxes don't expose services across the LAN.
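
For illustration, a loopback-only binding in the Docker Engine API's
`HostConfig.PortBindings` shape. The helper name is hypothetical, and
mapping each container port to the same host port is an assumption; the
adapter may pick host ports differently.

```typescript
interface PortBinding {
  HostIp: string
  HostPort: string
}

// Builds a PortBindings map that publishes each TCP port on loopback only.
export function loopbackPortBindings(ports: readonly number[]): Record<string, PortBinding[]> {
  const bindings: Record<string, PortBinding[]> = {}
  for (const port of ports) {
    // HostIp '127.0.0.1' keeps the port off other interfaces; '0.0.0.0' would expose it.
    bindings[`${port}/tcp`] = [{ HostIp: '127.0.0.1', HostPort: String(port) }]
  }
  return bindings
}
```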

Native allow-all (correctness)
------------------------------
The upstream @anthropic-ai/sandbox-runtime config validator rejects
bare '*' in network.allowedDomains as "too broad". Our previous
policyToAllowedDomains returned ['*'] for mode:'allow-all', which would
have silently broken init. Throw SandboxError('unavailable') with a
pointer to unrestrictedSandbox instead.
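
A sketch of the fail-fast behavior described above. The `SandboxError`
class and the `NetworkPolicy` shape here are local stand-ins: only
`mode: 'allow-all'` and the `'unavailable'` code come from the text, and
the `'allowlist'` variant is an assumption.

```typescript
// Local stand-in for the runtime's SandboxError; the real constructor may differ.
export class SandboxError extends Error {
  readonly code: string
  constructor(code: string, message: string) {
    super(message)
    this.code = code
  }
}

// Assumed policy shape; only mode 'allow-all' is named in the text.
export type NetworkPolicy =
  | { mode: 'allow-all' }
  | { mode: 'allowlist'; domains: string[] }

export function policyToAllowedDomains(policy: NetworkPolicy): string[] {
  if (policy.mode === 'allow-all') {
    // The upstream validator rejects a bare '*', so fail loudly at this layer.
    throw new SandboxError(
      'unavailable',
      'allow-all is not supported by the native sandbox; use unrestrictedSandbox instead',
    )
  }
  return [...policy.domains]
}
```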

New conformance — sandbox-docker.test.ts:
- "read-side methods enforce the working directory boundary": asserts
  readFile / readdir / stat throw policy and exists() returns false
  for /etc paths.

New smoke — sandbox-docker-smoke.test.ts: even when explicitly allowed,
CONNECT to 169.254.169.254, 127.0.0.1, and 10.0.0.1 all return 403.

149 + 2 new tests pass; conformance still green across all 4 adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The underlying SandboxManager from @anthropic-ai/sandbox-runtime is a
process-global singleton: two nativeSandbox instances with different
working directories conflict and throw SandboxError('unavailable'). The
agents-runtime hosts many agent entities concurrently, each with its own
working directory, so this constraint is incompatible with the product.

dockerSandbox now covers the strong-isolation use case (no singleton,
multi-instance safe). unrestrictedSandbox + tool-layer policy (env
scrubbing, symlink resolution, fetch SSRF guards) covers the dev case.

- Delete src/sandbox/native.ts and the three native test files.
- Drop 'native' from KNOWN_ADAPTERS; drop nativeSandbox /
  NativeSandboxOpts / ChooseDefaultSandboxOpts exports.
- Simplify chooseDefaultSandbox to always return unrestrictedSandbox.
  Remove the ELECTRIC_AGENTS_UNRESTRICTED env var — it only existed to
  revert from native to unrestricted, which is now the default.
- Drop the native provider entry from the conformance suite; the
  KNOWN_ADAPTERS round-trip assertion now covers unrestricted/remote/docker.
- Drop @anthropic-ai/sandbox-runtime from dependencies; regenerate
  pnpm-lock.yaml.
- Update sandbox-design.md and the changeset to reflect the new lineup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions Bot commented May 20, 2026

Electric Agents Desktop Builds

Build artifacts for commit 4beddcf.

| Platform | Status | Artifact |
| --- | --- | --- |
| macOS Apple Silicon | Failed | Unavailable |
| macOS Intel | Failed | Unavailable |
| Windows x64 | Failed | Unavailable |
| Linux x64 | Failed | Unavailable |

Workflow run


netlify Bot commented May 20, 2026

Deploy Preview for electric-next ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 4beddcf |
| 🔍 Latest deploy log | https://app.netlify.com/projects/electric-next/deploys/6a0dce2d7ea3850008c1c24d |
| 😎 Deploy Preview | https://deploy-preview-4369--electric-next.netlify.app |
