Add browser computer-use toolset #1589

Draft
cdreetz wants to merge 1 commit into main from browser-toolset
@cdreetz cdreetz commented Jun 9, 2026

Collaborator

A reusable v1 Toolset exposing the Claude computer-use action space over a raw Chrome DevTools Protocol browser, with a pluggable backend so any environment can drive a real browser.

  • verifiers/v1/toolsets/browser: raw-CDP client (session-id attach + re-attach-on-detach recovery), pluggable backends (Browserbase REST + generic CDP endpoint), the unified computer tool (computer_20250124 action enum) plus decomposed click/type/key/scroll/navigate tools, and xdotool-style key mapping. Screenshots are returned as image content parts.
  • environments/browser_toolset_example: a minimal v1 env demoing the toolset on a few web tasks, scored by a model-borrowing LLM judge.
  • pyproject: add websockets to the [browser] extra.
  • .semgrep: add the example env to the load_environment canonical-shim exclude list (same mechanism as other custom-loader envs).
  • tests: keymap, backends (HTTP faked), session/tools (CDP faked, incl. detach recovery), and runtime integration (schema hides session, object injection, dispatch).
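
As a rough illustration of what xdotool-style key mapping involves, the sketch below splits a chord such as `ctrl+shift+t` into a modifier bitmask and a final key name. The bit values follow the `modifiers` field of CDP's `Input.dispatchKeyEvent` (Alt=1, Ctrl=2, Meta/Command=4, Shift=8). The function name `parse_chord` and the alias table are hypothetical and are not taken from the PR's keymap module.

```python
# Modifier bits as documented for CDP Input.dispatchKeyEvent.
# The alias names on the left are assumptions for this sketch.
MODIFIER_BITS = {
    "alt": 1,
    "ctrl": 2,
    "control": 2,
    "meta": 4,
    "super": 4,
    "cmd": 4,
    "shift": 8,
}


def parse_chord(chord: str) -> tuple[int, str]:
    """Split an xdotool-style chord into (modifier bitmask, final key name)."""
    parts = [part.strip() for part in chord.split("+") if part.strip()]
    if not parts:
        raise ValueError("empty key chord")
    *mods, key = parts
    mask = 0
    for mod in mods:
        bit = MODIFIER_BITS.get(mod.lower())
        if bit is None:
            raise ValueError(f"unknown modifier: {mod!r}")
        mask |= bit
    return mask, key
```

A caller would pass the mask as `modifiers` and resolve the final key name through its own key table.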

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

Note

Add browser computer-use toolset with Browserbase and CDP backends

  • Adds a new verifiers/v1/toolsets/browser package implementing browser control over CDP, supporting both Browserbase and direct CDP (ws:// or http://) backends.
  • Provides two tool surfaces: a single computer tool (xdotool-style action space) and a set of decomposed per-action tools; mode is selectable via HarnessConfig.
  • Adds BrowserSession for high-level browser control (navigate, click, scroll, type, screenshot) with automatic page re-attach on detach errors.
  • Adds a browser_toolset_example environment with an LLM judge reward (task_success) that scores final answers 0–1.
  • Lazy-loads browser symbols from verifiers.v1.toolsets and raises a targeted ImportError prompting pip install verifiers[browser] when websockets is missing; websockets>=12.0 is added to the browser extra.
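
The targeted ImportError described in the last bullet could look roughly like the helper below. This is a minimal sketch, not the PR's code: `require_extra` is a hypothetical name, and in a real package the call would typically sit behind a module-level `__getattr__` so that the import only happens when a browser symbol is first accessed.

```python
import importlib


def require_extra(module_name: str, extra: str, package: str = "verifiers"):
    """Import module_name, or raise an ImportError that names the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install {package}[{extra}]"
        ) from exc
```

For the browser toolset the call would be something like `require_extra("websockets", "browser")`.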
📊 Macroscope summarized 0124105. 14 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted

🗂️ Filtered Issues

No issues evaluated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +77 to +78
    if char.isalpha():
        return KeyDef(char, f"Key{char.upper()}", ord(char.upper()))

🟢 Low browser/keymap.py:77

For non-ASCII alphabetic characters like 'é', char.isalpha() returns True, so the function generates an invalid CDP code like "KeyÉ" instead of falling through to the generic case with an empty code string. CDP only recognizes `

-    if char.isalpha():
-        return KeyDef(char, f"Key{char.upper()}", ord(char.upper()))
+    if char.isalpha() and char.isascii():
+        return KeyDef(char, f"Key{char.upper()}", ord(char.upper()))

Evidence trail:
verifiers/v1/toolsets/browser/keymap.py lines 74-81 (REVIEWED_COMMIT): `_printable_keydef` function with `char.isalpha()` check at line 77 and `Key{char.upper()}` code generation at line 78. verifiers/v1/toolsets/browser/session.py lines 237-251 (REVIEWED_COMMIT): `press_key` sends `key.code` directly to CDP's `Input.dispatchKeyEvent` at line 250. Python docs: `str.isalpha()` returns True for Unicode alphabetic characters. W3C UI Events KeyboardEvent.code spec: only KeyA-KeyZ are valid Key* code values.
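
A self-contained sketch of the suggested guard follows. `KeyDef` here is a stand-in `NamedTuple` with assumed field names, and the fallback key code of `0` is an assumption; the review only states that the generic case uses an empty code string.

```python
from typing import NamedTuple


class KeyDef(NamedTuple):
    """Hypothetical stand-in for the PR's KeyDef."""

    key: str
    code: str
    key_code: int


def printable_keydef(char: str) -> KeyDef:
    # Only ASCII letters have a KeyA..KeyZ physical-key code.
    if char.isascii() and char.isalpha():
        return KeyDef(char, f"Key{char.upper()}", ord(char.upper()))
    # Assumed generic fallback: empty code, no virtual key code.
    return KeyDef(char, "", 0)
```

With this guard, `"é"` (for which `str.isalpha()` is true but `str.isascii()` is false) takes the fallback branch instead of producing `"KeyÉ"`.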

            else []
        )
        if pages:
            target_id = str(pages[-1].get("targetId"))

🟠 High browser/session.py:71

At line 71, str(pages[-1].get("targetId")) converts None to the string "None", so the null check at line 72 (if target_id is None:) never succeeds. When a page lacks a targetId, the code attempts to attach to target "None" instead of creating a new target. Consider only wrapping in str() when the value is not None, or check for None before the conversion.

Suggested change
-            target_id = str(pages[-1].get("targetId"))
+            target_id = pages[-1].get("targetId")

Evidence trail:
verifiers/v1/toolsets/browser/session.py lines 61, 70-79 at REVIEWED_COMMIT. Line 61: `target_id: str | None = None`. Line 71: `target_id = str(pages[-1].get("targetId"))` — `dict.get()` returns None if key absent, `str(None)` produces `"None"`. Line 72: `if target_id is None:` — this is an identity check against `None`, which will always be False when `target_id` is the string `"None"`.
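
The pitfall is easy to reproduce in isolation. The helper below is a hypothetical sketch (`pick_target_id` is not a function in the PR) of the second option the comment mentions: keep the `str()` conversion, but apply it only when a value is actually present.

```python
from typing import Any, Optional


def pick_target_id(pages: list[dict[str, Any]]) -> Optional[str]:
    """Return the last page's targetId as a string, or None if it is absent."""
    if not pages:
        return None
    raw = pages[-1].get("targetId")
    # Convert only real values: str(None) would yield the string "None",
    # which an `is None` check can never match.
    return str(raw) if raw is not None else None
```

A caller can then rely on `if target_id is None:` to decide whether a new target must be created.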

Comment on lines +59 to +66
            future.set_result(message)
        except asyncio.CancelledError:
            raise
        except Exception as exc:  # noqa: BLE001 - propagate to waiters
            for future in self._pending.values():
                if not future.done():
                    future.set_exception(exc)
            self._pending.clear()

🟡 Medium browser/cdp.py:59

When close() cancels the reader task while a send() call is awaiting its response, the asyncio.CancelledError is re-raised on line 61 without resolving the futures in self._pending. This causes the await future in send() to hang indefinitely. Consider catching CancelledError separately and cancelling all pending futures before re-raising.

-        except asyncio.CancelledError:
-            raise
-        except Exception as exc:  # noqa: BLE001 - propagate to waiters
-            for future in self._pending.values():
-                if not future.done():
-                    future.set_exception(exc)
-            self._pending.clear()

Evidence trail:
verifiers/v1/toolsets/browser/cdp.py lines 49-66 (`_read_loop`): `CancelledError` (line 60-61) re-raises without cleaning up `self._pending`; `Exception` path (lines 62-66) resolves all pending futures. Lines 68-90 (`send`): `await future` on line 86 will hang if the future is never resolved. Lines 92-102 (`close`): cancels reader task (line 94), catches `CancelledError` (line 97-98), but never touches `self._pending`.
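
One way to apply the suggestion is sketched below with a transport-free stand-in. `MiniClient`, `_fail_pending`, and `demo` are hypothetical names, and the websocket read is replaced by an event that never fires; the point is only that the `CancelledError` branch cancels pending futures before re-raising, so a caller blocked in `send()` is released when `close()` runs.

```python
import asyncio
from typing import Dict, Optional


class MiniClient:
    """Hypothetical stand-in for a CDP client, without a real websocket."""

    def __init__(self) -> None:
        self._pending: Dict[int, asyncio.Future] = {}
        self._reader: Optional[asyncio.Task] = None

    def _fail_pending(self, exc: Optional[BaseException]) -> None:
        # Resolve every waiter: cancel on shutdown, set_exception on errors.
        for future in self._pending.values():
            if not future.done():
                if exc is None:
                    future.cancel()
                else:
                    future.set_exception(exc)
        self._pending.clear()

    async def _read_loop(self) -> None:
        try:
            await asyncio.Event().wait()  # stands in for reading messages
        except asyncio.CancelledError:
            self._fail_pending(None)  # release waiters before re-raising
            raise
        except Exception as exc:  # propagate transport errors to waiters
            self._fail_pending(exc)

    async def start(self) -> None:
        self._reader = asyncio.create_task(self._read_loop())
        await asyncio.sleep(0)  # let the reader reach its first await

    async def send(self, msg_id: int):
        future = asyncio.get_running_loop().create_future()
        self._pending[msg_id] = future
        return await future

    async def close(self) -> None:
        if self._reader is not None:
            self._reader.cancel()
            try:
                await self._reader
            except asyncio.CancelledError:
                pass


async def demo() -> str:
    client = MiniClient()
    await client.start()
    waiter = asyncio.create_task(client.send(1))
    await asyncio.sleep(0)  # let send() register its future
    await client.close()
    try:
        await asyncio.wait_for(waiter, timeout=1)
    except asyncio.CancelledError:
        return "cancelled"
    except asyncio.TimeoutError:
        return "hung"
    return "completed"
```

Cancelling the futures is one choice; setting a dedicated "connection closed" exception on them instead would give callers a more descriptive error.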

Comment on lines +48 to +51
handle = await self.backend.create()
self._provider_session_id = handle.session_id
self._client = CDPClient(handle.cdp_ws_url)
await self._client.connect()

🟠 High browser/session.py:48

If connect() raises an exception after self._client is assigned, _client remains set to a non-None but unconnected CDPClient. The next call to start() sees self._client is not None and returns early, leaving the session broken. Consider moving the assignment after connect() succeeds, or resetting _client = None in an exception handler.

         handle = await self.backend.create()
         self._provider_session_id = handle.session_id
-        self._client = CDPClient(handle.cdp_ws_url)
-        await self._client.connect()
+        client = CDPClient(handle.cdp_ws_url)
+        await client.connect()
+        self._client = client
         await self._attach_page(create=True)

Evidence trail:
verifiers/v1/toolsets/browser/session.py lines 44-55 at REVIEWED_COMMIT: line 50 assigns `self._client = CDPClient(...)`, line 51 calls `await self._client.connect()`, line 46 checks `if self._client is not None: return self`. No exception handler exists between lines 50-51 to reset `_client = None` on failure.
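
The first option (assign only after `connect()` succeeds) can be sketched with fakes. `FakeClient`, the `clients` injection parameter, and `demo` are hypothetical and exist only to make the failure reproducible; the real session builds its client from the backend handle.

```python
import asyncio
from typing import Iterable, Optional


class FakeClient:
    """Hypothetical client whose connect() can be made to fail."""

    def __init__(self, fail: bool) -> None:
        self.fail = fail

    async def connect(self) -> None:
        if self.fail:
            raise ConnectionError("connect failed")


class Session:
    def __init__(self, clients: Iterable[FakeClient]) -> None:
        self._clients = iter(clients)  # demo-only injection point
        self._client: Optional[FakeClient] = None

    async def start(self) -> "Session":
        if self._client is not None:
            return self
        client = next(self._clients)
        await client.connect()  # may raise; self._client is still None
        self._client = client   # publish only after a successful connect
        return self


async def demo() -> tuple:
    session = Session([FakeClient(fail=True), FakeClient(fail=False)])
    try:
        await session.start()
    except ConnectionError:
        pass
    unset_after_failure = session._client is None
    await session.start()  # the retry is not short-circuited
    return unset_after_failure, session._client is not None
```

Because the attribute is only set on success, a failed `start()` leaves the session in its initial state and a later retry runs the full connect path again.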
