Local control: in-SDK sidecar + desktop/browser drivers #161
Add a deny-by-default CapabilityPolicy that gates which command names a local browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage, and secrets are opt-in), a name-keyed driver registry so one package can host many drivers, and the command-name contract mirroring the hai_drivers interfaces. Co-authored-by: Cursor <cursoragent@cursor.com>
Long-polling sidecar (single-owner lease, connect-time drain, command_uid replay cache + echo), capability policy (deny-by-default with opt-ins), driver registry, pyautogui desktop driver and Selenium browser driver. Co-authored-by: Cursor <cursoragent@cursor.com>
…e open Co-authored-by: Cursor <cursoragent@cursor.com>
…+ config knobs Policy now derives allowed commands from the driver's public methods minus the danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method lists that duplicated the drivers. Replace the driver registry with a direct lazy factory and trim SidecarConfig to essentials. Co-authored-by: Cursor <cursoragent@cursor.com>
- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages
Co-authored-by: Cursor <cursoragent@cursor.com>
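The URL rejection in the browser bullet above could be sketched as a scheme blocklist. This is a minimal illustration, not the PR's code: `BLOCKED_SCHEMES` and `is_navigable` are hypothetical names, and "js" is assumed to mean the `javascript:` scheme.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist built from the schemes named in the commit message.
BLOCKED_SCHEMES = {"file", "chrome", "javascript", "data"}


def is_navigable(url: str) -> bool:
    """Return False when the URL uses one of the blocked schemes."""
    # Strip whitespace and lowercase so " JavaScript:..." is still caught.
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme not in BLOCKED_SCHEMES
```

A driver would presumably call a check like this before navigating and raise on `False`; an allowlist of `http`/`https` would be the stricter alternative.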
…constants Co-authored-by: Cursor <cursoragent@cursor.com>
…down
- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include
Co-authored-by: Cursor <cursoragent@cursor.com>
…l` CLI Client now injects the local session_id for any source:"local" environment on create_agent/update_agent/patch_agent and on inline-agent create_session, so callers only pass source:"local" and the env id. Adds `hai local browser` and `hai local desktop` to run the sidecar from the CLI. Co-authored-by: Cursor <cursoragent@cursor.com>
…e, typed envs)
- enter_secret clicks (x, y) to focus the target before typing, so the secret lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they arrive as dicts or typed Pydantic models.
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
    if not allow_cookies:
        allowed -= _COOKIES
    if not allow_secrets:
        allowed -= _SECRETS
Script policy bypass via helpers
High Severity
With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.
Reviewed by Cursor Bugbot for commit 0d1c561.
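A minimal sketch of the gap described in this comment, assuming the policy is derived from public method names minus danger sets as the earlier commits describe. The driver classes, `allowed_commands`, `_SCRIPTS`, and the contents of the sets are hypothetical; only `_COOKIES`, the `allow_*` flags, and the command names come from the PR.

```python
# Hypothetical danger sets; the PR's real sets are not shown in full.
_SCRIPTS = {"execute_script"}
_COOKIES = {"get_cookies"}


class BaseDriver:
    def _run_script(self, script):
        # Stand-in for executing JavaScript in the page.
        return f"ran: {script}"

    def execute_script(self, script):
        return self._run_script(script)


class BrowserDriver(BaseDriver):
    def get_viewport_html(self):
        # A helper that still runs page JavaScript internally.
        return self._run_script("collectViewportHTML()")

    def get_cookies(self):
        return []


def allowed_commands(driver_cls, allow_scripts=False, allow_cookies=False):
    """Collect public callables across the whole MRO, then subtract danger sets."""
    allowed = set()
    for klass in driver_cls.__mro__:
        for name, member in vars(klass).items():
            if callable(member) and not name.startswith("_"):
                allowed.add(name)
    if not allow_scripts:
        allowed -= _SCRIPTS
    if not allow_cookies:
        allowed -= _COOKIES
    return allowed
```

With scripts disabled, `get_viewport_html` stays in the allowed set even though it reaches `_run_script`, which is the bypass the comment reports. Closing it would mean listing every script-backed helper in the danger set, or gating at `_run_script` itself.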
Co-authored-by: Cursor <cursoragent@cursor.com>
…ce SidecarBusyError in CLI update_agent/patch_agent take agent_name positionally; the **kwargs-only wrappers raised TypeError on a positional call. Accept *args and pass through. `hai local browser/desktop` acquired the lease inside asyncio.run, outside the guarded block, so a busy sidecar dumped a raw traceback; route it to the CLI error. Co-authored-by: Cursor <cursoragent@cursor.com>
…ingle-source capability
- sidecar: cache command_uid -> result and re-post on redelivery instead of re-running side effects (a transient result-POST failure left the command pending and re-executed it on the next poll).
- desktop driver: route keyboard through pyautogui (matches the Key contract and the remote executor); drop pynput, whose member names (enter/esc) diverge from the contract names (return/escape) and silently failed.
- policy: walk the full MRO so inherited driver methods are gated, not just those declared on the concrete class.
- config/wiring: KIND_TO_CAPABILITY is the single source; _CAPABILITIES derives from it.
- pyproject: drop unused pynput; collapse the all extra to a self-reference.
Co-authored-by: Cursor <cursoragent@cursor.com>
…shutdown
Remove CapabilityPolicy and the --allow-* CLI flags: the GUI keystroke and script paths reach the same surface anyway, so the gate was a formality.
- _dispatch rejects unknown/private names cleanly instead of crashing the poll loop
- build the driver only after the machine lease is acquired (no leak on busy lease)
- SIGINT/SIGTERM now stop the sidecar cooperatively so in-flight commands finish
- floor the 429 backoff so Retry-After: 0 can't busy-loop
- guard malformed fetch bodies so a bad json() doesn't kill the loop
Co-authored-by: Cursor <cursoragent@cursor.com>
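The 429 backoff floor from the list above might look roughly like this. The constants, the default, and the function name are assumptions for illustration; the commit only states that a floor exists.

```python
import math

# Hypothetical tunables; the PR does not state the actual values.
MIN_BACKOFF_S = 1.0
MAX_BACKOFF_S = 60.0


def backoff_seconds(retry_after, default=5.0):
    """Turn a Retry-After header value into a bounded sleep duration."""
    try:
        delay = float(retry_after)
    except (TypeError, ValueError):
        # Header missing, or given as an HTTP date rather than seconds.
        delay = default
    if not math.isfinite(delay):
        delay = default
    # The floor is what keeps "Retry-After: 0" from busy-looping.
    return min(max(delay, MIN_BACKOFF_S), MAX_BACKOFF_S)
```

The upper clamp is an extra assumption; it guards against a server asking for an unreasonably long pause.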
- desktop snapshot emits screenshot_b64 (str), the field ObservationSnapshot requires; the old screenshot_png key raised a validation error on every observe
- release_key clears its modifier bit instead of XOR (stray release no longer flips it back on); key mask mutates only after a successful perform
- CDP mouse events carry the buttons bitmask so drags register, and moves use button "none"
- _run_script keeps the iframe guard on during retries so transient blocks retry
- _focus_new_tab switches to the genuinely new handle, not window_handles[-1]
- block chrome-extension/devtools/filesystem URL schemes
- a kind-less dict env defaults to web so session_id still autowires
Co-authored-by: Cursor <cursoragent@cursor.com>
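The release_key fix above comes down to clearing a bit rather than toggling it. A small sketch, assuming the driver tracks modifiers with the CDP bit values (Alt=1, Ctrl=2, Meta=4, Shift=8); the function names are hypothetical.

```python
# CDP modifier bit values; that the driver uses exactly these is an assumption.
ALT, CTRL, META, SHIFT = 1, 2, 4, 8


def press(mask: int, bit: int) -> int:
    """Set the modifier bit."""
    return mask | bit


def release(mask: int, bit: int) -> int:
    """Clear the modifier bit; a stray release is a no-op."""
    return mask & ~bit


def release_xor(mask: int, bit: int) -> int:
    """The old behaviour: XOR toggles, so a stray release turns the bit on."""
    return mask ^ bit
```

Releasing Shift when it was never pressed leaves the mask at 0 with `release`, but sets it to 8 with `release_xor`, which would make later keystrokes arrive shifted.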
    self._action_builder = ActionBuilder
    self._destroyed = False
    self.cursor_x = 0
    self.cursor_y = 0
Mouse position never synced
Medium Severity
SeleniumWebDriver keeps cursor_x/cursor_y for CDP mouse events but initializes them to (0, 0) and never reads the browser’s actual pointer. webpage_metadata and observation_bundle expose that stale position, and click, mouse_press, and scroll can act at the wrong coordinates when no prior mouse_move_to ran.
Reviewed by Cursor Bugbot for commit 7345734.
…rt.py Three single-purpose helper modules become one; the defuddle bundle is now read lazily and cached on first extract_markdown instead of at import. Co-authored-by: Cursor <cursoragent@cursor.com>
Cosmetic: module tunables read as plain UPPER_CASE. Class-private methods and driver internals keep the underscore (the dispatch firewall keys off it). Co-authored-by: Cursor <cursoragent@cursor.com>
…sktop->pyautogui_desktop Package dirs now name their implementation. CLI commands, capability strings, install extras, and class names are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>
    def click(self, x: int, y: int, button: str = "left") -> None:
        self._pyautogui.click(x=x, y=y, button=button)
        self._settle_after_click()
Desktop click coordinate space mismatch
High Severity
get_observation_snapshot reports the cursor in screenshot pixel space (including after width downscaling via screenshot_max_width), but click, mouse_move_to, and related input helpers forward those coordinates unchanged to PyAutoGUI, which expects logical screen coordinates from _screen_size. Agents aiming from the observation image will miss clicks whenever capture dimensions differ from the stored screen size.
Reviewed by Cursor Bugbot for commit db90a85.
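One way to address the mismatch described in this comment is to map screenshot pixels back to logical screen coordinates before calling PyAutoGUI. A minimal sketch; `to_screen_coords` and its parameters are hypothetical and not part of the PR.

```python
def to_screen_coords(x, y, capture_size, screen_size):
    """Scale a point from screenshot pixel space to logical screen space."""
    cap_w, cap_h = capture_size
    scr_w, scr_h = screen_size
    if cap_w <= 0 or cap_h <= 0:
        raise ValueError("capture_size must be positive")
    sx = round(x * scr_w / cap_w)
    sy = round(y * scr_h / cap_h)
    # Clamp so a point on the image edge stays on the screen.
    sx = min(max(sx, 0), scr_w - 1)
    sy = min(max(sy, 0), scr_h - 1)
    return sx, sy
```

For example, with a 1280x720 capture of a 2560x1440 screen, a click aimed at the image centre maps to the screen centre instead of the upper-left quadrant.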
… cached
- localize_agent/create_agent now recurse into inline subagents, so a local browser/desktop child gets its session_id (was only top-level environments)
- the result cache is now LRU: a cache hit refreshes recency so an actively redelivered command_uid is not evicted and re-executed mid-retry
Co-authored-by: Cursor <cursoragent@cursor.com>
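The LRU behaviour described above can be sketched with `collections.OrderedDict`. The class name, method names, and default size are illustrative assumptions, not the sidecar's actual cache.

```python
from collections import OrderedDict


class ResultCache:
    """Maps command_uid -> result, evicting the least recently used entry."""

    def __init__(self, max_entries=256):
        self._max = max_entries
        self._items = OrderedDict()

    def get(self, command_uid, default=None):
        if command_uid not in self._items:
            return default
        # A hit refreshes recency, so a redelivered uid is not evicted mid-retry.
        self._items.move_to_end(command_uid)
        return self._items[command_uid]

    def put(self, command_uid, result):
        self._items[command_uid] = result
        self._items.move_to_end(command_uid)
        while len(self._items) > self._max:
            # popitem(last=False) drops the oldest (least recently used) entry.
            self._items.popitem(last=False)
```

Without the `move_to_end` on read, the cache would be FIFO and a command that keeps being redelivered could still fall out and run again.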
Stored but never read (vestigial in the upstream driver too); removing it so the constructor doesn't advertise an option that does nothing. Co-authored-by: Cursor <cursoragent@cursor.com>
Matches the consolidated hai_drivers desktop interface (single screenshot_b64 method); the command proxy forwards screenshot_b64, so the desktop driver must expose it rather than screenshot_png_bytes. Co-authored-by: Cursor <cursoragent@cursor.com>
    def click(self, button: str = "left", delay_before_release: float = 0.05) -> None:
        self.mouse_press(button=button)
        time.sleep(delay_before_release)
        self.mouse_release(button=button)
Browser sidecar click args mismatch
High Severity
The sidecar invokes driver methods by RPC name with JSON args. Desktop control uses click with x/y, and sidecar tests dispatch the same shape, but SeleniumWebDriver.click only accepts button and delay_before_release. Browser click requests carrying coordinates raise a TypeError or never move the pointer before clicking.
Reviewed by Cursor Bugbot for commit 1c267dc.
…oor empty polls
- _dispatch resolves non-callable attrs (e.g. platform property) instead of rejecting them
- ensure_channel treats HTTP 409 as already-created
- poll loop enforces a minimum interval when long-poll returns empty early
Co-authored-by: Cursor <cursoragent@cursor.com>
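Taken together with the earlier commit that rejects unknown and private names, the dispatch rules could be sketched as below. The result shape (`ok`/`result`/`error`) and the `FakeDriver` class are assumptions for illustration only.

```python
_MISSING = object()


def dispatch(driver, name, args=None):
    """Resolve a command name on the driver without letting errors escape."""
    # Underscore-prefixed names are driver internals and are never exposed.
    if not isinstance(name, str) or not name or name.startswith("_"):
        return {"ok": False, "error": f"unknown command: {name!r}"}
    target = getattr(driver, name, _MISSING)
    if target is _MISSING:
        return {"ok": False, "error": f"unknown command: {name!r}"}
    if not callable(target):
        # Plain attributes and properties (e.g. platform) resolve to their value.
        return {"ok": True, "result": target}
    try:
        return {"ok": True, "result": target(**(args or {}))}
    except Exception as exc:
        # Report the failure as a result instead of crashing the poll loop.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


class FakeDriver:
    @property
    def platform(self):
        return "linux"

    def click(self, x, y):
        return (x, y)

    def _teardown(self):
        return "internal"
```

Returning an error payload for bad arguments also means a shape mismatch, such as the browser click issue reported above, surfaces as a failed command rather than a dead sidecar.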
…5.18 Co-authored-by: Cursor <cursoragent@cursor.com>
            raise ValueError("api_key is required (or set HAI_API_KEY)")
        if not self.session_id:
            self.session_id = session_id_from_environment_id(self.environment_id, self.api_key, self.capability)
        return self
Sidecar ignores API base URL
Medium Severity
SidecarConfig fills api_key from environment variables but always keeps base_url at the EU default unless passed explicitly. A sidecar started with SidecarConfig(...) while HAI_API_BASE_URL (or a non-EU SDK client) points elsewhere will poll and post results on the wrong host, so local control never attaches to the session the client created.
Reviewed by Cursor Bugbot for commit abf44a5.
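A fix along the lines this comment suggests would resolve the base URL the same way the API key is resolved: explicit value first, then the environment, then the default. In this sketch `resolve_base_url` is a hypothetical helper and `DEFAULT_BASE_URL` is a placeholder, since the real EU default is not given in the PR text.

```python
import os

# Placeholder only; the actual default host is not stated in this PR.
DEFAULT_BASE_URL = "https://api.example.invalid"


def resolve_base_url(explicit=None, environ=None):
    """Prefer an explicit value, then HAI_API_BASE_URL, then the default."""
    env = os.environ if environ is None else environ
    url = explicit or env.get("HAI_API_BASE_URL") or DEFAULT_BASE_URL
    return url.rstrip("/")
```

Passing `environ` explicitly keeps the helper testable without mutating the process environment.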
Co-authored-by: Cursor <cursoragent@cursor.com>
    self.mouse_press("left", click_count=1)
    self.mouse_release("left", click_count=1)
    self.mouse_press("left", click_count=2)
    self.mouse_release("left", click_count=2)
Double click tab focus break
Medium Severity
double_click performs two press/release cycles via mouse_release, which is wrapped with _focus_new_tab. If the first release opens a tab, focus jumps before the second click, so the rest of the double-click runs on the wrong tab at stale coordinates.
Reviewed by Cursor Bugbot for commit 73d1fc9.
Co-authored-by: Cursor <cursoragent@cursor.com>
    running.thread = threading.Thread(target=serve, daemon=True, name=f"hai-sidecar-{config.capability}")
    running.thread.start()
    ready.wait(timeout=STOP_JOIN_TIMEOUT_S)
Sidecar ready before channel up
Medium Severity
_start_sidecar_thread sets ready as soon as the asyncio loop starts, then create_session continues while the sidecar may still be acquiring its lease, calling ensure_channel, or building the browser/desktop driver. Early session commands can fail or time out before the sidecar is actually polling.
Reviewed by Cursor Bugbot for commit 576e89a.
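The ordering this comment asks for can be sketched with a `threading.Event` that is set only after setup finishes, and with setup failures re-raised in the caller. `start_worker`, `setup`, and `serve` are hypothetical stand-ins for the sidecar's lease, channel, and driver steps and its poll loop.

```python
import threading


def start_worker(setup, serve, timeout_s=10.0):
    """Start a background worker and return only once setup has completed."""
    ready = threading.Event()
    stop = threading.Event()
    errors = []

    def run():
        try:
            setup()  # e.g. acquire lease, ensure channel, build driver
        except Exception as exc:
            errors.append(exc)
        finally:
            # Signal readiness only after setup has finished or failed.
            ready.set()
        if not errors:
            serve(stop)  # poll loop; expected to return once stop is set

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    if not ready.wait(timeout=timeout_s):
        raise TimeoutError("worker did not become ready in time")
    if errors:
        raise errors[0]
    return thread, stop
```

Checking the return value of `ready.wait` matters here: ignoring it means a timeout is indistinguishable from a successful start.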
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
Duplicate Chrome launch race
Medium Severity
ensure_local_chrome uses a check-then-launch pattern with no lock. Parallel browser sidecar threads (e.g. parent and subagent user_device web envs) can both see the debug port closed and spawn separate Chrome processes against the same profile directory and port.
Reviewed by Cursor Bugbot for commit 576e89a.
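For the in-process case this comment describes (parallel sidecar threads), wrapping the check and the launch in one lock is enough to make them atomic. A minimal sketch; `ensure_running`, `is_up`, and `launch` are hypothetical names, and separate processes would need a file lock instead.

```python
import threading

# Process-wide lock; it does not protect against other processes.
_LAUNCH_LOCK = threading.Lock()


def ensure_running(is_up, launch):
    """Launch at most once: the check and the launch happen under one lock."""
    with _LAUNCH_LOCK:
        if is_up():
            return False  # someone else already started it
        launch()
        return True
```

Here `is_up` would probe the debug port and `launch` would spawn Chrome and wait for the port to open, so that the next thread's check sees it as running.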
Co-authored-by: Cursor <cursoragent@cursor.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 9 total unresolved issues (including 8 from previous reviews).
Reviewed by Cursor Bugbot for commit 4e8401e.
    def create_session(self, **kwargs: typing.Any) -> typing.Any:
        if "agent" in kwargs:
            kwargs["agent"] = localize_agent(kwargs["agent"], self._raw_client._client_wrapper._get_api_key)
            _ensure_local_sidecars(kwargs["agent"], self._raw_client._client_wrapper)
Named agents skip sidecar startup
High Severity
With auto sidecars enabled, create_session only starts local sidecars from configs collected off the inline agent payload. A registered agent passed as a string (the usual agent="my-agent" flow) is returned unchanged by localize_agent, and catalog environment entries that are plain id strings never match user_device in _local_target, so collect_sidecar_configs is empty and no sidecar is started.
Reviewed by Cursor Bugbot for commit 4e8401e.


Made with Cursor
Note
High Risk
Introduces local desktop/browser automation (input, screenshots, subprocess/`run_command`, file I/O) wired into session creation by default, which is security-sensitive and can affect real user machines if misconfigured.

Overview
Adds local computer-use so agents with `host: user_device` environments can drive the developer’s machine from the cloud via an in-process sidecar that long-polls the platform command API and executes driver methods.

SDK wiring: `Client`/`AsyncClient` now expose wrapped `agents` and `sessions` clients that inject deterministic `session_id`s for local web/desktop envs (from API key + env id + capability) on create/update/patch agent and on `create_session`. When `HAI_AUTO_SIDECAR` is on (default), `create_session` also starts background sidecar threads for each local env (async path uses `asyncio.to_thread` so Chrome startup does not block the loop).

Sidecar stack: `SidecarClient` authenticates, ensures a trajectory channel, builds a desktop (`LocalDesktopDriver` / pyautogui) or browser (`SeleniumWebDriver` attaching to Chrome on a debug port, with optional auto-launch via `ensure_local_chrome`) driver, dispatches remote commands with result caching and a per-session file lock. HTTP helpers live in `CommandExchange`; agent payload rewriting in `wiring`.

Packaging & CLI: Optional extras `hai-agents[desktop]` and `[browser]`; wheel bundles selenium JS assets. `hai local browser|desktop` runs a sidecar manually for sessions started elsewhere.

Tests: Unit coverage for wiring, sidecar dispatch/lease, and driver edge cases; autouse fixture disables auto sidecars in tests.

Reviewed by Cursor Bugbot for commit 4e8401e. Bugbot is set up for automated code reviews on this repo.
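To illustrate what a deterministic session id built from API key, env id, and capability could look like, here is a purely hypothetical derivation. The PR does not show how `session_id_from_environment_id` computes its value; the hash, separator, prefix, and length below are all assumptions.

```python
import hashlib


def derive_session_id(environment_id: str, api_key: str, capability: str) -> str:
    """Hypothetical: hash the three inputs so client and sidecar agree on the id."""
    # A separator avoids ambiguity between e.g. ("ab", "c") and ("a", "bc").
    material = "\x1f".join((api_key, environment_id, capability)).encode("utf-8")
    return "local-" + hashlib.sha256(material).hexdigest()[:32]
```

The point of any such scheme is that the SDK client and a separately started `hai local browser|desktop` sidecar can compute the same id independently, without exchanging it.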