Local control: in-SDK sidecar + desktop/browser drivers by abonneth · Pull Request #161 · hcompai/hai-agents-python

abonneth · 2026-06-29T16:08:55Z

Made with Cursor

Note

High Risk
Introduces local desktop/browser automation (input, screenshots, subprocess/run_command, file I/O) wired into session creation by default, which is security-sensitive and can affect real user machines if misconfigured.

Overview
Adds local computer-use so agents with host: user_device environments can drive the developer’s machine from the cloud via an in-process sidecar that long-polls the platform command API and executes driver methods.

SDK wiring: Client / AsyncClient now expose wrapped agents and sessions clients that inject deterministic session_ids for local web/desktop envs (from API key + env id + capability) on create/update/patch agent and on create_session. When HAI_AUTO_SIDECAR is on (default), create_session also starts background sidecar threads for each local env (async path uses asyncio.to_thread so Chrome startup does not block the loop).
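
A minimal sketch of how a deterministic session_id could be derived from the three inputs named above. The function name and argument order come from the PR's SidecarConfig (session_id_from_environment_id(environment_id, api_key, capability)). The body, the SHA-256 pre-hash, and the uuid5 namespace are assumptions, not the SDK's actual scheme.

```python
import hashlib
import uuid

# Hypothetical namespace; the SDK's real derivation is not shown in this PR.
SESSION_NAMESPACE = uuid.NAMESPACE_URL


def session_id_from_environment_id(environment_id: str, api_key: str, capability: str) -> str:
    """Deterministic session id: the same (env, key, capability) always yields the same id."""
    # Hash the API key first so the raw secret never becomes part of the uuid5 name.
    key_digest = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    name = f"{environment_id}:{key_digest}:{capability}"
    return str(uuid.uuid5(SESSION_NAMESPACE, name))
```

Because the id is a pure function of its inputs, two independent processes (for example, the client wrapper and a separately started sidecar) can agree on it without coordinating.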

Sidecar stack: SidecarClient authenticates, ensures a trajectory channel, builds a desktop (LocalDesktopDriver / pyautogui) or browser (SeleniumWebDriver attaching to Chrome on a debug port, with optional auto-launch via ensure_local_chrome) driver, dispatches remote commands with result caching and a per-session file lock. HTTP helpers live in CommandExchange; agent payload rewriting in wiring.
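
A hypothetical sketch of a per-session, single-owner lease built on an exclusively created lock file. SidecarBusyError is the name the PR's CLI surfaces for a busy sidecar; SessionLease, the lock path, and the O_EXCL approach are illustrative assumptions, not the PR's implementation.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class SidecarBusyError(RuntimeError):
    """Another process already holds the lease for this session."""


class SessionLease:
    """Single-owner lease backed by an exclusively created lock file (sketch)."""

    def __init__(self, session_id: str, lock_dir: str | None = None) -> None:
        base = Path(lock_dir or tempfile.gettempdir())
        self.path = base / f"hai-sidecar-{session_id}.lock"
        self._held = False

    def acquire(self) -> None:
        try:
            # O_CREAT | O_EXCL fails if the file exists, so only one caller can win.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except FileExistsError:
            raise SidecarBusyError(f"lease already held: {self.path}") from None
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the owner for debugging
        self._held = True

    def release(self) -> None:
        if self._held:
            self.path.unlink(missing_ok=True)
            self._held = False

    def __enter__(self) -> SessionLease:
        self.acquire()
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.release()
```

The trade-off: O_EXCL creation is atomic on local filesystems, but a crashed owner leaves a stale file behind, so a real lease needs staleness checks (for example, on the recorded PID) that this sketch omits.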

Packaging & CLI: Optional extras hai-agents[desktop] and [browser]; wheel bundles selenium JS assets. hai local browser|desktop runs a sidecar manually for sessions started elsewhere.

Tests: Unit coverage for wiring, sidecar dispatch/lease, and driver edge cases; autouse fixture disables auto sidecars in tests.

Reviewed by Cursor Bugbot for commit 4e8401e. Bugbot is set up for automated code reviews on this repo.

Add a deny-by-default CapabilityPolicy that gates which command names a local browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage, and secrets are opt-in), a name-keyed driver registry so one package can host many drivers, and the command-name contract mirroring the hai_drivers interfaces. Co-authored-by: Cursor <cursoragent@cursor.com>

Long-polling sidecar (single-owner lease, connect-time drain, command_uid replay cache + echo), capability policy (deny-by-default with opt-ins), driver registry, pyautogui desktop driver and Selenium browser driver. Co-authored-by: Cursor <cursoragent@cursor.com>

…e open Co-authored-by: Cursor <cursoragent@cursor.com>

…+ config knobs
Policy now derives allowed commands from the driver's public methods minus the danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method lists that duplicated the drivers. Replace the driver registry with a direct lazy factory and trim SidecarConfig to essentials.

Co-authored-by: Cursor <cursoragent@cursor.com>
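
A sketch of deriving the allow-list from a driver's public methods, walking the full MRO as a later commit in this PR describes. The danger-set contents are partly hypothetical (get_cookies and set_cookies are invented for illustration), and a later commit in this PR removes CapabilityPolicy entirely.

```python
from __future__ import annotations

import inspect

# Danger sets. run_command, execute_script and enter_secret appear in this PR;
# get_cookies/set_cookies are hypothetical stand-ins for the cookie/storage commands.
SHELL = frozenset({"run_command"})
SCRIPTS = frozenset({"execute_script"})
COOKIES = frozenset({"get_cookies", "set_cookies"})
SECRETS = frozenset({"enter_secret"})


def public_methods(driver_cls: type) -> set[str]:
    """Public callables declared anywhere in the class's MRO (excluding object)."""
    names: set[str] = set()
    for klass in inspect.getmro(driver_cls):
        if klass is object:
            continue
        for name, attr in vars(klass).items():
            if not name.startswith("_") and callable(attr):
                names.add(name)
    return names


def allowed_commands(
    driver_cls: type,
    *,
    allow_shell: bool = False,
    allow_scripts: bool = False,
    allow_cookies: bool = False,
    allow_secrets: bool = False,
) -> set[str]:
    """Every public method, minus each danger set the caller did not opt into."""
    allowed = public_methods(driver_cls)
    if not allow_shell:
        allowed -= SHELL
    if not allow_scripts:
        allowed -= SCRIPTS
    if not allow_cookies:
        allowed -= COOKIES
    if not allow_secrets:
        allowed -= SECRETS
    return allowed
```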

- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages

Co-authored-by: Cursor <cursoragent@cursor.com>
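
A minimal sketch of a recursive serialize_result that makes nested driver return values JSON-safe. Encoding bytes as base64 text and the Pydantic model_dump branch are assumptions for illustration; the PR's exact conversions are not shown.

```python
import base64
from typing import Any


def serialize_result(value: Any) -> Any:
    """Convert a driver return value into JSON-safe data, recursing into containers."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, (bytes, bytearray)):
        # Assumption: binary payloads (e.g. screenshots) travel as base64 text.
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, dict):
        # Recurse into dict values; this is the case the commit says used to crash.
        return {str(k): serialize_result(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [serialize_result(v) for v in value]
    if hasattr(value, "model_dump"):  # Pydantic v2 models
        return serialize_result(value.model_dump())
    return str(value)  # last resort for unknown objects
```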

…constants Co-authored-by: Cursor <cursoragent@cursor.com>

…down
- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with an iframe guard
- extract_markdown -> Defuddle (main-content extraction, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (DOM pruned to screen bounds)
- viewport_markdown -> collectViewportHTML, then CustomMarkdownify (markdownify), with a full-page fallback
- ship JS assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>

…l` CLI
Client now injects the local session_id for any source:"local" environment on create_agent/update_agent/patch_agent and on inline-agent create_session, so callers only pass source:"local" and the env id. Adds `hai local browser` and `hai local desktop` to run the sidecar from the CLI.

Co-authored-by: Cursor <cursoragent@cursor.com>
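
A hypothetical sketch of the session_id injection for plain dict environments. At this commit the marker was source:"local"; later commits rename it to host: user_device, which the sketch uses. The field names (id, kind, host), the KIND_TO_CAPABILITY values, and the session_id_for callback are assumptions.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical values; the PR names KIND_TO_CAPABILITY but does not show its contents here.
KIND_TO_CAPABILITY = {"web": "browser", "desktop": "desktop"}


def localize_environments(envs: list[Any], session_id_for: Callable[[str, str], str]) -> list[Any]:
    """Return a copy of envs with session_id filled in for user_device dict entries."""
    localized: list[Any] = []
    for env in envs:
        if isinstance(env, dict) and env.get("host") == "user_device" and not env.get("session_id"):
            kind = env.get("kind", "web")  # a kind-less env defaults to web
            capability = KIND_TO_CAPABILITY[kind]
            # Build a new dict so the caller's payload is not mutated.
            env = {**env, "session_id": session_id_for(env["id"], capability)}
        localized.append(env)  # plain id strings and cloud envs pass through untouched
    return localized
```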

…e, typed envs)
- enter_secret clicks (x, y) to focus the target before typing, so the secret lands in the field the agent pointed at instead of wherever focus was last left.
- get_tab_title honors tab_id by switching to the tab, reading its title, and restoring the original tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they arrive as dicts or typed Pydantic models.

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-29T21:38:35Z

+        if not allow_cookies:
+            allowed -= _COOKIES
+        if not allow_secrets:
+            allowed -= _SECRETS


Script policy bypass via helpers

High Severity

With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.

Reviewed by Cursor Bugbot for commit 0d1c561.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ce SidecarBusyError in CLI
update_agent/patch_agent take agent_name positionally, but the **kwargs-only wrappers raised TypeError on a positional call; accept *args and pass them through. `hai local browser/desktop` acquired the lease inside asyncio.run, outside the guarded block, so a busy sidecar dumped a raw traceback; route it to the CLI error instead.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ingle-source capability
- sidecar: cache command_uid -> result and re-post on redelivery instead of re-running side effects (a transient result-POST failure left the command pending and re-executed it on the next poll).
- desktop driver: route keyboard through pyautogui (matches the Key contract and the remote executor); drop pynput, whose member names (enter/esc) diverge from the contract names (return/escape) and silently failed.
- policy: walk the full MRO so inherited driver methods are gated, not just those declared on the concrete class.
- config/wiring: KIND_TO_CAPABILITY is the single source; _CAPABILITIES derives from it.
- pyproject: drop unused pynput; collapse the all extra to a self-reference.

Co-authored-by: Cursor <cursoragent@cursor.com>
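
A sketch of the command_uid result cache: side effects run at most once per uid, and a redelivered uid only re-posts the stored result. It also includes the LRU bound (a hit refreshes recency) that a later commit in this PR adds. ResultCache, handle_command, and the run/post callbacks are hypothetical names.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any, Callable

_MISSING = object()  # sentinel so a cached None result still counts as a hit


class ResultCache:
    """Bounded command_uid -> result map with least-recently-used eviction."""

    def __init__(self, max_entries: int = 256) -> None:
        self._max = max_entries
        self._data: OrderedDict[str, Any] = OrderedDict()

    def lookup(self, uid: str) -> Any:
        if uid not in self._data:
            return _MISSING
        self._data.move_to_end(uid)  # a hit refreshes recency
        return self._data[uid]

    def store(self, uid: str, result: Any) -> None:
        self._data[uid] = result
        self._data.move_to_end(uid)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __contains__(self, uid: str) -> bool:
        return uid in self._data


def handle_command(
    cache: ResultCache, uid: str, run: Callable[[], Any], post: Callable[[str, Any], None]
) -> None:
    result = cache.lookup(uid)
    if result is _MISSING:
        result = run()  # side effects run at most once per command_uid
        cache.store(uid, result)
    post(uid, result)  # if this raises, a redelivery re-posts the cached result
```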

…shutdown
Remove CapabilityPolicy and the --allow-* CLI flags: the GUI keystroke and script paths reach the same surface anyway, so the gate was a formality.
- _dispatch rejects unknown/private names cleanly instead of crashing the poll loop
- build the driver only after the machine lease is acquired (no leak on a busy lease)
- SIGINT/SIGTERM now stop the sidecar cooperatively so in-flight commands finish
- floor the 429 backoff so Retry-After: 0 can't busy-loop
- guard malformed fetch bodies so a bad json() doesn't kill the loop

Co-authored-by: Cursor <cursoragent@cursor.com>
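
A minimal sketch of a floored 429 backoff. The floor and cap values are hypothetical, and only the numeric (delay-seconds) form of Retry-After is parsed; the HTTP-date form falls back to the floor in this sketch.

```python
from __future__ import annotations

MIN_BACKOFF_S = 1.0   # hypothetical floor
MAX_BACKOFF_S = 60.0  # hypothetical cap


def backoff_for_429(retry_after: str | None, attempt: int) -> float:
    """Seconds to wait after an HTTP 429, never below MIN_BACKOFF_S."""
    delay: float | None = None
    if retry_after is not None:
        try:
            delay = float(retry_after)
        except ValueError:
            delay = None  # HTTP-date form is not handled in this sketch
    if delay is None:
        delay = MIN_BACKOFF_S * (2 ** attempt)  # exponential fallback
    # The floor is what stops "Retry-After: 0" from turning into a busy loop.
    return min(MAX_BACKOFF_S, max(MIN_BACKOFF_S, delay))
```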

- desktop snapshot emits screenshot_b64 (str), the field ObservationSnapshot requires; the old screenshot_png key raised a validation error on every observe
- release_key clears its modifier bit instead of XOR-ing it (a stray release no longer flips it back on); the key mask mutates only after a successful perform
- CDP mouse events carry the buttons bitmask so drags register, and moves use button "none"
- _run_script keeps the iframe guard on during retries so transient blocks are retried
- _focus_new_tab switches to the genuinely new handle, not window_handles[-1]
- block chrome-extension/devtools/filesystem URL schemes
- a kind-less dict env defaults to web so session_id still autowires

Co-authored-by: Cursor <cursoragent@cursor.com>
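
The release_key fix can be illustrated with plain bit arithmetic. The bit values below follow the `modifiers` field documented for CDP's Input.dispatchKeyEvent; the key-name spellings and helper names are assumptions, not the driver's code.

```python
# CDP modifier bits (Alt=1, Ctrl=2, Meta/Command=4, Shift=8); key names are assumed.
MODIFIER_BITS = {"alt": 1, "control": 2, "meta": 4, "shift": 8}


def press_key(mask: int, key: str) -> int:
    """Set the key's modifier bit; non-modifier keys leave the mask unchanged."""
    return mask | MODIFIER_BITS.get(key, 0)


def release_key_xor(mask: int, key: str) -> int:
    """The buggy variant: a stray release toggles the bit back on."""
    return mask ^ MODIFIER_BITS.get(key, 0)


def release_key(mask: int, key: str) -> int:
    """Clear the key's modifier bit; releasing an unpressed key is a no-op."""
    return mask & ~MODIFIER_BITS.get(key, 0)
```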

cursor · 2026-06-30T20:57:21Z

+        self._action_builder = ActionBuilder
+        self._destroyed = False
+        self.cursor_x = 0
+        self.cursor_y = 0


Mouse position never synced

Medium Severity

SeleniumWebDriver keeps cursor_x/cursor_y for CDP mouse events but initializes them to (0, 0) and never reads the browser’s actual pointer. webpage_metadata and observation_bundle expose that stale position, and click, mouse_press, and scroll can act at the wrong coordinates when no prior mouse_move_to ran.

Additional Locations (1)

src/hai_agents/local/browser/driver.py#L538-L545

Reviewed by Cursor Bugbot for commit 7345734.

…rt.py Three single-purpose helper modules become one; the defuddle bundle is now read lazily and cached on first extract_markdown instead of at import. Co-authored-by: Cursor <cursoragent@cursor.com>

Cosmetic: module tunables read as plain UPPER_CASE. Class-private methods and driver internals keep the underscore (the dispatch firewall keys off it). Co-authored-by: Cursor <cursoragent@cursor.com>

…sktop->pyautogui_desktop Package dirs now name their implementation. CLI commands, capability strings, install extras, and class names are unchanged. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-06-30T22:51:18Z

+
+    def click(self, x: int, y: int, button: str = "left") -> None:
+        self._pyautogui.click(x=x, y=y, button=button)
+        self._settle_after_click()


Desktop click coordinate space mismatch

High Severity

get_observation_snapshot reports the cursor in screenshot pixel space (including after width downscaling via screenshot_max_width), but click, mouse_move_to, and related input helpers forward those coordinates unchanged to PyAutoGUI, which expects logical screen coordinates from _screen_size. Agents aiming from the observation image will miss clicks whenever capture dimensions differ from the stored screen size.
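
A hedged sketch of the coordinate mapping this comment implies: scale a point from observation-image pixels to the logical screen size before handing it to PyAutoGUI. screenshot_to_screen is a hypothetical helper, and it assumes the screenshot is a uniform resize of the whole screen with no cropping.

```python
from __future__ import annotations


def screenshot_to_screen(
    x: float, y: float, shot_size: tuple[int, int], screen_size: tuple[int, int]
) -> tuple[int, int]:
    """Map a point from screenshot pixels to logical screen coordinates."""
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)
    # Clamp so rounding can never push the pointer off-screen.
    return min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1)
```

For example, with a 2560x1600 screen downscaled to a 1280x800 observation, the image point (640, 400) maps to (1280, 800) on screen.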

Additional Locations (1)

src/hai_agents/local/pyautogui_desktop/driver.py#L79-L81

Reviewed by Cursor Bugbot for commit db90a85.

… cached
- localize_agent/create_agent now recurse into inline subagents, so a local browser/desktop child gets its session_id (previously only top-level environments did)
- the result cache is now LRU: a cache hit refreshes recency, so an actively redelivered command_uid is not evicted and re-executed mid-retry

Co-authored-by: Cursor <cursoragent@cursor.com>
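
A minimal sketch of the recursive walk over inline subagents, collecting user_device environments from the whole agent tree. The field names (environments, subagents, host) and the helper name are assumptions. Note the early return for string agents: that is the named-agent gap the later "Named agents skip sidecar startup" review describes.

```python
from typing import Any, Iterator


def iter_local_environments(agent: Any) -> Iterator[dict]:
    """Yield user_device environment dicts from an inline agent and all its subagents."""
    if not isinstance(agent, dict):
        return  # a registered agent passed by name carries no inline environments
    for env in agent.get("environments", []):
        if isinstance(env, dict) and env.get("host") == "user_device":
            yield env
    for sub in agent.get("subagents", []):
        yield from iter_local_environments(sub)  # depth-first into children
```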

Stored but never read (vestigial in the upstream driver too); removing it so the constructor doesn't advertise an option that does nothing. Co-authored-by: Cursor <cursoragent@cursor.com>

Matches the consolidated hai_drivers desktop interface (single screenshot_b64 method); the command proxy forwards screenshot_b64, so the desktop driver must expose it rather than screenshot_png_bytes. Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-07-01T11:52:55Z

+    def click(self, button: str = "left", delay_before_release: float = 0.05) -> None:
+        self.mouse_press(button=button)
+        time.sleep(delay_before_release)
+        self.mouse_release(button=button)


Browser sidecar click args mismatch

High Severity

The sidecar invokes driver methods by RPC name with JSON args. Desktop control uses click with x/y, and sidecar tests dispatch the same shape, but SeleniumWebDriver.click only accepts button and delay_before_release. Browser click requests carrying coordinates raise a TypeError or never move the pointer before clicking.

Reviewed by Cursor Bugbot for commit 1c267dc.

…oor empty polls
- _dispatch resolves non-callable attributes (e.g. the platform property) instead of rejecting them
- ensure_channel treats HTTP 409 as already-created
- the poll loop enforces a minimum interval when a long-poll returns empty early

Co-authored-by: Cursor <cursoragent@cursor.com>
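
A sketch of a poll loop with a minimum interval between empty polls. MIN_POLL_INTERVAL_S and the callback-style signature are hypothetical; the clock and sleep parameters exist only so the behavior can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional

MIN_POLL_INTERVAL_S = 0.5  # hypothetical floor between empty polls


def poll_loop(
    fetch: Callable[[], Optional[Any]],
    handle: Callable[[Any], None],
    stop: Callable[[], bool],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Long-poll for commands; never re-poll faster than MIN_POLL_INTERVAL_S when empty."""
    while not stop():
        started = clock()
        command = fetch()  # a long-poll that may return None early
        if command is not None:
            handle(command)
            continue  # a real command: poll again immediately
        elapsed = clock() - started
        if elapsed < MIN_POLL_INTERVAL_S:
            sleep(MIN_POLL_INTERVAL_S - elapsed)
```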

…5.18 Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-07-01T19:56:46Z

+            raise ValueError("api_key is required (or set HAI_API_KEY)")
+        if not self.session_id:
+            self.session_id = session_id_from_environment_id(self.environment_id, self.api_key, self.capability)
+        return self


Sidecar ignores API base URL

Medium Severity

SidecarConfig fills api_key from environment variables but always keeps base_url at the EU default unless passed explicitly. A sidecar started with SidecarConfig(...) while HAI_API_BASE_URL (or a non-EU SDK client) points elsewhere will poll and post results on the wrong host, so local control never attaches to the session the client created.
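
A minimal sketch of the resolution order the comment implies: an explicit argument, then HAI_API_BASE_URL, then the default. DEFAULT_BASE_URL is a placeholder, because the SDK's EU default URL is not shown in this PR, and resolve_base_url is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.example.invalid"  # placeholder for the SDK's EU default


def resolve_base_url(explicit: Optional[str] = None, env: Mapping[str, str] = os.environ) -> str:
    """Explicit value wins, then HAI_API_BASE_URL, then the default."""
    for candidate in (explicit, env.get("HAI_API_BASE_URL")):
        if candidate:
            return candidate.rstrip("/")  # normalize a trailing slash
    return DEFAULT_BASE_URL
```

Passing the SDK client's own base URL as `explicit` would keep the sidecar on the same host as the client that created the session.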

Reviewed by Cursor Bugbot for commit abf44a5.

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-07-01T21:21:32Z

+        self.mouse_press("left", click_count=1)
+        self.mouse_release("left", click_count=1)
+        self.mouse_press("left", click_count=2)
+        self.mouse_release("left", click_count=2)


Double click tab focus break

Medium Severity

double_click performs two press/release cycles via mouse_release, which is wrapped with _focus_new_tab. If the first release opens a tab, focus jumps before the second click, so the rest of the double-click runs on the wrong tab at stale coordinates.

Reviewed by Cursor Bugbot for commit 73d1fc9.

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor · 2026-07-01T23:16:38Z

+
+    running.thread = threading.Thread(target=serve, daemon=True, name=f"hai-sidecar-{config.capability}")
+    running.thread.start()
+    ready.wait(timeout=STOP_JOIN_TIMEOUT_S)


Sidecar ready before channel up

Medium Severity

_start_sidecar_thread sets ready as soon as the asyncio loop starts, then create_session continues while the sidecar may still be acquiring its lease, calling ensure_channel, or building the browser/desktop driver. Early session commands can fail or time out before the sidecar is actually polling.
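
A hypothetical sketch of signaling readiness only after setup (lease, ensure_channel, driver build) has finished, and surfacing setup failures to the caller. start_sidecar_thread and its setup/serve callbacks are illustrative names, not the PR's _start_sidecar_thread.

```python
from __future__ import annotations

import threading
from typing import Callable


def start_sidecar_thread(
    setup: Callable[[], None], serve: Callable[[], None], timeout: float = 10.0
) -> threading.Thread:
    """Start serve() in a daemon thread, returning only after setup() has finished."""
    ready = threading.Event()
    errors: list[BaseException] = []

    def run() -> None:
        try:
            setup()  # e.g. acquire lease, ensure_channel, build the driver
        except Exception as exc:
            errors.append(exc)
            ready.set()  # wake the caller so it can re-raise
            return
        ready.set()  # only now is the sidecar about to poll
        serve()

    thread = threading.Thread(target=run, daemon=True, name="hai-sidecar-sketch")
    thread.start()
    if not ready.wait(timeout):
        raise TimeoutError("sidecar setup did not finish in time")
    if errors:
        raise errors[0]
    return thread
```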

Reviewed by Cursor Bugbot for commit 576e89a.

cursor · 2026-07-01T23:16:38Z

+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.DEVNULL,
+        start_new_session=True,
+    )


Duplicate Chrome launch race

Medium Severity

ensure_local_chrome uses a check-then-launch pattern with no lock. Parallel browser sidecar threads (e.g. parent and subagent user_device web envs) can both see the debug port closed and spawn separate Chrome processes against the same profile directory and port.
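
A sketch of making check-then-launch atomic across threads with a module-level lock, holding it until the debug port answers. ensure_chrome, CHROME_LAUNCH_LOCK, and the injected launch/is_open callbacks are hypothetical. This only serializes threads within one process; two separate processes would need a cross-process lock.

```python
from __future__ import annotations

import socket
import threading
import time
from typing import Callable

CHROME_LAUNCH_LOCK = threading.Lock()  # hypothetical module-level lock


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def ensure_chrome(
    port: int,
    launch: Callable[[int], None],
    is_open: Callable[[int], bool] = port_open,
    startup_timeout: float = 15.0,
    poll_s: float = 0.1,
) -> bool:
    """Launch Chrome at most once per process; return True if this call launched it."""
    with CHROME_LAUNCH_LOCK:  # makes check-then-launch atomic across threads
        if is_open(port):
            return False
        launch(port)
        # Hold the lock until the debug port answers, so a second thread
        # cannot observe "closed" while the first Chrome is still starting.
        deadline = time.monotonic() + startup_timeout
        while not is_open(port):
            if time.monotonic() > deadline:
                raise TimeoutError(f"Chrome did not open debug port {port}")
            time.sleep(poll_s)
        return True
```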

Reviewed by Cursor Bugbot for commit 576e89a.

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 9 total unresolved issues (including 8 from previous reviews).

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 4e8401e.

cursor · 2026-07-01T23:24:52Z

+    def create_session(self, **kwargs: typing.Any) -> typing.Any:
+        if "agent" in kwargs:
+            kwargs["agent"] = localize_agent(kwargs["agent"], self._raw_client._client_wrapper._get_api_key)
+            _ensure_local_sidecars(kwargs["agent"], self._raw_client._client_wrapper)


Named agents skip sidecar startup

High Severity

With auto sidecars enabled, create_session only starts local sidecars from configs collected off the inline agent payload. A registered agent passed as a string (the usual agent="my-agent" flow) is returned unchanged by localize_agent, and catalog environment entries that are plain id strings never match user_device in _local_target, so collect_sidecar_configs is empty and no sidecar is started.

Additional Locations (2)

src/hai_agents/local/wiring.py#L21-L30

src/hai_agents/local/wiring.py#L32-L40

Reviewed by Cursor Bugbot for commit 4e8401e.

abonneth and others added 7 commits June 29, 2026 17:31

fix(local): browser destroy stops chromedriver, leaves attached Chrom…

f6d3585

refactor(local): drop leading underscore on new module-level helpers/…

a26e295

abonneth marked this pull request as ready for review June 29, 2026 18:15

abonneth requested a review from adeprezh as a code owner June 29, 2026 18:15

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/selenium_browser/driver.py

Comment thread src/hai_agents/local/browser/driver.py Outdated

Comment thread src/hai_agents/local/browser/driver.py Outdated

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/wiring.py

abonneth and others added 2 commits June 29, 2026 23:10

refactor(local): source values user_device/cloud (was local/remote)

0d1c561

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

refactor(local): source->host in autowiring

8a1810f

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jun 29, 2026

Comment thread src/hai_agents/local/browser/driver.py Outdated

abonneth and others added 4 commits June 30, 2026 13:57

cursor Bot reviewed Jun 30, 2026

abonneth and others added 2 commits June 30, 2026 23:01

refactor(local/browser): consolidate hjs/defuddle/markdown into suppo…

32ce8a5

refactor(local): drop leading underscore on module-level constants

aeca68a

cursor Bot reviewed Jun 30, 2026

Comment thread src/hai_agents/local/wiring.py Outdated

Comment thread src/hai_agents/local/sidecar.py

refactor(local): rename driver packages browser->selenium_browser, de…

db90a85

cursor Bot reviewed Jun 30, 2026

Comment thread src/hai_agents/local/selenium_browser/driver.py Outdated

refactor(local/browser): drop unused disable_html flag

2a055f1

cursor Bot reviewed Jul 1, 2026

abonneth and others added 2 commits July 1, 2026 21:49

chore(local): lock desktop/browser extras and bump ruff to pinned 0.1…

abf44a5

cursor Bot reviewed Jul 1, 2026

Log sidecar session id and enable INFO logging in hai local

73d1fc9

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jul 1, 2026

Auto-start local sidecars when a session uses a user_device environment

576e89a

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jul 1, 2026

Simplify sidecar runtime to module-level functions

4e8401e

Co-authored-by: Cursor <cursoragent@cursor.com>

cursor Bot reviewed Jul 1, 2026
