Local control: in-SDK sidecar + desktop/browser drivers #161

Open
abonneth wants to merge 26 commits into main from antoine/local-control

@abonneth abonneth commented Jun 29, 2026

Collaborator

Made with Cursor


Note

High Risk
Introduces local desktop/browser automation (input, screenshots, subprocess/run_command, file I/O) wired into session creation by default, which is security-sensitive and can affect real user machines if misconfigured.

Overview
Adds local computer-use so agents with host: user_device environments can drive the developer’s machine from the cloud via an in-process sidecar that long-polls the platform command API and executes driver methods.

SDK wiring: Client / AsyncClient now expose wrapped agents and sessions clients that inject deterministic session_ids for local web/desktop envs (from API key + env id + capability) on create/update/patch agent and on create_session. When HAI_AUTO_SIDECAR is on (default), create_session also starts background sidecar threads for each local env (async path uses asyncio.to_thread so Chrome startup does not block the loop).
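
The deterministic session_id described above can be pictured as a keyed hash over the three inputs. This is a minimal sketch, not the SDK's code: `derive_session_id`, the HMAC-SHA256 construction, and the UUID formatting are assumptions for illustration. The description only says the id is derived from API key + env id + capability.

```python
import hashlib
import hmac
import uuid


def derive_session_id(environment_id: str, api_key: str, capability: str) -> str:
    """Hypothetical derivation: the same inputs always yield the same id."""
    # Key the digest with the API key so the id cannot be computed without it.
    message = f"{capability}:{environment_id}".encode("utf-8")
    digest = hmac.new(api_key.encode("utf-8"), message, hashlib.sha256).digest()
    # Fold the first 16 bytes into a UUID-shaped string (the format is an assumption).
    return str(uuid.UUID(bytes=digest[:16]))
```

The point of a derivation like this is that the client and a separately started sidecar can each recompute the same id from the same inputs without exchanging it.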

Sidecar stack: SidecarClient authenticates, ensures a trajectory channel, builds a desktop driver (LocalDesktopDriver / pyautogui) or a browser driver (SeleniumWebDriver, attaching to Chrome on a debug port, with optional auto-launch via ensure_local_chrome), and dispatches remote commands with result caching and a per-session file lock. HTTP helpers live in CommandExchange; agent payload rewriting lives in wiring.

Packaging & CLI: Optional extras hai-agents[desktop] and [browser]; wheel bundles selenium JS assets. hai local browser|desktop runs a sidecar manually for sessions started elsewhere.

Tests: Unit coverage for wiring, sidecar dispatch/lease, and driver edge cases; autouse fixture disables auto sidecars in tests.

Reviewed by Cursor Bugbot for commit 4e8401e. Bugbot is set up for automated code reviews on this repo. Configure here.

abonneth and others added 7 commits June 29, 2026 17:31
Add a deny-by-default CapabilityPolicy that gates which command names a local
browser/desktop driver will execute (shell, arbitrary scripts, cookies/storage,
and secrets are opt-in), a name-keyed driver registry so one package can host
many drivers, and the command-name contract mirroring the hai_drivers interfaces.

Co-authored-by: Cursor <cursoragent@cursor.com>
Long-polling sidecar (single-owner lease, connect-time drain, command_uid
replay cache + echo), capability policy (deny-by-default with opt-ins),
driver registry, pyautogui desktop driver and Selenium browser driver.

Co-authored-by: Cursor <cursoragent@cursor.com>
…e open

Co-authored-by: Cursor <cursoragent@cursor.com>
…+ config knobs

Policy now derives allowed commands from the driver's public methods minus the
danger sets (shell/scripts/cookies/secrets), removing the hand-maintained method
lists that duplicated the drivers. Replace the driver registry with a direct lazy
factory and trim SidecarConfig to essentials.

Co-authored-by: Cursor <cursoragent@cursor.com>
- serialize_result recurses into dicts (fixes get_observation_snapshot crash)
- browser: reject file/chrome/js/data URLs; real markdown via markdownify; guard get_logs on CDP attach
- desktop: run_command merges os.environ instead of replacing it
- sidecar: interrupt long-poll on stop, reconnect on 404, back off on 429, tear down driver on shutdown
- drop dead dedup cache + racy drain-on-connect (server delivers one cmd at a time, fresh uid, no replay)
- split drivers into desktop/ and browser/ subpackages

Co-authored-by: Cursor <cursoragent@cursor.com>
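
A rough illustration of the URL rejection mentioned in the commit above, assuming a scheme denylist checked before navigation. The names `BLOCKED_SCHEMES` and `check_navigation_url` are hypothetical, and reading "js" as the `javascript:` scheme is also an assumption.

```python
from urllib.parse import urlsplit

# Schemes taken from the commit message; the constant name is hypothetical.
BLOCKED_SCHEMES = frozenset({"file", "chrome", "javascript", "data"})


def check_navigation_url(url: str) -> str:
    """Return the URL unchanged, or raise ValueError for a blocked scheme."""
    # urlsplit lowercases the scheme; .lower() is kept as a defensive no-op.
    scheme = urlsplit(url.strip()).scheme.lower()
    if scheme in BLOCKED_SCHEMES:
        raise ValueError(f"refusing to navigate to a {scheme}: URL")
    return url
```

A denylist like this only covers the schemes it names, so it has to grow as new ones turn up. An allowlist of `http` and `https` would be the stricter alternative.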
…constants

Co-authored-by: Cursor <cursoragent@cursor.com>
…down

- vendor h.js + defuddle.full.js; execute_script auto-injects hjs with iframe guard
- extract_markdown -> Defuddle (main-content, in-browser)
- get_viewport_html -> hjs_0x2a.collectViewportHTML() (screen-bounds pruned DOM)
- viewport_markdown -> collectViewportHTML then CustomMarkdownify (markdownify), full-page fallback
- ship js assets via wheel force-include

Co-authored-by: Cursor <cursoragent@cursor.com>
@abonneth abonneth marked this pull request as ready for review June 29, 2026 18:15
@abonneth abonneth requested a review from adeprezh as a code owner June 29, 2026 18:15
Comment thread src/hai_agents/local/selenium_browser/driver.py
Comment thread src/hai_agents/local/browser/driver.py Outdated
Comment thread src/hai_agents/local/browser/driver.py Outdated
…l` CLI

Client now injects the local session_id for any source:"local" environment on
create_agent/update_agent/patch_agent and on inline-agent create_session, so
callers only pass source:"local" and the env id. Adds `hai local browser` and
`hai local desktop` to run the sidecar from the CLI.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/hai_agents/local/wiring.py
abonneth and others added 2 commits June 29, 2026 23:10
…e, typed envs)

- enter_secret clicks (x, y) to focus the target before typing, so the secret
  lands in the field the agent pointed at instead of stale focus.
- get_tab_title honors tab_id by switching, reading, and restoring the tab.
- close_active_tab guards against an empty handle list after closing the last tab.
- localize_environments/localize_agent now wire source:"local" envs whether they
  arrive as dicts or typed Pydantic models.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/hai_agents/local/policy.py Outdated
if not allow_cookies:
    allowed -= _COOKIES
if not allow_secrets:
    allowed -= _SECRETS

Script policy bypass via helpers

High Severity

With allow_scripts disabled, CapabilityPolicy only removes execute_script, but other allowed browser driver commands such as get_viewport_html, extract_markdown, scroll_page, and observation_bundle still execute page JavaScript internally, so the CLI --allow-scripts gate does not actually block script execution.

Reviewed by Cursor Bugbot for commit 0d1c561. Configure here.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/hai_agents/local/browser/driver.py Outdated
abonneth and others added 4 commits June 30, 2026 13:57
…ce SidecarBusyError in CLI

update_agent/patch_agent take agent_name positionally; the **kwargs-only
wrappers raised TypeError on a positional call. Accept *args and pass through.

`hai local browser/desktop` acquired the lease inside asyncio.run, outside the
guarded block, so a busy sidecar dumped a raw traceback; route it to the CLI error.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ingle-source capability

- sidecar: cache command_uid -> result and re-post on redelivery instead of
  re-running side effects (a transient result-POST failure left the command
  pending and re-executed it on the next poll).
- desktop driver: route keyboard through pyautogui (matches the Key contract
  and the remote executor); drop pynput, whose member names (enter/esc) diverge
  from the contract names (return/escape) and silently failed.
- policy: walk the full MRO so inherited driver methods are gated, not just
  those declared on the concrete class.
- config/wiring: KIND_TO_CAPABILITY is the single source; _CAPABILITIES derives
  from it.
- pyproject: drop unused pynput; collapse the all extra to a self-reference.

Co-authored-by: Cursor <cursoragent@cursor.com>
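
To make the MRO point in the commit above concrete, here is a small sketch of collecting public method names across the whole class hierarchy. `public_command_names` and the demo classes are hypothetical and are not the policy code from this PR.

```python
def public_command_names(driver_cls: type) -> set:
    """Collect public callables declared anywhere in the class hierarchy."""
    names = set()
    for klass in driver_cls.__mro__:
        if klass is object:
            continue
        for name, attr in vars(klass).items():
            # Underscore-prefixed names are internals and are never commands.
            if not name.startswith("_") and callable(attr):
                names.add(name)
    return names


class BaseDriver:
    def run_command(self, command: str) -> None: ...
    def _internal(self) -> None: ...


class ChildDriver(BaseDriver):
    def click(self, x: int, y: int) -> None: ...


# Looking only at the concrete class misses the inherited method.
declared_only = {n for n in vars(ChildDriver) if not n.startswith("_")}
full = public_command_names(ChildDriver)
```

Here `declared_only` contains just `click`, while `full` also contains `run_command`. A gate built from the concrete class alone would never have seen the inherited method.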
…shutdown

Remove CapabilityPolicy and the --allow-* CLI flags: the GUI keystroke and
script paths reach the same surface anyway, so the gate was a formality.

- _dispatch rejects unknown/private names cleanly instead of crashing the poll loop
- build the driver only after the machine lease is acquired (no leak on busy lease)
- SIGINT/SIGTERM now stop the sidecar cooperatively so in-flight commands finish
- floor the 429 backoff so Retry-After: 0 can't busy-loop
- guard malformed fetch bodies so a bad json() doesn't kill the loop

Co-authored-by: Cursor <cursoragent@cursor.com>
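
The 429 floor from the list above could look roughly like the following. The constants and the function name are assumptions, and the upper cap is an addition that the commit message does not mention.

```python
import math
from typing import Optional

MIN_BACKOFF_S = 1.0   # hypothetical floor
MAX_BACKOFF_S = 60.0  # hypothetical cap, not mentioned in the commit


def backoff_seconds(retry_after: Optional[str]) -> float:
    """Turn a Retry-After header value into a bounded sleep duration."""
    try:
        requested = float(retry_after)
    except (TypeError, ValueError):
        # Missing header or an HTTP-date form: fall back to the floor.
        requested = MIN_BACKOFF_S
    if not math.isfinite(requested):
        requested = MIN_BACKOFF_S
    # The floor is what keeps "Retry-After: 0" from turning into a busy loop.
    return min(max(requested, MIN_BACKOFF_S), MAX_BACKOFF_S)
```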
- desktop snapshot emits screenshot_b64 (str), the field ObservationSnapshot
  requires; the old screenshot_png key raised a validation error on every observe
- release_key clears its modifier bit instead of XOR (stray release no longer
  flips it back on); key mask mutates only after a successful perform
- CDP mouse events carry the buttons bitmask so drags register, and moves use
  button "none"
- _run_script keeps the iframe guard on during retries so transient blocks retry
- _focus_new_tab switches to the genuinely new handle, not window_handles[-1]
- block chrome-extension/devtools/filesystem URL schemes
- a kind-less dict env defaults to web so session_id still autowires

Co-authored-by: Cursor <cursoragent@cursor.com>
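
The release_key fix in the list above comes down to clearing a bit rather than toggling it. A minimal sketch with hypothetical helper names; the bit values follow the CDP modifier bit field (Alt=1, Ctrl=2, Meta=4, Shift=8), and how the driver actually stores its mask is an assumption.

```python
ALT, CTRL, META, SHIFT = 1, 2, 4, 8


def press(mask: int, bit: int) -> int:
    return mask | bit


def release(mask: int, bit: int) -> int:
    # AND with the complement: releasing an unpressed key is a no-op.
    return mask & ~bit


def release_with_xor(mask: int, bit: int) -> int:
    # The pre-fix behaviour: a stray release turns the modifier back on.
    return mask ^ bit


held = press(press(0, SHIFT), CTRL)
after_release = release(held, SHIFT)          # only CTRL remains
stray_fixed = release(0, SHIFT)               # stays 0
stray_broken = release_with_xor(0, SHIFT)     # becomes SHIFT
```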
self._action_builder = ActionBuilder
self._destroyed = False
self.cursor_x = 0
self.cursor_y = 0

Mouse position never synced

Medium Severity

SeleniumWebDriver keeps cursor_x/cursor_y for CDP mouse events but initializes them to (0, 0) and never reads the browser’s actual pointer. webpage_metadata and observation_bundle expose that stale position, and click, mouse_press, and scroll can act at the wrong coordinates when no prior mouse_move_to ran.

Additional Locations (1)
Reviewed by Cursor Bugbot for commit 7345734. Configure here.

abonneth and others added 2 commits June 30, 2026 23:01
…rt.py

Three single-purpose helper modules become one; the defuddle bundle is now
read lazily and cached on first extract_markdown instead of at import.

Co-authored-by: Cursor <cursoragent@cursor.com>
Cosmetic: module tunables read as plain UPPER_CASE. Class-private methods and
driver internals keep the underscore (the dispatch firewall keys off it).

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/hai_agents/local/wiring.py Outdated
Comment thread src/hai_agents/local/sidecar.py
…sktop->pyautogui_desktop

Package dirs now name their implementation. CLI commands, capability strings,
install extras, and class names are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

def click(self, x: int, y: int, button: str = "left") -> None:
    self._pyautogui.click(x=x, y=y, button=button)
    self._settle_after_click()

Desktop click coordinate space mismatch

High Severity

get_observation_snapshot reports the cursor in screenshot pixel space (including after width downscaling via screenshot_max_width), but click, mouse_move_to, and related input helpers forward those coordinates unchanged to PyAutoGUI, which expects logical screen coordinates from _screen_size. Agents aiming from the observation image will miss clicks whenever capture dimensions differ from the stored screen size.
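
One possible shape for the missing conversion is sketched below, assuming the driver knows both the size of the screenshot it returned and the logical screen size. The function name and the tuple arguments are hypothetical. This is not the fix adopted in the PR.

```python
from typing import Tuple


def screenshot_to_screen(
    x: int,
    y: int,
    screenshot_size: Tuple[int, int],
    screen_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point in screenshot pixels to logical screen coordinates."""
    shot_w, shot_h = screenshot_size
    screen_w, screen_h = screen_size
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)
    # Clamp so rounding can never produce an off-screen coordinate.
    return min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1)
```

With a 2560x1440 screen captured at 1280x720, a point the agent picks at (640, 360) in the image maps to (1280, 720) on screen. When both sizes match, the mapping is the identity.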

Additional Locations (1)
Reviewed by Cursor Bugbot for commit db90a85. Configure here.

… cached

- localize_agent/create_agent now recurse into inline subagents, so a local
  browser/desktop child gets its session_id (was only top-level environments)
- the result cache is now LRU: a cache hit refreshes recency so an actively
  redelivered command_uid is not evicted and re-executed mid-retry

Co-authored-by: Cursor <cursoragent@cursor.com>
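
For reference, the LRU behaviour described in the commit above can be sketched with an `OrderedDict`. `ResultCache`, `handle_command`, and the default size are hypothetical names and values, not the sidecar's own.

```python
from collections import OrderedDict
from typing import Any, Callable

MISSING = object()  # sentinel, so a cached result of None still counts as a hit


class ResultCache:
    """Bounded command_uid -> result map with least-recently-used eviction."""

    def __init__(self, max_entries: int = 256) -> None:
        self._max_entries = max_entries
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, command_uid: str) -> Any:
        if command_uid not in self._items:
            return MISSING
        # A hit refreshes recency, so a uid being redelivered stays cached.
        self._items.move_to_end(command_uid)
        return self._items[command_uid]

    def put(self, command_uid: str, result: Any) -> None:
        self._items[command_uid] = result
        self._items.move_to_end(command_uid)
        while len(self._items) > self._max_entries:
            self._items.popitem(last=False)  # drop the least recently used entry


def handle_command(cache: ResultCache, command_uid: str, execute: Callable[[], Any]) -> Any:
    """Run a command once; on redelivery return the cached result instead."""
    cached = cache.get(command_uid)
    if cached is not MISSING:
        return cached
    result = execute()
    cache.put(command_uid, result)
    return result
```

Without the `move_to_end` on a hit, eviction would follow insertion order alone. An old but still-retried command_uid could then be pushed out by newer commands and re-executed.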
Comment thread src/hai_agents/local/selenium_browser/driver.py Outdated
Stored but never read (vestigial in the upstream driver too); removing it so the
constructor doesn't advertise an option that does nothing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Matches the consolidated hai_drivers desktop interface (single
screenshot_b64 method); the command proxy forwards screenshot_b64, so the
desktop driver must expose it rather than screenshot_png_bytes.

Co-authored-by: Cursor <cursoragent@cursor.com>
def click(self, button: str = "left", delay_before_release: float = 0.05) -> None:
    self.mouse_press(button=button)
    time.sleep(delay_before_release)
    self.mouse_release(button=button)

Browser sidecar click args mismatch

High Severity

The sidecar invokes driver methods by RPC name with JSON args. Desktop control uses click with x/y, and sidecar tests dispatch the same shape, but SeleniumWebDriver.click only accepts button and delay_before_release. Browser click requests carrying coordinates raise a TypeError or never move the pointer before clicking.

Reviewed by Cursor Bugbot for commit 1c267dc. Configure here.

abonneth and others added 2 commits July 1, 2026 21:49
…oor empty polls

- _dispatch resolves non-callable attrs (e.g. platform property) instead of rejecting them
- ensure_channel treats HTTP 409 as already-created
- poll loop enforces a minimum interval when long-poll returns empty early

Co-authored-by: Cursor <cursoragent@cursor.com>
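
A sketch of name-based dispatch as the commits describe it: underscore-prefixed and unknown names are rejected, non-callable attributes such as a platform property are returned directly, and everything else is called with the JSON args. `dispatch`, `CommandRejected`, and `DemoDriver` are hypothetical stand-ins for the real `_dispatch` and drivers.

```python
from typing import Any, Mapping

_MISSING = object()


class CommandRejected(Exception):
    """Raised for names the sidecar refuses to resolve (hypothetical type)."""


def dispatch(driver: Any, name: Any, args: Mapping[str, Any]) -> Any:
    # Underscore-prefixed names are driver internals, never reachable by RPC.
    if not isinstance(name, str) or not name or name.startswith("_"):
        raise CommandRejected(f"invalid command name: {name!r}")
    target = getattr(driver, name, _MISSING)
    if target is _MISSING:
        raise CommandRejected(f"unknown command: {name}")
    # Properties resolve to plain values: return them instead of calling.
    if not callable(target):
        return target
    return target(**args)


class DemoDriver:
    @property
    def platform(self) -> str:
        return "demo"

    def click(self, x: int, y: int, button: str = "left") -> str:
        return f"{button} click at ({x}, {y})"

    def _teardown(self) -> None:
        raise RuntimeError("internal only")
```

Raising a dedicated exception lets the poll loop report the rejection as a command result instead of dying on an `AttributeError`.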
…5.18

Co-authored-by: Cursor <cursoragent@cursor.com>
        raise ValueError("api_key is required (or set HAI_API_KEY)")
    if not self.session_id:
        self.session_id = session_id_from_environment_id(self.environment_id, self.api_key, self.capability)
    return self

Sidecar ignores API base URL

Medium Severity

SidecarConfig fills api_key from environment variables but always keeps base_url at the EU default unless passed explicitly. A sidecar started with SidecarConfig(...) while HAI_API_BASE_URL (or a non-EU SDK client) points elsewhere will poll and post results on the wrong host, so local control never attaches to the session the client created.
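
A possible resolution order is sketched here under stated assumptions: `SidecarSettings` is a stand-in dataclass, not the real SidecarConfig, and `DEFAULT_BASE_URL` is a placeholder, since the actual EU default does not appear in this thread. Only the `HAI_API_BASE_URL` variable name comes from the comment above.

```python
import os
from dataclasses import dataclass
from typing import Optional

DEFAULT_BASE_URL = "https://api.example.invalid"  # placeholder, not the real default


@dataclass
class SidecarSettings:
    """Hypothetical stand-in showing explicit > environment > default."""

    base_url: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.base_url:
            # Fall back to the environment before the built-in default.
            self.base_url = os.environ.get("HAI_API_BASE_URL") or DEFAULT_BASE_URL
```

This covers a sidecar started on its own. For auto-started sidecars, passing the client's resolved base URL explicitly would keep both sides on the same host even when the client was configured in code rather than through the environment.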

Reviewed by Cursor Bugbot for commit abf44a5. Configure here.

Co-authored-by: Cursor <cursoragent@cursor.com>
self.mouse_press("left", click_count=1)
self.mouse_release("left", click_count=1)
self.mouse_press("left", click_count=2)
self.mouse_release("left", click_count=2)

Double click tab focus break

Medium Severity

double_click performs two press/release cycles via mouse_release, which is wrapped with _focus_new_tab. If the first release opens a tab, focus jumps before the second click, so the rest of the double-click runs on the wrong tab at stale coordinates.

Reviewed by Cursor Bugbot for commit 73d1fc9. Configure here.

Comment thread src/hai_agents/local/runtime.py Outdated

running.thread = threading.Thread(target=serve, daemon=True, name=f"hai-sidecar-{config.capability}")
running.thread.start()
ready.wait(timeout=STOP_JOIN_TIMEOUT_S)

Sidecar ready before channel up

Medium Severity

_start_sidecar_thread sets ready as soon as the asyncio loop starts, then create_session continues while the sidecar may still be acquiring its lease, calling ensure_channel, or building the browser/desktop driver. Early session commands can fail or time out before the sidecar is actually polling.
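
The ordering issue can be shown with plain threads: signal readiness only after setup has finished, and surface setup failures to the caller. This is a simplified sketch with no asyncio loop; `start_worker`, `setup`, and `serve` are hypothetical names.

```python
import threading
from typing import Callable, List


def start_worker(
    setup: Callable[[], None],
    serve: Callable[[], None],
    timeout_s: float = 10.0,
) -> threading.Thread:
    """Start a background worker and return only once its setup has completed."""
    ready = threading.Event()
    errors: List[BaseException] = []

    def run() -> None:
        try:
            setup()  # e.g. acquire the lease, ensure the channel, build the driver
        except Exception as exc:
            errors.append(exc)
            ready.set()  # wake the caller so it can re-raise instead of timing out
            return
        ready.set()  # only now is the worker able to take commands
        serve()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    if not ready.wait(timeout=timeout_s):
        raise TimeoutError("worker did not become ready in time")
    if errors:
        raise errors[0]
    return thread
```

Checking the return value of `ready.wait` matters as well: ignoring it lets the caller proceed after a timeout as if the sidecar were up.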

Reviewed by Cursor Bugbot for commit 576e89a. Configure here.

    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,
)

Duplicate Chrome launch race

Medium Severity

ensure_local_chrome uses a check-then-launch pattern with no lock. Parallel browser sidecar threads (e.g. parent and subagent user_device web envs) can both see the debug port closed and spawn separate Chrome processes against the same profile directory and port.
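
A minimal sketch of closing the race between threads of one process is to hold a lock across the check and the launch. `ensure_launched`, `port_is_open`, and the injected `launch` callable are hypothetical, and `launch` is assumed to block until the port is listening.

```python
import socket
import threading
from typing import Callable, Optional

_LAUNCH_LOCK = threading.Lock()


def port_is_open(host: str, port: int, timeout_s: float = 0.2) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def ensure_launched(
    port: int,
    launch: Callable[[], None],
    is_up: Optional[Callable[[], bool]] = None,
) -> bool:
    """Launch at most once per process; returns True if this call launched."""
    check = is_up or (lambda: port_is_open("127.0.0.1", port))
    # Check and launch happen under one lock, so a second thread sees the
    # result of the first launch instead of racing it.
    with _LAUNCH_LOCK:
        if check():
            return False
        launch()
        return True
```

A `threading.Lock` only serializes threads inside one process. Two separate `hai local browser` processes would still race, which would need a cross-process lock such as a lock file.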

Reviewed by Cursor Bugbot for commit 576e89a. Configure here.

Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 9 total unresolved issues (including 8 from previous reviews).

Reviewed by Cursor Bugbot for commit 4e8401e. Configure here.

Comment thread src/hai_agents/client.py
def create_session(self, **kwargs: typing.Any) -> typing.Any:
    if "agent" in kwargs:
        kwargs["agent"] = localize_agent(kwargs["agent"], self._raw_client._client_wrapper._get_api_key)
        _ensure_local_sidecars(kwargs["agent"], self._raw_client._client_wrapper)

Named agents skip sidecar startup

High Severity

With auto sidecars enabled, create_session only starts local sidecars from configs collected off the inline agent payload. A registered agent passed as a string (the usual agent="my-agent" flow) is returned unchanged by localize_agent, and catalog environment entries that are plain id strings never match user_device in _local_target, so collect_sidecar_configs is empty and no sidecar is started.

Additional Locations (2)
Reviewed by Cursor Bugbot for commit 4e8401e. Configure here.
