feat(computer-use): integrate cua-driver-rs v0.6.8 for enhanced background automation #1324

Merged
bobleer merged 2 commits into GCWing:main from bobleer:feat/computer-use-cua-integration
Jun 27, 2026

bobleer commented Jun 26, 2026

Collaborator

Summary

Integrates key source code from cua-driver-rs v0.6.8 to enhance BitFun's built-in Computer Use capability. This adopts the open-source community's strongest Computer Use techniques for both macOS and Windows background automation.

What's Changed

macOS Background Input (fully wired)

  • SkyLight SPI bridge (macos_skylight.rs): Loads private macOS framework symbols via dlopen/dlsym for background input (SLEventPostToPid, activate_without_raise, etc.)
  • Dual-post strategy (macos_bg_input.rs): Events posted via BOTH SkyLight SLEventPostToPid AND public CGEvent::post_to_pid for maximum compatibility
  • Chromium click recipe (bg_click_chromium): 5-event sequence with routing field stamping (f0 phase, f1 clickState, f40 pid, f51/f91/f92 windowID, f58 click-group); fixes the long-standing issue of Chromium/Electron apps ignoring background clicks
  • Focus-without-raise (activate_without_raise SPI): Uses real CGWindowID from CGWindowListCopyWindowInfo to focus target app without raising its window
  • Terminal-safe typing: Detects terminal emulators by bundle ID, routes to per-keystroke key-event synthesis
  • AX tree improvements (macos_ax_dump.rs): AXChildren ∪ AXWindows union for background app tree completeness; Chromium AX enablement via AXManualAccessibility/AXEnhancedUserInterface; AXPlaceholderValue fallback
  • AX focus (macos_ax_write.rs): try_ax_focus wired into app_type_text for reliable text input
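
The terminal-safe typing rule above can be pictured as a small routing decision keyed on the bundle ID. The following is a minimal sketch, not the code in this PR: `TypingRoute`, `typing_route`, and the `Bulk` variant are hypothetical names, the non-terminal path is an assumption, and the two bundle IDs listed (Terminal and iTerm2) are only an illustrative subset.

```rust
/// How text should be delivered to the target app (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum TypingRoute {
    /// Synthesize one key-down/key-up pair per character.
    PerKeystroke,
    /// Deliver the text in one step; the actual non-terminal path is an assumption here.
    Bulk,
}

// Illustrative subset only; the real list used by the PR is not shown in this description.
const TERMINAL_BUNDLE_IDS: &[&str] = &["com.apple.Terminal", "com.googlecode.iterm2"];

/// Picks the typing route for an app identified by its bundle ID.
fn typing_route(bundle_id: &str) -> TypingRoute {
    let is_terminal = TERMINAL_BUNDLE_IDS
        .iter()
        .any(|id| id.eq_ignore_ascii_case(bundle_id));
    if is_terminal {
        TypingRoute::PerKeystroke
    } else {
        TypingRoute::Bulk
    }
}

fn main() {
    for id in ["com.apple.Terminal", "com.apple.TextEdit"] {
        println!("{id} -> {:?}", typing_route(id));
    }
}
```

Keeping the decision in a pure function like this makes it testable without posting any events.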

Windows Background Input & UI Automation (fully wired)

  • UIA batched cache (windows_ax_ui.rs): IUIAutomationCacheRequest with TreeScope_Subtree + ControlViewCondition() for efficient tree walking
  • Windows AX tree snapshot: get_app_state_snapshot with foreground pid, MSAA fallback for SAL/VCL windows
  • Background input (windows_bg_input.rs): post_click_screen, post_scroll_screen, post_drag_screen, inject_text_cloaked, inject_key_cloaked with VK parsing
  • Window capture (windows_capture.rs): PrintWindow/BitBlt with DWM crop; WindowCapture geometry for PointerMap construction
  • App enumeration (windows_list_apps.rs): EnumWindows + process image name resolution
  • Desktop host wiring (desktop_host.rs): app_click, app_type_text, app_scroll, app_key_chord, app_wait_for, interactive/visual views, list_apps, foreground window screenshot attach
  • Cross-platform drag: ComputerUseHost::drag trait method with macOS bg_drag and Windows post_drag_screen overrides
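
For the VK parsing mentioned under background input, a key chord such as `ctrl+shift+a` has to be mapped to Windows virtual-key codes before it can be injected. The sketch below shows one way this could look; `parse_vk` and `parse_chord` are hypothetical helpers (not the functions in windows_bg_input.rs), the `+` separator is an assumption, and only a handful of keys are covered. The numeric values are the standard Windows virtual-key codes.

```rust
/// Maps a single key name to a Windows virtual-key code (hypothetical helper).
fn parse_vk(name: &str) -> Option<u16> {
    let upper = name.trim().to_ascii_uppercase();
    let code = match upper.as_str() {
        "CTRL" | "CONTROL" => 0x11, // VK_CONTROL
        "SHIFT" => 0x10,            // VK_SHIFT
        "ALT" => 0x12,              // VK_MENU
        "WIN" => 0x5B,              // VK_LWIN
        "ENTER" | "RETURN" => 0x0D, // VK_RETURN
        "TAB" => 0x09,              // VK_TAB
        "ESC" | "ESCAPE" => 0x1B,   // VK_ESCAPE
        "SPACE" => 0x20,            // VK_SPACE
        _ => match upper.as_bytes() {
            // Letters and digits use their uppercase ASCII value as the VK code.
            [b] if b.is_ascii_uppercase() || b.is_ascii_digit() => *b as u16,
            _ => {
                // F1..F12 map to 0x70..0x7B.
                let n: u16 = upper.strip_prefix('F')?.parse().ok()?;
                if (1..=12).contains(&n) {
                    0x70 + n - 1
                } else {
                    return None;
                }
            }
        },
    };
    Some(code)
}

/// Parses a `+`-separated chord; returns None if any part is unknown.
fn parse_chord(chord: &str) -> Option<Vec<u16>> {
    chord.split('+').map(parse_vk).collect()
}

fn main() {
    println!("{:x?}", parse_chord("ctrl+shift+a"));
    println!("{:x?}", parse_chord("alt+F4"));
}
```

Returning `None` for the whole chord when one part is unrecognized avoids injecting a partial key combination.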

Cross-Platform Infrastructure

  • Element token system (element_token.rs): Per-pid LRU snapshot registry with opaque s{hex}:{idx} tokens — wired into get_app_state_inner on both platforms
  • Debug overlay (debug_overlay.rs): annotate_screenshot_with_click for visual click verification in debug builds
  • Chromium/Electron detection (is_chromium_electron): Bundle ID matching against Chrome, Chromium, Electron, Brave, Edge, Arc, etc.
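
To illustrate the element token idea, here is a minimal sketch of a per-pid LRU snapshot registry that hands out `s{hex}:{idx}` tokens. It is written under assumptions and does not reproduce element_token.rs: `ElementToken`, `SnapshotRegistry`, the counter-based snapshot ID, and the use of plain strings as elements are all hypothetical stand-ins.

```rust
use std::collections::{HashMap, VecDeque};

/// Parsed form of an `s{hex}:{idx}` token (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ElementToken {
    snapshot_id: u64,
    index: usize,
}

impl ElementToken {
    fn encode(&self) -> String {
        format!("s{:x}:{}", self.snapshot_id, self.index)
    }

    fn parse(s: &str) -> Option<Self> {
        let (hex, idx) = s.strip_prefix('s')?.split_once(':')?;
        Some(Self {
            snapshot_id: u64::from_str_radix(hex, 16).ok()?,
            index: idx.parse().ok()?,
        })
    }
}

/// Keeps at most `capacity` snapshots per pid; the front of each queue is the
/// least recently used snapshot and is evicted first.
struct SnapshotRegistry {
    capacity: usize,
    next_id: u64,
    by_pid: HashMap<u32, VecDeque<(u64, Vec<String>)>>,
}

impl SnapshotRegistry {
    fn new(capacity: usize) -> Self {
        Self { capacity, next_id: 1, by_pid: HashMap::new() }
    }

    /// Stores a snapshot for `pid` and returns one token per element.
    fn register(&mut self, pid: u32, elements: Vec<String>) -> Vec<String> {
        let id = self.next_id;
        self.next_id += 1;
        let tokens = (0..elements.len())
            .map(|index| ElementToken { snapshot_id: id, index }.encode())
            .collect();
        let queue = self.by_pid.entry(pid).or_default();
        queue.push_back((id, elements));
        while queue.len() > self.capacity {
            queue.pop_front();
        }
        tokens
    }

    /// Resolves a token; a lookup that finds the snapshot marks it as most recently used.
    fn resolve(&mut self, pid: u32, token: &str) -> Option<&str> {
        let tok = ElementToken::parse(token)?;
        let queue = self.by_pid.get_mut(&pid)?;
        let pos = queue.iter().position(|(id, _)| *id == tok.snapshot_id)?;
        let entry = queue.remove(pos)?;
        queue.push_back(entry);
        queue.back()?.1.get(tok.index).map(String::as_str)
    }
}

fn main() {
    let mut reg = SnapshotRegistry::new(2);
    let tokens = reg.register(42, vec!["button:OK".into(), "field:Name".into()]);
    println!("{tokens:?} -> {:?}", reg.resolve(42, &tokens[1]));
}
```

Because tokens are scoped to a pid and a snapshot, a token from an evicted snapshot or a different process simply fails to resolve instead of pointing at the wrong element.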

Code Quality

  • All new code formatted with cargo fmt / pnpm run fmt:rs
  • cargo check -p bitfun-desktop passes clean on Windows
  • 87 computer_use unit tests + 20 element_token tests pass (macOS CI baseline)
  • No breaking changes to existing API surface

Test Plan

  • cargo check -p bitfun-desktop passes (Windows verified)
  • cargo test — 87 computer_use + 20 element_token tests pass
  • Code formatted with pnpm run fmt:rs
  • Manual macOS testing: background click on Chromium/Electron apps
  • Manual macOS testing: background text input in Terminal/iTerm
  • Manual Windows testing: get_app_state, app_click, app_type_text, drag on foreground app

Source Attribution

Integrated techniques and algorithms from cua-driver-rs v0.6.8 by the trycua team, licensed under MIT.

bobleer and others added 2 commits June 27, 2026 07:34
…round automation

Port and wire the core Computer Use capabilities from cua-driver-rs v0.6.8
into BitFun's built-in desktop automation, significantly improving
background input reliability for Chromium/Electron apps, terminal
emulators, and multi-window scenarios.

macOS (fully wired and verified):
- SkyLight SPI bridge (SLEventPostToPid, CGEventSetWindowLocation,
  activate_without_raise) via dlopen/dlsym
- Dual-post strategy: events posted via BOTH SkyLight SPI and public
  CGEvent::post_to_pid for maximum target coverage
- Chromium 5-event click recipe (mouseMoved → primer → target down/up)
  with routing field stamping (f0/f1/f40/f51/f58/f91/f92)
- Focus-without-raise window activation using real CGWindowID
- Terminal-safe typing: detect terminal emulators by bundle ID and
  route to per-keystroke key-event synthesis
- AX focus (AXFocused) before text input to ensure correct field
- AXChildren ∪ AXWindows union for background app tree completeness
- Chromium AX enablement (AXManualAccessibility / AXEnhancedUserInterface)
- Fn modifier support (CGEventFlagSecondaryFn, keycode 63)
- Element token system with per-pid LRU snapshot registry
- Debug overlay screenshot annotation for click coordinates

Windows (modules ported, get_app_state wired):
- UIA batched cache request tree walk (IUIAutomationCacheRequest)
- ControlViewCondition filter replacing RawViewWalker
- Cached action detection (Invoke/Toggle/Value/Scroll/etc.)
- COM element pointer retention for pattern dispatch
- get_app_state_snapshot with AxNode conversion and SHA1 digest
- supports_ax_tree() and supports_background_input() return true
- windows_bg_input, windows_capture, windows_msaa modules ported
  (remaining app_* wiring tracked in external/WINDOWS_TODO.md)

Also includes rustfmt formatting across touched files.

Tests: 87 desktop computer_use tests + 20 element_token tests pass.
Verification: cargo check -p bitfun-desktop clean on macOS.

Complete WINDOWS_TODO.md follow-up: wire app_click/type_text/scroll/key_chord/wait_for, interactive/visual views, list_apps, window screenshot capture with PointerMap, MSAA fallback, and cross-platform drag via ComputerUseHost::drag with Windows post_drag_screen.
bobleer merged commit 63a7b81 into GCWing:main Jun 27, 2026
4 checks passed