feat(computer-use): integrate cua-driver-rs v0.6.8 for enhanced background automation #1324
Merged
…round automation

Port and wire the core Computer Use capabilities from cua-driver-rs v0.6.8 into BitFun's built-in desktop automation, significantly improving background input reliability for Chromium/Electron apps, terminal emulators, and multi-window scenarios.

macOS (fully wired and verified):
- SkyLight SPI bridge (SLEventPostToPid, CGEventSetWindowLocation, activate_without_raise) via dlopen/dlsym
- Dual-post strategy: events posted via BOTH SkyLight SPI and public CGEvent::post_to_pid for maximum target coverage
- Chromium 5-event click recipe (mouseMoved → primer → target down/up) with routing field stamping (f0/f1/f40/f51/f58/f91/f92)
- Focus-without-raise window activation using real CGWindowID
- Terminal-safe typing: detect terminal emulators by bundle ID and route to per-keystroke key-event synthesis
- AX focus (AXFocused) before text input to ensure correct field
- AXChildren ∪ AXWindows union for background app tree completeness
- Chromium AX enablement (AXManualAccessibility / AXEnhancedUserInterface)
- Fn modifier support (CGEventFlagSecondaryFn, keycode 63)
- Element token system with per-pid LRU snapshot registry
- Debug overlay screenshot annotation for click coordinates

Windows (modules ported, get_app_state wired):
- UIA batched cache request tree walk (IUIAutomationCacheRequest)
- ControlViewCondition filter replacing RawViewWalker
- Cached action detection (Invoke/Toggle/Value/Scroll/etc.)
- COM element pointer retention for pattern dispatch
- get_app_state_snapshot with AxNode conversion and SHA1 digest
- supports_ax_tree() and supports_background_input() return true
- windows_bg_input, windows_capture, windows_msaa modules ported (remaining app_* wiring tracked in external/WINDOWS_TODO.md)

Also includes rustfmt formatting across touched files.

Tests: 87 desktop computer_use tests + 20 element_token tests pass.
Verification: cargo check -p bitfun-desktop clean on macOS.
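For readers unfamiliar with the terminal-safe typing item in this commit, here is a minimal sketch of routing by bundle ID. It is not the code from this PR: `TypingStrategy`, `typing_strategy_for`, `is_terminal_emulator`, and the bundle ID list are illustrative assumptions, and the `Bulk` path for non-terminal apps is assumed, since the commit message only states what happens for terminals.

```rust
/// How text is delivered to the target app (hypothetical type, not from the PR).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum TypingStrategy {
    /// One synthesized key-down/key-up pair per character.
    PerKeystroke,
    /// Whole string delivered at once. Assumed fallback for non-terminal apps.
    Bulk,
}

/// Illustrative list only; the bundle IDs the PR actually matches are not shown in it.
const TERMINAL_BUNDLE_IDS: &[&str] = &[
    "com.apple.Terminal",
    "com.googlecode.iterm2",
    "net.kovidgoyal.kitty",
    "org.alacritty",
];

/// Returns true when the bundle ID is in the known-terminal list (exact match).
fn is_terminal_emulator(bundle_id: &str) -> bool {
    TERMINAL_BUNDLE_IDS.contains(&bundle_id)
}

/// Picks a typing strategy; apps without a bundle ID fall back to `Bulk`.
fn typing_strategy_for(bundle_id: Option<&str>) -> TypingStrategy {
    match bundle_id {
        Some(id) if is_terminal_emulator(id) => TypingStrategy::PerKeystroke,
        _ => TypingStrategy::Bulk,
    }
}
```

A caller would look up the target's bundle ID once per `app_type_text` request and branch on the returned strategy, which keeps the terminal list in one place.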
Complete WINDOWS_TODO.md follow-up: wire app_click/type_text/scroll/key_chord/wait_for, interactive/visual views, list_apps, window screenshot capture with PointerMap, MSAA fallback, and cross-platform drag via ComputerUseHost::drag with Windows post_drag_screen.
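The cross-platform drag wiring can be pictured as a trait method with a default body that platform hosts override. The sketch below is a trimmed-down stand-in, not the real trait: the `Point` and `DragError` types, the method signature, the `Unsupported` default, and both host structs are assumptions made for illustration, and `post_drag_screen` here only records its arguments instead of posting input.

```rust
/// Screen-space point in pixels (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

#[derive(Debug, PartialEq)]
enum DragError {
    Unsupported,
}

/// Simplified stand-in for the host trait; the real signature may differ.
trait ComputerUseHost {
    /// Default body: hosts without a background drag path report it as unsupported.
    fn drag(&mut self, _from: Point, _to: Point) -> Result<(), DragError> {
        Err(DragError::Unsupported)
    }
}

/// Stand-in for a Windows host; stores requested drags instead of posting input.
struct RecordingWindowsHost {
    posted: Vec<(Point, Point)>,
}

impl RecordingWindowsHost {
    /// Placeholder for the real `post_drag_screen`; it only records the request.
    fn post_drag_screen(&mut self, from: Point, to: Point) {
        self.posted.push((from, to));
    }
}

impl ComputerUseHost for RecordingWindowsHost {
    /// Platform override: forward the drag to the platform-specific helper.
    fn drag(&mut self, from: Point, to: Point) -> Result<(), DragError> {
        self.post_drag_screen(from, to);
        Ok(())
    }
}

/// A host that keeps the default `drag` and therefore reports `Unsupported`.
struct HeadlessHost;

impl ComputerUseHost for HeadlessHost {}
```

With this shape, tool code calls `host.drag(from, to)` without platform checks, and a platform that has not wired a drag path yet fails with an explicit error rather than silently doing nothing.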
Summary
Integrates key source code from cua-driver-rs v0.6.8 to enhance BitFun's built-in Computer Use capability. This adopts the open-source community's strongest Computer Use techniques for both macOS and Windows background automation.
What's Changed
macOS Background Input (fully wired)
- `macos_skylight.rs`: Loads private macOS framework symbols via `dlopen`/`dlsym` for background input (`SLEventPostToPid`, `activate_without_raise`, etc.)
- `macos_bg_input.rs`: Events posted via BOTH SkyLight `SLEventPostToPid` AND public `CGEvent::post_to_pid` for maximum compatibility
- `bg_click_chromium`: 5-event sequence with routing field stamping (f0 phase, f1 clickState, f40 pid, f51/f91/f92 windowID, f58 click-group); solves the long-standing issue of Chromium/Electron apps ignoring background clicks
- `activate_without_raise` SPI: Uses real CGWindowID from `CGWindowListCopyWindowInfo` to focus target app without raising its window
- `macos_ax_dump.rs`: `AXChildren ∪ AXWindows` union for background app tree completeness; Chromium AX enablement via `AXManualAccessibility`/`AXEnhancedUserInterface`; `AXPlaceholderValue` fallback
- `macos_ax_write.rs`: `try_ax_focus` wired into `app_type_text` for reliable text input

Windows Background Input & UI Automation (fully wired)
- `windows_ax_ui.rs`: `IUIAutomationCacheRequest` with `TreeScope_Subtree` + `ControlViewCondition()` for efficient tree walking
- `get_app_state_snapshot` with foreground pid, MSAA fallback for SAL/VCL windows
- `windows_bg_input.rs`: `post_click_screen`, `post_scroll_screen`, `post_drag_screen`, `inject_text_cloaked`, `inject_key_cloaked` with VK parsing
- `windows_capture.rs`: PrintWindow/BitBlt with DWM crop; `WindowCapture` geometry for `PointerMap` construction
- `windows_list_apps.rs`: `EnumWindows` + process image name resolution
- `desktop_host.rs`: `app_click`, `app_type_text`, `app_scroll`, `app_key_chord`, `app_wait_for`, interactive/visual views, `list_apps`, foreground window screenshot attach
- `ComputerUseHost::drag` trait method with macOS `bg_drag` and Windows `post_drag_screen` overrides

Cross-Platform Infrastructure
- `element_token.rs`: Per-pid LRU snapshot registry with opaque `s{hex}:{idx}` tokens, wired into `get_app_state_inner` on both platforms
- `debug_overlay.rs`: `annotate_screenshot_with_click` for visual click verification in debug builds
- `is_chromium_electron`: Bundle ID matching against Chrome, Chromium, Electron, Brave, Edge, Arc, etc.

Code Quality
- `cargo fmt` / `pnpm run fmt:rs`
- `cargo check -p bitfun-desktop` passes clean on Windows

Test Plan
- `cargo check -p bitfun-desktop` passes (Windows verified)
- `cargo test`: 87 computer_use + 20 element_token tests pass
- `pnpm run fmt:rs`
- `get_app_state`, `app_click`, `app_type_text`, drag on foreground app

Source Attribution
Integrated techniques and algorithms from cua-driver-rs v0.6.8 by the trycua team, licensed under MIT.
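
For reviewers who want a concrete picture of the element token registry listed under Cross-Platform Infrastructure, the following is a minimal sketch of a per-pid LRU snapshot registry that hands out `s{hex}:{idx}` tokens. It is an illustration under stated assumptions, not the contents of `element_token.rs`: the `SnapshotRegistry` and `Snapshot` types, the counter-based snapshot ID, the `String` element placeholders, and the rule that resolving a token refreshes its snapshot are all hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// One captured UI tree; `elements` stands in for real node handles.
struct Snapshot {
    id: u64,
    elements: Vec<String>,
}

/// Per-pid LRU registry: each pid keeps at most `capacity` snapshots.
struct SnapshotRegistry {
    capacity: usize,
    next_id: u64, // assumption: IDs come from a counter, rendered as hex
    by_pid: HashMap<i32, VecDeque<Snapshot>>, // front = least recently used
}

impl SnapshotRegistry {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be at least 1");
        Self { capacity, next_id: 1, by_pid: HashMap::new() }
    }

    /// Stores a snapshot and returns one token per element, in order.
    fn register(&mut self, pid: i32, elements: Vec<String>) -> Vec<String> {
        let id = self.next_id;
        self.next_id += 1;
        let tokens: Vec<String> = (0..elements.len())
            .map(|idx| format!("s{id:x}:{idx}"))
            .collect();
        let queue = self.by_pid.entry(pid).or_default();
        queue.push_back(Snapshot { id, elements });
        while queue.len() > self.capacity {
            queue.pop_front(); // evict the least recently used snapshot
        }
        tokens
    }

    /// Resolves a token for `pid` and marks its snapshot as most recently used.
    fn resolve(&mut self, pid: i32, token: &str) -> Option<&str> {
        let (id, idx) = parse_token(token)?;
        let queue = self.by_pid.get_mut(&pid)?;
        let pos = queue.iter().position(|s| s.id == id)?;
        let snapshot = queue.remove(pos)?;
        queue.push_back(snapshot);
        queue.back()?.elements.get(idx).map(String::as_str)
    }
}

/// Parses `s{hex}:{idx}` into (snapshot id, element index).
fn parse_token(token: &str) -> Option<(u64, usize)> {
    let (hex, idx) = token.strip_prefix('s')?.split_once(':')?;
    Some((u64::from_str_radix(hex, 16).ok()?, idx.parse().ok()?))
}
```

Scoping the registry by pid means a token from one app cannot resolve against another app's snapshot, and a token whose snapshot has been evicted resolves to `None`, which a caller can treat as a signal to re-run `get_app_state`.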