Claudex Computer Use uses a mixed-mode approach: AX (Accessibility) actions when available, with CGEvent injection and coordinate clicks as fallback. Different apps expose varying levels of AX support, which directly affects reliability.
| Tier | Description | Examples |
|---|---|---|
| Stable | Rich AX tree, element-index clicks, set_value works, screenshots reliable | Safari, Notes, TextEdit, Calculator, Finder, System Settings |
| Limited | Partial AX tree, some actions need coordinate fallback or pasteboard workarounds | Chrome, Edge, Slack, VS Code, Cursor |
| Best-effort | Sparse or missing AX tree, CGEvent may be dropped, use at your own risk | WeChat, Feishu/Lark, self-drawn/canvas apps |
| App | Bundle ID | Read | Click | Type | Scroll | Set Value | Notes |
|---|---|---|---|---|---|---|---|
| Safari | com.apple.Safari |
Full AX tree with web area | element_index + coordinate | unicodeEvent for ASCII, pasteboard for CJK | AX action first, event fallback | Settable on address bar and some web fields | Best-supported browser. Use cmd+l to focus address bar, element_index for toolbar and page links. |
| Notes | com.apple.Notes |
Full AX tree | element_index | set_value on text area is most reliable |
AX action | Full set_value support on note body text area |
Hero demo app. Prefer set_value over type_text for content writing. |
| TextEdit | com.apple.TextEdit |
Full AX tree | element_index | set_value on text area, unicodeEvent for ASCII |
AX action | Full support | Similar to Notes. Use set_value for document body. |
| Calculator | com.apple.calculator |
Full AX tree with semantic IDs | element_index (AXPress) | N/A (button-based) | N/A | N/A | Pure AXPress workflow. Each button has a semantic identifier (AllClear, Delete, digit buttons). |
| Finder | com.apple.finder |
Full AX tree with outline/row elements | element_index for sidebar, coordinate for content area | cmd+shift+g for Go To Folder is most reliable |
AX action | Limited | Use keyboard shortcuts (cmd+shift+g, cmd+n) over tree navigation. Re-snapshot after folder navigation. |
| System Settings | com.apple.systempreferences |
Full AX tree | element_index | Limited (mostly click-navigate) | AX action | Some toggle switches are settable | Good for demonstrating navigation. Sidebar + content pane structure. |
| App | Bundle ID | Read | Click | Type | Scroll | Set Value | Notes |
|---|---|---|---|---|---|---|---|
| Chrome | com.google.Chrome |
AX tree with web area (can be large) | element_index for chrome UI, coordinate for page content | pasteboard recommended for non-ASCII | Event injection (AX scroll limited in web content) | Limited to address bar | Web area can produce 1000+ elements. Snapshot aggressively. |
| Edge | com.microsoft.edgemac |
Similar to Chrome | Similar to Chrome | pasteboard recommended | Event injection | Limited | Chromium-based, same behavior as Chrome. |
| VS Code | com.microsoft.VSCode |
Partial AX tree (editor area sparse) | element_index for sidebar/tabs, coordinate for editor | pasteboard for all text | Event injection | Very limited | Electron app. Editor canvas has weak AX fidelity. Prefer keyboard shortcuts. |
| Slack | com.tinyspeck.slackmacgap |
Partial AX tree (after AXManualAccessibility) | element_index for channel list, coordinate for messages | pasteboard | Limited | Limited | Electron. Main message area may be missing from AX. Prefer cmd+k for navigation. |
| Cursor | com.todesktop.runtime.cursor |
Similar to VS Code | Similar to VS Code | pasteboard | Event injection | Very limited | Electron. Same limitations as VS Code. |
| App | Bundle ID | Read | Click | Type | Scroll | Set Value | Notes |
|---|---|---|---|---|---|---|---|
com.tencent.xinWeChat |
~13 elements (self-drawn UI) | Unreliable (CGEvent may be dropped) | Unreliable | Unreliable (CGEvent scroll ignored) | No settable elements | Self-drawn UI. AX tree exposes window frame but not content. Not recommended for automation. | |
| Feishu/Lark | com.bytedance.lark |
Sparse AX tree | Coordinate click only, timing-dependent | pasteboard + cmd+v, timing-dependent focus |
Unreliable | Very limited | Electron with heavy custom rendering. Input focus is fragile. Works sometimes, fails unpredictably. |
| Discord | com.hnc.Discord |
Partial AX tree | Coordinate fallback needed | pasteboard | Unreliable | Limited | Electron. Similar to Slack but less AX exposure. |
- element_index (preferred): Uses AXPress or AX action. Works without moving the visible cursor. Most reliable for apps with good AX trees.
- coordinate click: Uses
CGEventPostToPid(). Does not move the system cursor. Works on most apps but may fail on self-drawn UI.
- set_value (most reliable): Directly writes to AXValue. No keyboard focus needed. No clipboard interference. Works on native text fields/areas.
- pasteboard (
strategy=pasteboard): Copies to clipboard, sendscmd+v. Works across most apps including Electron. Restores clipboard after. Requires keyboard focus in target field. - unicodeEvent (
strategy=unicodeEvent): Sends CGEvent keyboard events per character. Works for ASCII in native apps. May fail with IME active or in Electron apps. - auto (default): Chooses pasteboard for Electron apps and non-ASCII text, unicodeEvent otherwise.
- AX action (preferred): Uses
AXScrollUpByPage/AXScrollDownByPage. Works without focus. Preferred when available. - Event injection: Sends
CGEventscroll wheel events to the target PID. May require momentary focus in some apps.
- background (default): Events posted to target PID via
CGEventPostToPid(). System cursor and focus are not moved. - direct: Activates the target app briefly, performs the action, then optionally restores focus. Use when background delivery fails (some Electron apps).
-
Minimized windows: macOS does not maintain a render buffer for minimized windows. Screenshots and some AX operations fail. The app must be open (can be behind other windows, just not minimized).
-
IME interference: Chinese/Japanese/Korean input methods intercept CGEvent keyboard events. Always use
strategy=pasteboardfor CJK text. -
Focus fragility in Electron apps: CGEvent keyboard events reach the process but may not route to the correct input field inside a WebView. Use
delivery=directas the main workaround; it restores focus by default, and you can setrestore_focus=falseonly when you intentionally want the target app to stay frontmost. -
AX tree size: Some apps (especially browsers with complex pages) can produce 2000+ AX elements. Claudex Computer Use caps at 3000 nodes and 12 depth levels.
-
Transient UI: Sheets, popovers, and modal dialogs can replace the main window's AX tree. Always re-snapshot after navigation or dismissal.
-
Screen Recording permission: Required for screenshots. Without it,
get_app_statestill returns the AX tree but no screenshot.
Claudex Computer Use's mixed-mode approach is consistent with other macOS background computer-use tools:
- All use AXPress/AXValue as the primary interaction path
- All fall back to CGEvent coordinate clicks
- All struggle with self-drawn UI (WeChat, canvas apps)
- All require Accessibility + Screen Recording permissions
- Background operation means "not frontmost" but "not minimized"
The honest boundary: native AppKit/SwiftUI apps are well-supported, Electron apps are workable with workarounds, and self-drawn apps are best-effort. This is a platform limitation, not a tool limitation.