OpenAdapt Capture is the data collection component of the OpenAdapt GUI automation ecosystem.
Capture platform-agnostic GUI interaction streams with time-aligned screenshots and audio for training ML models or replaying workflows.
Status: Pre-alpha.
```
OpenAdapt GUI Automation Pipeline
=================================

+-----------------+           +------------------+           +------------------+
|                 |           |                  |           |                  |
|  openadapt-     |  ------>  |  openadapt-ml    |  ------>  |  Deploy          |
|  capture        |  Convert  |  (Train & Eval)  |  Export   |  (Inference)     |
|                 |           |                  |           |                  |
+-----------------+           +------------------+           +------------------+
        |                              |                              |
        v                              v                              v
- Record GUI                  - Fine-tune VLMs               - Run trained
  interactions                - Evaluate on                    agent on new
- Mouse, keyboard,              benchmarks (WAA)                tasks
  screen, audio               - Compare models                - Real-time
- Privacy scrubbing           - Cloud GPU training              automation
```
| Component | Purpose | Repository |
|---|---|---|
| openadapt-capture | Record human demonstrations | GitHub |
| openadapt-ml | Train and evaluate GUI automation models | GitHub |
| openadapt-privacy | PII scrubbing for recordings | GitHub |
```bash
uv add openadapt-capture
```

This includes everything needed to capture and replay GUI interactions (mouse, keyboard, screen recording).

For audio capture with Whisper transcription (large download):

```bash
uv add "openadapt-capture[audio]"
```

To record a capture:

```python
from openadapt_capture import Recorder

# Record GUI interactions
with Recorder("./my_capture", task_description="Demo task") as recorder:
    # Captures mouse, keyboard, and screen until context exits
    input("Press Enter to stop recording...")
```
To load a capture back:

```python
from openadapt_capture import Capture

# Load and iterate over time-aligned events
capture = Capture.load("./my_capture")
for action in capture.actions():
    # Each action has an associated screenshot
    print(f"{action.timestamp}: {action.type} at ({action.x}, {action.y})")
    screenshot = action.screenshot  # PIL Image at time of action
```
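Because every action carries a time-aligned screenshot, a capture can be flattened into a simple observation/action dataset for downstream training. The snippet below is a rough sketch that only assumes the attributes shown above (`timestamp`, `type`, `x`, `y`, `screenshot`); the JSONL layout and file names are illustrative, not a format defined by openadapt-capture:

```python
import json
from pathlib import Path

from openadapt_capture import Capture

capture = Capture.load("./my_capture")
out_dir = Path("./dataset")
out_dir.mkdir(exist_ok=True)

with open(out_dir / "actions.jsonl", "w") as f:
    for i, action in enumerate(capture.actions()):
        # Save the screenshot and a JSON record that references it.
        frame_path = out_dir / f"frame_{i:06d}.png"
        action.screenshot.save(frame_path)
        f.write(json.dumps({
            "timestamp": action.timestamp,
            "type": action.type,
            "x": action.x,          # may be None for keyboard actions
            "y": action.y,
            "screenshot": frame_path.name,
        }) + "\n")
```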
For lower-level access, the SQLAlchemy database layer can be used directly:

```python
from openadapt_capture.db import create_db, get_session_for_path
from openadapt_capture.db import crud
from openadapt_capture.db.models import Recording, ActionEvent

# Create a database
engine, Session = create_db("/path/to/recording.db")
session = Session()

# Insert a recording
recording = crud.insert_recording(session, {
    "timestamp": 1700000000.0,
    "monitor_width": 1920,
    "monitor_height": 1080,
    "platform": "win32",
    "task_description": "My task",
})

# Insert events
crud.insert_action_event(session, recording, 1700000001.0, {
    "name": "click",
    "mouse_x": 100.0,
    "mouse_y": 200.0,
    "mouse_button_name": "left",
    "mouse_pressed": True,
})

# Query events back
from openadapt_capture.capture import CaptureSession

capture = CaptureSession.load("/path/to/capture_dir")
actions = list(capture.actions())
```

Raw events (captured):

- `mouse.move`, `mouse.down`, `mouse.up`, `mouse.scroll`
- `key.down`, `key.up`

Actions (processed):

- `mouse.singleclick`, `mouse.doubleclick`, `mouse.drag`
- `key.type` (merged keystrokes into text)
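To make the raw-to-action distinction concrete, here is a self-contained toy sketch of the kind of merging involved (illustrative only, not the library's actual processing code): consecutive keystrokes collapse into one `key.type` action, and a press/release pair becomes a `mouse.singleclick`.

```python
# Illustrative only: fold raw events into processed actions.
raw_events = [
    {"name": "key.down", "key": "h"}, {"name": "key.up", "key": "h"},
    {"name": "key.down", "key": "i"}, {"name": "key.up", "key": "i"},
    {"name": "mouse.down", "x": 100, "y": 200},
    {"name": "mouse.up", "x": 100, "y": 200},
]

actions, text = [], ""
for event in raw_events:
    if event["name"] == "key.down":
        text += event["key"]                      # accumulate keystrokes
    elif event["name"] == "mouse.up":
        if text:                                  # flush pending typed text first
            actions.append({"name": "key.type", "text": text})
            text = ""
        actions.append({"name": "mouse.singleclick", "x": event["x"], "y": event["y"]})
if text:
    actions.append({"name": "key.type", "text": text})

print(actions)
# [{'name': 'key.type', 'text': 'hi'}, {'name': 'mouse.singleclick', 'x': 100, 'y': 200}]
```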
The recorder uses a multi-process architecture copied from legacy OpenAdapt (a simplified sketch of the pattern follows the list below):
- Reader threads: Capture mouse, keyboard, screen, and window events into a central queue
- Processor thread: Routes events to type-specific write queues
- Writer processes: Persist events to SQLAlchemy DB (one process per event type)
- Action-gated video: Only encodes video frames when user actions occur
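The sketch below shows the general shape of that pipeline: reader threads feed a central queue, a processor thread routes events to per-type write queues, and one writer process per type drains its queue. It is a simplified illustration of the pattern, not the recorder's actual code; the real writers persist to the SQLAlchemy database rather than printing.

```python
# Minimal sketch of the reader -> processor -> writer pipeline described above.
import multiprocessing as mp
import threading
import time

def writer(event_type, queue):
    """Writer process: persist events of one type (here, just print them)."""
    while (event := queue.get()) is not None:
        print(f"[{event_type} writer] persisting {event}")

def main():
    central = mp.Queue()
    write_queues = {"action": mp.Queue(), "window": mp.Queue()}
    writers = [mp.Process(target=writer, args=(t, q)) for t, q in write_queues.items()]
    for p in writers:
        p.start()

    def reader():
        # Reader thread: push captured events into the central queue.
        for i in range(3):
            central.put({"type": "action", "timestamp": time.time(), "i": i})
        central.put(None)  # sentinel: capture finished

    def processor():
        # Processor thread: route events to the type-specific write queue.
        while (event := central.get()) is not None:
            write_queues[event["type"]].put(event)
        for q in write_queues.values():
            q.put(None)  # tell writers to shut down

    threads = [threading.Thread(target=reader), threading.Thread(target=processor)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for p in writers:
        p.join()

if __name__ == "__main__":
    main()
```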
```
capture_directory/
├── recording.db            # SQLite: events, screenshots, window events, perf stats
├── oa_recording-{ts}.mp4   # Screen recording (action-gated)
└── audio.flac              # Audio (optional)
```
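As a quick sanity check, recording.db can be opened read-only with Python's standard sqlite3 module to list its tables (the exact table names depend on the schema, so treat this as an inspection sketch rather than documented structure):

```python
import sqlite3

# Open the capture's SQLite database read-only and list its tables.
conn = sqlite3.connect("file:my_capture/recording.db?mode=ro", uri=True)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
print(tables)  # table names depend on the schema
conn.close()
```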
Run a performance test with synthetic input:
```bash
uv run python scripts/perf_test.py
```

This records for 10 seconds using pynput Controllers (sketched after the list below), then reports:
- Wall/CPU time and memory usage
- Event counts and action types
- Output file sizes
- Memory usage plot (saved to capture directory)
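For reference, synthetic input of the kind the perf test relies on can be generated with pynput Controllers along these lines (an illustrative sketch, not the contents of scripts/perf_test.py):

```python
import time

from pynput.keyboard import Controller as KeyboardController
from pynput.mouse import Controller as MouseController

mouse = MouseController()
keyboard = KeyboardController()

# Generate about ten seconds of synthetic input while a Recorder is running.
end = time.time() + 10
i = 0
while time.time() < end:
    mouse.position = (100 + (i % 200), 100 + (i % 200))  # move the cursor
    if i % 50 == 0:
        keyboard.type("test")                            # emit keystrokes
    i += 1
    time.sleep(0.01)
```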
Run integration tests (requires accessibility permissions):
```bash
uv run pytest tests/test_performance.py -v -m slow
```

Generate animated demos and interactive viewers from recordings:
Animated GIF demo:

```python
from openadapt_capture import Capture, create_demo

capture = Capture.load("./my_capture")
create_demo(capture, output="demo.gif", fps=10, max_duration=15)
```

Interactive HTML viewer:

```python
from openadapt_capture import Capture, create_html

capture = Capture.load("./my_capture")
create_html(capture, output="viewer.html", include_audio=True)
```

Share recordings between machines using Magic Wormhole:
```bash
# On the sending machine
capture share send ./my_capture
# Shows a code like: 7-guitarist-revenge

# On the receiving machine
capture share receive 7-guitarist-revenge
```

The share command compresses the recording, sends it via Magic Wormhole, and extracts it on the receiving end. No account or setup required - just share the code.
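Conceptually the flow is: archive the capture directory, hand the archive to Magic Wormhole, and unpack it on the other side. A rough approximation using the standard library and the `wormhole` CLI from the magic-wormhole package (not the share command's actual implementation):

```python
import shutil
import subprocess

# Compress the capture directory into a single archive...
archive = shutil.make_archive("my_capture", "gztar", root_dir="./my_capture")

# ...and send it with the magic-wormhole CLI; it prints a one-time code
# (e.g. "7-guitarist-revenge") that the receiver passes to `wormhole receive`.
subprocess.run(["wormhole", "send", archive], check=True)
```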
| Extra | Features |
|---|---|
| `audio` | Audio capture + Whisper transcription |
| `privacy` | PII scrubbing (openadapt-privacy) |
| `share` | Recording sharing via Magic Wormhole |
| `all` | Everything |
```bash
uv sync --dev
uv run pytest tests/ -v --ignore=tests/test_browser_bridge.py

# Run slow integration tests (requires accessibility permissions)
uv run pytest tests/ -v -m slow
```

- openadapt-ml - Train and evaluate GUI automation models
- openadapt-privacy - PII detection and scrubbing for recordings
- openadapt-evals - Benchmark evaluation for GUI agents
- Windows Agent Arena - Benchmark for Windows GUI agents
MIT