6 changes: 6 additions & 0 deletions .factory-plugin/marketplace.json
@@ -24,6 +24,12 @@
"source": "./plugins/core",
"category": "core"
},
{
"name": "droid-control",
"description": "Terminal, browser, and computer automation for testing, demos, QA, and computer-use tasks",
"source": "./plugins/droid-control",
"category": "automation"
},
{
"name": "autoresearch",
"description": "Autonomous experiment loop for optimization research. Try an idea, measure it, keep what works, discard what doesn't, repeat. Works standalone or as a mission worker.",
14 changes: 14 additions & 0 deletions README.md
@@ -56,6 +56,20 @@ Skills for continuous learning and improvement.
- `frontend-design` - Build web apps, websites, HTML pages with good design
- `browser-navigation` - Browser automation with agent-browser

### droid-control

Terminal, browser, and computer automation for Droids. Record demos, verify behavior claims, and run QA flows.

**Commands:** `/demo`, `/verify`, `/qa-test`

**Skills:** `droid-control` (orchestrator), `tuistory`, `true-input`, `agent-browser`, `droid-cli`, `pty-capture`, `capture`, `compose`, `verify`, `showcase`

See [plugins/droid-control/README.md](plugins/droid-control/README.md) for details.

### autoresearch

Autonomous experiment loop for optimization research. Try an idea, measure it, keep what works, discard what doesn't, repeat. Works standalone or as a mission worker.

## Plugin Structure

Each plugin follows the Factory plugin format:
5 changes: 5 additions & 0 deletions plugins/droid-control/.factory-plugin/plugin.json
@@ -0,0 +1,5 @@
{
"name": "droid-control",
  "description": "Terminal, browser, and computer automation for testing, demos, QA, and computer-use tasks",
"version": "1.0.0"
}
59 changes: 59 additions & 0 deletions plugins/droid-control/ARCHITECTURE.md
@@ -0,0 +1,59 @@
# Architecture

## The problem

Put all driver knowledge, recording lifecycle, video rendering, and verification logic in one skill and two things happen. First, the droid loads 3000 tokens of Windows KVM docs to record a tuistory demo on Linux. Second, and this is the one that actually hurts, the droid gets all the information at once and has to figure out what's relevant *right now*. It makes worse decisions, skips steps it shouldn't, and invents steps it doesn't need to.

## UX for droids

Droids aren't exempt from information architecture. They have finite context, they get distracted by irrelevant detail, and they degrade under overload.

Every skill in this plugin is a surface the droid interacts with at a specific moment in a workflow: scoped to what it needs right now, actionable on first read, with an explicit handoff to the next surface. The same instincts that make a good CLI or a good API apply. Don't dump everything. Sequence the information. Make the next step obvious.

A command like `/demo` doesn't contain Remotion props or driver-specific logic. It parses intent, builds commitments, and tells the droid which skills to load. The droid never sees video rendering details while it's still planning what to record.

## Waterfall routing

Skills chain into each other without hardcoding. The orchestrator doesn't call skills or build a pipeline. It tells the droid which skills to load based on three independent lookups. Once loaded, each skill's exit is the next skill's entry:

```
/demo parses intent, commits deliverables
→ orchestrator routes to driver + stage + artifact skills
→ capture launches the app, records, hands off clips + metadata
→ compose receives clips, builds Remotion props, renders video
→ verify checks the output against the original commitments
```

No state machine or orchestration framework. Just documents whose outputs naturally feed into the next document's inputs. The droid follows the waterfall because each skill makes the next step obvious, not because something forces it to. Complex multi-stage workflows emerge from skill composition rather than control flow.

## Task delegation

Because each stage's inputs and outputs are explicit, mechanical work naturally decomposes into worker tasks. The parent agent retains planning and editorial control; workers execute exact commands and return file paths.

Capture workers for both branches run in parallel. They need tctl commands and worktree paths, not PR context. The render worker gets a props JSON and clip paths. It doesn't need to know what the PR does or why the demo matters. Verification stays with the parent because it requires the original commitments and judgment about whether the evidence holds up.

The skill boundaries are the delegation boundaries. You don't need a separate delegation framework because the decomposition into self-contained stages already defines what can be farmed out, what needs creative judgment, and what's too trivial to be worth the overhead.

## Orthogonal routing

The orchestrator makes three independent lookups:

- **Target**: what are you driving? (droid-cli, other TUI, web app, byte capture)
- **Stage**: what does the workflow need? (capture, compose, verify)
- **Artifact**: does compose need polish tools? (showcase)

These compose without cross-product explosion. 6 targets + 3 stages + 1 artifact route = 10 skills, not 18. Adding a new target means writing one skill and adding one row to the routing table. Every existing stage and artifact skill works with it immediately.
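In shell terms, each lookup can be pictured as an independent map from one request attribute to one skill. This is a minimal sketch, not the orchestrator's actual routing table; the case labels (`tui`, `web`, `bytes`) are illustrative shorthand:

```shell
#!/bin/sh
# Hypothetical sketch of the two main lookups. Skill names match the
# README's skill list; the input labels are assumptions.
route_target() {
  case "$1" in
    droid-cli) echo "droid-cli" ;;      # the droid CLI itself
    tui)       echo "tuistory" ;;       # other terminal UIs
    web)       echo "agent-browser" ;;  # web and Electron apps
    bytes)     echo "pty-capture" ;;    # raw byte capture
    *)         echo "unknown" ;;
  esac
}

route_stage() {
  case "$1" in
    capture|compose|verify) echo "$1" ;;
    *)                      echo "unknown" ;;
  esac
}

# Adding a target touches only route_target; every existing stage
# composes with it immediately, which is why the skill count stays
# additive rather than multiplicative.
route_target web      # -> agent-browser
route_stage compose   # -> compose
```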

## Hybrid handoffs

Commands hand off to the compose stage with two sections: structured fields for mechanical decisions, natural language for creative ones.

Structured: layout, speed, preset, fidelity, effects tier. These have correct answers. A side-by-side layout is either `side-by-side` or it isn't.

Natural language: what the viewer should take away, which moments to hold, how to frame the story. These are editorial judgments that benefit from the droid's understanding of the PR context.

Two failure modes this prevents: over-specifying creative decisions up front (the droid produces rigid, paint-by-numbers output) and under-specifying mechanical params (the droid hallucinates presets and layouts). The effects tier is a concrete example. The command commits a single word ("utilitarian" or "full"), and compose makes specific, grounded choices after capture, when it has actual recordings to look at.
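A handoff of that shape might look like the following sketch. The field names and values here are illustrative, not the plugin's actual schema:

```markdown
## Structured (mechanical)
layout: side-by-side
preset: factory
speed: 1.25
effects_tier: utilitarian

## Natural language (editorial)
The viewer should come away knowing that the fix makes cancellation
immediate. Hold on the moment the spinner disappears; frame the old
branch's hang as the "before".
```

The structured half survives mechanical validation; the natural-language half leaves room for compose to make grounded choices once it can see the actual recordings.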

## Platform isolation

Platform-specific content lives in `platforms/` subdirs under the relevant skill. A droid on Linux loads `true-input/platforms/linux.md`. It never sees the Windows KVM or macOS QEMU docs. This is a routing decision, not a reading-comprehension test.
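As a sketch, that routing decision is a one-line lookup rather than anything the droid has to reason about. Only `linux.md` is named above; the macOS and Windows file names below are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of platform routing. Only linux.md appears in the
# text; macos.md and windows.md are assumed names for illustration.
platform_doc() {
  case "$1" in
    Linux)  echo "true-input/platforms/linux.md" ;;
    Darwin) echo "true-input/platforms/macos.md" ;;
    *)      echo "true-input/platforms/windows.md" ;;
  esac
}

# Load exactly one doc for the host platform; the others never enter context.
platform_doc "$(uname -s)"
```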
25 changes: 25 additions & 0 deletions plugins/droid-control/NOTICES.md
@@ -0,0 +1,25 @@
# Third-Party Notices

This plugin depends on several third-party tools and libraries. They are not bundled -- each is installed separately by the user. Their respective licenses apply at the point of installation and use.

## Video rendering

- **[Remotion](https://www.remotion.dev/)** -- React-based video renderer used by the compose/showcase pipeline. Remotion is free for individuals, small teams (<=3 employees), and non-profits. Larger companies require a [company license](https://www.remotion.pro/). See the [full license terms](https://github.com/remotion-dev/remotion/blob/main/LICENSE.md).
- **[React](https://react.dev/)** -- MIT License
- **[Zod](https://zod.dev/)** -- MIT License

## Terminal automation

- **[tuistory](https://github.com/nicholasgasior/tuistory)** -- virtual PTY automation CLI
- **[asciinema](https://asciinema.org/)** -- terminal session recorder (GPL-3.0)
- **[agg](https://github.com/asciinema/agg)** -- asciinema GIF generator (Apache-2.0)

## Browser automation

- **[agent-browser](https://docs.factory.ai/)** -- Playwright-backed browser automation CLI

## System tools

- **[ffmpeg](https://ffmpeg.org/)** -- multimedia framework (LGPL-2.1+ / GPL-2.0+, depending on build configuration)
- **[cage](https://github.com/cage-kiosk/cage)** -- Wayland kiosk compositor (MIT)
- **[wtype](https://github.com/atx/wtype)** -- Wayland keystroke injection (MIT)
102 changes: 102 additions & 0 deletions plugins/droid-control/README.md
@@ -0,0 +1,102 @@
# droid-control

Terminal, browser, and computer automation plugin for Droids.

Droids can read and write code. This plugin enables them to *operate* it: launch apps, type commands, click buttons, record what happens, and produce polished video evidence of it. No human hands required (they don't have any).

## What you get

**Record a demo video from a PR:**

```
/demo pr-1847
```

Droid reads the PR, scripts the interactions that prove the change works, records both branches in parallel, and renders a side-by-side comparison video. The Factory preset adds cinematic warmth; the macos preset keeps it clean and utilitarian.

**Verify a behavior claim:**

```
/verify "ESC cancels streaming in bash mode"
```

Droid launches the app, attempts the claim, and reports what actually happened, with screenshots and text snapshots as evidence. If the claim is false, that's a valid finding, not a failure.

**Run a QA flow against a web app:**

```
/qa-test https://app.example.com -- login, create a project, invite a member
```

Droid drives the browser through the flow, captures each step, and reports pass/fail with annotated screenshots.

## Quick start

```bash
# Register the Factory plugins marketplace (if not already added)
droid plugin marketplace add https://github.com/Factory-AI/factory-plugins

# Install the plugin
droid plugin install droid-control@factory-plugins --scope user

# Install Remotion dependencies (one-time, only needed for video rendering)
# Find the plugin install path with: droid plugin list --scope user
cd <plugin-path>/remotion && npm install
```

Or use the `/plugins` UI: Browse tab, select droid-control, install.

Then open a Droid session and run `/demo`, `/verify`, or `/qa-test`.

## Commands

### `/demo`

Plans and records a demo video. Accepts a PR number, GitHub URL, or free-text description. Comparison PRs get side-by-side layout by default; new features get single-branch. Add "showcase" for cinematic polish, "keys" for keystroke overlay.

### `/verify`

Tests a specific behavior claim and reports findings with evidence. Frames the droid as an investigator. Anti-fabrication rules prevent staging evidence to match expected outcomes.

### `/qa-test`

Automated QA against terminal CLIs or web/Electron apps. Accepts a URL, CLI command, or app description with optional test steps after `--`.

## How it works

Three layers:

- **Orchestrator** -- routes each request through three independent lookups (target, stage, artifact) to determine which skills to load. ~93 lines.
- **10 atom skills** -- self-contained background knowledge loaded on demand. Driver atoms (tuistory, true-input, agent-browser), target atoms (droid-cli, pty-capture), stage atoms (capture, compose, verify), and a polish atom (showcase).
- **3 commands** -- thin intent declarations that parse arguments into commitments, then delegate to atoms via hybrid handoffs.

Every workflow flows through **capture → compose → verify**. Commands declare *what* to produce; atoms own *how*.

## Video rendering

The compose stage uses [Remotion](https://www.remotion.dev/) (React-based video renderer) for all compositing. 6 visual presets, automatic cinematic layers (warm backgrounds, floating particles, noise overlay, motion blur transitions), and effect-driven layers (spotlight, zoom, keystroke overlay, section headers).

The `render-showcase.sh` helper handles the full pipeline: `.cast` conversion via `agg`, clip staging, duration detection, Remotion render, and cleanup.

## Prerequisites

| Skill | Platform | Required |
|---|---|---|
| tuistory | All | `tuistory`, `asciinema`, `agg` |
| true-input | Linux/Wayland | `cage`, `wtype`, Wayland terminal |
| true-input | Windows (KVM) | `libvirt`, `qemu`, KVM VM with SSH |
| true-input | macOS (QEMU) | `qemu`, `socat`, macOS VM with SSH |
| agent-browser | All | `agent-browser` |
| compose | All | `ffmpeg`, `ffprobe`, `agg` |
| showcase | All | Node.js (>= 18), Chrome/Chromium |

```bash
npm install -g tuistory # virtual PTY driver
pip install asciinema # terminal recording
cargo install --git https://github.com/asciinema/agg # .cast → .gif converter
sudo apt-get install -y ffmpeg # video processing
agent-browser install # browser automation (downloads Chromium)
cd plugins/droid-control/remotion && npm install # Remotion (video rendering)
```

Only install what you need for your use case. Terminal demos need tuistory, asciinema, agg, and ffmpeg. Web/Electron automation just needs agent-browser.