6 changes: 6 additions & 0 deletions .factory-plugin/marketplace.json
@@ -24,6 +24,12 @@
"source": "./plugins/core",
"category": "core"
},
{
"name": "droid-control",
"description": "Terminal, browser, and computer automation for testing, demos, QA, and computer-use tasks",
"source": "./plugins/droid-control",
"category": "automation"
},
{
"name": "autoresearch",
"description": "Autonomous experiment loop for optimization research. Try an idea, measure it, keep what works, discard what doesn't, repeat. Works standalone or as a mission worker.",
14 changes: 14 additions & 0 deletions README.md
@@ -56,6 +56,20 @@ Skills for continuous learning and improvement.
- `frontend-design` - Build web apps, websites, HTML pages with good design
- `browser-navigation` - Browser automation with agent-browser

### droid-control

Terminal, browser, and computer automation for Droids. Record demos, verify behavior claims, and run QA flows.

**Commands:** `/demo`, `/verify`, `/qa-test`

**Skills:** `droid-control` (orchestrator), `tuistory`, `true-input`, `agent-browser`, `droid-cli`, `pty-capture`, `capture`, `compose`, `verify`, `showcase`

See [plugins/droid-control/README.md](plugins/droid-control/README.md) for details.

### autoresearch

Autonomous experiment loop for optimization research. Try an idea, measure it, keep what works, discard what doesn't, repeat. Works standalone or as a mission worker.

## Plugin Structure

Each plugin follows the Factory plugin format:
5 changes: 5 additions & 0 deletions plugins/droid-control/.factory-plugin/plugin.json
@@ -0,0 +1,5 @@
{
"name": "droid-control",
  "description": "Terminal, browser, and computer automation for testing, demos, QA, and computer-use tasks",
"version": "1.0.0"
}
59 changes: 59 additions & 0 deletions plugins/droid-control/ARCHITECTURE.md
@@ -0,0 +1,59 @@
# Architecture

## The problem

Put all driver knowledge, recording lifecycle, video rendering, and verification logic in one skill and two things happen. First, the droid loads 3000 tokens of Windows KVM docs to record a tuistory demo on Linux. Second, and this is the one that actually hurts, the droid gets all the information at once and has to figure out what's relevant *right now*. It makes worse decisions, skips steps it shouldn't, and invents steps it doesn't need to.

## UX for droids

Droids aren't exempt from information architecture. They have finite context, they get distracted by irrelevant detail, and they degrade under overload.

Every skill in this plugin is a surface the droid interacts with at a specific moment in a workflow: scoped to what it needs right now, actionable on first read, with an explicit handoff to the next surface. The same instincts that make a good CLI or a good API apply. Don't dump everything. Sequence the information. Make the next step obvious.

A command like `/demo` doesn't contain Remotion props or driver-specific logic. It parses intent, builds commitments, and tells the droid which skills to load. The droid never sees video rendering details while it's still planning what to record.

## Waterfall routing

Skills chain into each other without hardcoding. The orchestrator doesn't call skills or build a pipeline. It tells the droid which skills to load based on three independent lookups. Once loaded, each skill's exit is the next skill's entry:

```
/demo parses intent, commits deliverables
→ orchestrator routes to driver + stage + artifact skills
→ capture launches the app, records, hands off clips + metadata
→ compose receives clips, builds Remotion props, renders video
→ verify checks the output against the original commitments
```

No state machine or orchestration framework. Just documents whose outputs naturally feed into the next document's inputs. The droid follows the waterfall because each skill makes the next step obvious, not because something forces it to. Complex multi-stage workflows emerge from skill composition rather than control flow.

## Task delegation

Because each stage's inputs and outputs are explicit, mechanical work naturally decomposes into worker tasks. The parent agent retains planning and editorial control; workers execute exact commands and return file paths.

Capture workers for both branches run in parallel. They need tctl commands and worktree paths, not PR context. The render worker gets a props JSON and clip paths. It doesn't need to know what the PR does or why the demo matters. Verification stays with the parent because it requires the original commitments and judgment about whether the evidence holds up.

The skill boundaries are the delegation boundaries. You don't need a separate delegation framework because the decomposition into self-contained stages already defines what can be farmed out, what needs creative judgment, and what's too trivial to be worth the overhead.

## Orthogonal routing

The orchestrator makes three independent lookups:

- **Target**: what are you driving? (droid-cli, other TUI, web app, byte capture)
- **Stage**: what does the workflow need? (capture, compose, verify)
- **Artifact**: does compose need polish tools? (showcase)

These compose without cross-product explosion. 6 targets + 3 stages + 1 artifact route = 10 skills, not 18. Adding a new target means writing one skill and adding one row to the routing table. Every existing stage and artifact skill works with it immediately.
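In shell terms, each lookup can be pictured as an independent map from one request attribute to one skill. This is a minimal sketch, not the orchestrator's actual routing table; the case labels (`tui`, `web`, `bytes`) are illustrative shorthand:

```shell
#!/bin/sh
# Hypothetical sketch of the two main lookups. Skill names match the
# README's skill list; the input labels are assumptions.
route_target() {
  case "$1" in
    droid-cli) echo "droid-cli" ;;      # the droid CLI itself
    tui)       echo "tuistory" ;;       # other terminal UIs
    web)       echo "agent-browser" ;;  # web and Electron apps
    bytes)     echo "pty-capture" ;;    # raw byte capture
    *)         echo "unknown" ;;
  esac
}

route_stage() {
  case "$1" in
    capture|compose|verify) echo "$1" ;;
    *)                      echo "unknown" ;;
  esac
}

# Adding a target touches only route_target; every existing stage
# composes with it immediately, which is why the skill count stays
# additive rather than multiplicative.
route_target web      # -> agent-browser
route_stage compose   # -> compose
```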

## Hybrid handoffs

Commands hand off to the compose stage with two sections: structured fields for mechanical decisions, natural language for creative ones.

Structured: layout, speed, preset, fidelity, effects tier. These have correct answers. A side-by-side layout is either `side-by-side` or it isn't.

Natural language: what the viewer should take away, which moments to hold, how to frame the story. These are editorial judgments that benefit from the droid's understanding of the PR context.

Two failure modes this prevents: over-specifying creative decisions up front (the droid produces rigid, paint-by-numbers output) and under-specifying mechanical params (the droid hallucinates presets and layouts). The effects tier is a concrete example. The command commits a single word ("utilitarian" or "full"), and compose makes specific, grounded choices after capture, when it has actual recordings to look at.
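A handoff of that shape might look like the following sketch. The field names and values here are illustrative, not the plugin's actual schema:

```markdown
## Structured (mechanical)
layout: side-by-side
preset: factory
speed: 1.25
effects_tier: utilitarian

## Natural language (editorial)
The viewer should come away knowing that the fix makes cancellation
immediate. Hold on the moment the spinner disappears; frame the old
branch's hang as the "before".
```

The structured half survives mechanical validation; the natural-language half leaves room for compose to make grounded choices once it can see the actual recordings.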

## Platform isolation

Platform-specific content lives in `platforms/` subdirs under the relevant skill. A droid on Linux loads `true-input/platforms/linux.md`. It never sees the Windows KVM or macOS QEMU docs. This is a routing decision, not a reading-comprehension test.
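As a sketch, that routing decision is a one-line lookup rather than anything the droid has to reason about. Only `linux.md` is named above; the macOS and Windows file names below are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of platform routing. Only linux.md appears in the
# text; macos.md and windows.md are assumed names for illustration.
platform_doc() {
  case "$1" in
    Linux)  echo "true-input/platforms/linux.md" ;;
    Darwin) echo "true-input/platforms/macos.md" ;;
    *)      echo "true-input/platforms/windows.md" ;;
  esac
}

# Load exactly one doc for the host platform; the others never enter context.
platform_doc "$(uname -s)"
```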
25 changes: 25 additions & 0 deletions plugins/droid-control/NOTICES.md
@@ -0,0 +1,25 @@
# Third-Party Notices

This plugin depends on several third-party tools and libraries. They are not bundled -- each is installed separately by the user. Their respective licenses apply at the point of installation and use.

## Video rendering

- **[Remotion](https://www.remotion.dev/)** -- React-based video renderer used by the compose/showcase pipeline. Remotion is free for individuals, small teams (<=3 employees), and non-profits. Larger companies require a [company license](https://www.remotion.pro/). See the [full license terms](https://github.com/remotion-dev/remotion/blob/main/LICENSE.md).
- **[React](https://react.dev/)** -- MIT License
- **[Zod](https://zod.dev/)** -- MIT License

## Terminal automation

- **[tuistory](https://github.com/nicholasgasior/tuistory)** -- virtual PTY automation CLI
- **[asciinema](https://asciinema.org/)** -- terminal session recorder (GPL-3.0)
- **[agg](https://github.com/asciinema/agg)** -- asciinema GIF generator (Apache-2.0)

## Browser automation

- **[agent-browser](https://docs.factory.ai/)** -- Playwright-backed browser automation CLI

## System tools

- **[ffmpeg](https://ffmpeg.org/)** -- multimedia framework (LGPL-2.1+ / GPL-2.0+, depending on build configuration)
- **[cage](https://github.com/cage-kiosk/cage)** -- Wayland kiosk compositor (MIT)
- **[wtype](https://github.com/atx/wtype)** -- Wayland keystroke injection (MIT)
102 changes: 102 additions & 0 deletions plugins/droid-control/README.md
@@ -0,0 +1,102 @@
# droid-control

Terminal, browser, and computer automation plugin for Droids.

Droids can read and write code. This plugin enables them to *operate* it: launch apps, type commands, click buttons, record what happens, and produce polished video evidence of it. No human hands required (they don't have any).

## What you get

**Record a demo video from a PR:**

```
/demo pr-1847
```

Droid reads the PR, scripts the interactions that prove the change works, records both branches in parallel, and renders a side-by-side comparison video. The Factory preset adds cinematic warmth; the macos preset keeps it clean and utilitarian.

**Verify a behavior claim:**

```
/verify "ESC cancels streaming in bash mode"
```

Droid launches the app, attempts the claim, and reports what actually happened, with screenshots and text snapshots as evidence. If the claim is false, that's a valid finding, not a failure.

**Run a QA flow against a web app:**

```
/qa-test https://app.example.com -- login, create a project, invite a member
```

Droid drives the browser through the flow, captures each step, and reports pass/fail with annotated screenshots.

## Quick start

```bash
# Register the Factory plugins marketplace (if not already added)
droid plugin marketplace add https://github.com/Factory-AI/factory-plugins

# Install the plugin
droid plugin install droid-control@factory-plugins --scope user

# Install Remotion dependencies (one-time, only needed for video rendering)
# Find the plugin install path with: droid plugin list --scope user
cd <plugin-path>/remotion && npm install
```

Or use the `/plugins` UI: Browse tab, select droid-control, install.

Then open a Droid session and run `/demo`, `/verify`, or `/qa-test`.

## Commands

### `/demo`

Plans and records a demo video. Accepts a PR number, GitHub URL, or free-text description. Comparison PRs get side-by-side layout by default; new features get single-branch. Add "showcase" for cinematic polish, "keys" for keystroke overlay.

### `/verify`

Tests a specific behavior claim and reports findings with evidence. Frames the droid as an investigator. Anti-fabrication rules prevent staging evidence to match expected outcomes.

### `/qa-test`

Automated QA against terminal CLIs or web/Electron apps. Accepts a URL, CLI command, or app description with optional test steps after `--`.

## How it works

Three layers:

- **Orchestrator** -- routes each request through three independent lookups (target, stage, artifact) to determine which skills to load. ~93 lines.
- **10 atom skills** -- self-contained background knowledge loaded on demand. Driver atoms (tuistory, true-input, agent-browser), target atoms (droid-cli, pty-capture), stage atoms (capture, compose, verify), and a polish atom (showcase).
- **3 commands** -- thin intent declarations that parse arguments into commitments, then delegate to atoms via hybrid handoffs.

Every workflow flows through **capture → compose → verify**. Commands declare *what* to produce; atoms own *how*.

## Video rendering

The compose stage uses [Remotion](https://www.remotion.dev/) (React-based video renderer) for all compositing. 6 visual presets, automatic cinematic layers (warm backgrounds, floating particles, noise overlay, motion blur transitions), and effect-driven layers (spotlight, zoom, keystroke overlay, section headers).

The `render-showcase.sh` helper handles the full pipeline: `.cast` conversion via `agg`, clip staging, duration detection, Remotion render, and cleanup.

## Prerequisites

| Skill | Platform | Required |
|---|---|---|
| tuistory | All | `tuistory`, `asciinema`, `agg` |
| true-input | Linux/Wayland | `cage`, `wtype`, Wayland terminal |
| true-input | Windows (KVM) | `libvirt`, `qemu`, KVM VM with SSH |
| true-input | macOS (QEMU) | `qemu`, `socat`, macOS VM with SSH |
| agent-browser | All | `agent-browser` |
| compose | All | `ffmpeg`, `ffprobe`, `agg` |
| showcase | All | Node.js (>= 18), Chrome/Chromium |

```bash
npm install -g tuistory # virtual PTY driver
pip install asciinema # terminal recording
cargo install --git https://github.com/asciinema/agg # .cast → .gif converter
sudo apt-get install -y ffmpeg # video processing
agent-browser install # browser automation (downloads Chromium)
cd plugins/droid-control/remotion && npm install # Remotion (video rendering)
```

Only install what you need for your use case. Terminal demos need tuistory, asciinema, agg, and ffmpeg. Web/Electron automation just needs agent-browser.