How It Works · See It Work · Skill Router · Install · Deep Dive · Skills · CLI · FAQ
From goal to shipped code — agents research, plan, and implement in parallel. Councils validate before and after. Every learning feeds the next session.
Coding agents get a blank context window every session. AgentOps is a toolbox of primitives — pick the ones you need, skip the ones you don't. Every skill works standalone. Swarm any of them for parallelism. Chain them into a pipeline when you want structure. Knowledge compounds between sessions automatically.
DevOps' Three Ways — applied to the agent loop as composable primitives:
- Flow (
/research,/plan,/crank,/swarm,/rpi): orchestration skills that move work through the system. Single-piece flow, minimizing context switches. Swarm parallelizes any skill; crank runs dependency-ordered waves; rpi chains the full pipeline. - Feedback (
/council,/vibe,/pre-mortem, hooks): shorten the feedback loop until defects can't survive it. Independent judges catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, standards injection. Problems found Friday don't wait until Monday. - Learning (
.agents/,aoCLI,/retro,/knowledge): stop rediscovering what you already know. Every session extracts learnings into an append-only ledger, scores them by freshness, and re-injects the best ones at next session start. Session 50 knows what session 1 learned the hard way.
/quickstart ← Day 1: guided tour on your codebase (~10 min)
│
Not sure what to do? ─────────► /brainstorm
│
Have an idea of what you want? ─────────► /research
│
Ready to scope it cleanly? ─────────► /plan
│
/implement (small) · /crank (epic) ← Build and ship
│
/vibe → /post-mortem ← Validate and learn
│
/rpi "goal" ← One command for the full flow
Use one skill — validate a PR:
> /council validate this PR
[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping
The council verdict, your decisions, and the patterns used are automatically written to .agents/ — an append-only ledger. Nothing gets overwritten. Session ends, hooks extract learnings.
Knowledge compounds — three weeks later, different task, but your agent already knows:
> /research "retry backoff strategies"
[inject] 3 prior learnings loaded (freshness-weighted):
- Token bucket with Redis (established, high confidence)
- Rate limit at middleware layer, not per-handler (pattern)
- /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + injected context
Recommends: exponential backoff with jitter, reuse existing Redis client
Session 5 didn't start from scratch — it started with what session 1 learned. Stale insights decay automatically.
Parallelize anything with /swarm:
> /swarm "research auth patterns, brainstorm rate limiting improvements"
[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/
Full pipeline — one command, walk away:
> /rpi "add retry backoff to rate limiter"
[research] Found 3 prior learnings on rate limiting (injected)
[plan] 2 issues, 1 wave → epic ag-0058
[pre-mortem] Council validates plan → PASS (knew about Redis choice)
[crank] Parallel agents: Wave 1 ██ 2/2
[vibe] Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel] Next: /rpi "add circuit breaker to external API calls"
AgentOps building AgentOps: completed `/crank` across 3 parallel epics (15 issues, 5 waves, 0 regressions).
More examples — /evolve, session continuity
Session continuity across compaction or restart:
> /handoff
[handoff] Saved: 3 open issues, current branch, next action
Continuation prompt written to .agents/handoffs/
--- next session ---
> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
Branch: feature/rate-limiter
Next: /implement ag-0058.3
Goal-driven improvement loop:
> /evolve --max-cycles=5
[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
Worst gap: test-pass-rate (weight: 10)
/rpi "Improve test-pass-rate" → 3 issues, 2 waves
Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
/rpi "Improve doc-coverage" → 2 issues, 1 wave
Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted
Different developers, different setups — use what fits your workflow
The PR reviewer — uses one skill, nothing else:
> /council validate this PR
Consensus: WARN — missing error handling in 2 locations
That's it. No pipeline, no setup, no commitment. One command, actionable feedback.
The team lead — composes skills manually:
> /research "performance bottlenecks in the API layer"
> /plan "optimize database queries identified in research"
> /council validate the plan
Picks skills as needed, stays in control of sequencing.
The solo dev — runs the full pipeline, walks away:
> /rpi "add user authentication"
[3 phases run autonomously, learnings extracted]
One command does research through post-mortem. Comes back to committed code.
The platform team — parallel agents, hands-free improvement:
> /swarm "run /rpi on each of these 3 epics"
> /evolve --max-cycles=5
Swarms full pipelines in parallel. Evolve measures goals and fixes gaps in a loop.
Use this when you're not sure which skill to run.
What are you trying to do?
│
├─ "Not sure what to do yet"
│ └─ Generate options first ─────► /brainstorm
│
├─ "I have an idea"
│ └─ Understand code + context ──► /research
│
├─ "I know what I want to build"
│ └─ Break it into issues ───────► /plan
│
├─ "Now build it"
│ ├─ Small/single issue ─────────► /implement
│ ├─ Multi-issue epic ───────────► /crank <epic-id>
│ └─ Full flow in one command ───► /rpi "goal"
│
├─ "Fix a bug"
│ ├─ Know which file? ──────────► /implement <issue-id>
│ └─ Need to investigate? ──────► /bug-hunt
│
├─ "Build a feature"
│ ├─ Small (1-2 files) ─────────► /implement
│ ├─ Medium (3-6 issues) ───────► /plan → /crank
│ └─ Large (7+ issues) ─────────► /rpi (full pipeline)
│
├─ "Validate something"
│ ├─ Code ready to ship? ───────► /vibe
│ ├─ Plan ready to build? ──────► /pre-mortem
│ ├─ Work ready to close? ──────► /post-mortem
│ └─ Quick sanity check? ───────► /council --quick validate
│
├─ "Explore or research"
│ ├─ Understand this codebase ──► /research
│ ├─ Compare approaches ────────► /council research <topic>
│ └─ Generate ideas ────────────► /brainstorm
│
├─ "Learn from past work"
│ ├─ What do we know about X? ──► /knowledge <query>
│ ├─ Save this insight ─────────► /learn "insight"
│ └─ Run a retrospective ───────► /retro
│
├─ "Parallelize work"
│ ├─ Multiple independent tasks ► /swarm
│ └─ Full epic with waves ──────► /crank <epic-id>
│
├─ "Ship a release"
│ └─ Changelog + tag ──────────► /release <version>
│
├─ "Session management"
│ ├─ Where was I? ──────────────► /status
│ ├─ Save for next session ─────► /handoff
│ └─ Recover after compaction ──► /recover
│
└─ "First time here" ────────────► /quickstart
Requirements
node18+ (fornpx skills) andgit- One supported runtime: Claude Code, Codex CLI, Cursor, or OpenCode
- Optional for
aoCLI install path shown below: Homebrew (brew)
# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g
# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bashWorks with: Claude Code · Codex CLI · Cursor · OpenCode — skills are portable across runtimes (/converter exports to native formats).
Then type /quickstart in your agent chat.
# Claude Code plugin (alternative)
claude plugin add boshu2/agentopsnpx skills installs skills into your agent's global skills directory. The plugin path registers AgentOps as a Claude Code plugin instead — same skills, different integration point. Most users should start with npx skills.
The ao CLI — powers the knowledge flywheel
Skills work standalone. The ao CLI powers the automated learning loop — knowledge extraction, injection with freshness decay, maturity lifecycle, and progress gates. Install it when you want knowledge to compound between sessions.
brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooksThis installs 25+ hooks across core lifecycle events:
| Event | What happens |
|---|---|
| SessionStart | Extract from prior session, inject top learnings (freshness-weighted), check progress gates |
| SessionEnd | Mine transcript for knowledge, record session outcome, expire stale artifacts, evict dead knowledge |
| PreToolUse | Inject coding standards before edits, gate dangerous git ops, validate before push |
| PostToolUse | Advance progress ratchets, track citations |
| TaskCompleted | Validate task output against acceptance criteria |
| Stop/PreCompact | Close feedback loops, snapshot before compaction |
OpenCode — plugin + skills
Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md
Local-only. No telemetry. No cloud. No accounts.
| What | Where | Reversible? |
|---|---|---|
| Skills | Global skills dir (outside your repo; for Claude Code: ~/.claude/skills/) |
npx skills@latest remove boshu2/agentops -g |
| Knowledge artifacts | .agents/ in your repo (git-ignored by default) |
rm -rf .agents/ |
| Hook registration | .claude/settings.json |
ao hooks uninstall or delete entries |
| Git push gate | Pre-push hook (optional, only with CLI) | AGENTOPS_HOOKS_DISABLED=1 |
Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.
Configuration — environment variables
All optional. AgentOps works out of the box with no configuration.
Council / validation:
| Variable | Default | What it does |
|---|---|---|
COUNCIL_TIMEOUT |
120 | Judge timeout in seconds |
COUNCIL_CLAUDE_MODEL |
sonnet | Claude model for judges (opus for high-stakes) |
COUNCIL_CODEX_MODEL |
(user's Codex default) | Override Codex model for --mixed |
COUNCIL_EXPLORER_MODEL |
sonnet | Model for explorer sub-agents |
COUNCIL_EXPLORER_TIMEOUT |
60 | Explorer timeout in seconds |
COUNCIL_R2_TIMEOUT |
90 | Debate round 2 timeout in seconds |
Hooks:
| Variable | Default | What it does |
|---|---|---|
AGENTOPS_HOOKS_DISABLED |
0 | 1 to disable all hooks (kill switch) |
AGENTOPS_PRECOMPACT_DISABLED |
0 | 1 to disable pre-compaction snapshot |
AGENTOPS_TASK_VALIDATION_DISABLED |
0 | 1 to disable task validation gate |
AGENTOPS_SESSION_START_DISABLED |
0 | 1 to disable session-start hook |
AGENTOPS_EVICTION_DISABLED |
0 | 1 to disable knowledge eviction |
AGENTOPS_GITIGNORE_AUTO |
1 | 0 to skip auto-adding .agents/ to .gitignore |
AGENTOPS_WORKER |
0 | 1 to skip push gate (for worker agents) |
Full reference with examples and precedence rules: docs/ENV-VARS.md
Troubleshooting: docs/troubleshooting.md
Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.
This is DevOps thinking applied to agent work: the Three Ways as composable primitives.
- Flow: wave-based execution (
/crank) + workflow orchestration (/rpi) to keep work moving. - Feedback: shift-left validation (
/pre-mortem,/vibe,/council) plus optional gates/hooks to make feedback unavoidable. - Continual learning: post-mortems turn outcomes into reusable knowledge in
.agents/, so the next session starts smarter./flywheelmonitors health.
.agents/ is an append-only ledger with cache-like semantics. Nothing gets overwritten — every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:
Session N ends
→ ao forge: mine transcript for learnings, decisions, patterns
→ ao maturity --expire: mark stale artifacts (freshness decay)
→ ao maturity --evict: archive what's decayed past threshold
Session N+1 starts
→ ao inject --apply-decay: score all artifacts by recency,
inject top-N within token budget
→ Agent starts with institutional knowledge, not a blank slate
Write once, score by freshness, inject the best, prune the rest. If retrieval_rate × usage_rate stays above decay and scale friction, knowledge compounds. If not, growth stalls unless fresh input or stronger controls are added. The formal model is cache eviction with a decay function and limits-to-growth controls.
/rpi "goal"
│
├── /research → /plan → /pre-mortem → /crank → /vibe
│
▼
/post-mortem
├── validates what shipped
├── extracts learnings → .agents/
└── suggests next /rpi command ────┐
│
/rpi "next goal" ◄──────────────────┘
The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.
Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns.
Phase details — what each step does
-
/research— Explores your codebase. Produces a research artifact with findings and recommendations. -
/plan— Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking). -
/pre-mortem— Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries). -
/crank— Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed.--test-firstfor spec-first TDD. -
/vibe— Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3). -
/post-mortem— Council validates the implementation. Retro extracts learnings. Suggests the next/rpicommand.
/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
Phased RPI — fresh context per phase for larger goals
ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.
ao rpi phased "add rate limiting" # Hands-free, fresh context per phase
ao rpi phased "add auth" & # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "fix perf" # Resume at execution phase
ao rpi status --watch # Monitor active phased runsUse /rpi when context fits in one session. Use ao rpi phased when it doesn't.
Goal-driven mode — /evolve with GOALS.yaml
Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:
# GOALS.yaml
version: 1
goals:
- id: test-pass-rate
description: "All tests pass"
check: "make test"
weight: 10Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL
Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.
References — science, systems theory, prior art
Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).
AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.
Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Wiggum Pattern, agent backends, hooks, context windowing.
Five pillars, one recursive shape. The same pattern — lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins — repeats at every scale:
/implement ── one worker, one issue, one verify cycle
└── /crank ── waves of /implement (FIRE loop)
└── /rpi ── research → plan → crank → validate → learn
└── /evolve ── fitness-gated /rpi cycles
Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem — not accumulated chat context. Parallel execution works because each unit of work is atomic: no shared mutable state with concurrent workers.
Validation is mechanical, not advisory. Multi-model councils judge before and after implementation. Hooks enforce gates — push blocked until /vibe passes, /crank blocked until /pre-mortem passes. The knowledge flywheel extracts learnings, scores them, and re-injects them at session start so each cycle compounds.
Full treatment: docs/ARCHITECTURE.md — all five pillars, operational invariants, component overview.
Every skill works alone. Compose them however you want.
Judgment — the foundation everything validates against:
| Skill | What it does |
|---|---|
/council |
Independent judges (Claude + Codex) debate, surface disagreement, converge. --preset=security-audit, --perspectives, --debate for adversarial review |
/vibe |
Code quality review — complexity analysis + council |
/pre-mortem |
Validate plans before implementation — council simulates failures |
/post-mortem |
Wrap up completed work — council validates + retro extracts learnings |
Execution — research, plan, build, ship:
| Skill | What it does |
|---|---|
/research |
Deep codebase exploration — produces structured findings |
/plan |
Decompose a goal into trackable issues with dependency waves |
/implement |
Full lifecycle for one task — research, plan, build, validate, learn |
/crank |
Parallel agents in dependency-ordered waves, fresh context per worker |
/swarm |
Parallelize any skill — run research, brainstorms, implementations in parallel |
/rpi |
Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem) |
/evolve |
Measure fitness goals, fix the worst gap, roll back regressions, loop |
Knowledge — the flywheel that makes sessions compound:
| Skill | What it does |
|---|---|
/knowledge |
Query learnings, patterns, and decisions across .agents/ |
/learn |
Manually capture a decision, pattern, or lesson |
/retro |
Extract learnings from completed work |
/flywheel |
Monitor knowledge health — velocity, staleness, pool depths |
Supporting skills:
| Onboarding | /quickstart, /using-agentops |
| Session | /handoff, /recover, /status |
| Traceability | /trace, /provenance |
| Product | /product, /goals, /release, /readme, /doc |
| Utility | /quickstart, /brainstorm, /bug-hunt, /complexity |
Full reference: docs/SKILLS.md
Cross-runtime orchestration — mix Claude, Codex, OpenCode
AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.
| Spawning Backend | How it works | Best for |
|---|---|---|
| Native teams | TeamCreate + SendMessage — built into Claude Code |
Tight coordination, debate |
| Background tasks | Task(run_in_background=true) — last-resort fallback |
When no team APIs available |
| Codex sub-agents | /codex-team — Claude orchestrates Codex workers |
Cross-vendor validation |
| tmux + Agent Mail | /swarm --mode=distributed — full process isolation |
Long-running work, crash recovery |
Distributed mode workers survive disconnects — each runs in its own tmux session with crash recovery. tmux attach to debug live.
Skills work standalone — no CLI required. The ao CLI adds two things: (1) the knowledge flywheel that makes sessions compound (extract, inject, decay, maturity), and (2) terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.
ao rpi phased "add rate limiting" # 3 sessions: discover → build → validate
ao rpi phased "fix auth bug" & # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "ag-058" # Resume at build phase
ao rpi status --watch # Monitor active runsWalk away, come back to committed code + extracted learnings.
ao search "query" # Search knowledge across files and chat history
ao demo # Interactive demoFull reference: CLI Commands
These are fellow experiments in making coding agents work. Use pieces from any of them.
| Alternative | What it does well | Where AgentOps focuses differently |
|---|---|---|
| GSD | Clean subagent spawning, fights context rot | Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions) |
| Compound Engineer | Knowledge compounding, structured loop | Multi-model councils and validation gates — independent judges debating before and after code ships |
docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.
Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL
Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)
Issue tracking — Beads / bd
Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md
See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.
Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog