AgentOps

Coding agents forget everything between sessions. This fixes that.

How It Works · See It Work · Skill Router · Install · Deep Dive · Skills · CLI · FAQ

From goal to shipped code — agents research, plan, and implement in parallel. Councils validate before and after. Every learning feeds the next session.

How It Works

Coding agents get a blank context window every session. AgentOps is a toolbox of primitives — pick the ones you need, skip the ones you don't. Every skill works standalone. Swarm any of them for parallelism. Chain them into a pipeline when you want structure. Knowledge compounds between sessions automatically.

DevOps' Three Ways — applied to the agent loop as composable primitives:

Flow (/research, /plan, /crank, /swarm, /rpi): orchestration skills that move work through the system. Single-piece flow, minimizing context switches. Swarm parallelizes any skill; crank runs dependency-ordered waves; rpi chains the full pipeline.
Feedback (/council, /vibe, /pre-mortem, hooks): shorten the feedback loop until defects can't survive it. Independent judges catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, standards injection. Problems found Friday don't wait until Monday.
Learning (.agents/, ao CLI, /retro, /knowledge): stop rediscovering what you already know. Every session extracts learnings into an append-only ledger, scores them by freshness, and re-injects the best ones at the start of the next session. Session 50 knows what session 1 learned the hard way.

See It Work

/quickstart                          ← Day 1: guided tour on your codebase (~10 min)
    │
Not sure what to do?                 ─────────► /brainstorm
    │
Have an idea of what you want?       ─────────► /research
    │
Ready to scope it cleanly?           ─────────► /plan
    │
/implement (small) · /crank (epic)   ← Build and ship
    │
/vibe → /post-mortem                 ← Validate and learn
    │
/rpi "goal"                          ← One command for the full flow

Use one skill — validate a PR:

> /council validate this PR

[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping

The council verdict, your decisions, and the patterns used are written automatically to .agents/, an append-only ledger where nothing gets overwritten. When the session ends, hooks extract learnings.
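One way to get "nothing gets overwritten" semantics is to give every entry its own timestamped file and create it in exclusive-create mode, so an existing file can never be clobbered. The Python sketch below assumes that approach. The `.agents/learnings/` subdirectory, the file-name pattern, and `append_entry` are hypothetical illustrations, not AgentOps' actual layout or code.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_entry(ledger: Path, kind: str, body: str) -> Path:
    """Hypothetical helper: write one ledger entry as a new dated file.

    Opening with mode "x" raises FileExistsError instead of truncating,
    so an earlier entry can never be overwritten.
    """
    ledger.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%S%fZ")
    seq = 0
    while True:
        path = ledger / f"{stamp}-{kind}-{seq}.md"
        try:
            with path.open("x", encoding="utf-8") as f:
                f.write(body)
            return path
        except FileExistsError:
            seq += 1  # timestamp already taken: try the next suffix


# Usage: two verdicts on the same topic become two files, side by side.
with tempfile.TemporaryDirectory() as repo:
    ledger = Path(repo) / ".agents" / "learnings"  # hypothetical subdirectory
    first = append_entry(ledger, "council", "WARN: add rate limiting to /login")
    second = append_entry(ledger, "council", "PASS: rate limiting added")
    print(len(list(ledger.iterdir())))  # 2
```

Superseding a decision then means writing a newer entry rather than editing the old one; freshness scoring decides which one gets injected later.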

Knowledge compounds — three weeks later, different task, but your agent already knows:

> /research "retry backoff strategies"

[inject] 3 prior learnings loaded (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + injected context
           Recommends: exponential backoff with jitter, reuse existing Redis client

Session 5 didn't start from scratch — it started with what session 1 learned. Stale insights decay automatically.

Parallelize anything with /swarm:

> /swarm "research auth patterns, brainstorm rate limiting improvements"

[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/

Full pipeline — one command, walk away:

> /rpi "add retry backoff to rate limiter"

[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  Council validates plan → PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"

AgentOps building AgentOps: completed `/crank` across 3 parallel epics (15 issues, 5 waves, 0 regressions).

More examples — /evolve, session continuity

Session continuity across compaction or restart:

> /handoff
[handoff] Saved: 3 open issues, current branch, next action
         Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
          Branch: feature/rate-limiter
          Next: /implement ag-0058.3

Goal-driven improvement loop:

> /evolve --max-cycles=5

[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
         Worst gap: test-pass-rate (weight: 10)
         /rpi "Improve test-pass-rate" → 3 issues, 2 waves
         Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
         /rpi "Improve doc-coverage" → 2 issues, 1 wave
         Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
         Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted

Different developers, different setups — use what fits your workflow

The PR reviewer — uses one skill, nothing else:

> /council validate this PR
Consensus: WARN — missing error handling in 2 locations

That's it. No pipeline, no setup, no commitment. One command, actionable feedback.

The team lead — composes skills manually:

> /research "performance bottlenecks in the API layer"
> /plan "optimize database queries identified in research"
> /council validate the plan

Picks skills as needed, stays in control of sequencing.

The solo dev — runs the full pipeline, walks away:

> /rpi "add user authentication"
[3 phases run autonomously, learnings extracted]

One command does research through post-mortem. Comes back to committed code.

The platform team — parallel agents, hands-free improvement:

> /swarm "run /rpi on each of these 3 epics"
> /evolve --max-cycles=5

Swarms full pipelines in parallel. Evolve measures goals and fixes gaps in a loop.

Skill Router

Use this when you're not sure which skill to run.

What are you trying to do?
│
├─ "Not sure what to do yet"
│   └─ Generate options first ─────► /brainstorm
│
├─ "I have an idea"
│   └─ Understand code + context ──► /research
│
├─ "I know what I want to build"
│   └─ Break it into issues ───────► /plan
│
├─ "Now build it"
│   ├─ Small/single issue ─────────► /implement
│   ├─ Multi-issue epic ───────────► /crank <epic-id>
│   └─ Full flow in one command ───► /rpi "goal"
│
├─ "Fix a bug"
│   ├─ Know which file? ──────────► /implement <issue-id>
│   └─ Need to investigate? ──────► /bug-hunt
│
├─ "Build a feature"
│   ├─ Small (1-2 files) ─────────► /implement
│   ├─ Medium (3-6 issues) ───────► /plan → /crank
│   └─ Large (7+ issues) ─────────► /rpi (full pipeline)
│
├─ "Validate something"
│   ├─ Code ready to ship? ───────► /vibe
│   ├─ Plan ready to build? ──────► /pre-mortem
│   ├─ Work ready to close? ──────► /post-mortem
│   └─ Quick sanity check? ───────► /council --quick validate
│
├─ "Explore or research"
│   ├─ Understand this codebase ──► /research
│   ├─ Compare approaches ────────► /council research <topic>
│   └─ Generate ideas ────────────► /brainstorm
│
├─ "Learn from past work"
│   ├─ What do we know about X? ──► /knowledge <query>
│   ├─ Save this insight ─────────► /learn "insight"
│   └─ Run a retrospective ───────► /retro
│
├─ "Parallelize work"
│   ├─ Multiple independent tasks ► /swarm
│   └─ Full epic with waves ──────► /crank <epic-id>
│
├─ "Ship a release"
│   └─ Changelog + tag ──────────► /release <version>
│
├─ "Session management"
│   ├─ Where was I? ──────────────► /status
│   ├─ Save for next session ─────► /handoff
│   └─ Recover after compaction ──► /recover
│
└─ "First time here" ────────────► /quickstart

Install

Requirements

Node.js 18+ (for npx skills) and git
One supported runtime: Claude Code, Codex CLI, Cursor, or OpenCode
Optional: Homebrew (brew), for the ao CLI install path shown below

# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Works with: Claude Code · Codex CLI · Cursor · OpenCode — skills are portable across runtimes (/converter exports to native formats).

Then type /quickstart in your agent chat.

# Claude Code plugin (alternative)
claude plugin add boshu2/agentops

npx skills installs skills into your agent's global skills directory. The plugin path registers AgentOps as a Claude Code plugin instead — same skills, different integration point. Most users should start with npx skills.

The ao CLI — powers the knowledge flywheel

Skills work standalone. The ao CLI powers the automated learning loop — knowledge extraction, injection with freshness decay, maturity lifecycle, and progress gates. Install it when you want knowledge to compound between sessions.

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks

This installs 25+ hooks across core lifecycle events:

Event	What happens
SessionStart	Extract from prior session, inject top learnings (freshness-weighted), check progress gates
SessionEnd	Mine transcript for knowledge, record session outcome, expire stale artifacts, evict dead knowledge
PreToolUse	Inject coding standards before edits, gate dangerous git ops, validate before push
PostToolUse	Advance progress ratchets, track citations
TaskCompleted	Validate task output against acceptance criteria
Stop/PreCompact	Close feedback loops, snapshot before compaction

OpenCode — plugin + skills

Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Local-only. No telemetry. No cloud. No accounts.

What	Where	Reversible?
Skills	Global skills dir (outside your repo; for Claude Code: `~/.claude/skills/`)	`npx skills@latest remove boshu2/agentops -g`
Knowledge artifacts	`.agents/` in your repo (git-ignored by default)	`rm -rf .agents/`
Hook registration	`.claude/settings.json`	`ao hooks uninstall` or delete entries
Git push gate	Pre-push hook (optional, only with CLI)	`AGENTOPS_HOOKS_DISABLED=1`

Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.

Configuration — environment variables

All optional. AgentOps works out of the box with no configuration.

Council / validation:

Variable	Default	What it does
`COUNCIL_TIMEOUT`	120	Judge timeout in seconds
`COUNCIL_CLAUDE_MODEL`	sonnet	Claude model for judges (`opus` for high-stakes)
`COUNCIL_CODEX_MODEL`	(user's Codex default)	Override Codex model for `--mixed`
`COUNCIL_EXPLORER_MODEL`	sonnet	Model for explorer sub-agents
`COUNCIL_EXPLORER_TIMEOUT`	60	Explorer timeout in seconds
`COUNCIL_R2_TIMEOUT`	90	Debate round 2 timeout in seconds

Hooks:

Variable	Default	What it does
`AGENTOPS_HOOKS_DISABLED`	0	`1` to disable all hooks (kill switch)
`AGENTOPS_PRECOMPACT_DISABLED`	0	`1` to disable pre-compaction snapshot
`AGENTOPS_TASK_VALIDATION_DISABLED`	0	`1` to disable task validation gate
`AGENTOPS_SESSION_START_DISABLED`	0	`1` to disable session-start hook
`AGENTOPS_EVICTION_DISABLED`	0	`1` to disable knowledge eviction
`AGENTOPS_GITIGNORE_AUTO`	1	`0` to skip auto-adding `.agents/` to `.gitignore`
`AGENTOPS_WORKER`	0	`1` to skip push gate (for worker agents)
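All of these are plain on/off flags read from the environment. As a rough sketch of how a hook might honor them, assuming the global kill switch is checked before any per-hook flag (the authoritative precedence rules live in docs/ENV-VARS.md), consider the following. `hook_enabled` and `push_gate_applies` are hypothetical names, not functions shipped by AgentOps.

```python
import os
from typing import Mapping, Optional


def hook_enabled(hook_flag: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> bool:
    """Return False if the global kill switch or the hook's own flag is "1"."""
    env = os.environ if env is None else env
    if env.get("AGENTOPS_HOOKS_DISABLED", "0") == "1":
        return False  # kill switch: every hook is off
    if hook_flag is not None and env.get(hook_flag, "0") == "1":
        return False  # only this hook is off
    return True


def push_gate_applies(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed rule: the push gate runs only if hooks are on and this is not a worker."""
    env = os.environ if env is None else env
    return hook_enabled(env=env) and env.get("AGENTOPS_WORKER", "0") != "1"


print(hook_enabled("AGENTOPS_PRECOMPACT_DISABLED", {"AGENTOPS_HOOKS_DISABLED": "1"}))  # False
print(push_gate_applies({"AGENTOPS_WORKER": "1"}))  # False
```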

Full reference with examples and precedence rules: docs/ENV-VARS.md

Troubleshooting: docs/troubleshooting.md

Deep Dive

Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.

This is DevOps thinking applied to agent work: the Three Ways as composable primitives.

Flow: wave-based execution (/crank) + workflow orchestration (/rpi) to keep work moving.
Feedback: shift-left validation (/pre-mortem, /vibe, /council) plus optional gates/hooks to make feedback unavoidable.
Continual learning: post-mortems turn outcomes into reusable knowledge in .agents/, so the next session starts smarter. /flywheel monitors health.

The Knowledge Ledger

.agents/ is an append-only ledger with cache-like semantics. Nothing gets overwritten — every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:

Session N ends
    → ao forge: mine transcript for learnings, decisions, patterns
    → ao maturity --expire: mark stale artifacts (freshness decay)
    → ao maturity --evict: archive what's decayed past threshold

Session N+1 starts
    → ao inject --apply-decay: score all artifacts by recency,
      inject top-N within token budget
    → Agent starts with institutional knowledge, not a blank slate

Write once, score by freshness, inject the best, prune the rest. If retrieval_rate × usage_rate stays above decay and scale friction, knowledge compounds. If not, growth stalls unless fresh input or stronger controls are added. The formal model is cache eviction with a decay function and limits-to-growth controls.
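The score, inject, and prune steps can be sketched as follows. This assumes exponential freshness decay with a 14-day half-life, a per-item confidence weight, a greedy fill of the token budget, and an eviction threshold of 0.05. All of these numbers and names (`Learning`, `select_for_injection`, `should_evict`) are illustrative assumptions, not the values or code that `ao inject --apply-decay` or `ao maturity --evict` actually use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Learning:
    text: str
    age_days: float    # time since the artifact was written
    confidence: float  # 0..1 quality weight (hypothetical)
    tokens: int        # cost of injecting it


def freshness(age_days: float, half_life_days: float = 14.0) -> float:
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def select_for_injection(items: List[Learning], token_budget: int,
                         half_life_days: float = 14.0) -> List[Learning]:
    """Rank by confidence x freshness, then greedily fill the token budget."""
    ranked = sorted(items,
                    key=lambda it: it.confidence * freshness(it.age_days, half_life_days),
                    reverse=True)
    chosen: List[Learning] = []
    used = 0
    for it in ranked:
        if used + it.tokens <= token_budget:  # skip items that would overflow
            chosen.append(it)
            used += it.tokens
    return chosen


def should_evict(it: Learning, threshold: float = 0.05,
                 half_life_days: float = 14.0) -> bool:
    """Archive an artifact once its freshness has decayed below the threshold."""
    return freshness(it.age_days, half_life_days) < threshold


items = [
    Learning("Token bucket with Redis", age_days=2, confidence=0.9, tokens=40),
    Learning("Rate limit at middleware layer", age_days=10, confidence=0.8, tokens=30),
    Learning("Old logging convention", age_days=120, confidence=0.9, tokens=50),
]
picked = select_for_injection(items, token_budget=80)
print([it.text for it in picked])  # ['Token bucket with Redis', 'Rate limit at middleware layer']
print([it.text for it in items if should_evict(it)])  # ['Old logging convention']
```

Greedy filling is a simplification: a lower-ranked but smaller learning can still be injected after a larger one is skipped, which is usually acceptable for a context budget.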

  /rpi "goal"
    │
    ├── /research → /plan → /pre-mortem → /crank → /vibe
    │
    ▼
  /post-mortem
    ├── validates what shipped
    ├── extracts learnings → .agents/
    └── suggests next /rpi command ────┐
                                       │
   /rpi "next goal" ◄──────────────────┘

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.

Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns.

Phase details — what each step does

/research — Explores your codebase. Produces a research artifact with findings and recommendations.
/plan — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).
/pre-mortem — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
/crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. The lead validates and commits. Runs until every issue is closed. Pass --test-first for spec-first TDD.
/vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
/post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.
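Grouping issues into dependency-ordered waves is essentially a layered topological sort: every issue whose dependencies are all closed joins the next wave, and a round with no ready issues means there is a cycle. A minimal Python sketch under that assumption follows. `plan_waves` and the sub-issue IDs are hypothetical; the real dependency data lives in the beads epic that /plan creates.

```python
from typing import Dict, List, Set


def plan_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Layered topological sort: wave k holds issues whose deps all sit in earlier waves."""
    remaining = {issue: set(d) for issue, d in deps.items()}
    unknown = set().union(*remaining.values()) - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    closed: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(i for i, d in remaining.items() if d <= closed)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)   # issues in one wave can run in parallel
        closed.update(ready)  # the whole wave closes before the next starts
        for issue in ready:
            del remaining[issue]
    return waves


deps = {
    "ag-0058.1": set(),
    "ag-0058.2": set(),
    "ag-0058.3": {"ag-0058.1"},
    "ag-0058.4": {"ag-0058.1", "ag-0058.2"},
    "ag-0058.5": {"ag-0058.3"},
}
for n, wave in enumerate(plan_waves(deps), start=1):
    print(f"Wave {n}: {wave}")
# Wave 1: ['ag-0058.1', 'ag-0058.2']
# Wave 2: ['ag-0058.3', 'ag-0058.4']
# Wave 3: ['ag-0058.5']
```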

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
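The FAIL-then-retry behavior of /pre-mortem and /vibe amounts to a bounded loop around a build step and a judging step. The sketch below assumes "max 3 retries" means up to three re-runs after the first attempt and that only FAIL triggers a re-run (WARN passes through). `gated`, `build_plan`, and `council` are hypothetical stand-ins for planning or cranking and for the council.

```python
from typing import Callable, Tuple

Verdict = Tuple[str, str]  # ("PASS" | "WARN" | "FAIL", feedback text)


def gated(build: Callable[[str], str],
          judge: Callable[[str], Verdict],
          max_retries: int = 3) -> str:
    """Build, have the judge review, and rebuild with the feedback on FAIL."""
    feedback = ""
    for _attempt in range(1 + max_retries):
        artifact = build(feedback)
        verdict, feedback = judge(artifact)
        if verdict != "FAIL":
            return artifact
    raise RuntimeError(f"still FAIL after {max_retries} retries: {feedback}")


# Usage: a plan that passes on the third attempt.
attempts = []


def build_plan(feedback: str) -> str:
    attempts.append(feedback)
    return f"plan-v{len(attempts)}"


def council(plan: str) -> Verdict:
    return ("PASS", "") if plan == "plan-v3" else ("FAIL", f"{plan}: missing rollback step")


print(gated(build_plan, council))  # plan-v3
```

Feeding the judge's feedback back into the next build is what makes each retry a re-plan rather than a blind rerun.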

Phased RPI — fresh context per phase for larger goals

ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.

ao rpi phased "add rate limiting"      # Hands-free, fresh context per phase
ao rpi phased "add auth" &             # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "fix perf"  # Resume at execution phase
ao rpi status --watch                   # Monitor active phased runs

Use /rpi when context fits in one session. Use ao rpi phased when it doesn't.

Goal-driven mode — /evolve with GOALS.yaml

Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:

# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL
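The loop can be sketched in Python under a few assumptions: "worst gap" means the failing goal with the highest `weight` (matching the field above and the sample /evolve log), a goal passes when its `check` command exits 0, and a cycle counts as a regression if any previously passing goal now fails. `Goal`, `evolve`, `fix`, and `revert` are hypothetical; in AgentOps the fix step is a full /rpi run and the revert undoes that cycle's commits.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Goal:
    id: str
    check: str   # shell command; exit code 0 means the goal is met
    weight: int


def shell_check(cmd: str) -> bool:
    return subprocess.run(cmd, shell=True).returncode == 0


def measure(goals: List[Goal], run: Callable[[str], bool]) -> Dict[str, bool]:
    return {g.id: run(g.check) for g in goals}


def evolve(goals: List[Goal], fix: Callable[[Goal], None], revert: Callable[[], None],
           run: Callable[[str], bool] = shell_check, max_cycles: int = 5) -> Dict[str, bool]:
    status = measure(goals, run)
    for _ in range(max_cycles):
        failing = [g for g in goals if not status[g.id]]
        if not failing:
            break  # all goals met
        worst = max(failing, key=lambda g: g.weight)  # highest-weight gap first
        fix(worst)
        after = measure(goals, run)  # re-measure ALL goals, not just the target
        if any(status[g.id] and not after[g.id] for g in goals):
            revert()  # a passing goal broke; assumes revert restores the prior state
        else:
            status = after
    return status


# Usage with fake checks instead of real shell commands.
state = {"make test": False, "make docs": False, "make lint": True}
goals = [Goal("test-pass-rate", "make test", 10),
         Goal("doc-coverage", "make docs", 7),
         Goal("lint", "make lint", 5)]
fixed = []


def fake_fix(goal: Goal) -> None:
    fixed.append(goal.id)
    state[goal.check] = True


print(evolve(goals, fake_fix, revert=lambda: None, run=state.__getitem__))
# {'test-pass-rate': True, 'doc-coverage': True, 'lint': True}
print(fixed)  # ['test-pass-rate', 'doc-coverage']
```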

Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.

References — science, systems theory, prior art

Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).

AgentOps concentrates on the high-leverage end of Meadows' hierarchy of leverage points: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.

Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Wiggum Pattern, agent backends, hooks, context windowing.

Architecture

Five pillars, one recursive shape. The same pattern — lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins — repeats at every scale:

/implement ── one worker, one issue, one verify cycle
    └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
            └── /evolve ── fitness-gated /rpi cycles

Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem — not accumulated chat context. Parallel execution works because each unit of work is atomic: no shared mutable state with concurrent workers.
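A minimal sketch of one wave under the constraints in the paragraph above: workers run in parallel, each sees only its own spec, and each writes its result to a file instead of committing; the lead reads those files, applies a validation gate, and commits only if every result passes. Python threads and a temporary directory are used purely for illustration. `worker`, `run_wave`, and the JSON result format are hypothetical, and the worker body is a stand-in for a real agent.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, List


def worker(issue: str, spec: str, outbox: Path) -> Path:
    """Stand-in worker: sees only its own spec, writes a result file, never commits."""
    result = {"issue": issue, "status": "done", "summary": f"implemented: {spec}"}
    path = outbox / f"{issue}.json"  # one file per issue: no shared mutable state
    path.write_text(json.dumps(result), encoding="utf-8")
    return path


def run_wave(wave: Dict[str, str], outbox: Path,
             validate: Callable[[dict], bool],
             commit: Callable[[List[dict]], None]) -> List[dict]:
    """Lead: fan out one wave, read results from disk, gate, then commit once."""
    with ThreadPoolExecutor() as pool:
        paths = list(pool.map(lambda item: worker(item[0], item[1], outbox), wave.items()))
    results = [json.loads(p.read_text(encoding="utf-8")) for p in paths]
    if not all(validate(r) for r in results):
        raise RuntimeError("validation gate failed; wave not committed")
    commit(results)  # only the lead commits
    return results


commits: List[List[dict]] = []
with tempfile.TemporaryDirectory() as tmp:
    done = run_wave({"ag-0058.1": "add retry", "ag-0058.2": "add jitter"},
                    Path(tmp), validate=lambda r: r["status"] == "done",
                    commit=commits.append)
print([r["issue"] for r in done])  # ['ag-0058.1', 'ag-0058.2']
```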

Validation is mechanical, not advisory. Multi-model councils judge before and after implementation. Hooks enforce gates — push blocked until /vibe passes, /crank blocked until /pre-mortem passes. The knowledge flywheel extracts learnings, scores them, and re-injects them at session start so each cycle compounds.

Full treatment: docs/ARCHITECTURE.md — all five pillars, operational invariants, component overview.

Skills

Every skill works alone. Compose them however you want.

Judgment — the foundation everything validates against:

Skill	What it does
`/council`	Independent judges (Claude + Codex) debate, surface disagreement, converge. `--preset=security-audit`, `--perspectives`, `--debate` for adversarial review
`/vibe`	Code quality review — complexity analysis + council
`/pre-mortem`	Validate plans before implementation — council simulates failures
`/post-mortem`	Wrap up completed work — council validates + retro extracts learnings

Execution — research, plan, build, ship:

Skill	What it does
`/research`	Deep codebase exploration — produces structured findings
`/plan`	Decompose a goal into trackable issues with dependency waves
`/implement`	Full lifecycle for one task — research, plan, build, validate, learn
`/crank`	Parallel agents in dependency-ordered waves, fresh context per worker
`/swarm`	Parallelize any skill — run research, brainstorms, implementations in parallel
`/rpi`	Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem)
`/evolve`	Measure fitness goals, fix the worst gap, roll back regressions, loop

Knowledge — the flywheel that makes sessions compound:

Skill	What it does
`/knowledge`	Query learnings, patterns, and decisions across `.agents/`
`/learn`	Manually capture a decision, pattern, or lesson
`/retro`	Extract learnings from completed work
`/flywheel`	Monitor knowledge health — velocity, staleness, pool depths

Supporting skills:

Category	Skills
Onboarding	`/quickstart`, `/using-agentops`
Session	`/handoff`, `/recover`, `/status`
Traceability	`/trace`, `/provenance`
Product	`/product`, `/goals`, `/release`, `/readme`, `/doc`
Utility	`/brainstorm`, `/bug-hunt`, `/complexity`

Full reference: docs/SKILLS.md

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

Spawning Backend	How it works	Best for
Native teams	`TeamCreate` + `SendMessage` — built into Claude Code	Tight coordination, debate
Background tasks	`Task(run_in_background=true)` — last-resort fallback	When no team APIs are available
Codex sub-agents	`/codex-team` — Claude orchestrates Codex workers	Cross-vendor validation
tmux + Agent Mail	`/swarm --mode=distributed` — full process isolation	Long-running work, crash recovery

Distributed-mode workers survive disconnects: each runs in its own tmux session with crash recovery. Run `tmux attach` to debug live.

The `ao` CLI

Skills work standalone — no CLI required. The ao CLI adds two things: (1) the knowledge flywheel that makes sessions compound (extract, inject, decay, maturity), and (2) terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.

ao rpi phased "add rate limiting"              # 3 sessions: discover → build → validate
ao rpi phased "fix auth bug" &                 # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "ag-058"   # Resume at build phase
ao rpi status --watch                          # Monitor active runs

Walk away, come back to committed code + extracted learnings.

ao search "query"      # Search knowledge across files and chat history
ao demo                # Interactive demo

Full reference: CLI Commands

How AgentOps Fits With Other Tools

These are fellow experiments in making coding agents work. Use pieces from any of them.

Alternative	What it does well	Where AgentOps focuses differently
GSD	Clean subagent spawning, fights context rot	Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions)
Compound Engineer	Knowledge compounding, structured loop	Multi-model councils and validation gates — independent judges debating before and after code ships

Detailed comparisons →

FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.

Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)

Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog
