Skip to content

The missing DevOps layer for coding agents. Flow, feedback, and memory that compounds between sessions.

License

Notifications You must be signed in to change notification settings

boshu2/agentops

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

796 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

AgentOps

Coding agents forget everything between sessions. This fixes that.

How It Works · See It Work · Skill Router · Install · Deep Dive · Skills · CLI · FAQ

Agents running full development cycles in parallel with validation gates and a coordinating team leader
From goal to shipped code — agents research, plan, and implement in parallel. Councils validate before and after. Every learning feeds the next session.


How It Works

Coding agents get a blank context window every session. AgentOps is a toolbox of primitives — pick the ones you need, skip the ones you don't. Every skill works standalone. Swarm any of them for parallelism. Chain them into a pipeline when you want structure. Knowledge compounds between sessions automatically.

DevOps' Three Ways — applied to the agent loop as composable primitives:

  • Flow (/research, /plan, /crank, /swarm, /rpi): orchestration skills that move work through the system. Single-piece flow, minimizing context switches. Swarm parallelizes any skill; crank runs dependency-ordered waves; rpi chains the full pipeline.
  • Feedback (/council, /vibe, /pre-mortem, hooks): shorten the feedback loop until defects can't survive it. Independent judges catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, standards injection. Problems found Friday don't wait until Monday.
  • Learning (.agents/, ao CLI, /retro, /knowledge): stop rediscovering what you already know. Every session extracts learnings into an append-only ledger, scores them by freshness, and re-injects the best ones at next session start. Session 50 knows what session 1 learned the hard way.

See It Work

/quickstart                          ← Day 1: guided tour on your codebase (~10 min)
    │
Not sure what to do?                 ─────────► /brainstorm
    │
Have an idea of what you want?       ─────────► /research
    │
Ready to scope it cleanly?           ─────────► /plan
    │
/implement (small) · /crank (epic)   ← Build and ship
    │
/vibe → /post-mortem                 ← Validate and learn
    │
/rpi "goal"                          ← One command for the full flow

Use one skill — validate a PR:

> /council validate this PR

[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping

The council verdict, your decisions, and the patterns used are automatically written to .agents/ — an append-only ledger. Nothing gets overwritten. Session ends, hooks extract learnings.

Knowledge compounds — three weeks later, different task, but your agent already knows:

> /research "retry backoff strategies"

[inject] 3 prior learnings loaded (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + injected context
           Recommends: exponential backoff with jitter, reuse existing Redis client

Session 5 didn't start from scratch — it started with what session 1 learned. Stale insights decay automatically.

Parallelize anything with /swarm:

> /swarm "research auth patterns, brainstorm rate limiting improvements"

[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/

Full pipeline — one command, walk away:

> /rpi "add retry backoff to rate limiter"

[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  Council validates plan → PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"

Completed crank run with 3 parallel epics and 15 issues shipped in 5 waves
AgentOps building AgentOps: completed `/crank` across 3 parallel epics (15 issues, 5 waves, 0 regressions).

More examples — /evolve, session continuity

Session continuity across compaction or restart:

> /handoff
[handoff] Saved: 3 open issues, current branch, next action
         Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
          Branch: feature/rate-limiter
          Next: /implement ag-0058.3

Goal-driven improvement loop:

> /evolve --max-cycles=5

[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
         Worst gap: test-pass-rate (weight: 10)
         /rpi "Improve test-pass-rate" → 3 issues, 2 waves
         Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
         /rpi "Improve doc-coverage" → 2 issues, 1 wave
         Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
         Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted
Different developers, different setups — use what fits your workflow

The PR reviewer — uses one skill, nothing else:

> /council validate this PR
Consensus: WARN — missing error handling in 2 locations

That's it. No pipeline, no setup, no commitment. One command, actionable feedback.

The team lead — composes skills manually:

> /research "performance bottlenecks in the API layer"
> /plan "optimize database queries identified in research"
> /council validate the plan

Picks skills as needed, stays in control of sequencing.

The solo dev — runs the full pipeline, walks away:

> /rpi "add user authentication"
[3 phases run autonomously, learnings extracted]

One command does research through post-mortem. Comes back to committed code.

The platform team — parallel agents, hands-free improvement:

> /swarm "run /rpi on each of these 3 epics"
> /evolve --max-cycles=5

Swarms full pipelines in parallel. Evolve measures goals and fixes gaps in a loop.


Skill Router

Use this when you're not sure which skill to run.

What are you trying to do?
│
├─ "Not sure what to do yet"
│   └─ Generate options first ─────► /brainstorm
│
├─ "I have an idea"
│   └─ Understand code + context ──► /research
│
├─ "I know what I want to build"
│   └─ Break it into issues ───────► /plan
│
├─ "Now build it"
│   ├─ Small/single issue ─────────► /implement
│   ├─ Multi-issue epic ───────────► /crank <epic-id>
│   └─ Full flow in one command ───► /rpi "goal"
│
├─ "Fix a bug"
│   ├─ Know which file? ──────────► /implement <issue-id>
│   └─ Need to investigate? ──────► /bug-hunt
│
├─ "Build a feature"
│   ├─ Small (1-2 files) ─────────► /implement
│   ├─ Medium (3-6 issues) ───────► /plan → /crank
│   └─ Large (7+ issues) ─────────► /rpi (full pipeline)
│
├─ "Validate something"
│   ├─ Code ready to ship? ───────► /vibe
│   ├─ Plan ready to build? ──────► /pre-mortem
│   ├─ Work ready to close? ──────► /post-mortem
│   └─ Quick sanity check? ───────► /council --quick validate
│
├─ "Explore or research"
│   ├─ Understand this codebase ──► /research
│   ├─ Compare approaches ────────► /council research <topic>
│   └─ Generate ideas ────────────► /brainstorm
│
├─ "Learn from past work"
│   ├─ What do we know about X? ──► /knowledge <query>
│   ├─ Save this insight ─────────► /learn "insight"
│   └─ Run a retrospective ───────► /retro
│
├─ "Parallelize work"
│   ├─ Multiple independent tasks ► /swarm
│   └─ Full epic with waves ──────► /crank <epic-id>
│
├─ "Ship a release"
│   └─ Changelog + tag ──────────► /release <version>
│
├─ "Session management"
│   ├─ Where was I? ──────────────► /status
│   ├─ Save for next session ─────► /handoff
│   └─ Recover after compaction ──► /recover
│
└─ "First time here" ────────────► /quickstart

Install

Requirements

  • node 18+ (for npx skills) and git
  • One supported runtime: Claude Code, Codex CLI, Cursor, or OpenCode
  • Optional for ao CLI install path shown below: Homebrew (brew)
# Claude Code, Codex CLI, Cursor (most users)
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Works with: Claude Code · Codex CLI · Cursor · OpenCode — skills are portable across runtimes (/converter exports to native formats).

Then type /quickstart in your agent chat.

# Claude Code plugin (alternative)
claude plugin add boshu2/agentops

npx skills installs skills into your agent's global skills directory. The plugin path registers AgentOps as a Claude Code plugin instead — same skills, different integration point. Most users should start with npx skills.

The ao CLI — powers the knowledge flywheel

Skills work standalone. The ao CLI powers the automated learning loop — knowledge extraction, injection with freshness decay, maturity lifecycle, and progress gates. Install it when you want knowledge to compound between sessions.

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks

This installs 25+ hooks across core lifecycle events:

Event What happens
SessionStart Extract from prior session, inject top learnings (freshness-weighted), check progress gates
SessionEnd Mine transcript for knowledge, record session outcome, expire stale artifacts, evict dead knowledge
PreToolUse Inject coding standards before edits, gate dangerous git ops, validate before push
PostToolUse Advance progress ratchets, track citations
TaskCompleted Validate task output against acceptance criteria
Stop/PreCompact Close feedback loops, snapshot before compaction
OpenCode — plugin + skills

Installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Local-only. No telemetry. No cloud. No accounts.

What Where Reversible?
Skills Global skills dir (outside your repo; for Claude Code: ~/.claude/skills/) npx skills@latest remove boshu2/agentops -g
Knowledge artifacts .agents/ in your repo (git-ignored by default) rm -rf .agents/
Hook registration .claude/settings.json ao hooks uninstall or delete entries
Git push gate Pre-push hook (optional, only with CLI) AGENTOPS_HOOKS_DISABLED=1

Nothing modifies your source code. Nothing phones home. Everything is open source — audit it yourself.

Configuration — environment variables

All optional. AgentOps works out of the box with no configuration.

Council / validation:

Variable Default What it does
COUNCIL_TIMEOUT 120 Judge timeout in seconds
COUNCIL_CLAUDE_MODEL sonnet Claude model for judges (opus for high-stakes)
COUNCIL_CODEX_MODEL (user's Codex default) Override Codex model for --mixed
COUNCIL_EXPLORER_MODEL sonnet Model for explorer sub-agents
COUNCIL_EXPLORER_TIMEOUT 60 Explorer timeout in seconds
COUNCIL_R2_TIMEOUT 90 Debate round 2 timeout in seconds

Hooks:

Variable Default What it does
AGENTOPS_HOOKS_DISABLED 0 1 to disable all hooks (kill switch)
AGENTOPS_PRECOMPACT_DISABLED 0 1 to disable pre-compaction snapshot
AGENTOPS_TASK_VALIDATION_DISABLED 0 1 to disable task validation gate
AGENTOPS_SESSION_START_DISABLED 0 1 to disable session-start hook
AGENTOPS_EVICTION_DISABLED 0 1 to disable knowledge eviction
AGENTOPS_GITIGNORE_AUTO 1 0 to skip auto-adding .agents/ to .gitignore
AGENTOPS_WORKER 0 1 to skip push gate (for worker agents)

Full reference with examples and precedence rules: docs/ENV-VARS.md

Troubleshooting: docs/troubleshooting.md


Deep Dive

Standard iterative development — research, plan, validate, build, review, learn — automated for agents that can't carry context between sessions.

This is DevOps thinking applied to agent work: the Three Ways as composable primitives.

  • Flow: wave-based execution (/crank) + workflow orchestration (/rpi) to keep work moving.
  • Feedback: shift-left validation (/pre-mortem, /vibe, /council) plus optional gates/hooks to make feedback unavoidable.
  • Continual learning: post-mortems turn outcomes into reusable knowledge in .agents/, so the next session starts smarter. /flywheel monitors health.

The Knowledge Ledger

.agents/ is an append-only ledger with cache-like semantics. Nothing gets overwritten — every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:

Session N ends
    → ao forge: mine transcript for learnings, decisions, patterns
    → ao maturity --expire: mark stale artifacts (freshness decay)
    → ao maturity --evict: archive what's decayed past threshold

Session N+1 starts
    → ao inject --apply-decay: score all artifacts by recency,
      inject top-N within token budget
    → Agent starts with institutional knowledge, not a blank slate

Write once, score by freshness, inject the best, prune the rest. If retrieval_rate × usage_rate stays above decay and scale friction, knowledge compounds. If not, growth stalls unless fresh input or stronger controls are added. The formal model is cache eviction with a decay function and limits-to-growth controls.

  /rpi "goal"
    │
    ├── /research → /plan → /pre-mortem → /crank → /vibe
    │
    ▼
  /post-mortem
    ├── validates what shipped
    ├── extracts learnings → .agents/
    └── suggests next /rpi command ────┐
                                       │
   /rpi "next goal" ◄──────────────────┘

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.

Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns.

Phase details — what each step does
  1. /research — Explores your codebase. Produces a research artifact with findings and recommendations.

  2. /plan — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).

  3. /pre-mortem — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).

  4. /crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. --test-first for spec-first TDD.

  5. /vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).

  6. /post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.

Phased RPI — fresh context per phase for larger goals

ao rpi phased "goal" runs each phase in its own session — no context bleed between phases.

ao rpi phased "add rate limiting"      # Hands-free, fresh context per phase
ao rpi phased "add auth" &             # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "fix perf"  # Resume at execution phase
ao rpi status --watch                   # Monitor active phased runs

Use /rpi when context fits in one session. Use ao rpi phased when it doesn't.

Goal-driven mode — /evolve with GOALS.yaml

Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:

# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL

Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.

References — science, systems theory, prior art

Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).

AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.

Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Wiggum Pattern, agent backends, hooks, context windowing.


Architecture

Five pillars, one recursive shape. The same pattern — lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins — repeats at every scale:

/implement ── one worker, one issue, one verify cycle
    └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
            └── /evolve ── fitness-gated /rpi cycles

Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem — not accumulated chat context. Parallel execution works because each unit of work is atomic: no shared mutable state with concurrent workers.

Validation is mechanical, not advisory. Multi-model councils judge before and after implementation. Hooks enforce gates — push blocked until /vibe passes, /crank blocked until /pre-mortem passes. The knowledge flywheel extracts learnings, scores them, and re-injects them at session start so each cycle compounds.

Full treatment: docs/ARCHITECTURE.md — all five pillars, operational invariants, component overview.


Skills

Every skill works alone. Compose them however you want.

Judgment — the foundation everything validates against:

Skill What it does
/council Independent judges (Claude + Codex) debate, surface disagreement, converge. --preset=security-audit, --perspectives, --debate for adversarial review
/vibe Code quality review — complexity analysis + council
/pre-mortem Validate plans before implementation — council simulates failures
/post-mortem Wrap up completed work — council validates + retro extracts learnings

Execution — research, plan, build, ship:

Skill What it does
/research Deep codebase exploration — produces structured findings
/plan Decompose a goal into trackable issues with dependency waves
/implement Full lifecycle for one task — research, plan, build, validate, learn
/crank Parallel agents in dependency-ordered waves, fresh context per worker
/swarm Parallelize any skill — run research, brainstorms, implementations in parallel
/rpi Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem)
/evolve Measure fitness goals, fix the worst gap, roll back regressions, loop

Knowledge — the flywheel that makes sessions compound:

Skill What it does
/knowledge Query learnings, patterns, and decisions across .agents/
/learn Manually capture a decision, pattern, or lesson
/retro Extract learnings from completed work
/flywheel Monitor knowledge health — velocity, staleness, pool depths

Supporting skills:

Onboarding /quickstart, /using-agentops
Session /handoff, /recover, /status
Traceability /trace, /provenance
Product /product, /goals, /release, /readme, /doc
Utility /quickstart, /brainstorm, /bug-hunt, /complexity

Full reference: docs/SKILLS.md

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

Spawning Backend How it works Best for
Native teams TeamCreate + SendMessage — built into Claude Code Tight coordination, debate
Background tasks Task(run_in_background=true) — last-resort fallback When no team APIs available
Codex sub-agents /codex-team — Claude orchestrates Codex workers Cross-vendor validation
tmux + Agent Mail /swarm --mode=distributed — full process isolation Long-running work, crash recovery

Distributed mode workers survive disconnects — each runs in its own tmux session with crash recovery. tmux attach to debug live.


The ao CLI

Skills work standalone — no CLI required. The ao CLI adds two things: (1) the knowledge flywheel that makes sessions compound (extract, inject, decay, maturity), and (2) terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.

ao rpi phased "add rate limiting"              # 3 sessions: discover → build → validate
ao rpi phased "fix auth bug" &                 # Run multiple in parallel (auto-worktrees)
ao rpi phased --from=implementation "ag-058"   # Resume at build phase
ao rpi status --watch                          # Monitor active runs

Walk away, come back to committed code + extracted learnings.

ao search "query"      # Search knowledge across files and chat history
ao demo                # Interactive demo

Full reference: CLI Commands


How AgentOps Fits With Other Tools

These are fellow experiments in making coding agents work. Use pieces from any of them.

Alternative What it does well Where AgentOps focuses differently
GSD Clean subagent spawning, fights context rot Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions)
Compound Engineer Knowledge compounding, structured loop Multi-model councils and validation gates — independent judges debating before and after code ships

Detailed comparisons →


FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.


Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)

Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog