Experimental: multi-agent foundation (Phase 0) — AgentDefinition + tier slots + TOML registry by quangdang46 · Pull Request #313 · quangdang46/jcode

quangdang46 · 2026-05-25T15:04:23Z

Summary

Multi-agent architecture foundation for jcode, adapted from CodebuffAI's design.

Phase 0: Foundation (commits 1-3)

| Component | Description |
|---|---|
| `AgentDefinition` | Declarative schema (id, model, tools, prompt, reasoning, output mode, etc.) |
| `ModelTier` | `Routine / Thinking` enum; maps to the `JCODE_ROUTING_*` env vars + session-gateway routing |
| `OutputMode` | `LastMessage / AllMessages / StructuredOutput`; controls how tool results surface |
| `AgentRegistry` | TOML directory loader for `.jcode/agents/*.toml` with round-trip validation |
| Cross-reference validation | Checks `tool_names` / `spawnable_agents` IDs against a caller-supplied name set (`validate_tool_references()` / `validate_spawn_references()`); meant for spawn time, not TOML load time |
| Skill MAS bridge | `MAS`-prefixed skill names invoke `jcode-skill-{name}` binaries |
| `sample_agents.rs` | 6 integration tests (bundled + disk-loaded TOML agents) |
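A minimal sketch of how a tier slot could resolve to a concrete model. It assumes a precedence of `model_override` first, then the configured tier slot, then the session model; the PR only states that agents fall back to the session model when no tier is configured. `TierSlots` and `resolve_model` are hypothetical names, and the real `tier.rs` reads its slots from the `JCODE_ROUTING_*` env vars rather than from a struct.

```rust
/// Tier an agent may prefer (mirrors the `Routine / Thinking` enum above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Routine,
    Thinking,
}

/// Hypothetical holder for the user-configured slots; `None` = slot unset.
#[derive(Debug, Default)]
pub struct TierSlots {
    pub routine: Option<String>,
    pub thinking: Option<String>,
}

impl TierSlots {
    fn slot(&self, tier: ModelTier) -> Option<&str> {
        match tier {
            ModelTier::Routine => self.routine.as_deref(),
            ModelTier::Thinking => self.thinking.as_deref(),
        }
    }
}

/// Assumed precedence: explicit override, then tier slot, then session model.
/// With no slots configured, every agent simply inherits the session model.
pub fn resolve_model<'a>(
    model_override: Option<&'a str>,
    prefer_tier: Option<ModelTier>,
    slots: &'a TierSlots,
    session_model: &'a str,
) -> &'a str {
    if let Some(m) = model_override {
        return m;
    }
    if let Some(tier) = prefer_tier {
        if let Some(m) = slots.slot(tier) {
            return m;
        }
    }
    session_model
}
```

This keeps subscription users unaffected: an empty `TierSlots` always yields the session model.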

Phase 1: Agent TOML definitions

| File | Description |
|---|---|
| `basher.toml` | Routine tier + Minimal reasoning; bash-only leaf agent |
| `editor.toml` | Thinking tier + Medium reasoning; full edit toolkit (8 tools) |

Phase 4: Prompt utilities

| Component | Description |
|---|---|
| `prompt_placeholders.rs` | Substitution engine for `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, `{{REMAINING_STEPS}}`, `{{KNOWLEDGE_FILES}}`, `{{GIT_CHANGES}}` |
| `wrap_as_system_reminder()` | Wraps harness step prompts in `<system_reminder>` tags (in `src/agent/prompting.rs`) |
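A std-only sketch of a placeholder pass and a reminder wrapper. `substitute_placeholders` is a hypothetical name, this `wrap_as_system_reminder` is an illustrative re-implementation rather than the PR's code, and the newlines inside the `<system_reminder>` tags are an assumption. Unknown `{{NAME}}` tokens are deliberately left untouched so a missing context value stays visible in the prompt.

```rust
use std::collections::HashMap;

/// Replaces each `{{NAME}}` whose NAME is in `values`; unknown or
/// unterminated tokens are copied through verbatim.
pub fn substitute_placeholders(template: &str, values: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = &after[..end];
                match values.get(name) {
                    Some(v) => out.push_str(v),
                    None => {
                        // Keep the token so the gap is obvious downstream.
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // No closing braces: copy the remainder as-is and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Illustrative wrapper; the exact layout inside the tags is assumed.
pub fn wrap_as_system_reminder(text: &str) -> String {
    format!("<system_reminder>\n{}\n</system_reminder>", text)
}
```

A single left-to-right scan avoids re-substituting text that a placeholder value itself happens to contain.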

Phase 5: JBench evaluation framework

| File | Description |
|---|---|
| `evals/jbench/src/agent_runner.rs` | `run_agent_in_repo()`: spawns a jcode subprocess, streams stdout, captures the diff via `git diff HEAD` |
| `evals/jbench/src/judge.rs` | `judge_with_three_models()`: runs GPT, Gemini, and Claude judges in parallel; median analysis + averaged scores |
| `evals/jbench/src/lessons.rs` | `extract_lessons()` + `append_lessons_to_file()`: accumulates lessons per agent |
| `evals/jbench/src/bin/jbench.rs` | CLI with `run` and `meta-analyze` implemented; `pick-commits`, `gen-evals`, and `judge` are stubs that print a Phase-stub notice and exit 0 |
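The judge row says the three verdicts are combined as "median analysis + averaged scores". One plausible reading, sketched here under that assumption: keep the written analysis of the judge whose overall score is the median, and report the mean of all scores. `JudgeVerdict`, `Aggregated`, and `aggregate` are hypothetical; the real `judge.rs` types are not shown in this PR.

```rust
/// One judge's verdict (hypothetical field names).
#[derive(Debug, Clone)]
pub struct JudgeVerdict {
    pub judge: String,
    pub analysis: String,
    pub overall_score: f64,
}

#[derive(Debug)]
pub struct Aggregated {
    pub median_analysis: String,
    pub mean_score: f64,
}

/// Returns None when no judge produced a verdict.
pub fn aggregate(mut verdicts: Vec<JudgeVerdict>) -> Option<Aggregated> {
    if verdicts.is_empty() {
        return None;
    }
    let mean_score =
        verdicts.iter().map(|v| v.overall_score).sum::<f64>() / verdicts.len() as f64;
    // total_cmp gives a total order, so a NaN score cannot panic the sort.
    verdicts.sort_by(|a, b| a.overall_score.total_cmp(&b.overall_score));
    // With three judges this is the middle one; with an even count it is
    // the upper median.
    let mid = verdicts.len() / 2;
    Some(Aggregated {
        median_analysis: verdicts.swap_remove(mid).analysis,
        mean_score,
    })
}
```

Picking the median judge's prose (rather than averaging text) keeps one outlier judge from dominating the written feedback.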

Test results

- `jcode-agent-runtime`: 49 unit + 6 integration = 55 passed, 0 failed
- `jcode-jbench` types: 3 round-trip tests passed
- `cargo check --bin jcode`: OK

🤖 Generated with Claude Code

…ase 0.1+0.2) Lay the foundation for declarative agent definitions, adapted from Codebuff's AgentDefinition schema but fitted to jcode's single-OAuth provider reality:

- signals.rs: existing soft-interrupt + cancellation primitives moved into a named module; root-level re-exports preserved so src/agent.rs consumers compile unchanged.
- definition.rs: AgentDefinition struct (id, model_override, prefer_tier, reasoning, tool_names, spawnable_agents, prompts, output_mode, inherit_parent_system_prompt, include_message_history) with TOML round-trip + validation for id format, system_prompt vs. inherit conflict, structured_output schema requirement, self-spawn, and duplicate tool/agent ids.
- tier.rs: user-defined tier slot (routine/thinking) backed by the same JCODE_ROUTING_* env vars as model_routing.rs (#100). NOT a catalog: agents inherit the session model when no tier is configured, so subscription users (Claude Pro / ChatGPT Plus / Gemini Advanced) see no behavior change. Pay-per-token users opt in by setting two env vars.
- reasoning.rs: ReasoningEffort enum (minimal/low/medium/high).
- output.rs: OutputMode enum (last_message/all_messages/structured_output).

32 unit tests pass. Full `cargo check --bin jcode` succeeds. This is Phase 0 of the multi-agent foundation; no runtime engine changes yet. Next: TOML loader for .jcode/agents/*.toml + builtin embedded agents (Phase 0.3).
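To make the definition.rs validation rules concrete, here is a trimmed sketch over a subset of the `AgentDefinition` fields. The id rule (lowercase ASCII letters, digits, and inner hyphens, consistent with ids like `file-picker`), the `output_schema` field name, and the `DefinitionError` variants are assumptions; the commit message lists the checks but not their exact shape.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    LastMessage,
    AllMessages,
    StructuredOutput,
}

/// Hypothetical subset of the real struct's fields.
#[derive(Debug)]
pub struct AgentDefinition {
    pub id: String,
    pub system_prompt: Option<String>,
    pub inherit_parent_system_prompt: bool,
    pub output_mode: OutputMode,
    pub output_schema: Option<String>,
    pub tool_names: Vec<String>,
    pub spawnable_agents: Vec<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum DefinitionError {
    InvalidId(String),
    SystemPromptConflict,
    MissingOutputSchema,
    SelfSpawn,
    Duplicate(String),
}

// Assumed id format: non-empty, [a-z0-9-], no leading/trailing hyphen.
fn is_valid_id(id: &str) -> bool {
    !id.is_empty()
        && !id.starts_with('-')
        && !id.ends_with('-')
        && id.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
}

fn first_duplicate(items: &[String]) -> Option<&str> {
    let mut seen: HashSet<&str> = HashSet::new();
    for item in items {
        if !seen.insert(item.as_str()) {
            return Some(item.as_str());
        }
    }
    None
}

impl AgentDefinition {
    /// Checks internal invariants only; tool/agent existence is a separate,
    /// spawn-time concern.
    pub fn validate(&self) -> Result<(), DefinitionError> {
        if !is_valid_id(&self.id) {
            return Err(DefinitionError::InvalidId(self.id.clone()));
        }
        if self.system_prompt.is_some() && self.inherit_parent_system_prompt {
            return Err(DefinitionError::SystemPromptConflict);
        }
        if self.output_mode == OutputMode::StructuredOutput && self.output_schema.is_none() {
            return Err(DefinitionError::MissingOutputSchema);
        }
        if self.spawnable_agents.iter().any(|a| a == &self.id) {
            return Err(DefinitionError::SelfSpawn);
        }
        let dup = first_duplicate(&self.tool_names)
            .or_else(|| first_duplicate(&self.spawnable_agents));
        if let Some(d) = dup {
            return Err(DefinitionError::Duplicate(d.to_string()));
        }
        Ok(())
    }
}
```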

…hase 0.3) Discover and load AgentDefinition files from three locations, in priority order:

1. `<project>/.jcode/agents/*.toml` (project-local, highest)
2. `~/.jcode/agents/*.toml` (user-global)
3. `AgentRegistry::register_builtin` (compiled-in defaults, lowest)

Project-local overrides user-global, which overrides builtin. Re-registering a builtin after a higher-priority entry is loaded does NOT clobber the override: the priority check is symmetric in `insert`.

Design choices:

- Filename must match `<id>.toml` so users can find agents by id without opening every file. Mismatches are surfaced as a load error rather than silently misindexing.
- Malformed/invalid files are collected as non-fatal LoadError entries, so a single bad file doesn't prevent the rest of the registry from loading. `jcode doctor` (future) reads load_errors() to surface these.
- AgentRegistry intentionally does NOT cross-reference `tool_names` / `spawnable_agents`; that's done at spawn time because the tool universe may be feature-gated (Phase 0.4).

41 unit tests pass (32 prior + 9 new). `cargo check --bin jcode` succeeds.
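A sketch of the load-order-independent priority rule and the filename check, assuming each entry records its source and that a same-priority re-insert replaces the existing entry. `Registry`, `Source`, and `load_file` are hypothetical stand-ins for `AgentRegistry`; TOML parsing is elided, so `parsed_id` is the `id` already read from the file and the definition body is a plain `String`.

```rust
use std::collections::HashMap;

/// derive(Ord) follows declaration order: Builtin < UserGlobal < ProjectLocal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Builtin,
    UserGlobal,
    ProjectLocal,
}

#[derive(Debug, PartialEq)]
pub struct LoadError {
    pub file: String,
    pub message: String,
}

#[derive(Debug, Default)]
pub struct Registry {
    // id -> (source, definition body)
    entries: HashMap<String, (Source, String)>,
    errors: Vec<LoadError>,
}

impl Registry {
    /// Skips the insert if a strictly higher-priority entry already exists,
    /// so registering a builtin late cannot clobber a project override.
    pub fn insert(&mut self, id: &str, source: Source, body: String) {
        let keep_existing =
            matches!(self.entries.get(id), Some((existing, _)) if *existing > source);
        if !keep_existing {
            self.entries.insert(id.to_string(), (source, body));
        }
    }

    /// A file whose stem differs from its `id` is recorded as a non-fatal
    /// error and not indexed; loading of other files continues.
    pub fn load_file(&mut self, file_name: &str, parsed_id: &str, source: Source, body: String) {
        let stem = file_name.strip_suffix(".toml").unwrap_or(file_name);
        if stem != parsed_id {
            self.errors.push(LoadError {
                file: file_name.to_string(),
                message: format!("file name does not match id `{}`", parsed_id),
            });
            return;
        }
        self.insert(parsed_id, source, body);
    }

    pub fn get(&self, id: &str) -> Option<&str> {
        self.entries.get(id).map(|(_, body)| body.as_str())
    }

    pub fn load_errors(&self) -> &[LoadError] {
        &self.errors
    }
}
```

Because the comparison lives inside `insert`, the three sources can be loaded in any order and still converge on the same registry.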

… agents (Phase 0.4-0.6)

Phase 0.4 (cross-reference validation):
- ReferenceError enum (UnknownTools, UnknownSpawnableAgents), kept separate from DefinitionError because the runtime tool/agent universe isn't known at TOML-load time.
- AgentDefinition::validate_tool_references<I, S>() and validate_spawn_references<I, S>(): the caller passes the available name set and gets back a sorted, comma-joined list of unknowns.
- 5 new tests covering the happy path, unknowns, empty lists, and deterministic alphabetical ordering of the error message.

This deliberately does NOT modify src/tool/mod.rs. The whitelist check is a pure function over the agent definition + a name set, so tool dispatch needs no refactoring. Phase 1 will wire the actual tool registry into the spawn path.

Phase 0.5 (Skill MAS (#94) bridge):
- AgentRegistry::lookup_for_skill_routing(skill_agent_id): a named alias of get() that documents the integration point with the SKILL.md field. Returns None for missing references; the skill activation site decides the fallback policy.
- 2 tests: hit + miss.

Phase 0.6 (sample agents + integration test):
- .jcode/agents/file-picker.toml: Routine tier, no message history, leaf agent. Demonstrates the file-picker pattern adapted from Codebuff.
- .jcode/agents/code-reviewer.toml: Thinking tier with inherit_parent_system_prompt=true to demonstrate the prompt-cache prefix-sharing trick (~90% input-token savings on cache hits).
- tests/sample_agents.rs: integration test that loads both files via the public AgentRegistry API and asserts shape + behavior. 4 tests.

Phase 0 totals: 49 unit + 4 integration = 53 tests, all passing. `cargo check --bin jcode` succeeds (full workspace, 3m13s).
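The Phase 0.4 check can be sketched as a pure function over a requested list and an available name set. This is a free function with a hypothetical name, not the PR's `validate_tool_references` / `validate_spawn_references` methods, and it reports unknowns as a plain `String` instead of a `ReferenceError` variant.

```rust
use std::collections::HashSet;

/// Returns Err with a sorted, de-duplicated, comma-joined list of the names
/// in `requested` that do not appear in `available`.
pub fn validate_references<I, S>(requested: &[String], available: I) -> Result<(), String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let known: HashSet<String> = available
        .into_iter()
        .map(|s| s.as_ref().to_string())
        .collect();
    let mut unknown: Vec<&str> = requested
        .iter()
        .map(String::as_str)
        .filter(|name| !known.contains(*name))
        .collect();
    if unknown.is_empty() {
        return Ok(());
    }
    // Sorting makes the error message deterministic across runs.
    unknown.sort_unstable();
    unknown.dedup();
    Err(unknown.join(", "))
}
```

Taking `I: IntoIterator<Item = S>` lets the caller pass whatever name collection the spawn path already has (an array of `&str`, a `Vec<String>`, a map's keys) without copying it into a specific type first.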
Phase 0 (foundation) is now complete:

- Schema: AgentDefinition + ModelTier + OutputMode + ReasoningEffort
- Loader: registry with priority order (project > user > builtin)
- Validation: id format, internal invariants, cross-references
- Sample agents demonstrating cache-hit and tier patterns
- Skill MAS (#94) integration point established

Phase 1 (4 builtin agents + spawn_agents tool + cache benchmark) is the next track.

…prompt utilities, sample agents

- Phase 1: two real TOML agent definitions (basher + editor) with the full schema
- Phase 4: `prompt_placeholders.rs` (`{{FILE_TREE}}`, `{{CURRENT_DATE}}`, etc.)
- Phase 4: `wrap_as_system_reminder()` in `src/agent/prompting.rs`
- Phase 5: `evals/jbench/` scaffold (types, judge stub, lessons stub, agent_runner stub)
- Phase 0.6: integration tests `basher_sample_has_expected_shape` + `editor_sample_has_expected_shape`

All jcode-agent-runtime tests pass (49 unit + 6 integration).

…, CLI

- Phase 5.3 (agent_runner): `run_agent_in_repo()` spawns a jcode subprocess with the prompt on stdin, streams stdout, and captures the trace + diff via `git diff HEAD`. Uses `timeout()` for a per-run deadline.
- Phase 5.4 (judge): `judge_with_three_models()` runs GPT + Gemini + Claude judges in parallel via the OpenAI Responses API + Anthropic Messages API. Median analysis, averaged scores. `run_single_judge()` exposes a per-judge entry point.
- Phase 5.5 (lessons): `extract_lessons()` calls the lessons-extractor model via the Responses API. `append_lessons_to_file()` accumulates lessons in per-agent JSON files with read-modify-write.
- Phase 5.6 (CLI): full `jbench run` implemented (loads eval JSON, iterates commits, calls `run_agent_in_repo`, writes `.run.json` files). `jbench meta-analyze` aggregates results. Other subcommands print Phase stubs and exit 0.

Bug fixes:
- `Default` impl added for `JudgingResult` (needed for EvalRun init)
- `OnceLock` for a lazily initialized static reqwest client (works around const-eval restrictions)
- `anyhow::Context` imported in the bin so the `context` method resolves
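The `OnceLock` fix can be illustrated with `std::sync::OnceLock`: a `static` needs a const initializer, and `OnceLock::new()` is a `const fn`, while the client constructor only runs on first access. `HttpClient` is a hypothetical stand-in for `reqwest::Client` so the sketch has no dependencies, and the init counter exists only to show that construction happens once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Counts constructor calls so the "built once" behavior is observable.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `reqwest::Client`: its constructor is an
/// ordinary (non-const) fn, so it cannot initialize a `static` directly.
#[derive(Debug)]
pub struct HttpClient {
    pub user_agent: String,
}

impl HttpClient {
    fn new() -> Self {
        INIT_COUNT.fetch_add(1, Ordering::SeqCst);
        HttpClient { user_agent: "jbench-sketch".to_string() }
    }
}

// Allowed because `OnceLock::new()` is const; the value is filled lazily.
static CLIENT: OnceLock<HttpClient> = OnceLock::new();

/// Returns the shared client, constructing it on the first call only.
pub fn client() -> &'static HttpClient {
    CLIENT.get_or_init(HttpClient::new)
}

pub fn init_count() -> usize {
    INIT_COUNT.load(Ordering::SeqCst)
}
```

Reusing one client this way also lets parallel judge requests share a single connection pool instead of building one per call.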

quangdang46 added 5 commits May 25, 2026 21:44

quangdang46 wants to merge 5 commits into `master` from `experimental/multi-agent-foundation`.