
Experimental: multi-agent foundation (Phase 0) — AgentDefinition + tier slots + TOML registry#313

Open: quangdang46 wants to merge 5 commits into master from experimental/multi-agent-foundation


quangdang46 (Owner) commented May 25, 2026

Summary

Multi-agent architecture foundation for jcode adapted from CodebuffAI's design.

Phase 0 — Foundation (commits 1-3)

| Component | Description |
| --- | --- |
| AgentDefinition | Declarative schema (id, model, tools, prompt, reasoning, output mode, etc.) |
| ModelTier | Routine / Thinking enum; maps to env vars + session-gateway routing |
| OutputMode | LastMessage / AllMessages / StructuredOutput; controls how tool results surface |
| AgentRegistry | TOML directory loader for .jcode/agents/*.toml with roundtrip validation |
| Cross-ref validation | Checks tool_names and spawnable_agents against a caller-supplied name set at spawn time (the registry itself does not cross-reference at load time) |
| Skill MAS bridge | MAS-prefixed skill names invoke jcode-skill-{name} binaries |
| sample_agents.rs | 6 integration tests (bundled + disk-loaded TOML agents) |
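
To illustrate the tier fallback (an agent inherits the session model when no tier is configured), here is a minimal sketch. `TierSlots` and `resolve_model` are hypothetical names, not the PR's API. The slots stand in for whatever the `JCODE_ROUTING_*` env vars supply, and giving `model_override` precedence over the tier is an assumption.

```rust
/// Tier names follow the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelTier {
    Routine,
    Thinking,
}

/// Hypothetical holder for the two user-defined slots. In the PR these are
/// backed by env vars; here they are plain optional strings.
#[derive(Debug, Default, Clone)]
pub struct TierSlots {
    pub routine: Option<String>,
    pub thinking: Option<String>,
}

/// Picks the model for an agent. Assumed precedence: explicit override,
/// then the configured tier slot, then the session model.
pub fn resolve_model<'a>(
    model_override: Option<&'a str>,
    prefer_tier: Option<ModelTier>,
    slots: &'a TierSlots,
    session_model: &'a str,
) -> &'a str {
    if let Some(model) = model_override {
        return model;
    }
    let slot = match prefer_tier {
        Some(ModelTier::Routine) => slots.routine.as_deref(),
        Some(ModelTier::Thinking) => slots.thinking.as_deref(),
        None => None,
    };
    // An unconfigured slot falls back to the session model, so users who
    // never set a tier see no behavior change.
    slot.unwrap_or(session_model)
}
```

With both slots left empty, every agent resolves to the session model regardless of its preferred tier.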

Phase 1 — Agent TOML definitions

| File | Description |
| --- | --- |
| basher.toml | Routine + Minimal reasoning, bash-only leaf agent |
| editor.toml | Thinking + Medium reasoning, full edit toolkit (8 tools) |

Phase 4 — Prompt utilities

| Item | Description |
| --- | --- |
| prompt_placeholders.rs | Substitution engine for {{FILE_TREE}}, {{CURRENT_DATE}}, {{REMAINING_STEPS}}, {{KNOWLEDGE_FILES}}, {{GIT_CHANGES}} |
| wrap_as_system_reminder() | Wraps harness step prompts in <system_reminder> tags |
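
The two prompt utilities can be pictured roughly as follows. This is a sketch under assumptions: `substitute_placeholders` is a hypothetical function name, the exact whitespace inside the `<system_reminder>` tags is a guess, and leaving unknown placeholders untouched is one possible policy, not necessarily the PR's.

```rust
/// Replaces each `{{NAME}}` token with its value. Tokens without a
/// matching entry are left in the output unchanged.
pub fn substitute_placeholders(template: &str, values: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for &(name, value) in values {
        let token = ["{{", name, "}}"].concat();
        // Replacement is sequential, so a value that itself contains a
        // later token would be substituted again.
        out = out.replace(&token, value);
    }
    out
}

/// Wraps a harness step prompt in <system_reminder> tags.
pub fn wrap_as_system_reminder(body: &str) -> String {
    format!("<system_reminder>\n{}\n</system_reminder>", body)
}
```

A single-pass scanner would avoid the re-substitution caveat noted in the comment, at the cost of a little more code.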

Phase 5 — JBench evaluation framework

| File | Description |
| --- | --- |
| evals/jbench/src/agent_runner.rs | run_agent_in_repo(): spawns a jcode subprocess, streams stdout, captures the diff via git diff HEAD |
| evals/jbench/src/judge.rs | judge_with_three_models(): GPT + Gemini + Claude in parallel, median analysis + averaged scores |
| evals/jbench/src/lessons.rs | extract_lessons() + append_lessons_to_file(): lessons accumulation per agent |
| evals/jbench/src/bin/jbench.rs | CLI with run and meta-analyze implemented; pick-commits, gen-evals, and judge are Phase stubs |

Test results

jcode-agent-runtime: 49 unit + 6 integration = 55 passed, 0 failed
jcode-jbench types:  3 roundtrip tests passed
cargo check --bin jcode: OK

🤖 Generated with Claude Code

…ase 0.1+0.2)

Lay the foundation for declarative agent definitions based on
Codebuff's AgentDefinition schema, adapted to jcode's single-OAuth
provider reality:

- signals.rs: existing soft-interrupt + cancellation primitives moved
  into a named module; root-level re-exports preserved so src/agent.rs
  consumers compile unchanged.
- definition.rs: AgentDefinition struct (id, model_override, prefer_tier,
  reasoning, tool_names, spawnable_agents, prompts, output_mode,
  inherit_parent_system_prompt, include_message_history) with TOML
  round-trip + validation for id format, system_prompt vs inherit
  conflict, structured_output schema requirement, self-spawn, and
  duplicate tool/agent ids.
- tier.rs: user-defined tier slot (routine/thinking) backed by the
  same JCODE_ROUTING_* env vars as model_routing.rs (#100). NOT a
  catalog — agents inherit session model when no tier is configured,
  so subscription users (Claude Pro / ChatGPT Plus / Gemini Advanced)
  see no behavior change. Pay-per-token users opt in by setting two
  env vars.
- reasoning.rs: ReasoningEffort enum (minimal/low/medium/high).
- output.rs: OutputMode enum (last_message/all_messages/structured_output).
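
The validation rules listed for definition.rs can be sketched like this. It is a trimmed-down stand-in, not the PR's code: the error variants, the `output_schema` field name, and the accepted id alphabet (lowercase letters, digits, `-`) are all assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    LastMessage,
    AllMessages,
    StructuredOutput,
}

/// Hypothetical error type; the real DefinitionError may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum DefinitionError {
    InvalidId(String),
    SystemPromptConflictsWithInherit,
    MissingOutputSchema,
    SelfSpawn,
    Duplicate(String),
}

/// Only the fields that the checks below need.
#[derive(Debug, Clone)]
pub struct AgentDefinition {
    pub id: String,
    pub tool_names: Vec<String>,
    pub spawnable_agents: Vec<String>,
    pub system_prompt: Option<String>,
    pub inherit_parent_system_prompt: bool,
    pub output_mode: OutputMode,
    pub output_schema: Option<String>, // assumed field name
}

impl AgentDefinition {
    pub fn validate(&self) -> Result<(), DefinitionError> {
        // Assumed id format: non-empty, lowercase ASCII, digits, or '-'.
        let id_ok = !self.id.is_empty()
            && self
                .id
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if !id_ok {
            return Err(DefinitionError::InvalidId(self.id.clone()));
        }
        // An agent cannot both inherit the parent prompt and define its own.
        if self.inherit_parent_system_prompt && self.system_prompt.is_some() {
            return Err(DefinitionError::SystemPromptConflictsWithInherit);
        }
        if self.output_mode == OutputMode::StructuredOutput && self.output_schema.is_none() {
            return Err(DefinitionError::MissingOutputSchema);
        }
        if self.spawnable_agents.iter().any(|a| a == &self.id) {
            return Err(DefinitionError::SelfSpawn);
        }
        // Duplicates are checked per list, not across lists.
        for list in [&self.tool_names, &self.spawnable_agents] {
            let mut seen = HashSet::new();
            for name in list {
                if !seen.insert(name.as_str()) {
                    return Err(DefinitionError::Duplicate(name.clone()));
                }
            }
        }
        Ok(())
    }
}
```

All of these checks are internal to one definition, which is why they can run at TOML-load time without knowing the runtime tool or agent universe.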

32 unit tests pass. Full `cargo check --bin jcode` succeeds.

This is Phase 0 of the multi-agent foundation — no runtime engine
changes yet. Next: TOML loader for .jcode/agents/*.toml + builtin
embedded agents (Phase 0.3).
…hase 0.3)

Discover and load AgentDefinition files from three locations with
priority order:

  1. <project>/.jcode/agents/*.toml   (project-local, highest)
  2. ~/.jcode/agents/*.toml           (user-global)
  3. AgentRegistry::register_builtin  (compiled-in defaults, lowest)

Project-local overrides user-global, which in turn overrides builtin.
Re-registering a builtin after a higher-priority entry is loaded does NOT
clobber the override, because `insert` applies the same priority check
regardless of registration order.
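
A minimal sketch of such an order-independent priority check, assuming a hypothetical `Source` enum and a registry that stores plain strings instead of full definitions. Letting an equal-priority insert replace the existing entry is also an assumption.

```rust
use std::collections::HashMap;

/// Origin of a definition. The derived ordering gives
/// Builtin < User < Project, matching the stated priority.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Builtin,
    User,
    Project,
}

#[derive(Default)]
pub struct Registry {
    entries: HashMap<String, (Source, String)>,
}

impl Registry {
    /// Stores `body` under `id` unless a strictly higher-priority entry
    /// already exists. Returns true when the entry was stored.
    pub fn insert(&mut self, id: &str, source: Source, body: &str) -> bool {
        if let Some((existing, _)) = self.entries.get(id) {
            if *existing > source {
                return false; // keep the override
            }
        }
        self.entries
            .insert(id.to_string(), (source, body.to_string()));
        true
    }

    pub fn get(&self, id: &str) -> Option<&str> {
        self.entries.get(id).map(|(_, body)| body.as_str())
    }
}
```

Because the comparison lives inside `insert`, the result is the same whether builtins are registered before or after the files on disk are scanned.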

Design choices:

- Filename must match `<id>.toml` so users can find agents by id without
  opening every file. Mismatches are surfaced as a load error rather
  than silently misindexing.
- Malformed/invalid files are collected as non-fatal LoadError entries
  so a single bad file doesn't prevent the rest of the registry from
  loading. `jcode doctor` (future) reads load_errors() to surface
  these.
- AgentRegistry intentionally does NOT cross-reference `tool_names` /
  `spawnable_agents` — that's done at spawn time because the tool
  universe may be feature-gated (Phase 0.4).
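
The filename rule and the non-fatal error collection might look like the sketch below. `check_filename` and `load_all` are hypothetical helpers, and the `LoadError` fields are assumed. To stay self-contained the loader takes (path, declared id) pairs instead of reading and parsing TOML from disk.

```rust
use std::path::{Path, PathBuf};

/// Assumed shape of a non-fatal load error.
#[derive(Debug, PartialEq, Eq)]
pub struct LoadError {
    pub path: PathBuf,
    pub message: String,
}

/// Requires the file to be named `<id>.toml`.
pub fn check_filename(path: &Path, id: &str) -> Result<(), LoadError> {
    let stem = path.file_stem().and_then(|s| s.to_str());
    let ext = path.extension().and_then(|e| e.to_str());
    if stem == Some(id) && ext == Some("toml") {
        Ok(())
    } else {
        Err(LoadError {
            path: path.to_path_buf(),
            message: format!("expected file name `{}.toml`", id),
        })
    }
}

/// Accepts every well-named file and records the rest as errors, so one
/// bad file never aborts the whole scan.
pub fn load_all(files: &[(&str, &str)]) -> (Vec<String>, Vec<LoadError>) {
    let mut loaded = Vec::new();
    let mut errors = Vec::new();
    for &(path, id) in files {
        match check_filename(Path::new(path), id) {
            Ok(()) => loaded.push(id.to_string()),
            Err(e) => errors.push(e),
        }
    }
    (loaded, errors)
}
```

Returning the errors alongside the loaded ids is what lets a later diagnostic command report them without the loader having to fail.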

41 unit tests pass (32 prior + 9 new). `cargo check --bin jcode` succeeds.
… agents (Phase 0.4-0.6)

Phase 0.4 — Cross-reference validation:
  - ReferenceError enum (UnknownTools, UnknownSpawnableAgents) kept
    separate from DefinitionError because the runtime tool/agent
    universe isn't known at TOML-load time.
  - AgentDefinition::validate_tool_references<I, S>() and
    validate_spawn_references<I, S>() — caller passes the available
    name set, gets back a sorted, comma-joined list of unknowns.
  - 5 new tests covering the happy path, unknowns, empty lists,
    and deterministic alphabetical ordering of the error message.

  This deliberately does NOT modify src/tool/mod.rs. The whitelist
  check is a pure function over the agent definition + a name set;
  no need to refactor tool dispatch. Phase 1 will wire the actual
  tool registry into the spawn path.
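
  As a rough illustration of that pure function, here is a sketch written
  as free functions rather than methods on AgentDefinition. The helper
  `unknown_names`, the `String` payload of each variant, and the ", "
  separator are assumptions.

```rust
use std::collections::HashSet;

/// Variant names follow the PR text; the String payload is assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum ReferenceError {
    UnknownTools(String),
    UnknownSpawnableAgents(String),
}

/// Returns the sorted, comma-joined names from `wanted` that are missing
/// from `available`, or None when every name is known.
fn unknown_names<I, S>(wanted: &[String], available: I) -> Option<String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let available: Vec<S> = available.into_iter().collect();
    let known: HashSet<&str> = available.iter().map(|s| s.as_ref()).collect();
    let mut unknown: Vec<&str> = wanted
        .iter()
        .map(String::as_str)
        .filter(|name| !known.contains(name))
        .collect();
    // Sorting keeps the error message deterministic.
    unknown.sort_unstable();
    unknown.dedup();
    if unknown.is_empty() {
        None
    } else {
        Some(unknown.join(", "))
    }
}

pub fn validate_tool_references<I, S>(tool_names: &[String], available: I) -> Result<(), ReferenceError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    match unknown_names(tool_names, available) {
        Some(list) => Err(ReferenceError::UnknownTools(list)),
        None => Ok(()),
    }
}

pub fn validate_spawn_references<I, S>(spawnable: &[String], available: I) -> Result<(), ReferenceError>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    match unknown_names(spawnable, available) {
        Some(list) => Err(ReferenceError::UnknownSpawnableAgents(list)),
        None => Ok(()),
    }
}
```

  The generic `I, S` bounds let callers pass either borrowed or owned
  names, so the spawn path can hand over whatever its tool registry
  exposes.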

Phase 0.5 — Skill MAS (#94) bridge:
  - AgentRegistry::lookup_for_skill_routing(skill_agent_id) — named
    alias of get() that documents the integration point with the
    SKILL.md  field. Returns None for missing references; the
    skill activation site decides fallback policy.
  - 2 tests: hit + miss.

Phase 0.6 — Sample agents + integration test:
  - .jcode/agents/file-picker.toml — Routine tier, no message history,
    leaf agent. Demonstrates file-picker pattern adapted from Codebuff.
  - .jcode/agents/code-reviewer.toml — Thinking tier with
    inherit_parent_system_prompt=true to demonstrate the prompt-cache
    prefix-sharing trick (~90% input-token savings on cache hits).
  - tests/sample_agents.rs — integration test loads both files via the
    public AgentRegistry API and asserts shape + behavior. 4 tests.

Phase 0 totals: 49 unit + 4 integration = 53 tests, all passing.
`cargo check --bin jcode` succeeds (full workspace, 3m13s).

Phase 0 (foundation) is now complete:
  - Schema: AgentDefinition + ModelTier + OutputMode + ReasoningEffort
  - Loader: registry with priority order (project > user > builtin)
  - Validation: id format, internal invariants, cross-references
  - Sample agents demonstrating cache-hit and tier patterns
  - Skill MAS (#94) integration point established

Phase 1 (4 builtin agents + spawn_agents tool + cache benchmark) is
the next track.
…prompt utilities, sample agents

Phase 1: two real TOML agent definitions (basher + editor) with full schema
Phase 4: `prompt_placeholders.rs` — `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, etc.
Phase 4: `wrap_as_system_reminder()` in `src/agent/prompting.rs`
Phase 5: `evals/jbench/` scaffold — types, judge stub, lessons stub, agent_runner stub
Phase 0.6: integration tests `basher_sample_has_expected_shape` + `editor_sample_has_expected_shape`

All jcode-agent-runtime tests pass (49 unit + 6 integration).
…, CLI

Phase 5.3 (agent_runner): `run_agent_in_repo()` spawns jcode subprocess
  with prompt on stdin, streams stdout, captures trace + diff via
  `git diff HEAD`. Uses `timeout()` for per-run deadline.

Phase 5.4 (judge): `judge_with_three_models()` runs GPT + Gemini +
  Claude judges in parallel via OpenAI Responses API + Anthropic
  Messages API. Median analysis, averaged scores. `run_single_judge()`
  exposes per-judge entry point.
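
One plausible reading of "median analysis, averaged scores" is sketched below: the scores are averaged and the written analysis is taken from the judge whose score is the median. This interpretation, the `JudgeVerdict` and `Aggregate` types, and the single `score` field are all assumptions. The parallel API calls are out of scope here.

```rust
/// Hypothetical per-judge result.
#[derive(Debug, Clone, PartialEq)]
pub struct JudgeVerdict {
    pub judge: String,
    pub score: f64,
    pub analysis: String,
}

/// Hypothetical combined result.
#[derive(Debug, Clone, PartialEq)]
pub struct Aggregate {
    pub average_score: f64,
    pub median_analysis: String,
}

/// Averages the scores and keeps the analysis of the median-scoring judge.
/// Returns None when there are no verdicts.
pub fn aggregate(verdicts: &[JudgeVerdict]) -> Option<Aggregate> {
    if verdicts.is_empty() {
        return None;
    }
    let total: f64 = verdicts.iter().map(|v| v.score).sum();
    let average_score = total / verdicts.len() as f64;

    let mut sorted: Vec<&JudgeVerdict> = verdicts.iter().collect();
    sorted.sort_by(|a, b| a.score.total_cmp(&b.score));
    // With three judges this index is the middle element.
    let median = sorted[sorted.len() / 2];

    Some(Aggregate {
        average_score,
        median_analysis: median.analysis.clone(),
    })
}
```

Taking the text from the median judge avoids quoting an outlier while still letting all three scores contribute to the average.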

Phase 5.5 (lessons): `extract_lessons()` calls lessons extractor model
  via Responses API. `append_lessons_to_file()` accumulates lessons in
  per-agent JSON files with read-modify-write.

Phase 5.6 (CLI): Full `jbench run` implemented (loads eval JSON, iterates
  commits, calls `run_agent_in_repo`, writes `.run.json` files).
  `jbench meta-analyze` aggregates results. Other subcommands print
  Phase stubs and exit 0.

Bug fixes:
- `JudgingResult: Default` impl added (needed for EvalRun init)
- `OnceLock` for lazy reqwest static client (fixes const-eval restrictions)
- `context` method from `anyhow::Context` imported in bin