Experimental: multi-agent foundation (Phase 0) — AgentDefinition + tier slots + TOML registry#313
Open
quangdang46 wants to merge 5 commits into
…ase 0.1+0.2)
Lay the foundation for declarative agent definitions, based on Codebuff's AgentDefinition schema but adjusted to jcode's single-OAuth provider reality:
- signals.rs: existing soft-interrupt + cancellation primitives moved into a named module; root-level re-exports preserved so src/agent.rs consumers compile unchanged.
- definition.rs: AgentDefinition struct (id, model_override, prefer_tier, reasoning, tool_names, spawnable_agents, prompts, output_mode, inherit_parent_system_prompt, include_message_history) with TOML round-trip + validation for id format, system_prompt vs inherit conflict, structured_output schema requirement, self-spawn, and duplicate tool/agent ids.
- tier.rs: user-defined tier slot (routine/thinking) backed by the same JCODE_ROUTING_* env vars as model_routing.rs (#100). NOT a catalog: agents inherit the session model when no tier is configured, so subscription users (Claude Pro / ChatGPT Plus / Gemini Advanced) see no behavior change. Pay-per-token users opt in by setting two env vars.
- reasoning.rs: ReasoningEffort enum (minimal/low/medium/high).
- output.rs: OutputMode enum (last_message/all_messages/structured_output).
32 unit tests pass. Full `cargo check --bin jcode` succeeds.
This is Phase 0 of the multi-agent foundation; there are no runtime engine changes yet. Next: TOML loader for .jcode/agents/*.toml + builtin embedded agents (Phase 0.3).
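The validation rules listed for definition.rs can be sketched roughly as follows. This is a minimal, std-only illustration and not the code in this PR: the struct is trimmed to the fields the checks need, the `output_schema` field name, the `DefinitionError` variants, and the exact id rule (lowercase ASCII letters, digits, and `-`) are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// The three output modes named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OutputMode {
    LastMessage,
    AllMessages,
    StructuredOutput,
}

/// Trimmed-down stand-in for the real struct (hypothetical shape).
#[derive(Debug, Clone)]
pub struct AgentDefinition {
    pub id: String,
    pub tool_names: Vec<String>,
    pub spawnable_agents: Vec<String>,
    pub system_prompt: Option<String>,
    pub inherit_parent_system_prompt: bool,
    pub output_mode: OutputMode,
    /// Assumed field name for the structured-output schema.
    pub output_schema: Option<String>,
}

/// Hypothetical error variants, one per rule in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum DefinitionError {
    InvalidId(String),
    PromptConflict,
    MissingOutputSchema,
    SelfSpawn,
    Duplicate(String),
}

/// Returns the first entry that appears more than once, if any.
fn first_duplicate(items: &[String]) -> Option<String> {
    let mut seen = HashSet::new();
    for item in items {
        if !seen.insert(item.as_str()) {
            return Some(item.clone());
        }
    }
    None
}

impl AgentDefinition {
    pub fn validate(&self) -> Result<(), DefinitionError> {
        // Assumed id rule: non-empty, lowercase ASCII letters, digits, '-'.
        let id_ok = !self.id.is_empty()
            && self
                .id
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-');
        if !id_ok {
            return Err(DefinitionError::InvalidId(self.id.clone()));
        }
        // An explicit system prompt conflicts with inheriting the parent's.
        if self.system_prompt.is_some() && self.inherit_parent_system_prompt {
            return Err(DefinitionError::PromptConflict);
        }
        // Structured output needs a schema to validate against.
        if self.output_mode == OutputMode::StructuredOutput && self.output_schema.is_none() {
            return Err(DefinitionError::MissingOutputSchema);
        }
        // An agent may not list itself as spawnable.
        if self.spawnable_agents.iter().any(|a| a == &self.id) {
            return Err(DefinitionError::SelfSpawn);
        }
        // Duplicate tool or agent ids are rejected.
        if let Some(dup) = first_duplicate(&self.tool_names)
            .or_else(|| first_duplicate(&self.spawnable_agents))
        {
            return Err(DefinitionError::Duplicate(dup));
        }
        Ok(())
    }
}
```

Returning on the first failed rule keeps the sketch short; a real loader might prefer to collect every violation so a user can fix a file in one pass.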
…hase 0.3)
Discover and load AgentDefinition files from three locations with priority order:
1. <project>/.jcode/agents/*.toml (project-local, highest)
2. ~/.jcode/agents/*.toml (user-global)
3. AgentRegistry::register_builtin (compiled-in defaults, lowest)
Project-local overrides user-global overrides builtin. Re-registering a builtin after a higher-priority entry is loaded does NOT clobber the override; the priority check is symmetric in `insert`.
Design choices:
- Filename must match `<id>.toml` so users can find agents by id without opening every file. Mismatches are surfaced as a load error rather than silently misindexing.
- Malformed/invalid files are collected as non-fatal LoadError entries so a single bad file doesn't prevent the rest of the registry from loading. `jcode doctor` (future) reads load_errors() to surface these.
- AgentRegistry intentionally does NOT cross-reference `tool_names` / `spawnable_agents`; that's done at spawn time because the tool universe may be feature-gated (Phase 0.4).
41 unit tests pass (32 prior + 9 new). `cargo check --bin jcode` succeeds.
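To make the priority rule concrete, here is a small std-only sketch of an order-independent `insert` plus the `<id>.toml` filename check. It is an illustration under assumptions, not the PR's code: the `Source` and `Entry` types, the `id_matches_filename` helper, and the choice that an equal-priority entry replaces the existing one are all hypothetical.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Where a definition came from. The derived ordering follows declaration
/// order, so ProjectLocal > UserGlobal > Builtin.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    Builtin,
    UserGlobal,
    ProjectLocal,
}

/// Hypothetical registry entry; the real one would hold an AgentDefinition.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub source: Source,
    pub description: String,
}

#[derive(Default)]
pub struct AgentRegistry {
    agents: HashMap<String, Entry>,
}

impl AgentRegistry {
    /// Stores `entry` unless a strictly higher-priority entry already exists.
    /// Returns true when the entry was stored. Because the check compares
    /// priorities rather than relying on load order, registering a builtin
    /// after a project-local override leaves the override in place.
    pub fn insert(&mut self, id: &str, entry: Entry) -> bool {
        let shadowed = self
            .agents
            .get(id)
            .map_or(false, |existing| existing.source > entry.source);
        if shadowed {
            return false;
        }
        self.agents.insert(id.to_string(), entry);
        true
    }

    pub fn get(&self, id: &str) -> Option<&Entry> {
        self.agents.get(id)
    }
}

/// True when the file is named `<id>.toml`.
pub fn id_matches_filename(path: &Path, id: &str) -> bool {
    let ext_ok = path.extension().and_then(|e| e.to_str()) == Some("toml");
    let stem_ok = path.file_stem().and_then(|s| s.to_str()) == Some(id);
    ext_ok && stem_ok
}
```

In a loader built along these lines, a `false` from `id_matches_filename` would be recorded as a non-fatal load error and the remaining files would still be processed.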
… agents (Phase 0.4-0.6)
Phase 0.4 — Cross-reference validation:
- ReferenceError enum (UnknownTools, UnknownSpawnableAgents) kept
separate from DefinitionError because the runtime tool/agent
universe isn't known at TOML-load time.
- AgentDefinition::validate_tool_references<I, S>() and
validate_spawn_references<I, S>() — caller passes the available
name set, gets back a sorted, comma-joined list of unknowns.
- 5 new tests covering the happy path, unknowns, empty lists,
and deterministic alphabetical ordering of the error message.
This deliberately does NOT modify src/tool/mod.rs. The whitelist
check is a pure function over the agent definition + a name set;
no need to refactor tool dispatch. Phase 1 will wire the actual
tool registry into the spawn path.
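A rough sketch of the cross-reference check described above, written as a pure function over the definition plus a caller-supplied name set. The generic `<I, S>` shape and the `ReferenceError` variant names come from the commit message; the `String` payload, the `", "` separator, and the trimmed `AgentDefinition` struct are assumptions for illustration.

```rust
use std::collections::{BTreeSet, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum ReferenceError {
    /// Sorted, comma-joined unknown tool names (payload type is assumed).
    UnknownTools(String),
    /// Sorted, comma-joined unknown agent ids (payload type is assumed).
    UnknownSpawnableAgents(String),
}

/// Trimmed-down stand-in holding only the two reference lists.
pub struct AgentDefinition {
    pub tool_names: Vec<String>,
    pub spawnable_agents: Vec<String>,
}

/// Returns the wanted names missing from `available`, sorted and joined,
/// or None when every name is known.
fn unknown_names<I, S>(wanted: &[String], available: I) -> Option<String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let available: HashSet<String> = available
        .into_iter()
        .map(|s| s.as_ref().to_string())
        .collect();
    // BTreeSet yields alphabetical order and drops repeats, which keeps the
    // error message deterministic.
    let unknown: BTreeSet<&str> = wanted
        .iter()
        .map(String::as_str)
        .filter(|name| !available.contains(*name))
        .collect();
    if unknown.is_empty() {
        None
    } else {
        Some(unknown.into_iter().collect::<Vec<_>>().join(", "))
    }
}

impl AgentDefinition {
    pub fn validate_tool_references<I, S>(&self, available: I) -> Result<(), ReferenceError>
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        match unknown_names(&self.tool_names, available) {
            Some(list) => Err(ReferenceError::UnknownTools(list)),
            None => Ok(()),
        }
    }

    pub fn validate_spawn_references<I, S>(&self, available: I) -> Result<(), ReferenceError>
    where
        I: IntoIterator<Item = S>,
        S: AsRef<str>,
    {
        match unknown_names(&self.spawnable_agents, available) {
            Some(list) => Err(ReferenceError::UnknownSpawnableAgents(list)),
            None => Ok(()),
        }
    }
}
```

Because the functions take any iterator of string-like items, a caller can pass a feature-gated tool list at spawn time without the definition module depending on tool dispatch.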
Phase 0.5 — Skill MAS (#94) bridge:
- AgentRegistry::lookup_for_skill_routing(skill_agent_id) — named
alias of get() that documents the integration point with the
SKILL.md field. Returns None for missing references; the
skill activation site decides fallback policy.
- 2 tests: hit + miss.
Phase 0.6 — Sample agents + integration test:
- .jcode/agents/file-picker.toml — Routine tier, no message history,
leaf agent. Demonstrates file-picker pattern adapted from Codebuff.
- .jcode/agents/code-reviewer.toml — Thinking tier with
inherit_parent_system_prompt=true to demonstrate the prompt-cache
prefix-sharing trick (~90% input-token savings on cache hits).
- tests/sample_agents.rs — integration test loads both files via the
public AgentRegistry API and asserts shape + behavior. 4 tests.
Phase 0 totals: 49 unit + 4 integration = 53 tests, all passing.
`cargo check --bin jcode` succeeds (full workspace, 3m13s).
Phase 0 (foundation) is now complete:
- Schema: AgentDefinition + ModelTier + OutputMode + ReasoningEffort
- Loader: registry with priority order (project > user > builtin)
- Validation: id format, internal invariants, cross-references
- Sample agents demonstrating cache-hit and tier patterns
- Skill MAS (#94) integration point established
Phase 1 (4 builtin agents + spawn_agents tool + cache benchmark) is
the next track.
…prompt utilities, sample agents
Phase 1: two real TOML agent definitions (basher + editor) with full schema
Phase 4: `prompt_placeholders.rs` — `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, etc.
Phase 4: `wrap_as_system_reminder()` in `src/agent/prompting.rs`
Phase 5: `evals/jbench/` scaffold — types, judge stub, lessons stub, agent_runner stub
Phase 0.6: integration tests `basher_sample_has_expected_shape` + `editor_sample_has_expected_shape`
All jcode-agent-runtime tests pass (49 unit + 6 integration).
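The two Phase 4 prompt utilities might look something like the sketch below. Only the `{{NAME}}` placeholder syntax, the `wrap_as_system_reminder` name, and the `<system_reminder>` tag come from this PR; the `substitute_placeholders` function, its signature, the policy of leaving unknown or unterminated placeholders untouched, and the exact whitespace inside the tags are assumptions.

```rust
use std::collections::HashMap;

/// Replaces each `{{NAME}}` in `template` with its value from `values`.
/// Unknown names and an unterminated `{{` are copied through unchanged
/// (an assumed policy; the real engine may treat them as errors).
pub fn substitute_placeholders(template: &str, values: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                let name = &after[..end];
                match values.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push_str("{{");
                        out.push_str(name);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // No closing braces: keep the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Wraps `body` in `<system_reminder>` tags (layout inside the tags is assumed).
pub fn wrap_as_system_reminder(body: &str) -> String {
    format!("<system_reminder>\n{}\n</system_reminder>", body.trim())
}
```

A caller would fill the map with entries such as `CURRENT_DATE` or `FILE_TREE` once per turn and run each agent prompt through `substitute_placeholders` before sending it.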
…, CLI
Phase 5.3 (agent_runner): `run_agent_in_repo()` spawns jcode subprocess with prompt on stdin, streams stdout, captures trace + diff via `git diff HEAD`. Uses `timeout()` for per-run deadline.
Phase 5.4 (judge): `judge_with_three_models()` runs GPT + Gemini + Claude judges in parallel via OpenAI Responses API + Anthropic Messages API. Median analysis, averaged scores. `run_single_judge()` exposes per-judge entry point.
Phase 5.5 (lessons): `extract_lessons()` calls lessons extractor model via Responses API. `append_lessons_to_file()` accumulates lessons in per-agent JSON files with read-modify-write.
Phase 5.6 (CLI): Full `jbench run` implemented (loads eval JSON, iterates commits, calls `run_agent_in_repo`, writes `.run.json` files). `jbench meta-analyze` aggregates results. Other subcommands print Phase stubs and exit 0.
Bug fixes:
- `JudgingResult: Default` impl added (needed for EvalRun init)
- `OnceLock` for lazy reqwest static client (fixes const-eval restrictions)
- `context` method from `anyhow::Context` imported in bin
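One way to read "median analysis, averaged scores" is sketched below: the final score is the mean of the judges' scores, and the analysis text is taken from the judge whose score is the median. This interpretation, the `JudgeVerdict` type, the `aggregate` function, and the fields of `JudgingResult` are assumptions; only the `JudgingResult` name and its `Default` impl are mentioned in the PR.

```rust
/// One judge's output (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct JudgeVerdict {
    pub judge: String,
    pub score: f64,
    pub analysis: String,
}

/// Aggregated result; the field names are assumed for this sketch.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct JudgingResult {
    pub averaged_score: f64,
    pub median_analysis: String,
}

/// Averages the scores and keeps the analysis of the median-scoring judge.
/// With an even number of verdicts the upper-middle one is used.
pub fn aggregate(verdicts: &[JudgeVerdict]) -> JudgingResult {
    if verdicts.is_empty() {
        return JudgingResult::default();
    }
    let mut sorted: Vec<&JudgeVerdict> = verdicts.iter().collect();
    // total_cmp gives a total order on f64, so NaN cannot cause a panic.
    sorted.sort_by(|a, b| a.score.total_cmp(&b.score));
    let median = sorted[sorted.len() / 2];
    let averaged_score =
        verdicts.iter().map(|v| v.score).sum::<f64>() / verdicts.len() as f64;
    JudgingResult {
        averaged_score,
        median_analysis: median.analysis.clone(),
    }
}
```

Taking the median judge's analysis avoids quoting an outlier, while the mean still lets all three judges influence the number that gets reported.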
Summary
Multi-agent architecture foundation for jcode adapted from CodebuffAI's design.
Phase 0 — Foundation (commits 1-3)
- AgentDefinition
- ModelTier: `Routine / Thinking` enum; maps to env vars + session-gateway routing
- OutputMode: `LastMessage / AllMessages / StructuredOutput`; controls how tool results surface
- AgentRegistry: `.jcode/agents/*.toml` with roundtrip validation
- `spawnable_agents` IDs actually exist in registry at load time
- MAS-prefixed skill names invoke `jcode-skill-{name}` binaries
- sample_agents.rs
Phase 1 — Agent TOML definitions
- basher.toml
- editor.toml
Phase 4 — Prompt utilities
- prompt_placeholders.rs: `{{FILE_TREE}}`, `{{CURRENT_DATE}}`, `{{REMAINING_STEPS}}`, `{{KNOWLEDGE_FILES}}`, `{{GIT_CHANGES}}` substitution engine
- wrap_as_system_reminder(): `<system_reminder>` tags
Phase 5 — JBench evaluation framework
- evals/jbench/src/agent_runner.rs: `run_agent_in_repo()` spawns jcode subprocess, streams stdout, captures diff via `git diff HEAD`
- evals/jbench/src/judge.rs: `judge_with_three_models()` runs GPT + Gemini + Claude in parallel, median analysis + averaged scores
- evals/jbench/src/lessons.rs: `extract_lessons()` + `append_lessons_to_file()` for lessons accumulation per agent
- evals/jbench/src/bin/jbench.rs: `run` and `meta-analyze` implemented; `pick-commits` / `gen-evals` / `judge` as Phase stubs
Test results
🤖 Generated with Claude Code