Releases: Protocol-zero-0/evolution-kernel
v1.1.2 — Packaging fix: roles/ now ship in the wheel
Released: 2026-05-26 · hours after v1.1.1 · pip install -U evolution-kernel
TL;DR: v1.1.0 and v1.1.1 wheels shipped the runtime but not the reference role scripts, so `pip install evolution-kernel` users had nowhere to point `roles.executor` at. This release fixes the packaging and adds a `bundled:` prefix so the same config works for `pip install` and `git clone` setups.
The bug
The wheel only contained `evolution_kernel/` and `evolution_kernel/templates/`. The reference roles (`planner.py`, `executor.sh`, `evaluator.py`, `goal_evaluator.py`, `strategist.py`) lived at the repo top level, outside the package, so setuptools never bundled them. The README's `["python3", "roles/planner.py"]` only resolved for users whose CWD was the root of a fresh git clone of the repo. Discovered while smoke-testing the v1.1.1 release and fixed the same day.
The fix
- Move `roles/*` → `evolution_kernel/roles/*` so setuptools ships them as `package_data`.
- Add a `bundled:<name>` prefix that the kernel resolves to the absolute path inside the installed wheel via `importlib.resources`. The same `evolution.yml` now works for both `pip install` users and developers running from a checkout.
- Update all 5 `evolution-kernel init` templates + `examples/evolution.yml` + both READMEs to use the `bundled:` form. `evolution-kernel init` now writes configs that work out of the box on a pip-installed box.

    roles:
      planner: ["python3", "bundled:planner.py"]
      executor: ["bash", "bundled:executor.sh"]
      evaluator: ["python3", "bundled:evaluator.py"]

→ 30-second smoke test:

    pip install -U evolution-kernel
    evolution-kernel init  # 3 questions, drops a working evolution.yml

Backward compatibility
`bundled:` is opt-in. Plain argv (`["python3", "myplanner.py"]`) still passes through unchanged. Existing configs that hardcoded `roles/X` at the repo top level need updating to `bundled:X`: re-run `evolution-kernel init` for a fresh config, or do a single search-and-replace.
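The resolution rule can be sketched roughly as follows. This is an illustration, not the kernel's actual code: the function name `resolve_bundled`, its `package` parameter, and the exception types are assumptions. Only the `bundled:` prefix, the use of `importlib.resources`, and the behaviors listed under Verification (plain-argv passthrough, path-separator rejection, missing-file errors) come from these notes.

```python
from importlib import resources
from pathlib import Path

PREFIX = "bundled:"


def resolve_bundled(argv, package="evolution_kernel.roles"):
    """Return argv with each 'bundled:<name>' element replaced by an absolute path.

    Hypothetical helper: name, signature, and error types are assumptions.
    """
    resolved = []
    for arg in argv:
        if not arg.startswith(PREFIX):
            resolved.append(arg)  # plain argv passes through unchanged
            continue
        name = arg[len(PREFIX):]
        # Accept bare file names only, so a config cannot point outside the package.
        if not name or "/" in name or "\\" in name:
            raise ValueError(f"invalid bundled role name: {name!r}")
        target = resources.files(package).joinpath(name)
        if not target.is_file():
            raise FileNotFoundError(f"no bundled role {name!r} in {package}")
        resolved.append(str(Path(str(target)).resolve()))
    return resolved
```

With the package installed, `resolve_bundled(["python3", "bundled:planner.py"])` would be expected to return `python3` followed by a path inside the installed `evolution_kernel/roles/` directory, while `["python3", "myplanner.py"]` comes back untouched.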
Verification
- 108 tests pass (102 prior + 6 new bundled-role tests covering happy path, no-op for plain argv, path-separator rejection, missing-file errors, and end-to-end through `load_config`)
- Wheel now contains all 5 role scripts under `evolution_kernel/roles/`
- Fresh-venv `pip install` + `bundled:` resolution returns real site-packages paths for all 5 roles
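A wheel is a zip archive, so the "wheel contains all 5 role scripts" check can be reproduced with the standard library. The sketch below is one way to do it; the helper name `missing_roles` is hypothetical, and the project's own verification may work differently.

```python
import zipfile

# Role file names as listed in the bug description above.
EXPECTED_ROLES = (
    "planner.py",
    "executor.sh",
    "evaluator.py",
    "goal_evaluator.py",
    "strategist.py",
)


def missing_roles(wheel_path, prefix="evolution_kernel/roles/"):
    """Return the expected role scripts that are absent from the wheel archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        names = set(wheel.namelist())
    return [role for role in EXPECTED_ROLES if prefix + role not in names]
```

An empty list means every reference role shipped; any name returned is a script that the packaging configuration left out.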
Links
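Chinese summary (translated)
Released: 2026-05-26 · a few hours after v1.1.1 · pip install -U evolution-kernel
In one sentence: the v1.1.0 / v1.1.1 wheels shipped only the runtime, not the 5 reference role scripts, so `pip install evolution-kernel` users had no file to point `roles.executor` at. This release fixes the packaging and adds a `bundled:` prefix so the same `evolution.yml` runs as-is in both pip-installed and git-clone environments.
Where the bug was
Previously the wheel contained only the `evolution_kernel/` package plus `templates/*.yml`. The 5 reference role scripts (`planner.py` / `executor.sh` / `evaluator.py` / `goal_evaluator.py` / `strategist.py`) lived at the repository top level, outside the package, so setuptools did not bundle them. The README's `["python3", "roles/planner.py"]` only worked for users whose cwd was the root of a git clone of the repo. The problem was found during the v1.1.1 release smoke test and fixed the same day.
How it was fixed
- Move `roles/` → `evolution_kernel/roles/` so setuptools ships them as `package_data`
- Add a `bundled:` prefix that the kernel resolves via `importlib.resources` to the absolute path inside the wheel; the same `evolution.yml` runs as-is for both pip installs and git clones
- Update the 5 init templates + `examples/evolution.yml` + the Chinese and English READMEs to use the `bundled:` form. Configs written by `evolution-kernel init` work out of the box on a pip-installed machine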

    roles:
      planner: ["python3", "bundled:planner.py"]
      executor: ["bash", "bundled:executor.sh"]
      evaluator: ["python3", "bundled:evaluator.py"]

→ 30-second smoke test:

    pip install -U evolution-kernel
    evolution-kernel init  # 3 questions, drops a working evolution.yml

Compatibility
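`bundled:` is opt-in. Plain argv (e.g. `["python3", "myplanner.py"]`) passes through unchanged. Existing configs that hardcode `roles/X` (the top-level path) need to change to `bundled:X`: re-run `evolution-kernel init` or do a one-off search-and-replace.
Verification
- 108 tests pass (102 prior + 6 new bundled-role tests covering happy path / plain-argv passthrough / path-separator rejection / missing-file errors / end-to-end through `load_config`)
- The wheel now contains the 5 role scripts (under `evolution_kernel/roles/`)
- Clean-venv `pip install` + `bundled:` resolution returns real site-packages paths for all 5 roles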
v1.1.1 — Executor permission-mode fix
Released: 2026-05-26 · 9 days after v1.1.0 · pip install -U evolution-kernel
TL;DR: If you tried `coding_agent.tool: claude-code` on v1.1.0 and your runs silently produced no patch, this fixes it.
→ Try the 10-minute quickstart on the fixed build: `examples/quickstart/` (zero cost, no Anthropic API key required; Claude Pro is enough).
What changed
`roles/executor.sh` now passes `--permission-mode bypassPermissions` to `claude -p`. Without it, the inner Claude session refuses to edit files in non-interactive mode: the kernel run completes but the worktree stays unchanged, every attempt gets rejected by the evaluator, and budget is spent on no-op iterations.
Why this is safe
Governor already isolates each attempt in a temporary git worktree, and `scope.py` rejects any change outside `allowed_paths`. The kernel is the trust boundary; the inner `claude -p` session does not need its own permission prompts on top of that. With `sandbox.enabled: true` (firejail), you also get an OS-level read-only-root cage. That gives two sandbox layers beneath the executor, so `bypassPermissions` inside the inner agent is the right default.
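To make the scope layer concrete, here is a minimal sketch of an `allowed_paths` check. It is not the contents of `scope.py`: the function name `out_of_scope` is hypothetical, and it assumes each allowed path is a repo-relative file or directory prefix, which these notes do not specify.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_paths, allowed_paths):
    """Return the changed paths that fall outside every allowed path.

    Assumption: allowed paths are repo-relative files or directory prefixes.
    """
    allowed = [PurePosixPath(p) for p in allowed_paths]
    violations = []
    for changed in changed_paths:
        path = PurePosixPath(changed)
        # Absolute paths and '..' segments are never treated as in scope.
        escapes = path.is_absolute() or ".." in path.parts
        # Compare whole path components, so 'srcfoo/a.py' does not match 'src'.
        inside = any(path == root or root in path.parents for root in allowed)
        if escapes or not inside:
            violations.append(changed)
    return violations
```

A governor built this way would reject the attempt whenever the returned list is non-empty, regardless of what permissions the inner agent was running with.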
Affected configurations
You're affected if your `evolution.yml` sets `coding_agent.tool: claude-code`. Setups using `aider` or a custom executor (like `examples/oss_fix_demo/`) are unaffected.
Compatibility
No breaking changes. Pure bug fix. Single runtime dependency (PyYAML) preserved. Cumulative test count: 102 (unchanged from v1.1.0).
Links
- Fix: #37
- Version bump: #39
- PyPI: https://pypi.org/project/evolution-kernel/1.1.1/
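Chinese summary (translated)
Released: 2026-05-26 · exactly 9 days after v1.1.0 · pip install -U evolution-kernel
In one sentence: on v1.1.0, running the kernel with `coding_agent.tool: claude-code` left every attempt "finished but with no files changed"; this release fixes that.
→ Get on the fixed build in 10 minutes: `examples/quickstart/` (zero cost, no Anthropic API key needed; a Claude Pro subscription is enough).
What changed
`roles/executor.sh` now adds `--permission-mode bypassPermissions` when calling `claude -p`. Without the flag, Claude refused to edit files in non-interactive mode: the kernel run finished but the worktree was untouched, the evaluator rejected every attempt, and budget was burned on idle iterations.
Why this is safe
Governor already isolates each attempt in a temporary git worktree, and `scope.py` rejects any change outside `allowed_paths`. The kernel is the trust boundary; the inner `claude -p` does not need another layer of permission prompts. With `sandbox.enabled: true` (firejail) you also get an OS-level read-only-root cage, for two sandbox layers beneath the executor, so `bypassPermissions` is the right default for the inner agent.
Who is affected
You are affected if your `evolution.yml` sets `coding_agent.tool: claude-code`. Setups using `aider` or a custom executor (such as `examples/oss_fix_demo/`) are not affected.
Compatibility
No breaking changes. Single dependency (PyYAML) unchanged. Cumulative test count: 102 (same as v1.1.0).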
v1.1.0 — ready to publicize
The v1.0 line shipped the runtime; v1.1 ships the first 10 minutes a stranger spends with the runtime. No kernel refactor, no new dependencies, no abstractions — just the onboarding surface that turns "interesting README" into "I just ran it."
What's new
evolution-kernel init — three-question scaffolder (#28)
A new subcommand asks 3 questions (mission, template, allowed paths) and drops a valid `evolution.yml` in the current directory. Five starter templates ship as plain YAML and cover the common mission shapes:
- `lint`: drive a linter/formatter to zero violations
- `coverage`: raise test coverage
- `perf`: optimize a measurable workload
- `benchmark`: FunSearch / AlphaEvolve-style population search (k-branch parallel)
- `custom`: blank-ish starter

No interactive prompt library, no template base classes, no Python template generator: the wizard is 76 lines of stdlib `input()` calls, and the rendered output is fed through `load_config()` before it can hit disk, so a broken template can never escape.
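The validate-before-write idea can be illustrated with a few lines of stdlib Python. Everything here is a stand-in: `write_validated` and `toy_validate` are hypothetical names, and `toy_validate` only checks for a top-level `roles:` key, whereas the real wizard runs the rendered text through the kernel's `load_config()`, whose signature is not shown in these notes.

```python
from pathlib import Path


def write_validated(text, dest, validate):
    """Write text to dest only after validate(text) returns without raising."""
    validate(text)  # a broken render raises here, before any file is created
    path = Path(dest)
    path.write_text(text, encoding="utf-8")
    return path


def toy_validate(text):
    """Stand-in validator (NOT load_config): require a top-level 'roles:' key."""
    if not any(line.startswith("roles:") for line in text.splitlines()):
        raise ValueError("config has no top-level 'roles:' section")
```

Because validation runs first, a failed render leaves the working directory exactly as it was.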
examples/quickstart/ — see the loop close in 1.4 seconds (#30)
A turn-key demo that takes a stranger from git clone to evolution/accepted commit in one shell snippet, zero cost, no API key:

    pip install -e . && pip install ruff
    bash examples/quickstart/setup.sh
    evolution-kernel --config examples/quickstart/evolution.yml \
      --repo /tmp/ek-quickstart-target \
      --ledger /tmp/ek-quickstart-ledger --loop

Measured wall-clock on a developer laptop: 1.4 s. The mission is small on purpose (drive ruff to zero violations on `src/messy.py`) so the entire closed loop (worktree sandbox, scope enforcement, ledger writes, `evolution/accepted` branch advancing) fits in one terminal scroll. No LLM in the loop; the planner/executor/evaluator are deterministic Python scripts committed into the demo target itself. This is intentional: the example demonstrates the runtime, not LLM smarts. For the LLM-driven story, see ⬇.
examples/oss_fix_demo/ — real OSS fix via claude CLI (#32)
The companion to quickstart, pointed at a real published OSS package: python-slugify v8.0.4 (1,106 LoC, MIT). The executor is `claude -p --permission-mode acceptEdits`, billed against the operator's Claude Pro / Max subscription: no API key, no per-token charge.
Verified end-to-end (2026-05-17):
- 10 real ruff violations on the cloned target
- Claude makes the semantic edits (F401 → explicit `as`-alias re-exports), wall-clock 34 s
- A `ruff check --fix && ruff format` postprocess mops up structural autofixes (I001 import sort)
- Run 0001 accepted, `evolution/accepted` advanced, real commit `bae97a8` landed
- Total `--loop` time: 48 s, $0 marginal cost
The realistic split — LLM does semantic work, deterministic tooling handles structural cleanup — mirrors how production teams actually chain agents with formatters.
README hero block (#34)
The first thing a visitor sees is now a copy-pasteable ▶ Try in 10 minutes snippet plus a compact ASCII workflow diagram showing Observe → Plan → Execute → Evaluate → accept/reject → ledger. The existing investor-narrative Motivation / SWE-bench Verified worked-example sections are intact below.
Numbers
- 102 tests pass under `python -m unittest discover -s tests` on Python 3.10 and 3.12 (99 baseline + 3 new for the init wizard, covering all 5 templates via `subTest`).
- `evolution_kernel/*.py`: 1,969 lines, well under the v1.1 soft cap of baseline + 200 (= 2,089).
- Single runtime dependency (PyYAML) preserved.
- No kernel changes: every new behavior lives in `init_wizard.py`, the templates, or under `examples/`. The runtime that v1.0.0 froze is byte-identical.
Issues closed
- #27 `evolution-kernel init` subcommand + 5 YAML templates
- #29 `examples/quickstart/` 10-minute zero-cost ruff cleanup demo
- #31 `examples/oss_fix_demo/` real OSS fix via claude CLI
- #33 README hero: ▶ Try in 10 minutes + ASCII workflow
Migration
None. v1.1 is a strict superset of v1.0 on the kernel surface. Existing configs and ledgers continue to work unchanged.
v1.0.0 — Phase 4: Sandbox + Remote Observer
The kernel crosses the "soul plugin" (灵魂插件) bar: an evolution runtime you can point at any git repository and trust to run unattended overnight, sandboxed at the OS level, with evidence pulled from anywhere on the network.
Highlights
Process sandbox via firejail (PR #19, closes #17)
- Executor argv is wrapped with `firejail --quiet --noprofile --read-only=/ --read-write=<worktree> --read-write=<run_dir>` when `sandbox.enabled: true`.
- The rest of the filesystem is mounted read-only, so a misbehaving executor cannot write to `/tmp`, `~/.ssh`, or anywhere else on disk during a round.
- Verified end-to-end: planted `/tmp/sandbox-leak-<run_id>.txt` write attempt blocked with `OSError: [Errno 30] Read-only file system`; same fixture without the sandbox writes the file. CI installs firejail and runs this assertion on every push.
- Planner and evaluator stay unsandboxed (read-mostly) to keep the blast radius of any policy bug as small as possible.
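The argv wrapping described in the first bullet amounts to prefixing the executor command. A minimal sketch, using only the firejail flags quoted above; the function name `wrap_with_firejail` and its `enabled` parameter are hypothetical, not the kernel's API:

```python
def wrap_with_firejail(argv, worktree, run_dir, enabled=True):
    """Prefix an executor argv with the firejail flags quoted in these notes.

    Hypothetical helper: only the flag list itself comes from the release notes.
    """
    if not enabled:
        return list(argv)  # sandbox.enabled: false leaves the argv untouched
    return [
        "firejail",
        "--quiet",
        "--noprofile",
        "--read-only=/",              # whole filesystem read-only...
        f"--read-write={worktree}",   # ...except the attempt's worktree
        f"--read-write={run_dir}",    # ...and the run directory
        *argv,
    ]
```

For example, `wrap_with_firejail(["bash", "executor.sh"], "/tmp/wt", "/tmp/run")` yields the firejail flags followed by the original `bash executor.sh` command, ready to hand to `subprocess.run`.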
Remote observer — HTTP evidence source (PR #20, closes #18)
- `evidence_sources` gains `type: http` with `url`, `method`, `headers`, `timeout`.
- Uses stdlib `urllib.request`, so the kernel's single-dependency rule still holds: `pyproject.toml` lists only `PyYAML>=6.0`.
- Captures `status`, `body` (64 KiB cap with a `truncated` flag), and a sorted list of response headers so the ledger is stable for diffing.
- Non-2xx responses still record the body, so the planner gets to decide how to react instead of the observer silently retrying.
- Verified end-to-end: a `Governor.run_once` against a local `python3 -m http.server` records the JSON response under `observation.json.sources[0]` with `status: 200`.
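An observer with these properties fits in a few lines of `urllib.request`. The sketch below is an approximation under stated assumptions: the function name `observe_http`, the 10-second default timeout, the UTF-8 decoding, and the exact shape of the returned dict are not taken from the kernel's source.

```python
import urllib.error
import urllib.request

BODY_CAP = 64 * 1024  # 64 KiB cap on the recorded body


def observe_http(url, method="GET", headers=None, timeout=10.0):
    """Fetch url and return status, capped body, truncated flag, sorted headers."""
    request = urllib.request.Request(url, method=method, headers=headers or {})
    try:
        response = urllib.request.urlopen(request, timeout=timeout)
        status = response.status
    except urllib.error.HTTPError as error:
        # Non-2xx: HTTPError is itself a readable response, so keep its body.
        response, status = error, error.code
    try:
        raw = response.read(BODY_CAP + 1)  # one extra byte reveals truncation
        header_items = sorted(response.headers.items())
    finally:
        response.close()
    return {
        "status": status,
        "body": raw[:BODY_CAP].decode("utf-8", errors="replace"),
        "truncated": len(raw) > BODY_CAP,
        "headers": [[name, value] for name, value in header_items],
    }
```

Sorting the headers and capping the body keeps two observations of the same endpoint comparable line by line, which is the diff-stability property the notes call out.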
Test surface
| Metric | v0.3 → v1.0 |
|---|---|
| Tests | 67 → 99 (+16 sandbox, +16 HTTP) |
| Runtime | ~1,200 → ~1,400 lines of Python |
| Third-party deps | 1 (PyYAML) → 1 (PyYAML) |
| CI matrix | Python 3.10 + 3.12, both green with firejail installed |
Compatibility
- `sandbox.enabled` defaults to `false`. v0.3 configs run byte-for-byte identically.
- `type: http` is an additive evidence source; existing `type: file` and `type: shell` are unchanged.
- The `EvidenceSource` dataclass gained new fields but kept the existing ones at the same positions.
What ships in v1.0
- Multi-round LLM loop with history injection.
- `max_total_usd` / `max_total_tokens` / `max_iterations` / `max_consecutive_failures` hard stops.
- Full ledger audit trail (survives process restarts).
- Git worktree sandbox per attempt.
- Scope enforcement against `allowed_paths`.
- Aider + Claude Code executor adapters, Anthropic + OpenAI planner/evaluator adapters.
- Goal evaluator — stops when the mission is "won".
- k-branch parallel exploration (FunSearch / AlphaEvolve style).
- NEW Process sandbox via firejail.
- NEW Remote observer (HTTP evidence source).
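The four hard stops in this list can be pictured as a single check run before each round. The sketch below is illustrative only: the limit names come from the notes, but the `Limits` / `Totals` dataclasses, the `stop_reason` function, and the "stop once usage reaches the limit" comparison are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    # Field names mirror the config keys named above; None means "no limit".
    max_total_usd: Optional[float] = None
    max_total_tokens: Optional[int] = None
    max_iterations: Optional[int] = None
    max_consecutive_failures: Optional[int] = None


@dataclass
class Totals:
    # Hypothetical running counters a loop would accumulate.
    usd: float = 0.0
    tokens: int = 0
    iterations: int = 0
    consecutive_failures: int = 0


def stop_reason(limits, totals):
    """Return the name of the first limit that has been reached, or None."""
    checks = [
        ("max_total_usd", limits.max_total_usd, totals.usd),
        ("max_total_tokens", limits.max_total_tokens, totals.tokens),
        ("max_iterations", limits.max_iterations, totals.iterations),
        ("max_consecutive_failures", limits.max_consecutive_failures,
         totals.consecutive_failures),
    ]
    for name, limit, used in checks:
        if limit is not None and used >= limit:
            return name
    return None
```

A loop would call `stop_reason` at the top of every round and exit as soon as it returns a name, which keeps an unattended run bounded in money, tokens, rounds, and repeated failures.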
Changelog (since v0.3.0)
v0.3.0 — k-branch parallel exploration
First tagged release. The repository's pyproject.toml is now in sync with the public version — earlier README badges (v0.2) referenced unreleased states.
Highlights since v0.1.0
This release bundles four months of work that took the kernel from a flat MVP to a population-level evolution runtime:
🧬 Phase 3 — k-branch parallel exploration (#15)
- `Governor.run_once_parallel(goal, k)` spawns `k` independent worktrees per round, each running `plan → execute → evaluate`.
- The highest-fitness survivor is promoted to `evolution/accepted`; the rest are recorded under `ledger/failed/`.
- New `parallel.k_branches` config field (default `1`, fully back-compatible).
- Evaluator role now emits a float `fitness`; older evaluators that only set `hard_gates_passed` keep working via automatic back-fill.
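The selection step can be sketched as below. The field names `fitness` and `hard_gates_passed` come from the notes; the function name `select_survivor`, the dict-shaped results, the requirement that a winner must have passed hard gates, and the 1.0 / 0.0 back-fill values are assumptions made for illustration.

```python
def select_survivor(results):
    """Pick the highest-fitness branch among those that passed hard gates.

    Returns (winner, rejected). Assumed back-fill: a result without 'fitness'
    gets 1.0 if hard_gates_passed is true, else 0.0.
    """
    scored = []
    for result in results:
        passed = bool(result.get("hard_gates_passed", False))
        fitness = result.get("fitness")
        if fitness is None:
            fitness = 1.0 if passed else 0.0  # back-fill for older evaluators
        scored.append({**result, "fitness": float(fitness), "hard_gates_passed": passed})
    survivors = [r for r in scored if r["hard_gates_passed"]]
    winner = max(survivors, key=lambda r: r["fitness"], default=None)
    rejected = [r for r in scored if r is not winner]
    return winner, rejected
```

In a k-branch round, the winner would be the branch promoted to `evolution/accepted` and the rejected list would be what gets recorded under `ledger/failed/`; if no branch passes its hard gates, the winner is `None` and nothing is promoted.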
🎯 Phase 2 — Goal evaluator + Strategist (#13)
- Goal evaluator: after every accepted round, an external role decides whether the mission has been won; `true` → CLI exits 0.
- Strategist: every `N` rounds, an external role injects `{ stage, next_milestone, taboo_directions }` into the planner's input.
- Both default to disabled; existing configs are unchanged.
🔁 Phase 1 — LLM loop + history + cost guard (PR #4, retroactively v0.2)
- Multi-round LLM loop with history injection: the planner sees prior rounds' reflections.
- Budget guards: `max_total_usd`, `max_total_tokens`.
- Anthropic + OpenAI planner/evaluator support; Aider + Claude Code executor support.
🛡️ Phase 0 — MVP closed loop (PR #2, retroactively v0.1)
- Observer → planner → executor → evaluator → ledger.
- Git worktree sandbox; every change reversible.
- `mutation_scope.allowed_paths` enforcement.
- Iteration / consecutive-failure hard stops.
What works today
| Feature | Status |
|---|---|
| Multi-round LLM loop with memory | ✅ |
| Budget guards (`max_total_usd`, `max_total_tokens`) | ✅ |
| Iteration / consecutive-failure hard stops | ✅ |
| Full ledger audit trail | ✅ |
| Git worktree sandbox | ✅ |
| Scope enforcement | ✅ |
| Config-driven LLM provider / model / coding agent | ✅ |
| Aider and Claude Code executor support | ✅ |
| Anthropic and OpenAI planner/evaluator support | ✅ |
| Goal evaluator — stops when mission is won | ✅ |
| k-branch parallel exploration (FunSearch / AlphaEvolve) | ✅ |
| Process sandbox (firejail / bwrap) | 🔧 next |
Tests
67 passed — CI green on Python 3.10 and 3.12.
Install

    pip install evolution-kernel==0.3.0

(or clone the repo; single dependency: PyYAML.)