
Releases: Protocol-zero-0/evolution-kernel

v1.1.2 — Packaging fix: roles/ now ship in the wheel

26 May 10:09
5dfef32


Released: 2026-05-26 · hours after v1.1.1 · pip install -U evolution-kernel

TL;DR: v1.1.0 and v1.1.1 wheels shipped the runtime but not the reference role scripts — pip install evolution-kernel users had nowhere to point roles.executor at. This release fixes the packaging and adds a bundled: prefix so the same config works for pip install and git clone setups.

The bug

The wheel only contained evolution_kernel/ and evolution_kernel/templates/. The reference roles (planner.py, executor.sh, evaluator.py, goal_evaluator.py, strategist.py) lived at the repo top level, outside the package, so setuptools never bundled them. The README's ["python3", "roles/planner.py"] only resolved for users whose working directory was the root of a git clone of the repo. Discovered while smoke-testing the v1.1.1 release and fixed the same day.

The fix

  1. Move roles/* → evolution_kernel/roles/* so setuptools ships them as package_data.
  2. Add a bundled:<name> prefix that the kernel resolves to the absolute path inside the installed wheel via importlib.resources. The same evolution.yml now works for both pip install users and developers running from a checkout.
  3. Update all 5 evolution-kernel init templates, examples/evolution.yml, and both READMEs to use the bundled: form. evolution-kernel init now writes configs that work out of the box on a pip-installed machine.
roles:
  planner:   ["python3", "bundled:planner.py"]
  executor:  ["bash",    "bundled:executor.sh"]
  evaluator: ["python3", "bundled:evaluator.py"]
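The notes say the kernel resolves bundled:<name> through importlib.resources but do not show the resolver. Below is a minimal sketch of that behavior, assuming the scripts sit in a roles/ directory inside a regular on-disk (not zip-imported) package. The function name resolve_bundled and its package/subdir parameters are hypothetical, not the kernel's actual API. It mirrors the behaviors listed under Verification: plain argv passes through, path separators are rejected, and a missing file raises.

```python
from importlib import resources
from pathlib import Path

BUNDLED_PREFIX = "bundled:"


def resolve_bundled(argv, package="evolution_kernel", subdir="roles"):
    """Hypothetical resolver: turn 'bundled:<name>' argv entries into absolute
    paths inside an installed package; leave every other entry untouched."""
    resolved = []
    for arg in argv:
        if not arg.startswith(BUNDLED_PREFIX):
            resolved.append(arg)  # plain argv passes through unchanged
            continue
        name = arg[len(BUNDLED_PREFIX):]
        # Only bare filenames are allowed, so a config cannot reach outside roles/.
        if not name or "/" in name or "\\" in name or name in (".", ".."):
            raise ValueError(f"bundled: expects a bare filename, got {name!r}")
        # Assumes a regular on-disk install, where files() yields a real path.
        target = resources.files(package).joinpath(subdir).joinpath(name)
        if not target.is_file():
            raise FileNotFoundError(f"no bundled role {name!r} in {package}/{subdir}")
        resolved.append(str(Path(str(target)).resolve()))
    return resolved
```

On a pip-installed machine, resolve_bundled(["python3", "bundled:planner.py"]) would yield an absolute path under site-packages; when the package is imported from a checkout, the same entry resolves into the source tree instead, which is what lets one evolution.yml serve both setups.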

30-second smoke test:

pip install -U evolution-kernel
evolution-kernel init     # 3 questions, drops a working evolution.yml

Backward compatibility

bundled: is opt-in. Plain argv (["python3", "myplanner.py"]) still passes through unchanged. Existing configs that hardcoded roles/X at the repo top level need updating to bundled:X — re-run evolution-kernel init for a fresh config, or do a single search-and-replace.
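The one-off search-and-replace could look like the following. migrate_roles is a hypothetical helper, and it assumes argv entries are quoted strings such as "roles/planner.py". It only rewrites entries that begin directly with roles/ (or ./roles/), so paths like examples/roles/x.py are left alone.

```python
import re

# A quoted argv entry that starts with roles/ (optionally ./roles/) and names
# a single file; group 1 is the quote character, group 2 the filename.
_ROLE_PATH = re.compile(r'(["\'])(?:\./)?roles/([^/"\']+)\1')


def migrate_roles(config_text: str) -> str:
    """Hypothetical helper: rewrite top-level roles/X entries to bundled:X."""
    return _ROLE_PATH.sub(
        lambda m: f"{m.group(1)}bundled:{m.group(2)}{m.group(1)}", config_text
    )
```

Review the diff afterwards: a custom script you kept in your own top-level roles/ directory would also be rewritten, even though bundled: can only find the five scripts that ship in the wheel.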

Verification

  • 108 tests pass (102 prior + 6 new bundled-role tests covering happy path, no-op for plain argv, path-separator rejection, missing-file errors, and end-to-end through load_config)
  • Wheel now contains all 5 role scripts under evolution_kernel/roles/
  • Fresh-venv pip install + bundled: resolution returns real site-packages paths for all 5 roles



Chinese summary

Released: 2026-05-26 · a few hours after v1.1.1 · pip install -U evolution-kernel

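In one sentence: the v1.1.0 / v1.1.1 wheels shipped only the runtime, not the 5 reference role scripts, so pip install evolution-kernel users had no file to point roles.executor at. This release fixes the packaging and adds a bundled: prefix so the same evolution.yml runs as-is under both pip install and git clone setups.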

Where the bug was

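Previously the wheel installed only the evolution_kernel/ package + templates/*.yml. The 5 reference role scripts (`planner.py` / `executor.sh` / `evaluator.py` / `goal_evaluator.py` / `strategist.py`) lived at the repo top level, outside the package, so setuptools did not include them. The README's `["python3", "roles/planner.py"]` only worked for users whose cwd was the root of a git-cloned repo. The problem surfaced during the v1.1.1 release smoke test and was fixed the same day.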

How it was fixed

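  1. Move `roles/` → `evolution_kernel/roles/` so setuptools ships it as `package_data`
  2. Add a `bundled:` prefix, which the kernel resolves via `importlib.resources` to an absolute path inside the wheel; the same `evolution.yml` runs as-is under both pip install and git clone
  3. Update the 5 init templates + `examples/evolution.yml` + the English and Chinese READMEs to use the `bundled:` form throughout. Configs written by `evolution-kernel init` work out of the box on a pip-installed machine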
roles:
  planner:   ["python3", "bundled:planner.py"]
  executor:  ["bash",    "bundled:executor.sh"]
  evaluator: ["python3", "bundled:evaluator.py"]

30-second smoke test:

pip install -U evolution-kernel
evolution-kernel init     # 3 questions, writes a working evolution.yml

Compatibility

`bundled:` is opt-in. Plain argv (e.g. `["python3", "myplanner.py"]`) passes through unchanged. Existing configs that hardcode `roles/X` (a top-level path) need to change to `bundled:X`: re-run `evolution-kernel init` or do a single search-and-replace.

Verification

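  • 108 tests pass (102 existing + 6 new bundled-role tests covering happy path / plain-argv passthrough / path-separator rejection / missing-file errors / end-to-end through `load_config`)
  • The wheel now contains the 5 role scripts (under `evolution_kernel/roles/`)
  • In a clean venv, `pip install` + `bundled:` resolution returns real site-packages paths for all 5 roles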

v1.1.1 — Executor permission-mode fix

26 May 09:55
33f9a73


Released: 2026-05-26 · 9 days after v1.1.0 · pip install -U evolution-kernel

TL;DR: If you tried coding_agent.tool: claude-code on v1.1.0 and your runs silently produced no patch, this fixes it.

Try the 10-minute quickstart on the fixed build: examples/quickstart/ — zero cost, no Anthropic API key required (Claude Pro is enough).

What changed

roles/executor.sh now passes --permission-mode bypassPermissions to claude -p. Without it, the inner Claude session refuses to edit files in non-interactive mode — the kernel run completes but the worktree stays unchanged, every attempt gets rejected by the evaluator, and budget is spent on no-op iterations.

Why this is safe

Governor already isolates each attempt in a temporary git worktree, and scope.py rejects any change outside allowed_paths. The kernel is the trust boundary; the inner claude -p session does not need its own permission prompts on top of that. With sandbox.enabled: true (firejail), you also get an OS-level read-only-root cage. That makes two isolation layers beneath the executor, so bypassPermissions inside the inner agent is the right default.

Affected configurations

You're affected if your evolution.yml sets coding_agent.tool: claude-code. Setups using aider or a custom executor (like examples/oss_fix_demo/) are unaffected.

Compatibility

No breaking changes. Pure bug fix. Single runtime dependency (PyYAML) preserved. Cumulative test count: 102 (unchanged from v1.1.0).



Chinese summary

Released: 2026-05-26 · exactly 9 days after v1.1.0 · pip install -U evolution-kernel

In one sentence: on v1.1.0, running the kernel with `coding_agent.tool: claude-code` made every attempt "finish without changing any files"; this release fixes that.

Try the fixed build in 10 minutes: `examples/quickstart/`, zero cost, no Anthropic API key required (a Claude Pro subscription is enough).

What changed

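`roles/executor.sh` now adds `--permission-mode bypassPermissions` when calling `claude -p`. Without this flag, Claude refused to edit files in non-interactive mode: the kernel run finished but the worktree was untouched, every attempt was rejected by the evaluator, and the budget was burned on no-op iterations.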

Why this is safe

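Governor already isolates each attempt in a temporary git worktree, and `scope.py` rejects any change outside `allowed_paths`. The kernel is the trust boundary, so the inner `claude -p` does not need another layer of permission prompts. Enabling `sandbox.enabled: true` (firejail) stacks an OS-level read-only-root cage on top, giving two sandbox layers beneath the executor in total, which is why `bypassPermissions` is the right default for the inner agent.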

Who is affected

You are affected if your `evolution.yml` sets `coding_agent.tool: claude-code`. Setups using `aider` or a custom executor (such as `examples/oss_fix_demo/`) are unaffected.

Compatibility

No breaking changes. The single dependency (PyYAML) is unchanged. Cumulative test count: 102 (same as v1.1.0).

v1.1.0 — ready to publicize

17 May 06:34
5bfb505



The v1.0 line shipped the runtime; v1.1 ships the first 10 minutes a stranger spends with the runtime. No kernel refactor, no new dependencies, no abstractions — just the onboarding surface that turns "interesting README" into "I just ran it."

What's new

evolution-kernel init — three-question scaffolder (#28)

A new subcommand asks 3 questions — mission, template, allowed paths — and drops a valid evolution.yml in the current directory. Five starter templates ship as plain YAML and cover the common mission shapes:

  • lint — drive a linter/formatter to zero violations
  • coverage — raise test coverage
  • perf — optimize a measurable workload
  • benchmark — FunSearch / AlphaEvolve-style population search (k-branch parallel)
  • custom — blank-ish starter

No interactive prompt library, no template base classes, no Python template generator — the wizard is 76 lines of stdlib input() calls, and the rendered output is fed through load_config() before it can hit disk, so a broken template can never escape.
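The validate-before-write idea can be sketched with the standard library alone. This is not the project's init_wizard.py: render_and_write, ask, and toy_validate are hypothetical names, string.Template stands in for however the real templates are filled, and toy_validate stands in for the load_config() check.

```python
import os
import string
import tempfile
from typing import Callable, Dict, List, Tuple


def ask(questions: List[Tuple[str, str]]) -> Dict[str, str]:
    """Collect answers with plain input() calls, one per (key, prompt) pair."""
    return {key: input(prompt) for key, prompt in questions}


def toy_validate(text: str) -> None:
    """Stand-in for load_config(): reject output that lacks a mission key."""
    if "mission:" not in text:
        raise ValueError("rendered config has no mission")


def render_and_write(template_text: str, answers: Dict[str, str], dest: str,
                     validate: Callable[[str], None]) -> str:
    # substitute() raises KeyError on a missing answer, so a half-filled
    # template never reaches validation or the filesystem.
    rendered = string.Template(template_text).substitute(answers)
    validate(rendered)  # raises on an invalid config; nothing written yet
    # Write to a temp file in the destination directory, then atomically
    # swap it into place so a crash never leaves a partial evolution.yml.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(dest)),
                                    suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(rendered)
        os.replace(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest
```

In an interactive run the answers would come from something like ask([("mission", "Mission? "), ("paths", "Allowed paths? ")]); the property that matters is that validation runs before the file lands, not after.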

examples/quickstart/ — see the loop close in 1.4 seconds (#30)

A turn-key demo that takes a stranger from git clone to evolution/accepted commit in one shell snippet, zero cost, no API key:

pip install -e . && pip install ruff
bash examples/quickstart/setup.sh
evolution-kernel --config examples/quickstart/evolution.yml \
                 --repo /tmp/ek-quickstart-target \
                 --ledger /tmp/ek-quickstart-ledger --loop

Measured wall-clock on a developer laptop: 1.4 s. The mission is small on purpose (drive ruff to zero violations on src/messy.py) so the entire closed loop — worktree sandbox, scope enforcement, ledger writes, evolution/accepted branch advancing — fits in one terminal scroll. No LLM in the loop; the planner/executor/evaluator are deterministic Python scripts committed into the demo target itself. This is intentional: the example demonstrates the runtime, not LLM smarts. For the LLM-driven story, see ⬇.

examples/oss_fix_demo/ — real OSS fix via claude CLI (#32)

The companion to quickstart, pointed at a real published OSS package: python-slugify v8.0.4 (1,106 LoC, MIT). The executor is claude -p --permission-mode acceptEdits, billed against the operator's Claude Pro / Max subscription — no API key, no per-token charge.

Verified end-to-end (2026-05-17):

  • 10 real ruff violations on the cloned target
  • Claude makes the semantic edits (F401 → explicit as-alias re-exports), wall-clock 34 s
  • A ruff check --fix && ruff format postprocess mops up structural autofixes (I001 import sort)
  • Run 0001 accepted, evolution/accepted advanced, real commit bae97a8 landed
  • Total --loop time: 48 s, $0 marginal cost

The realistic split — LLM does semantic work, deterministic tooling handles structural cleanup — mirrors how production teams actually chain agents with formatters.

README hero block (#34)

The first thing a visitor sees is now a copy-pasteable ▶ Try in 10 minutes snippet plus a compact ASCII workflow diagram showing Observe → Plan → Execute → Evaluate → accept/reject → ledger. The existing investor-narrative Motivation / SWE-bench Verified worked-example sections are intact below.

Numbers

  • 102 tests pass under python -m unittest discover -s tests on Python 3.10 and 3.12 (99 baseline + 3 new for the init wizard, covering all 5 templates via subTest).
  • evolution_kernel/*.py: 1,969 lines — well under the v1.1 soft cap of baseline + 200 (= 2,089).
  • Single runtime dependency (PyYAML) preserved.
  • No kernel changes — every new behavior lives in init_wizard.py, the templates, or under examples/. The runtime that v1.0.0 froze is byte-identical.

Issues closed

  • #27 evolution-kernel init subcommand + 5 YAML templates
  • #29 examples/quickstart/ 10-minute zero-cost ruff cleanup demo
  • #31 examples/oss_fix_demo/ real OSS fix via claude CLI
  • #33 README hero — ▶ Try in 10 minutes + ASCII workflow

Migration

None. v1.1 is a strict superset of v1.0 on the kernel surface. Existing configs and ledgers continue to work unchanged.

v1.0.0 — Phase 4: Sandbox + Remote Observer

14 May 03:24
5e3ab83


The kernel crosses the "soul plugin" (灵魂插件) bar: an evolution runtime you can point at any git repository and trust to run unattended overnight, sandboxed at the OS level, with evidence pulled from anywhere on the network.

Highlights

Process sandbox via firejail (PR #19, closes #17)

  • Executor argv is wrapped with firejail --quiet --noprofile --read-only=/ --read-write=<worktree> --read-write=<run_dir> when sandbox.enabled: true.
  • The rest of the filesystem is mounted read-only, so a misbehaving executor cannot write to /tmp, ~/.ssh, or anywhere else on disk during a round.
  • Verified end-to-end: planted /tmp/sandbox-leak-<run_id>.txt write attempt blocked with OSError: [Errno 30] Read-only file system; same fixture without the sandbox writes the file. CI installs firejail and runs this assertion on every push.
  • Planner and evaluator stay unsandboxed (read-mostly) to keep the blast radius of any policy bug as small as possible.
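A sketch of that wrapping, assuming roles are launched from argv lists. wrap_with_firejail and build_role_argv are hypothetical names; the firejail flags are exactly the ones listed in the first bullet, and only the executor role gets wrapped.

```python
from typing import List, Sequence


def wrap_with_firejail(argv: Sequence[str], worktree: str, run_dir: str) -> List[str]:
    """Prefix argv with the firejail flags from the release notes: the whole
    filesystem read-only except the attempt's worktree and run directory."""
    return [
        "firejail", "--quiet", "--noprofile",
        "--read-only=/",
        f"--read-write={worktree}",
        f"--read-write={run_dir}",
        *argv,
    ]


def build_role_argv(role: str, argv: Sequence[str], worktree: str, run_dir: str,
                    sandbox_enabled: bool) -> List[str]:
    # Only the executor is sandboxed; planner and evaluator run unwrapped.
    if sandbox_enabled and role == "executor":
        return wrap_with_firejail(argv, worktree, run_dir)
    return list(argv)
```

Because the sandbox only changes the argv list, running the result with subprocess.run(..., cwd=worktree) would look the same whether sandbox.enabled is true or false.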

Remote observer — HTTP evidence source (PR #20, closes #18)

  • evidence_sources gains type: http with url, method, headers, timeout.
  • Uses stdlib urllib.request, so the kernel's single-dependency rule still holds — pyproject.toml lists only PyYAML>=6.0.
  • Captures status, body (64 KiB cap with a truncated flag), and a sorted list of response headers so the ledger is stable for diffing.
  • Non-2xx responses still record the body — the planner gets to decide how to react instead of the observer silently retrying.
  • Verified end-to-end: a Governor.run_once against a local python3 -m http.server records the JSON response under observation.json.sources[0] with status: 200.
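The capture rules above (64 KiB cap with a truncated flag, sorted headers, non-2xx bodies kept) can be sketched with urllib.request alone, which keeps the single-dependency rule intact. fetch_http_evidence and the shape of the returned dict are hypothetical; the real observer's field names may differ, and connection errors simply propagate here.

```python
import urllib.error
import urllib.request

BODY_CAP = 64 * 1024  # 64 KiB


def fetch_http_evidence(url, method="GET", headers=None, timeout=10.0):
    """Hypothetical HTTP evidence capture using only the standard library."""
    req = urllib.request.Request(url, method=method, headers=headers or {})
    try:
        resp = urllib.request.urlopen(req, timeout=timeout)
        status = resp.status
    except urllib.error.HTTPError as err:
        # A non-2xx reply raises HTTPError, which is itself a readable
        # response; keep its body so the planner can decide what to do.
        resp, status = err, err.code
    try:
        raw = resp.read(BODY_CAP + 1)  # one extra byte reveals truncation
        header_items = sorted(resp.headers.items())  # stable order for diffs
    finally:
        resp.close()
    return {
        "url": url,
        "status": status,
        "body": raw[:BODY_CAP].decode("utf-8", errors="replace"),
        "truncated": len(raw) > BODY_CAP,
        "headers": [list(item) for item in header_items],
    }
```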

Test surface

| | v0.3 → v1.0 |
|---|---|
| Tests | 67 → 99 (+16 sandbox, +16 HTTP) |
| Runtime | ~1,200 → ~1,400 lines of Python |
| Third-party deps | 1 (PyYAML) → 1 (PyYAML) |
| CI matrix | Python 3.10 + 3.12, both green with firejail installed |

Compatibility

  • sandbox.enabled defaults to false. v0.3 configs run byte-for-byte identically.
  • type: http is an additive evidence source; existing type: file and type: shell are unchanged.
  • The EvidenceSource dataclass gained new fields but kept the existing ones at the same positions.

What ships in v1.0

  • Multi-round LLM loop with history injection.
  • max_total_usd / max_total_tokens / max_iterations / max_consecutive_failures hard stops.
  • Full ledger audit trail (survives process restarts).
  • Git worktree sandbox per attempt.
  • Scope enforcement against allowed_paths.
  • Aider + Claude Code executor adapters, Anthropic + OpenAI planner/evaluator adapters.
  • Goal evaluator — stops when the mission is "won".
  • k-branch parallel exploration (FunSearch / AlphaEvolve style).
  • NEW Process sandbox via firejail.
  • NEW Remote observer (HTTP evidence source).
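The four hard stops listed above reduce to a check run between rounds. A minimal sketch follows; Limits, Spend, record_round, and stop_reason are hypothetical names, and treating a limit as reached once the running total equals it (>=) is an assumption about the kernel's semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    max_total_usd: Optional[float] = None
    max_total_tokens: Optional[int] = None
    max_iterations: Optional[int] = None
    max_consecutive_failures: Optional[int] = None


@dataclass
class Spend:
    usd: float = 0.0
    tokens: int = 0
    iterations: int = 0
    consecutive_failures: int = 0


def record_round(spend: Spend, usd: float, tokens: int, accepted: bool) -> None:
    """Accumulate one round's cost; an accepted round resets the failure streak."""
    spend.usd += usd
    spend.tokens += tokens
    spend.iterations += 1
    spend.consecutive_failures = 0 if accepted else spend.consecutive_failures + 1


def stop_reason(limits: Limits, spend: Spend) -> Optional[str]:
    """Return the name of the first limit reached, or None to keep looping."""
    checks = (
        ("max_total_usd", limits.max_total_usd, spend.usd),
        ("max_total_tokens", limits.max_total_tokens, spend.tokens),
        ("max_iterations", limits.max_iterations, spend.iterations),
        ("max_consecutive_failures", limits.max_consecutive_failures,
         spend.consecutive_failures),
    )
    for name, limit, value in checks:
        if limit is not None and value >= limit:  # unset limits are skipped
            return name
    return None
```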


v0.3.0 — k-branch parallel exploration

13 May 19:55
818860b


First tagged release. The repository's pyproject.toml is now in sync with the public version — earlier README badges (v0.2) referenced unreleased states.

Highlights since v0.1.0

This release bundles four months of work that took the kernel from a flat MVP to a population-level evolution runtime:

🧬 Phase 3 — k-branch parallel exploration (#15)

  • Governor.run_once_parallel(goal, k) spawns k independent worktrees per round, each running plan → execute → evaluate.
  • The highest-fitness survivor is promoted to evolution/accepted; the rest are recorded under ledger/failed/.
  • New parallel.k_branches config field (default 1, fully back-compatible).
  • Evaluator role now emits a float fitness; older evaluators that only set hard_gates_passed keep working via automatic back-fill.
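Survivor selection can be sketched as follows. BranchResult, backfill_fitness, and select_survivor are hypothetical; mapping a legacy pass/fail result to fitness 1.0 / 0.0 and requiring hard gates to pass before a branch can win are assumptions about how the back-fill and promotion work. Promoting the winner to evolution/accepted and writing the rest under ledger/failed/ are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BranchResult:
    branch_id: int
    hard_gates_passed: bool
    fitness: Optional[float] = None  # None for evaluators that predate fitness


def backfill_fitness(result: BranchResult) -> BranchResult:
    """Give legacy pass/fail results a numeric fitness (assumed 1.0 / 0.0)."""
    if result.fitness is None:
        result.fitness = 1.0 if result.hard_gates_passed else 0.0
    return result


def select_survivor(
    results: List[BranchResult],
) -> Tuple[Optional[BranchResult], List[BranchResult]]:
    """Return (winner, losers); only branches that passed hard gates can win."""
    results = [backfill_fitness(r) for r in results]
    eligible = [r for r in results if r.hard_gates_passed]
    if not eligible:
        return None, results  # nothing to promote this round
    winner = max(eligible, key=lambda r: r.fitness)  # first one wins a tie
    losers = [r for r in results if r is not winner]
    return winner, losers
```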

🎯 Phase 2 — Goal evaluator + Strategist (#13)

  • Goal evaluator — after every accepted round, an external role decides whether the mission has been won; true → CLI exits 0.
  • Strategist — every N rounds, an external role injects { stage, next_milestone, taboo_directions } into the planner's input.
  • Both default to disabled; existing configs are unchanged.
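One way to picture the strategist hook: every N rounds the kernel calls the external role and merges its three fields into what the planner receives. StrategistSchedule, the "strategy" key, 1-based round numbering, and carrying the latest advice forward between consultations are all assumptions for illustration, not the kernel's documented behavior.

```python
from typing import Callable, Dict, Optional

STRATEGY_KEYS = ("stage", "next_milestone", "taboo_directions")


class StrategistSchedule:
    """Consult a strategist every N rounds and carry its latest advice forward."""

    def __init__(self, consult: Callable[[int], Dict], every_n: Optional[int]):
        self.consult = consult
        self.every_n = every_n  # None or 0 disables the strategist
        self.latest: Optional[Dict] = None

    def planner_input(self, round_no: int, base: Dict) -> Dict:
        # Rounds are numbered from 1, so the first consultation is round N.
        if self.every_n and round_no % self.every_n == 0:
            advice = self.consult(round_no)
            missing = [k for k in STRATEGY_KEYS if k not in advice]
            if missing:
                raise ValueError(f"strategist output is missing {missing}")
            self.latest = {k: advice[k] for k in STRATEGY_KEYS}
        if self.latest is None:
            return dict(base)  # disabled or not yet consulted: input unchanged
        return {**base, "strategy": self.latest}
```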

🔁 Phase 1 — LLM loop + history + cost guard (PR #4, retroactively v0.2)

  • Multi-round LLM loop with history injection — planner sees prior rounds' reflections.
  • Budget guards: max_total_usd, max_total_tokens.
  • Anthropic + OpenAI planner/evaluator support; Aider + Claude Code executor support.

🛡️ Phase 0 — MVP closed loop (PR #2, retroactively v0.1)

  • Observer → planner → executor → evaluator → ledger.
  • Git worktree sandbox; every change reversible.
  • mutation_scope.allowed_paths enforcement.
  • Iteration / consecutive-failure hard stops.
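Scope enforcement comes down to checking every changed file against allowed_paths. A minimal sketch, assuming changed paths arrive as repo-relative POSIX strings (for instance from git diff --name-only inside the attempt's worktree) and that an allowed_paths entry is either a directory prefix or an fnmatch glob; out_of_scope is a hypothetical name, not scope.py's actual interface.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def _allowed(path: PurePosixPath, pattern: str) -> bool:
    if any(ch in pattern for ch in "*?["):
        # Glob entry; note fnmatch lets "*" match across "/" as well.
        return fnmatch(str(path), pattern)
    base = PurePosixPath(pattern.rstrip("/"))
    # Directory or file entry: the path itself or anything underneath it.
    return path == base or base in path.parents


def out_of_scope(changed: Iterable[str], allowed_paths: List[str]) -> List[str]:
    """Return every changed file that no allowed_paths entry covers."""
    violations = []
    for raw in changed:
        path = PurePosixPath(raw)
        # Absolute paths or ".." segments could escape the repo: always reject.
        if path.is_absolute() or ".." in path.parts:
            violations.append(raw)
        elif not any(_allowed(path, pat) for pat in allowed_paths):
            violations.append(raw)
    return violations
```

An attempt whose violation list is non-empty would be rejected before its worktree is ever merged, which is what keeps every change reversible.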

What works today

| Feature | Status |
|---|---|
| Multi-round LLM loop with memory | ✅ |
| Budget guards (max_total_usd, max_total_tokens) | ✅ |
| Iteration / consecutive-failure hard stops | ✅ |
| Full ledger audit trail | ✅ |
| Git worktree sandbox | ✅ |
| Scope enforcement | ✅ |
| Config-driven LLM provider / model / coding agent | ✅ |
| Aider and Claude Code executor support | ✅ |
| Anthropic and OpenAI planner/evaluator support | ✅ |
| Goal evaluator: stops when mission is won | ✅ |
| k-branch parallel exploration (FunSearch / AlphaEvolve) | ✅ |
| Process sandbox (firejail / bwrap) | 🔧 next |

Tests

67 passed — CI green on Python 3.10 and 3.12.

Install

pip install evolution-kernel==0.3.0

(or clone the repo — single dependency: PyYAML.)