feat: examples/oss_fix_demo/ — real OSS fix via claude CLI by Protocol-zero-0 · Pull Request #32 · Protocol-zero-0/evolution-kernel

Protocol-zero-0 · 2026-05-17T06:31:53Z

Closes #31.

Summary

Companion to `examples/quickstart/`. Where quickstart is fully deterministic, this example wires `claude -p` (Claude Pro / Max subscription) in as the executor and points at a real published OSS package — python-slugify v8.0.4, 1,106 LoC, MIT.
Mission: drive ruff to zero violations on `slugify/`. Verified end-to-end: 48 s, claude wrote real edits, run 0001 accepted, real `evolution/accepted` commit landed.
No kernel changes. No new runtime dependencies. 102 baseline tests still pass.

Measured (2026-05-17, dev laptop)

| Step | Wall-clock | Cost |
|---|---|---|
| `setup.sh` | ~1 s + clone bandwidth | $0 |
| Run 1 (claude executor + ruff postprocess + evaluator) | 34 s (claude alone) | Claude Pro flat fee, no API key |
| Full 3-round `--loop` (1 accept + 2 halts) | 48 s | $0 marginal |

`runs/0001/decision.json`:
```json
{"accepted": true, "candidate_commit": "bae97a8...", "reason": "hard gates passed and evaluator recommended promotion"}
```

`runs/0001/executor_output.json` (claude's verbatim summary):

Fixed both files: added explicit `as` re-exports for all version metadata imports in `slugify/__init__.py`, and sorted the `from .slugify import` names alphabetically in `slugify/__main__.py`.

Design notes (worth recording)

The kernel-bundled `roles/executor.sh` claude-code path invokes `claude -p` without `--permission-mode acceptEdits`, so claude refuses to edit files non-interactively. This PR ships a small custom `bots/executor.py` that adds the flag. A follow-up issue to update the kernel path is a good post-v1.1 task.
python-slugify's star imports are load-bearing public-API re-exports; `setup.sh` writes a target-side `pyproject.toml` that sets `ignore = ["F403"]` so the mission ("zero violations") is reachable.
Realistic division of labor: claude does the semantic edits (F401 → `as`-alias pattern); `ruff check --fix && ruff format` postprocess mops up structural autofixes (I001 sort). This both raises accept rate and mirrors how production teams chain LLMs with deterministic tooling.
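To make the F401 fix concrete, here is a minimal sketch of the redundant-alias re-export pattern. The standard-library `json` import is a stand-in for python-slugify's version-metadata imports; this illustrates the pattern only and is not the diff from this PR.

```python
# Before (ruff reports F401 in a package __init__.py, because the name is
# imported but never used inside that module):
#     from json import dumps
#
# After: the redundant alias marks the import as an intentional re-export,
# which ruff treats as part of the module's public surface.
from json import dumps as dumps

# The re-exported name behaves exactly like the original.
assert dumps({"ok": True}) == '{"ok": true}'
```

The alias changes nothing at runtime, which is why it is a low-risk edit to hand to an LLM: the semantic decision is "this import is public API", and the mechanical cleanup is left to ruff.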

Test plan

`bash setup.sh && evolution-kernel --config examples/oss_fix_demo/evolution.yml --repo ... --ledger ... --loop` → run 0001 accepted, `evaluation.json` shows `ruff_violations_remaining: 0`.
Real commit visible: `git -C /tmp/ek-oss-fix-target log --oneline evolution/accepted` → 2 commits.
`patch.diff` shows the semantic edits + autofix postprocess (355 lines).
Existing test suite still passes (no kernel changes).
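The first check in the test plan can be scripted. The sketch below assumes each run directory holds `decision.json` and `evaluation.json`, and that `ruff_violations_remaining` is a top-level key of `evaluation.json`; the helper name `run_passed` is hypothetical.

```python
import json
from pathlib import Path


def run_passed(run_dir: Path) -> bool:
    """Return True if the run was accepted and no ruff violations remain."""
    decision = json.loads((run_dir / "decision.json").read_text())
    evaluation = json.loads((run_dir / "evaluation.json").read_text())
    return (
        decision.get("accepted") is True
        # Assumed key location; adjust if the evaluator nests its metrics.
        and evaluation.get("ruff_violations_remaining") == 0
    )
```

Calling `run_passed(Path(ledger) / "runs" / "0001")` after the loop finishes would turn the manual inspection into a pass/fail check, where `ledger` is whatever directory was passed to `--ledger`.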

🤖 Generated with Claude Code

Lands the LLM-driven companion to examples/quickstart/. The target is a real published OSS package (python-slugify v8.0.4: 1,106 LoC, MIT, cloned by setup.sh at a pinned tag). The executor is `claude -p --permission-mode acceptEdits` inside a custom Python wrapper that uses the operator's Claude Pro/Max subscription: no API key, no per-token charge. Planner and evaluator stay deterministic so the loop can be driven from a clean install without anthropic / openai SDKs.

E2E on a developer laptop (2026-05-17):

- setup.sh: ~1s + clone bandwidth
- evolution-kernel --loop: ~48s total wall-clock (3 rounds)
- claude executor (round 1): 34s (10 ruff violations → 0)
- rounds 2-3: halt, ruff is already clean
- cost: $0 marginal (Pro subscription flat fee)

Real evolution/accepted commit landed on the demo target's branch. Captured in examples/oss_fix_demo/README.md.

Realistic division of labor in bots/executor.py: claude does the semantic edits (F401 "use explicit `as` re-export" pattern); a follow-up `ruff check --fix && ruff format` postprocess mops up structural autofixes (I001 import sort). This both raises the accept rate and mirrors how production teams chain LLMs with deterministic tooling.

setup.sh writes a target-side pyproject.toml with `ignore = ["F403"]` because python-slugify's star-imports are load-bearing public-API re-exports; flagging them as a real "bug" would be incorrect.

Files added under examples/oss_fix_demo/:

- setup.sh: clones python-slugify, drops bots/, writes lint config
- evolution.yml: references in-target bots, allowed_paths slugify/
- bots/planner.py: deterministic, snapshots ruff diagnostics into the plan
- bots/executor.py: wraps `claude -p --permission-mode acceptEdits`, then runs ruff --fix && ruff format postprocess
- bots/evaluator.py: re-runs ruff, accept iff exit 0
- README.md: full forensic walk-through with ledger snippets

No kernel changes; 102 baseline tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a new examples/oss_fix_demo/ companion demo that runs Evolution Kernel against python-slugify using a Claude CLI executor, with deterministic planner/evaluator scripts and setup documentation.

Changes:

Adds setup and README for cloning/preparing the python-slugify demo target.
Adds demo config wiring planner, Claude executor wrapper, evaluator, ruff evidence, and mutation scope.
Adds bot scripts for planning ruff cleanup, invoking claude -p, postprocessing with ruff, and evaluating ruff cleanliness.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| `examples/oss_fix_demo/setup.sh` | Bootstraps the external OSS target repo and commits demo bots/config. |
| `examples/oss_fix_demo/README.md` | Documents prerequisites, run steps, measured output, and ledger artifacts. |
| `examples/oss_fix_demo/evolution.yml` | Defines mission, ruff evidence, allowed paths, hard stops, and role commands. |
| `examples/oss_fix_demo/bots/planner.py` | Emits a canned ruff-cleanup plan with current diagnostics. |
| `examples/oss_fix_demo/bots/executor.py` | Wraps Claude CLI execution and applies ruff postprocessing. |
| `examples/oss_fix_demo/bots/evaluator.py` | Accepts candidates when `ruff check slugify/` is clean. |


+# Snapshot ruff state at observe-time so the planner sees concrete diagnostics.
+evidence_sources:
+  - type: shell
+    command: "python3 -m ruff check slugify/ || true"


+prompt = f"{plan.get('summary', 'improve the codebase')}\n\nSteps:\n{steps}\n\nOnly modify files within: {', '.join(plan.get('allowed_paths', ['slugify/']))}"
+
+claude_bin = os.environ.get("EK_CLAUDE_BIN", "claude")
+extra_args = os.environ.get("EK_CLAUDE_ARGS", "--permission-mode acceptEdits").split()
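A plain `.split()` on `EK_CLAUDE_ARGS` breaks any argument whose value contains a quoted space. A minimal sketch of a shell-aware alternative follows; the helper name `build_claude_command` is hypothetical, and passing the prompt as the positional argument to `-p` is an assumption about how the wrapper invokes the CLI.

```python
import os
import shlex


def build_claude_command(prompt: str, env=os.environ) -> list[str]:
    """Build the argv for the claude CLI from the same env vars as the hunk."""
    claude_bin = env.get("EK_CLAUDE_BIN", "claude")
    # shlex.split honours shell-style quoting, so 'a "b c"' yields two items.
    extra_args = shlex.split(
        env.get("EK_CLAUDE_ARGS", "--permission-mode acceptEdits")
    )
    # Assumption: the prompt is passed as the argument to -p.
    return [claude_bin, "-p", prompt, *extra_args]
```

With no overrides set, `build_claude_command("fix lint")` returns `["claude", "-p", "fix lint", "--permission-mode", "acceptEdits"]`, matching the default in the hunk.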


+
+## Want to point this at *your* OSS repo?
+
+Replace the `git clone` line in `setup.sh` with your own URL/tag, point `mutation_scope.allowed_paths` in `evolution.yml` at the right subtree, and rewrite the planner's `summary`/`steps` to describe your goal. The bots are <50 LoC each — easy to fork.


+#   2. Copy local-deterministic bots/ into it and commit on the target's
+#      first commit so the role scripts are reachable inside every


+|---|---|---|
+| `setup.sh` (clones slugify, commits bots, init repo) | ~1 s + clone bandwidth | \$0 |
+| Run 1 — claude executor + ruff postprocess + evaluator | **34 s** (claude alone) / **~48 s** total wall-clock for the 3-round loop | Claude Pro subscription, flat fee — no per-token charge |
+| Run 2 — halt: ruff already clean, no changes to make | <1 s | \$0 |


+| `observation.json` | Ruff snapshot the planner saw |
+| `plan.json` | Canned plan + verbatim ruff diagnostics |


+# Count remaining violations from the "Found N errors." footer if present.
+remaining = 0
+for line in proc.stdout.splitlines():
+    if line.startswith("Found ") and "error" in line:
+        try:
+            remaining = int(line.split()[1])
+        except (ValueError, IndexError):
+            pass
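One weakness of the footer parsing above is that `remaining` stays `0` when no footer is printed, which is also what happens if ruff fails to run. A hedged sketch of a stricter variant follows: it separates "clean" from "unknown" by also looking at the exit code. The function name and the `None` convention are assumptions, not the evaluator's actual interface.

```python
import re
from typing import Optional

# Matches ruff's summary line, e.g. "Found 10 errors." or "Found 1 error."
_FOOTER = re.compile(r"^Found (\d+) errors?\b")


def remaining_violations(stdout: str, returncode: int) -> Optional[int]:
    """Count violations from the footer; None means "could not tell"."""
    for line in stdout.splitlines():
        match = _FOOTER.match(line)
        if match:
            return int(match.group(1))
    # No footer: only trust a zero count if ruff itself exited cleanly.
    return 0 if returncode == 0 else None
```

An evaluator using this could reject the candidate whenever the result is `None`, rather than reporting zero violations for a run that never produced diagnostics.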


+            "tool": "claude-code",
+            "exit": proc.returncode,
+            "elapsed_seconds": round(elapsed, 2),
+            "postprocess": postprocess,
+            "changed_files": changed,
+            "stdout_tail": proc.stdout[-800:],
+            "stderr_tail": proc.stderr[-400:],
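Because the path restriction reaches claude only through the prompt, the wrapper could also check `changed_files` before reporting. The sketch below assumes repo-relative, normalized paths such as git prints; the helper name `outside_allowed` is hypothetical, and the kernel may already enforce `mutation_scope.allowed_paths` on its own.

```python
from pathlib import PurePosixPath


def outside_allowed(changed_files, allowed_paths):
    """Return the changed files that are not under any allowed path."""
    allowed = [PurePosixPath(p) for p in allowed_paths]
    offenders = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if it is an allowed path or sits beneath one.
        if not any(path == root or root in path.parents for root in allowed):
            offenders.append(name)
    return offenders
```

Comparing path components rather than string prefixes keeps a sibling such as `slugify_extra/` from being mistaken for part of `slugify/`.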


+# Mop up any remaining autofix-only diagnostics (import sort, whitespace).
+# This is realistic: in real workflows you run the formatter after the LLM.
+postprocess = []
+for cmd in (
+    ["python3", "-m", "ruff", "check", "--fix", "--unsafe-fixes", "slugify/"],
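The hunk above is cut off mid-loop. As a rough sketch of how such a postprocess step could record each command's outcome, the helper below takes the command list as a parameter; its name and the shape of the result dictionaries are assumptions rather than the executor's actual code.

```python
import subprocess
import sys


def run_postprocess(commands, cwd="."):
    """Run each command in order and record its exit code and output tail."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        results.append(
            {
                "cmd": cmd,
                "exit": proc.returncode,
                "stdout_tail": proc.stdout[-400:],
            }
        )
    return results


# Stand-in command so the sketch runs without ruff installed.
demo_results = run_postprocess([[sys.executable, "-c", "print('formatted')"]])
```

In the demo the list would hold the ruff invocations, for example `["python3", "-m", "ruff", "check", "--fix", "slugify/"]` followed by `["python3", "-m", "ruff", "format", "slugify/"]`. Whether to include `--unsafe-fixes` is a separate choice, since those fixes can change behavior.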


Copilot AI review requested due to automatic review settings May 17, 2026 06:31

Protocol-zero-0 merged commit d93b136 into main May 17, 2026
5 checks passed

Protocol-zero-0 deleted the feat/issue-31-oss-fix-demo branch May 17, 2026 06:32

Copilot started reviewing on behalf of Protocol-zero-0 May 17, 2026 06:32

Protocol-zero-0 mentioned this pull request May 17, 2026

chore: release v1.1.0 #35

Merged

Copilot AI reviewed May 17, 2026


Protocol-zero-0 mentioned this pull request May 17, 2026

Technical outline and roadmap: evolution-kernel from MVP to soul plugin #5

Open

Protocol-zero-0 merged 1 commit into main from feat/issue-31-oss-fix-demo
