
feat: examples/oss_fix_demo/ — real OSS fix via claude CLI#32

Merged
Protocol-zero-0 merged 1 commit into main from feat/issue-31-oss-fix-demo
May 17, 2026

@Protocol-zero-0

Owner

Closes #31.

Summary

  • Companion to `examples/quickstart/`. Where quickstart is fully deterministic, this example wires `claude -p` (Claude Pro / Max subscription) in as the executor and points at a real published OSS package — python-slugify v8.0.4, 1,106 LoC, MIT.
  • Mission: drive ruff to zero violations on `slugify/`. Verified end-to-end: 48 s, claude wrote real edits, run 0001 accepted, real `evolution/accepted` commit landed.
  • No kernel changes. No new runtime dependencies. 102 baseline tests still pass.

Measured (2026-05-17, dev laptop)

| Step | Wall-clock | Cost |
|---|---|---|
| `setup.sh` | ~1 s + clone bandwidth | $0 |
| Run 1 (claude executor + ruff postprocess + evaluator) | 34 s (claude alone) | Claude Pro flat fee, no API key |
| Full 3-round `--loop` (1 accept + 2 halts) | 48 s | $0 marginal |

`runs/0001/decision.json`:
```json
{"accepted": true, "candidate_commit": "bae97a8...", "reason": "hard gates passed and evaluator recommended promotion"}
```
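
A minimal sketch of how a follow-up script could gate on this ledger entry, assuming the decision file carries the three keys shown above. The `is_promoted` helper and the inline `SAMPLE` string are hypothetical and not part of the kernel.

```python
import json

# Sample mirrors the fields shown above; the commit hash stays truncated
# exactly as it appears in the PR description.
SAMPLE = (
    '{"accepted": true, "candidate_commit": "bae97a8...", '
    '"reason": "hard gates passed and evaluator recommended promotion"}'
)


def is_promoted(decision_text: str) -> bool:
    """Return True only when the decision JSON explicitly says accepted: true."""
    decision = json.loads(decision_text)
    # A missing or non-boolean "accepted" field is treated as "not promoted".
    return decision.get("accepted") is True


if __name__ == "__main__":
    print(is_promoted(SAMPLE))
```

Treating a missing key as a rejection keeps the check conservative if the ledger schema differs from what is assumed here.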

`runs/0001/executor_output.json` (claude's verbatim summary):

Fixed both files: added explicit `as` re-exports for all version metadata imports in `slugify/__init__.py`, and sorted the `from .slugify import` names alphabetically in `slugify/__main__.py`.

Design notes (worth recording)

  1. The kernel-bundled `roles/executor.sh` claude-code path invokes `claude -p` without `--permission-mode acceptEdits`, so claude refuses to edit files non-interactively. This PR ships a small custom `bots/executor.py` that adds the flag. A follow-up issue to update the kernel path is a good post-v1.1 task.
  2. python-slugify's star imports are load-bearing public-API re-exports; `setup.sh` writes a target-side `pyproject.toml` that sets `ignore = ["F403"]` so the mission ("zero violations") is reachable.
  3. Realistic division of labor: claude does the semantic edits (F401 → `as`-alias pattern); `ruff check --fix && ruff format` postprocess mops up structural autofixes (I001 sort). This both raises accept rate and mirrors how production teams chain LLMs with deterministic tooling.
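
The division of labor in notes 1 and 3 can be sketched as below. This is an illustration under stated assumptions, not the shipped `bots/executor.py`: the `run_executor` and `claude_command` names, the injectable `run` callable, and the result shape are hypothetical. The sketch also runs both ruff steps unconditionally rather than chaining them with `&&`.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]

# Deterministic cleanup steps named in the PR description.
POSTPROCESS = (
    ["python3", "-m", "ruff", "check", "--fix", "slugify/"],
    ["python3", "-m", "ruff", "format", "slugify/"],
)


def claude_command(prompt: str, claude_bin: str = "claude") -> List[str]:
    # -p runs claude non-interactively; acceptEdits lets it write files
    # without an interactive confirmation (design note 1).
    return [claude_bin, "-p", prompt, "--permission-mode", "acceptEdits"]


def run_executor(prompt: str, run: Runner) -> dict:
    """Run the LLM step first, then the autofix steps, recording exit codes."""
    llm_exit = run(claude_command(prompt))
    postprocess = [{"cmd": list(cmd), "exit": run(cmd)} for cmd in POSTPROCESS]
    return {"exit": llm_exit, "postprocess": postprocess}


def subprocess_runner(cmd: Sequence[str]) -> int:
    # Real runner; check=False so a non-zero ruff exit does not raise.
    return subprocess.run(list(cmd), check=False).returncode
```

Passing the runner in as an argument means the ordering can be exercised with a fake, without the `claude` CLI or ruff installed; `run_executor(prompt, subprocess_runner)` would perform the real calls.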

Test plan

  • `bash setup.sh && evolution-kernel --config examples/oss_fix_demo/evolution.yml --repo ... --ledger ... --loop` → run 0001 accepted, `evaluation.json` shows `ruff_violations_remaining: 0`.
  • Real commit visible: `git -C /tmp/ek-oss-fix-target log --oneline evolution/accepted` → 2 commits.
  • `patch.diff` shows the semantic edits + autofix postprocess (355 lines).
  • Existing test suite still passes (no kernel changes).

🤖 Generated with Claude Code

Lands the LLM-driven companion to examples/quickstart/. The target is a
real published OSS package (python-slugify v8.0.4 — 1,106 LoC, MIT,
cloned by setup.sh at a pinned tag). The executor is `claude -p
--permission-mode acceptEdits` inside a custom Python wrapper that uses
the operator's Claude Pro/Max subscription — no API key, no per-token
charge. Planner and evaluator stay deterministic so the loop can be
driven from a clean install without anthropic / openai SDKs.

E2E on a developer laptop (2026-05-17):
  setup.sh:                     ~1s + clone bandwidth
  evolution-kernel --loop:     ~48s total wall-clock (3 rounds)
    claude executor (round 1):  34s (10 ruff violations → 0)
    rounds 2-3:                 halt — ruff is already clean
  cost:                         $0 marginal (Pro subscription flat fee)

Real evolution/accepted commit landed on the demo target's branch.
Captured in examples/oss_fix_demo/README.md.

Realistic division of labor in bots/executor.py: claude does the
semantic edits (F401 "use explicit `as` re-export" pattern); a follow-up
`ruff check --fix && ruff format` postprocess mops up structural
autofixes (I001 import sort). This both raises the accept rate and
mirrors how production teams chain LLMs with deterministic tooling.

setup.sh writes a target-side pyproject.toml with `ignore = ["F403"]`
because python-slugify's star-imports are load-bearing public-API
re-exports — flagging them as a real "bug" would be incorrect.

Files added under examples/oss_fix_demo/:
- setup.sh — clones python-slugify, drops bots/, writes lint config
- evolution.yml — references in-target bots, allowed_paths slugify/
- bots/planner.py — deterministic, snapshots ruff diagnostics into the plan
- bots/executor.py — wraps `claude -p --permission-mode acceptEdits`,
                     then runs ruff --fix && ruff format postprocess
- bots/evaluator.py — re-runs ruff, accept iff exit 0
- README.md — full forensic walk-through with ledger snippets

No kernel changes; 102 baseline tests still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 17, 2026 06:31
@Protocol-zero-0 Protocol-zero-0 merged commit d93b136 into main May 17, 2026
5 checks passed
@Protocol-zero-0 Protocol-zero-0 deleted the feat/issue-31-oss-fix-demo branch May 17, 2026 06:32

Copilot AI left a comment

Pull request overview

Adds a new examples/oss_fix_demo/ companion demo that runs Evolution Kernel against python-slugify using a Claude CLI executor, with deterministic planner/evaluator scripts and setup documentation.

Changes:

  • Adds setup and README for cloning/preparing the python-slugify demo target.
  • Adds demo config wiring planner, Claude executor wrapper, evaluator, ruff evidence, and mutation scope.
  • Adds bot scripts for planning ruff cleanup, invoking claude -p, postprocessing with ruff, and evaluating ruff cleanliness.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 9 comments.

| File | Description |
|---|---|
| `examples/oss_fix_demo/setup.sh` | Bootstraps the external OSS target repo and commits demo bots/config. |
| `examples/oss_fix_demo/README.md` | Documents prerequisites, run steps, measured output, and ledger artifacts. |
| `examples/oss_fix_demo/evolution.yml` | Defines mission, ruff evidence, allowed paths, hard stops, and role commands. |
| `examples/oss_fix_demo/bots/planner.py` | Emits a canned ruff-cleanup plan with current diagnostics. |
| `examples/oss_fix_demo/bots/executor.py` | Wraps Claude CLI execution and applies ruff postprocessing. |
| `examples/oss_fix_demo/bots/evaluator.py` | Accepts candidates when `ruff check slugify/` is clean. |

# Snapshot ruff state at observe-time so the planner sees concrete diagnostics.
evidence_sources:
  - type: shell
    command: "python3 -m ruff check slugify/ || true"
prompt = f"{plan.get('summary', 'improve the codebase')}\n\nSteps:\n{steps}\n\nOnly modify files within: {', '.join(plan.get('allowed_paths', ['slugify/']))}"

claude_bin = os.environ.get("EK_CLAUDE_BIN", "claude")
extra_args = os.environ.get("EK_CLAUDE_ARGS", "--permission-mode acceptEdits").split()
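
A self-contained sketch of the prompt and argument handling quoted above. The snippet does not show how `steps` is built, so the numbered-list formatting here is an assumption, and `build_prompt` and `claude_args` are hypothetical names. The sketch uses `shlex.split` instead of `str.split` so that a quoted value in `EK_CLAUDE_ARGS` survives as one argument.

```python
import os
import shlex
from typing import List, Mapping


def build_prompt(plan: dict) -> str:
    """Assemble the executor prompt from a plan dict."""
    # Assumption: steps are rendered as a numbered list, one per line.
    steps = "\n".join(
        f"{i}. {step}" for i, step in enumerate(plan.get("steps", []), 1)
    )
    allowed = ", ".join(plan.get("allowed_paths", ["slugify/"]))
    summary = plan.get("summary", "improve the codebase")
    return f"{summary}\n\nSteps:\n{steps}\n\nOnly modify files within: {allowed}"


def claude_args(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read extra CLI arguments from EK_CLAUDE_ARGS, honoring shell quoting."""
    raw = env.get("EK_CLAUDE_ARGS", "--permission-mode acceptEdits")
    return shlex.split(raw)
```

With the default value the two splitting approaches give the same result; they differ only once an override contains quoted whitespace.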

## Want to point this at *your* OSS repo?

Replace the `git clone` line in `setup.sh` with your own URL/tag, point `mutation_scope.allowed_paths` in `evolution.yml` at the right subtree, and rewrite the planner's `summary`/`steps` to describe your goal. The bots are <50 LoC each — easy to fork.
Comment on lines +6 to +7
# 2. Copy local-deterministic bots/ into it and commit on the target's
# first commit so the role scripts are reachable inside every
|---|---|---|
| `setup.sh` (clones slugify, commits bots, init repo) | ~1 s + clone bandwidth | \$0 |
| Run 1 — claude executor + ruff postprocess + evaluator | **34 s** (claude alone) / **~48 s** total wall-clock for the 3-round loop | Claude Pro subscription, flat fee — no per-token charge |
| Run 2 — halt: ruff already clean, no changes to make | <1 s | \$0 |
Comment on lines +84 to +85
| `observation.json` | Ruff snapshot the planner saw |
| `plan.json` | Canned plan + verbatim ruff diagnostics |
Comment on lines +24 to +31
# Count remaining violations from the "Found N errors." footer if present.
remaining = 0
for line in proc.stdout.splitlines():
    if line.startswith("Found ") and "error" in line:
        try:
            remaining = int(line.split()[1])
        except (ValueError, IndexError):
            pass
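
One way the footer parsing above could be made stricter, offered as a sketch rather than the evaluator's actual code. The `remaining_violations` name and the `None` return for an unknown count are assumptions. The point is to avoid reporting zero remaining violations when ruff exited non-zero but printed no recognizable footer.

```python
import re
from typing import Optional

# Matches ruff's summary footer, e.g. "Found 10 errors." or "Found 1 error."
_FOOTER = re.compile(r"^Found (\d+) errors?\b")


def remaining_violations(stdout: str, returncode: int) -> Optional[int]:
    """Return the violation count, or None when it cannot be determined."""
    for line in stdout.splitlines():
        match = _FOOTER.match(line)
        if match:
            return int(match.group(1))
    # No footer: trust a zero exit code as "clean", otherwise report unknown.
    return 0 if returncode == 0 else None
```

A caller can then accept only when the result is exactly `0`, so a crashed or misconfigured ruff run is not mistaken for a clean one.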
Comment on lines +64 to +70
"tool": "claude-code",
"exit": proc.returncode,
"elapsed_seconds": round(elapsed, 2),
"postprocess": postprocess,
"changed_files": changed,
"stdout_tail": proc.stdout[-800:],
"stderr_tail": proc.stderr[-400:],
Comment on lines +45 to +49
# Mop up any remaining autofix-only diagnostics (import sort, whitespace).
# This is realistic: in real workflows you run the formatter after the LLM.
postprocess = []
for cmd in (
    ["python3", "-m", "ruff", "check", "--fix", "--unsafe-fixes", "slugify/"],

Development

Successfully merging this pull request may close these issues.

v1.1: examples/oss_fix_demo/ — real OSS fix via claude CLI
