Evolution Kernel defines a small control mechanism for autonomous self-evolving systems.
The kernel lets a host system receive a long-term or mid-term goal, observe itself and its environment, plan modifications, execute those modifications inside a sandbox, evaluate the result independently, then promote or roll back the version.
The core property is not "an agent that edits code." The core property is a reproducible evolution loop with isolated roles, versioned experiments, explicit rollback, and auditable evidence.
- It is not a general chat agent framework.
- It is not a prompt optimization loop.
- It is not allowed to mutate production state directly in v0.
- It does not require a specific LLM provider.
- It does not trust natural-language self-reports as proof of improvement.
The protocol separates intelligence from authority.
The governor is deterministic orchestration code, not an intelligent agent.
Responsibilities:
- load the goal and constraints
- create sandbox worktrees
- dispatch isolated tasks to planner, executor, and evaluator
- record commits, diffs, metrics, decisions, and rollback points
- promote accepted experiments
- discard runtime effects from failed experiments
- write the audit ledger
The governor must not invent strategy, modify host code, or judge qualitative success.
The planner decides the next evolution experiment.
Inputs:
- goal spec
- current accepted version
- summarized historical ledger
- latest evaluation metrics
- resource budget
Outputs:
- a structured plan
- files or modules allowed to change
- expected improvement
- risks
- tests or metrics the evaluator should care about
The planner must not edit files or mark an experiment as successful.
The executor applies one plan inside a sandbox.
Inputs:
- plan
- sandbox checkout of the accepted version
- explicit mutation scope
Outputs:
- patch or commit
- implementation notes
- commands attempted
- test commands it ran locally, if allowed
The executor must not promote its own work. It should not receive evaluator hidden rubrics or the full private scoring logic.
The evaluator judges the sandbox result in a separate context.
Inputs:
- goal spec
- accepted baseline
- candidate patch or commit
- public evaluation rules
- golden set
- test commands
Outputs:
- machine-readable metrics
- pass/fail decision recommendation
- regression report
- complexity report
- reproducibility report
The evaluator must not edit files. It should not rely on executor self-explanations as evidence.
Planner, executor, and evaluator are independent entities. They communicate through files written by the governor, not through a shared free-form conversation.
The shared state is a ledger, not a memory transcript.
Allowed communication:
goal.yamlplan.jsonpatch.diffevaluation.jsondecision.jsonreflection.json
Disallowed communication in v0:
- planner and executor sharing a live chat context
- executor seeing hidden evaluator scoring details
- evaluator accepting executor's explanation without running tests
- any role mutating accepted state directly
The v0 version store is Git.
Abstract state:
accepted_version_n
-> sandbox_experiment_n+1
-> candidate_commit_n+1
-> accepted: promote to accepted_version_n+1
-> rejected: keep failed ledger, roll back to accepted_version_n
Required version records:
- accepted commit before the run
- sandbox branch or worktree path
- candidate commit
- patch diff
- decision
- rollback target
Failed experiments are not erased. Their runtime effects are discarded, but their evidence remains in the ledger.
Every experiment runs in a sandbox created from the latest accepted version.
v0 sandbox requirements:
- only repo-local file mutations are allowed
- network access is disabled unless explicitly configured by the host
- secrets are not mounted by default
- production state is not mounted
- evaluator runs from a clean checkout or clean worktree
- promotion happens only through the governor
The sandbox is a boundary around observation, modification, and testing, not only around command execution.
Recommended filesystem layout:
evolution-ledger/
goal.yaml
accepted/
current_commit.txt
runs/
0001/
planner_input.json
plan.json
executor_input.json
patch.diff
candidate_commit.txt
evaluator_input.json
evaluation.json
decision.json
reflection.json
failed/
0001-summary.json
The ledger must be sufficient for an external auditor to reconstruct what happened without reading private chat state.
Minimal goal fields:
name: token-ignition-review-evolution
objective: Improve evaluator accuracy while minimizing implementation complexity.
target_metrics:
talent_detection_accuracy: maximize
false_positive_rate: minimize
false_negative_rate: minimize
reproducibility_score: maximize
complexity_score: minimize
constraints:
only_modify_repo_files: true
require_sandbox: true
require_git_versioning: true
max_iterations: 20
max_wall_time_minutes: 60
acceptance:
require_existing_tests_pass: true
require_golden_set_non_regression: true
require_audit_log: trueA candidate version can be promoted only if all hard constraints pass.
After hard constraints pass, the governor compares the candidate against the accepted baseline using the host's fitness function.
Example:
fitness =
talent_detection_accuracy
+ reproducibility_score * 0.25
- false_positive_rate * 0.5
- false_negative_rate * 0.75
- complexity_penalty * 0.2
The exact function is host-specific, but it must be declared before evaluation.
Reflection is not proof. It is compressed learning for future planning.
A reflection should capture:
- what changed
- what improved
- what regressed
- why the governor accepted or rejected the candidate
- what future planners should avoid
- what follow-up experiment is suggested
Reflection must be linked to metrics and commits.
The first implementation should support:
- Git-backed versioning
- local worktree sandbox
- file-based ledger
- three role prompts or adapters
- deterministic governor
- host-provided test commands
- host-provided golden set
- rollback by deleting or abandoning failed worktree branches
It should not support production deployments, database mutation, credentialed network actions, or hidden evaluator state in v0.