ci: Standardize benchmarks and input constants#467
Draft
samuelburnham wants to merge 27 commits into
Draft
Conversation
43d00a3 to
eb6b068
Compare
Lowering the witness cap below Zisk's built-in default (10) was measured to have a negligible effect on host RAM and prove time for the kernel typecheck workload, so the CLI override (which defaulted to 5) is removed and the prover uses EmbeddedOpts::default(). Drop the now-stale flag mentions and per-RAM tuning guidance from the README, the cost-model doc, and the shard.rs model comments.
Rebased onto main (post #411 + #459) and integrated with the renamed bench-main.yml and reworked bencher-track interface. - Benchmarks/Vectors.csv: single shared source of truth (71 library constants from Init/Std/Mathlib/Lean). Consumed by Aiur (bench-typecheck --manifest), the zkVM hosts, and shell. - bench-pr.yml: `!benchmark [aiur] [zisk] [sp1|all] [execute|prove]` over the curated set, posting a main-vs-PR table; main results cached by base SHA. Hardened: comment body from env (no injection), allowlisted env parsing. - .github/scripts/{bench.py,run.sh}: parse/manifest/compare/comment, and the compile-.ixe + backend driver (cycles/execute-time/throughput/peak-rss, plus shards/max-shard-cycles for sharded runs). - .github/actions/install-{sp1,zisk}: shared zkVM toolchain + deps install, used by bench-pr.yml, bench-main.yml, and riscv-bench.yml. - bench-main.yml: add zkvm-execute (Zisk/SP1 cycle counts + time/RAM) and native-check (native parallel `ix check --anon` throughput) jobs, using the new bencher-track workload/thresholds interface. - bench-typecheck: add --constant / --skip-deps (align with the zkVM hosts; --skip-deps replaces --subject-only) and --execute-only (fast execute path).
issue_comment workflows only run from the default branch, so the `!benchmark` path can't be exercised on a PR branch. Add a pull_request trigger (base/head from the PR payload; empty comment → parser defaults) to test pre-merge. Revert before merge — delete the `pull_request:` trigger; the dual base/head resolution and `|| pull_request.number` fallbacks are harmless to keep.
The full curated set (~60 InitStd) is too slow to run on every !benchmark and on the bencher prove/zkVM jobs. Add a `primary` column to Vectors.csv marking 11 constants spanning shape (nat/list/array/int/string/vector/multiset, defs + proofs) and the cheap→heavy cost range (incl. 3 shard targets), and make it the default: - bench.py manifest --primary; parse honors BENCH_FULL (run the full set). - bench-pr.yml: !benchmark defaults to the primary subset; BENCH_FULL=1 runs the whole curated set. - bench-main.yml: prove + zkvm-execute derive constants from `manifest --primary` (replacing the hardcoded lists), so all backends bench the same set from the one source of truth. The tier filter keeps prove on the cheap primaries (heavy ones would OOM a single-shard prove); execute/native get the heavy ones for scale.
Bring both surfaces (!benchmark PR comment and bencher.dev) to parity across all four backends and remove the test scaffolding. - Native kernel on !benchmark: add a `native` backend (whole-env `ix check --anon`, the parallel out-of-circuit kernel) — bench.py backend/runner/metrics, a run.sh `native` branch, and a GNU-time install for native PR cells. `all` now fans out aiur/zisk/sp1/native. - run.sh: single-source the Aiur path through a per-constant bench-typecheck loop (per-constant peak-rss), add a per-constant `timeout` to the zkVM execute path (heavy primaries can't hang a job), and accept the env arg case-insensitively (bencher reuses the cached InitStd.ixe; bench-pr compiles initStd.ixe). - bench-main.yml: the prove and native-check jobs now drive run.sh too (dedup with the PR path); zkvm-execute gains the Mathlib env to match prove/native. - bench-thresholds-reset.yml: register the zisk / sp1 / native-check workloads. - bench-pr.yml: drop the temporary pull_request trigger (keep the harmless dual-SHA / number fallbacks). - docs/benchmarking.md: document the two surfaces, backends, Vectors.csv, the !benchmark grammar, and the bencher workloads / threshold resets.
…maries too - Aiur prove now covers the whole primary subset: bench.py exempts --primary from the prove cheap-tier cap, and run.sh proves each constant whose Aiur fft-cost fits the prover RAM ceiling (AIUR_PROVE_MAX_FFT, ~128 GB at 2.34 GB per billion fft) and execute-only's the rest — so heavy primaries (only Vector.extract_append at ~145 GB) still report execute metrics instead of being dropped. BENCH_FULL prove stays capped at the cheap tier so it doesn't balloon. - native now reports two views per env: the whole env (`ix check --anon`, keyed by env) and a per-primary subject check (`ix check --consts`, keyed by constant) — apples-to-apples with the zkVM --skip-deps execute. Wired into both the native-check bencher job and the !benchmark native backend. - run.sh: fix a stream-corruption bug — tool stdout and ::warning::/::notice:: now go to logs/stderr so only JSON reaches the per-constant `jq -s` merge.
Route every backend's peak-rss and per-phase timings through tracing-texray: - peak-rss now comes from the fork's process-tree sampler, so Zisk's ASM microservices' memory (tens of GB in separate PIDs) is captured — a bare /proc/self/status read of the host missed it. Verified on real execute runs: sp1 ~9GB, zisk ~49GB peak. - zisk/sp1 hosts and bench-typecheck self-report cycles/time/throughput/ peak-rss via --json, retiring run.sh's grep/awk/time parsing. They also emit per-phase span timings via --texray-json; native check-rs reports peak-rss as a 6th ##check## field. - run.sh folds span timings into a per-constant `phases` object; bench.py renders a collapsible per-phase drill-down in the !benchmark comparison, and bench-main.yml tracks phase:<span> measures on bencher. tracing-texray is pinned to the argumentcomputer/tracing-texray json-ram branch (rev 15ae57c), which adds the process-tree RSS sampler + JSON sink.
…h -check - Backend `native` is renamed to `ooc` (out-of-circuit) throughout: the BACKENDS tuple in bench.py, the run.sh case arm, the `native-check` job in bench-main.yml (now `ooc-check`), the `\!benchmark` grammar in bench-pr.yml, and docs/benchmarking.md. - Bencher workload names carry an explicit `-check` suffix so the tracked identity is self-describing: `aiur-check`, `zisk-check`, `sp1-check`, `ooc-check` (was `aiur`, `zisk`, `sp1`, `native-check`). Reset workflow tokens (options / valid / accepted / error text), tag-reference comments, and the bencher-track action's docstring updated to match. - Clarify the `\!benchmark` grammar: `([aiur] [zisk] [sp1] [ooc] | all)` — `all` is an alternative to the whole backend list, not just to `ooc`.
…ness
**bench.py !benchmark surface**
- `fetch-main`: queries bencher.dev's public reports API for `branch=main` at
the base SHA, filters results by `--names`, and reshapes into the neutral
`{ "<name>": { "<metric>": v } }` shape. Replaces the actions/cache layer;
bench-pr.yml falls back to a local base run only when bencher hasn't
ingested the base SHA yet. TODO left for non-main base branches.
- Mode is fixed per backend (`aiur=prove`, others=`execute`); the optional
bare `execute` token in `!benchmark` flips aiur to `--execute-only`.
Grammar drops `[execute|prove]` and `BENCH_GPU`; zkVM prove paths removed.
- Single runner `warp-ubuntu-latest-x64-32x` for every cell; testbeds
unified to `<backend>-check-x64-32x`; `MAIN_TESTBEDS` maps (backend, mode)
to the slug.
- `compare` renders the OOM sentinel `{"oom": true}` as `OOM` cells + `n/a`
Δ%. Regression/improvement flags fire on every metric column (previously
only the first); summary counts distinct constants; worst names constant
+ metric. `_human` is unit-aware: bytes → GiB/MiB, seconds → µs/ms/s/m,
counts → K/M/B/T.
- Sub-span (per-constant phase) drill-down removed with a TODO; `run.sh`'s
`merge_phases` and `bench-main.yml`'s `phase:<span>` flattening still
populate the neutral JSON for later reinstatement.
**CLI unification** (bench-typecheck, zisk-host, sp1-host)
- Dropped `--constant`; added `--consts <n1,n2,…>` (comma-list, clap
`value_delimiter` on the Rust hosts) and `--consts-file <path>`.
Multi-const runs loop the single-const path per name, accumulating one
entry per name via a merge-safe `write_json_entry`.
- Dropped `--texray-json <path>` and `--no-texray`. `--texray` (bool) alone
toggles the tracing-texray subscriber; combined with `--json <path>`,
span timings auto-write to `<json>.spans`.
- Inline `#[cfg(test)] mod cli_tests` in each Rust host covers
value-delimiter parsing, `--consts` `requires`, and `collect_consts`
union/dedup.
**Vectors.csv**
- Removed `aiur_fft`, `zisk_cycles` (measurements live in bencher, never
here). Rows can now omit trailing zero fields; parser tolerates 3+ cols.
- Primary set curated: renamed `Vector.extract_append` →
`Vector.extract_append._proof_1` (kept as shard_target); promoted
`Vector.append` and `Nat.sub_le_of_le_add` to primary; added the un-
shardable Init constants from the Zisk cost-model doc's canonical 12
(`Char.ofOrdinal_le_of_le`, `Array.extract_append._proof_1_1`, the
`SInt.Int{8,16,32,64}.instRxcHasSize_eq` family with the case corrected),
plus `ByteArray.utf8DecodeChar?_utf8EncodeChar_append`, `String.append`,
`IxVMPrim.nat_pow_big`, and
`Std.Tactic.BVDecide.BVExpr.bitblast.goCache_Inv_of_Inv._mutual`.
Trimmed obvious duplicates (`Array.qsort`, `Int.ediv`, `List.dropLast`,
`List.range`, `UInt32.toNat`, `Std.Time.Month.Offset.ofNat`).
**RAM harness** (run.sh, aiur prove)
- Tier gate removed. Every constant attempts a full prove.
- `watch_ram_kill` samples `ps -eo pid,ppid,rss` every ~3 s and SIGKILLs
the tree if it exceeds `AIUR_PROVE_MAX_RSS_GB` (default 120 GB — 8 GB
headroom under 128 GB). Killed constants record `{"oom": true}` for the
compare table.
**Misc**
- `install-zisk`'s description now correctly says the proving key is
installed (`client.setup()` loads const-trees before execute too).
- `bencher-track`'s workload description enumerates all options.
- `zisk/Cargo.toml`'s patch example uses `/path/to/…`, not `/home/ubuntu`.
- `docs/zisk-cycle-cost-model.md` finding #4 disambiguates
"not-shardable" (mutual blocks only) from "not full-closure-single-leaf
provable" (the canonical 12).
- `riscv-bench.yml` still on the temporary `sb/ci-benchmarks` push
trigger; drop before merge.
…threshold + doc cleanup
**`compile` as a !benchmark backend**
- bench.py: `DEFAULT_MODE["compile"]="compile"`,
`MAIN_TESTBEDS[("compile","compile")]="ix-compile-x64-32x"`,
`METRICS[("compile","compile")]=["compile-time","throughput","file-size",
"constants"]`. `cmd_manifest` short-circuits on `--backend compile`: writes
a one-line names.txt with the CamelCase env slug (`InitStd` / `Lean` /
`Mathlib` / `FLT`) and skips the CSV. New `--backend` arg on manifest.
- run.sh: setup step teed to `$compile_log` for every backend; `compile`
ignores REUSE_IXE (needs a fresh compile to measure) and parses
`##benchmark## <elapsed_ms> <bytes> <constants>` into the neutral
`{ "<CamelCase>": {"compile-time","file-size","constants","throughput"} }`
shape. Compile job on main (`bench-main.yml` testbed
`ix-compile-x64-32x`) is the same run bencher already stores.
- bench-pr.yml: `!benchmark ([aiur] [zisk] [sp1] [ooc] [compile] | all)`;
manifest step passes `--backend "$BACKEND"`.
**zkVM parity with ooc (drop `--skip-deps`)**
- run.sh zisk/sp1 branch: `--consts <c>` without `--skip-deps` so the check
is directly comparable to `ooc`'s `ix check-rs --consts <c>` (also
full-closure). Comparing subject-only zkVM against full-closure ooc mixed
in-circuit-vs-out-of-circuit overhead with scope; both are now
full-closure, so the delta isolates just the overhead.
**Threshold semantics for deterministic-but-directional measures**
- bench-main.yml zkvm-execute (now zkvm-typecheck): `cycles`, `shards`,
`max-shard-cycles` change from pinned `0/0` to `upper 0 / lower _`.
These are deterministic (no noise) but a real guest / packer improvement
legitimately drops them; the old pin flagged wins as regressions.
`constants` (definitional count) stays `0/0`.
**docs/benchmarking.md — stale content swept**
- Removed the "design-level skips post a note explaining why" paragraph
(that machinery was already removed).
- Replaced the `cheap → prove / heavy → execute-only` fallback description
with the RAM-watchdog reality (attempts prove for every primary; OOMs go
through as `{"oom": true}` sentinels rendered `OOM`).
- Backends table adds `compile`; drops metrics `aiur` doesn't actually emit.
- "Constant set": ~20 primaries in ~68 total (was ~11 in ~60); notes tier
is informational-only now, rows may omit trailing zeros, and `compile`
short-circuits the CSV.
- Per-measure threshold breakdown replaces the incorrect "Deterministic
measures … are pinned exactly" claim.
- "Not yet covered" now lists three real TODOs (zkVM prove, per-constant
phase drill-down, non-main base branches) with pointers to the in-code
TODO markers.
**Code comment audit** — every touched comment/docstring re-read; stale
references to `--skip-deps` parity, tier-based fallback, sub-span
drill-down, and cache-hit rendering all updated to match implementation.
Δ% is easier to interpret at scale when paired with a ratio — e.g.
`-33.3% (1.50x faster)` for a big prove-time drop; `+15.2% (1.15x slower)`
for a real execute-time regression. Only shown when the factor is >= 1.05x
(roughly ±5% in either direction) so sub-noise deltas don't get
`(1.02x slower)` clutter next to `+1.1%`. Cell format cascades cleanly:
`{Δ%} [(1.NN× {slower|faster})] [emoji]`.
…dedup Fixes from the branch-wide review, plus build-once caching and a dedup pass. **check-rs full-closure mode (new kernel FFI)** - `rs_kernel_check_anon_consts`: resolve displayed names via the env's `named` map (the zkVM hosts' resolution), then anon-check the seeds' FULL dependency closures via the existing `closure_addrs` + `build_anon_work` primitives — no sub-env serialization. `skip_deps` restricts to subject-only. `index_anon_work` extracted so the whole-env and filtered paths share the result-slot indexing. - `ix check-rs --anon --consts <names>` now runs that (full-closure default, `--skip-deps` opt-out — the hosts' CLI shape); meta-mode seeded checks are unchanged (subject-only, bisect flows intact). run.sh's ooc per-constant view uses it, so ooc-vs-zkVM deltas genuinely isolate in-circuit overhead (both sides now full-closure anon; previously ooc was subject-only meta). **OOM sentinel end-to-end** - run.sh merges `oom: true` INTO already-measured Phase-1 metrics (was: overwrite); compare renders OOM only for the missing metrics. - `bench.py bmf` (one neutral→BMF converter replacing four hand-copied jq/awk pipelines in bench-main.yml) strips the boolean sentinel — one OOM row no longer fails the whole bencher upload. This also covers the zkVM execute path, which gained its own RAM watchdog + sentinel. - Watchdog: setsid + process-group kill (reaches all descendants); kernel OOM beating the 3 s sampler (exit 137, no marker) labels OOM instead of silently dropping; per-constant wall-clock timeouts (AIUR_PROVE_TIMEOUT 50m / AIUR_EXECUTE_TIMEOUT 25m); $out re-merged per constant so a job-level kill keeps completed rows. **compare correctness** - throughput is higher-is-better: flags/worst/ratio direction-aware (HIGHER_IS_BETTER + _badness). - Ratio wording follows metric kind: times faster/slower, bytes larger/smaller, counts more/fewer. - `execute-peak-rss`: Phase-1 RSS high-water sampled at the Phase 1/2 boundary in bench-typecheck (both modes) and adopted by the zkVM hosts' execute JSON — one name for "execute-phase peak" on every backend; bare `peak-rss` reserved for prove-phase peaks. aiur-execute compares it apples-to-apples against prove-run baselines (raw peak-rss would dwarf it). - One-side-empty tables get a loud note (e.g. CLI-incompatible base). - Tidy JSON: bench-typecheck emits decimal `JsonNumber`s (`jsonRound`) instead of Float's full binary repr. **fetch-main robustness** - Retry with backoff + newest-first pagination (base SHA beyond page one no longer forces a permanent fallback). - Exit codes honored by bench-pr.yml: 3 (transient) → local fallback; 2 (BACKEND_TABLE / bench-main drift) fails the cell loudly. - ooc whole-env row keyed CamelCase on both sides (benv_cc table in run.sh, ENV_CC in bench.py) and admitted past the --names filter. **build-once + caching (bench-pr.yml)** - New build job: ix + bench-typecheck built once per head SHA (they embed the IxVM kernel/prover), cached as `bench-bins-<sha>`; cells restore into .bins/pr instead of per-cell Lean builds. Re-running \!benchmark on the same commit skips the build. - Base fallback restores bench-main's own `bench-bins-`/`bench-ixe-` caches (toolchain cmp + mathlib-oleans guard) before paying for a from-scratch base build. PR-side .ixe cached across cells. - Cache keys renamed `aiur-bench-bins-`/`aiur-ixe-` → `bench-bins-`/`bench-ixe-` (they serve every backend). - run.sh resolves tools in-tree first, then PATH (`resolve_bin`). **dedup** - bench.py: DEFAULT_MODE + METRICS + MAIN_TESTBEDS collapsed into one BACKEND_TABLE (mode / testbed / per-mode metrics). - bench-main's compile job routes through run.sh's compile backend + bmf — the `##benchmark##` line is parsed in exactly one place (run.sh gains `flt` in its env table for the FLT matrix cell). - `Ix.Cli.ConstsFile`: one names-file/comma-list parser (inline-`#` comments, dedup) shared by check-rs meta+anon, bench-typecheck, and `ix compile --exclude-file` — closes the whole-line-vs-inline comment drift across the five previous copies. **hosts** - clap `requires = "consts"` dropped from --json/--skip-deps (rejected valid --consts-file-only runs); validated in main after collect_consts. zisk's multi-`--ixe` guard now covers --consts-file. **misc** - Vectors.csv / docs tier semantics corrected (manifest-only consumer); stale prove-fallback comments fixed; tracing-texray bumped to bd4faa08; push-to-main workflows drop their concurrency groups (every merged commit must be benchmarked; a later merge must never cancel one). Verified: cargo check -p ix-ffi, lake build ix / bench-typecheck, live runs of the new check-rs mode against a 480 MB Init/Std env (closure = 34 items / 42 targets for Nat.add_comm; --skip-deps = 1; missing names error loudly), bmf/parse/compare dry-runs, YAML + bash -n + AST checks. zisk/sp1 host crates compile-verified in CI only (local sandbox lacks the pil2-proofman C++ toolchain).
…key) bench-main.yml's zkvm-execute job now runs real executions of both hosts on every main push, but tolerates per-constant failures by design (dropped rows, OOM sentinels) — it never turns red on a breakage. This workflow becomes the red-X signal instead, kept cheap: - Build-only + unit tests: `cargo build --release --bin <host>` plus `cargo test --release --bin <host>`, which covers the clap surface run.sh drives (--consts comma-splitting, --consts-file union/dedup, shard-plan conflicts). No execution, so no fixture compile job, no minimal.ixe artifact, no memlock prlimit. - install-zisk gains a `proving-key` input (default true): the key is loaded at runtime by `client.setup()`, never at build time, so the build gate skips the ~3 GB download + const-tree regeneration. - Renamed "RISC-V bench" → "zkVM host build" (it no longer benchmarks). - The TEMPORARY sb/ci-benchmarks push trigger (from the mid-branch test commit) reverts to main-only here; the tip commit is its sole carrier. The runtime/typecheck smoke signal (executing myReflEq and asserting failures == 0) is retired: that coverage now lives in zkvm-execute's real runs, at the cost of being warnings-not-red-X.
zisk-host pulls zisk-sdk as both a dependency and a build-dependency, so cargo compiles proofman-starks-lib-c as two units whose build scripts can run concurrently — and both run `make` inside the SHARED ~/.cargo/git/checkouts/pil2-proofman-* source tree (not OUT_DIR). On a cold runner the Makefile stamp is absent, so both units take the `make clean` + `make -j` path and race: one unit's clean deletes build/ while the other's g++ is mid-compile, dying with "opening dependency file ….d: No such file or directory" (g++ writes .d files at the END of compilation, so the dir vanished underneath it). Warm runners never hit it — the stamp short-circuits the clean — which is why the race only surfaced on this branch's cold cache (lockfile change → new rust-cache key). install-zisk now builds the sys crate solo first: its build script runs once, `make` completes, the stamp lands; both units of the subsequent parallel build then skip the clean and their `make -j` is a no-op. Same total work, just ordered. The proper fix is an flock around the make in pil2-proofman's build.rs — worth an upstream PR.
`resolve_bin ix` ran unconditionally at startup, but bench-main's zkvm-execute job legitimately has no `ix` anywhere: it restores only the `.ixe` cache (REUSE_IXE=1 skips the compile step) and builds its zkVM host via cargo — before the resolve_bin refactor that job never touched `ix` at all, so the eager check regressed it with "ix not found (in-tree or PATH)". Resolve per use site instead: the compile step and the ooc branch resolve `ix`, the aiur branch resolves `bench-typecheck`, and the zkVM branch needs neither.
c73bafa to
00dd93d
Compare
…or names Three fixes for the failing bench-main runs: - run.sh: a heavy prove can die of memory without tripping the watchdog or the kernel OOM killer — one huge trace allocation fails and the runtime aborts (SIGABRT, exit 134) with an allocator message. looks_like_oom() folds that third case into the OOM classification so those constants get an OOM row instead of being dropped. Non-OOM failures now log the first lines of the tool output so drops are diagnosable from the job log. - run.sh: after a watchdog group-kill, the Zisk host never runs its Drop-time cleanup of /dev/shm/ZISK_* segments (multi-GB; the MT output segment alone starts at 6 GB), so the next host launch fails creating its own segments before Zisk's startup stale-pid sweep runs — exactly one dropped constant after every OOM kill (the alternating OOM/exit-1 pattern in the zisk-execute logs). Sweep the debris between constants. - Vectors.csv: IxVMPrim.nat_pow_big removed (kernel primitive, never present in compiled envs); Vector.extract_append._proof_1 → ._proof_2 and Array.extract_append._proof_1_1 → Array.extract_append (proof-term names are toolchain-dependent; both replacements verified against a freshly compiled initStd.ixe and measured heavy: fft-cost 23.4e9 / 40.1e9).
…nchmarks riscv-bench.yml's two jobs (build + unit-test the Zisk/SP1 hosts, no execution, no proving key) are cheap enough (~5 min with warm caches) to gate every PR commit, and ci.yml already has exactly the triggers and per-ref cancellation they need — so they move there and the standalone workflow goes away. sp1 benchmarks are commented out for now — the execute run is too slow: the sp1 cell of bench-main's zkvm-execute matrix and bench-pr's Install SP1 step. Uncomment both to re-enable. The \!benchmark backend-subset selection is unchanged.
00dd93d to
a959f36
Compare
The op (added with the native execution pipeline, #463) pushes two values onto the value map during execution — the quotient/remainder list-head pointers — and the constraint generator allocates two auxiliary columns for them, but the trace populator treated it as a no-op. Every ValIdx and witness column after the first big-Nat division in a block was therefore off by two, and trace population panicked with "index out of bounds" at trace.rs:308 — the exit-134 failures on every big-Nat-heavy prove in the bench-main aiur job (List.mergeSort, String.split, the SInt instRxcHasSize_eq family, Multiset.sort, …). Reproducible in seconds with any Nat-division user, e.g. `bench-typecheck --consts Nat.repr`. The trace arm now mirrors the execute arm: recompute (q, r) with num_bigint and resolve the head pointers execution already recorded in memory[10] via a read-only twin of build_klimbs_u64, pushing both as map entries + auxiliary columns. Verified: Nat.repr proves end-to-end (panicked before); `lake test -- --ignored aiur ixvm` passes; clippy clean.
a959f36 to
6949298
Compare
Prover changes can trade prove speed against proof size or verification cost, so Phase 2 now serializes each fresh proof (proof-size, bytes) and verifies it (verify-time; a verification failure is reported loudly and drops only that measure). The full-closure path rebuilds the Array-G claim from proveAddrWithEnv's serialized claim bytes — verify_claim's input is the 32-G blake3 digest of those bytes, the same recipe as `ix verify`. Both measures flow through the neutral JSON into the PR compare table (bench.py aiur prove metrics + byte/second unit mapping) and onto bencher (verify-time 10% upper, proof-size 5% upper). Measured: Nat.add_comm proves in 1.63s, verifies in 0.19s, 33.4 MB proof.
The constants that OOM as single full-closure leaves (extract_append proofs, the instRxcHasSize_eq family, …) are only measurable under env sharding — their checks fit in one manifest shard each, with deps checked in other shards (docs/zisk-cycle-cost-model.md, finding 4). Wire the existing offline partitioner into CI instead of any per-constant sharding: - compile job: after the compile benchmark, `ix profile <Env>.ixe` → `ix shard --max-ram 120` for InitStd/Mathlib; the `.ixes` manifest is cached next to the `.ixe` (every restore of that key lists the same paths — actions/cache versions entries by path list). - zisk-host: `--shard-plan --execute --json` now writes one env-level row — total cycles, shards, max-shard-cycles, execute-time, throughput, execute-peak-rss, and a per-shard `shard-cycles` breakdown — keyed by the new `--json-name` (default: manifest stem). - run.sh: the zkVM guest-run dance (setsid + RAM watchdog + OOM classification + shm sweep + merge) is factored into one `zkvm_run` helper; when a `.ixes` sits next to the `.ixe` (bench-main only — the \!benchmark PR path has none) the zisk branch appends the env-sharded run after the per-constant loop. - bench.py bmf: nested-object flattening generalized — `phases` → `phase:<span>` as before, any other dict (e.g. `shard-cycles`) → `<key>:<sub>` — so per-shard cycles land on bencher as un-thresholded measures while the thresholded aggregates (cycles, shards, max-shard-cycles) do the alerting. zkvm job timeout 60 → 150 min: the whole-env run (own 60m cap inside run.sh) rides on top of the per-constant loop.
fetch-main previously treated main.json as all-or-nothing: any name bencher lacked at the base SHA (typically constants the PR itself adds to Vectors.csv) silently rendered n/a. It now writes the uncovered subset to --missing-out and still exits 0; bench-pr runs the base checkout on JUST those names and merges under bencher's rows (bencher wins overlaps — it is the canonical main side), so a brand-new constant gets a real main-vs-PR delta on its first \!benchmark. The full-set base fallback (exit 3, SHA not ingested) is unchanged.
A PR can change the kernel's cost profile, so the env-sharded zisk run must not inherit main's partition. run.sh now cuts the manifest in-place (ix profile → ix shard) whenever one isn't already sitting next to the .ixe — the profiler counts heartbeats rather than wall time, so an unchanged tree re-partitions deterministically and per-shard comparisons stay meaningful. bench-main's compile-job manifest keeps pre-empting the generation on the main side; the \!benchmark PR side profiles its own tree (manifest cached per head SHA under bench-pr-ixes-*). The base fallback reuses bench-main's cached manifest and pays the env-sharded run only on a full bencher miss — a partial miss means bencher already holds main's env row (ZISK_ENV_SHARD=0 skips it). bench-pr benchmark timeout 120 → 180 min for the zisk cell's worst case (per-constant loop + profile + the env execute's own 60m cap).
6949298 to
07264b9
Compare
The shard breakdown carried cycles only. Each shard now also records its guest execute time (SDK-reported) and its RAM high-water — the texray sampler's peak is reset before every shard, so the windows are independent and the env row's execute-peak-rss becomes their max (the run's execution-phase high-water; setup RAM no longer counted). Uploaded as shard-time:<k> / shard-peak-rss:<k> alongside shard-cycles:<k>, all un-thresholded like phase:<span>. No prove-side equivalent yet: zisk proving isn't wired up in CI at all (needs a GPU runner), so there is no prove-time for any zisk row.
…mand Two ways to get a closure-only `.ixe` for one constant: - `ix compile <file.lean> --consts <n1,n2,…>` (+ --consts-file): seed the compile by exact constant name instead of the whole import env, with transitive deps via the existing collectDeps — same names vocabulary as `ix check --consts`. Mutually exclusive with --module. - `ix shard extract <env.ixe> --consts <n1,n2,…> --out <sub.ixe>`: the sharding pipeline's scoping step — cut the closure out of an EXISTING env without recompiling from source. The output carries the closure's genuine constant bytes, blobs, and reducibility hints (build_sub_env, now shared via sub_env_of), plus a name→address entry per closure constant so `--consts`-style tools still resolve. Metadata is dropped (real ConstantMeta references name addresses throughout its tree and would need the full hash-consed name index) — extracted envs serve the anon pipeline; meta-mode tools need the source env. A mutual-block member extracts its whole block. Verified against a fresh initStd.ixe: both forms produce envs whose bench-typecheck fft-cost is bit-identical to the full-env run (content addressing at work), and extract → ix profile → ix shard composes cleanly (Int8.instRxcHasSize_eq: 2.1 MB closure env, 16.08e9 steps → 11 shards at the 120 GiB cap).
Replaces the whole-env sharded run (wrong scope: all of InitStd is
1.94e12 steps / 1263 shards, and its biggest atomic block is INFEASIBLE
under the cap) with per-constant closure sharding through the canonical
pipeline. For each heavy-tier primary, run.sh runs `ix shard extract`
(closure-only env, no recompile) → `ix profile` → `ix shard --max-ram
120`, then one `zisk-host --shard-plan --execute` run executes the
shards sequentially — the constant's row carries totals plus the
per-shard shard-{cycles,time,peak-rss}:<k> breakdown, uploaded to
bencher like every other measure. Cheap primaries keep the single-leaf
--consts run.
- run.sh: cut_closure_shards (also exposed as the `cutshards` backend so
bench-main's compile job pre-cuts through the same code path) +
heavy-tier dispatch in the zisk loop (ZISK_HEAVY_NAMES). A constant
whose partition still can't fit (atomic mutual block over the cap)
OOMs under the RAM watchdog: honest OOM row, remaining shards skipped,
loop proceeds to the next constant. Cutting failures fall back to the
single-leaf run.
- bench.py manifest --heavy-out: the selected heavy-tier names, from
Vectors.csv's tier column.
- bench-main: the compile job pre-cuts zkshards-<Bench>/ next to the
fresh .ixe (it has ix + the toolchain; the zkvm job stays Lean-free)
and ships it to the zkvm job in the sha-keyed bench-ixe-* cache entry.
- bench-pr: the PR side cuts its own artifacts fresh every run (cheap —
seconds per closure — and a PR can change the cost profile; profiling
counts heartbeats, so an unchanged tree re-partitions
deterministically); the base fallback reuses bench-main's pre-cut dir.
… job
A constant the kernel rejects is a correctness regression, not a
benchmark datum. End to end:
- zisk-host: every sharded loop (shard-plan execute + prove, whole-env
execute + leaf prove) bails on the FIRST failing shard via the shared
reject_failures helper — mirroring the OOM kill, which also cancels
the constant's remaining shards — instead of accumulating failures
across the full manifest. sp1 and the single-leaf paths already bailed
immediately.
- run.sh: a zkVM failure whose log carries the host's "kernel typecheck
produced" abort records the neutral `{"failed": true}` sentinel (an
::error:: annotation, not a silent drop); ooc does the same when
##check## reports failures. bench-typecheck marks Phase-1 check errors
failed and skips them in Phase 2. Failure warnings now print the log's
head AND tail (the host's abort lands at the end of a mid-manifest
log).
- bench.py compare: `failed` renders ❌ cells (outranking OOM) plus a
bold "FAILED TO TYPECHECK on the <side> side" note under the table;
bmf strips the sentinel so rejected rows never reach bencher.
- workflows: prove/zkvm/ooc jobs (bench-main) and the benchmark cell
(bench-pr) exit nonzero when the neutral JSON carries a failed row —
AFTER the clean rows upload / the table posts, so the red X lands on
the commit/PR without losing the report.
Drop this commit before merging. The zkVM build gate needs no branch trigger — it lives in ci.yml now and runs on pull_request.
07264b9 to
f1ff88a
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
After argumentcomputer/tracing-texray#3
Note
Update bencher testbed from aiur-typecheck-x64-32x to aiur-check-x64-32x before merge