fix(v2): get benchmarks crates to compile by shuklaayush · Pull Request #2618 · openvm-org/openvm

shuklaayush · 2026-03-24T22:34:17Z

Summary

Removes all openvm-native-circuit, openvm-native-recursion, and openvm-native-compiler dependencies from the three benchmark crates, which no longer exist in the workspace.

Changes by crate

`benchmarks/utils`

Deleted src/generate-fixtures.rs — depended on openvm-native-circuit, openvm-native-recursion, openvm-sdk
Cargo.toml: Removed all optional native-circuit deps (openvm-circuit, openvm-continuations, openvm-native-circuit, openvm-native-recursion, openvm-sdk, openvm-stark-sdk, bitcode, serde). Removed generate-fixtures feature and its [[bin]] entry. Set default = [].
src/lib.rs unchanged — ELF building utilities remain intact.

`benchmarks/execute`

Deleted src/execute-verifier.rs — the entire binary used NativeCpuBuilder and NATIVE_MAX_TRACE_HEIGHTS from native-circuit.
Deleted src/lib.rs — was a placeholder; the crate is bench-only.
Cargo.toml:
- Removed openvm-continuations and openvm-sdk deps (only used by verifier code).
- Removed [[bin]] for execute-verifier and the evm-prove feature.
- Expanded tco and aot features to propagate to all extension circuit crates directly (previously went through openvm-sdk).
- Added openvm-circuit with test-utils to dev-dependencies (for SystemParams::new_for_testing).
benches/execute.rs:
- Removed all 6 verifier benchmarks (benchmark_leaf_verifier_*, benchmark_internal_verifier_*) and their setup functions (setup_leaf_verifier, setup_internal_verifier).
- Removed #[cfg(feature = "aot")] branches and create_aot_instance/create_metered_aot_instance functions — kept only the interpreter-based execution paths.
- Updated engine/params API: BabyBearPoseidon2Engine → BabyBearPoseidon2CpuEngine, FriParameters::standard_fast() → SystemParams::new_for_testing(21).
- Removed unused imports: fs, Arc, ContinuationVmProof, Proof, NativeCpuBuilder, etc.
- App execution benchmarks (benchmark_execute, benchmark_execute_metered, benchmark_execute_metered_cost) remain unchanged.

`benchmarks/prove`

Major restructuring to match the v2-proof-system reference pattern.

src/lib.rs rewritten (replaces src/util.rs):
- BenchmarkCli with --max-segment-length and --app-only flags.
- BenchmarkCli::run() dispatches to run_default_benchmark (full aggregation) or run_default_app_benchmark (app-only).
- BenchmarkCli::apply_config() helper for shared segmentation logic.
- run_benchmark() — full aggregation with Sdk::prove(), proof size metrics (total + zstd-compressed), and verification via verify_vm_stark_proof_decoded.
- run_app_benchmark() — app-only proof with verify_app_proof.
Deleted src/util.rs — consolidated into lib.rs.
Bins rewritten to use pre-compiled ELFs via include_bytes! instead of runtime build_bench_program():
- fibonacci.rs — loads ELF from guest/fibonacci/elf/, n=800k.
- pairing.rs — BN254 pairing check.
- ecrecover.rs — ECDSA secp256k1 recovery (5 signatures).
- regex.rs — email regex matching with regex_email.txt fixture.
- base64_json.rs, bincode.rs, rkyv.rs, revm_transfer.rs — serialization/EVM benchmarks.
New bins:
- keccak.rs — keccak256 iteration benchmark (4096 iterations), builder-constructed config.
- keccak_par.rs — parallel keccak proving with --concurrency flag, shared app_pk/app_vk across threads.
- kitchen_sink.rs — restored (previously deleted due to native-circuit deps), now loads pre-compiled ELF.
Deleted bins:
- fib_e2e.rs — replaced by fibonacci.rs (use --app-only for app-only mode).
- verify_fibair.rs — depended on openvm-native-circuit and openvm-native-recursion.
- async_regex.rs — AsyncAppProver no longer exists in the SDK.
Cargo.toml:
- Added openvm-sdk-config, openvm-verify-stark-host, zstd, p3-field.
- Removed tokio, derive_more, rand.
- Added openvm-stark-backend/parallel and openvm-stark-backend/jemalloc to parallel/jemalloc features.
- Removed async feature.

github-actions · 2026-03-25T19:09:43Z

Claude finished @shuklaayush's task in 1m 53s —— View job

Code Review: fix(v2): get benchmarks crates to compile

Gather context and read changed files
Review benchmark utility changes
Review execute benchmark changes
Review prove benchmark changes (lib.rs, binaries, Cargo.toml)
Review CI/workflow changes
Post final review

Great cleanup — net ~1000 lines removed. Removes all stale openvm-native-circuit/openvm-native-recursion dependencies, consolidates shared proving logic into a clean BenchmarkCli abstraction, and re-enables the three benchmark crates. Benchmarks are running successfully on CI. The binary pattern (parse args → load ELF via include_bytes! → args.run()) is consistent and clean.

Most prior review items have been addressed (unused openvm-continuations removed, test-utils dropped, kzg-params-dir added to BenchmarkCli, pull_request trigger uncommented, apply_config extracted). Two remaining items:

Issues

1. #[command(allow_external_subcommands = true)] on BenchmarkCli is misleading — benchmarks/prove/src/lib.rs:24

This clap attribute allows unknown positional subcommands but does not silently ignore unknown --flags. Now that BenchmarkCli has all necessary fields (kzg_params_dir, profiling, app_only, max_segment_length), this attribute serves no useful purpose. If a typo is passed (e.g., --app-onyl), it could be silently swallowed as a subcommand positional arg rather than producing a helpful error. Consider removing it. Fix this →

2. kzg_params_dir field is accepted but never used — benchmarks/prove/src/lib.rs:38-39

BenchmarkCli accepts --kzg-params-dir (fixing the earlier bench.py mismatch), but neither run(), run_benchmark(), nor run_app_benchmark() reads this field. The commented-out halo2_outer_k/halo2_wrapper_k fields (lines 40-44) suggest this is an intentional placeholder for future e2e halo2 proving. Adding a // TODO: wire up when evm-verify is ported to v2 comment on the field itself would clarify this isn't an oversight, matching the existing TODO comments in benchmark-call.yml:141-152.

Minor Nits

3. DEFAULT_LOG_STACKED_HEIGHT relationship to DEFAULT_MAX_SEGMENT — benchmarks/prove/src/lib.rs:20-21

pub const DEFAULT_MAX_SEGMENT: u32 = 1 << 20;     // 2^20
pub const DEFAULT_LOG_STACKED_HEIGHT: usize = 21;  // 2^21

The factor-of-2 gap between these related constants isn't explained. A brief comment (e.g., "stacked height is one log-step above max segment height to account for padding") would help future readers avoid accidentally changing one without the other.

4. TranspilerConfig import in keccak_par.rs is correct — benchmarks/prove/src/bin/keccak_par.rs:7

Confirmed: .transpiler() is called on line 43, so the TranspilerConfig trait import is needed. Same for lib.rs:8. No issue here.

Looks Good

BenchmarkCli with apply_config() + run() is well-structured — keccak_par reuses apply_config() for custom parallel logic
next_power_of_two() is uniformly applied in apply_config() for both default and user-provided values
keccak_par.rs per-thread Sdk::new is well-documented (line 59 comment about OnceLock and Clone/Sync)
Removing openvm-sdk/openvm-continuations deps from benchmarks/execute is correct
Feature flag expansion for tco/aot in execute's Cargo.toml properly propagates to all extension crates
Adding parallel/jemalloc features to openvm-stark-backend in prove's features is the right fix
All include_bytes! binaries follow a consistent, readable pattern
benchmarks/utils/Cargo.toml properly slimmed after removing generate-fixtures
CI workflow and benchmark-config.json correctly updated to match new binary names
pull_request trigger in benchmarks.yml is properly uncommented (commit 1dea025)
bench.py and BenchmarkCli CLI args are now in sync

stephenh-axiom-xyz

LGTM with some nits

github-actions · 2026-03-25T19:33:06Z

Claude finished @shuklaayush's task in 1m 31s —— View job

Read changed files and gather context
Review benchmarks/prove/src/lib.rs (major rewrite)
Review benchmarks/prove binaries
Review benchmarks/execute changes
Review benchmarks/utils and CI changes
Post final review

Code Review: fix(v2): get benchmarks crates to compile

Clean, well-structured PR. Net ~1000 lines removed, stale native-circuit dependencies properly excised, and the new BenchmarkCli abstraction is a good pattern. Benchmarks are passing on CI. A few observations beyond what the previous review covered:

Issues

1. #[command(allow_external_subcommands = true)] should be removed — benchmarks/prove/src/lib.rs:24

Agree with the previous review. Now that BenchmarkCli has all needed fields, this attribute only masks typos (e.g., --app-onyl silently becomes a positional subcommand arg). Fix this →

2. execute.rs still has #[cfg(feature = "aot")] code but the PR description says AOT paths were removed — benchmarks/execute/benches/execute.rs:1-2,78-84,286-300,322-368,372-386

The PR description says "Removed #[cfg(feature = "aot")] branches and create_aot_instance/create_metered_aot_instance functions — kept only the interpreter-based execution paths." But the actual code still has these #[cfg(feature = "aot")] blocks, create_aot_instance, and create_metered_aot_instance. Either the description is stale or this was unintentionally left in. Since the aot feature is still defined in execute/Cargo.toml, this seems intentional — just the description is misleading.

3. kzg_params_dir accepted but unused — benchmarks/prove/src/lib.rs:37-39

Field is parsed but never read by run(), run_benchmark(), or run_app_benchmark(). The commented-out halo2_outer_k/halo2_wrapper_k fields suggest this is a placeholder. A // TODO[INT-7054] comment (matching the workflow TODOs) on the field would clarify intent.

Nits

4. DEFAULT_MAX_SEGMENT vs DEFAULT_LOG_STACKED_HEIGHT relationship — benchmarks/prove/src/lib.rs:20-21

pub const DEFAULT_MAX_SEGMENT: u32 = 1 << 20;     // 2^20
pub const DEFAULT_LOG_STACKED_HEIGHT: usize = 21;  // log2(2^21)

The factor-of-2 gap is not explained. A comment like // stacked height = log2(max_segment) + 1 to account for trace padding would prevent these from being changed independently.

5. Cargo.toml feature propagation looks clean — benchmarks/prove/Cargo.toml

Re: @stephenh-axiom-xyz's question about openvm-stark-backend — the direct dep is needed because lib.rs:9 imports SystemParams from it. The parallel/jemalloc features are propagated through openvm-sdk already, so the separate openvm-stark-backend feature entries in parallel/jemalloc features are redundant (but harmless). Worth confirming if openvm-sdk already enables openvm-stark-backend/parallel and openvm-stark-backend/jemalloc transitively.

6. Binary pattern is clean and consistent

All prove binaries follow the same pattern: parse args → load config from toml/builder → decode ELF via include_bytes → args.run(). keccak_par.rs appropriately breaks out of this for its custom parallel logic while reusing apply_config(). Well done.

Looks Good

BenchmarkCli::apply_config() with next_power_of_two() is correct
run_benchmark() verification via verify_vm_stark_proof_decoded and proof size metrics behind #[cfg(feature = "metrics")]
run_app_benchmark() with verify_app_proof for app-only mode
keccak_par.rs per-thread Sdk::new with shared app_pk/app_vk is well-documented
Deletion of generate-fixtures.rs, execute-verifier.rs, verify_fibair.rs, fib_e2e.rs, async_regex.rs — all properly removed since their deps no longer exist
benchmarks/utils/Cargo.toml correctly slimmed
CI workflows and benchmark-config.json properly updated
bench.py and CLI args are in sync

github-actions · 2026-03-25T19:38:49Z

group	app.proof_time_ms	app.cycles	leaf.proof_time_ms
fibonacci	4,150	12,000,265	1,352
keccak	19,188	1,235,218	3,378
regex	1,594	4,136,694	519
ecrecover	650	122,348	334
pairing	1,063	1,745,757	345
kitchen_sink	3,303	154,763	723

Note: cells_used metrics omitted because CUDA tracegen does not expose unpadded trace heights.

Commit: 12cffe8

Benchmark Workflow