ci: run tests on GPU server by JuArce · Pull Request #747 · yetanotherco/lambda_vm

JuArce · 2026-06-30T15:08:12Z

What

Adds a GPU test suite that runs in the merge queue and blocks the merge if it fails. It rents an RTX 5090 on Vast.ai, runs the CUDA tests there, and always destroys the box — reusing the rent → provision → run → always-destroy orchestration from benchmark-gpu.yml.

Why

CPU CI (pr_main.yaml) runs on GitHub ubuntu-latest runners, which have no GPU, so the CUDA backend is never exercised before merge. This suite runs the GPU-relevant tests on real hardware.

Test groups (run by `scripts/gpu_test.sh` via Makefile targets)

| # | Target | Covers |
|---|--------|--------|
| 1 | `test-math-cuda` | GPU↔CPU kernel parity (NTT, LDE, barycentric, FRI, merkle, ext3, keccak, …) |
| 2 | `test-cuda-integration` | End-to-end GPU prove: every dispatch fires + the proof verifies |
| 3 | `test-cuda-fallback` | GPU dispatch error → CPU fallback still produces a verifying proof |
| 4 | `test-prover-cuda` | The `lambda-vm-prover`/`stark`/`crypto`/`ecsm` suite on the GPU path (CPU CI's sharded prover tests, run single-threaded with `--features cuda`) |
| 5 | `test-prover-comprehensive-cuda` | The comprehensive all-instructions prove (`test_prove_elfs_all_instructions_64_full`) on the GPU path |

Groups 1–3 are GPU-only (no CPU equivalent); 4–5 mirror CPU CI's prover jobs but with the CUDA path enabled. All groups run even if one fails; the job fails if any group does.
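
As an illustration of the "all groups run, the job fails if any group does" behaviour, here is a minimal sketch. It is not the contents of `scripts/gpu_test.sh`: `run_groups`, its runner argument, and `fake_make` are hypothetical names introduced only for this example. Only the target names come from the table above.

```bash
#!/usr/bin/env bash
# Sketch only: run every group, remember the failures, return one aggregate status.
# run_groups and fake_make are hypothetical; they are not taken from the PR.

run_groups() {
  local runner="$1"; shift   # command that runs one group, e.g. "make"
  local failed=()
  local target
  for target in "$@"; do
    echo "==> ${target}"
    # Deliberately no early exit: a failing group must not stop the others.
    if ! "$runner" "$target"; then
      failed+=("$target")
    fi
  done
  if ((${#failed[@]} > 0)); then
    echo "FAILED groups: ${failed[*]}"
    return 1
  fi
  echo "all groups passed"
  return 0
}

# Stand-in runner for a dry run without a GPU: pretends one group fails.
fake_make() { [ "$1" != "test-cuda-fallback" ]; }
```

On the box, the equivalent call would presumably be `run_groups make test-math-cuda test-cuda-integration test-cuda-fallback test-prover-cuda test-prover-comprehensive-cuda`, with the script exiting with the returned status. Swapping in `fake_make` shows that later groups still run after an earlier one fails.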

How it works

Trigger: merge_group (one GPU rental per merge, not per push) + workflow_dispatch. A failure ejects the PR from the queue.
The box checks out the merge ref and runs scripts/gpu_test.sh, which:
- pins cudarc (CUDARC_PIN=cuda-12080) and verifies the GPU toolchain,
- builds the guest ELFs (compile-programs-asm + compile-programs-rust),
- runs the 5 groups, aggregating failures.
Offer selection: RTX 5090, ≥16 cores, ≥96 GB RAM, ≥64 GB disk, cuda_max_good>=13.1, driver ≥ 580, most-expensive ≤ $1/hr, with a retry loop for transient scarcity.
Teardown always destroys the instance (--yes, 3× retry, plus a label-based sweep so a cancelled-mid-rent box can't leak).
Reporting: full stdout+stderr in the step log; the job summary lists failed tests grouped by suite and the panic/assertion message for each.
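
The retried teardown described above could look roughly like the sketch below. This is an assumption-laden illustration, not the workflow's code: `retry` and `destroy_instance` are hypothetical names, and the real Vast CLI invocation is intentionally not reproduced. The placeholder fails twice and then succeeds so the retry path is exercised.

```bash
#!/usr/bin/env bash
# Sketch only: retry a teardown command a fixed number of times.
# retry and destroy_instance are hypothetical names for illustration.

retry() {
  local attempts="$1" delay="$2"; shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    # Wait between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Placeholder for the real destroy call: fails until its third invocation.
calls=0
destroy_instance() {
  calls=$((calls + 1))
  ((calls >= 3))
}
```

One way to get the "always" part in a plain script is `trap 'retry 3 5 destroy_instance' EXIT`, so the teardown runs however the script ends. In a GitHub Actions workflow the same effect is usually achieved with a step guarded by `if: always()`. Which of these the PR uses is not stated here.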

CUDA version requirement

An NVIDIA driver supporting CUDA ≥ 13.1 is required: the kernels are compiled with the toolkit's nvcc (CUDA 13.1) into PTX that the driver JIT-compiles at load — an older driver rejects it with CUDA_ERROR_UNSUPPORTED_PTX_VERSION. Enforced by the cuda_max_good>=13.1 offer filter and documented in the README "GPU Tests" section + crypto/math-cuda/build.rs.
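
A pre-flight check for this requirement might be sketched as follows. The function names are hypothetical and this is not the check the PR ships. It assumes GNU `sort -V` is available and that the `nvidia-smi` banner contains a `CUDA Version: X.Y` field.

```bash
#!/usr/bin/env bash
# Sketch only: compare the driver's supported CUDA version with a required floor.
# version_ge, driver_cuda_version and check_driver are hypothetical names.

version_ge() {
  # True when $1 >= $2 under version ordering (relies on GNU `sort -V`),
  # so 13.10 is treated as newer than 13.1 rather than equal to it.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver_cuda_version() {
  # Assumption: the nvidia-smi banner includes "CUDA Version: X.Y".
  nvidia-smi | sed -n 's/.*CUDA Version: \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

check_driver() {
  local required="$1" found="$2"
  if version_ge "$found" "$required"; then
    echo "ok: driver supports CUDA ${found} (need >= ${required})"
  else
    echo "error: driver supports CUDA ${found}, need >= ${required}" >&2
    return 1
  fi
}
```

On a GPU box this would be invoked as `check_driver 13.1 "$(driver_cuda_version)"`. Failing early this way gives a clearer message than waiting for CUDA_ERROR_UNSUPPORTED_PTX_VERSION at kernel load.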

Files

.github/workflows/gpu-tests.yml — new workflow (job gpu-tests).
scripts/gpu_test.sh — new on-box runner (cudarc pin + compile + 5 groups, aggregate exit).
Makefile — adds test-cuda-fallback, test-prover-cuda, test-prover-comprehensive-cuda (groups 1 & 2 targets already existed).
README.md — GPU Tests section + targets table.
crypto/math-cuda/build.rs — comment clarifying the PTX/driver version coupling.

Before this gates merges (manual)

Add gpu-tests to main's required status checks (Settings → Branches/Rulesets) — same pattern as the merge-queue-only test-prover-comprehensive.
Repo secrets VAST_API_KEY and VAST_TEMPLATE_HASH are already set.

Status

All 5 groups pass end-to-end on a rented RTX 5090. One real test-harness bug was fixed along the way: test-cuda-fallback ran its two tests in parallel, racing on the process-global GPU dispatch counters (assert_eq! count mismatches) — fixed by adding --test-threads=1 to that Makefile target.

Notes

Tradeoff: as a merge gate, a persistent Vast outage (no offer after retries) would block merges until it clears; the retry loop + generous waits mitigate it.

github-actions · 2026-06-30T19:19:11Z

Codex Code Review

No findings.

I reviewed the PR diff only: new GPU merge-queue workflow, GPU test script, Make targets, README additions, and the math-cuda build comment. I did not identify concrete safety/security, VM semantics, GPU resource, correctness, or significant performance issues in the changed code.

Verification was static only per instructions; I did not build or run tests.

claude · 2026-06-30T19:19:44Z

+#   4. prover/stark/crypto/ecsm suite   (make test-prover-cuda) — CPU CI's prover tests on GPU
+#   5. comprehensive all-instructions   (make test-prover-comprehensive-cuda)
+#
+# Runs on the rented Vast box from the gpu-tests.yml merge-queue workflow. All three groups


Stale comment: this says "All three groups run even if one fails" but the script now runs five groups (1–5, listed just above). Update the count to avoid confusion.

claude · 2026-06-30T19:19:47Z

+# crypto/math-cuda/Cargo.toml uses `cuda-version-from-build-system` + `fallback-latest`;
+# when detection falls back to "latest", cudarc requests symbols some boxes' driver doesn't
+# export (e.g. cuDevSmResourceSplit / cuCtxGetDevice_v2) -> runtime panic. Pinning to a fixed
+# CUDA version (12.8, matching the cuda_max_good>=12.8 offer floor) avoids that.


This parenthetical ("matching the cuda_max_good>=12.8 offer floor") is stale relative to this PR. The new gpu-tests.yml uses an offer floor of cuda_max_good>=13.1, not 12.8. The 12.8 cudarc pin is still correct (it's the host-side driver-symbol floor, independent of the PTX/driver-13.1 requirement), but the comment now references a floor that doesn't exist in this workflow. Reword to reflect that the cudarc pin floor (12.8) and the offer's PTX-driven floor (13.1) are separate, so a future reader doesn't try to "reconcile" them.

claude · 2026-06-30T19:19:58Z

Review: GPU test suite in merge queue

This is an infra/CI PR (workflow + Makefile targets + scripts + docs). I verified the Makefile targets reference real features and test files (test-cuda-faults, cuda_fallback_tests, cuda_path_integration all exist), and the orchestration is well-built. Notable strengths: secrets kept out of the pip-install step, pinned Vast CLI commit, ephemeral per-instance SSH key, ref validation before remote bash -lc, pipefail so test failures aren't masked by tee, and a robust always-destroy teardown with a label-based leak sweep.

No safety, correctness, or performance issues found. Two minor readability nits (posted inline):

Low — stale comment: scripts/gpu_test.sh:12 says "All three groups run" but five groups now run.
Low — stale comment: scripts/gpu_test.sh:42 references a cuda_max_good>=12.8 offer floor; the new workflow's floor is 13.1. The 12.8 cudarc pin itself is correct — it's the host-side symbol floor and is independent of the PTX/driver-13.1 requirement — but the comment conflates the two.

Both are cosmetic; the workflow is sound. Note the PR's own caveat stands: as a required merge gate, a sustained Vast.ai outage (no offer after the retry loop) would block merges until it clears — an accepted tradeoff per the description.

MauroToscano · 2026-06-30T19:34:12Z

/bench-growth

ci: run tests on GPU server

d95e3c8

JuArce self-assigned this Jun 30, 2026

JuArce added 8 commits June 30, 2026 12:17

keep instance running for debug

70754b3

ci: require CUDA 13.1

0c3b895

ci: test prover on CUDA

9abe6f0

Merge branch 'main' into ci_run_tests_gpu

db3f7ce

ci: improve gpu failed tests summaries

cd83966

ci: test-threads=1

25904c9

remove temporary code

e862a66

Merge branch 'main' into ci_run_tests_gpu

1d08005

JuArce marked this pull request as ready for review June 30, 2026 19:17

JuArce added the ai-review label ("Trigger the AI review") Jun 30, 2026

claude Bot reviewed Jun 30, 2026

JuArce wants to merge 9 commits into main from ci_run_tests_gpu
