Skip to content

Optimize statevector simulator preprocessing#316

Open
Kenny-Heitritter wants to merge 19 commits into
mainfrom
kenny/evolved-sv-preprocess-no-cache
Open

Optimize statevector simulator preprocessing#316
Kenny-Heitritter wants to merge 19 commits into
mainfrom
kenny/evolved-sv-preprocess-no-cache

Conversation

@Kenny-Heitritter
Copy link
Copy Markdown
Member

@Kenny-Heitritter Kenny-Heitritter commented May 26, 2026

Summary

This PR targets main from the current sv branch work plus one additional commit that ports the honest, non-cache AlphaEvolve Run 06 statevector improvements.

It deliberately does not include the evolved _PROGRAM_INSTRUCTION_CACHE / repeated-QASM memoization path, since that overfit the AlphaEvolve repeated-timing harness. The simulator still preprocesses each input on every Simulator.run(...) call.

Included from sv

  • Statevector simulator package surface
  • Cython sv_sim kernels
  • simulator benchmarks and graph assets
  • statevector tests/perf tests

Additional evolved changes

  • Optional numba-jitted helpers for small rotation/diagonal utilities
  • Module-scope matrix/phase constants
  • Hybrid pending-gate representation that keeps diagonal gates as phase pairs
  • Direct diagonal-kernel flush for diagonal fused gates
  • Preallocated packed instruction arrays instead of Python list growth
  • Conservative instruction capacity bound for swap expansion + pending flushes
  • SWAP decomposition through the optimized controlled-X path
  • Preserve diagonal pending gates across CZ/CRZ when they commute
  • simulation = ["numba>=0.59"] optional extra; fallback shim keeps numba optional

Benchmark artifacts updated

The existing default benchmark plots were regenerated from this PR branch with the improved simulator:

  • benchmarks/bench_random.png
  • benchmarks/bench_qft.png

The benchmarks folder also includes explicit no-cache before/after comparison artifacts:

  • benchmarks/bench_evolved_no_cache.json
  • benchmarks/bench_random_evolved_no_cache.png
  • benchmarks/bench_qft_evolved_no_cache.png

The comparison artifacts use pre-unrolled QasmModule inputs and intentionally do not include the repeated-QASM cache path.

Validation

Performed locally against the installed sv branch kernels from the AlphaEvolve workspace:

  • random circuit correctness vs current PyQASM statevector simulator
  • QFT circuit correctness vs current PyQASM statevector simulator
  • explicit CRZ correctness check
  • raw QASM string path and QasmModule path
  • shots > 0 smoke test
  • 50 all-active random fuzz circuits at atol=1e-10

Regenerated default benchmark plots with:

NUMBA_DISABLE_CACHE=1 PYQASM_NUM_THREADS=1 .venv-bench/bin/python benchmarks/plot_benchmarks.py

No-cache comparison artifacts (PYQASM_NUM_THREADS=1, median of 5):

  • random configs geomean: ~1.15x
  • QFT configs geomean: ~1.31x
  • all benchmarked circuits: correct at atol=1e-10

AlphaEvolve full overlay results from the experiment showed the graph-compatible non-cache path at ~1.23x random and ~1.36x QFT geomean; this PR's committed benchmark artifacts were regenerated from the PR branch implementation.

ryanhill1 and others added 12 commits January 16, 2025 22:21
…g, OpenMP support

Key changes:
- Remove unroll() from Simulator.run() for QasmModule input (caller responsibility)
- Single-pass _preprocess() with inline gate fusion (replaces separate _fuse_gates pass)
- Native gate handling in _preprocess() without requiring full AST unrolling
- AST expression evaluator for raw QASM parameter expressions (pi, tau, arithmetic)
- Cache unroll() results in QasmModule (skip if already unrolled)
- New Cython apply_circuit() entry point with 5 specialized kernels
- Optional OpenMP parallelism via PYQASM_NUM_THREADS env var (default=1/serial)
- OpenMP build detection in setup.py (macOS Homebrew libomp + Linux)
- Benchmark suite comparing PyQASM vs Qiskit Aer, Cirq, PennyLane Lightning
- Performance regression tests with pytest benchmark marker
- Profiling script for detailed per-phase timing breakdown

Performance (serial, median of 5):
- Random circuits: PyQASM fastest at all sizes (4-22q), beating Qiskit/Cirq/Lightning
- QFT circuits: PyQASM fastest at ≤12q; kernel-bound at ≥16q (memory bandwidth limited)
- With PYQASM_NUM_THREADS=8: 4.1x speedup at 22q, 2.7x at 20q

Known limitation: macOS libomp conflicts with Qiskit Aer's OpenMP in same process,
so parallelism is opt-in. Future work: SIMD/AVX kernels for single-threaded perf.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port the non-cache Run 06 AlphaEvolve statevector improvements into the
sv branch implementation while deliberately excluding the QASM-string
instruction memoization that overfit the AE repeated-timing harness.

The simulator still preprocesses every input on each run call. The speedup
comes from reducing Python overhead in that preprocessing path:

- add optional numba-jitted helpers for small rotation/diagonal utilities
- hoist matrix constants and phase constants to module scope
- keep pending diagonal one-qubit gates as phase pairs during fusion
- flush diagonal gates directly to the diagonal Cython kernel
- preallocate packed instruction arrays instead of growing Python lists
- decompose SWAP through the optimized controlled-X kernel path
- preserve diagonal pending gates across CZ/CRZ because they commute
- keep the conservative capacity bound needed for swap expansion and
  pending-gate flushes

This is the honest non-cache improvement: no _PROGRAM_INSTRUCTION_CACHE,
no cross-call memoization, and no reliance on repeated identical QASM
strings. Add a simulation extra for installing numba; the code falls back
to pure Python helpers when numba is absent.
@github-actions
Copy link
Copy Markdown
Contributor

👋 Hey there! It looks like the changelog might need an update.

Please take a moment to edit the CHANGELOG.md with:

  • A brief, one-to-two sentence summary of your changes.
  • A link back to this PR for reference.
  • (Optional) A small working example if you've added new features.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Warning

Review limit reached

@Kenny-Heitritter, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 14 minutes and 38 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 105df82c-add1-4407-bbdb-25ea1e8629a5

📥 Commits

Reviewing files that changed from the base of the PR and between b7bbe46 and 1d56076.

⛔ Files ignored due to path filters (4)
  • benchmarks/bench_qft.png is excluded by !**/*.png
  • benchmarks/bench_qft_evolved.png is excluded by !**/*.png
  • benchmarks/bench_random.png is excluded by !**/*.png
  • benchmarks/bench_random_evolved.png is excluded by !**/*.png
📒 Files selected for processing (10)
  • .github/workflows/main.yml
  • .github/workflows/pre-release.yml
  • .github/workflows/release.yml
  • .github/workflows/test-release.yml
  • benchmarks/bench_evolved.json
  • pyproject.toml
  • src/pyqasm/accelerate/sv_sim.pyx
  • src/pyqasm/simulator/__init__.py
  • src/pyqasm/simulator/statevector.py
  • tests/test_perf_regression.py

Walkthrough

This PR introduces a complete statevector simulator for PyQASM with Cython/OpenMP-accelerated gate kernels, a high-level NumPy-based driver, correctness tests against Qiskit Aer, performance regression tests, and comprehensive benchmarking/profiling tools for cross-simulator comparison.

Changes

Statevector Simulator

Layer / File(s) Summary
Build configuration and extension setup
setup.py, pyproject.toml, .gitignore
Configures Cython extension building with OpenMP detection; defines a simulation optional dependency; adds gitignore for generated C files.
Accelerated statevector kernels
src/pyqasm/accelerate/sv_sim.pyx
Low-level Cython/OpenMP gate application routines for single-qubit, controlled, diagonal phase, and two-qubit gates with optional parallel execution via prange and batch circuit dispatcher.
Python simulator layer with preprocessing
src/pyqasm/simulator/statevector.py, src/pyqasm/modules/base.py
Gate matrix definitions, AST expression evaluation, circuit preprocessing with single-qubit gate fusion and SWAP decomposition, SimulatorResult dataclass, and Simulator class. Adds unroll caching optimization.
Simulator package exports
src/pyqasm/simulator/__init__.py
Public re-export of Simulator and SimulatorResult from statevector module.
Correctness validation tests
tests/test_sv_sim.py
Parameterized tests comparing PyQASM statevector results against Qiskit Aer with global phase normalization and numerical tolerance handling.
Performance regression tests
tests/test_perf_regression.py
Benchmarks measuring median runtime for random and QFT circuits at various qubit/depth configurations; asserts performance stays within budgets and scaling is reasonable.
Benchmarking and profiling tools
benchmarks/bench_simulator.py, benchmarks/plot_benchmarks.py, benchmarks/profile_simulator.py
Cross-simulator comparison suite (PyQASM vs Qiskit Aer, Cirq, PennyLane Lightning) with endian-aware correctness checks and matplotlib visualization. Profiling tools support cProfile, wall-clock section timing, and per-opcode kernel timing via CLI.

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title "Optimize statevector simulator preprocessing" directly reflects the main objective stated in the PR description: porting preprocessing improvements into the statevector simulator implementation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kenny/evolved-sv-preprocess-no-cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Kenny-Heitritter Kenny-Heitritter marked this pull request as draft May 26, 2026 14:44
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/bench_simulator.py`:
- Around line 344-348: The equality checks in the checks dict compare raw
statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le, sv_pl_le) and will fail when
candidates differ only by a global phase; normalize each candidate to
sv_pyqasm's global phase before calling np.allclose. For each candidate vector
(sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor against sv_pyqasm
(e.g., using the normalized inner product np.vdot(sv_pyqasm,
sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide
the candidate by the conjugate of that phase so its global phase matches
sv_pyqasm, then use np.allclose on the aligned vectors when constructing the
checks dict.

In `@setup.py`:
- Around line 13-14: BASE_COMPILE_ARGS and BASE_LINK_ARGS are currently
hardcoded for GCC/Clang and include host-specific ISA flags; update setup.py to
make flags toolchain-aware by removing "-march=native" (and likely
"-ffast-math") from unconditional defaults and set compile/link flags based on
detected compiler type: inspect the build compiler via the distutils/setuptools
compiler instance or compiler.compiler_type and, for 'msvc', use
MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and
for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp()
to probe OpenMP support using the project compiler (invoke
compiler.compile/compiler.link methods or use a temporary Extension building
attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and
link flags to apply to extra_compile_args and extra_link_args for each Extension
(refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and
_check_openmp in the diff).

In `@src/pyqasm/accelerate/sv_sim.pyx`:
- Around line 351-369: In apply_circuit, avoid dereferencing &array[0] on
possibly-empty Cython buffers; first check the lengths (e.g., ensure sv,
gate_params, diag_phases, two_qubit_gates have >0 elements or that
n_instructions>0 where appropriate) and only set sv_ptr, gp_ptr, dp_ptr, tq_ptr
to &array[0] when the buffer is non-empty, otherwise set the pointer to NULL (or
skip pointer usage) to prevent invalid memory access; update uses of
sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle NULL/empty cases
safely.
- Around line 22-23: sv_sim.pyx currently cimports omp_get_max_threads and
omp_set_num_threads and calls omp_set_num_threads at module import time which
forces an OpenMP dependency even when setup.py disables it; modify the file to
guard all OpenMP cimports and calls behind a compile-time macro (e.g. `#IFDEF`
HAVE_OPENMP) driven by the same check used in setup.py so the Cython module
builds without OpenMP: wrap the cimports omp_get_max_threads/omp_set_num_threads
and any omp_* calls in conditional compilation blocks and provide safe no-op
fallbacks when the macro is not defined; additionally harden apply_circuit by
adding precondition checks (or early returns) for empty/mismatched memoryviews
(the function apply_circuit and its caller _preprocess/statevector code paths)
so it never takes &sv[0], &gate_params[0], &diag_phases[0], or
&two_qubit_gates[0] when the buffers are empty.

In `@tests/test_sv_sim.py`:
- Around line 309-314: The current comparison in tests/test_sv_sim.py only
normalizes global phase when the string "global phase" appears in description,
which is brittle; change the logic in the block that references description,
sv_expected, sv_actual, idx, and phase so that the global-phase normalization is
applied unconditionally (or replace it with a phase-invariant comparator helper)
by removing the conditional check on description and always computing idx =
np.argmax(np.abs(sv_expected) > 1e-10), phase = sv_actual[idx] /
sv_expected[idx], and sv_actual = sv_actual / phase before comparing vectors.
- Around line 44-45: Change the pytest.mark.parametrize argnames from a
comma-separated string to an explicit tuple so it satisfies PT006; locate the
pytest.mark.parametrize call in tests/test_sv_sim.py that currently uses "qasm,
description" and replace the argnames with a tuple form like ("qasm",
"description") (or equivalent single-quoted tuple) while keeping the existing
parameter values and test name intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a06ded4b-859c-4557-9d06-5128c623731b

📥 Commits

Reviewing files that changed from the base of the PR and between abefaf3 and b7bbe46.

⛔ Files ignored due to path filters (2)
  • benchmarks/bench_qft.png is excluded by !**/*.png
  • benchmarks/bench_random.png is excluded by !**/*.png
📒 Files selected for processing (12)
  • .gitignore
  • benchmarks/bench_simulator.py
  • benchmarks/plot_benchmarks.py
  • benchmarks/profile_simulator.py
  • pyproject.toml
  • setup.py
  • src/pyqasm/accelerate/sv_sim.pyx
  • src/pyqasm/modules/base.py
  • src/pyqasm/simulator/__init__.py
  • src/pyqasm/simulator/statevector.py
  • tests/test_perf_regression.py
  • tests/test_sv_sim.py

Comment on lines +344 to +348
checks = {
"qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10),
"cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10),
"lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10),
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize global phase before statevector equality checks.

Line 345–Line 347 compare raw vectors directly; equivalent states with different global phase will be reported as FAIL. Align each candidate vector to the PyQASM reference phase before np.allclose.

🔧 Proposed fix
+def _align_global_phase(reference, candidate):
+    """Phase-align candidate statevector to reference."""
+    idx = int(np.argmax(np.abs(reference)))
+    if np.abs(reference[idx]) == 0 or np.abs(candidate[idx]) == 0:
+        return candidate
+    phase = np.angle(candidate[idx]) - np.angle(reference[idx])
+    return candidate * np.exp(-1j * phase)
+
@@
-        checks = {
-            "qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10),
-            "cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10),
-            "lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10),
-        }
+        sv_qiskit_aligned = _align_global_phase(sv_pyqasm, sv_qiskit)
+        sv_cirq_aligned = _align_global_phase(sv_pyqasm, sv_cirq_le)
+        sv_pl_aligned = _align_global_phase(sv_pyqasm, sv_pl_le)
+        checks = {
+            "qiskit": np.allclose(sv_pyqasm, sv_qiskit_aligned, atol=1e-10),
+            "cirq": np.allclose(sv_pyqasm, sv_cirq_aligned, atol=1e-10),
+            "lightning": np.allclose(sv_pyqasm, sv_pl_aligned, atol=1e-10),
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/bench_simulator.py` around lines 344 - 348, The equality checks in
the checks dict compare raw statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le,
sv_pl_le) and will fail when candidates differ only by a global phase; normalize
each candidate to sv_pyqasm's global phase before calling np.allclose. For each
candidate vector (sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor
against sv_pyqasm (e.g., using the normalized inner product np.vdot(sv_pyqasm,
sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide
the candidate by the conjugate of that phase so its global phase matches
sv_pyqasm, then use np.allclose on the aligned vectors when constructing the
checks dict.

Comment thread setup.py
Comment on lines +13 to +14
BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"]
BASE_LINK_ARGS = []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify declared OS support and compiler assumptions in build script.
rg -n 'Operating System :: Microsoft :: Windows|Operating System :: MacOS|Operating System :: POSIX' pyproject.toml
rg -n 'BASE_COMPILE_ARGS|cc"|fopenmp|march=native|ffast-math' setup.py

Repository: qBraid/pyqasm

Length of output: 732


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Show relevant parts of setup.py around BASE_COMPILE_ARGS and the compile invocation
sed -n '1,140p' setup.py | nl -ba | sed -n '1,140p'

# Also inspect further if any later use of march/native or cc exists
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 100


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect surrounding code in setup.py where BASE_COMPILE_ARGS and the compiler invocation are used
sed -n '1,160p' setup.py | nl -ba

# Locate all occurrences related to the flagged portability concerns
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 100


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show relevant parts with line numbers (avoid nl)
sed -n '1,200p' setup.py | awk '{printf "%4d:%s\n", NR, $0}'

echo "---- occurrences ----"
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 3836


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Print relevant sections of pyproject.toml (classifiers + build-system)
rg -n '^\[project\]|\[build-system\]|Operating System ::|name\s*=|requires-python' pyproject.toml
echo "---- pyproject.toml excerpt ----"
sed -n '1,220p' pyproject.toml

Repository: qBraid/pyqasm

Length of output: 3420


Make extension compile flags toolchain-aware and remove host-specific ISA defaults.

  • setup.py unconditionally uses GCC/Clang-specific flags: BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"], and applies them to both Cython extensions via extra_compile_args (may fail on MSVC and makes wheels host-ISA dependent).
  • setup.py OpenMP detection hardcodes cmd = ["cc"] ... -fopenmp in _check_openmp(); it’s not MSVC-aware, so Windows toolchains won’t reliably detect/link OpenMP.
Proposed direction
-BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"]
+BASE_COMPILE_ARGS = ["-O3"]
+# Consider opt-in env toggle for local native tuning:
+# if os.environ.get("PYQASM_NATIVE_OPT", "0") == "1": add "-march=native" (non-wheel/local builds only)

-                cmd = ["cc"] + cflags + ldflags + [f.name, "-o", f.name + ".out"]
+                # Resolve compiler via sysconfig for active toolchain instead of hardcoded 'cc'
+                # and branch flags by compiler family (MSVC vs GCC/Clang).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@setup.py` around lines 13 - 14, BASE_COMPILE_ARGS and BASE_LINK_ARGS are
currently hardcoded for GCC/Clang and include host-specific ISA flags; update
setup.py to make flags toolchain-aware by removing "-march=native" (and likely
"-ffast-math") from unconditional defaults and set compile/link flags based on
detected compiler type: inspect the build compiler via the distutils/setuptools
compiler instance or compiler.compiler_type and, for 'msvc', use
MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and
for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp()
to probe OpenMP support using the project compiler (invoke
compiler.compile/compiler.link methods or use a temporary Extension building
attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and
link flags to apply to extra_compile_args and extra_link_args for each Extension
(refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and
_check_openmp in the diff).

Comment on lines +22 to +23
from openmp cimport omp_get_max_threads

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify mismatch between setup fallback and unconditional OpenMP usage.
rg -n 'openmp cimport|omp_get_max_threads|omp_set_num_threads' src/pyqasm/accelerate/sv_sim.pyx
rg -n '_check_openmp|return \[\], \[\]|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args' setup.py

Repository: qBraid/pyqasm

Length of output: 491


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== sv_sim.pyx: OpenMP-related code (top + functions around _check_openmp) ==="
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/accelerate/sv_sim.pyx")
lines = p.read_text().splitlines()
for i in range(1, 130):  # 1-based line numbers
    if i <= len(lines):
        print(f"{i:4d}:{lines[i-1]}")
PY

echo "=== sv_sim.pyx: locate apply_circuit and show around pointer usage ==="
rg -n "def apply_circuit|apply_circuit\(" -S src/pyqasm/accelerate/sv_sim.pyx
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/accelerate/sv_sim.pyx")
text = p.read_text().splitlines()
# Find first occurrence near suspected lines 330-430
start, end = 330, 430
for i in range(start, end+1):
    if 1 <= i <= len(text):
        print(f"{i:4d}:{text[i-1]}")
PY

echo "=== sv_sim.pyx: search for boundscheck/wraparound directives ==="
rg -n "boundscheck|wraparound|initializedcheck|nonecheck|cdivision|language_level|cython:" src/pyqasm/accelerate/sv_sim.pyx

echo "=== setup.py: OpenMP probe and sv_sim compile/link args ==="
rg -n "_check_openmp|sv_sim_compile_args|sv_sim_link_args|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args|omp_cflags|omp_ldflags" setup.py
python3 - <<'PY'
from pathlib import Path
p = Path("setup.py")
lines = p.read_text().splitlines()
# Print a focused region around the openmp probe and sv_sim Extension config if present
# Use a heuristic: print lines 1..220 and also 220..420 if file is longer.
for a,b in [(1,220),(221,420),(421,700)]:
    if a-1 < len(lines):
        b = min(b, len(lines))
        print(f"\n--- setup.py lines {a}-{b} ---")
        for i in range(a,b+1):
            print(f"{i:4d}:{lines[i-1]}")
PY

Repository: qBraid/pyqasm

Length of output: 12472


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Find call sites for sv_sim.apply_circuit ==="
rg -n "apply_circuit\s*\(" -S src | head -n 200

echo "=== Show likely wrapper/dispatcher that prepares arguments ==="
rg -n "n_instructions|opcodes|targets|controls|two_qubit_offsets|two_qubit_gates|gate_params|diag_phases" -S src/pyqasm | head -n 200

Repository: qBraid/pyqasm

Length of output: 8543


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Print around the apply_circuit call (based on rg line 579 from earlier output)
for i in range(520, 640):  # 1-based
    if 1 <= i <= len(lines):
        print(f"{i:4d}:{lines[i-1]}")
PY

echo "=== Also show where n_instructions is computed / passed (near tq_offsets variable) ==="
rg -n "n\s*=|n_instructions|tq_offsets|tq_gates|n, opcodes|opcodes," -n src/pyqasm/simulator/statevector.py | head -n 80

Repository: qBraid/pyqasm

Length of output: 3453


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Show around preprocess return logic
for i in range(440, 540):  # 1-based
    if 1 <= i <= len(lines):
        print(f"{i:4d}:{lines[i-1]}")
PY

echo "=== Also search for 'if n == 0' in _preprocess area ==="
rg -n "def _preprocess|if n == 0|instruction_idx\s*=" src/pyqasm/simulator/statevector.py

Repository: qBraid/pyqasm

Length of output: 4498


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Print the preprocess allocation + write logic region
for i in range(250, 560):  # 1-based
    if 1 <= i <= len(lines):
        print(f"{i:4d}:{lines[i-1]}")
PY

Repository: qBraid/pyqasm

Length of output: 13725


Fix OpenMP build contract: sv_sim.pyx unconditionally depends on OpenMP even when setup.py disables it

from openmp cimport omp_get_max_threads

Applies to src/pyqasm/accelerate/sv_sim.pyx (also 32-38): omp_set_num_threads(...) is also imported and called at module import time. Meanwhile, setup.py conditionally sets sv_sim_compile_args / sv_sim_link_args from _check_openmp(), and falls back to ([], []) when OpenMP isn’t available—so the extension can fail to compile/link on toolchains lacking OpenMP headers/libs. Fix by guarding the openmp cimports/usage behind a compile-time macro driven by _check_openmp().

  • apply_circuit initializes pointers via &sv[0], &gate_params[0], &diag_phases[0], &two_qubit_gates[0] with bounds checks disabled, but the current call path only invokes apply_circuit(...) when n > 0 (statevector.py), and _preprocess returns None buffers when n == 0. It also guarantees two_qubit_gates_trimmed is length ≥ 1 when there are no two-qubit ops. So the specific empty-memoryview UB doesn’t occur through this dispatcher, but apply_circuit remains unsafe if called with empty/mismatched buffers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 22 - 23, sv_sim.pyx currently
cimports omp_get_max_threads and omp_set_num_threads and calls
omp_set_num_threads at module import time which forces an OpenMP dependency even
when setup.py disables it; modify the file to guard all OpenMP cimports and
calls behind a compile-time macro (e.g. `#IFDEF` HAVE_OPENMP) driven by the same
check used in setup.py so the Cython module builds without OpenMP: wrap the
cimports omp_get_max_threads/omp_set_num_threads and any omp_* calls in
conditional compilation blocks and provide safe no-op fallbacks when the macro
is not defined; additionally harden apply_circuit by adding precondition checks
(or early returns) for empty/mismatched memoryviews (the function apply_circuit
and its caller _preprocess/statevector code paths) so it never takes &sv[0],
&gate_params[0], &diag_phases[0], or &two_qubit_gates[0] when the buffers are
empty.

Comment on lines +351 to +369
cpdef void apply_circuit(
double complex[::1] sv,
Py_ssize_t num_qubits,
int[::1] opcodes,
int[::1] targets,
int[::1] controls,
double complex[::1] gate_params,
double complex[::1] diag_phases,
int[::1] two_qubit_offsets,
double complex[::1] two_qubit_gates,
Py_ssize_t n_instructions,
) noexcept:
"""Execute an entire circuit. Each gate releases the GIL internally."""
cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
cdef int op, tgt, ctrl
cdef double complex* sv_ptr = &sv[0]
cdef double complex* gp_ptr = &gate_params[0]
cdef double complex* dp_ptr = &diag_phases[0]
cdef double complex* tq_ptr = &two_qubit_gates[0]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard array sizes before taking raw pointers in apply_circuit.

Lines 366–369 dereference &... [0] before validating lengths; with empty buffers (including n_instructions == 0) this is unsafe and can crash due to disabled bounds checks.

Suggested fix
 cpdef void apply_circuit(
@@
 ) noexcept:
@@
-    cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
+    cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
     cdef int op, tgt, ctrl
-    cdef double complex* sv_ptr = &sv[0]
-    cdef double complex* gp_ptr = &gate_params[0]
-    cdef double complex* dp_ptr = &diag_phases[0]
-    cdef double complex* tq_ptr = &two_qubit_gates[0]
+    cdef double complex* sv_ptr
+    cdef double complex* gp_ptr
+    cdef double complex* dp_ptr
+    cdef double complex* tq_ptr
+
+    if n_instructions <= 0:
+        return
+    if opcodes.shape[0] < n_instructions or targets.shape[0] < n_instructions or controls.shape[0] < n_instructions:
+        raise ValueError("instruction arrays shorter than n_instructions")
+    if gate_params.shape[0] < n_instructions * 4 or diag_phases.shape[0] < n_instructions * 2:
+        raise ValueError("parameter arrays too short for n_instructions")
+    if two_qubit_offsets.shape[0] < n_instructions:
+        raise ValueError("two_qubit_offsets shorter than n_instructions")
+
+    sv_ptr = &sv[0]
+    gp_ptr = &gate_params[0]
+    dp_ptr = &diag_phases[0]
+    tq_ptr = &two_qubit_gates[0]

Also applies to: 371-405

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 351 - 369, In apply_circuit,
avoid dereferencing &array[0] on possibly-empty Cython buffers; first check the
lengths (e.g., ensure sv, gate_params, diag_phases, two_qubit_gates have >0
elements or that n_instructions>0 where appropriate) and only set sv_ptr,
gp_ptr, dp_ptr, tq_ptr to &array[0] when the buffer is non-empty, otherwise set
the pointer to NULL (or skip pointer usage) to prevent invalid memory access;
update uses of sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle
NULL/empty cases safely.

Comment on lines +610 to +612
# Cache: skip if already unrolled (no kwargs means default unroll)
if not kwargs and len(self._unrolled_ast.statements) > 0:
return
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalidate cached unroll when source statements changed.

This early return can skip required re-unrolls and reuse stale _unrolled_ast after mutating operations that update _statements/original_program (for example, methods that append/remove statements without resetting _unrolled_ast). That can produce incorrect downstream behavior.

Suggested fix
-        # Cache: skip if already unrolled (no kwargs means default unroll)
-        if not kwargs and len(self._unrolled_ast.statements) > 0:
+        # Cache: skip only if already unrolled and source statements did not change.
+        if (
+            not kwargs
+            and len(self._unrolled_ast.statements) > 0
+            and self._unrolled_ast.statements is self._statements
+        ):
             return
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Cache: skip if already unrolled (no kwargs means default unroll)
if not kwargs and len(self._unrolled_ast.statements) > 0:
return
# Cache: skip only if already unrolled and source statements did not change.
if (
not kwargs
and len(self._unrolled_ast.statements) > 0
and self._unrolled_ast.statements is self._statements
):
return

Comment thread tests/test_sv_sim.py
Comment on lines +44 to +45
"qasm, description",
[
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify current usage and confirm PT006-relevant callsite
rg -n -C2 '`@pytest`\.mark\.parametrize\(' tests/test_sv_sim.py

Repository: qBraid/pyqasm

Length of output: 131


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find ruff/flake8 config enabling PT006 (or general pytest style rules)
rg -n --hidden --no-ignore-vcs "PT006" -S . || true
rg -n --hidden --no-ignore-vcs "(pytest.*PT|PT006|ruff.*pytest|flake8.*pytest)" -S pyproject.toml ruff.toml setup.cfg tox.ini .flake8 . || true

# Confirm whether there are other parametrize callsites in this file
rg -n "pytest\.mark\.parametrize" tests/test_sv_sim.py

Repository: qBraid/pyqasm

Length of output: 340


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect ruff config around flake8-pytest-style and any PT006 selection/ignores
sed -n '1,120p' ruff.toml
echo "----"
rg -n "PT006|flake8-pytest-style|pytest-style" ruff.toml
echo "----"
# Show any pytest-style lint tables that might define enable/disable behavior
rg -n "flake8-pytest-style" -n ruff.toml && true

Repository: qBraid/pyqasm

Length of output: 2646


Switch pytest.mark.parametrize argnames to tuple form (PT006).

Ruff enables flake8-pytest-style via select = ["PT", ...] and does not ignore PT006, while tests/test_sv_sim.py currently uses a comma-separated string argnames ("qasm, description"), which trips the rule. Change to:

Suggested change
-@pytest.mark.parametrize(
-    "qasm, description",
+@pytest.mark.parametrize(
+    ("qasm", "description"),
🧰 Tools
🪛 Ruff (0.15.14)

[warning] 44-44: Wrong type passed to first argument of pytest.mark.parametrize; expected tuple

Use a tuple for the first argument

(PT006)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_sv_sim.py` around lines 44 - 45, Change the
pytest.mark.parametrize argnames from a comma-separated string to an explicit
tuple so it satisfies PT006; locate the pytest.mark.parametrize call in
tests/test_sv_sim.py that currently uses "qasm, description" and replace the
argnames with a tuple form like ("qasm", "description") (or equivalent
single-quoted tuple) while keeping the existing parameter values and test name
intact.

Comment thread tests/test_sv_sim.py
Comment on lines +309 to +314
if "global phase" in description:
# Compare up to global phase: find first nonzero element and normalize
idx = np.argmax(np.abs(sv_expected) > 1e-10)
phase = sv_actual[idx] / sv_expected[idx]
sv_actual = sv_actual / phase

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Global-phase normalization should not depend on description text.

Conditioning on "global phase" in description is brittle and can miss valid equivalent states in other cases. Normalize phase unconditionally (or use a phase-invariant comparator helper).

Suggested change
-    if "global phase" in description:
-        # Compare up to global phase: find first nonzero element and normalize
-        idx = np.argmax(np.abs(sv_expected) > 1e-10)
-        phase = sv_actual[idx] / sv_expected[idx]
-        sv_actual = sv_actual / phase
+    # Compare up to global phase for all cases
+    idx = np.argmax(np.abs(sv_expected) > 1e-10)
+    phase = sv_actual[idx] / sv_expected[idx]
+    sv_actual = sv_actual / phase
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_sv_sim.py` around lines 309 - 314, The current comparison in
tests/test_sv_sim.py only normalizes global phase when the string "global phase"
appears in description, which is brittle; change the logic in the block that
references description, sv_expected, sv_actual, idx, and phase so that the
global-phase normalization is applied unconditionally (or replace it with a
phase-invariant comparator helper) by removing the conditional check on
description and always computing idx = np.argmax(np.abs(sv_expected) > 1e-10),
phase = sv_actual[idx] / sv_expected[idx], and sv_actual = sv_actual / phase
before comparing vectors.

@Kenny-Heitritter Kenny-Heitritter marked this pull request as ready for review May 26, 2026 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants