[BUG] Fixing matmul to support leading dimensions > 1 by SwayamInSync · Pull Request #88 · numpy/numpy-quaddtype

SwayamInSync · 2026-05-12T00:13:24Z

closes #87
As per the title

Note: This also uncovered a separate bug: GEMM dispatch does not support Fortran-ordered arrays. The related tests are added here and marked as xfail; the bug itself will be discussed in a separate issue.
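For readers unfamiliar with the batched case, here is a minimal sketch of what matmul over one leading batch dimension computes. It is not the implementation in this PR: `batched_matmul_sketch` and `quad_t` (a `long double` stand-in for `Sleef_quad`) are hypothetical names, and the sketch assumes C-contiguous inputs, so it does not touch the Fortran-order problem.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for Sleef_quad so the sketch compiles without SLEEF (assumption).
using quad_t = long double;

// Computes C[b] = A[b] @ B[b] for every batch index b.
// Shapes: A is (batch, m, k), B is (batch, k, n), C is (batch, m, n),
// all C-contiguous (row-major).
inline void batched_matmul_sketch(const quad_t *A, const quad_t *B, quad_t *C,
                                  std::size_t batch, std::size_t m,
                                  std::size_t k, std::size_t n) {
    assert(A != nullptr && B != nullptr && C != nullptr);
    for (std::size_t b = 0; b < batch; ++b) {
        // Each batch entry is an independent 2-D matrix product.
        const quad_t *Ab = A + b * m * k;
        const quad_t *Bb = B + b * k * n;
        quad_t *Cb = C + b * m * n;
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                quad_t acc = 0;
                for (std::size_t l = 0; l < k; ++l) {
                    acc += Ab[i * k + l] * Bb[l * n + j];
                }
                Cb[i * n + j] = acc;
            }
        }
    }
}
```

With batch = 2, m = 1, k = 2, n = 1, the two output entries are two separate dot products, one per batch index.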

SwayamInSync · 2026-05-12T00:45:39Z

Interesting, this might be a race condition on the NumPy side, in the lazy attribute loader?
In this PR, the fix might be as simple as pre-importing the rec submodule.


 ==================================== ERRORS ====================================
  _____________________ ERROR at call of test_pandas_strrep ______________________
  
      def test_pandas_strrep():
          """Test that we can construct a pandas data frame with quad precision columns
      
          Make sure the string representation can be generated
          """
          import pandas as pd
      
          BIG_NUMBER=123456789098765432123456789
          x = np.arange(500, dtype=np.float64) * BIG_NUMBER
          y = np.arange(500, dtype=QuadPrecDType()) * BIG_NUMBER
          df = pd.DataFrame({"col1": x, "col2": y})
  >       assert isinstance(str(df), str) # Make sure this doesn't fail
                            ^^^^^^^
  
  /Users/runner/work/numpy-quaddtype/numpy-quaddtype/tests/test_quaddtype.py:6041: 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/core/frame.py:1201: in __repr__
      return self.to_string(**repr_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/core/frame.py:1380: in to_string
      return fmt.DataFrameRenderer(formatter).to_string(
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/format.py:973: in to_string
      string = string_formatter.to_string()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/string.py:30: in to_string
      text = self._get_string_representation()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/string.py:45: in _get_string_representation
      strcols = self._get_strcols()
                ^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/string.py:36: in _get_strcols
      strcols = self.fmt.get_strcols()
                ^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/format.py:476: in get_strcols
      strcols = self._get_strcols_without_index()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/format.py:729: in _get_strcols_without_index
      str_columns = self._get_formatted_column_labels(self.tr_frame)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/pandas/io/formats/format.py:788: in _get_formatted_column_labels
      fmt_columns = columns._format_flat(include_name=False)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: in __getattr__
      import numpy.rec as rec
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
  
  attr = 'rec'
  
      def __getattr__(attr):
          # Warn for expired attributes
          import warnings
      
          if attr == "linalg":
              import numpy.linalg as linalg
              return linalg
          elif attr == "fft":
              import numpy.fft as fft
              return fft
          elif attr == "dtypes":
              import numpy.dtypes as dtypes
              return dtypes
          elif attr == "random":
              import numpy.random as random
              return random
          elif attr == "polynomial":
              import numpy.polynomial as polynomial
              return polynomial
          elif attr == "ma":
              import numpy.ma as ma
              return ma
          elif attr == "ctypeslib":
              import numpy.ctypeslib as ctypeslib
              return ctypeslib
          elif attr == "exceptions":
              import numpy.exceptions as exceptions
              return exceptions
          elif attr == "testing":
              import numpy.testing as testing
              return testing
          elif attr == "matlib":
              import numpy.matlib as matlib
              return matlib
          elif attr == "f2py":
              import numpy.f2py as f2py
              return f2py
          elif attr == "typing":
              import numpy.typing as typing
              return typing
          elif attr == "rec":
  >           import numpy.rec as rec
  E           RecursionError: maximum recursion depth exceeded
  
  ../venv-test-arm64/lib/python3.14t/site-packages/numpy/__init__.py:745: RecursionError
  !!! Recursion error detected, but an error occurred locating the origin of recursion.
    The following exception happened when comparing locals in the stack frame:
      ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    Displaying first and last 10 stack frames out of 996.

SwayamInSync · 2026-05-12T15:21:44Z

Oh, that's from CPython. The fix is straightforward; I will open an issue and a PR there.

SwayamInSync · 2026-05-12T15:40:32Z

As a fix here, I am adding a conftest.py (to pre-import the public submodules) in a separate PR. I will merge it and then re-run the workflows here.

ngoldbaum

I asked Claude to review this for C++ correctness and it spotted some issues, see below. Tests and overall implementation look good.

ngoldbaum · 2026-05-19T14:53:57Z

+        A_f = rng.standard_normal((batch, m, k))
+        B_f = rng.standard_normal((batch, k, n))
+        _assert_matmul_matches_float64(_qnd(A_f), _qnd(B_f), A_f, B_f,
+                                       rtol=1e-13, atol=1e-13)


No newline at end of file. Maybe add a lint or pre-commit hook for this? See e.g. for why this matters.

I will add a pre-commit workflow to ensure this in future

ngoldbaum · 2026-05-19T15:07:23Z

+        case MATMUL_GEMM:
+            temp_A_buffer = new Sleef_quad[m * n];
+            temp_B_buffer = new Sleef_quad[n * p];
+            temp_C_buffer = new Sleef_quad[m * p];


Claude points out that using new like this isn't exception-safe. To keep the allocation as-is you can either use std::unique_ptr to ensure RAII cleanup if an exception happens or new (std::nothrow) to disable exceptions for the allocation.

That said, IMO in an extension it's probably better to use PyMem_RawMalloc, because it integrates better with the interpreter and scales better under multithreaded parallelism on the free-threaded build, where it uses CPython's mimalloc.
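A minimal sketch of the first two options, combining `std::unique_ptr` with `new (std::nothrow)`. `GemmScratch`, `alloc_gemm_scratch`, `product_fits` and `quad_t` (a `long double` stand-in for `Sleef_quad`) are hypothetical names, not code from this PR, and the PyMem_RawMalloc variant is left out because it needs the Python headers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <new>
#include <utility>

// Stand-in for Sleef_quad so the sketch compiles without SLEEF (assumption).
using quad_t = long double;

// Hypothetical holder: the three GEMM scratch buffers, freed automatically.
struct GemmScratch {
    std::unique_ptr<quad_t[]> a, b, c;
};

// True if a * b is representable in std::size_t.
inline bool product_fits(std::size_t a, std::size_t b) {
    return b == 0 || a <= std::numeric_limits<std::size_t>::max() / b;
}

// Allocates m*n, n*p and m*p elements. Returns false instead of throwing
// if any allocation fails; `s` is only modified on success.
inline bool alloc_gemm_scratch(GemmScratch &s, std::size_t m, std::size_t n,
                               std::size_t p) {
    // Size arithmetic is assumed to be overflow-checked by the caller.
    assert(product_fits(m, n) && product_fits(n, p) && product_fits(m, p));
    GemmScratch tmp;
    tmp.a.reset(new (std::nothrow) quad_t[m * n]);
    tmp.b.reset(new (std::nothrow) quad_t[n * p]);
    tmp.c.reset(new (std::nothrow) quad_t[m * p]);
    if (!tmp.a || !tmp.b || !tmp.c) {
        return false;  // tmp's destructor releases any partial allocations
    }
    s = std::move(tmp);
    return true;
}
```

Because the buffers are owned by `unique_ptr`, every early return in the calling kernel releases them without a matching `delete[]`.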

ngoldbaum · 2026-05-19T15:14:24Z

 {
-    if (!alpha || !A || !x || !beta || !y || m == 0 || n == 0) {
+    if (m == 0 || n == 0) {
+        return 0;


To be consistent with qblas_dot, shouldn't you write zero to y? Similarly qblas_gemm should probably do the same to C.

ngoldbaum · 2026-05-19T15:15:37Z

-        case MATMUL_DOT: {
-            size_t incx = A_col_stride / sizeof(Sleef_quad);
-            size_t incy = B_row_stride / sizeof(Sleef_quad);
+        switch (op_type) {


Commenting here but the same applies to all the switch statements in your PR: add a default case that e.g. aborts to ensure you don't accidentally add code later that relies on falling through for an invalid value.

Got it, good catch!
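A small sketch of the default-case pattern from the comment above. `MATMUL_DOT` and `MATMUL_GEMM` appear in the diff; `MATMUL_GEMV`, the enum layout and `matmul_op_label` are assumptions made for illustration. The review suggests aborting on an invalid value; the sketch returns `nullptr` instead so the invalid path can be exercised in a test.

```cpp
#include <cassert>  // assert(), for callers that want to check the result

// MATMUL_DOT and MATMUL_GEMM come from the diff; MATMUL_GEMV is assumed.
enum MatmulOp { MATMUL_DOT = 0, MATMUL_GEMV = 1, MATMUL_GEMM = 2 };

// Hypothetical helper: maps an op code to a label and rejects unknown values.
inline const char *matmul_op_label(int op_type) {
    switch (op_type) {
        case MATMUL_DOT:
            return "dot";
        case MATMUL_GEMV:
            return "gemv";
        case MATMUL_GEMM:
            return "gemm";
        default:
            // Unknown op: handle it explicitly so that a value added later
            // cannot silently skip every case.
            return nullptr;
    }
}
```

In the real kernels the default branch could call `std::abort()` or set a Python error, depending on what the surrounding code expects.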

ngoldbaum · 2026-05-19T15:18:11Z

+        case MATMUL_GEMM:
+            temp_A_buffer = new Sleef_quad[m * n];
+            temp_B_buffer = new Sleef_quad[n * p];
+            temp_C_buffer = new Sleef_quad[m * p];


Also here and below, for all signed integer multiplication: you need to do overflow checking, since signed integer overflow is UB and the compiler is free to optimize this code away, or do other bad things, if it detects possible UB here.
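One portable way to do such a check, sketched with `std::ptrdiff_t` as a stand-in for the signed index type. `checked_mul` is a hypothetical helper, not something from this PR; GCC and Clang also provide `__builtin_mul_overflow`, which is not used here to keep the sketch compiler-independent.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Multiplies two non-negative sizes without ever evaluating an overflowing
// signed product. Returns false (and leaves *out untouched) if either input
// is negative or the product does not fit in std::ptrdiff_t.
inline bool checked_mul(std::ptrdiff_t a, std::ptrdiff_t b,
                        std::ptrdiff_t *out) {
    assert(out != nullptr);
    if (a < 0 || b < 0) {
        return false;  // array dimensions are never negative
    }
    // Compare via division so the check itself cannot overflow.
    if (a != 0 && b > std::numeric_limits<std::ptrdiff_t>::max() / a) {
        return false;
    }
    *out = a * b;
    return true;
}
```

A caller would compute `m * n`, `n * p` and `m * p` through this helper and bail out with an error before allocating if any of them returns false.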

SwayamInSync · 2026-05-20T07:23:14Z

Thanks @ngoldbaum. All the review points (except the ones I replied to) were already planned for a different PR.
I can address all of them here if you prefer, but that might go out of scope for this PR and issue.

Let me know which you prefer.

SwayamInSync · 2026-05-20T07:25:48Z

The pre-commit workflow can also come in a different PR, because I am guessing it would flag unrelated places that need fixing.

ngoldbaum · 2026-05-21T17:17:46Z

Sure, let's do the cleanups in future PRs.

SwayamInSync changed the title ~~fixing matmul N-D batch issue~~ [BUG] Fixing matmul to support leading dimensions > 1 May 12, 2026

fixing matmul N-D batch issue

8658102

SwayamInSync mentioned this pull request May 12, 2026

[BUG] matmul produces incorrect results for F-contiguous / non-row-major inputs #89

Open

SwayamInSync mentioned this pull request May 12, 2026

Free-threaded importlib race recurses on lazy-submodule __getattr__ SwayamInSync/cpython#1

Closed

SwayamInSync mentioned this pull request May 12, 2026

Free-threaded importlib race recurses on lazy-submodule __getattr__ python/cpython#149728

Open

This was referenced May 12, 2026

Pre-import numpy lazy submodules before running tests #90

Merged

[FEAT] Adding vecdot implementation #86

Open

SwayamInSync added 2 commits May 19, 2026 14:02

Merge branch 'main' into matmul-nd

b2d3f26

also fixed 0-dim issue and more tight tests

d9ae9e7

ngoldbaum reviewed May 19, 2026


adding default switch branch + newline EOF in test

44cbf16

ngoldbaum merged commit 60b1222 into numpy:main May 21, 2026
13 checks passed
