GH-40348: [Python] Add Python wrapper for VariableShapeTensor#40354
GH-40348: [Python] Add Python wrapper for VariableShapeTensor#40354rok wants to merge 1 commit intoapache:mainfrom
Conversation
AlenkaF
left a comment
There was a problem hiding this comment.
Small nits =) Thank you for working on this, takes lots of work!
b6b63a6 to
a372fe6
Compare
|
The failures are connected. |
fab047e to
f58c973
Compare
5197532 to
55cc40b
Compare
4429dd2 to
b4cb6cd
Compare
| internal::Permute<int64_t>(permutation, &shape); | ||
| ARROW_ASSIGN_OR_RAISE( | ||
| auto strides, internal::ComputeStrides(ext_type.value_type(), shape, permutation)); | ||
| internal::Permute<int64_t>(permutation, &shape); | ||
|
|
There was a problem hiding this comment.
I'm thinking thinking about this still. It's probably a bug or a different interpretation at least.
0df8139 to
d5a548d
Compare
|
@AlenkaF could you do a quick pass on this? I'm a little unsure about the |
|
@rok sure! Will have a look later today. |
|
I did a very quick look at the code now. I would suggest splitting this PR into two as there is quite a lot of lines added and there might be even something that can be moved out. Moving the |
Add PyArrow bindings for the VariableShapeTensor extension type, including VariableShapeTensorType, VariableShapeTensorArray, and VariableShapeTensorScalar with support for converting to/from NumPy tensors.
Agreed, I moved these into a separate PR and rebased here. I think this is ready for a more final review. |
AlenkaF
left a comment
There was a problem hiding this comment.
Thank you for adding this @rok! Huge amount of work was needed here (and in the C++ implementation).
I did a first check but still need to go through most of the Cython code and all the tests.
As for the C++ permutation change - maybe that also fits better into a separate issue&PR?
|
|
||
| shared_ptr[CArray] storage() | ||
|
|
||
| cdef extern from "arrow/extension/variable_shape_tensor.h" namespace "arrow::extension" nogil: |
There was a problem hiding this comment.
Could this be moved little lower, after fixed shape tensor ref?
| cdef extern from "arrow/extension/fixed_shape_tensor.h" namespace "arrow::extension" nogil: | ||
| cdef cppclass CFixedShapeTensorType \ | ||
| " arrow::extension::FixedShapeTensorType"(CExtensionType): | ||
| " arrow::extension::FixedShapeTensorType"(CExtensionType) nogil: |
There was a problem hiding this comment.
Is there a reason nogil is added here and used in the variable version also?
| const shared_ptr[CBuffer] null_bitmap, | ||
| ) | ||
|
|
||
| @staticmethod |
There was a problem hiding this comment.
I am not sure where this is needed?
| void set_chunksize(int64_t chunksize) | ||
|
|
||
| cdef cppclass CTensor" arrow::Tensor": | ||
| CTensor(const shared_ptr[CDataType]& type, |
There was a problem hiding this comment.
Here also, I am not sure if we actually need this?
| if isinstance(obj, np.ndarray): | ||
| raise TypeError("obj must be a sequence of numpy arrays") |
There was a problem hiding this comment.
I think the first check falls into the second one and could be omitted?
| array_ndim = arrays[0].ndim | ||
| arrow_type = from_numpy_dtype(base_dtype) | ||
|
|
||
| if value_type is not None and value_type != arrow_type: |
There was a problem hiding this comment.
Do we need to accept value_type if the input is not empty as it has to be same as the base type anyways? Not sure what are other options, but feels a bit funny as we are giving this option but it then has to be specific.
Rationale for this change
Once C++ implementation of VariableShapeTensor is complete (#38007) we want a Python wrapper for
VariableShapeTensor.What changes are included in this PR?
This adds a Python wrapper for
VariableShapeTensor.Are these changes tested?
This adds appropriate Python tests.
Are there any user-facing changes?
Yes, a new extension and array type are exposed.