Handle 0-d ndarrays in scalar isinstance checks #1415
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff           @@
##              dev    #1415   +/-  ##
=======================================
  Coverage   93.08%   93.08%
=======================================
  Files          41       41
  Lines       10007    10014       +7
  Branches     2060     2061       +1
=======================================
+ Hits         9315     9322       +7
  Misses        416      416
  Partials      276      276
```

☔ View full report in Codecov by Sentry.
zarr v3 scalar indexing returns 0-d ndarrays, which fail check_type(arg, int). Unwrap before the type check so ElementIdentifiers validation works with zarr v3 arrays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
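The behavior difference described in this commit can be reproduced with plain numpy, using `np.asarray` to simulate the 0-d ndarray that zarr v3 scalar indexing returns (a sketch for illustration, not code from the PR):

```python
import numpy as np

# zarr v2 / plain numpy: indexing a 1-d float array yields a numpy scalar,
# and np.float64 subclasses Python float, so the isinstance check passes
scalar = np.float64(3.5)
print(isinstance(scalar, float))       # True

# zarr v3-style result: a 0-d ndarray, which is not an int/float/str/bool
zero_d = np.asarray(5)
print(isinstance(zero_d, int))         # False

# unwrapping with .item() recovers a plain Python scalar
print(isinstance(zero_d.item(), int))  # True
```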
… guards
- container.py: Data.__len__ uses _get_length for zarr v3 Arrays without __len__
- h5tools.py: use _get_length when sizing datasets during export
- objectmapper.py: use _get_length for compound dtype shape, unwrap 0-d rows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VectorData.get, DynamicTableRegion.get, DynamicTableRegion.shape, DynamicTable.add_column, and EnumData.__add_term all call len() on self.data, which fails with zarr v3 Arrays that lack __len__. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The function returns shape[0], not len(). The new name reflects the actual semantics: the size of the first dimension, which is what the array API standard exposes via .shape rather than __len__. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After discussion, _get_length is the clearest name: it maps to what Python programmers understand as "how many items if I iterate this", which is what every call site needs. The implementation detail of using shape[0] vs len() belongs in the docstring, not the name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@h-mayorquin could you please resolve the conflicts in this PR? Thanks.
```diff
  def __len__(self):
-     return len(self.__data)
+     return _get_length(self.__data)
```
I am unsure why those were not merged with #1414
Ok, I think I only found those errors later and did some further additions here.
But maybe, we should have tests for those?
Yeah, let's take this opportunity to add some. Those tests were missing in the first place.
Added. I also improved the docstring and unified some uses of len() that were not going through the new centralized function.
Let me know if you think there is something else to add
I haven't searched through where else
Thanks @h-mayorquin!
Yes, I double-checked. I will be on the lookout for more uses.
Chained after #1414 (`_is_collection`). Motivated by hdmf-dev/hdmf-zarr#325 (zarr v2 to v3 migration).

hdmf's `get_type` functions infer element dtype by recursively indexing with `data[0]` until reaching a scalar, then calling `type()` on it. With numpy and zarr v2, `data[0]` on a 1-d float array returns a numpy scalar (e.g., `numpy.float64`), which passes `isinstance(val, float)` and has no `__len__`, so all downstream checks work. With zarr v3 (following the Python array API standard), `data[0]` returns a 0-d ndarray instead. A 0-d ndarray fails `isinstance(val, (int, float, str, bool))`, and `type()` returns `numpy.ndarray` rather than the element dtype. PR #1414 fixed the crash path (the `__len__` heuristic), but `isinstance` checks in other parts of the codebase still silently take the wrong branch when they encounter a 0-d ndarray.

This PR adds a `_unwrap_scalar` helper in `hdmf.utils` that converts 0-d ndarrays to plain scalars via `.item()`, and applies it at the remaining `isinstance` checks that compare against Python scalar types.

Together with #1414, this eliminates the need for the `__getitem__` monkey-patch in hdmf-zarr PR #325.

Checklist

- Did you update `CHANGELOG.md` with your changes?