Skip to content

Handle 0-d ndarrays in scalar isinstance checks#1415

Merged
rly merged 19 commits intodevfrom
remove_duck_typing_for_type
Mar 11, 2026
Merged

Handle 0-d ndarrays in scalar isinstance checks#1415
rly merged 19 commits intodevfrom
remove_duck_typing_for_type

Conversation

@h-mayorquin
Copy link
Copy Markdown
Contributor

Chained after #1414 (_is_collection). Motivated by hdmf-dev/hdmf-zarr#325 (zarr v2 to v3 migration).

hdmf's get_type functions infer element dtype by recursively indexing with data[0] until reaching a scalar, then calling type() on it. With numpy and zarr v2, data[0] on a 1-d float array returns a numpy scalar (e.g., numpy.float64), which passes isinstance(val, float) and has no __len__, so all downstream checks work. With zarr v3 (following the Python array API standard), data[0] returns a 0-d ndarray instead. A 0-d ndarray fails isinstance(val, (int, float, str, bool)) and type() returns numpy.ndarray rather than the element dtype. PR #1414 fixed the crash path (__len__ heuristic), but isinstance checks in other parts of the codebase still silently take the wrong branch when they encounter a 0-d ndarray.

This PR adds a _unwrap_scalar helper in hdmf.utils that converts 0-d ndarrays to numpy scalars via .item(), and applies it at the remaining isinstance checks that compare against Python scalar types.

Together with #1414, this eliminates the need for the __getitem__ monkey-patch in hdmf-zarr PR #325.

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Does the PR clearly describe the problem and the solution?
  • Have you reviewed our Contributing Guide?
  • Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 88.46154% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.08%. Comparing base (ebc5a3b) to head (05423f5).
⚠️ Report is 1 commits behind head on dev.

Files with missing lines Patch % Lines
src/hdmf/backends/hdf5/h5tools.py 60.00% 1 Missing and 1 partial ⚠️
src/hdmf/common/table.py 85.71% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #1415   +/-   ##
=======================================
  Coverage   93.08%   93.08%           
=======================================
  Files          41       41           
  Lines       10007    10014    +7     
  Branches     2060     2061    +1     
=======================================
+ Hits         9315     9322    +7     
  Misses        416      416           
  Partials      276      276           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

h-mayorquin and others added 11 commits March 3, 2026 02:20
zarr v3 scalar indexing returns 0-d ndarrays, which fail check_type(arg, int).
Unwrap before the type check so ElementIdentifiers validation works with zarr v3 arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… guards

- container.py: Data.__len__ uses _get_length for zarr v3 Arrays without __len__
- h5tools.py: use _get_length when sizing datasets during export
- objectmapper.py: use _get_length for compound dtype shape, unwrap 0-d rows

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
VectorData.get, DynamicTableRegion.get, DynamicTableRegion.shape,
DynamicTable.add_column, and EnumData.__add_term all call len() on
self.data which fails with zarr v3 Arrays that lack __len__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The function returns shape[0], not len(). The new name reflects the
actual semantics: the size of the first dimension, which is what the
array API standard exposes via .shape rather than __len__.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After discussion, _get_length is the clearest name: it maps to what
Python programmers understand as "how many items if I iterate this",
which is what every call site needs. The implementation detail of using
shape[0] vs len() belongs in the docstring, not the name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Base automatically changed from remove_duck_typing_for_array_detection to dev March 10, 2026 18:15
@rly
Copy link
Copy Markdown
Contributor

rly commented Mar 10, 2026

@h-mayorquin could you please resolve the conflicts in this PR? Thanks.

Comment thread src/hdmf/container.py

def __len__(self):
return len(self.__data)
return _get_length(self.__data)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure why those were not merged with #1414

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I only found those errors later and did some further additions here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe, we should have tests for those?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's take this opportunity to add some. Those tests were missing in the first place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added I also improved the docstring and unify some uses len that were not going through the new centralized function.

Let me know if you think there is something else to add

@rly
Copy link
Copy Markdown
Contributor

rly commented Mar 11, 2026

I haven't searched through where else len could be replaced, but these changes here look good to me. If you spot others, please feel free to open a new PR.

@rly
Copy link
Copy Markdown
Contributor

rly commented Mar 11, 2026

Thanks @h-mayorquin !

@rly rly merged commit d878593 into dev Mar 11, 2026
27 of 33 checks passed
@rly rly deleted the remove_duck_typing_for_type branch March 11, 2026 09:10
@h-mayorquin
Copy link
Copy Markdown
Contributor Author

I haven't searched through where else len could be replaced, but these changes here look good to me. If you spot others, please feel free to open a new PR.

Yes, I double checked. I will be on the look for more uses.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants