Populate per-asset dataStandard for BIDS, HED, NWB, and extensions #1811

yarikoptic wants to merge 8 commits into master from
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master    #1811      +/-   ##
==========================================
- Coverage   75.11%   74.76%   -0.36%
==========================================
  Files          84       84
  Lines       11925    12083     +158
==========================================
+ Hits         8958     9034      +76
- Misses       2967     3049      +82
```

Flags with carried forward coverage won't be shown.
- `BIDSDatasetDescriptionAsset.get_metadata()`: always sets the BIDS standard (with `BIDSVersion`), adds HED when `HEDVersion` is present in the JSON, warns on read failure
- `NWBAsset.get_metadata()`: sets the NWB standard
- `ZarrBIDSAsset.get_metadata()`: sets OME/NGFF for `.ome.zarr` assets
- Guard for older dandischema without `dataStandard` on `BareAsset`; `RuntimeError` if dandischema >= 0.12.2 lacks it
- Register the `ai_generated` pytest marker in `tox.ini`

Requires dandischema with per-asset `dataStandard` support (0.12.2+). Works silently with older dandischema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add `get_nwb_extensions()` to `pynwb_utils` that reads `h5file["specifications"]` to discover `ndx-*` namespaces and their versions (filtering out core/hdmf)
- `NWBAsset.get_metadata()` populates `StandardsType.extensions` with `ndx-*` extensions found in the NWB file
- `BIDSDatasetDescriptionAsset.get_metadata()` extracts HED library schemas from a list-valued `HEDVersion` (e.g. `["8.2.0", "sc:1.0.0"]`) into extensions
- Add tests for both NWB extension extraction and HED library schema parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
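The extension-discovery logic described in that commit can be sketched as follows. This is a hypothetical illustration using a plain dict to stand in for the HDF5 `specifications` group; the real code in `pynwb_utils` operates on an `h5py` file, and the function and constant names here are illustrative, not the actual implementation.

```python
# Stand-in for the core/hdmf namespaces that the real code filters out
# (assumed set; the actual filter list may differ).
_CORE_NAMESPACES = {"core", "hdmf-common", "hdmf-experimental"}


def get_nwb_extensions_sketch(specifications: dict) -> dict:
    """Return {namespace: latest_version} for non-core (ndx-*) namespaces.

    `specifications` mimics the HDF5 "specifications" group: each namespace
    maps to a dict of cached version subgroups.
    """
    extensions = {}
    for name, versions in specifications.items():
        if name in _CORE_NAMESPACES:
            continue  # skip the NWB core and hdmf schemas
        # Pick the latest cached version by numeric (not lexicographic)
        # comparison, so "0.10.0" correctly beats "0.2.0".
        latest = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        extensions[name] = latest
    return extensions


specs = {
    "core": {"2.9.0": {}},
    "hdmf-common": {"1.8.0": {}},
    "ndx-events": {"0.2.0": {}, "0.10.0": {}},
}
print(get_nwb_extensions_sketch(specs))  # {'ndx-events': '0.10.0'}
```

Numeric version comparison matters here: a plain string `max()` would rank `"0.9.0"` above `"0.10.0"`, which is why the test plan below mentions version sorting.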
…S_DATASTANDARD

Single bool in bases.py guards all dataStandard/version/extensions features that ship together in dandischema >= 0.12.2, replacing scattered "field in Model.model_fields" checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move `json`, `h5py`, `get_nwb_extensions`, and `_SCHEMA_BAREASSET_HAS_DATASTANDARD` imports to module top level in tests. Keep the `pynwb_utils` import deferred in bases.py (heavy transitive deps: h5py/pynwb/hdmf/numpy) per existing convention. Add import guidance to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`hed_standard` does not exist in dandischema < 0.12.2, so gate the import on `_SCHEMA_BAREASSET_HAS_DATASTANDARD` (all new symbols ship together). HED detection is skipped when unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
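The HED library-schema parsing mentioned in the earlier commit (splitting a list-valued `HEDVersion` into a base schema version and prefixed library schemas such as `"sc:1.0.0"`) could look roughly like this. The function name and return shape are hypothetical, not the actual dandi code:

```python
def split_hed_versions(hed_version):
    """Return (base_version, [library_schema_specs]) from a HEDVersion value.

    HEDVersion in dataset_description.json may be a single string or a list;
    library schemas carry a "prefix:" (e.g. "sc:1.0.0"), the base schema
    version does not.
    """
    if isinstance(hed_version, str):
        hed_version = [hed_version]
    base = None
    libraries = []
    for spec in hed_version:
        if ":" in spec:  # library schemas are namespace-prefixed
            libraries.append(spec)
        else:
            base = spec
    return base, libraries


print(split_hed_versions(["8.2.0", "sc:1.0.0"]))  # ('8.2.0', ['sc:1.0.0'])
```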
```python
BIDS_DATASET_ERRORS = ("BIDS.MANDATORY_FILE_MISSING_PLACEHOLDER",)

def _add_standard(
```
line 542 of bases appears to have similar logic
Move the `_SCHEMA_BAREASSET_HAS_DATASTANDARD` guard block after all imports to fix E402. Use getattr/setattr for `dataStandard` access and `type: ignore[call-arg]` for `StandardsType(version=...)` to satisfy mypy against released dandischema that lacks these fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
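The feature-detection-plus-getattr pattern that commit describes can be sketched as follows. `FakeBareAsset` and `add_data_standard` are illustrative stand-ins, not the real dandischema model or dandi helper; the point is probing `model_fields` once and then using `getattr`/`setattr` so the same code runs (and type-checks) against older releases that lack the field:

```python
class FakeBareAsset:
    # Stand-in for a pydantic model; real dandischema models expose a
    # model_fields mapping of declared fields.
    model_fields = {"path": None, "dataStandard": None}

    def __init__(self):
        self.path = ""
        self.dataStandard = None


# Probe once at import time: does the installed schema have the new field?
_SCHEMA_BAREASSET_HAS_DATASTANDARD = "dataStandard" in FakeBareAsset.model_fields


def add_data_standard(asset, standard):
    if not _SCHEMA_BAREASSET_HAS_DATASTANDARD:
        return  # silently no-op on older schema releases
    # getattr/setattr keep mypy happy when checking against a released
    # schema whose model has no dataStandard attribute.
    existing = getattr(asset, "dataStandard", None) or []
    setattr(asset, "dataStandard", existing + [standard])


a = FakeBareAsset()
add_data_standard(a, "BIDS")
print(a.dataStandard)  # ['BIDS']
```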
```python
from .bases import (
    _SCHEMA_BAREASSET_HAS_DATASTANDARD,
    GenericAsset,
    LocalFileAsset,
    NWBAsset,
)
```
Check notice — Code scanning / CodeQL: Cyclic import (Note)

Copilot Autofix (AI, 1 day ago)
In general, the clean way to break an import cycle is to avoid having two modules that directly depend on each other’s runtime definitions. For class hierarchies, this usually means either (1) moving shared abstractions into a third, lower-level module, or (2) converting some imports to type-only imports using typing.TYPE_CHECKING, so the runtime import no longer occurs.
Within dandi/files/bids.py, we can break the cycle without changing behavior by changing how we import from .bases. The key is to avoid importing GenericAsset (a base abstraction that likely needs to stay independent), while still ensuring that any references to it remain valid for static typing. Since Python 3.7+ with from __future__ import annotations supports postponed evaluation of annotations, we can safely import GenericAsset only under TYPE_CHECKING. At runtime, code in bids.py does not need the GenericAsset name unless it is used directly (e.g., for isinstance checks); if it is only used in type hints, moving it under TYPE_CHECKING removes the cycle.
Concretely, in dandi/files/bids.py we will:

- Keep importing `LocalFileAsset` and `NWBAsset` from `.bases` at runtime.
- Stop importing `GenericAsset` at runtime.
- Add a `TYPE_CHECKING`-guarded import for `GenericAsset` so type checkers still see it.

This involves:

- Adding `from typing import TYPE_CHECKING` to the imports.
- Modifying the `.bases` import block so that `GenericAsset` is no longer in the runtime import.
- Adding a small `if TYPE_CHECKING:` block that imports `GenericAsset` from `.bases` for typing only.

No functionality changes for executed code, but the runtime import graph changes so that bids.py no longer pulls in GenericAsset during module initialization, thus breaking or at least relaxing the cycle.
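A minimal, self-contained sketch of the `TYPE_CHECKING` pattern the autofix describes, with a stdlib class standing in for `GenericAsset` (names are illustrative only):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; never imported at runtime, so it
    # cannot contribute to an import cycle.
    from collections import OrderedDict


def first_key(d: "OrderedDict[str, int]") -> str:
    # The string annotation is never resolved at runtime, so OrderedDict
    # need not actually be importable when this function is called.
    return next(iter(d))


print(TYPE_CHECKING)  # False
```

At runtime `TYPE_CHECKING` is always `False`, so the guarded import never executes; static checkers like mypy treat it as `True` and still resolve the name.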
```diff
@@ -7,6 +7,7 @@
 from pathlib import Path
 from threading import Lock
 import weakref
+from typing import TYPE_CHECKING

 from dandischema.models import (
     BareAsset,
@@ -20,11 +21,13 @@
 from .bases import (
     _SCHEMA_BAREASSET_HAS_DATASTANDARD,
-    GenericAsset,
     LocalFileAsset,
     NWBAsset,
 )

+if TYPE_CHECKING:
+    from .bases import GenericAsset

 if _SCHEMA_BAREASSET_HAS_DATASTANDARD:
     from dandischema.models import hed_standard  # type: ignore[attr-defined]
 else:
```
@yarikoptic lgtm! Let me just check on some of my own data
Use the existing `get_nwb_version()` to populate the version field on the NWB `StandardsType` entry so per-asset metadata reports the NWB spec version (e.g. 2.9.0) alongside extensions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Added NWB spec version extraction in 279c297.

Tested against dandiset 001636 (346 NWB files, macaque electrophysiology): Before: After:
bendichter left a comment

ok, I made a minor correction. LGTM now!
Summary
- Populate the per-asset `dataStandard` field (new in dandischema) for all standards:
  - BIDS: from `dataset_description.json` with `BIDSVersion`
  - HED: from `HEDVersion` in `dataset_description.json`, including library schemas (e.g. `"sc:1.0.0"`) as extensions
  - NWB: extensions discovered from the `h5file["specifications"]` group
  - OME/NGFF: for `.ome.zarr` assets in BIDS datasets
- `_SCHEMA_BAREASSET_HAS_DATASTANDARD` in `bases.py` — single bool gates all new dandischema features (dataStandard, version, extensions), with `RuntimeError` if dandischema >= 0.12.2 unexpectedly lacks the field
- `get_nwb_extensions()` added to `pynwb_utils.py` — reads the HDF5 `specifications` group, filters core namespaces, returns `{name: latest_version}` for each extension

Depends on: dandi/dandischema#XXXX
Test plan
- `TestBIDSDatasetDescriptionDataStandard` — 5 tests covering BIDS always set, HED detected/absent, HED list form, HED library schemas as extensions
- `test_get_nwb_extensions` — verifies ndx-* extraction with version sorting, core namespace filtering
- `test_get_nwb_extensions_no_specs` — empty dict when no specifications group

PR in dandi-schema:
🤖 Generated with Claude Code