Populate per-asset dataStandard for BIDS, HED, NWB, and extensions #1811

Open: yarikoptic wants to merge 8 commits into master from hed
Conversation

@yarikoptic yarikoptic (Member) commented Feb 24, 2026

Summary

  • Populate per-asset dataStandard field (new in dandischema) for all standards:
    • BIDS: from dataset_description.json with BIDSVersion
    • HED: from HEDVersion in dataset_description.json, including library schemas (e.g. "sc:1.0.0") as extensions
    • NWB: from NWB file metadata, with ndx-* extensions extracted from h5file["specifications"] group
    • OME/NGFF: for .ome.zarr assets in BIDS datasets
  • Centralize backward-compat guard as _SCHEMA_BAREASSET_HAS_DATASTANDARD in bases.py — single bool gates all new dandischema features (dataStandard, version, extensions), with RuntimeError if dandischema >= 0.12.2 unexpectedly lacks the field
  • Add get_nwb_extensions() to pynwb_utils.py — reads HDF5 specifications group, filters core namespaces, returns {name: latest_version} for each extension
  • All new code is a no-op against released dandischema (< 0.12.2)

Depends on: dandi/dandischema#XXXX
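The HEDVersion handling summarized above can be sketched roughly as follows. This is an illustrative assumption, not the PR's actual code: the function name and the exact split between the base schema version and ":"-prefixed library schemas are guesses based on the description.

```python
def parse_hed_version(hed_version):
    """Split a dataset_description.json HEDVersion value (str or list)
    into (base_version, {library_name: version}).

    Entries containing ":" (e.g. "sc:1.0.0") are HED library schemas
    and are collected as extensions; plain entries are the base schema.
    """
    versions = [hed_version] if isinstance(hed_version, str) else list(hed_version)
    base = None
    extensions = {}
    for entry in versions:
        if ":" in entry:
            name, _, lib_version = entry.partition(":")
            extensions[name] = lib_version
        elif base is None:
            base = entry
    return base, extensions
```

For example, `parse_hed_version(["8.2.0", "sc:1.0.0"])` would yield the base version plus one library-schema extension.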

Test plan

  • TestBIDSDatasetDescriptionDataStandard — 5 tests covering BIDS always set, HED detected/absent, HED list form, HED library schemas as extensions
  • test_get_nwb_extensions — verifies ndx-* extraction with version sorting, core namespace filtering
  • test_get_nwb_extensions_no_specs — empty dict when no specifications group
  • 460 tests pass with dev dandischema, 426 pass with released dandischema 0.12.1 (new feature tests correctly skip)
  • TODO1: add a matrix run to test against dandi-schema#371, then either remove it or replace it with a proper dedicated run driven by an annotation in this description

PR in dandi-schema:

🤖 Generated with Claude Code

@codecov codecov bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 47.53086% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.76%. Comparing base (0fe497d) to head (279c297).

Files with missing lines Patch % Lines
dandi/files/bids.py 20.75% 42 Missing ⚠️
dandi/files/bases.py 4.54% 21 Missing ⚠️
dandi/tests/test_files.py 62.22% 17 Missing ⚠️
dandi/pynwb_utils.py 75.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1811      +/-   ##
==========================================
- Coverage   75.11%   74.76%   -0.36%     
==========================================
  Files          84       84              
  Lines       11925    12083     +158     
==========================================
+ Hits         8958     9034      +76     
- Misses       2967     3049      +82     
Flag Coverage Δ
unittests 74.76% <47.53%> (-0.36%) ⬇️

Flags with carried forward coverage won't be shown.

@yarikoptic yarikoptic added the minor Increment the minor version when merged label Mar 3, 2026
yarikoptic and others added 6 commits March 3, 2026 14:52
- BIDSDatasetDescriptionAsset.get_metadata(): always sets BIDS standard
  (with BIDSVersion), adds HED when HEDVersion present in JSON, warns
  on read failure
- NWBAsset.get_metadata(): sets NWB standard
- ZarrBIDSAsset.get_metadata(): sets OME/NGFF for .ome.zarr assets
- Guard for older dandischema without dataStandard on BareAsset;
  RuntimeError if dandischema >= 0.12.2 lacks it
- Register ai_generated pytest marker in tox.ini

Requires dandischema with per-asset dataStandard support (0.12.2+).
Works silently with older dandischema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add get_nwb_extensions() to pynwb_utils that reads h5file["specifications"]
  to discover ndx-* namespaces and their versions (filtering out core/hdmf)
- NWBAsset.get_metadata() populates StandardsType.extensions with ndx-*
  extensions found in the NWB file
- BIDSDatasetDescriptionAsset.get_metadata() extracts HED library schemas
  from list-valued HEDVersion (e.g. ["8.2.0", "sc:1.0.0"]) into extensions
- Add tests for both NWB extension extraction and HED library schema parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
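The namespace filtering and version selection described in this commit can be sketched in isolation. In the real get_nwb_extensions() the namespaces come from h5file["specifications"]; here they are modeled as a plain dict so the logic is runnable on its own, and the set of core namespaces is an assumption.

```python
# Assumed set of core namespaces to exclude; the real filter may differ.
CORE_NAMESPACES = {"core", "hdmf-common", "hdmf-experimental"}

def latest_extension_versions(specifications):
    """Given {namespace: [version, ...]}, return {name: latest_version}
    for non-core (e.g. ndx-*) namespaces."""
    def version_key(v):
        # Crude numeric sort on dotted components; real code might use
        # packaging.version for robustness.
        return tuple(int(p) for p in v.split(".") if p.isdigit())

    return {
        name: max(versions, key=version_key)
        for name, versions in specifications.items()
        if name not in CORE_NAMESPACES
    }
```

With `{"core": ["2.9.0"], "ndx-events": ["0.3.0", "0.4.0"]}` this keeps only `ndx-events` at its highest version.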
…S_DATASTANDARD

Single bool in bases.py guards all dataStandard/version/extensions
features that ship together in dandischema >= 0.12.2, replacing
scattered "field in Model.model_fields" checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move json, h5py, get_nwb_extensions, and _SCHEMA_BAREASSET_HAS_DATASTANDARD
imports to module top level in tests.  Keep pynwb_utils import deferred in
bases.py (heavy transitive deps: h5py/pynwb/hdmf/numpy) per existing
convention.  Add import guidance to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hed_standard does not exist in dandischema < 0.12.2, so gate the import
on _SCHEMA_BAREASSET_HAS_DATASTANDARD (all new symbols ship together).
HED detection is skipped when unavailable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
BIDS_DATASET_ERRORS = ("BIDS.MANDATORY_FILE_MISSING_PLACEHOLDER",)


def _add_standard(
Member commented:

line 542 of bases appears to have similar logic

Move _SCHEMA_BAREASSET_HAS_DATASTANDARD guard block after all imports
to fix E402.  Use getattr/setattr for dataStandard access and
type: ignore[call-arg] for StandardsType(version=...) to satisfy
mypy against released dandischema that lacks these fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
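The getattr/setattr pattern mentioned above can be illustrated with a stand-in metadata object. Accessing the field by name keeps mypy happy when the installed dandischema may not declare it; the helper name here is hypothetical.

```python
from types import SimpleNamespace

def add_data_standard(metadata, standards, has_field):
    """Set metadata.dataStandard only when the schema supports the field
    and nothing has populated it yet (hypothetical helper)."""
    if has_field and getattr(metadata, "dataStandard", None) is None:
        setattr(metadata, "dataStandard", standards)

asset = SimpleNamespace(dataStandard=None)
add_data_standard(asset, ["NWB"], has_field=True)
```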
Comment on lines +21 to +26
from .bases import (
_SCHEMA_BAREASSET_HAS_DATASTANDARD,
GenericAsset,
LocalFileAsset,
NWBAsset,
)

Check notice

Code scanning / CodeQL

Cyclic import Note: Import of module dandi.files.bases begins an import cycle.

Copilot Autofix (AI, 1 day ago)

In general, the clean way to break an import cycle is to avoid having two modules that directly depend on each other’s runtime definitions. For class hierarchies, this usually means either (1) moving shared abstractions into a third, lower-level module, or (2) converting some imports to type-only imports using typing.TYPE_CHECKING, so the runtime import no longer occurs.

Within dandi/files/bids.py, we can break the cycle without changing behavior by changing how we import from .bases. The key is to avoid importing GenericAsset (a base abstraction that likely needs to stay independent), while still ensuring that any references to it remain valid for static typing. Since Python 3.7+ with from __future__ import annotations supports postponed evaluation of annotations, we can safely import GenericAsset only under TYPE_CHECKING. At runtime, code in bids.py does not need the GenericAsset name unless it is used directly (e.g., for isinstance checks); if it is only used in type hints, moving it under TYPE_CHECKING removes the cycle.

Concretely, in dandi/files/bids.py we will:

  • Keep importing LocalFileAsset and NWBAsset from .bases at runtime.
  • Stop importing GenericAsset at runtime.
  • Add a TYPE_CHECKING guarded import for GenericAsset so type checkers still see it.

This involves:

  • Adding from typing import TYPE_CHECKING to the imports.
  • Modifying the .bases import block so that GenericAsset is no longer in the runtime import.
  • Adding a small if TYPE_CHECKING: block that imports GenericAsset from .bases for typing only.

No functionality changes for executed code, but the runtime import graph changes so that bids.py no longer pulls in GenericAsset during module initialization, thus breaking or at least relaxing the cycle.

Suggested changeset 1
dandi/files/bids.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/dandi/files/bids.py b/dandi/files/bids.py
--- a/dandi/files/bids.py
+++ b/dandi/files/bids.py
@@ -7,6 +7,7 @@
 from pathlib import Path
 from threading import Lock
 import weakref
+from typing import TYPE_CHECKING
 
 from dandischema.models import (
     BareAsset,
@@ -20,11 +21,13 @@
 
 from .bases import (
     _SCHEMA_BAREASSET_HAS_DATASTANDARD,
-    GenericAsset,
     LocalFileAsset,
     NWBAsset,
 )
 
+if TYPE_CHECKING:
+    from .bases import GenericAsset
+
 if _SCHEMA_BAREASSET_HAS_DATASTANDARD:
     from dandischema.models import hed_standard  # type: ignore[attr-defined]
 else:
EOF
@bendichter (Member) commented:

@yarikoptic lgtm! Let me just check on some of my own data

Use the existing get_nwb_version() to populate the version field on the
NWB StandardsType entry so per-asset metadata reports the NWB spec
version (e.g. 2.9.0) alongside extensions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bendichter (Member) commented:

Added NWB spec version extraction in 279c297.

get_nwb_version() already existed in pynwb_utils.py but wasn't being used when constructing the StandardsType. Now the version is populated from the HDF5 nwb_version attribute.

Tested against dandiset 001636 (346 NWB files, macaque electrophysiology):

Before:

Neurodata Without Borders (NWB) (version=None)
  ext: ndx-events v0.4.0
  ext: ndx-hed v0.2.0

After:

Neurodata Without Borders (NWB) (version=2.9.0)
  ext: ndx-events v0.4.0
  ext: ndx-hed v0.2.0
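The Before/After report above could be produced by a small formatter along these lines; the real StandardsType is a dandischema model, modeled here as a name/version/extensions triple for illustration.

```python
def format_standard(name, version, extensions):
    """Render one per-asset standard entry with its extensions."""
    lines = [f"{name} (version={version})"]
    for ext_name, ext_version in sorted(extensions.items()):
        lines.append(f"  ext: {ext_name} v{ext_version}")
    return "\n".join(lines)

print(format_standard(
    "Neurodata Without Borders (NWB)",
    "2.9.0",
    {"ndx-events": "0.4.0", "ndx-hed": "0.2.0"},
))
```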

@bendichter bendichter self-requested a review March 3, 2026 20:23
@bendichter bendichter left a comment

ok, I made a minor correction. LGTM now!


Labels

minor Increment the minor version when merged
