#1011 adds basic metadata extraction for BIDS datasets, but what it introduces is more of a "hack" than proper support for BIDS datasets. It is ad hoc in part due to the clear separation of "asset types" in https://github.com/dandi/dandi-cli/blob/HEAD/dandi/files.py e.g. `NWBAsset` (with custom metadata extraction and validation) vs `VideoAsset` (nothing special ATM) vs `GenericAsset` (really nothing special ;) ). With the introduction of support for BIDS datasets it gets tricky:
- we need to upload pretty much every file (not just .nwb) if the directory is found to be a BIDS dataset
- we decide that it is a BIDS dataset if there is a `dataset_description.json` with `BIDSVersion` in it
- we might have "super-BIDS datasets" like https://dandiarchive.org/dandiset/000026/draft/files?location= where we have the following hierarchy within a dandiset:
  - derivatives/<some subdatasets, some of which are BIDS>/
  - rawdata/ - a BIDS dataset

  so a dandiset can contain multiple BIDS (sub)datasets
- There are multiple "files" from which metadata could be loaded. Below I outline 3 possible sources, but most likely we would reduce them to two: 1 - file format specific (nwb and its .overwrite.json) + 2 - BIDS specific (using a BIDS library), with BIDS overriding what the prior one provided. Here are the details:
  - metadata-precedence-1: metadata will/can come from the filename in addition to being extracted from the data file. .nwb files are legit within BIDS datasets, so `NWBAsset` by itself does not describe the entire case. For an `NWBAsset` belonging to a BIDS dataset we would want filename-based metadata to override what is in the file.
  - metadata-precedence-2: NWB folks are working on introducing overlays support (WiP, not yet finalized). So for `sub-1_slice-1.nwb` it would likely come from `sub-1_slice-1.overwrite.json` (if present).
  - metadata-precedence-3: metadata can come from a BIDS sidecar file, e.g. for `sub-1_slice-1.nwb` it could come from `sub-1_slice-1.json`
  - note: it will be up to the `validator` to complain whenever there is incongruence between different sources of metadata
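
To make the detection rule and the "multiple BIDS (sub)datasets" case concrete, here is a rough sketch using only the standard library. The function names (`is_bids_root`, `find_bids_roots`, `bids_root_for`) are made up for illustration and are not existing dandi-cli API; it also assumes that a file belongs to its closest enclosing BIDS dataset, which is my guess at the desired behavior.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def is_bids_root(dirpath: Path) -> bool:
    """True if dirpath has a dataset_description.json containing BIDSVersion."""
    desc = dirpath / "dataset_description.json"
    if not desc.is_file():
        return False
    try:
        data = json.loads(desc.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable or malformed JSON: do not treat as a BIDS dataset.
        return False
    return isinstance(data, dict) and "BIDSVersion" in data


def find_bids_roots(dandiset_root: Path) -> Iterator[Path]:
    """Yield every directory at or below dandiset_root that is a BIDS dataset."""
    for desc in sorted(dandiset_root.rglob("dataset_description.json")):
        if is_bids_root(desc.parent):
            yield desc.parent


def bids_root_for(path: Path, dandiset_root: Path) -> Optional[Path]:
    """Return the closest enclosing BIDS dataset of path (assumption), or None."""
    for parent in path.parents:
        if is_bids_root(parent):
            return parent
        if parent == dandiset_root:
            break  # do not look above the dandiset itself
    return None
```

With something like `bids_root_for`, any asset (including an `.nwb` file) could be asked whether it lives inside a BIDS dataset, independently of its asset type.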
@jwodder -- how should `files.py` and anything else needed be refactored so that we support such multiple sources of metadata: file format based + BIDS?
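
To illustrate what "multiple sources of metadata" could look like in code, here is a minimal sketch of layering sources, where later ones override earlier ones and disagreements are recorded so a validator could complain about them. Everything here is hypothetical: the function names, the `<stem>.overwrite.json` naming (the NWB overlay format is not finalized), the relative order of sidecar vs filename, and the use of raw BIDS entity keys such as `sub` without mapping them to a common schema. The sidecar lookup only checks the file with the same stem and ignores BIDS inheritance; a real implementation would rather use a BIDS library.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence, Tuple

Metadata = Dict[str, Any]
Source = Callable[[Path], Metadata]


def _load_json(path: Path) -> Metadata:
    """Load a JSON object from path; return {} if missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def from_overwrite_json(path: Path) -> Metadata:
    """metadata-precedence-2: <stem>.overwrite.json (naming is an assumption)."""
    return _load_json(path.with_name(path.stem + ".overwrite.json"))


def from_bids_sidecar(path: Path) -> Metadata:
    """metadata-precedence-3: <stem>.json next to the data file."""
    return _load_json(path.with_suffix(".json"))


def from_bids_filename(path: Path) -> Metadata:
    """metadata-precedence-1: key-value entities such as sub-1_slice-1."""
    entities: Metadata = {}
    for part in path.stem.split("_"):
        key, sep, value = part.partition("-")
        if sep and key and value:
            entities[key] = value
    return entities


def merge_metadata(
    path: Path, sources: Sequence[Source]
) -> Tuple[Metadata, Dict[str, List[Any]]]:
    """Apply sources in order; later ones win. Also return conflicting values."""
    merged: Metadata = {}
    conflicts: Dict[str, List[Any]] = {}
    for source in sources:
        for key, value in source(path).items():
            if key in merged and merged[key] != value:
                # Keep the full history of differing values for the validator.
                conflicts.setdefault(key, [merged[key]]).append(value)
            merged[key] = value
    return merged, conflicts
```

The file-format extractor (e.g. whatever `NWBAsset` does today) would be passed as the first source, followed by `from_overwrite_json`, `from_bids_sidecar` and `from_bids_filename`, with the BIDS ones added only for files inside a BIDS dataset.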