
Reading data that was written with deprecated bytes codec #3513

Description

@rabernat

Prior to zarr-python 3.1, it was possible to write an array whose metadata looked like this:

```python
doc = {
    'shape': [],
    'data_type': 'bytes',
    'chunk_grid': {'name': 'regular', 'configuration': {'chunk_shape': []}},
    'chunk_key_encoding': {'name': 'default',
                           'configuration': {'separator': '/'}},
    'fill_value': [],
    'codecs': [{'name': 'vlen-bytes', 'configuration': {}},
               {'name': 'zstd', 'configuration': {'level': 0, 'checksum': False}}],
    'attributes': {},
    'zarr_format': 3,
    'node_type': 'array',
    'storage_transformers': []
}
```

Attempting to load this metadata with a newer zarr-python raises an error:

```python
import zarr
zarr.core.metadata.ArrayV3Metadata.from_dict(doc)
```

```
File ~/mambaforge/envs/earthmover-demos/lib/python3.12/site-packages/zarr/core/dtype/registry.py:208, in DataTypeRegistry.match_json(self, data, zarr_format)
    206     except DataTypeValidationError:
    207         pass
--> 208 raise ValueError(f"No Zarr data type found that matches {data!r}")

ValueError: No Zarr data type found that matches 'bytes'
```

The following tweaks make it loadable:

```python
doc["data_type"] = "variable_length_bytes"
doc["fill_value"] = ""
```
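The two tweaks can be wrapped in a small function that patches a copy of the metadata dict instead of mutating it in place. `patch_legacy_bytes_metadata` is a hypothetical helper, not a zarr-python API. It only handles the empty `[]` fill value seen in this issue and raises for any other legacy fill value rather than guessing how it should be re-encoded.

```python
import copy
from typing import Any


def patch_legacy_bytes_metadata(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a Zarr v3 array metadata dict with the workaround applied.

    Hypothetical helper: renames the deprecated 'bytes' data type and
    rewrites the empty-list fill value to "". Other fill values are rejected.
    """
    patched = copy.deepcopy(doc)  # leave the caller's dict untouched
    if patched.get("data_type") != "bytes":
        return patched  # nothing to do for other data types
    patched["data_type"] = "variable_length_bytes"
    if patched.get("fill_value") == []:
        patched["fill_value"] = ""
    else:
        # Only the empty case is documented in this issue; don't guess the rest.
        raise ValueError(f"unhandled legacy fill_value: {patched.get('fill_value')!r}")
    return patched
```

The returned dict can then be passed to `zarr.core.metadata.ArrayV3Metadata.from_dict` in place of the original `doc`.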

It would be nice if we:

  1. had an alias mapping the deprecated `bytes` data type to `variable_length_bytes`, and
  2. could handle `fill_value = []` for such arrays.
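The two requests could look roughly like the toy sketch below: an alias table consulted before the data type lookup, plus a special case for the legacy `[]` fill value. Everything here (`KNOWN_DATA_TYPES`, `DEPRECATED_ALIASES`, `resolve_legacy`) is hypothetical and does not reflect how zarr-python's `DataTypeRegistry` is actually implemented; emitting a `DeprecationWarning` on alias use is an assumed design choice.

```python
import warnings
from typing import Any

# Hypothetical toy registry for illustration only; not zarr-python's DataTypeRegistry.
KNOWN_DATA_TYPES = {"bool", "int32", "float64", "variable_length_bytes"}

# Request 1: map deprecated data type names onto their current equivalents.
DEPRECATED_ALIASES = {"bytes": "variable_length_bytes"}


def resolve_legacy(data_type: str, fill_value: Any) -> tuple[str, Any]:
    """Map a (data_type, fill_value) pair from older metadata to its current form."""
    canonical = DEPRECATED_ALIASES.get(data_type, data_type)
    if canonical not in KNOWN_DATA_TYPES:
        raise ValueError(f"No Zarr data type found that matches {data_type!r}")
    if canonical != data_type:
        warnings.warn(
            f"data type {data_type!r} is deprecated; reading it as {canonical!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        # Request 2: accept the legacy empty-list fill value, rewriting it
        # to the "" used in the manual workaround.
        if fill_value == []:
            fill_value = ""
    return canonical, fill_value
```

Warning instead of failing keeps old data readable while still nudging writers toward the current name.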

Otherwise, data written with older versions of zarr-python cannot be read by newer ones.

Labels: bug (Potential issues with the zarr-python library), spec compliance (Related to the library's compliance with the Zarr specifications)