Bug: delete_dir in stateful test uses raw startswith for bookkeeping, causing flaky KeyError in delete_group_using_del #3977

@d-v-b

This issue was written by Claude. It was discovered while working on #3961.

The hypothesis state machine in src/zarr/testing/stateful.py tracks created nodes in two sets, self.all_arrays and self.all_groups. When delete_dir(path) runs, it prunes those sets using a raw string-prefix match:

# src/zarr/testing/stateful.py:307-312
matches = set()
for node in self.all_groups | self.all_arrays:
    if node.startswith(path):
        matches.add(node)
self.all_groups = self.all_groups - matches
self.all_arrays = self.all_arrays - matches

node.startswith(path) matches any node whose path string begins with path, not just nodes that are descendants of the directory path. So delete_dir('6/f') matches a sibling node at 6/faNT7p7jvJsO3_C._HYi and incorrectly removes it from all_arrays.
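The over-match can be seen in isolation with a few plain strings. This is a minimal sketch, not code from the test suite: the node set is made up for illustration (only 6/faNT7p7jvJsO3_C._HYi and 6/j3pnC come from the failing run), and true_descendants is written out by hand.

```python
# Illustrative node set; "6/f" and "6/f/child" are hypothetical entries
# added so there is something that delete_dir("6/f") should legitimately match.
nodes = {"6/f", "6/f/child", "6/faNT7p7jvJsO3_C._HYi", "6/j3pnC"}
path = "6/f"

# Same predicate as the cleanup loop in delete_dir: raw string prefix.
raw_matches = {node for node in nodes if node.startswith(path)}

# What a directory delete of "6/f" should actually cover.
true_descendants = {"6/f", "6/f/child"}

# Anything left over was pruned from the bookkeeping by mistake.
wrongly_matched = raw_matches - true_descendants
print(sorted(wrongly_matched))  # ['6/faNT7p7jvJsO3_C._HYi']
```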

The real store-level delete_dir('6/f') only removes objects under 6/f/, so 6/faNT... survives in the store. The bookkeeping and the model now disagree. When delete_group_using_del later walks members(...) of an ancestor group and tries self.all_arrays.remove(obj.path), the entry has already been pruned by the broken delete_dir, and the call raises KeyError.

Reproduction

Slow Hypothesis CI run https://github.com/zarr-developers/zarr-python/actions/runs/25939320276 found this in two distinct falsifying examples in the same job:

File "src/zarr/testing/stateful.py", line 372, in delete_group_using_del
    self.all_arrays.remove(obj.path)
KeyError: '6/j3pnC'

File "src/zarr/testing/stateful.py", line 372, in delete_group_using_del
    self.all_arrays.remove(obj.path)
KeyError: '6/faNT7p7jvJsO3_C._HYi'

The shrunk trace shows the pattern clearly: an array is created at 6/faNT7p7jvJsO3_C._HYi, delete_dir('6/f') is invoked, and the next delete_group_using_del targeting '6' blows up because the bookkeeping for 6/faNT... is gone but the store still has it.
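The same sequence can be replayed without zarr or Hypothesis. The sketch below is an assumption-laden stand-in: store_delete_dir and buggy_prune are hypothetical helpers that mimic, respectively, the store's "only things under path/" behaviour and the bookkeeping's raw-prefix pruning described above. They are not the real implementations.

```python
def store_delete_dir(keys: set[str], path: str) -> set[str]:
    """Hypothetical stand-in for the store: drop only keys under `path/`."""
    prefix = path.rstrip("/") + "/"
    return {key for key in keys if not key.startswith(prefix)}


def buggy_prune(nodes: set[str], path: str) -> set[str]:
    """Mimics the current bookkeeping cleanup: raw string-prefix match."""
    return {node for node in nodes if not node.startswith(path)}


array_path = "6/faNT7p7jvJsO3_C._HYi"
store_arrays = {array_path}  # what the store holds
all_arrays = {array_path}    # what the state machine believes

# Step 1: delete_dir("6/f") runs against both the store and the bookkeeping.
store_arrays = store_delete_dir(store_arrays, "6/f")
all_arrays = buggy_prune(all_arrays, "6/f")

# Step 2: deleting group "6" walks the store's members and removes each
# one from the bookkeeping, as delete_group_using_del does.
error = None
try:
    for member in sorted(store_arrays):
        all_arrays.remove(member)
except KeyError as exc:
    error = exc
    print("KeyError:", exc)
```

After step 1 the store still contains the array while all_arrays is empty, which is the drift that the later remove call trips over.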

The failure is intermittent in CI because .github/workflows/hypothesis.yaml does not pin a Hypothesis seed. Most runs pass; the example only surfaces when node_names generates a name that collides by raw string prefix with a sibling's name and the action ordering exposes the bookkeeping drift.

Root cause

delete_dir strips entries by string prefix instead of by path-segment prefix. A node should match only if it is equal to path or starts with path followed by the / segment separator.

Suggested fix

Replace the body of the delete_dir cleanup loop with a segment-aware check:

matches = {
    node for node in self.all_groups | self.all_arrays
    if node == path or node.startswith(path + "/")
}
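As a quick sanity check, the predicate from the suggested fix can be pulled into a standalone function and run against the names from the failing examples. The function names is_at_or_under and is_at_or_under_pathlib are hypothetical, chosen for this sketch only. The pathlib variant is an alternative segment-aware check (PurePosixPath.is_relative_to, Python 3.9+) shown for comparison, not something the issue proposes.

```python
from pathlib import PurePosixPath


def is_at_or_under(node: str, path: str) -> bool:
    """Segment-aware match: `node` is `path` itself or lives under `path/`."""
    return node == path or node.startswith(path + "/")


def is_at_or_under_pathlib(node: str, path: str) -> bool:
    """Alternative check that compares whole path segments via pathlib."""
    return PurePosixPath(node).is_relative_to(PurePosixPath(path))


# (node, path, expected) triples; "6/f/child" is an illustrative name.
cases = [
    ("6/f", "6/f", True),                      # the directory itself
    ("6/f/child", "6/f", True),                # a real descendant
    ("6/faNT7p7jvJsO3_C._HYi", "6/f", False),  # sibling sharing a string prefix
    ("6/j3pnC", "6/f", False),                 # unrelated sibling
]

for node, path, expected in cases:
    assert is_at_or_under(node, path) is expected
    assert is_at_or_under_pathlib(node, path) is expected
```

The string-based form keeps the change local to the existing loop and avoids constructing path objects for every node, which is likely why it is the simpler choice here.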

Origin

Introduced in #3130 (commit c972f7f) when the additional stateful actions were ported from icechunk. Unrelated to the current zarr-metadata refactor; surfaced there only because Hypothesis randomization happened to find it.
