Skip to content

deprecate open_zarr#7496

Closed
jhamman wants to merge 5 commits into
pydata:mainfrom
jhamman:dep/open_zarr
Closed

deprecate open_zarr#7496
jhamman wants to merge 5 commits into
pydata:mainfrom
jhamman:dep/open_zarr

Conversation

@jhamman

@jhamman jhamman commented Jan 31, 2023

Copy link
Copy Markdown
Member

This PR deprecates open_zarr in favor of open_dataset(..., engine='zarr').

@github-actions github-actions Bot added io topic-backends topic-zarr Related to zarr storage library labels Jan 31, 2023
@jhamman jhamman requested a review from rabernat January 31, 2023 16:47
Comment thread xarray/backends/zarr.py Outdated
@Illviljan

Copy link
Copy Markdown
Contributor

It would be nice to have a good copy/paste example for open_dataset. For example I think chunks have different defaults compared to open_zarr.

@jhamman

jhamman commented Jan 31, 2023

Copy link
Copy Markdown
Member Author

@paigem recently added the following to the open_dataset docstring (#7438):

In order to reproduce the default behavior of xr.open_zarr(...) use 
`xr.open_dataset(..., engine='zarr', chunks={})`.

@jhamman jhamman marked this pull request as ready for review January 31, 2023 21:17
Comment thread xarray/backends/zarr.py Outdated
Comment thread xarray/backends/zarr.py Outdated
jhamman and others added 2 commits January 31, 2023 14:00
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@shoyer

shoyer commented Feb 1, 2023

Copy link
Copy Markdown
Member

I like open_zarr(...) because it's less typing than open_dataset(..., engine='zarr'). The automatic backend detection logic doesn't currently work for Zarr, and in every case it adds overhead, which could be significant in the case of remote storage backends like Zarr.

So personally I would rather go the other direction and add open_netcdf().

The inconsistency in the chunks argument is non-ideal, but that could be handled by a separate deprecation process.

@rabernat

rabernat commented Feb 1, 2023

Copy link
Copy Markdown
Contributor

It is true that Xarray is now becoming very different from pandas in how it opens data.

@weiji14

weiji14 commented Feb 19, 2023

Copy link
Copy Markdown
Contributor

The inconsistency in the chunks argument is non-ideal, but that could be handled by a separate deprecation process.

There was some discussion on whether xr.open_dataset should default to chunks=None, or switch to open_zarr's chunks="auto" at #4187 (comment) (the last attempt at deprecating open_zarr 🙂). Just bringing it up for debate.

Also, quite a few people were in favour of keeping open_zarr back in 2020 (see e.g. #4187 (comment)), but maybe times have changed, and a consistent API is more desirable than convenience?

@krokosik

Copy link
Copy Markdown
Contributor

As a frequent Zarr user I would vote in favor of deprecating open_zarr as it is misleading. It always opens the store as a Dataset, even when it is a DataArray saved using to_zarr (new feature, I know). Looking at the issues, there are inconsistencies between open_zarr and open_dataset, so deprecating seems favorable as it removes the need for extra maintenance to keep both methods behaving in a similar way.

@max-sixty

max-sixty commented Sep 21, 2023

Copy link
Copy Markdown
Collaborator

Personal view — I find .open_zarr convenient! But I would not fight for it to stay...

It always opens the store as a Dataset, even when it is a DataArray saved using to_zarr

I agree — though we could fix that...

@krokosik

krokosik commented Sep 21, 2023

Copy link
Copy Markdown
Contributor

Yeah, we could fix it, but my point is that having two for opening zarr files raises more issues and requires extra work, while the only benefit I can see is a bit shorter syntax. Some other issue: #8095.

@max-sixty

max-sixty commented Sep 22, 2023

Copy link
Copy Markdown
Collaborator

having two for opening zarr files raises more issues and requires extra work, while the only benefit I can see is a bit shorter syntax.

Yes, definitely agree...

Not sure how to turn that into a decision — maybe "soft-deprecate" .open_zarr? i.e. de-emphasize, remove from main docs — but don't issue warnings — and then reassess whether .open_data{set;array} has taken over in the future.

That worked well for .drop ("wait we have a .drop method?", ideally you would ask).

@krokosik

krokosik commented Sep 22, 2023

Copy link
Copy Markdown
Contributor

I don't have much experience on handling such things, as I'm fairly new to collaborating and maintaining an open source project. Sounds good to me, altough I don't see why we wouldn't want to add the deprecation warning?

@max-sixty

Copy link
Copy Markdown
Collaborator

although I don't see why we wouldn't want to add the deprecation warning?

The logic is that if we're not confident / don't have consensus on the best final state, we can take a small step without committing to anything, and then make another assessment later with more experience.

By de-emphasizing the function from the docs, we reduce the number of new uses, without causing noisy warnings to show up for folks already it. Then we can assess in the future if we properly depreciate & eventually remove it. Or if the feedback is that the function is useful, we can revert.

Otherwise I worry we're not going to get consensus, and so not going to move forward with making any changes.

(But this is just my idea, I'm one of many, we should do whatever the aggregate view is!)

@krokosik

Copy link
Copy Markdown
Contributor

I see, thank you for explaining. It does sound better, considering that open_zarr is probably used more than the other methods.

@max-sixty

Copy link
Copy Markdown
Collaborator

Adding a "needs discussion" label — a few things are meandering because of this — #8095, #2660 — and so picking a route would let us focus our energy on a path forward for these

@jhamman

jhamman commented Mar 20, 2025

Copy link
Copy Markdown
Member Author

This never made it in. Closing as I think the conclusion here is that folks like open_zarr more than we thought.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecate open_zarr in favor of open_dataset(..., engine='zarr')

7 participants