Change default loadable_variables (and indexes) to match xarray's behaviour #477
Merged
TomNicholas merged 63 commits on Mar 24, 2025
for more information, see https://pre-commit.ci
…icholas/VirtualiZarr into refactor_loadable_variables
Member
Author
|
I just removed |
Member
Author
|
FYI @maxrjones @sharkinsspatial this PR has got to the point where I think the only failing tests are those which use a kerchunk-based reader, as I haven't ported the kerchunk translation code yet. So you could maybe build off this branch already...
TomNicholas
commented
Mar 21, 2025
  from virtualizarr import open_virtual_dataset

- with open_virtual_dataset(netcdf4_file, indexes={}) as ds:
+ with open_virtual_dataset(netcdf4_file, loadable_variables=[]) as ds:
Member
Author
Required otherwise we get inlined variables in the kerchunk file which we don't know how to read (#489)
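A minimal sketch of why the empty list matters here, assuming the default described in this PR (1D dimension coordinates are loaded unless the caller says otherwise). The helper `split_variables` and the toy `var_dims` mapping are hypothetical and are not part of VirtualiZarr; they only model which variables end up loaded versus virtual.

```python
from typing import Iterable, Mapping, Optional


def split_variables(
    var_dims: Mapping[str, tuple[str, ...]],
    loadable_variables: Optional[Iterable[str]] = None,
) -> tuple[set[str], set[str]]:
    """Return (loaded, virtual) variable names. Hypothetical helper."""
    if loadable_variables is None:
        # Assumed default: load 1D dimension coordinates, i.e. variables
        # whose only dimension has the same name as the variable itself.
        loaded = {name for name, dims in var_dims.items() if dims == (name,)}
    else:
        loaded = set(loadable_variables)
        unknown = loaded - set(var_dims)
        if unknown:
            raise ValueError(f"unknown variables: {sorted(unknown)}")
    virtual = set(var_dims) - loaded
    return loaded, virtual


# Toy description of a file: variable name -> dimension names.
var_dims = {"time": ("time",), "lat": ("lat",), "air": ("time", "lat")}

# Default: the dimension coordinates are loaded, the data variable stays virtual.
default_loaded, default_virtual = split_variables(var_dims)

# Explicit empty list: nothing is loaded, so nothing could end up inlined.
none_loaded, all_virtual = split_variables(var_dims, loadable_variables=[])
```

Passing `loadable_variables=[]` is therefore the explicit opt-out in this model: every variable stays virtual, which is what the test above relies on.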
TomNicholas
commented
Mar 21, 2025
Member
Author
Oh I also moved this to a new api.py file.
maxrjones
approved these changes
Mar 24, 2025
maxrjones
left a comment
Member
Looks good, thank you @TomNicholas! Just had a few nits
TomNicholas
added a commit
that referenced
this pull request
Mar 25, 2025
* need latest version of xarray to import internals correctly
* Fix metadata equality for nan fill value (#502)
* add check that works for fill_values too
* note about removing once merged upstream
* type hint
* regression test
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove accidental changes to pyproject.toml
* Update pyproject.toml
* ignore mypy
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Setup intersphinx mapping for docs (#503)
* Setup intersphinx mapping for docs
---------
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
* Change default loadable_variables (and indexes) to match xarray's behaviour (#477)
* draft refactor
* sketch of simplified handling of loadable_variables
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* get at least some tests working
* separate VirtualBackend api definition from common utilities
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove indexes={} everywhere in tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* stop passing through loadable_variables to where it isn't used
* implement logic to load 1D dimension coords by default
* remove more instances of indexes={}
* remove more indexes={}
* refactor logic for choosing loadable_variables
* fix more tets
* xfail Aimee's test that I don't understand
* xfail test that explicitly specifies no indexes
* made a bunch more stuff pass
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix netcdf3 reader
* fix bad import in FITS reader
* fix import in tiff reader
* fix import in icechunk test
* release note
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update docstring
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix fits reader
* xfail on empty dict for indexes
* linting
* actually test new expected behaviour
* fix logic for setting loadable_variables
* update docs page to reflect new behaviour
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix expected behaviour in another tests
* additional assert
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* use encode_dataset_coordinates in kerchunk writer
* Encode zarr vars
* fix some mypy errors
* move drop_variables implmentation to the end of every reader
* override loadable_variables and raise warning
* fix failing test by not creating loadable variables that would get inlined by default
* improve error message
* remove some more occurrences of indexes={}
* skip slow test
* slay mypy errors
* docs typos
* should fix dmrpp test
* Delete commented-out code
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove unecessary test skip
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
* Update pyproject.toml deps (#504)
* re-add icechunk to upstream tests
* add pytest-asyncio to test envs
* passing serial open_virtual_mfdataset test
* passes with lithops but only for the HDF backend
* add test for dask
* refactored serial and lithops codepaths to use an executor pattern
* xfail lithops
* consolidate tests by parametrizing over parallel kwarg
* re-enable lithops test
* remove unneeded get_executor function
* add test for using dask distributed to parallelize
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
TomNicholas
added a commit
that referenced
this pull request
Mar 29, 2025
* copy implementation from xarray
* sketch idea for lithops parallelization
* standardize naming of variables
* add to public API
* fix errors caused by trying to import xarray types
* start writing tests
* passing test for combining in serial
* requires_kerchunk
* test for lithops with default LocalHost executor
* notes on confusing AssertionError
* ensure lithops is installed
* remove uneeded fixture
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Additions to `open_virtual_mfdataset` (#508)
* need latest version of xarray to import internals correctly
* Fix metadata equality for nan fill value (#502)
* add check that works for fill_values too
* note about removing once merged upstream
* type hint
* regression test
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove accidental changes to pyproject.toml
* Update pyproject.toml
* ignore mypy
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Setup intersphinx mapping for docs (#503)
* Setup intersphinx mapping for docs
---------
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
* Change default loadable_variables (and indexes) to match xarray's behaviour (#477)
* draft refactor
* sketch of simplified handling of loadable_variables
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* get at least some tests working
* separate VirtualBackend api definition from common utilities
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove indexes={} everywhere in tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* stop passing through loadable_variables to where it isn't used
* implement logic to load 1D dimension coords by default
* remove more instances of indexes={}
* remove more indexes={}
* refactor logic for choosing loadable_variables
* fix more tets
* xfail Aimee's test that I don't understand
* xfail test that explicitly specifies no indexes
* made a bunch more stuff pass
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix netcdf3 reader
* fix bad import in FITS reader
* fix import in tiff reader
* fix import in icechunk test
* release note
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update docstring
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix fits reader
* xfail on empty dict for indexes
* linting
* actually test new expected behaviour
* fix logic for setting loadable_variables
* update docs page to reflect new behaviour
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix expected behaviour in another tests
* additional assert
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* use encode_dataset_coordinates in kerchunk writer
* Encode zarr vars
* fix some mypy errors
* move drop_variables implmentation to the end of every reader
* override loadable_variables and raise warning
* fix failing test by not creating loadable variables that would get inlined by default
* improve error message
* remove some more occurrences of indexes={}
* skip slow test
* slay mypy errors
* docs typos
* should fix dmrpp test
* Delete commented-out code
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove unecessary test skip
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
* Update pyproject.toml deps (#504)
* re-add icechunk to upstream tests
* add pytest-asyncio to test envs
* passing serial open_virtual_mfdataset test
* passes with lithops but only for the HDF backend
* add test for dask
* refactored serial and lithops codepaths to use an executor pattern
* xfail lithops
* consolidate tests by parametrizing over parallel kwarg
* re-enable lithops test
* remove unneeded get_executor function
* add test for using dask distributed to parallelize
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
* Additions to `open_virtual_mfdataset` (#509)
* need latest version of xarray to import internals correctly
* passing serial open_virtual_mfdataset test
* passes with lithops but only for the HDF backend
* add test for dask
* refactored serial and lithops codepaths to use an executor pattern
* xfail lithops
* consolidate tests by parametrizing over parallel kwarg
* re-enable lithops test
* remove unneeded get_executor function
* add test for using dask distributed to parallelize
* Add ManifestStore for loading data from ManifestArrays (#490)
* Draft ManifestStore implementation
---------
Co-authored-by: Tom Nicholas <tom@earthmover.io>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* make it work for dask delayed
* correct docstring
---------
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* More open_virtual_mfdataset (#510)
* need latest version of xarray to import internals correctly
* passing serial open_virtual_mfdataset test
* passes with lithops but only for the HDF backend
* add test for dask
* refactored serial and lithops codepaths to use an executor pattern
* xfail lithops
* consolidate tests by parametrizing over parallel kwarg
* re-enable lithops test
* remove unneeded get_executor function
* add test for using dask distributed to parallelize
* make it work for dask delayed
* correct docstring
* added compliant executor for lithops
* add links to lithops issues
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Final fixes for open_virtual_mfdataset (#517)
* need latest version of xarray to import internals correctly
* passing serial open_virtual_mfdataset test
* passes with lithops but only for the HDF backend
* add test for dask
* refactored serial and lithops codepaths to use an executor pattern
* xfail lithops
* consolidate tests by parametrizing over parallel kwarg
* re-enable lithops test
* remove unneeded get_executor function
* add test for using dask distributed to parallelize
* make it work for dask delayed
* correct docstring
* added compliant executor for lithops
* add links to lithops issues
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* specify dask and lithops executors with a string again
* fix easy typing stuff
* fix typing errors by aligning executor signatures
* remove open_virtual_mfdataset from public API for now
* release note
* refactor construction of expected result
* implement preprocess arg, and dodge lithops bug
* update comment
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Apply suggestions from code reviewRemRemove new deps
* remove rogue print statement
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>
Co-authored-by: Kyle Barron <kylebarron2@gmail.com>
This is a second attempt at addressing #335, being more brutal about removing options that aren't used. It is also intended to make implementing #473 easier.
The idea is that no-one really cares about all the complexity of distinguishing between 1D coordinate variables with and without indexes. Instead we should just default to the same index-creation behaviour as xarray uses, and the easiest way to do that is just to use xr.open_dataset and drop variables the user didn't actually want to load.
This will be inefficient right now (in the same way that the current implementation is inefficient) because we fully scan over the whole file twice. But this sets up for #473, which will avoid scanning over the file more than once.
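A toy sketch of the two-pass approach described above, to make the "scan the file twice" cost concrete. Everything here is hypothetical: `FAKE_FILE` stands in for a real file, `open_and_drop` stands in for calling xr.open_dataset and dropping unwanted variables, and none of these names come from VirtualiZarr or xarray.

```python
from typing import Any

# Toy stand-in for a file: name -> (dims, values). Not a real file format.
FAKE_FILE: dict[str, tuple[tuple[str, ...], list[Any]]] = {
    "time": (("time",), [0, 1, 2]),
    "air": (("time",), [280.0, 281.5, 279.9]),
}

scans = 0  # counts how many times the "file" is traversed


def scan_virtual(file):
    """Pass 1: build a reference for every variable without reading values."""
    global scans
    scans += 1
    return {name: ("virtual", dims) for name, (dims, _) in file.items()}


def open_and_drop(file, keep):
    """Pass 2: open every variable (stand-in for xr.open_dataset), then
    drop the variables the user did not ask to load."""
    global scans
    scans += 1
    opened = {name: ("loaded", dims, values) for name, (dims, values) in file.items()}
    return {name: var for name, var in opened.items() if name in keep}


def open_virtual(file, loadable_variables):
    """Combine both passes; loaded variables replace their virtual twins."""
    virtual = scan_virtual(file)
    loaded = open_and_drop(file, set(loadable_variables))
    return {**virtual, **loaded}


ds = open_virtual(FAKE_FILE, loadable_variables=["time"])
```

In this model the file is traversed once per pass, which is the inefficiency the description accepts for now; a single-pass design would build the virtual references and the loaded variables in the same traversal.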
docs/releases.rst
New functions/methods are listed in api.rst