
Commit 132d417 ("precommit")
Parent: 7243f36

File tree: 12 files changed (+116 −73 lines)

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ repos:
     rev: 1.5.0
     hooks:
       - id: interrogate
-        exclude: ^(setup.py|tests)
+        exclude: ^(setup.py|model_catalogs/tests)
         args: [--config=pyproject.toml]
 
   - repo: https://github.com/pre-commit/pre-commit-hooks
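This hunk narrows interrogate's `exclude` pattern from `^(setup.py|tests)` to `^(setup.py|model_catalogs/tests)`, so only the package's own tests directory is skipped. pre-commit treats `exclude` as a Python regular expression matched against each file path, which can be sketched with the stdlib `re` module (the repository paths below are hypothetical, chosen to illustrate the match behavior):

```python
import re

# The new exclude pattern from the diff: skip setup.py and the
# package's tests directory, anchored at the start of the path.
pattern = re.compile(r"^(setup.py|model_catalogs/tests)")

# Hypothetical repository paths, for illustration only.
paths = [
    "setup.py",
    "model_catalogs/tests/test_catalogs.py",
    "model_catalogs/model_catalogs.py",
    "tests/test_other.py",
]

# Paths the hook would skip under the new pattern.
excluded = [p for p in paths if pattern.search(p)]
print(excluded)  # → ['setup.py', 'model_catalogs/tests/test_catalogs.py']
```

Note that the old pattern would also have excluded `tests/test_other.py`; the new one only excludes the tests directory inside the package.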

docs/add_model.rst

Lines changed: 3 additions & 3 deletions (whitespace-only: trailing whitespace removed)

@@ -14,7 +14,7 @@ Scenario: you want to use a new model with ``model_catalogs``. How should you go
 Make a new catalog for it
 -------------------------
 
-One Intake catalog file should represent a single model domain whose sources all provide access to model output from running on the same grid. Take a look at the top of an existing catalog file to see what catalog-level metadata is set to know what should be consistent between sources in a catalog file. If the horizontal grid is different (e.g., a subset of a model domain), that should be a different catalog file. If the vertical grid is different (e.g., the output is only at the surface of a 3D model), that should be a different catalog file.
+One Intake catalog file should represent a single model domain whose sources all provide access to model output from running on the same grid. Take a look at the top of an existing catalog file to see what catalog-level metadata is set to know what should be consistent between sources in a catalog file. If the horizontal grid is different (e.g., a subset of a model domain), that should be a different catalog file. If the vertical grid is different (e.g., the output is only at the surface of a 3D model), that should be a different catalog file.
 
 What should your catalog file include?
 **************************************

@@ -61,8 +61,8 @@ Freshness
 
 The "freshness" parameters, which determine how much time can pass before different actions must be rerun, now have defaults (set in the `__init__` file) for each of the five actions that have freshness parameters associated with them. Possible parameters are:
 
-* start
-* end
+* start
+* end
 * catrefs
 * file_locs
 * compiled
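The five freshness parameters listed in this hunk gate whether cached files (start/end datetimes, catrefs, file_locs, compiled catalogs) are reused or regenerated; the diff's calls to `mc.is_fresh` check them. A minimal sketch of what such a check might look like, based only on a file's modification time — the mtime logic and the one-day default window are assumptions for illustration, not the package's actual implementation:

```python
from datetime import datetime, timedelta
from pathlib import Path
import tempfile

# Assumed freshness window; the real defaults live in the package's
# __init__ and differ per action (start, end, catrefs, file_locs, compiled).
FRESH_WINDOW = timedelta(days=1)

def is_fresh(path, window=FRESH_WINDOW):
    """Return True if `path` exists and was modified within `window`."""
    p = Path(path)
    if not p.exists():
        return False
    age = datetime.now() - datetime.fromtimestamp(p.stat().st_mtime)
    return age < window

# A just-written cache file is fresh; a missing one never is.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"start_datetime: 2022-01-01\n")
print(is_fresh(f.name))          # just written, so fresh
print(is_fresh("no/such/file"))  # missing file is never fresh
```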

docs/catalog_modes.md

Lines changed: 2 additions & 2 deletions (whitespace-only: trailing whitespace removed)

@@ -34,7 +34,7 @@ See all available installed catalogs with:
 list(intake.cat)
 ```
 
-Installed catalogs to be used with `model_catalogs` should have a specified prefix ending with an underscore so they can be easily selected from the default catalog. `model_catalogs` required the installation of [`mc-goods`](https://github.com/axiom-data-science/mc-goods), a package of model catalogs, which have the prefix "mc_".
+Installed catalogs to be used with `model_catalogs` should have a specified prefix ending with an underscore so they can be easily selected from the default catalog. `model_catalogs` required the installation of [`mc-goods`](https://github.com/axiom-data-science/mc-goods), a package of model catalogs, which have the prefix "mc_".
 
 +++

@@ -113,7 +113,7 @@ with TemporaryDirectory() as tmpdirname:
     - Time
     """
     fp = open(fname, 'w')
-    fp.write(catalog_text)
+    fp.write(catalog_text)
     fp.close()
 
     main_cat = mc.setup(str(fname), override=True)
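The second hunk only re-indents the `fp.write(catalog_text)` call; the open/write/close pattern itself could equally be written with a context manager, which closes the file even if the write raises. A small self-contained sketch (the catalog text here is abbreviated and hypothetical):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical minimal catalog text, standing in for the docs' example.
catalog_text = """\
name: example
description: A hypothetical minimal catalog for illustration.
"""

with TemporaryDirectory() as tmpdirname:
    fname = Path(tmpdirname) / "example.yaml"
    # `with` guarantees the file is closed, replacing the explicit
    # fp = open(...); fp.write(...); fp.close() sequence in the docs.
    with open(fname, "w") as fp:
        fp.write(catalog_text)
    contents = fname.read_text()

print(contents.startswith("name: example"))  # → True
```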

docs/index.rst

Lines changed: 1 addition & 1 deletion (whitespace-only: trailing whitespace removed)

@@ -29,7 +29,7 @@ To install from conda-forge:
    aggregations
    whats_new
    api
-
+
 .. toctree::
    :maxdepth: 2
    :hidden:

docs/update_boundaries.md

Lines changed: 1 addition & 1 deletion (whitespace-only: trailing whitespace removed)

@@ -2,7 +2,7 @@
 
 Boundaries files will be searched for automatically when `mc.setup()` is run. If the command has previously been run with the requested catalog files, then the boundaries files should already exist. If new catalog files are being used, then boundaries files will be calculated as each catalog file is handled.
 
-Boundaries files are saved to `mc.FILE_PATH_BOUNDARIES(catalog_name)` where the `catalog_name` is determined at the top of the catalog file itself under "name".
+Boundaries files are saved to `mc.FILE_PATH_BOUNDARIES(catalog_name)` where the `catalog_name` is determined at the top of the catalog file itself under "name".
 
 If you want to calculate the boundaries separately from the call to `mc.setup()`, you can do so with

docs/whats_new.rst

Lines changed: 4 additions & 4 deletions (whitespace-only: trailing whitespace removed)

@@ -8,16 +8,16 @@ v0.6.0 (unreleased)
 * The "known" GOODS model catalog yaml files are no longer distributed with ``model_catalogs`` itself in order to enforce more separation between the catalog files themselves and this code. However, the package of catalogs is currently a requirement of ``model_catalogs`` and can be found at `mc-goods <https://github.com/axiom-data-science/mc-goods>`_. Note that catalog names that had names like `CBOFS-RGRID` are now called `CBOFS_RGRID` with underscores instead of hyphens. This was a necessary change for setting up the models in their own packages with entry points.
 * Enforcing single threading in ``model_catalogs`` to avoid issue when using ``xr.open_mfdataset`` (which is used with `noagg` sources) in which the first time you read something in you hit an error but the second time it works. For more information check this `xarray issue <https://github.com/pydata/xarray/issues/7079>`_ or this `netcdf issue <https://github.com/Unidata/netcdf4-python/issues/1192>`_.
 * User can work with a local catalog file now! See :doc:`here <catalog_modes>` for details.
-
+
   * boundaries are optionally calculated when using `mc.open_catalog()`.
   * boundaries are calculated the first time a catalog file is worked with through `mc.setup()`
-
+
 * Removed requirement for `filetype` to be in catalog if sources in catalog do not need to be aggregated.
 * LSOFS and LOOFS have new FVCOM versions. So, there are new versions of the model files:
-
+
   * `lsofs.yaml` and `loofs.yaml` are still the legacy POM version of the models but no longer have source `coops-forecast-noagg`, and their metadata have been updated to reflect the end dates of the model sources.
   * new catalog files `lsofs-fvcom.yaml` and `loofs-fvcom.yaml` have source `coops-forecast-noagg` that points to the new FVCOM version of the models.
-
+
 * If user requests time range that is not available for a source, it will now error instead of warn.
 * Bug fixed in `find_availability` so that when a source that does not have a catloc entry is checked, the Dataset is read in without extra processing and checks (including limiting the time range which otherwise would impact checking the time availability).

model_catalogs/__init__.py

Lines changed: 3 additions & 1 deletion

@@ -39,11 +39,13 @@
 except PackageNotFoundError:
     # package is not installed
     __version__ = "unknown"
-
+
 # this forces single threading which avoids an issue described here:
 # https://github.com/pydata/xarray/issues/7079
 # https://github.com/Unidata/netcdf4-python/issues/1192
 import dask
+
+
 dask.config.set(scheduler="single-threaded")
 
 # set up known locations for catalogs.

model_catalogs/model_catalogs.py

Lines changed: 51 additions & 32 deletions (mostly black-style line wrapping and trailing-whitespace removal)

@@ -5,7 +5,7 @@
 import warnings
 
 from datetime import datetime
-from pathlib import Path
+from pathlib import Path, PurePath
 
 import cf_xarray  # noqa
 import intake

@@ -19,7 +19,6 @@
 from intake.catalog import Catalog
 from intake.catalog.local import LocalCatalogEntry
 from intake_xarray.opendap import OpenDapSource
-from pathlib import PurePath
 
 import model_catalogs as mc
 

@@ -125,12 +124,18 @@ def make_catalog(
     return cat
 
 
-def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, boundaries=False,
-                 save_boundaries=False):
+def open_catalog(
+    cat_loc,
+    return_cat=True,
+    save_catalog=False,
+    override=False,
+    boundaries=False,
+    save_boundaries=False,
+):
     """Open an intake catalog file and set up code to apply processing/transform.
-
+
     Optionally calculate the boundaries of the model represented in cat_log.
-
+
     Note that saved boundaries files will be saved under the name inside the catalog, not the name of the file if you input a catalog path.
 
     Parameters

@@ -148,7 +153,7 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
     save_boundaries : bool, optional
         Defaults to False, and saves to mc.FILE_PATH_BOUNDARIES(model).
     """
-
+
     if isinstance(cat_loc, Catalog):
         cat_orig = cat_loc
     else:

@@ -162,8 +167,10 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
         with open(mc.FILE_PATH_BOUNDARIES(cat_orig.name.lower()), "r") as stream:
             boundary = yaml.safe_load(stream)
     else:
-        boundary = mc.calculate_boundaries(cat_orig, save_files=save_boundaries, return_boundaries=True)[cat_orig.name]
-
+        boundary = mc.calculate_boundaries(
+            cat_orig, save_files=save_boundaries, return_boundaries=True
+        )[cat_orig.name]
+
     # add to cat_orig metadata
     cat_orig.metadata["bounding_box"] = boundary["bbox"]
     cat_orig.metadata["geospatial_bounds"] = boundary["wkt"]

@@ -172,10 +179,9 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
     # original file but applies metadata from original catalog file
     # to the resulting dataset after calling `to_dask()`
     source_transforms = [
-        mc.transform_source(cat_orig[model_source])
-        for model_source in list(cat_orig)
+        mc.transform_source(cat_orig[model_source]) for model_source in list(cat_orig)
     ]
-
+
     metadata = cat_orig.metadata
     metadata.update({"cat_path": cat_orig.path})
 

@@ -191,29 +197,29 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
         save_catalog=save_catalog,
         return_cat=True,
     )
-
+
     if return_cat:
         return cat
 
 
 def setup(locs="mc_", override=False):
     """Setup reference catalogs for models.
-
+
     Loops over catalogs that have been previously installed as data packages to intake that start with the string(s) in locs. The default is to read in the required GOODS model catalogs which are prefixed with `"mc_"`. Alternatively, one or more local catalog files can be input as strings or Paths.
-
+
     This function calls ``open_catalog`` which reads in previously-saved model boundary information (or calculates it if not available) and saves temporary catalog files for each model (called "compiled"), then this function links those together into the returned main catalog. For some models, reading in the original catalogs applies a "today" and/or "yesterday" date Intake user parameter that supplies two example model files that can be used for examining the model output for the example times. Those are rerun each time this function is rerun, filling the parameters using the proper dates.
-
+
     Note that saved compiled catalog files will be saved under the name inside the catalog, not the name of the file if you input a catalog path.
 
     Parameters
     ----------
     locs : str, Path, list
         This can be:
-
+
         * a string or Path describing where a Catalog file is located
        * a string of the prefix for selecting catalogs from the default intake catalog, ``intake.cat``. It is expected to be of the form "PREFIX_CATALOGNAME" with an underscore at the end followed by the catalog name, and there could be many catalogs with that `"PREFIX_"` set up.
         * a list of a combination of the previous options.
-
+
     override : boolean, optional
         Use `override=True` to compile the catalog files together regardless of freshness.
 

@@ -236,38 +242,40 @@ def setup(locs="mc_", override=False):
     Examine the model_sources for a specific model in the catalog:
 
     >>> list(main_cat['CBOFS'])
-
+
     Separate from ``model_catalogs`` you can check the default Intake catalog with:
-
+
     >>> list(intake.cat)
     """
-
+
     locs = mc.astype(locs, list)
-
+
     # arrange inputs into list of known Catalog instances and Paths to catalogs
     initial_cats = []
     for loc in locs:
-
+
         # initial_cats is a list of Catalogs in this case
-        cats = [intake.cat[cat_name] for cat_name in list(intake.cat) if loc in cat_name]
+        cats = [
+            intake.cat[cat_name] for cat_name in list(intake.cat) if loc in cat_name
+        ]
 
         # remove the prefix from the catalog name
         for cat in cats:
             cat.name = cat.name.lstrip(loc)
-
+
         # check for if loc is instead a path to a catalog
         if len(cats) == 0:
             # initial_cats is a list of one Path in this case
             # cats = [PurePath(loc)]
             # initial_cats is a list of one Catalog in this case
             cats = [intake.open_catalog(loc)]
-
+
         # now cats is a list of Catalog(s)
         initial_cats.extend(cats)
 
     cat_transform_locs = []
     for cat in list(initial_cats):
-
+
         # if isinstance(cat, PurePath):
         #     name = cat.stem
         # elif isinstance(cat, Catalog):

@@ -277,7 +285,14 @@ def setup(locs="mc_", override=False):
         # existing file or if is not fresh
         if override or not mc.is_fresh(mc.FILE_PATH_COMPILED(name)):
             # override for open_catalog is about calculating boundaries
-            open_catalog(cat, return_cat=False, save_catalog=True, boundaries=True, save_boundaries=True, override=False)
+            open_catalog(
+                cat,
+                return_cat=False,
+                save_catalog=True,
+                boundaries=True,
+                save_boundaries=True,
+                override=False,
+            )
         cat_transform_locs.append(mc.FILE_PATH_COMPILED(name))
 
     # have to read these from disk in order to make them type

@@ -342,11 +357,13 @@ def find_datetimes(source, find_start_datetime, find_end_datetime, override=Fals
 
     # for when we need to aggregate which is for model_source: ncei-archive-noagg and coops-forecast-noagg
     else:
-
+
         if "filetype" not in source.cat.metadata:
-            raise KeyError("If your model requires aggregation, it also requires `filetype` in the catalog-level metadata.")
+            raise KeyError(
+                "If your model requires aggregation, it also requires `filetype` in the catalog-level metadata."
+            )
         else:
-            filetype = source.cat.metadata["filetype"]
+            filetype = source.cat.metadata["filetype"]
 
     if not override and mc.is_fresh(
         mc.FILE_PATH_CATREFS(source.cat.name, source.name), source

@@ -456,7 +473,9 @@ def find_availability_source(source, override=False):
     else:
         find_start_datetime = True  # need to still find the start_datetime
 
-    if not override and mc.is_fresh(mc.FILE_PATH_END(source.cat.name, source.name), source):
+    if not override and mc.is_fresh(
+        mc.FILE_PATH_END(source.cat.name, source.name), source
+    ):
         with open(mc.FILE_PATH_END(source.cat.name, source.name), "r") as stream:
             end_datetime = yaml.safe_load(stream)["end_datetime"]
         find_end_datetime = False
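One line carried through unchanged in the `setup` hunks above, `cat.name = cat.name.lstrip(loc)`, is worth flagging: `str.lstrip` strips any leading characters drawn from the given string (treated as a character set) rather than removing a prefix, so it can eat extra characters when a catalog name begins with letters that also appear in the prefix. A quick illustration (the catalog name is hypothetical; `str.removeprefix`, available since Python 3.9, expresses the prefix-removal intent directly):

```python
prefix = "mc_"

# lstrip treats "mc_" as the character set {m, c, _} and strips those
# characters greedily from the left, so the leading "c" of "cbofs" is lost.
name = "mc_cbofs"  # hypothetical installed-catalog name

print(name.lstrip(prefix))        # → 'bofs'  (character-set stripping)
print(name.removeprefix(prefix))  # → 'cbofs' (true prefix removal, 3.9+)
```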

model_catalogs/process.py

Lines changed: 22 additions & 7 deletions (mostly black-style line wrapping and trailing-whitespace removal)

@@ -66,7 +66,7 @@ def status(self):
     """
 
     if not hasattr(self, "_status"):
-
+
         if self.target.describe()["driver"][0] == "opendap":
             suffix = ".das"
         else:

@@ -200,7 +200,11 @@ def to_dask(self):
         )
 
         # Alert if triangularmesh engine is required (from FVCOM) but not present
-        if self.target.describe()["driver"][0] == "opendap" and self.target.engine == "triangularmesh_netcdf" and not EM_AVAILABLE:
+        if (
+            self.target.describe()["driver"][0] == "opendap"
+            and self.target.engine == "triangularmesh_netcdf"
+            and not EM_AVAILABLE
+        ):
             raise ModuleNotFoundError(  # pragma: no cover
                 "`extract_model` is not available but contains the 'triangularmesh_netcdf' engine that is required for a model."
             )

@@ -326,20 +330,31 @@ def add_attributes(ds, metadata: Optional[dict] = None):
     var_names = mc.astype(var_names, list)
     for var_name in var_names:
 
-        # Check dims, coords, and data_vars:
-        if var_name in ds.dims or var_name in ds.data_vars.keys() or var_name in ds.coords:
+        # Check dims, coords, and data_vars:
+        if (
+            var_name in ds.dims
+            or var_name in ds.data_vars.keys()
+            or var_name in ds.coords
+        ):
             # var_name needs to be a coord to have attributes
             if var_name not in ds.coords:
-                ds = ds.assign_coords({var_name: (var_name,np.arange(ds[var_name].size), {"axis": ax_name},)})
+                ds = ds.assign_coords(
+                    {
+                        var_name: (
+                            var_name,
+                            np.arange(ds[var_name].size),
+                            {"axis": ax_name},
+                        )
+                    }
+                )
             else:
                 ds[var_name].attrs["axis"] = ax_name
-
+
         else:
             warnings.warn(
                 f"The variable {var_name} input in a catalog file is not present in the Dataset.",
                 UserWarning,
             )
-
 
     # this won't run for e.g. GFS which has multiple time variables
     # but also doesn't need to have the calendar updated
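The `EM_AVAILABLE` flag checked in the reformatted condition above follows the common optional-dependency pattern: attempt the import at module load, record success in a boolean, and raise only when the feature is actually needed. A generic sketch of that pattern — the function name and message below are placeholders, not the package's exact code:

```python
import importlib

# Probe for an optional dependency without failing at import time.
# "extract_model" mirrors the dependency named in the diff; any
# missing module exercises the same code path.
try:
    importlib.import_module("extract_model")
    EM_AVAILABLE = True
except ImportError:
    EM_AVAILABLE = False

def open_triangular_mesh(needs_em=True):
    """Raise only when the optional engine is actually required."""
    if needs_em and not EM_AVAILABLE:
        raise ModuleNotFoundError(
            "`extract_model` is required for the 'triangularmesh_netcdf' engine."
        )
    return "opened"
```

This defers the failure from import time to use time, so users who never touch FVCOM output are unaffected by the missing dependency.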
