
Commit 132d417 ("precommit")
Parent: 7243f36

File tree: 12 files changed (+116 −73 lines)

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ repos:
     rev: 1.5.0
     hooks:
       - id: interrogate
-        exclude: ^(setup.py|tests)
+        exclude: ^(setup.py|model_catalogs/tests)
         args: [--config=pyproject.toml]
 
   - repo: https://github.com/pre-commit/pre-commit-hooks
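This hunk narrows interrogate's `exclude` pattern from `^(setup.py|tests)` to `^(setup.py|model_catalogs/tests)`, so only the package's own tests directory is skipped. pre-commit treats `exclude` as a Python regular expression matched against each file path, which can be sketched with the stdlib `re` module (the repository paths below are hypothetical, chosen to illustrate the match behavior):

```python
import re

# The new exclude pattern from the diff: skip setup.py and the
# package's tests directory, anchored at the start of the path.
pattern = re.compile(r"^(setup.py|model_catalogs/tests)")

# Hypothetical repository paths, for illustration only.
paths = [
    "setup.py",
    "model_catalogs/tests/test_catalogs.py",
    "model_catalogs/model_catalogs.py",
    "tests/test_other.py",
]

# Paths the hook would skip under the new pattern.
excluded = [p for p in paths if pattern.search(p)]
print(excluded)  # → ['setup.py', 'model_catalogs/tests/test_catalogs.py']
```

Note that the old pattern would also have excluded `tests/test_other.py`; the new one only excludes the tests directory inside the package.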

docs/add_model.rst

Lines changed: 3 additions & 3 deletions (whitespace-only: trailing whitespace removed)

@@ -14,7 +14,7 @@ Scenario: you want to use a new model with ``model_catalogs``. How should you go
 Make a new catalog for it
 -------------------------
 
-One Intake catalog file should represent a single model domain whose sources all provide access to model output from running on the same grid. Take a look at the top of an existing catalog file to see what catalog-level metadata is set to know what should be consistent between sources in a catalog file. If the horizontal grid is different (e.g., a subset of a model domain), that should be a different catalog file. If the vertical grid is different (e.g., the output is only at the surface of a 3D model), that should be a different catalog file.
+One Intake catalog file should represent a single model domain whose sources all provide access to model output from running on the same grid. Take a look at the top of an existing catalog file to see what catalog-level metadata is set to know what should be consistent between sources in a catalog file. If the horizontal grid is different (e.g., a subset of a model domain), that should be a different catalog file. If the vertical grid is different (e.g., the output is only at the surface of a 3D model), that should be a different catalog file.
 
 What should your catalog file include?
 **************************************

@@ -61,8 +61,8 @@ Freshness
 
 The "freshness" parameters, which determine how much time can pass before different actions must be rerun, now have defaults (set in the `__init__` file) for each of the five actions that have freshness parameters associated with them. Possible parameters are:
 
-* start
-* end
+* start
+* end
 * catrefs
 * file_locs
 * compiled
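The five freshness parameters listed in this hunk gate whether cached files (start/end datetimes, catrefs, file_locs, compiled catalogs) are reused or regenerated; the diff's calls to `mc.is_fresh` check them. A minimal sketch of what such a check might look like, based only on a file's modification time — the mtime logic and the one-day default window are assumptions for illustration, not the package's actual implementation:

```python
from datetime import datetime, timedelta
from pathlib import Path
import tempfile

# Assumed freshness window; the real defaults live in the package's
# __init__ and differ per action (start, end, catrefs, file_locs, compiled).
FRESH_WINDOW = timedelta(days=1)

def is_fresh(path, window=FRESH_WINDOW):
    """Return True if `path` exists and was modified within `window`."""
    p = Path(path)
    if not p.exists():
        return False
    age = datetime.now() - datetime.fromtimestamp(p.stat().st_mtime)
    return age < window

# A just-written cache file is fresh; a missing one never is.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"start_datetime: 2022-01-01\n")
print(is_fresh(f.name))          # just written, so fresh
print(is_fresh("no/such/file"))  # missing file is never fresh
```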

docs/catalog_modes.md

Lines changed: 2 additions & 2 deletions (whitespace-only: trailing whitespace removed)

@@ -34,7 +34,7 @@ See all available installed catalogs with:
 list(intake.cat)
 ```
 
-Installed catalogs to be used with `model_catalogs` should have a specified prefix ending with an underscore so they can be easily selected from the default catalog. `model_catalogs` required the installation of [`mc-goods`](https://github.com/axiom-data-science/mc-goods), a package of model catalogs, which have the prefix "mc_".
+Installed catalogs to be used with `model_catalogs` should have a specified prefix ending with an underscore so they can be easily selected from the default catalog. `model_catalogs` required the installation of [`mc-goods`](https://github.com/axiom-data-science/mc-goods), a package of model catalogs, which have the prefix "mc_".
 
 +++

@@ -113,7 +113,7 @@ with TemporaryDirectory() as tmpdirname:
     - Time
     """
     fp = open(fname, 'w')
-    fp.write(catalog_text)
+    fp.write(catalog_text)
     fp.close()
 
     main_cat = mc.setup(str(fname), override=True)
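The second hunk only re-indents the `fp.write(catalog_text)` call; the open/write/close pattern itself could equally be written with a context manager, which closes the file even if the write raises. A small self-contained sketch (the catalog text here is abbreviated and hypothetical):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical minimal catalog text, standing in for the docs' example.
catalog_text = """\
name: example
description: A hypothetical minimal catalog for illustration.
"""

with TemporaryDirectory() as tmpdirname:
    fname = Path(tmpdirname) / "example.yaml"
    # `with` guarantees the file is closed, replacing the explicit
    # fp = open(...); fp.write(...); fp.close() sequence in the docs.
    with open(fname, "w") as fp:
        fp.write(catalog_text)
    contents = fname.read_text()

print(contents.startswith("name: example"))  # → True
```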

docs/index.rst

Lines changed: 1 addition & 1 deletion (whitespace-only: trailing whitespace removed)

@@ -29,7 +29,7 @@ To install from conda-forge:
    aggregations
    whats_new
    api
-
+
 .. toctree::
    :maxdepth: 2
    :hidden:

docs/update_boundaries.md

Lines changed: 1 addition & 1 deletion (whitespace-only: trailing whitespace removed)

@@ -2,7 +2,7 @@
 
 Boundaries files will be searched for automatically when `mc.setup()` is run. If the command has previously been run with the requested catalog files, then the boundaries files should already exist. If new catalog files are being used, then boundaries files will be calculated as each catalog file is handled.
 
-Boundaries files are saved to `mc.FILE_PATH_BOUNDARIES(catalog_name)` where the `catalog_name` is determined at the top of the catalog file itself under "name".
+Boundaries files are saved to `mc.FILE_PATH_BOUNDARIES(catalog_name)` where the `catalog_name` is determined at the top of the catalog file itself under "name".
 
 If you want to calculate the boundaries separately from the call to `mc.setup()`, you can do so with

docs/whats_new.rst

Lines changed: 4 additions & 4 deletions (whitespace-only: trailing whitespace removed)

@@ -8,16 +8,16 @@ v0.6.0 (unreleased)
 * The "known" GOODS model catalog yaml files are no longer distributed with ``model_catalogs`` itself in order to enforce more separation between the catalog files themselves and this code. However, the package of catalogs is currently a requirement of ``model_catalogs`` and can be found at `mc-goods <https://github.com/axiom-data-science/mc-goods>`_. Note that catalog names that had names like `CBOFS-RGRID` are now called `CBOFS_RGRID` with underscores instead of hyphens. This was a necessary change for setting up the models in their own packages with entry points.
 * Enforcing single threading in ``model_catalogs`` to avoid issue when using ``xr.open_mfdataset`` (which is used with `noagg` sources) in which the first time you read something in you hit an error but the second time it works. For more information check this `xarray issue <https://github.com/pydata/xarray/issues/7079>`_ or this `netcdf issue <https://github.com/Unidata/netcdf4-python/issues/1192>`_.
 * User can work with a local catalog file now! See :doc:`here <catalog_modes>` for details.
-
+
   * boundaries are optionally calculated when using `mc.open_catalog()`.
   * boundaries are calculated the first time a catalog file is worked with through `mc.setup()`
-
+
 * Removed requirement for `filetype` to be in catalog if sources in catalog do not need to be aggregated.
 * LSOFS and LOOFS have new FVCOM versions. So, there are new versions of the model files:
-
+
   * `lsofs.yaml` and `loofs.yaml` are still the legacy POM version of the models but no longer have source `coops-forecast-noagg`, and their metadata have been updated to reflect the end dates of the model sources.
   * new catalog files `lsofs-fvcom.yaml` and `loofs-fvcom.yaml` have source `coops-forecast-noagg` that points to the new FVCOM version of the models.
-
+
 * If user requests time range that is not available for a source, it will now error instead of warn.
 * Bug fixed in `find_availability` so that when a source that does not have a catloc entry is checked, the Dataset is read in without extra processing and checks (including limiting the time range which otherwise would impact checking the time availability).

model_catalogs/__init__.py

Lines changed: 3 additions & 1 deletion

@@ -39,11 +39,13 @@
 except PackageNotFoundError:
     # package is not installed
     __version__ = "unknown"
-
+
 # this forces single threading which avoids an issue described here:
 # https://github.com/pydata/xarray/issues/7079
 # https://github.com/Unidata/netcdf4-python/issues/1192
 import dask
+
+
 dask.config.set(scheduler="single-threaded")
 
 # set up known locations for catalogs.

model_catalogs/model_catalogs.py

Lines changed: 51 additions & 32 deletions (mostly black-style line wrapping and trailing-whitespace removal)

@@ -5,7 +5,7 @@
 import warnings
 
 from datetime import datetime
-from pathlib import Path
+from pathlib import Path, PurePath
 
 import cf_xarray  # noqa
 import intake

@@ -19,7 +19,6 @@
 from intake.catalog import Catalog
 from intake.catalog.local import LocalCatalogEntry
 from intake_xarray.opendap import OpenDapSource
-from pathlib import PurePath
 
 import model_catalogs as mc
 

@@ -125,12 +124,18 @@ def make_catalog(
     return cat
 
 
-def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, boundaries=False,
-                 save_boundaries=False):
+def open_catalog(
+    cat_loc,
+    return_cat=True,
+    save_catalog=False,
+    override=False,
+    boundaries=False,
+    save_boundaries=False,
+):
     """Open an intake catalog file and set up code to apply processing/transform.
-
+
     Optionally calculate the boundaries of the model represented in cat_log.
-
+
     Note that saved boundaries files will be saved under the name inside the catalog, not the name of the file if you input a catalog path.
 
     Parameters

@@ -148,7 +153,7 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
     save_boundaries : bool, optional
         Defaults to False, and saves to mc.FILE_PATH_BOUNDARIES(model).
     """
-
+
     if isinstance(cat_loc, Catalog):
         cat_orig = cat_loc
     else:

@@ -162,8 +167,10 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
         with open(mc.FILE_PATH_BOUNDARIES(cat_orig.name.lower()), "r") as stream:
             boundary = yaml.safe_load(stream)
     else:
-        boundary = mc.calculate_boundaries(cat_orig, save_files=save_boundaries, return_boundaries=True)[cat_orig.name]
-
+        boundary = mc.calculate_boundaries(
+            cat_orig, save_files=save_boundaries, return_boundaries=True
+        )[cat_orig.name]
+
     # add to cat_orig metadata
     cat_orig.metadata["bounding_box"] = boundary["bbox"]
     cat_orig.metadata["geospatial_bounds"] = boundary["wkt"]

@@ -172,10 +179,9 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
     # original file but applies metadata from original catalog file
     # to the resulting dataset after calling `to_dask()`
     source_transforms = [
-        mc.transform_source(cat_orig[model_source])
-        for model_source in list(cat_orig)
+        mc.transform_source(cat_orig[model_source]) for model_source in list(cat_orig)
     ]
-
+
     metadata = cat_orig.metadata
     metadata.update({"cat_path": cat_orig.path})
 

@@ -191,29 +197,29 @@ def open_catalog(cat_loc, return_cat=True, save_catalog=False, override=False, b
         save_catalog=save_catalog,
         return_cat=True,
     )
-
+
     if return_cat:
         return cat
 
 
 def setup(locs="mc_", override=False):
     """Setup reference catalogs for models.
-
+
     Loops over catalogs that have been previously installed as data packages to intake that start with the string(s) in locs. The default is to read in the required GOODS model catalogs which are prefixed with `"mc_"`. Alternatively, one or more local catalog files can be input as strings or Paths.
-
+
     This function calls ``open_catalog`` which reads in previously-saved model boundary information (or calculates it if not available) and saves temporary catalog files for each model (called "compiled"), then this function links those together into the returned main catalog. For some models, reading in the original catalogs applies a "today" and/or "yesterday" date Intake user parameter that supplies two example model files that can be used for examining the model output for the example times. Those are rerun each time this function is rerun, filling the parameters using the proper dates.
-
+
     Note that saved compiled catalog files will be saved under the name inside the catalog, not the name of the file if you input a catalog path.
 
     Parameters
     ----------
     locs : str, Path, list
         This can be:
-
+
         * a string or Path describing where a Catalog file is located
        * a string of the prefix for selecting catalogs from the default intake catalog, ``intake.cat``. It is expected to be of the form "PREFIX_CATALOGNAME" with an underscore at the end followed by the catalog name, and there could be many catalogs with that `"PREFIX_"` set up.
         * a list of a combination of the previous options.
-
+
     override : boolean, optional
         Use `override=True` to compile the catalog files together regardless of freshness.
 

@@ -236,38 +242,40 @@ def setup(locs="mc_", override=False):
     Examine the model_sources for a specific model in the catalog:
 
     >>> list(main_cat['CBOFS'])
-
+
     Separate from ``model_catalogs`` you can check the default Intake catalog with:
-
+
     >>> list(intake.cat)
     """
-
+
     locs = mc.astype(locs, list)
-
+
     # arrange inputs into list of known Catalog instances and Paths to catalogs
     initial_cats = []
     for loc in locs:
-
+
         # initial_cats is a list of Catalogs in this case
-        cats = [intake.cat[cat_name] for cat_name in list(intake.cat) if loc in cat_name]
+        cats = [
+            intake.cat[cat_name] for cat_name in list(intake.cat) if loc in cat_name
+        ]
 
         # remove the prefix from the catalog name
         for cat in cats:
             cat.name = cat.name.lstrip(loc)
-
+
         # check for if loc is instead a path to a catalog
         if len(cats) == 0:
             # initial_cats is a list of one Path in this case
             # cats = [PurePath(loc)]
             # initial_cats is a list of one Catalog in this case
             cats = [intake.open_catalog(loc)]
-
+
         # now cats is a list of Catalog(s)
         initial_cats.extend(cats)
 
     cat_transform_locs = []
     for cat in list(initial_cats):
-
+
         # if isinstance(cat, PurePath):
         #     name = cat.stem
         # elif isinstance(cat, Catalog):

@@ -277,7 +285,14 @@ def setup(locs="mc_", override=False):
         # existing file or if is not fresh
         if override or not mc.is_fresh(mc.FILE_PATH_COMPILED(name)):
             # override for open_catalog is about calculating boundaries
-            open_catalog(cat, return_cat=False, save_catalog=True, boundaries=True, save_boundaries=True, override=False)
+            open_catalog(
+                cat,
+                return_cat=False,
+                save_catalog=True,
+                boundaries=True,
+                save_boundaries=True,
+                override=False,
+            )
         cat_transform_locs.append(mc.FILE_PATH_COMPILED(name))
 
     # have to read these from disk in order to make them type

@@ -342,11 +357,13 @@ def find_datetimes(source, find_start_datetime, find_end_datetime, override=Fals
 
     # for when we need to aggregate which is for model_source: ncei-archive-noagg and coops-forecast-noagg
     else:
-
+
         if "filetype" not in source.cat.metadata:
-            raise KeyError("If your model requires aggregation, it also requires `filetype` in the catalog-level metadata.")
+            raise KeyError(
+                "If your model requires aggregation, it also requires `filetype` in the catalog-level metadata."
+            )
         else:
-            filetype = source.cat.metadata["filetype"]
+            filetype = source.cat.metadata["filetype"]
 
     if not override and mc.is_fresh(
         mc.FILE_PATH_CATREFS(source.cat.name, source.name), source

@@ -456,7 +473,9 @@ def find_availability_source(source, override=False):
     else:
         find_start_datetime = True  # need to still find the start_datetime
 
-    if not override and mc.is_fresh(mc.FILE_PATH_END(source.cat.name, source.name), source):
+    if not override and mc.is_fresh(
+        mc.FILE_PATH_END(source.cat.name, source.name), source
+    ):
         with open(mc.FILE_PATH_END(source.cat.name, source.name), "r") as stream:
             end_datetime = yaml.safe_load(stream)["end_datetime"]
         find_end_datetime = False
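One line carried through unchanged in the `setup` hunks above, `cat.name = cat.name.lstrip(loc)`, is worth flagging: `str.lstrip` strips any leading characters drawn from the given string (treated as a character set) rather than removing a prefix, so it can eat extra characters when a catalog name begins with letters that also appear in the prefix. A quick illustration (the catalog name is hypothetical; `str.removeprefix`, available since Python 3.9, expresses the prefix-removal intent directly):

```python
prefix = "mc_"

# lstrip treats "mc_" as the character set {m, c, _} and strips those
# characters greedily from the left, so the leading "c" of "cbofs" is lost.
name = "mc_cbofs"  # hypothetical installed-catalog name

print(name.lstrip(prefix))        # → 'bofs'  (character-set stripping)
print(name.removeprefix(prefix))  # → 'cbofs' (true prefix removal, 3.9+)
```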

model_catalogs/process.py

Lines changed: 22 additions & 7 deletions (mostly black-style line wrapping and trailing-whitespace removal)

@@ -66,7 +66,7 @@ def status(self):
     """
 
     if not hasattr(self, "_status"):
-
+
         if self.target.describe()["driver"][0] == "opendap":
             suffix = ".das"
         else:

@@ -200,7 +200,11 @@ def to_dask(self):
         )
 
         # Alert if triangularmesh engine is required (from FVCOM) but not present
-        if self.target.describe()["driver"][0] == "opendap" and self.target.engine == "triangularmesh_netcdf" and not EM_AVAILABLE:
+        if (
+            self.target.describe()["driver"][0] == "opendap"
+            and self.target.engine == "triangularmesh_netcdf"
+            and not EM_AVAILABLE
+        ):
             raise ModuleNotFoundError(  # pragma: no cover
                 "`extract_model` is not available but contains the 'triangularmesh_netcdf' engine that is required for a model."
             )

@@ -326,20 +330,31 @@ def add_attributes(ds, metadata: Optional[dict] = None):
     var_names = mc.astype(var_names, list)
     for var_name in var_names:
 
-        # Check dims, coords, and data_vars:
-        if var_name in ds.dims or var_name in ds.data_vars.keys() or var_name in ds.coords:
+        # Check dims, coords, and data_vars:
+        if (
+            var_name in ds.dims
+            or var_name in ds.data_vars.keys()
+            or var_name in ds.coords
+        ):
             # var_name needs to be a coord to have attributes
             if var_name not in ds.coords:
-                ds = ds.assign_coords({var_name: (var_name,np.arange(ds[var_name].size), {"axis": ax_name},)})
+                ds = ds.assign_coords(
+                    {
+                        var_name: (
+                            var_name,
+                            np.arange(ds[var_name].size),
+                            {"axis": ax_name},
+                        )
+                    }
+                )
             else:
                 ds[var_name].attrs["axis"] = ax_name
-
+
         else:
             warnings.warn(
                 f"The variable {var_name} input in a catalog file is not present in the Dataset.",
                 UserWarning,
             )
-
 
     # this won't run for e.g. GFS which has multiple time variables
     # but also doesn't need to have the calendar updated
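The `EM_AVAILABLE` flag checked in the reformatted condition above follows the common optional-dependency pattern: attempt the import at module load, record success in a boolean, and raise only when the feature is actually needed. A generic sketch of that pattern — the function name and message below are placeholders, not the package's exact code:

```python
import importlib

# Probe for an optional dependency without failing at import time.
# "extract_model" mirrors the dependency named in the diff; any
# missing module exercises the same code path.
try:
    importlib.import_module("extract_model")
    EM_AVAILABLE = True
except ImportError:
    EM_AVAILABLE = False

def open_triangular_mesh(needs_em=True):
    """Raise only when the optional engine is actually required."""
    if needs_em and not EM_AVAILABLE:
        raise ModuleNotFoundError(
            "`extract_model` is required for the 'triangularmesh_netcdf' engine."
        )
    return "opened"
```

This defers the failure from import time to use time, so users who never touch FVCOM output are unaffected by the missing dependency.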
