Skip to content

CMOR check generic level coordinates in CMIP6#598

Merged
valeriupredoi merged 18 commits into
masterfrom
dev_check_generic_levels
Nov 16, 2020
Merged

CMOR check generic level coordinates in CMIP6#598
valeriupredoi merged 18 commits into
masterfrom
dev_check_generic_levels

Conversation

@sloosvel

@sloosvel sloosvel commented Mar 31, 2020

Copy link
Copy Markdown
Contributor

For each generic level, all coordinates that have the same generic_level_name get loaded when reading the tables under a new CoordinateInfo attribute. The checker checks if the out_name and standard_name match any of the possible coordinates, and if so proceeds with the checks with the right coordinate.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #516

@sloosvel sloosvel added the cmor Related to the CMOR standard label Mar 31, 2020
@sloosvel sloosvel requested a review from jvegreg March 31, 2020 09:53
@sloosvel sloosvel marked this pull request as ready for review May 19, 2020 09:37
@rswamina

rswamina commented Jun 4, 2020

Copy link
Copy Markdown

@sloosvel - Am I correct in understanding that these fixes will fix the problem of variables having different aliases in standard names (e.g. mrsos has moisture_content_of_soil_layer
or mass_content_of_water_in_soil_layer) as well as the problem that some models like IPSL have where ocean levels is not "plev" or "lev" but "olev"? I will be interested in knowing when this branch will be merged.

@sloosvel

sloosvel commented Jun 5, 2020

Copy link
Copy Markdown
Contributor Author

fix the problem of variables having different aliases in standard names (e.g. mrsos has moisture_content_of_soil_layer or mass_content_of_water_in_soil_layer)

Not really. This pull request does not affect or modify the checking of variable's standard names, it just adds a check for generic levels in CMIP6.

the problem that some models like IPSL have where ocean levels is not "plev" or "lev" but "olev"

Thanks for the reference, I checked the dataset and there are so many wrong things with it. It seems like it's not reporting the error because I assumed that the coordinate would have at least a standard_name, and it that case it does not even have that. I will need to rework the code a bit to make it work.

@rswamina

rswamina commented Jun 5, 2020

Copy link
Copy Markdown

@jvegasbsc and @sloosvel - I followed the thread of an original ESMValTool issue regarding variable alias management to this one, via issue ESMValGroup/ESMValTool#950 . This is with reference to the soil moisture variable mrsos having two aliases for short name - "moisture_content_of_soil_layer" and "mass_content_of_water_in_soil_layer". The UKESM and MRI model data for mrsos are two examples where the latter alias causes the esmvaltool preprocessor to fail. Will these be fixed ?

@jvegreg jvegreg added this to the v2.0.0 milestone Jun 8, 2020
@sloosvel

Copy link
Copy Markdown
Contributor Author

I think it's a mistake on the reference, this pull request has nothing to do with the issue with the aliases. Maybe you are looking for #595 ?

Comment thread esmvalcore/cmor/check.py Outdated
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
@bouweandela

Copy link
Copy Markdown
Member

@jvegasbsc @mattiarighi Today is the deadline for anything that is included in the first version 2 release. Would you have time to review/test this today? If not, this can go into the next release.

Javier Vegas-Regidor added 2 commits July 1, 2020 13:20

@jvegreg jvegreg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tweaked a for loop.

Nice work, @sloosvel

@bouweandela

Copy link
Copy Markdown
Member

I just tried to test this and got the following error message:

thetao: does not match coordinate rank

Is there indeed an error in this dataset, or is the check too strict?
ncdump of the input data
recipe_python.yml

@sloosvel

sloosvel commented Jul 2, 2020

Copy link
Copy Markdown
Contributor Author

I guess this is because of this attribute?

thetao:coordinates = "lat lon" ;. It's missing the depth.

@bouweandela bouweandela modified the milestones: v2.0.0, v2.1.0 Jul 2, 2020
@jvegreg

jvegreg commented Jul 2, 2020

Copy link
Copy Markdown
Contributor

I guess this is because of this attribute?

thetao:coordinates = "lat lon" ;. It's missing the depth.

It is ok, lev is a dimension, so you don't need to specify it in the coordinates

@jvegreg

jvegreg commented Jul 2, 2020

Copy link
Copy Markdown
Contributor

@sloosvel , take a look at check_rank:

def _check_rank(self):

Maybe var_info.coords contains all possible generic levels, making it expect too much dimensions?

@schlunma

schlunma commented Oct 5, 2020

Copy link
Copy Markdown
Contributor

Because in master the level coordinate is not being checked because of:

           # Cannot check generic_level coords as no CMOR information
            if coordinate.generic_level:
                continue

The point of this pull request is to start checking generic level coordinates. Which probably means more errors will be reported, considering how "diverse" datasets are.

Ahh, alright, that makes totally sense. Sorry for the misunderstanding. So I guess this PR is working.

However, I'm not sure if we should merge this now without providing fixes to the corresponding datasets, since this PR will introduce new errors that were not present in the old master. However, I do not know how to fix these issues (i.e. how to reasonably project the coordinate values outside of [0,1] into [0,1]).

@schlunma

schlunma commented Oct 5, 2020

Copy link
Copy Markdown
Contributor

Found 3 more errors (for the "full" time range 1980-2014 but also for 1980-1980):

CESM2

esmvalcore.cmor.check.CMORCheckError: There were errors in variable cl:
lev: units should be 1, not hPa
 lev: has values < valid_min = 0.0

CESM2-WACCM

esmvalcore.cmor.check.CMORCheckError: There were errors in variable cl:
lev: units should be m, not hPa
 lev: has values < valid_min = 0.0

CESM2-FV2

esmvalcore.cmor.check.CMORCheckError: There were errors in variable cl:
lev: units should be 1, not hPa
 lev: has values < valid_min = 0.0

@bouweandela

Copy link
Copy Markdown
Member

However, I'm not sure if we should merge this now without providing fixes to the corresponding datasets

Good point. Let's wait with merging this until the feature freeze (@valeriupredoi and I will do that around 3 PM CEST today), if everything is working as expected here we can get this merged after and have an opportunity to implement the required fixes in the next few months, before the next release.

@sloosvel

sloosvel commented Oct 5, 2020

Copy link
Copy Markdown
Contributor Author

I am bit worried because esmvaltool run ~/recipe_generic_levels.yml --max-years 2 --skip-nonexistent with the complete example that you send HadGEM fails. If I comment all the others, it goes thorugh, both with --max-years and without

I think there is an ambiguity problem with this particular variable. HadGem or UKESM have the alevel coordinate set to atmosphere_hybrid_height_coordinate, whereas dataset BCC-ESM1 has it set to atmosphere_hybrid_sigma_pressure_coordinate.
And those two coordinates have different standards regarding units, valid_min and max, etc. There is no way to know beforehand if the alevel is one or the other, because both are valid standard names for alevel.

If the recipe only contains datasets that share the standard name, it will pass the checks without problems. For HadGem and UKESM the level coordinate for cl will be assumed to be atmosphere_hybrid_height_coordinate, and since that is the case for both datasets, the recipe will finish succesfully.

If you run a recipe with BCC and HadGem (different standard names), the checker will assume that the right coordinate is the one from the dataset that checks first. So if BCC is checked first, HadGem will fail because the alevel coordinate will checked against to atmosphere_hybrid_sigma_pressure_coordinate.

So what should be done in those cases? Overwrite the cmor information for every dataset?

@jvegreg

jvegreg commented Oct 5, 2020

Copy link
Copy Markdown
Contributor

So what should be done in those cases? Overwrite the cmor information for every dataset?

Yes, the check should be independent for each dataset.

@sloosvel

sloosvel commented Oct 5, 2020

Copy link
Copy Markdown
Contributor Author

With the latest changes, I can run UKESM and HadGEM without failing anymore when called with other models that have a different standard name for the lev variable .

CESM2 is not passing the checker, but that is expected with the new checks because there is not any alevel coordinate with units of 'hPa' and the valid_min is '0.0' for all alevel . Other CESM datasets I do not have access to, but I would expect them to make the checker fail because of the same reason as CESM2. So fixes for those datasets should be added.

I also don't have access to CAMS-CSM1-0, CAS-ESM2-0, CMCC-CM2-SR5, E3SM-1-0, FGOALS-g3, GISS-E2-1-H, MIROC-ES2L, MPI-ESM-1-2-HAM, MPI-ESM1-2-LR, NESM3, NorESM2-LM, NorESM2-MM, SAM0-UNICON and TaiESM1. But I will try to test them tomorrow.

@valeriupredoi

Copy link
Copy Markdown
Contributor

picking up from #859 for CESM2/thetao I am getting this:

esmvalcore.cmor.check.CMORCheckError: There were errors in variable thetao:
Coordinate olevel has wrong standard_name or is not set.
in cube:
sea_water_potential_temperature / (degC) (time: 1980; lev: 60; cell index along second dimension: 384; cell index along first dimension: 320)

loading the input data off ESGF:

c = iris.load_cube("/badc/cmip6/data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/thetao/gn/latest/thetao_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc")

indeed for lev thse standard_name is None - good work @sloosvel - can we merge this now or are we waiting for anything else? Note that I ran it through various check levels and they behave as expected 👍 Also ran recipe_python.yml and another recipe with UKESM1-0-LL - no problemos! 🍺

@valeriupredoi valeriupredoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's merge this, it's a very useful feature 👍

@sloosvel

Copy link
Copy Markdown
Contributor Author

The idea was to add fixes before merging I think. Because some of the coordinates like in #840 have wrong values and it has gone undetected. But in some cases it's not that straightforward to fix.

@valeriupredoi

Copy link
Copy Markdown
Contributor

no no - it is beyond the scope of this PR to add fixes, plus, it'd bloat it out of proportion - this is a checker enhancement, if you can add an automated fix in _fix.py then yeah but it's not that straightforward; e.g. I will add a fix for CESM2 as per #859 but only after this gets merged

@sloosvel

Copy link
Copy Markdown
Contributor Author

Then I guess it can be merged? I think that the problem with mixing datasets above got solved as well.

@valeriupredoi

Copy link
Copy Markdown
Contributor

yep - @bouweandela and @jvegasbsc as reviewers you OK with merge? @schlunma as well, since you've noticed that issue?

@jvegreg

jvegreg commented Nov 16, 2020

Copy link
Copy Markdown
Contributor

Yes, go ahead

@valeriupredoi valeriupredoi merged commit a6f35b0 into master Nov 16, 2020
@valeriupredoi valeriupredoi deleted the dev_check_generic_levels branch November 16, 2020 13:24
@valeriupredoi

Copy link
Copy Markdown
Contributor

done, cheers Saskia and Javi! Manu man, if you still see that issue, forget about it - just kidding, open an Issue pls, bro 🍺

@schlunma

schlunma commented Nov 16, 2020

Copy link
Copy Markdown
Contributor

I think we should have waited with the merge...That PR broke at least one recipe (https://github.com/ESMValGroup/ESMValTool/blob/master/esmvaltool/recipes/recipe_ecs_constraints.yml) and maybe others that we are not aware of.

See ESMValGroup/ESMValTool#1903

@valeriupredoi

Copy link
Copy Markdown
Contributor

@schlunma this is a chicken-egg situation - we need to fry the chicken now ie fix the data that gets into those recipes because inherently those recipes may be wrong since the data that gets in and analyzed may be wrong. Cheers for opening the issue! 🍺 But an enhanced functionality in Core should not wait on the fix(es) needed in Tool 👍

@schlunma

Copy link
Copy Markdown
Contributor

I still have this weird issue that (I think) depends on the order of which the datasets are read:

The following recipe sometimes works, sometimes fails with

esmvalcore.cmor.check.CMORCheckError: There were errors in variable cl:
lev: units should be 1, not m
 lev: has values > valid_max = 1.0

for HadGEM3-GC31-MM and sometimes with the same error for ACCESS-ESM1-5.

Each of theses dataset does NOT throw an error when used alone.

What is going on here???

The recipe:

# ESMValTool
---
documentation:
  authors:
    - schlund_manuel

  maintainer:
    - schlund_manuel

  references:
    - gregory04grl

  projects:
    - crescendo


diagnostics:

  diag_test:
    variables:
      cl:
        project: CMIP6
        exp: historical
        mip: Amon
        start_year: 2004
        end_year: 2004
    additional_datasets:
      - {dataset: ACCESS-ESM1-5,   ensemble: r1i1p1f1, grid: gn}
      - {dataset: GISS-E2-1-G,     ensemble: r1i1p1f1, grid: gn}
      - {dataset: GISS-E2-1-H,     ensemble: r1i1p1f1, grid: gn}
      - {dataset: HadGEM3-GC31-LL, ensemble: r1i1p1f3, grid: gn}
      - {dataset: HadGEM3-GC31-MM, ensemble: r1i1p1f3, grid: gn}
    scripts:
      null

@bouweandela

Copy link
Copy Markdown
Member

@schlunma To create more visibility for this, it might be good to create an issue for that instead of commenting on closed pull request.

@schlunma

Copy link
Copy Markdown
Contributor

Makes sense, see #883

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cmor Related to the CMOR standard

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CMOR checker does not check CMIP6 generic levels

6 participants