Deal with different data_specs_versions in CMIP6 #159

Description

@zklaus

As discussed in #158, there is a problem that we are slowly approaching: CMIP6 data will not follow a uniform data_specs_version. This matters because data from different models will, among other things, have different standard_names. For example, the two already published datasets

{dataset: GISS-E2-1-G, exp: historical, ensemble: r1i1p1f1, mip: Amon, short_name: psl, grid: gn}
{dataset: GFDL-CM4, exp: historical, ensemble: r1i1p1f1, mip: Amon, short_name: psl, grid: gr1}

use standard_name air_pressure_at_sea_level and air_pressure_at_mean_sea_level, respectively.

Both of them do so following the data_specs_version given in their files, namely 01.00.23 and 01.00.27, respectively. But how do we compare data like this?

The oldest data_specs_version I have seen in CMIP6 data is 01.00.23, but there is no guarantee that there isn't an older one, and I have not checked what has changed since then.

I am not sure what, if anything, to do about it right now, but at some point we will run into this kind of mismatch.
Broadly speaking, there seem to be three approaches:

  • Do nothing
  • Support multiple data spec versions
  • Choose one data spec version and provide "fixes" to align other data with that.

@jvegasbsc already commented in #158 in support of the third option, saying

Choose one data spec version and provide "fixes" to align other data with that.

I prefer this, with one condition: we choose the latest released version and keep updating. We should assume that changes are made for a reason. Most variables will be the same in all versions anyway, so I think the number of fixes will be manageable. We can add fixes at the PROJECT level that are applied after the MODEL fixes to convert data to the latest version of the request.

In your example, we can add a PROJECT fix that changes the standard_name for psl to the new one automatically if it finds the old one.
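A minimal sketch of what such a PROJECT-level fix could look like, assuming metadata is available as a plain dictionary. The names `OLD_TO_NEW_STANDARD_NAME`, `fix_model_metadata`, and `fix_project_metadata` are hypothetical and only illustrate the idea; they are not existing functions of the tool.

```python
# Hypothetical mapping per short_name: outdated standard_name -> standard_name
# used by the chosen (latest) data_specs_version. Only psl is taken from this issue.
OLD_TO_NEW_STANDARD_NAME = {
    "psl": {"air_pressure_at_sea_level": "air_pressure_at_mean_sea_level"},
}


def fix_model_metadata(attrs):
    """Placeholder for MODEL-specific fixes (assumed to run first)."""
    return dict(attrs)


def fix_project_metadata(short_name, attrs):
    """PROJECT-level fix: rename an outdated standard_name if one is found."""
    fixed = dict(attrs)  # work on a copy, leave the input untouched
    renames = OLD_TO_NEW_STANDARD_NAME.get(short_name, {})
    old_name = fixed.get("standard_name")
    if old_name in renames:
        fixed["standard_name"] = renames[old_name]
    return fixed


def apply_fixes(short_name, attrs):
    """Apply MODEL fixes first, then the PROJECT fix, as proposed above."""
    return fix_project_metadata(short_name, fix_model_metadata(attrs))


# Metadata as found in the two datasets mentioned in this issue.
giss = {"standard_name": "air_pressure_at_sea_level",
        "data_specs_version": "01.00.23"}
gfdl = {"standard_name": "air_pressure_at_mean_sea_level",
        "data_specs_version": "01.00.27"}

giss_fixed = apply_fixes("psl", giss)
gfdl_fixed = apply_fixes("psl", gfdl)
```

After the fix, both datasets carry the same standard_name and can be compared; data that already uses the new name passes through unchanged.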

What do other people think?