As discussed in #158, there is a problem that we are slowly approaching: CMIP6 data will not follow a uniform data_specs_version. This matters because data from different models can then have different standard_names for the same variable. For example, the two already published datasets
{dataset: GISS-E2-1-G, exp: historical, ensemble: r1i1p1f1, mip: Amon, short_name: psl, grid: gn}
{dataset: GFDL-CM4, exp: historical, ensemble: r1i1p1f1, mip: Amon, short_name: psl, grid: gr1}
use standard_name air_pressure_at_sea_level and air_pressure_at_mean_sea_level, respectively.
Both of them follow the data_specs_version given in their files, namely 01.00.23 and 01.00.27, respectively. But how do we compare data like this?
The oldest data_specs_version I have seen in CMIP6 data is 01.00.23, but there is no guarantee that there isn't an older one, and I have not checked what has changed since then.
I am not sure what, if anything, to do about it right now, but at some point we will get this kind of error.
Broadly speaking there seem to be three approaches:
- Do nothing
- Support multiple data spec versions
- Choose one data spec version and provide "fixes" to align other data with that.
@jvegasbsc already commented in #158 in support of the third option, saying:
> Choose one data spec version and provide "fixes" to align other data with that.
>
> I prefer this, with one condition: we choose last version released and keep updating. We should assume that changes are done for a reason. Most variables will be the same in all versions anyway, so I think the number of fixes will be manageable. We can add fixes at PROJECT level that are applied after MODEL fixes to convert data to the latest version of the request.
> In your example, we can add a PROJECT fix that changes the standard_name for psl to the new one automatically if it finds the old one.
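To make the third option more concrete, here is a rough sketch of what such a PROJECT-level fix could look like for the psl example. It is plain Python on attribute dictionaries and not the actual fix interface; `STANDARD_NAME_UPDATES` and `apply_project_fix` are hypothetical names made up for illustration, and the sketch assumes 01.00.27 is the version we align to.

```python
# Hypothetical table: short_name -> {outdated standard_name: standard_name
# used in the data_specs_version we align to (assumed here: 01.00.27)}.
STANDARD_NAME_UPDATES = {
    "psl": {"air_pressure_at_sea_level": "air_pressure_at_mean_sea_level"},
}


def apply_project_fix(short_name, attributes):
    """Return a copy of `attributes` with an outdated standard_name replaced.

    Attributes that already use the current name are returned unchanged.
    """
    fixed = dict(attributes)
    renames = STANDARD_NAME_UPDATES.get(short_name, {})
    old_name = fixed.get("standard_name")
    if old_name in renames:
        fixed["standard_name"] = renames[old_name]
    return fixed


# Attributes of the two datasets from the example above.
giss = {
    "standard_name": "air_pressure_at_sea_level",
    "data_specs_version": "01.00.23",
}
gfdl = {
    "standard_name": "air_pressure_at_mean_sea_level",
    "data_specs_version": "01.00.27",
}

giss_fixed = apply_project_fix("psl", giss)
gfdl_fixed = apply_project_fix("psl", gfdl)

# After the fix both datasets carry the same standard_name.
print(giss_fixed["standard_name"] == gfdl_fixed["standard_name"])
```

Keeping the renames in a per-variable table means a new data_specs_version only requires adding entries, and the fix does nothing for data that already follows the target version.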
What do other people think?