-
Notifications
You must be signed in to change notification settings - Fork 48
Dataset fixes for MSWEP #969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
15 commits
Select commit
Hold shift + click to select a range
c12a02d
Implement fixes for monthly MSWEP data
stefsmeets 725c94f
Use lazy operation for faster loading of data
stefsmeets 8bb4f73
Add fixes for daily data
stefsmeets 380b96d
Fix longitude, bounds, and re-organize code
stefsmeets 44128b1
Add tests for MSWEP data [WIP]
stefsmeets cbca174
Merge branch 'master' into mswep_fixes
stefsmeets 1163f3c
Make data rolling more robust
stefsmeets dd1cab4
Complete tests for daily/monthly data
stefsmeets a604dd5
Add support for `3hr` frequency (same fixes as daily)
stefsmeets 4e54a7f
Fix some codacy issues
stefsmeets cedc5ec
Update tests/integration/cmor/_fixes/native6/test_mswep.py
stefsmeets 270818e
Add documentation for MSWEP support
stefsmeets c4a811c
Tweak text
stefsmeets d909a9a
'Fix' Codacy issues
stefsmeets 9bbca2f
Remove unused function call
stefsmeets File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,129 @@ | ||
| """Fixes for MSWEP.""" | ||
| from datetime import datetime | ||
|
|
||
| import cf_units | ||
| import numpy as np | ||
| from cf_units import Unit | ||
|
|
||
| from ..fix import Fix | ||
|
|
||
|
|
||
| def fix_time_month(cube): | ||
| """Fix time coordinates for monthly values. | ||
|
|
||
| Convert from months since 1899-12 to days since 1850 as per CMOR | ||
| standard. | ||
| """ | ||
| time_coord = cube.coord('time') | ||
| origin = time_coord.units.origin | ||
|
|
||
| origin_year, origin_month = [ | ||
| int(val) for val in origin.split()[2].split('-') | ||
| ] | ||
|
|
||
| dates = [] | ||
|
|
||
| for time_point in time_coord.points: | ||
| new_year = origin_year + (origin_month - 1 + time_point) // 12 | ||
| new_month = (origin_month - 1 + time_point) % 12 + 1 | ||
| dates.append(datetime(int(new_year), int(new_month), 15)) | ||
|
|
||
| t_unit = cf_units.Unit("days since 1850-01-01", calendar="standard") | ||
|
|
||
| cube.coord('time').points = t_unit.date2num(dates) | ||
| cube.coord('time').units = t_unit | ||
|
|
||
|
|
||
| def fix_time_day(cube): | ||
| """Fix time coordinates for monthly values. | ||
|
|
||
| Convert from days since 1899-12-31 to days since 1850 as per CMOR | ||
| standard. | ||
| """ | ||
| time_coord = cube.coord('time') | ||
| time_coord.convert_units('days since 1850-1-1 00:00:00.0') | ||
|
|
||
|
|
||
| def fix_longitude(cube): | ||
| """Fix longitude coordinate from -180:180 to 0:360.""" | ||
| lon_axis = cube.coord_dims('longitude') | ||
| lon = cube.coord(axis='X') | ||
|
|
||
| if not lon.is_monotonic(): | ||
| raise ValueError("Data must be monotonic to fix longitude.") | ||
|
|
||
| # roll data because iris forces `lon.points` to be strictly monotonic. | ||
| shift = np.sum(lon.points < 0) | ||
| points = np.roll(lon.points, -shift) % 360 | ||
| cube.data = np.roll(cube.core_data(), -shift, axis=lon_axis) | ||
|
|
||
| lon.points = points | ||
|
|
||
|
|
||
| class Pr(Fix): | ||
| """Fixes for pr.""" | ||
|
|
||
| def fix_metadata(self, cubes): | ||
| """Fix metadata.""" | ||
| for cube in cubes: | ||
| self._fix_names(cube) | ||
| self._fix_units(cube) | ||
| self._fix_time(cube) | ||
| fix_longitude(cube) | ||
| self._fix_bounds(cube) | ||
|
|
||
| return cubes | ||
|
|
||
| def _fix_time(self, cube): | ||
| """Fix time.""" | ||
| frequency = self.vardef.frequency | ||
|
|
||
| if frequency in ('day', '3hr'): | ||
| fix_time_day(cube) | ||
| elif frequency == 'mon': | ||
| fix_time_month(cube) | ||
| else: | ||
| raise ValueError(f'Cannot fix time for frequency: {frequency!r}') | ||
|
|
||
| def _fix_units(self, cube): | ||
| """Convert units from mm/[t] to kg m-2 s-1 units.""" | ||
| frequency = self.vardef.frequency | ||
|
|
||
| cube.units = Unit(self.vardef.units) | ||
|
|
||
| if frequency in ('day', '3hr'): | ||
| # divide by number of seconds in a month | ||
| cube.data = cube.core_data() / 60 * 60 * 24 * 30 | ||
| elif frequency == 'mon': | ||
| # divide by number of seconds in a day | ||
| cube.data = cube.core_data() / 60 * 60 * 24 | ||
| else: | ||
| raise ValueError(f'Cannot fix units for frequency: {frequency!r}') | ||
|
|
||
| def _fix_bounds(self, cube): | ||
| """Add bounds to coords.""" | ||
| coord_defs = tuple(coord_def | ||
| for coord_def in self.vardef.coordinates.values()) | ||
|
|
||
| for coord_def in coord_defs: | ||
| if not coord_def.must_have_bounds == 'yes': | ||
| continue | ||
|
|
||
| coord = cube.coord(axis=coord_def.axis) | ||
|
|
||
| if coord.bounds is None: | ||
| coord.guess_bounds() | ||
|
|
||
| def _fix_names(self, cube): | ||
| """Fix miscellaneous.""" | ||
| cube.var_name = self.vardef.short_name | ||
| cube.standard_name = self.vardef.standard_name | ||
| cube.long_name = self.vardef.long_name | ||
|
|
||
| coord_defs = tuple(coord_def | ||
| for coord_def in self.vardef.coordinates.values()) | ||
|
|
||
| for coord_def in coord_defs: | ||
| coord = cube.coord(axis=coord_def.axis) | ||
| if not coord.long_name: | ||
| coord.long_name = coord_def.long_name |
Binary file not shown.
Binary file not shown.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,133 @@ | ||
| """Tests for the fixes of MSWEP.""" | ||
| from pathlib import Path | ||
|
|
||
| import iris | ||
| import numpy as np | ||
| import pytest | ||
|
|
||
| from esmvalcore.cmor._fixes.native6.mswep import ( | ||
| Pr, | ||
| fix_longitude, | ||
| fix_time_day, | ||
| fix_time_month, | ||
| ) | ||
| from esmvalcore.cmor.fix import Fix | ||
| from esmvalcore.cmor.table import CMOR_TABLES | ||
|
|
||
|
|
||
| @pytest.mark.parametrize('mip_table', ('Amon', 'day')) | ||
| def test_get_pr_fix(mip_table): | ||
| """Test whether the right fix gets found.""" | ||
| fix = Fix.get_fixes('native6', 'MSWEP', mip_table, 'pr') | ||
| assert isinstance(fix[0], Pr) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def cube_month(): | ||
| """Return extract from mswep monthly data (shape 3x5x5).""" | ||
| # out = cube[0:3, 0:360:72, 0:720:144] | ||
| # iris.save(out, 'mswep_month.nc') | ||
| path = Path(__file__).with_name('mswep_month.nc') | ||
| return iris.load_cube(str(path)) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def cube_day(): | ||
| """Return extract from mswep daily data (shape 3x5x5).""" | ||
| # out = cube[0:3, 0:360:72, 0:720:144] | ||
| # iris.save(out, 'mswep_day.nc') | ||
| path = Path(__file__).with_name('mswep_day.nc') | ||
| return iris.load_cube(str(path)) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def fix_month(): | ||
| """Return fix for monthly pr data.""" | ||
| cmor_table = CMOR_TABLES['native6'] | ||
| vardef = cmor_table.get_variable('Amon', 'pr') | ||
| return Pr(vardef) | ||
|
|
||
|
|
||
| @pytest.fixture | ||
| def fix_day(): | ||
| """Return fix for daily pr data.""" | ||
| cmor_table = CMOR_TABLES['native6'] | ||
| vardef = cmor_table.get_variable('day', 'pr') | ||
| return Pr(vardef) | ||
|
|
||
|
|
||
| def test_fix_names(fix_month, cube_month): | ||
| """Test `Pr._fix_names`.""" | ||
| fix_month._fix_names(cube_month) | ||
|
|
||
| vardef = fix_month.vardef | ||
|
|
||
| assert cube_month.var_name == vardef.short_name | ||
| assert cube_month.long_name == vardef.long_name | ||
| assert cube_month.standard_name == vardef.standard_name | ||
|
|
||
| coord_defs = tuple(coord_def for coord_def in vardef.coordinates.values()) | ||
|
|
||
| for coord_def in coord_defs: | ||
| coord = cube_month.coord(axis=coord_def.axis) | ||
| assert coord.long_name == coord_def.long_name | ||
|
|
||
|
|
||
| def test_fix_units_month(fix_month, cube_month): | ||
| """Test `Pr._fix_units_month`.""" | ||
| fix_month._fix_units(cube_month) | ||
| assert cube_month.units == fix_month.vardef.units | ||
|
|
||
|
|
||
| def test_fix_units_day(fix_day, cube_day): | ||
| """Test `Pr._fix_units_day`.""" | ||
| fix_day._fix_units(cube_day) | ||
| assert cube_day.units == fix_day.vardef.units | ||
|
|
||
|
|
||
| def test_fix_time_month(cube_month): | ||
| """Test `fix_time_month`.""" | ||
| fix_time_month(cube_month) | ||
|
|
||
| time = cube_month.coord('time') | ||
| assert time.units == 'days since 1850-01-01' | ||
|
|
||
|
|
||
| def test_fix_time_day(cube_day): | ||
| """Test `fix_time_day`.""" | ||
| fix_time_day(cube_day) | ||
|
|
||
| time = cube_day.coord('time') | ||
| assert time.units == 'days since 1850-01-01' | ||
|
|
||
|
|
||
| def test_fix_longitude(fix_month, cube_month): | ||
| """Test `Pr._fix_longitude`.""" | ||
| unfixed_data = cube_month.data.copy() | ||
| unfixed_lon = cube_month.coord(axis='X') | ||
| shift = (unfixed_lon.points < 0).sum() | ||
|
|
||
| fix_longitude(cube_month) | ||
|
|
||
| lon = cube_month.coord(axis='X') | ||
|
|
||
| assert lon.is_monotonic | ||
|
|
||
| coord_def = fix_month.vardef.coordinates['longitude'] | ||
| valid_min = float(coord_def.valid_min) | ||
| valid_max = float(coord_def.valid_max) | ||
|
|
||
| assert lon.points.min() >= valid_min | ||
| assert lon.points.max() <= valid_max | ||
|
|
||
| # make sure that data are rolled correctly along lon axis | ||
| assert np.all(unfixed_data[:, :, 0] == cube_month.data[:, :, -shift]) | ||
|
|
||
|
|
||
| def test_fix_bounds(fix_month, cube_month): | ||
| """Test `Pr._fix_bounds`.""" | ||
| fix_month._fix_bounds(cube_month) | ||
|
|
||
| for axis in 'XYT': | ||
| coord = cube_month.coord(axis=axis) | ||
| assert coord.has_bounds() |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a general structure now for all datasets or specific to MSWEP? As in: do we need the same for ERA5?
Also: would it make sense to use the actual version ("V220") instead of the latest version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just put ERA5 there so the documentation is complete, I agree that this is best done in a separate PR for documentation fixes. Putting
latestversionin the path always works, whereas puttingV220may go out of date at some point.This is not meant as a general structure, just something that should be complete for now. When more datasets will be supported, we can think about how to better organize the documentation there.