Public contiguity checking#3144
Conversation
| 'where the discontiguity occurs. The ' \ | ||
| 'mesh created for this plot may cause a ' \ | ||
| 'misrepresentation of the input data. ' \ | ||
| 'Please use iris.contiguity_check(coord) ' \ |
There was a problem hiding this comment.
E501 line too long (80 > 79 characters)
| anchor.patch.set_boxstyle('round, pad=0, rounding_size=0.2') | ||
| axes = axes if axes else figure.gca() | ||
| axes.add_artist(anchor) | ||
|
|
| returned = find_discontiguities(coord) | ||
| self.assertEqual(expected, returned) | ||
|
|
||
| # TODO Make sure everything works when there are multiple discontiguities |
There was a problem hiding this comment.
E501 line too long (81 > 79 characters)
|
I can already see that I could do with making these changes:
|
|
|
||
|
|
||
| def _check_contiguity_and_bounds(coord, data, abs_tol=1e-4, transpose=False): | ||
| def _check_contiguity_and_mask(coord, data, abs_tol=1e-4, transpose=False): |
There was a problem hiding this comment.
I agree with the inclusion of mask but by not mentioning bounds it isn't clear that that's what it is referring to.
How about _check_bounds_contiguity_and_mask?
There was a problem hiding this comment.
Yeah that'll do. It's a bit of a mouthful but it's private anyway so it shouldn't matter.
| ' Matplotlib'.format(coord.name())) | ||
| discontiguity_warning = 'The bounds of the {} coordinate are ' \ | ||
| 'not contiguous and data is not masked ' \ | ||
| 'where the discontiguity occurs. The ' \ |
There was a problem hiding this comment.
Not a change you've made but now rereading through this intial sentence I see it doesn't make sense. We say discontiguous twice: "The bounds of the {} coordinate are not contiguous ... where the discontiguity occurs.".
There was a problem hiding this comment.
Oh I see, this says "there is a discontiguity and the data is not masked where that discontiguity is (or are)"
|
@lbdreyer The changes I made for the last commit prompted numpy to give me this warning: I'm not exactly sure what it's referring to, I have a horrible feeling it's something to do with the zip output, and that's been a really horrible nightmare to put into a user-friendly return format. We will obviously have to address this at some point, but I could do with your help with it I think. |
| """ | ||
|
|
||
| from __future__ import (absolute_import, division, print_function) | ||
| from six.moves import (filter, input, map, range, zip) # noqa |
There was a problem hiding this comment.
We are using these two lines as boilerplate for all our modules to help with Python 3 compatibility. (you'll find them at the top of all files.)
I think it would be best to keep all of them (even if they aren't being used) as otherwise we may forget to include them.
In fact that is what the #noqa tag is there for - to stop flake8 complaining about unused imports
There was a problem hiding this comment.
I didn't even realise it wasn't there. I'll pop it in.
There was a problem hiding this comment.
No idea how I removed that...
|
|
||
| (b) the :attr:`standard_name`, :attr:`long_name`, or | ||
| :attr:`var_name` of an instance of an instance of | ||
| :class:`iris.coords.Coord`. |
There was a problem hiding this comment.
If you are going to allow for a user to do this, you need to also add a line at the start that gets the coordinate.
This function starts by doing
_, diffs_x, diffs_y = iris.coords._discontiguity_in_2d_bounds(coord.bounds,
abs_tol)
so if that needs to be a coord instance by then
There was a problem hiding this comment.
Thing is, if we do allow that user to use a name instead of a coord object, then we have to pass the cube in as well so that we can get the coord. This seems a bit unnecessary, so it might make sense to only allow a coord object.
| try: | ||
| _check_contiguity_and_bounds(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol) | ||
| _check_contiguity_and_mask(coord, data=cube.data, |
There was a problem hiding this comment.
F821 undefined name '_check_contiguity_and_mask'
| if _check_contiguity_and_bounds(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol, | ||
| transpose=True) is True: | ||
| if _check_contiguity_and_mask(coord, data=cube.data, |
There was a problem hiding this comment.
F821 undefined name '_check_contiguity_and_mask'
| _check_contiguity_and_bounds(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol) | ||
| _check_bounds_contiguity_and_mask(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol) |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
| abs_tol=twodim_contig_atol, | ||
| transpose=True) is True: | ||
| if _check_bounds_contiguity_and_mask(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol, |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
| transpose=True) is True: | ||
| if _check_bounds_contiguity_and_mask(coord, data=cube.data, | ||
| abs_tol=twodim_contig_atol, | ||
| transpose=True) is True: |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
| absolute tolerance in degrees | ||
|
|
||
| Returns: | ||
| Dictionary of values representing data array indices where |
There was a problem hiding this comment.
I'm a bit 👎 on this.
Would it not be much simpler to return an boolean array of ok/bad-point values ?
Likewise in the code, avoid use of np.where + just use boolean arrays
bad = (diffs_x > abs_tol) | (diffs_y > abs_tol)
There was a problem hiding this comment.
Not really. Not simpler. You still have to move everything along by one point to the right for x-values and down for y-values because the bad point is after the discontiguity in the array. That's really where the complication arises.
There was a problem hiding this comment.
Also I don't see how that's any more use to the user. If they have a particularly large dataset then they will have to search through it for places where it's True anyway, so what's the point?
And also, would we return the boolean array for the data or for the coord points? What are you expecting the user to do with a boolean array?
| return None | ||
|
|
||
|
|
||
| def mask_discontiguities(coord, data, abs_tol=1e-4): |
There was a problem hiding this comment.
I thought if we did have a simple boolean array from the previous, this routine would not be needed..
Something like
badpts = find_discontiguities(coord)
points = ma.masked_array(coord.points)
points[badpts] = ma.masked
coord.points = points
There was a problem hiding this comment.
A boolean array would be fine for this because we don't need anything to be human-readable; we are just doing operations. But I still disagree with creating a boolean array for the previous.
There was a problem hiding this comment.
Also what do you mean by 'this routine would not be needed?'
This routine is pretty much as simple as it gets. The whole point of having the two separate routines is that one allows you to check which points are discontiguous (as a human) and look at them and think about them, and the other allows you to just mask them without thought. Those are both valid options and it seems stupid to just not offer one of them because that's simpler.
| # Construct an array the size of the points array filled with 'False' | ||
| # to represent an empty discontiguity array | ||
| self.empty_bool = np.zeros((self.lons.points.shape), | ||
| dtype=bool) |
There was a problem hiding this comment.
E127 continuation line over-indented for visual indent
|
Changes following last review:
|
|
@lbdreyer All tests passed. What do you think? |
lbdreyer
left a comment
There was a problem hiding this comment.
Looking good! I have a few comments, mostly renaming things more descriptive names.
The tests are in a bit of a jumble. Maybe keep the old tests where they are (as I have moved them in the PR I am working on, but feel free to do so in this PR instead), but the new tests that are introduced in this PR (e.g. for find_discontiguities and mask_discontiguities) should go in the appropriate places
|
|
||
| # If discontinuity occurs but not masked, any grid will be created | ||
| # incorrectly, so raise a warning | ||
| # incorrectly, so raise a warning for the user |
There was a problem hiding this comment.
You don't really need this clarification as all warnings are for the user
| 'Please use ' \ | ||
| 'iris.contiguity_check(coord) to find ' \ | ||
| 'and mask your data ' \ | ||
| 'appropriately'.format(coord.name()) |
There was a problem hiding this comment.
Don't you mean iris.util.find_discontiguities and iris.util.mask_discontiguities instead of iris.contiguity_check?
There was a problem hiding this comment.
Yep, this is leftover from before the full implementation.
There was a problem hiding this comment.
Is this going to end up a bit long and rambly?
| 'iris.contiguity_check(coord) to find ' \ | ||
| 'and mask your data ' \ | ||
| 'appropriately'.format(coord.name()) | ||
| warnings.warn(discontiguity_warning) |
There was a problem hiding this comment.
We have chosen to raise this as an error but I make that change in the next PR so we'll leave this as it is in this PR
There was a problem hiding this comment.
Ah. Good decision.
| try: | ||
| return bad_points_boolean | ||
| except ValueError: | ||
| return None |
There was a problem hiding this comment.
I assume you forgot to remove this? Otherwise it will catch all errors which would working out what is going wrong difficult.
There was a problem hiding this comment.
Yes I did. I needed this for an earlier implementation but it will never return None now (unless it's broken), so you're quite right.
| # awkward but supported cube. | ||
| cube = self.latlon_2d | ||
| result = iplt._draw_2d_from_bounds('pcolormesh', cube) | ||
| self.assertTrue(result) |
There was a problem hiding this comment.
draw_2d_from_bounds is too complicated to unit test. I suggest you just remove this test.
I am in the process of adding an integration test that will test draw_2d_from_bounds (which will at the same time check that it is not raising an error)
There was a problem hiding this comment.
Yeah, this was meant as a placeholder really, I think it was one of the first tests I put in, just to help me think. I'll take it out.
| Copy of the data array, masked at points where bounds are | ||
| discontiguous. | ||
| """ | ||
| mask = find_discontiguities(coord, abs_tol=abs_tol) |
There was a problem hiding this comment.
mask isn't that clear what it means as it doesn't refer to the cube's original mask, it refers to the locations of the discontiguities and so the new places that should be masked.
How about something like discontiguous_locations or discontiguous or discontiguous_bounds_locations... I can't really think of a good name.
There was a problem hiding this comment.
Maybe just new_mask or extra_mask
|
|
||
| with warnings.catch_warnings(): | ||
| warnings.simplefilter("error") | ||
| with self.assertRaises(UserWarning): |
There was a problem hiding this comment.
This really needs a assertRaisesRegexp to check it's getting the right warning`
| self.empty_bool = np.zeros(self.lons.points.shape, | ||
| dtype=bool) | ||
| self.masked = self.empty_bool.copy() | ||
| self.masked[3, 3] = True |
There was a problem hiding this comment.
Why not just use the mask of self.latlon_2d?
There was a problem hiding this comment.
And unmask it first? I suppose so. Don't see that it makes a lot of difference but I'll pop it in anyway.
There was a problem hiding this comment.
Oh no wait a minute, I'm not quite sure what I'm actually trying to achieve here, maybe I need to think about this a bit more and add some comments.
There was a problem hiding this comment.
I think I will need the array of Falses that I construct here because that's the easiest way to do it, but there is no need to then mask the single point I have done here. I can just use the cube's mask instead of constructing self.masked.
| def setUp(self): | ||
| self.latlon_2d = full2d_global() | ||
| make_bounds_discontiguous_at_point(self.latlon_2d, 3, 2) | ||
| self.lons = self.latlon_2d.coord('longitude') |
There was a problem hiding this comment.
I'd prefer more descriptive names, it would make reading the test easier, so:
self.latlon_2d_cube and self.lons_coord
|
|
||
| import numpy as np | ||
| import numpy.ma as ma | ||
| import warnings |
There was a problem hiding this comment.
F401 'warnings' imported but unused
| import warnings | ||
|
|
||
| import iris.coords as coords | ||
| import iris.util |
There was a problem hiding this comment.
F401 'iris.util' imported but unused
| # # with warnings.catch_warnings(): | ||
| # # warnings.simplefilter("error") | ||
| # # with self.assertRaises(UserWarning): | ||
| # # iris.plot._check_bounds_contiguity_and_mask(coord, cube.data) |
There was a problem hiding this comment.
E501 line too long (81 > 79 characters)
| # Construct an array the size of the points array filled with 'False' | ||
| # to represent a mask showing no discontiguities | ||
| empty_mask = np.zeros(self.longitude_coord.points.shape, | ||
| dtype=bool) |
There was a problem hiding this comment.
E127 continuation line over-indented for visual indent
|
|
||
| # Apply mask for x-direction discontiguities: | ||
| bad_points_boolean[:, :-1] = np.logical_or(bad_points_boolean[:, :-1], | ||
| gaps_x) |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
|
|
||
| # apply mask for y-direction discontiguities: | ||
| bad_points_boolean[:-1, :] = np.logical_or(bad_points_boolean[:-1, :], | ||
| gaps_y) |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
| import iris.plot | ||
| import iris.tests as tests | ||
|
|
||
| import numpy as np |
There was a problem hiding this comment.
F401 'numpy as np' imported but unused
|
|
||
| import iris.coords as coords | ||
| import iris.util | ||
| from iris.util import find_discontiguities_in_bounds |
There was a problem hiding this comment.
F401 'iris.util.find_discontiguities_in_bounds' imported but unused
| import iris.coords as coords | ||
| import iris.util | ||
| from iris.util import find_discontiguities_in_bounds | ||
| from iris.util import mask_data_at_discontiguities |
There was a problem hiding this comment.
F401 'iris.util.mask_data_at_discontiguities' imported but unused
| from iris.util import mask_data_at_discontiguities | ||
| from iris.tests.stock import simple_2d_w_multidim_coords as cube_2dcoords | ||
| from iris.tests.stock import simple_3d_w_multidim_coords as cube3d_2dcoords | ||
| from iris.tests.stock import simple_3d |
There was a problem hiding this comment.
F401 'iris.tests.stock.simple_3d' imported but unused
| from iris.tests.stock import simple_3d | ||
| from iris.tests.stock import sample_2d_latlons | ||
| from iris.tests.stock import make_bounds_discontiguous_at_point | ||
| from iris.tests.stock._stock_2d_latlons import grid_coords_2d_from_1d |
There was a problem hiding this comment.
F401 'iris.tests.stock._stock_2d_latlons.grid_coords_2d_from_1d' imported but unused
| # Construct an array the size of the points array filled with 'False' | ||
| # to represent a mask showing no discontiguities | ||
| empty_mask = np.zeros(self.longitude_coord.points.shape, | ||
| dtype=bool) |
There was a problem hiding this comment.
E127 continuation line over-indented for visual indent
|
Had some trouble with I have been finding that many coordinate points from This is possibly a good test for a relative tolerance option though... |
|
@lbdreyer I'm slightly worried about bits still being a problem, particularly test data. But for now, I'm afraid this is the best you're gonna get. Thanks for all your help and good luck getting to the finish with this. |
|
Hi @corinnebosley @lbdreyer However, discussing today with @lbdreyer I'm still not very sure about the API. It also seems to me that this should absolutely be checking both x- and y-coords of a given cube, or every use will be calling it twice ?? Rather more seriously ... I'm wondering now where + why we need this ?? |
Forget that -- I found out why today ! |
|
... and consider #3105 while getting this fixed. |
…orking pylint corrections, small corrections to zip nightmare and return format corrections for Laura reinstated imports of six that pycharm keeps secretly changing some changes on Laura's suggestions, more to come pylint changes more stickler requested changes (not pylint at all) some fixes and tweaks
582c164 to
29d0e95
Compare
PR moved on since suggested changes (some changes made, others discarded)
| rtol * cell_size) | ||
|
|
||
| points_close_enough = (diffs_along_axis <= (atol + | ||
| rtol * (lower_bounds - upper_bounds))) |
There was a problem hiding this comment.
E128 continuation line under-indented for visual indent
|
@pp-mo Still have to remove all references to discontiguity from |
|
@pp-mo I have now removed all references to discontiguity in Thanks! |
Looks fine to me 😃 ! |
Test discontiguity checking for a y-direction mismatch.
lbdreyer
left a comment
There was a problem hiding this comment.
I've been getting lots of notifications from this PR so I couldn't help but have a very very quick look, as I was interested in what the API would look like.
I have a couple quick comments about doc strings.
This PR contains:
plot.py:
test_2d_coords.py:
util.py:
Enjoy!