Skip to content

Moved extraction of reference datasets for (horizontal and vertical) regridding into preprocessor functions#1455

Closed
schlunma wants to merge 4 commits into
mainfrom
fix_target_levels_before_using
Closed

Moved extraction of reference datasets for (horizontal and vertical) regridding into preprocessor functions#1455
schlunma wants to merge 4 commits into
mainfrom
fix_target_levels_before_using

Conversation

@schlunma

@schlunma schlunma commented Feb 2, 2022

Copy link
Copy Markdown
Contributor

Description

This PR moves the extraction of reference datasets used for horizontal and vertical regridding to the dedicated preprocessor functions. This allows us to specify reference datasets that need to be (automatically) downloaded first.

In addition, it adds fixes prior to the loading step of the reference dataset for horizontal regridding, similar to what is already done for the target levels for vertical regridding.

Closes #1454
Closes #56

Link to documentation:


Before you get started

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.


To help with the number pull requests:

@schlunma schlunma added the bug Something isn't working label Feb 2, 2022
@schlunma schlunma added this to the v2.5.0 milestone Feb 2, 2022
@schlunma schlunma self-assigned this Feb 2, 2022
@schlunma

schlunma commented Feb 2, 2022

Copy link
Copy Markdown
Contributor Author

@valeriupredoi @zklaus Before I adapt the tests here, can I briefly get your feedback on this implementation? Does it make sense in your opinion?

@valeriupredoi

Copy link
Copy Markdown
Contributor

thanks a bunch @schlunma - looking at it now, sorry it escaped my wits yesterday!

@valeriupredoi valeriupredoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

draft request changes to a draft PR, not gonna be anal before it's actually RfR 😁

Comment thread esmvalcore/preprocessor/_regrid.py Outdated
----
If ``levels`` is a ``dict`` and it does not contain the key ``filename``,
it is automatically assumed that you specified the target grid with
``start_longitude``, ``end_longitude``, etc. If a valid reference dataset

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait what? A grid is specified by an MxN specification, not by defining a box, am confusado here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah nevermind, it's the _spec_to_latlonvals() stuff - can you add a pointer to that func here maybe, that confused me, prob gonna confuse somebody else too

@valeriupredoi

Copy link
Copy Markdown
Contributor

so here's the thing - as I understand it, your are now passing all the basic ingredients to the regrid preprocessor, rather than have some bits done first in recipe and then others (like the reference level extraction) be done via the regridder preprocessor - I like it! I like that stuff gets done by the module that needs to do it, and not outside it. What happens if there's an issue with the reference dataset, other than a CMOR issue, that might not even be picked up if the user chooses to run with all CMOR shields down? Will that surface only when the regridder is called? That might be after a lot of stuff has already been computed

Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
@schlunma

schlunma commented Feb 3, 2022

Copy link
Copy Markdown
Contributor Author

What happens if there's an issue with the reference dataset, other than a CMOR issue, that might not even be picked up if the user chooses to run with all CMOR shields down? Will that surface only when the regridder is called? That might be after a lot of stuff has already been computed

It will surface once the reference dataset has entered the preprocessor chain, so probably a little bit earlier than the actual regridding. I agree that this is not ideal because it also increases the computation times by duplicating the loading for every dataset, but I don't see another way right now.

Something that might be possible is to add an additional step between the check for data availability (and the downloading) and the actual start of the preprocessing, but this is something I can't do for v2.5.

I'm not sure how to proceed. Should I try to get this in or leave it to v2.6 and come up with a cleaner solution?

@valeriupredoi

Copy link
Copy Markdown
Contributor

thanks for the note, Manu! The question is what's the immediate benefit of this vs the cost it introduces and vs a longer mulling over and possible implementation of a download me/check me/use me function for datasets before the preprocessor starts? I'd also want to hear what @zklaus and @bouweandela think about this 🍺

@schlunma

schlunma commented Feb 3, 2022

Copy link
Copy Markdown
Contributor Author

The short-term benefit is that the bug mentioned in #1454 is fixed.

But after giving this some more thoughts I think this should be implemented properly, e.g., by adding a function _load_reference_datasets() between the download and run here:

def run(self):
"""Run all tasks in the recipe."""
self.write_filled_recipe()
if not self.tasks:
raise RecipeError('No tasks to run!')
# Download required data
if not self._cfg['offline']:
esgf.download(self._download_files, self._cfg['download_dir'])
self.tasks.run(max_parallel_tasks=self._cfg['max_parallel_tasks'])
self.write_html_summary()

Maybe I can address this for v2.5, but if not we can do a bugfix or even postpone this to v2.6, I don't think it's that urgent.

@valeriupredoi

Copy link
Copy Markdown
Contributor

thanks, Manu! I am of the personal opinion that this should not be hurried up, but rather done in a take it easy, test proper manner - up to you and the others, am not gonna block progress 🇨🇳 😁

@schlunma schlunma modified the milestones: v2.5.0, v2.6.0 Feb 4, 2022
@schlunma

Copy link
Copy Markdown
Contributor Author

I cannot finish this PR in time for v2.6, moving this to v2.7.

@schlunma schlunma modified the milestones: v2.6.0, v2.7.0 May 19, 2022
@bouweandela

Copy link
Copy Markdown
Member

This would also be addressed (differently) in #1609.

@schlunma schlunma closed this Aug 30, 2022
@schlunma schlunma deleted the fix_target_levels_before_using branch March 1, 2023 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vertical regridding to reference levels fails when reference dataset needs to be downloaded Regridding to reference dataset fails sometimes

3 participants