Skip to content

Use glob in datafinder#210

Merged
mattiarighi merged 9 commits into
developmentfrom
glob_data_finder
Oct 10, 2019
Merged

Use glob in datafinder#210
mattiarighi merged 9 commits into
developmentfrom
glob_data_finder

Conversation

@jvegreg

@jvegreg jvegreg commented Aug 26, 2019

Copy link
Copy Markdown
Contributor

A small change to allow users to use wildcard characters in path definiton (we already allow it on files).

I added it because of a quirk our observations convention has: variable folder can just be the variable name or it can have appended the original frequency used for the computation:

Example:

  • psl monthly means for ERA5 if directly provided by ECMF: /esarchive/recon/ecmwf/era5/daily_mean/psl

  • psl monthly means for ERA5 if computed from hourly outputs provided by ECMF: /esarchive/recon/ecmwf/era5/daily_mean/psl_f1h

Both will be found with the new code if folder path is defined whatever\[short_name]* but I will need to add a extra tag for the user to provide with the old version

As a bonus, things like PRIMAVERA workspaces in jasmin (which are primavera1 to primavera5) can be add now as only one 'primavera?'

@jvegreg jvegreg added the enhancement New feature or request label Aug 26, 2019
@jvegreg jvegreg requested a review from bouweandela August 26, 2019 10:10
@valeriupredoi

Copy link
Copy Markdown
Contributor

not a big fan of glob.glob man, why not use os.walk 🍺

@jvegreg

jvegreg commented Aug 27, 2019

Copy link
Copy Markdown
Contributor Author

Because os.walk does not support wildcards and that is the reason to use glob. Also, I do not want to walk the whole tree

@valeriupredoi

valeriupredoi commented Aug 27, 2019

Copy link
Copy Markdown
Contributor

ok, the "I do not want to walk the whole tree" sold me 😁

@bouweandela bouweandela left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition, can you add unit tests?

@mattiarighi

Copy link
Copy Markdown
Contributor

Can this be extended to also address #250?

@jvegreg

jvegreg commented Sep 17, 2019

Copy link
Copy Markdown
Contributor Author

Can this be extended to also address #250?
Yes, will do it

Another question is how to make both work with #206

@jvegreg

jvegreg commented Sep 17, 2019

Copy link
Copy Markdown
Contributor Author

Done!

Comment thread esmvalcore/config-developer.yml Outdated
Comment thread esmvalcore/_data_finder.py Outdated
@mattiarighi

mattiarighi commented Oct 8, 2019

Copy link
Copy Markdown
Contributor

We should test this on several machines/drs'.
I added @valeriupredoi and @bascrezee as reviewers.
Also @schlunma could help checking with CMIP6 data at DKRZ (I'm testing CMIP5/OBS/obs4mips).

@valeriupredoi

valeriupredoi commented Oct 8, 2019 via email

Copy link
Copy Markdown
Contributor

@bascrezee

Copy link
Copy Markdown
Contributor

I am testing using the settings from recipe_check_obs, is that how I am intended to do the testing? @ruthlorenz If you sent me your datasets (e-mail) I can include them as well in this test.

@mattiarighi

Copy link
Copy Markdown
Contributor

recipe_check_obs is a good test case for this PR.

@schlunma schlunma left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Successfully tested for many CMIP5 and CMIP6 models (using this recipe).

@valeriupredoi

Copy link
Copy Markdown
Contributor

hold merging, this breaks the data finding for fx variables. I am taking a look now

@valeriupredoi

Copy link
Copy Markdown
Contributor

so why was config-developer.yml not aligned from development?

@mattiarighi

Copy link
Copy Markdown
Contributor

There were some conflicts which I fixed, maybe I overlooked something.

@mattiarighi

Copy link
Copy Markdown
Contributor

But now you reverted all the changes of this PR, we use { instead of [...

@valeriupredoi

Copy link
Copy Markdown
Contributor

yeah I just saw that, need to revert the revert and fix the issues manually

@valeriupredoi

Copy link
Copy Markdown
Contributor

nobody move an inch, am making the changes now

@valeriupredoi valeriupredoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok tested with @ruthlorenz collins recipe, OK to go

@bascrezee

Copy link
Copy Markdown
Contributor

Testing is ongoing here.

@bascrezee

Copy link
Copy Markdown
Contributor

Testing is ongoing here.

What is the meaning of this? I do not specify any preprocessor, but it does call a 'default' preprocessor. This seems not to occur for OBS.
INFO Creating preprocessor 'default' task for variable 'tas'

@valeriupredoi

Copy link
Copy Markdown
Contributor

Testing is ongoing here.

What is the meaning of this? I do not specify any preprocessor, but it does call a 'default' preprocessor. This seems not to occur for OBS.
INFO Creating preprocessor 'default' task for variable 'tas'

any time you don't specify a preprocessor name for a variable, it will create the default one -> this is per variable and not per dataset; the default preprocessor will contain only the basic preproc steps: load, cmor check, fix, extract time etc (note that extract time does not happen for default prprocs for fx varibales that are time-invariant)

@mattiarighi mattiarighi merged commit 7de7aac into development Oct 10, 2019
@mattiarighi mattiarighi deleted the glob_data_finder branch October 10, 2019 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants