Add a method to save citation information #402
Merged
108 commits
0e7bb54
Add a method to save citation information
SarahAlidoost 21e56b5
Fixing the function write_citation_file
SarahAlidoost db80953
Fix the function write_citation_file
SarahAlidoost 4b63ef6
fix the function _write_citation_file
SarahAlidoost 6a38723
style
SarahAlidoost 528eee7
refactor and style
SarahAlidoost a7a6368
Add esmvaltool paper to the provenance, and style
SarahAlidoost 65ee713
update the tag
SarahAlidoost 36d7e36
Add a method to save citation information
SarahAlidoost 5f70c73
Fixing the function write_citation_file
SarahAlidoost aff593d
Fix the function write_citation_file
SarahAlidoost 3406215
fix the function _write_citation_file
SarahAlidoost 05f4ce8
style
SarahAlidoost 585cc92
refactor and style
SarahAlidoost 46f7b45
Add esmvaltool paper to the provenance, and style
SarahAlidoost 2ea1c98
update the tag
SarahAlidoost bb30b19
Merge branch 'save_citation' of github.com:ESMValGroup/ESMValCore int…
SarahAlidoost 638d08c
change the method to a function
SarahAlidoost f61be82
fix the function get_esmvaltool_porvenance
SarahAlidoost 21f09d2
fix the if-else condition
SarahAlidoost eb50f0b
Add CMIP citation info, and refactor
SarahAlidoost c319a9e
remove pybtex, fix _json_to_bibtex function
SarahAlidoost 227460b
Refactor and style
SarahAlidoost 6cccf25
Refactor and style
SarahAlidoost c482757
fix open_url
SarahAlidoost 824f869
fix the _get_response function
SarahAlidoost 8cc7bab
add documentation
SarahAlidoost 96a5e85
Style
SarahAlidoost 87c7010
Merge remote-tracking branch 'origin/master' into save_citation
Peter9192 c1214a4
Refactor and style
SarahAlidoost b0db9ec
add a test checking if jason data includes bibtex keys
SarahAlidoost cf54ae0
style
SarahAlidoost c5bcdd5
add new module and remove functions from task
SarahAlidoost 824c0f9
fix citation parts
SarahAlidoost a0a05da
fix the citation functions, fix provenance to not replace the tags fo…
SarahAlidoost 3a87afe
keep the references tags and not to replace them
SarahAlidoost b7a6773
remove unnecessary imports and refactor
SarahAlidoost b88e2dd
update the documentation
SarahAlidoost 438960e
check if the reference folder does not exist
SarahAlidoost 36677e3
remove validating json data
SarahAlidoost 748987c
refactor json to bibtex function
SarahAlidoost 299bd84
refactor
SarahAlidoost 6508877
refactor
SarahAlidoost df3b008
remove unused function
SarahAlidoost 7b1c339
refactor wrtite and save functions
SarahAlidoost 622bb86
remove new line
SarahAlidoost 0968425
use diagnostics path instaed of finding references path
SarahAlidoost e7d6bd7
use pathlib instead of os.path
SarahAlidoost 776bd72
add esmvaltool technical paper as default citation entry
SarahAlidoost d51a3f0
fix the logger error message
SarahAlidoost b835eac
style and refactor
SarahAlidoost 800cdb9
refactor
SarahAlidoost b96dc6d
fix broken tests due to removing references and replace tags
SarahAlidoost 8e40366
add a unit test for citation.py
SarahAlidoost f9e89c8
safe to remove esmvaltool bibtex file
SarahAlidoost 66dd95a
move the esmvaltool paper tag to citation module
SarahAlidoost 85f03f2
add the esmvaltool paper tag
SarahAlidoost a5da5f3
remove unused import
SarahAlidoost dc37c39
remove unused import
SarahAlidoost cdd5181
refactor json to bibtex function
SarahAlidoost 4e65db1
fix tests for _citation.py
SarahAlidoost 1eb18b2
remove unused monkeypatch
SarahAlidoost 2f49ecb
fix typo
SarahAlidoost 789df2b
add support for references that are not in diagnostics, refactor
SarahAlidoost 0b96874
fix test for new codes in citation.py
SarahAlidoost c452ba6
fix newlines in entries
SarahAlidoost dfe6e12
style
SarahAlidoost 492c26d
refactor
SarahAlidoost 4332963
add a function to convert bibtex to reference entry
SarahAlidoost 38a1a18
fix the function cite_tag_value
SarahAlidoost 7a061cd
remove the unnecessary condition for TAGS
SarahAlidoost f30a0d6
add tests to check if references have been added
SarahAlidoost a3c7e42
refactor
SarahAlidoost 1d81db7
refactor
SarahAlidoost 504a17f
fix broken test
SarahAlidoost 857832b
fix the test for tags in test_recipe
SarahAlidoost 0997805
add a space after , for joining tags
SarahAlidoost 16043f1
remove pop() and refactor
SarahAlidoost 0ee0047
fix flake8 error
SarahAlidoost b0a2372
remove cite_tag_value
SarahAlidoost 1d64c43
move \t to begning of the line, remove + from get attribute
SarahAlidoost 7bb95a8
refactor write_citation_file function
SarahAlidoost d57c984
refactor clean_tag function, fix the logger
SarahAlidoost 035a442
fix minor things
SarahAlidoost f267a02
style
SarahAlidoost 07c04ba
add import, refactor jason_to_bitex func
SarahAlidoost 558e109
move the test to esmvaltool repo
SarahAlidoost d60b531
undo the changes
SarahAlidoost 824143d
refactor
SarahAlidoost d85156c
fix the tests
SarahAlidoost 4e9e014
fix get_recipe_provenance function
SarahAlidoost a1bbbff
refactor extract_tags function
SarahAlidoost f106cd0
remove esmvaltool_paper_tag
SarahAlidoost d54c012
remove esmvaltool_paper_tag
SarahAlidoost a9f1323
refcator bibtex string
SarahAlidoost ef43a82
add import to fix merge conflict
SarahAlidoost ddc5e71
remove import from _citation
SarahAlidoost aba4457
remove lstrip()
SarahAlidoost 27f4eff
add str and fix the test
SarahAlidoost fb2f057
style
SarahAlidoost 60b9080
refactor write_citation_file function
SarahAlidoost fd0ff2e
fix multiline docstring
SarahAlidoost 6bb038f
fix title for info_url
SarahAlidoost c995e66
fix minor things
SarahAlidoost 096e70d
remove duplicated cmip6
SarahAlidoost f078010
style
SarahAlidoost 66fdf11
Improve text and avoid duplicate citation entries
bouweandela 7a85ee1
Update ESMValTool reference
bouweandela
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,247 @@ | ||
| """Citation module.""" | ||
| import logging | ||
| import os | ||
| import re | ||
| import textwrap | ||
| from functools import lru_cache | ||
|
|
||
| import requests | ||
|
|
||
| from ._config import DIAGNOSTICS_PATH | ||
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| REFERENCES_PATH = DIAGNOSTICS_PATH / 'references' | ||
|
|
||
| CMIP6_URL_STEM = 'https://cera-www.dkrz.de/WDCC/ui/cerasearch' | ||
|
|
||
| # The technical overview paper should always be cited | ||
| ESMVALTOOL_PAPER = ( | ||
| "@article{righi20gmd,\n" | ||
| "\tdoi = {10.5194/gmd-13-1179-2020},\n" | ||
| "\turl = {https://doi.org/10.5194/gmd-13-1179-2020},\n" | ||
| "\tyear = {2020},\n" | ||
| "\tmonth = mar,\n" | ||
| "\tpublisher = {Copernicus {GmbH}},\n" | ||
| "\tvolume = {13},\n" | ||
| "\tnumber = {3},\n" | ||
| "\tpages = {1179--1199},\n" | ||
| "\tauthor = {Mattia Righi and Bouwe Andela and Veronika Eyring " | ||
| "and Axel Lauer and Valeriu Predoi and Manuel Schlund " | ||
| "and Javier Vegas-Regidor and Lisa Bock and Bj\\\"{o}rn Br\\\"{o}tz " | ||
| "and Lee de Mora and Faruk Diblen and Laura Dreyer " | ||
| "and Niels Drost and Paul Earnshaw and Birgit Hassler " | ||
| "and Nikolay Koldunov and Bill Little and Saskia Loosveldt Tomas " | ||
| "and Klaus Zimmermann},\n" | ||
| "\ttitle = {Earth System Model Evaluation Tool (ESMValTool) v2.0 " | ||
| "-- technical overview},\n" | ||
| "\tjournal = {Geoscientific Model Development}\n" | ||
| "}\n") | ||
|
|
||
|
|
||
| def _write_citation_files(filename, provenance): | ||
| """ | ||
| Write citation information provided by the recorded provenance. | ||
|
|
||
| Recipe and CMIP6 data references are saved into one bibtex file. | ||
| CMIP6 data references are provided by the CMIP6 Data Citation Service, | ||
| which exposes a json link for each reference. If an internet connection | ||
| is available, the CMIP6 references are retrieved and added to the bibtex | ||
| file. The CMIP6 data reference links are also saved into a text file. | ||
| """ | ||
| product_name = os.path.splitext(filename)[0] | ||
|
|
||
| tags = set() | ||
| cmip6_json_urls = set() | ||
| cmip6_info_urls = set() | ||
| other_info = set() | ||
|
|
||
| for item in provenance.records: | ||
| # get cmip6 data citation info | ||
| cmip6_data = 'CMIP6' in item.get_attribute('attribute:mip_era') | ||
| if cmip6_data: | ||
| url_prefix = _make_url_prefix(item.attributes) | ||
| cmip6_info_urls.add(_make_info_url(url_prefix)) | ||
| cmip6_json_urls.add(_make_json_url(url_prefix)) | ||
|
|
||
| # get other citation info | ||
| references = item.get_attribute('attribute:references') | ||
| if not references: | ||
| # ESMValTool CMORization scripts use 'reference' (without final s) | ||
| references = item.get_attribute('attribute:reference') | ||
| if references: | ||
| if item.identifier.namespace.prefix == 'recipe': | ||
| # get recipe citation tags | ||
| tags.update(references) | ||
| elif item.get_attribute('attribute:script_file'): | ||
| # get diagnostics citation tags | ||
| tags.update(references) | ||
| elif not cmip6_data: | ||
| # get any other data citation tags, e.g. CMIP5 | ||
| other_info.update(references) | ||
|
|
||
| _save_citation_bibtex(product_name, tags, cmip6_json_urls) | ||
| _save_citation_info_txt(product_name, cmip6_info_urls, other_info) | ||
|
|
||
|
|
||
| def _save_citation_bibtex(product_name, tags, json_urls): | ||
| """Save the bibtex entries in a bibtex file.""" | ||
| citation_entries = [ESMVALTOOL_PAPER] | ||
|
|
||
| # convert tags to bibtex entries | ||
| if tags: | ||
| entries = set() | ||
| for tag in _extract_tags(tags): | ||
| entries.add(_collect_bibtex_citation(tag)) | ||
| citation_entries.extend(sorted(entries)) | ||
|
|
||
| # convert json_urls to bibtex entries | ||
| entries = set() | ||
| for json_url in json_urls: | ||
| cmip_citation = _collect_cmip_citation(json_url) | ||
| if cmip_citation: | ||
| entries.add(cmip_citation) | ||
| citation_entries.extend(sorted(entries)) | ||
|
|
||
| with open(f'{product_name}_citation.bibtex', 'w') as file: | ||
| file.write('\n'.join(citation_entries)) | ||
|
|
||
|
|
||
| def _save_citation_info_txt(product_name, info_urls, other_info): | ||
| """Save all data citation information in one text file.""" | ||
| lines = [] | ||
| # Save CMIP6 url_info | ||
| if info_urls: | ||
| lines.append( | ||
| "Follow the links below to find more information about CMIP6 data:" | ||
| ) | ||
| lines.extend(f'- {url}' for url in sorted(info_urls)) | ||
|
|
||
| # Save any references from the 'references' and 'reference' NetCDF global | ||
| # attributes. | ||
| if other_info: | ||
| if lines: | ||
| lines.append('') | ||
| lines.append("Additional data citation information was found, for " | ||
| "which no entry is available in the bibtex file:") | ||
| lines.extend('- ' + str(t).replace('\n', ' ') | ||
| for t in sorted(other_info)) | ||
|
|
||
| if lines: | ||
| with open(f'{product_name}_data_citation_info.txt', 'w') as file: | ||
| file.write('\n'.join(lines) + '\n') | ||
|
|
||
|
|
||
| def _extract_tags(tags): | ||
| """Extract tags. | ||
|
|
||
| Tags are recorded as a list of strings converted to a string in provenance. | ||
| For example, a single entry in the list `tags` could be the string | ||
| "['acknow_project', 'acknow_author']". | ||
| """ | ||
| pattern = re.compile(r'\w+') | ||
| return set(pattern.findall(str(tags))) | ||
|
|
||
|
|
||
| def _get_response(url): | ||
| """Return information from CMIP6 Data Citation service in json format.""" | ||
| json_data = None | ||
| if url.lower().startswith('https'): | ||
| try: | ||
| response = requests.get(url) | ||
| if response.status_code == 200: | ||
| json_data = response.json() | ||
| else: | ||
| logger.warning('Error in the CMIP6 citation link: %s', url) | ||
| except IOError: | ||
| logger.info('No network connection, ' | ||
| 'unable to retrieve CMIP6 citation information') | ||
| return json_data | ||
|
|
||
|
|
||
| def _json_to_bibtex(data): | ||
| """Make a bibtex entry from CMIP6 Data Citation json data.""" | ||
| url = 'url not found' | ||
| title = data.get('titles', ['title not found'])[0] | ||
| publisher = data.get('publisher', 'publisher not found') | ||
| year = data.get('publicationYear', 'publicationYear not found') | ||
| authors = 'creators not found' | ||
| doi = 'doi not found' | ||
|
|
||
| if 'creators' in data: | ||
| author_list = [ | ||
| item.get('creatorName', '') for item in data['creators'] | ||
| ] | ||
| authors = ' and '.join(author_list) | ||
| if not authors: | ||
| authors = 'creators not found' | ||
|
|
||
| if 'identifier' in data: | ||
| doi = data['identifier'].get('id', 'doi not found') | ||
| url = f'https://doi.org/{doi}' | ||
|
|
||
| bibtex_entry = textwrap.dedent(f""" | ||
| @misc{{{url}, | ||
| \turl = {{{url}}}, | ||
| \ttitle = {{{title}}}, | ||
| \tpublisher = {{{publisher}}}, | ||
| \tyear = {year}, | ||
| \tauthor = {{{authors}}}, | ||
| \tdoi = {{{doi}}}, | ||
| }} | ||
| """).lstrip() | ||
| return bibtex_entry | ||
|
|
||
|
|
||
| @lru_cache(maxsize=1024) | ||
| def _collect_bibtex_citation(tag): | ||
| """Collect information from bibtex files.""" | ||
| bibtex_file = REFERENCES_PATH / f'{tag}.bibtex' | ||
| if bibtex_file.is_file(): | ||
| entry = bibtex_file.read_text() | ||
| else: | ||
| entry = '' | ||
| logger.warning( | ||
| "The reference file %s does not exist, citation information " | ||
| "incomplete.", bibtex_file) | ||
| return entry | ||
|
|
||
|
|
||
| @lru_cache(maxsize=1024) | ||
| def _collect_cmip_citation(json_url): | ||
| """Collect information from CMIP6 Data Citation Service.""" | ||
| json_data = _get_response(json_url) | ||
| if json_data: | ||
| bibtex_entry = _json_to_bibtex(json_data) | ||
| else: | ||
| bibtex_entry = '' | ||
| return bibtex_entry | ||
|
|
||
|
|
||
| def _make_url_prefix(attribute): | ||
| """Make url prefix based on CMIP6 Data Citation Service.""" | ||
| # the order of keys is important | ||
| localpart = { | ||
| 'mip_era': '', | ||
| 'activity_id': '', | ||
| 'institution_id': '', | ||
| 'source_id': '', | ||
| 'experiment_id': '', | ||
| } | ||
| for key, value in attribute: | ||
| if key.localpart in localpart: | ||
| localpart[key.localpart] = value | ||
| url_prefix = '.'.join(localpart.values()) | ||
| return url_prefix | ||
|
|
||
|
|
||
| def _make_json_url(url_prefix): | ||
| """Make json url based on CMIP6 Data Citation Service.""" | ||
| json_url = f'{CMIP6_URL_STEM}/cerarest/exportcmip6?input={url_prefix}' | ||
| return json_url | ||
|
|
||
|
|
||
| def _make_info_url(url_prefix): | ||
| """Make info url based on CMIP6 Data Citation Service.""" | ||
| info_url = f'{CMIP6_URL_STEM}/cmip6?input={url_prefix}' | ||
| return info_url | ||
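The URL helpers at the end of `_citation.py` can be illustrated with a minimal standalone sketch. It assumes a plain `dict` in place of the provenance record attributes (which the module iterates as key/value pairs with a `localpart`), and the names `FACETS`, `make_url_prefix` and `make_urls` are hypothetical stand-ins, not part of the PR.

```python
CMIP6_URL_STEM = 'https://cera-www.dkrz.de/WDCC/ui/cerasearch'

# Facet order of the dot-separated identifier, mirroring _make_url_prefix.
# (Hypothetical constant, introduced only for this sketch.)
FACETS = (
    'mip_era',
    'activity_id',
    'institution_id',
    'source_id',
    'experiment_id',
)


def make_url_prefix(attributes):
    """Join the CMIP6 facets in a fixed order; missing facets become ''."""
    return '.'.join(attributes.get(facet, '') for facet in FACETS)


def make_urls(attributes):
    """Return the (info_url, json_url) pair for one set of attributes."""
    prefix = make_url_prefix(attributes)
    info_url = f'{CMIP6_URL_STEM}/cmip6?input={prefix}'
    json_url = f'{CMIP6_URL_STEM}/cerarest/exportcmip6?input={prefix}'
    return info_url, json_url


# Placeholder values, given deliberately out of order: the result depends
# on FACETS, not on the order in which the attributes were recorded.
example = {
    'experiment_id': 'experiment',
    'source_id': 'source',
    'mip_era': 'CMIP6',
    'institution_id': 'institution',
    'activity_id': 'activity',
    'unrelated': 'ignored',
}
info_url, json_url = make_urls(example)
print(info_url)
print(json_url)
```

Because the prefix is built from a fixed facet order, two provenance records that describe the same model and experiment yield the same URLs, which is what lets `_write_citation_files` deduplicate them with a `set`.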
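The tag handling in `_extract_tags` relies on the regular expression `\w+` applied to the string form of the recorded references. The sketch below reproduces that idea in isolation; `extract_tags` and `TAG_PATTERN` are illustrative names, and the sample strings are made up for the example.

```python
import re

# Same pattern as in _extract_tags: runs of letters, digits and underscores.
TAG_PATTERN = re.compile(r'\w+')


def extract_tags(tags):
    """Return the unique word-like tokens found in str(tags)."""
    return set(TAG_PATTERN.findall(str(tags)))


# Provenance stores each list of tags as its string representation,
# so the same tag can appear in several recorded strings.
recorded = {"['acknow_project', 'acknow_author']", "['acknow_author']"}
print(sorted(extract_tags(recorded)))

# A consequence of the pattern: any character outside \w splits a tag.
print(sorted(extract_tags({"['my-tag']"})))
```

The first call yields each tag once, regardless of how many records mention it. The second call shows a limitation that follows from the pattern itself: a tag containing a hyphen would be split into two tokens, so this approach assumes tags consist only of word characters.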
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| """Test _citation.py.""" | ||
| import textwrap | ||
|
|
||
| from prov.model import ProvDocument | ||
|
|
||
| import esmvalcore | ||
| from esmvalcore._citation import (CMIP6_URL_STEM, ESMVALTOOL_PAPER, | ||
| _write_citation_files) | ||
| from esmvalcore._provenance import ESMVALTOOL_URI_PREFIX | ||
|
|
||
|
|
||
| def test_references(tmp_path, monkeypatch): | ||
| """Test1: references are replaced with bibtex.""" | ||
| # Create fake provenance | ||
| provenance = ProvDocument() | ||
| provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file') | ||
| provenance.add_namespace('attribute', | ||
| uri=ESMVALTOOL_URI_PREFIX + 'attribute') | ||
| filename = str(tmp_path / 'output.nc') | ||
| attributes = { | ||
| 'attribute:references': 'test_tag', | ||
| 'attribute:script_file': 'diagnostics.py' | ||
| } | ||
| provenance.entity('file:' + filename, attributes) | ||
|
|
||
| # Create fake bibtex references tag file | ||
| references_path = tmp_path / 'references' | ||
| references_path.mkdir() | ||
| monkeypatch.setattr(esmvalcore._citation, 'REFERENCES_PATH', | ||
| references_path) | ||
| fake_bibtex_file = references_path / 'test_tag.bibtex' | ||
| fake_bibtex = "Fake bibtex file content\n" | ||
| fake_bibtex_file.write_text(fake_bibtex) | ||
|
|
||
| _write_citation_files(filename, provenance) | ||
| citation_file = tmp_path / 'output_citation.bibtex' | ||
| citation = citation_file.read_text() | ||
| assert citation == '\n'.join([ESMVALTOOL_PAPER, fake_bibtex]) | ||
|
|
||
|
|
||
| def mock_get_response(url): | ||
| """Mock _get_response() function.""" | ||
| json_data = False | ||
| if url.lower().startswith('https'): | ||
| json_data = {'titles': ['title is found']} | ||
| return json_data | ||
|
|
||
|
|
||
| def test_cmip6_data_citation(tmp_path, monkeypatch): | ||
| """Test2: CMIP6 citation info is retrieved from the Data Citation Service.""" | ||
| # Create fake provenance | ||
| provenance = ProvDocument() | ||
| provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file') | ||
| provenance.add_namespace('attribute', | ||
| uri=ESMVALTOOL_URI_PREFIX + 'attribute') | ||
| attributes = { | ||
| 'attribute:mip_era': 'CMIP6', | ||
| 'attribute:activity_id': 'activity', | ||
| 'attribute:institution_id': 'institution', | ||
| 'attribute:source_id': 'source', | ||
| 'attribute:experiment_id': 'experiment', | ||
| } | ||
| filename = str(tmp_path / 'output.nc') | ||
| provenance.entity('file:' + filename, attributes) | ||
|
|
||
| monkeypatch.setattr(esmvalcore._citation, '_get_response', | ||
| mock_get_response) | ||
| _write_citation_files(filename, provenance) | ||
| citation_file = tmp_path / 'output_citation.bibtex' | ||
|
|
||
| # Create fake bibtex entry | ||
| url = 'url not found' | ||
| title = 'title is found' | ||
| publisher = 'publisher not found' | ||
| year = 'publicationYear not found' | ||
| authors = 'creators not found' | ||
| doi = 'doi not found' | ||
| fake_bibtex_entry = textwrap.dedent(f""" | ||
| @misc{{{url}, | ||
| \turl = {{{url}}}, | ||
| \ttitle = {{{title}}}, | ||
| \tpublisher = {{{publisher}}}, | ||
| \tyear = {year}, | ||
| \tauthor = {{{authors}}}, | ||
| \tdoi = {{{doi}}}, | ||
| }} | ||
| """).lstrip() | ||
| assert citation_file.read_text() == '\n'.join( | ||
| [ESMVALTOOL_PAPER, fake_bibtex_entry]) | ||
|
|
||
|
|
||
| def test_cmip6_data_citation_url(tmp_path): | ||
| """Test3: CMIP6 info_url is built for the Data Citation Service.""" | ||
| # Create fake provenance | ||
| provenance = ProvDocument() | ||
| provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file') | ||
| provenance.add_namespace('attribute', | ||
| uri=ESMVALTOOL_URI_PREFIX + 'attribute') | ||
| attributes = { | ||
| 'attribute:mip_era': 'CMIP6', | ||
| 'attribute:activity_id': 'activity', | ||
| 'attribute:institution_id': 'institution', | ||
| 'attribute:source_id': 'source', | ||
| 'attribute:experiment_id': 'experiment', | ||
| } | ||
| filename = str(tmp_path / 'output.nc') | ||
| provenance.entity('file:' + filename, attributes) | ||
| _write_citation_files(filename, provenance) | ||
| citation_url = tmp_path / 'output_data_citation_info.txt' | ||
|
|
||
| # Create fake info url | ||
| fake_url_prefix = '.'.join(attributes.values()) | ||
| text = '\n'.join([ | ||
| "Follow the links below to find more information about CMIP6 data:", | ||
| f"- {CMIP6_URL_STEM}/cmip6?input={fake_url_prefix}", | ||
| '', | ||
| ]) | ||
| assert citation_url.read_text() == text |
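The expected entry in `test_cmip6_data_citation` follows from the fallback values in `_json_to_bibtex`: the mocked response only contains `titles`, so every other field falls back to its "not found" placeholder. The following standalone sketch mirrors that logic under the assumption that the json layout is the one used in the diff (`titles`, `creators`, `identifier`, `publisher`, `publicationYear`); the function name `json_to_bibtex` and the second sample record are hypothetical.

```python
import textwrap


def json_to_bibtex(data):
    """Build a bibtex entry, substituting placeholders for missing keys."""
    title = data.get('titles', ['title not found'])[0]
    publisher = data.get('publisher', 'publisher not found')
    year = data.get('publicationYear', 'publicationYear not found')
    names = [item.get('creatorName', '') for item in data.get('creators', [])]
    authors = ' and '.join(names) or 'creators not found'
    doi = 'doi not found'
    url = 'url not found'
    if 'identifier' in data:
        doi = data['identifier'].get('id', 'doi not found')
        url = f'https://doi.org/{doi}'
    # dedent() removes the common eight-space margin; the tabs remain.
    return textwrap.dedent(f"""
        @misc{{{url},
        \turl = {{{url}}},
        \ttitle = {{{title}}},
        \tpublisher = {{{publisher}}},
        \tyear = {year},
        \tauthor = {{{authors}}},
        \tdoi = {{{doi}}},
        }}
        """).lstrip()


# The same minimal payload that mock_get_response returns in the test.
minimal = json_to_bibtex({'titles': ['title is found']})
print(minimal)

# A fuller, made-up record to show the non-fallback path.
full = json_to_bibtex({
    'titles': ['Example title'],
    'publisher': 'Example publisher',
    'publicationYear': '2019',
    'creators': [{'creatorName': 'Doe, Jane'}, {'creatorName': 'Roe, Sam'}],
    'identifier': {'id': '10.1234/example'},
})
print(full)
```

With the minimal payload the bibtex key and `url` field both read `url not found`, which is why the test can construct its expected entry from the placeholder strings alone, without any network access.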