Merged
108 commits
0e7bb54
Add a method to save citation information
SarahAlidoost Dec 17, 2019
21e56b5
Fixing the function write_citation_file
SarahAlidoost Jan 15, 2020
db80953
Fix the function write_citation_file
SarahAlidoost Jan 21, 2020
4b63ef6
fix the function _write_citation_file
SarahAlidoost Jan 22, 2020
6a38723
style
SarahAlidoost Jan 22, 2020
528eee7
refactor and style
SarahAlidoost Jan 23, 2020
a7a6368
Add esmvaltool paper to the provenance, and style
SarahAlidoost Jan 23, 2020
65ee713
update the tag
SarahAlidoost Jan 23, 2020
36d7e36
Add a method to save citation information
SarahAlidoost Dec 17, 2019
5f70c73
Fixing the function write_citation_file
SarahAlidoost Jan 15, 2020
aff593d
Fix the function write_citation_file
SarahAlidoost Jan 21, 2020
3406215
fix the function _write_citation_file
SarahAlidoost Jan 22, 2020
05f4ce8
style
SarahAlidoost Jan 22, 2020
585cc92
refactor and style
SarahAlidoost Jan 23, 2020
46f7b45
Add esmvaltool paper to the provenance, and style
SarahAlidoost Jan 23, 2020
2ea1c98
update the tag
SarahAlidoost Jan 23, 2020
bb30b19
Merge branch 'save_citation' of github.com:ESMValGroup/ESMValCore int…
SarahAlidoost Jan 24, 2020
638d08c
change the method to a function
SarahAlidoost Jan 27, 2020
f61be82
fix the function get_esmvaltool_porvenance
SarahAlidoost Jan 27, 2020
21f09d2
fix the if-else condition
SarahAlidoost Jan 28, 2020
eb50f0b
Add CMIP citation info, and refactor
SarahAlidoost Jan 30, 2020
c319a9e
remove pybtex, fix _json_to_bibtex function
SarahAlidoost Jan 31, 2020
227460b
Refactor and style
SarahAlidoost Jan 31, 2020
6cccf25
Refactor and style
SarahAlidoost Jan 31, 2020
c482757
fix open_url
SarahAlidoost Jan 31, 2020
824f869
fix the _get_response function
SarahAlidoost Jan 31, 2020
8cc7bab
add documentation
SarahAlidoost Jan 31, 2020
96a5e85
Style
SarahAlidoost Jan 31, 2020
87c7010
Merge remote-tracking branch 'origin/master' into save_citation
Peter9192 Feb 7, 2020
c1214a4
Refactor and style
SarahAlidoost Feb 10, 2020
b0db9ec
add a test checking if jason data includes bibtex keys
SarahAlidoost Feb 11, 2020
cf54ae0
style
SarahAlidoost Feb 11, 2020
c5bcdd5
add new module and remove functions from task
SarahAlidoost Feb 19, 2020
824c0f9
fix citation parts
SarahAlidoost Feb 19, 2020
a0a05da
fix the citation functions, fix provenance to not replace the tags fo…
SarahAlidoost Feb 21, 2020
3a87afe
keep the references tags and not to replace them
SarahAlidoost Feb 21, 2020
b7a6773
remove unnecessary imports and refactor
SarahAlidoost Feb 21, 2020
b88e2dd
update the documentation
SarahAlidoost Feb 21, 2020
438960e
check if the reference folder does not exist
SarahAlidoost Feb 21, 2020
36677e3
remove validating json data
SarahAlidoost Feb 21, 2020
748987c
refactor json to bibtex function
SarahAlidoost Feb 21, 2020
299bd84
refactor
SarahAlidoost Feb 21, 2020
6508877
refactor
SarahAlidoost Feb 21, 2020
df3b008
remove unused function
SarahAlidoost Feb 24, 2020
7b1c339
refactor wrtite and save functions
SarahAlidoost Feb 24, 2020
622bb86
remove new line
SarahAlidoost Feb 24, 2020
0968425
use diagnostics path instaed of finding references path
SarahAlidoost Feb 24, 2020
e7d6bd7
use pathlib instead of os.path
SarahAlidoost Feb 24, 2020
776bd72
add esmvaltool technical paper as default citation entry
SarahAlidoost Feb 24, 2020
d51a3f0
fix the logger error message
SarahAlidoost Feb 24, 2020
b835eac
style and refactor
SarahAlidoost Feb 24, 2020
800cdb9
refactor
SarahAlidoost Feb 24, 2020
b96dc6d
fix broken tests due to removing references and replace tags
SarahAlidoost Feb 24, 2020
8e40366
add a unit test for citation.py
SarahAlidoost Feb 24, 2020
f9e89c8
safe to remove esmvaltool bibtex file
SarahAlidoost Feb 24, 2020
66dd95a
move the esmvaltool paper tag to citation module
SarahAlidoost Feb 25, 2020
85f03f2
add the esmvaltool paper tag
SarahAlidoost Feb 25, 2020
a5da5f3
remove unused import
SarahAlidoost Feb 25, 2020
dc37c39
remove unused import
SarahAlidoost Feb 25, 2020
cdd5181
refactor json to bibtex function
SarahAlidoost Feb 26, 2020
4e65db1
fix tests for _citation.py
SarahAlidoost Feb 26, 2020
1eb18b2
remove unused monkeypatch
SarahAlidoost Feb 26, 2020
2f49ecb
fix typo
SarahAlidoost Feb 28, 2020
789df2b
add support for references that are not in diagnostics, refactor
SarahAlidoost Mar 6, 2020
0b96874
fix test for new codes in citation.py
SarahAlidoost Mar 6, 2020
c452ba6
fix newlines in entries
SarahAlidoost Mar 6, 2020
dfe6e12
style
SarahAlidoost Mar 6, 2020
492c26d
refactor
SarahAlidoost Mar 9, 2020
4332963
add a function to convert bibtex to reference entry
SarahAlidoost Mar 9, 2020
38a1a18
fix the function cite_tag_value
SarahAlidoost Mar 10, 2020
7a061cd
remove the unnecessary condition for TAGS
SarahAlidoost Mar 10, 2020
f30a0d6
add tests to check if references have been added
SarahAlidoost Mar 10, 2020
a3c7e42
refactor
SarahAlidoost Mar 10, 2020
1d81db7
refactor
SarahAlidoost Mar 10, 2020
504a17f
fix broken test
SarahAlidoost Mar 11, 2020
857832b
fix the test for tags in test_recipe
SarahAlidoost Mar 11, 2020
0997805
add a space after , for joining tags
SarahAlidoost Mar 11, 2020
16043f1
remove pop() and refactor
SarahAlidoost Mar 11, 2020
0ee0047
fix flake8 error
SarahAlidoost Mar 12, 2020
b0a2372
remove cite_tag_value
SarahAlidoost Mar 20, 2020
1d64c43
move \t to begning of the line, remove + from get attribute
SarahAlidoost Mar 20, 2020
7bb95a8
refactor write_citation_file function
SarahAlidoost Mar 20, 2020
d57c984
refactor clean_tag function, fix the logger
SarahAlidoost Mar 20, 2020
035a442
fix minor things
SarahAlidoost Mar 20, 2020
f267a02
style
SarahAlidoost Mar 20, 2020
07c04ba
add import, refactor jason_to_bitex func
SarahAlidoost Mar 23, 2020
558e109
move the test to esmvaltool repo
SarahAlidoost Mar 23, 2020
d60b531
undo the changes
SarahAlidoost Mar 23, 2020
824143d
refactor
SarahAlidoost Mar 23, 2020
d85156c
fix the tests
SarahAlidoost Mar 23, 2020
4e9e014
fix get_recipe_provenance function
SarahAlidoost Mar 23, 2020
a1bbbff
refactor extract_tags function
SarahAlidoost Mar 23, 2020
f106cd0
remove esmvaltool_paper_tag
SarahAlidoost Mar 23, 2020
d54c012
remove esmvaltool_paper_tag
SarahAlidoost Mar 23, 2020
a9f1323
refcator bibtex string
SarahAlidoost Mar 24, 2020
ef43a82
add import to fix merge conflict
SarahAlidoost Mar 24, 2020
ddc5e71
remove import from _citation
SarahAlidoost Mar 24, 2020
aba4457
remove lstrip()
SarahAlidoost Mar 24, 2020
27f4eff
add str and fix the test
SarahAlidoost Mar 24, 2020
fb2f057
style
SarahAlidoost Mar 24, 2020
60b9080
refactor write_citation_file function
SarahAlidoost Mar 25, 2020
fd0ff2e
fix multiline docstring
SarahAlidoost Mar 25, 2020
6bb038f
fix title for info_url
SarahAlidoost Mar 25, 2020
c995e66
fix minor things
SarahAlidoost Mar 25, 2020
096e70d
remove duplicated cmip6
SarahAlidoost Mar 25, 2020
f078010
style
SarahAlidoost Mar 25, 2020
66fdf11
Improve text and avoid duplicate citation entries
bouweandela Mar 30, 2020
7a85ee1
Update ESMValTool reference
bouweandela Mar 31, 2020
247 changes: 247 additions & 0 deletions esmvalcore/_citation.py
@@ -0,0 +1,247 @@
"""Citation module."""
import logging
import os
import re
import textwrap
from functools import lru_cache

import requests

from ._config import DIAGNOSTICS_PATH

logger = logging.getLogger(__name__)

REFERENCES_PATH = DIAGNOSTICS_PATH / 'references'

CMIP6_URL_STEM = 'https://cera-www.dkrz.de/WDCC/ui/cerasearch'

# The technical overview paper should always be cited
ESMVALTOOL_PAPER = (
    "@article{righi20gmd,\n"
    "\tdoi = {10.5194/gmd-13-1179-2020},\n"
    "\turl = {https://doi.org/10.5194/gmd-13-1179-2020},\n"
    "\tyear = {2020},\n"
    "\tmonth = mar,\n"
    "\tpublisher = {Copernicus {GmbH}},\n"
    "\tvolume = {13},\n"
    "\tnumber = {3},\n"
    "\tpages = {1179--1199},\n"
    "\tauthor = {Mattia Righi and Bouwe Andela and Veronika Eyring "
    "and Axel Lauer and Valeriu Predoi and Manuel Schlund "
    # \\\" yields a literal \" so that BibTeX receives the umlaut command
    "and Javier Vegas-Regidor and Lisa Bock and Bj\\\"{o}rn Br\\\"{o}tz "
    "and Lee de Mora and Faruk Diblen and Laura Dreyer "
    "and Niels Drost and Paul Earnshaw and Birgit Hassler "
    "and Nikolay Koldunov and Bill Little and Saskia Loosveldt Tomas "
    "and Klaus Zimmermann},\n"
    "\ttitle = {Earth System Model Evaluation Tool (ESMValTool) v2.0 "
    "-- technical overview},\n"
    "\tjournal = {Geoscientific Model Development}\n"
    "}\n")


def _write_citation_files(filename, provenance):
    """
    Write citation information provided by the recorded provenance.

    Recipe and CMIP6 data references are saved into one bibtex file.
    CMIP6 data references are provided by the CMIP6 data citation service.
    Each CMIP6 data reference has a json link. If an internet connection
    is available, CMIP6 data references are saved into the bibtex file.
    In addition, CMIP6 data reference links are saved into a text file.
    """
    product_name = os.path.splitext(filename)[0]

    tags = set()
    cmip6_json_urls = set()
    cmip6_info_urls = set()
    other_info = set()

    for item in provenance.records:
        # get cmip6 data citation info
        cmip6_data = 'CMIP6' in item.get_attribute('attribute:mip_era')
        if cmip6_data:
            url_prefix = _make_url_prefix(item.attributes)
            cmip6_info_urls.add(_make_info_url(url_prefix))
            cmip6_json_urls.add(_make_json_url(url_prefix))

        # get other citation info
        references = item.get_attribute('attribute:references')
        if not references:
            # ESMValTool CMORization scripts use 'reference' (without final s)
            references = item.get_attribute('attribute:reference')
        if references:
            if item.identifier.namespace.prefix == 'recipe':
                # get recipe citation tags
                tags.update(references)
            elif item.get_attribute('attribute:script_file'):
                # get diagnostics citation tags
                tags.update(references)
            elif not cmip6_data:
                # get any other data citation tags, e.g. CMIP5
                other_info.update(references)

    _save_citation_bibtex(product_name, tags, cmip6_json_urls)
    _save_citation_info_txt(product_name, cmip6_info_urls, other_info)


def _save_citation_bibtex(product_name, tags, json_urls):
    """Save the bibtex entries in a bibtex file."""
    citation_entries = [ESMVALTOOL_PAPER]

    # convert tags to bibtex entries
    if tags:
        entries = set()
        for tag in _extract_tags(tags):
            entries.add(_collect_bibtex_citation(tag))
        citation_entries.extend(sorted(entries))

    # convert json_urls to bibtex entries
    entries = set()
    for json_url in json_urls:
        cmip_citation = _collect_cmip_citation(json_url)
        if cmip_citation:
            entries.add(cmip_citation)
    citation_entries.extend(sorted(entries))

    with open(f'{product_name}_citation.bibtex', 'w') as file:
        file.write('\n'.join(citation_entries))


def _save_citation_info_txt(product_name, info_urls, other_info):
    """Save all data citation information in one text file."""
    lines = []
    # Save CMIP6 url_info
    if info_urls:
        lines.append(
            "Follow the links below to find more information about CMIP6 data:"
        )
        lines.extend(f'- {url}' for url in sorted(info_urls))

    # Save any references from the 'references' and 'reference' NetCDF global
    # attributes.
    if other_info:
        if lines:
            lines.append('')
        lines.append("Additional data citation information was found, for "
                     "which no entry is available in the bibtex file:")
        lines.extend('- ' + str(t).replace('\n', ' ')
                     for t in sorted(other_info))

    if lines:
        with open(f'{product_name}_data_citation_info.txt', 'w') as file:
            file.write('\n'.join(lines) + '\n')


def _extract_tags(tags):
    """Extract tags.

    Tags are recorded as a list of strings converted to a string in provenance.
    For example, a single entry in the list `tags` could be the string
    "['acknow_project', 'acknow_author']".
    """
    pattern = re.compile(r'\w+')
    return set(pattern.findall(str(tags)))


def _get_response(url):
    """Return information from CMIP6 Data Citation service in json format."""
    json_data = None
    if url.lower().startswith('https'):
        try:
            response = requests.get(url)
            if response.status_code == 200:
                json_data = response.json()
            else:
                logger.warning('Error in the CMIP6 citation link: %s', url)
        except IOError:
            logger.info('No network connection, '
                        'unable to retrieve CMIP6 citation information')
    return json_data


def _json_to_bibtex(data):
    """Make a bibtex entry from CMIP6 Data Citation json data."""
    url = 'url not found'
    title = data.get('titles', ['title not found'])[0]
    publisher = data.get('publisher', 'publisher not found')
    year = data.get('publicationYear', 'publicationYear not found')
    authors = 'creators not found'
    doi = 'doi not found'

    if 'creators' in data:
        author_list = [
            item.get('creatorName', '') for item in data['creators']
        ]
        authors = ' and '.join(author_list)
        if not authors:
            authors = 'creators not found'

    if 'identifier' in data:
        doi = data['identifier'].get('id', 'doi not found')
        url = f'https://doi.org/{doi}'

    bibtex_entry = textwrap.dedent(f"""
        @misc{{{url},
        \turl = {{{url}}},
        \ttitle = {{{title}}},
        \tpublisher = {{{publisher}}},
        \tyear = {year},
        \tauthor = {{{authors}}},
        \tdoi = {{{doi}}},
        }}
        """).lstrip()
    return bibtex_entry


@lru_cache(maxsize=1024)
def _collect_bibtex_citation(tag):
    """Collect information from bibtex files."""
    bibtex_file = REFERENCES_PATH / f'{tag}.bibtex'
    if bibtex_file.is_file():
        entry = bibtex_file.read_text()
    else:
        entry = ''
        logger.warning(
            "The reference file %s does not exist, citation information "
            "incomplete.", bibtex_file)
    return entry


@lru_cache(maxsize=1024)
def _collect_cmip_citation(json_url):
    """Collect information from CMIP6 Data Citation Service."""
    json_data = _get_response(json_url)
    if json_data:
        bibtex_entry = _json_to_bibtex(json_data)
    else:
        bibtex_entry = ''
    return bibtex_entry


def _make_url_prefix(attribute):
    """Make url prefix based on CMIP6 Data Citation Service."""
    # the order of keys is important
    localpart = {
        'mip_era': '',
        'activity_id': '',
        'institution_id': '',
        'source_id': '',
        'experiment_id': '',
    }
    for key, value in attribute:
        if key.localpart in localpart:
            localpart[key.localpart] = value
    url_prefix = '.'.join(localpart.values())
    return url_prefix


def _make_json_url(url_prefix):
    """Make json url based on CMIP6 Data Citation Service."""
    json_url = f'{CMIP6_URL_STEM}/cerarest/exportcmip6?input={url_prefix}'
    return json_url


def _make_info_url(url_prefix):
    """Make info url based on CMIP6 Data Citation Service."""
    info_url = f'{CMIP6_URL_STEM}/cmip6?input={url_prefix}'
    return info_url
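
The three URL helpers in `_citation.py` compose into a short pipeline: facet values are joined into a prefix, and the prefix is appended to two endpoints under `CMIP6_URL_STEM`. The following is a minimal sketch of that composition. It assumes the provenance attributes have already been flattened into a plain dict (the code in this PR iterates over `prov` qualified names instead); `build_cmip6_urls`, `FACETS`, and the sample values are hypothetical names used only for illustration.

```python
CMIP6_URL_STEM = 'https://cera-www.dkrz.de/WDCC/ui/cerasearch'

# The order of the facets is important (see the comment in _make_url_prefix).
FACETS = ('mip_era', 'activity_id', 'institution_id', 'source_id',
          'experiment_id')


def build_cmip6_urls(attributes):
    """Return (info_url, json_url) for a plain dict of facet values.

    Hypothetical helper for illustration; missing facets become ''.
    """
    url_prefix = '.'.join(attributes.get(facet, '') for facet in FACETS)
    info_url = f'{CMIP6_URL_STEM}/cmip6?input={url_prefix}'
    json_url = f'{CMIP6_URL_STEM}/cerarest/exportcmip6?input={url_prefix}'
    return info_url, json_url


# Placeholder facet values, matching those used in the integration tests.
info_url, json_url = build_cmip6_urls({
    'mip_era': 'CMIP6',
    'activity_id': 'activity',
    'institution_id': 'institution',
    'source_id': 'source',
    'experiment_id': 'experiment',
})
print(info_url)
print(json_url)
```

The info URL ends up in the `_data_citation_info.txt` file, while the json URL is only used to fetch the record that is converted to a bibtex entry.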
2 changes: 1 addition & 1 deletion esmvalcore/_provenance.py
@@ -79,7 +79,7 @@ def get_recipe_provenance(documentation, filename):
     entity = provenance.entity(
         'recipe:{}'.format(filename), {
             'attribute:description': documentation.get('description', ''),
-            'attribute:references': ', '.join(
+            'attribute:references': str(
                 documentation.get('references', [])),
         })
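
This change stores the recipe's reference list as its string representation instead of a comma-joined string. A minimal sketch of why `_extract_tags` can still recover the individual tags: the `\w+` pattern picks out word-like tokens and ignores brackets, quotes, and commas. The function below mirrors `_extract_tags` from this PR under the hypothetical name `extract_tags`; the sample tags come from its docstring.

```python
import re


def extract_tags(tags):
    """Mirror of _extract_tags: collect the word-like tokens in `tags`."""
    pattern = re.compile(r'\w+')
    return set(pattern.findall(str(tags)))


references = ['acknow_project', 'acknow_author']

# New behaviour: the list is stored as its string representation.
stored = str(references)
# The attribute is read back as a collection containing that string.
recorded = {stored}

tags = extract_tags(recorded)
print(sorted(tags))

# The previous comma-joined form yields the same set of tags.
assert extract_tags(', '.join(references)) == tags
```

One property of this pattern worth keeping in mind: `\w` does not match hyphens, so a tag such as `my-tag` would be split into `my` and `tag`.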

2 changes: 2 additions & 0 deletions esmvalcore/_task.py
@@ -17,6 +17,7 @@
 import psutil
 import yaml

+from ._citation import _write_citation_files
 from ._config import DIAGNOSTICS_PATH, TAGS, replace_tags
 from ._provenance import TrackedFile, get_task_provenance

@@ -565,6 +566,7 @@ def _collect_provenance(self):
             product = TrackedFile(filename, attributes, ancestors)
             product.initialize_provenance(self.activity)
             product.save_provenance()
+            _write_citation_files(product.filename, product.provenance)
             self.products.add(product)
         logger.debug("Collecting provenance of task %s took %.1f seconds",
                      self.name,
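
With this hook in place, citation files are written next to every product whose provenance is collected. A small sketch of the file naming, derived from `_write_citation_files`, `_save_citation_bibtex`, and `_save_citation_info_txt` in this PR; `citation_paths` and the example path are hypothetical.

```python
import os


def citation_paths(filename):
    """Return the citation file paths derived from a product filename."""
    # The product's extension is dropped before the suffixes are appended.
    product_name = os.path.splitext(filename)[0]
    return (f'{product_name}_citation.bibtex',
            f'{product_name}_data_citation_info.txt')


bibtex_path, info_path = citation_paths('/work/plots/output.nc')
print(bibtex_path)
print(info_path)
```

The bibtex file is always written, because it contains at least the ESMValTool technical paper. The text file is only written when CMIP6 info links or other data references were found.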
118 changes: 118 additions & 0 deletions tests/integration/test_citation.py
@@ -0,0 +1,118 @@
"""Test _citation.py."""
import textwrap

from prov.model import ProvDocument

import esmvalcore
from esmvalcore._citation import (CMIP6_URL_STEM, ESMVALTOOL_PAPER,
                                  _write_citation_files)
from esmvalcore._provenance import ESMVALTOOL_URI_PREFIX


def test_references(tmp_path, monkeypatch):
    """Test1: references are replaced with bibtex."""
    # Create fake provenance
    provenance = ProvDocument()
    provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file')
    provenance.add_namespace('attribute',
                             uri=ESMVALTOOL_URI_PREFIX + 'attribute')
    filename = str(tmp_path / 'output.nc')
    attributes = {
        'attribute:references': 'test_tag',
        'attribute:script_file': 'diagnostics.py'
    }
    provenance.entity('file:' + filename, attributes)

    # Create fake bibtex references tag file
    references_path = tmp_path / 'references'
    references_path.mkdir()
    monkeypatch.setattr(esmvalcore._citation, 'REFERENCES_PATH',
                        references_path)
    fake_bibtex_file = references_path / 'test_tag.bibtex'
    fake_bibtex = "Fake bibtex file content\n"
    fake_bibtex_file.write_text(fake_bibtex)

    _write_citation_files(filename, provenance)
    citation_file = tmp_path / 'output_citation.bibtex'
    citation = citation_file.read_text()
    assert citation == '\n'.join([ESMVALTOOL_PAPER, fake_bibtex])


def mock_get_response(url):
    """Mock _get_response() function."""
    json_data = False
    if url.lower().startswith('https'):
        json_data = {'titles': ['title is found']}
    return json_data


def test_cmip6_data_citation(tmp_path, monkeypatch):
    """Test2: CMIP6 citation info is retrieved from ES-DOC."""
    # Create fake provenance
    provenance = ProvDocument()
    provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file')
    provenance.add_namespace('attribute',
                             uri=ESMVALTOOL_URI_PREFIX + 'attribute')
    attributes = {
        'attribute:mip_era': 'CMIP6',
        'attribute:activity_id': 'activity',
        'attribute:institution_id': 'institution',
        'attribute:source_id': 'source',
        'attribute:experiment_id': 'experiment',
    }
    filename = str(tmp_path / 'output.nc')
    provenance.entity('file:' + filename, attributes)

    monkeypatch.setattr(esmvalcore._citation, '_get_response',
                        mock_get_response)
    _write_citation_files(filename, provenance)
    citation_file = tmp_path / 'output_citation.bibtex'

    # Create fake bibtex entry
    url = 'url not found'
    title = 'title is found'
    publisher = 'publisher not found'
    year = 'publicationYear not found'
    authors = 'creators not found'
    doi = 'doi not found'
    fake_bibtex_entry = textwrap.dedent(f"""
        @misc{{{url},
        \turl = {{{url}}},
        \ttitle = {{{title}}},
        \tpublisher = {{{publisher}}},
        \tyear = {year},
        \tauthor = {{{authors}}},
        \tdoi = {{{doi}}},
        }}
        """).lstrip()
    assert citation_file.read_text() == '\n'.join(
        [ESMVALTOOL_PAPER, fake_bibtex_entry])


def test_cmip6_data_citation_url(tmp_path):
    """Test3: CMIP6 info_url is retrieved from ES-DOC."""
    # Create fake provenance
    provenance = ProvDocument()
    provenance.add_namespace('file', uri=ESMVALTOOL_URI_PREFIX + 'file')
    provenance.add_namespace('attribute',
                             uri=ESMVALTOOL_URI_PREFIX + 'attribute')
    attributes = {
        'attribute:mip_era': 'CMIP6',
        'attribute:activity_id': 'activity',
        'attribute:institution_id': 'institution',
        'attribute:source_id': 'source',
        'attribute:experiment_id': 'experiment',
    }
    filename = str(tmp_path / 'output.nc')
    provenance.entity('file:' + filename, attributes)
    _write_citation_files(filename, provenance)
    citation_url = tmp_path / 'output_data_citation_info.txt'

    # Create fake info url
    fake_url_prefix = '.'.join(attributes.values())
    text = '\n'.join([
        "Follow the links below to find more information about CMIP6 data:",
        f"- {CMIP6_URL_STEM}/cmip6?input={fake_url_prefix}",
        '',
    ])
    assert citation_url.read_text() == text
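
The mocked payload in `test_cmip6_data_citation` carries only a title, so every other field falls back to its "not found" default. For contrast, the sketch below runs a fuller payload through a simplified stand-in for `_json_to_bibtex`. The key names (`titles`, `publisher`, `publicationYear`, `creators`, `identifier`) are the ones read by the function in this PR; the values are invented for illustration, and `json_to_bibtex_sketch` is a hypothetical name, not part of ESMValCore.

```python
def json_to_bibtex_sketch(data):
    """Simplified stand-in for _json_to_bibtex (illustration only)."""
    title = data.get('titles', ['title not found'])[0]
    publisher = data.get('publisher', 'publisher not found')
    year = data.get('publicationYear', 'publicationYear not found')
    names = [item.get('creatorName', '') for item in data.get('creators', [])]
    # An empty or missing creator list falls back to the default text.
    authors = ' and '.join(names) or 'creators not found'
    doi, url = 'doi not found', 'url not found'
    if 'identifier' in data:
        doi = data['identifier'].get('id', 'doi not found')
        url = f'https://doi.org/{doi}'
    return '\n'.join([
        f'@misc{{{url},',
        f'\turl = {{{url}}},',
        f'\ttitle = {{{title}}},',
        f'\tpublisher = {{{publisher}}},',
        f'\tyear = {year},',
        f'\tauthor = {{{authors}}},',
        f'\tdoi = {{{doi}}},',
        '}',
        '',
    ])


# Invented sample record; only the key names follow the code in this PR.
payload = {
    'titles': ['Example model output'],
    'publisher': 'Example Data Centre',
    'publicationYear': '2019',
    'creators': [{'creatorName': 'Doe, Jane'},
                 {'creatorName': 'Roe, Richard'}],
    'identifier': {'id': '10.0000/example'},
}
entry = json_to_bibtex_sketch(payload)
print(entry)
```

Because the DOI-based URL doubles as the bibtex key, a record without an `identifier` gets the key `url not found`, which is what the integration test above asserts.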