From efb307dfb8da2c09add6f5676a7507e1cc3ebf4b Mon Sep 17 00:00:00 2001
From: "w. Patrick Gale"
Date: Mon, 20 Mar 2023 17:41:47 -0400
Subject: [PATCH 1/2] issue 9277 basics

---
 doc/sphinx-guides/HowToSphinxDockerBuild.md | 150 ++++++++
 .../SphinxDocBuildHtml/Dockerfile | 12 +
 .../SphinxDocBuildHtml/requirements.txt | 2 +
 .../SphinxDocBuildPDF/Dockerfile | 12 +
 .../SphinxDocBuildPDF/requirements.txt | 3 +
 .../SphinxDocLocalHtmlImage/Dockerfile | 3 +
 .../SphinxDocLocalHtmlImage/README.md | 3 +
 .../admin/controlledVocabularyProperties.tsv | 5 +
 .../_static/admin/datasetFieldProperties.tsv | 40 ++
 .../_static/admin/displayFormatVariables.tsv | 18 +
 .../_static/admin/fieldTypeDefinitions.tsv | 9 +
 .../_static/admin/metadataBlockProperties.tsv | 7 +
 .../source/_static/api/add-storage-site.json | 6 +
 .../_static/api/dataset-package-files.json | 132 +++++++
 .../source/_static/container/tunables.tsv | 18 +
 .../source/admin/metadatacustomization.rst | 344 +++---------------
 doc/sphinx-guides/source/admin/timers.rst | 68 ++--
 doc/sphinx-guides/source/api/native-api.rst | 255 ++++---------
 .../source/container/base-image.rst | 195 +++-------
 .../source/developers/big-data-support.rst | 88 ++---
 .../source/developers/dev-environment.rst | 2 +-
 .../source/developers/make-data-count.rst | 4 +-
 .../source/developers/testing.rst | 28 +-
 .../source/developers/troubleshooting.rst | 6 +-
 .../source/developers/workflows.rst | 2 +-
 .../source/installation/advanced.rst | 4 +-
 .../source/installation/config.rst | 160 ++++----
 .../source/installation/installation-main.rst | 16 +-
 .../source/installation/prerequisites.rst | 22 +-
 .../source/user/dataset-management.rst | 2 +-
 30 files changed, 791 insertions(+), 825 deletions(-)
 create mode 100644 doc/sphinx-guides/HowToSphinxDockerBuild.md
 create mode 100644 doc/sphinx-guides/SphinxDocBuildHtml/Dockerfile
 create mode 100644 doc/sphinx-guides/SphinxDocBuildHtml/requirements.txt
 create mode 100644 doc/sphinx-guides/SphinxDocBuildPDF/Dockerfile
 create mode 100644 doc/sphinx-guides/SphinxDocBuildPDF/requirements.txt
 create mode 100644 doc/sphinx-guides/SphinxDocLocalHtmlImage/Dockerfile
 create mode 100644 doc/sphinx-guides/SphinxDocLocalHtmlImage/README.md
 create mode 100644 doc/sphinx-guides/source/_static/admin/controlledVocabularyProperties.tsv
 create mode 100644 doc/sphinx-guides/source/_static/admin/datasetFieldProperties.tsv
 create mode 100644 doc/sphinx-guides/source/_static/admin/displayFormatVariables.tsv
 create mode 100644 doc/sphinx-guides/source/_static/admin/fieldTypeDefinitions.tsv
 create mode 100644 doc/sphinx-guides/source/_static/admin/metadataBlockProperties.tsv
 create mode 100644 doc/sphinx-guides/source/_static/api/add-storage-site.json
 create mode 100644 doc/sphinx-guides/source/_static/api/dataset-package-files.json
 create mode 100644 doc/sphinx-guides/source/_static/container/tunables.tsv

diff --git a/doc/sphinx-guides/HowToSphinxDockerBuild.md b/doc/sphinx-guides/HowToSphinxDockerBuild.md
new file mode 100644
index 00000000000..f3f8c31bb5f
--- /dev/null
+++ b/doc/sphinx-guides/HowToSphinxDockerBuild.md
@@ -0,0 +1,150 @@
+# About these instructions
+
+The purpose of this document is to provide instructions on how to build a fresh copy of the Dataverse Sphinx documentation. It focuses on using [Docker scripts](https://www.docker.com/) to set up the build environment. If you need help with Sphinx, visit [https://www.sphinx-doc.org].
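+
+Before you begin, you can confirm that Docker is available from your shell. This is a minimal check, assuming Docker Engine (or Docker Desktop with WSL integration) is already installed:
+
+```bash
+# confirm the Docker CLI is installed and the daemon is reachable
+docker --version
+docker info
+```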
+
+The following instructions were written for bash in a Linux environment (a WSL terminal on a Windows 11 machine), but should apply to most Unix environments.
+
+Replace `/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides` with your absolute path to the `doc/sphinx-guides` directory on your computer.
+
+## Configuring your environment variables
+
+To simplify the instructions, you need to create variables for your environment.
+Your root Dataverse sphinx-guide path will be set using a `ROOT_DATAVERSE_SG` variable:
+`export ROOT_DATAVERSE_SG="/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides"`
+
+Next create a variable `DKR_DV_PDF` to store the unique name for the Docker image being created to run the PDF Sphinx builds.
+`export DKR_DV_PDF="sddi_pdf"`
+
+Next create a variable `DKR_DV_HTML` to store the unique name for the Docker image being created to run the HTML Sphinx builds.
+`export DKR_DV_HTML="sddi_html"`
+
+This next variable `DKR_DV_HTML_VIEW` is only used if you wish to test the HTML documentation in a local Apache container within Docker.
+`export DKR_DV_HTML_VIEW="sddi_html_view"`
+
+Or, just run all of the commands at once (copy and paste them into your terminal):
+
+```bash
+# run these in your current shell (not in a subshell) so the variables remain set for the commands below
+export ROOT_DATAVERSE_SG="/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides"
+export DKR_DV_PDF="sddi_pdf"
+export DKR_DV_HTML="sddi_html"
+export DKR_DV_HTML_VIEW="sddi_html_view"
+```
+
+## PDF build scripts using Docker
+
+First you will generate the PDF version of the Dataverse documentation. The reason for this is that the HTML documentation creates a link to this PDF file. Below are the bash commands to build a Docker image using Sphinx and LaTeX.
+
+```bash
+# change to the `SphinxDocBuildPDF` directory
+cd $ROOT_DATAVERSE_SG/SphinxDocBuildPDF
+# build the Docker image from the Dockerfile script
+docker build -t $DKR_DV_PDF .
+# change back to the root Dataverse documentation directory
+cd $ROOT_DATAVERSE_SG
+# create the PDF version of the Dataverse documentation (since you need this for the HTML docs); if errors are thrown at this point, adjust the documentation file in question and rerun this command
+docker run --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_PDF make latexpdf
+# copy the freshly built PDF file to the source folder under static files (your HTML build will be looking for this file)
+cp $ROOT_DATAVERSE_SG/build/latex/Dataverse.pdf "$ROOT_DATAVERSE_SG/source/_static"
+```
+
+## HTML build scripts using Docker
+
+Next you will generate the Docker image for building the HTML version of the Dataverse documentation. Below are the bash commands to build a Docker image using Sphinx:
+
+```bash
+cd $ROOT_DATAVERSE_SG/SphinxDocBuildHtml
+docker build -t $DKR_DV_HTML .
+```
+
+Lastly, you can make the Dataverse Sphinx documentation. ***Note: the `/[project directory]:/docs` volume mapping below gives the impression that a `docs` folder should exist, but this is just standard Sphinx syntax and Sphinx will look through the `source` directory.***
+
+```bash
+cd $ROOT_DATAVERSE_SG
+## remove the existing docker image for HTML processing if needed
+# docker image rm -f $DKR_DV_HTML
+## if you are rerunning the HTML build then you need to remove the /build/html directory so it can be recreated
+# rm -r $ROOT_DATAVERSE_SG/build/html
+docker run --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_HTML make html
+```
+
+To see the built documentation, simply open the `build/html/index.html` file in your web browser.
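+
+If you just want a quick preview of the build without the Apache container described next, one alternative is Python's built-in web server. This is a sketch, assuming Python 3 is available on your host:
+
+```bash
+# serve the generated HTML at http://localhost:8000 (press Ctrl+C to stop)
+cd $ROOT_DATAVERSE_SG/build/html
+python3 -m http.server 8000
+```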
+
+```bash
+# copy the `SphinxDocLocalHtmlImage/Dockerfile` to the build directory if you want to run a localhost example of the generated documentation (Docker is only able to 'look within/below' the directory containing the Dockerfile)
+cp $ROOT_DATAVERSE_SG/SphinxDocLocalHtmlImage/Dockerfile $ROOT_DATAVERSE_SG/build
+# copy the Font Awesome font files to the html build directory, since the `sphinxcontrib.icon` module does not include them
+cp -r $ROOT_DATAVERSE_SG/source/_font $ROOT_DATAVERSE_SG/build/html
+# change directories to the Sphinx build
+cd $ROOT_DATAVERSE_SG/build
+# create a Docker static documents image with a copy of the freshly built Sphinx docs
+docker build -t $DKR_DV_HTML_VIEW .
+# start an Apache container running the static documents image
+docker run --publish 80:80 --detach --name localhost_sddi $DKR_DV_HTML_VIEW
+# visit http://localhost/index.html in a browser to test the HTML documentation
+```
+
+## Issues with PDF output
+
+**DO NOT nest HTML/documentation lists more than three levels deep (see the issue regarding this on GitHub at [https://github.com/IQSS/dataverse/issues/9277]).**
+
+If you see errors when building the PDF, you can copy the contents of the `Dataverse.tex` file (under the `/build/latex` directory) into a LaTeX checker such as [https://www.dainiak.com/latexcheck]. For problems such as overly nested documentation lists the reported errors can be unhelpful, but the documentation file named in the error is most likely the one causing the problem.
+
+### Check the code of the Dataverse.tex file using https://www.dainiak.com/latexcheck/
+- One of the common problems is non-ASCII text (for example: The character U+2019 "’" could be confused with the character U+0060 "`", which is more common in source code).
+- Also, do not include emojis in the documentation.
+- If you would like to search for possibly problematic characters, run `LC_ALL=C find . -type f -exec grep -c -P -n "[^\x00-\x7F]" {} +` within the source directory (any file with non-ASCII characters will show a count greater than zero). If the `./developers/dependencies.rst` file happens to have any non-ASCII characters, you can check the location of the characters using `LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" ./developers/dependencies.rst`. Note: not all non-ASCII characters are problematic.
+
+## If you are new to Sphinx, you can use the following Docker command to create a starter Sphinx environment
+
+You can use the Docker image you just built to create the Sphinx project template:
+
+```bash
+docker run -it --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_HTML sphinx-quickstart
+```
+
+At this point you have a boilerplate `source` folder with some `hello world` documentation. You can copy the Dataverse documentation source from the `dataverse/doc/sphinx-guides/source` GitHub directory and replace the boilerplate source directory Sphinx just created for you.
+
+## Changelog for files
+
+The following updates were performed because these files were throwing errors on the PDF builds.
+
+doc/sphinx-guides/source/admin/metadatacustomization.rst
+- converted the grid tables to CSV tables, removed non-ASCII characters, and removed links to download files (these files are not stored with the generated documentation, and it does not make sense to make them downloadable for people already working with the code)
+- added metadataBlockProperties.tsv, datasetFieldProperties.tsv, controlledVocabularyProperties.tsv, fieldTypeDefinitions.tsv, displayFormatVariables.tsv
+
+doc/sphinx-guides/source/api/native-api.rst
+- removed non-ASCII characters and links to download code files
+- added /docs/source/_static/api/dataset-package-files.json
+
+doc/sphinx-guides/source/developers/big-data-support.rst
+- explicitly stated commands and removed code file download
+- added /docs/source/_static/api/add-storage-site.json
+
+doc/sphinx-guides/source/developers/dev-environment.rst
+doc/sphinx-guides/source/developers/make-data-count.rst
+- removed code file download
+
+doc/sphinx-guides/source/developers/testing.rst
+doc/sphinx-guides/source/developers/troubleshooting.rst
+- removed non-ASCII characters and links to download files
+
+doc/sphinx-guides/source/installation/advanced.rst
+- removed links to download files
+
+doc/sphinx-guides/source/installation/config.rst
+- removed non-ASCII characters, explicitly stated commands, and removed links to download files
+
+doc/sphinx-guides/source/installation/installation-main.rst
+doc/sphinx-guides/source/installation/prerequisites.rst
+- removed non-ASCII characters and links to download files
+
+doc/sphinx-guides/source/admin/timers.rst
+- removed nested lists
+
+doc/sphinx-guides/source/developers/workflows.rst
+- removed non-ASCII characters
+
+doc/sphinx-guides/source/container/base-image.rst
+- tried `:widths: auto` for the csv-table, but the description column would not wrap
\ No newline at end of file
diff --git a/doc/sphinx-guides/SphinxDocBuildHtml/Dockerfile b/doc/sphinx-guides/SphinxDocBuildHtml/Dockerfile
new file mode 100644
index 00000000000..6bbc058c9ff
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocBuildHtml/Dockerfile
@@ -0,0 +1,12 @@
+# get latest sphinx image
+FROM sphinxdoc/sphinx:latest
+
+RUN export DEBIAN_FRONTEND=noninteractive \
+    && apt-get update && apt-get install --yes --no-install-recommends wget rsync git && \
+    apt-get autoremove -y && \
+    pip3 install --upgrade pip setuptools && \
+    rm -r /root/.cache
+
+WORKDIR /docs
+ADD requirements.txt /docs
+RUN pip3 install -r requirements.txt
diff --git a/doc/sphinx-guides/SphinxDocBuildHtml/requirements.txt b/doc/sphinx-guides/SphinxDocBuildHtml/requirements.txt
new file mode 100644
index 00000000000..0a6dd02f0a7
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocBuildHtml/requirements.txt
@@ -0,0 +1,2 @@
+sphinx_bootstrap_theme
+sphinx-icon
\ No newline at end of file
diff --git a/doc/sphinx-guides/SphinxDocBuildPDF/Dockerfile b/doc/sphinx-guides/SphinxDocBuildPDF/Dockerfile
new file mode 100644
index 00000000000..f9ad6cda108
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocBuildPDF/Dockerfile
@@ -0,0 +1,12 @@
+# get latest sphinx image with LaTeX PDF support
+FROM sphinxdoc/sphinx-latexpdf:latest
+
+RUN export DEBIAN_FRONTEND=noninteractive \
+    && apt-get update && apt-get install --yes --no-install-recommends wget rsync git && \
+    apt-get autoremove -y && \
+    pip3 install --upgrade pip setuptools && \
+    rm -r /root/.cache
+
+WORKDIR /docs
+ADD requirements.txt /docs
+RUN pip3 install -r requirements.txt
diff --git a/doc/sphinx-guides/SphinxDocBuildPDF/requirements.txt
b/doc/sphinx-guides/SphinxDocBuildPDF/requirements.txt
new file mode 100644
index 00000000000..b661846f0cb
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocBuildPDF/requirements.txt
@@ -0,0 +1,3 @@
+sphinx_bootstrap_theme
+sphinx-icon
+pdflatex
\ No newline at end of file
diff --git a/doc/sphinx-guides/SphinxDocLocalHtmlImage/Dockerfile b/doc/sphinx-guides/SphinxDocLocalHtmlImage/Dockerfile
new file mode 100644
index 00000000000..f407a3ed2fc
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocLocalHtmlImage/Dockerfile
@@ -0,0 +1,3 @@
+# Note: this file MUST BE in the root build directory
+FROM httpd:2.4
+COPY ./html/ /usr/local/apache2/htdocs/
\ No newline at end of file
diff --git a/doc/sphinx-guides/SphinxDocLocalHtmlImage/README.md b/doc/sphinx-guides/SphinxDocLocalHtmlImage/README.md
new file mode 100644
index 00000000000..678df013f49
--- /dev/null
+++ b/doc/sphinx-guides/SphinxDocLocalHtmlImage/README.md
@@ -0,0 +1,3 @@
+# About this directory
+
+The Dockerfile in this directory needs to be copied to the /build directory where the static documentation files reside. The Dockerfile copies the documentation files into a Docker image that can be run as a simple Docker Apache container. See the [HowToSphinxDockerBuild.md] instructions.
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/admin/controlledVocabularyProperties.tsv b/doc/sphinx-guides/source/_static/admin/controlledVocabularyProperties.tsv
new file mode 100644
index 00000000000..05aafffc4a3
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/admin/controlledVocabularyProperties.tsv
@@ -0,0 +1,5 @@
+Property	Purpose	Allowed values and restrictions
+DatasetField	Specifies the #datasetField to which this entry applies.	"Must reference an existing #datasetField. As a best practice, the value should reference a #datasetField in the current metadata block definition. (It is technically possible to reference an existing #datasetField from another metadata block.)"
+Value	"A short display string, representing an enumerated value for this field. If the identifier property is empty, this value is used as the identifier."	Free text
+identifier	"A string used to encode the selected enumerated value of a field. If this property is empty, the value of the “Value” field is used as the identifier."	Free text
+displayOrder	Control the order in which the enumerated values are displayed for selection.	Non-negative integer.
\ No newline at end of file
diff --git a/doc/sphinx-guides/source/_static/admin/datasetFieldProperties.tsv b/doc/sphinx-guides/source/_static/admin/datasetFieldProperties.tsv
new file mode 100644
index 00000000000..e8003a54e2b
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/admin/datasetFieldProperties.tsv
@@ -0,0 +1,40 @@
+Property	Purpose	Allowed values and restrictions
+name	A user-definable string used to identify a #datasetField. Maps directly to field name used by Solr.	"- (from DatasetFieldType.java) The internal DDI-like name, no spaces, etc.
+- (from Solr) Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g. _version_) are reserved.
+- Must not collide with a field of the same name in another #metadataBlock definition or any name already included as a field in the Solr index."
+title	Acts as a brief label for display related to this #datasetField.	Should be relatively brief.
+description	Used to provide a description of the field.	Free text
+watermark	A string to initially display in a field as a prompt for what the user should enter.	Free text
+fieldType	"Defines the type of content that the field, if not empty, is meant to contain."	"- none
+- date
+- email
+- text
+- textbox
+- url
+- int
+- float
+- See below for fieldtype definitions"
+displayOrder	"Controls the sequence in which the fields are displayed, both for input and presentation."	Non-negative integer.
+displayFormat	"Controls how the content is displayed for presentation (not entry). The value of this field may contain one or more special variables (enumerated below). HTML tags, likely in conjunction with one or more of these values, may be used to control the display of content in the web UI."	See below for displayFormat variables
+advancedSearchField	Specify whether this field is available in advanced search.	TRUE (available) or FALSE (not available)
+allowControlledVocabulary	Specify whether the possible values of this field are determined by values in the #controlledVocabulary section.	TRUE (controlled) or FALSE (not controlled)
+allowmultiples	Specify whether this field is repeatable.	TRUE (repeatable) or FALSE (not repeatable)
+facetable	"Specify whether the field is facetable (i.e., if the expected values for this field are themselves useful search terms for this field). If a field is “facetable” (able to be faceted on), it appears under “Browse/Search Facets” when you edit “General Information” for a Dataverse collection. Setting this value to TRUE generally makes sense for enumerated or controlled vocabulary fields, fields representing identifiers (IDs, names, email addresses), and other fields that are likely to share values across entries. It is less likely to make sense for fields containing descriptions, floating point numbers, and other values that are likely to be unique."	TRUE (controlled) or FALSE (not controlled)
+displayoncreate [5]_	"Designate fields that should display during the creation of a new dataset, even before the dataset is saved. Fields not so designated will not be displayed until the dataset has been saved."	TRUE (display during creation) or FALSE (don't display during creation)
+required	"For primitive fields, specify whether or not the field is required.
+
+For compound fields, also specify if one or more subfields are required or conditionally required. At least one instance of a required field must be present. More than one instance of a field may be allowed, depending on the value of allowmultiples."	"For primitive fields, TRUE (required) or FALSE (optional).
+
+For compound fields:
+
+- To make one or more subfields optional, the parent field and subfield(s) must be FALSE (optional).
+- To make one or more subfields required, the parent field and the required subfield(s) must be TRUE (required).
+- To make one or more subfields conditionally required, make the parent field FALSE (optional) and make TRUE (required) any subfield or subfields that are required if any other subfields are filled.
+"
+parent	"For subfields, specify the name of the parent or containing field."	"- Must not result in a cyclical reference.
+- Must reference an existing field in the same #metadataBlock."
+metadatablock_id	Specify the name of the #metadataBlock that contains this field.	"- Must reference an existing #metadataBlock.
+- As a best practice, the value should reference the #metadataBlock in the current definition (it is technically possible to reference another existing metadata block.)" +termURI "Specify a global URI identifying this term in an external community vocabulary. + +This value overrides the default (created by appending the property name to the blockURI defined for the #metadataBlock)" "For example, the existing citation #metadataBlock defines the property named 'title' as http://purl.org/dc/terms/title - i.e. indicating that it can be interpreted as the Dublin Core term 'title'" \ No newline at end of file diff --git a/doc/sphinx-guides/source/_static/admin/displayFormatVariables.tsv b/doc/sphinx-guides/source/_static/admin/displayFormatVariables.tsv new file mode 100644 index 00000000000..98750610dec --- /dev/null +++ b/doc/sphinx-guides/source/_static/admin/displayFormatVariables.tsv @@ -0,0 +1,18 @@ +Variable Description +(blank) "The displayFormat is left blank for primitive fields (e.g. subtitle) and fields that do not take values (e.g. author), since displayFormats do not work for these fields." +#VALUE The value of the field (instance level). +#NAME The name of the field (class level). +#EMAIL For displaying emails. +"#VALUE" For displaying the value as a link (if the value entered is a link). +"#VALUE" "For displaying the value as a link, with the value included in the URL (e.g. if URL is \http://emsearch.rutgers.edu/atlas/#VALUE_summary.html, and the value entered is 1001, the field is displayed as `1001 `__ (hyperlinked to http://emsearch.rutgers.edu/atlas/1001_summary.html))." +"
" For displaying the image of an entered image URL (used to display images in the producer and distributor logos metadata fields). +"#VALUE: + +#VALUE: + +(#VALUE)" "Appends and/or prepends characters to the value of the field. e.g. if the displayFormat for the distributorAffiliation is (#VALUE) (wrapped with parens) and the value entered is University of North Carolina, the field is displayed in the UI as (University of North Carolina)." +"; + +: + +," "Displays the character (e.g. semicolon, comma) between the values of fields within compound fields. For example, if the displayFormat for the compound field ""series"" is a colon, and if the value entered for seriesName is IMPs and for seriesInformation is A collection of NMR data, the compound field is displayed in the UI as IMPs: A collection of NMR data." \ No newline at end of file diff --git a/doc/sphinx-guides/source/_static/admin/fieldTypeDefinitions.tsv b/doc/sphinx-guides/source/_static/admin/fieldTypeDefinitions.tsv new file mode 100644 index 00000000000..f096d61214a --- /dev/null +++ b/doc/sphinx-guides/source/_static/admin/fieldTypeDefinitions.tsv @@ -0,0 +1,9 @@ +Fieldtype Definition +none "Used for compound fields, in which case the parent field would have no value and display no data entry control." +date "A date, expressed in one of three resolutions of the form YYYY-MM-DD, YYYY-MM, or YYYY." +email A valid email address. Not indexed for privacy reasons. +text Any text other than newlines may be entered into this field. +textbox "Any text may be entered. For input, the Dataverse Software presents a multi-line area that accepts newlines. While any HTML is permitted, only a subset of HTML tags will be rendered in the UI. See the :ref:`supported-html-fields` section of the Dataset + File Management page in the User Guide." +url "If not empty, field must contain a valid URL." +int An integer value destined for a numeric field. +float A floating point number destined for a numeric field. \ No newline at end of file diff --git a/doc/sphinx-guides/source/_static/admin/metadataBlockProperties.tsv b/doc/sphinx-guides/source/_static/admin/metadataBlockProperties.tsv new file mode 100644 index 00000000000..4fba5d1ce6c --- /dev/null +++ b/doc/sphinx-guides/source/_static/admin/metadataBlockProperties.tsv @@ -0,0 +1,7 @@ +Property Purpose Allowed values and restrictions +name A user-definable string used to identify a #metadataBlock "- No spaces or punctuation, except underscore +- By convention, should start with a letter, and use lower camel case [3]_ +- Must not collide with a field of the same name in the same or any other #datasetField definition, including metadata blocks defined elsewhere [4]_" +dataverseAlias "If specified, this metadata block will be available only to the Dataverse collection designated here by its alias and to children of that Dataverse collection." "Free text. For an example, see ``/scripts/api/data/metadatablocks/custom_hbgdki.tsv``." +displayName Acts as a brief label for display related to this #metadataBlock. "Should be relatively brief. The limit is 256 characters, but very long names might cause display problems." +blockURI Associates the properties in a block with an external URI. 
Properties will be assigned the global identifier blockURI in the OAI_ORE metadata and archival Bags The citation #metadataBlock has the blockURI https://dataverse.org/schema/citation/ which assigns a default global URI to terms such as https://dataverse.org/schema/citation/subtitle \ No newline at end of file diff --git a/doc/sphinx-guides/source/_static/api/add-storage-site.json b/doc/sphinx-guides/source/_static/api/add-storage-site.json new file mode 100644 index 00000000000..d13ec2f165d --- /dev/null +++ b/doc/sphinx-guides/source/_static/api/add-storage-site.json @@ -0,0 +1,6 @@ +{ + "hostname": "dataverse.librascholar.edu", + "name": "LibraScholar, USA", + "primaryStorage": true, + "transferProtocols": "rsync,posix,globus" +} diff --git a/doc/sphinx-guides/source/_static/api/dataset-package-files.json b/doc/sphinx-guides/source/_static/api/dataset-package-files.json new file mode 100644 index 00000000000..ce89f83c307 --- /dev/null +++ b/doc/sphinx-guides/source/_static/api/dataset-package-files.json @@ -0,0 +1,132 @@ +{ + "datasetVersion": { + "license": { + "name": "CC0 1.0", + "uri": "http://creativecommons.org/publicdomain/zero/1.0" + }, + "protocol":"doi", + "authority":"10.502", + "identifier":"ZZ7/MOSEISLEYDB94", + "metadataBlocks": { + "citation": { + "fields": [ + { + "typeName": "title", + "multiple": false, + "value": "Imported dataset with package files No. 3", + "typeClass": "primitive" + }, + { + "typeName": "productionDate", + "multiple": false, + "value": "2011-02-23", + "typeClass": "primitive" + }, + { + "typeName": "dsDescription", + "multiple": true, + "value": [ + { + "dsDescriptionValue": { + "typeName": "dsDescriptionValue", + "multiple": false, + "value": "Native Dataset", + "typeClass": "primitive" + } + } + ], + "typeClass": "compound" + }, + { + "typeName": "subject", + "multiple": true, + "value": [ + "Medicine, Health and Life Sciences" + ], + "typeClass": "controlledVocabulary" + }, + { + "typeName": "author", + "multiple": true, + "value": [ + { + "authorAffiliation": { + "typeName": "authorAffiliation", + "multiple": false, + "value": "LibraScholar Medical School", + "typeClass": "primitive" + }, + "authorName": { + "typeName": "authorName", + "multiple": false, + "value": "Doc, Bob", + "typeClass": "primitive" + } + }, + { + "authorAffiliation": { + "typeName": "authorAffiliation", + "multiple": false, + "value": "LibraScholar Medical School", + "typeClass": "primitive" + }, + "authorName": { + "typeName": "authorName", + "multiple": false, + "value": "Prof, Arthur", + "typeClass": "primitive" + } + } + ], + "typeClass": "compound" + }, + { + "typeName": "depositor", + "multiple": false, + "value": "Prof, Arthur", + "typeClass": "primitive" + }, + { + "typeName": "datasetContact", + "multiple": true, + "value": [ + { + "datasetContactEmail": { + "typeName": "datasetContactEmail", + "multiple": false, + "value": "aprof@mailinator.com", + "typeClass": "primitive" + } + } + ], + "typeClass": "compound" + } + ], + "displayName": "Citation Metadata" + } + }, + "files": [ + { + "description": "", + "label": "pub", + "restricted": false, + "version": 1, + "datasetVersionId": 1, + "dataFile": { + "id": 4, + "filename": "pub", + "contentType": "application/vnd.dataverse.file-package", + "filesize": 1698795873, + "description": "", + "storageIdentifier": "162017e5ad5-ee2a2b17fee9", + "originalFormatLabel": "UNKNOWN", + "rootDataFileId": -1, + "checksum": { + "type": "SHA-1", + "value": "54bc7ddb096a490474bd8cc90cbed1c96730f350" + } + } + } + ] + } +} diff --git 
a/doc/sphinx-guides/source/_static/container/tunables.tsv b/doc/sphinx-guides/source/_static/container/tunables.tsv
new file mode 100644
index 00000000000..b9e98150b3c
--- /dev/null
+++ b/doc/sphinx-guides/source/_static/container/tunables.tsv
@@ -0,0 +1,18 @@
+Env. variable	Default	Type	Description
+`DEPLOY_PROPS`	(empty)	String	Set to add arguments to generated `asadmin deploy` commands.
+`PREBOOT_COMMANDS`	[preboot]_	Abs. path	Provide path to file with `asadmin` commands to run **before** boot of application server. See also `Pre/postboot script docs`_.
+`POSTBOOT_COMMANDS`	[postboot]_	Abs. path	Provide path to file with `asadmin` commands to run **after** boot of application server. See also `Pre/postboot script docs`_.
+`JVM_ARGS`	(empty)	String	Additional arguments to pass to application server's JVM on start.
+`MEM_MAX_RAM_PERCENTAGE`	`70.0`	Percentage	"Maximum amount of container's allocated RAM to be used as heap space. Make sure to leave some room for native memory, OS overhead etc!"
+`MEM_XSS`	512k	Size	Tune the maximum JVM stack size.
+`MEM_MIN_HEAP_FREE_RATIO`	`20`	Integer	Make the heap shrink aggressively and grow conservatively. See also `run-java-sh recommendations`_.
+`MEM_MAX_HEAP_FREE_RATIO`	`40`	Integer	Make the heap shrink aggressively and grow conservatively. See also `run-java-sh recommendations`_.
+`MEM_MAX_GC_PAUSE_MILLIS`	`500`	Milliseconds	Shorter pause times might result in lots of collections causing overhead without much gain. This needs monitoring and tuning. It's a complex matter.
+`MEM_METASPACE_SIZE`	`256m`	Size	"Initial size of memory reserved for class metadata, also used as trigger to run a garbage collection once passing this size."
+`MEM_MAX_METASPACE_SIZE`	`2g`	Size	The metaspace's size will not outgrow this limit.
+`ENABLE_DUMPS`	`0`	"Bool,`0|1`"	"If enabled, the argument(s) given in `JVM_DUMPS_ARG` will be added to the JVM starting up. This means it will enable dumping the heap to `${DUMPS_DIR}` (see below) in ""out of memory"" cases. (You should back this location with disk space / ramdisk, so it does not write into an overlay filesystem!)"
+`JVM_DUMPS_ARG`	[dump-option]_	String	Can be fine-tuned for finer-grained control of dumping behaviour.
+`ENABLE_JMX`	`0`	"Bool,`0|1`"	"Allow insecure JMX connections, enable AMX and tune all JMX monitoring levels to `HIGH`. See also `Payara Docs - Basic Monitoring `_. A basic JMX service is enabled by default in Payara, exposing basic JVM MBeans, but especially no Payara MBeans."
+`ENABLE_JDWP`	`0`	"Bool,`0|1`"	"Enable the ""Java Debug Wire Protocol"" to attach a remote debugger to the JVM in this container. Listens on port 9009 when enabled. Search the internet for numerous tutorials to use it."
+`ENABLE_RELOAD`	`0`	"Bool,`0|1`"	"Enable the dynamic ""hot"" reloads of files when changed in a deployment. Useful for development, when new artifacts are copied into the running domain."
+`DATAVERSE_HTTP_TIMEOUT`	900	Seconds	See :ref:`:ApplicationServerSettings` `http.request-timeout-seconds`. *Note:* can also be set using any other `MicroProfile Config Sources`_ available via `dataverse.http.timeout`.
diff --git a/doc/sphinx-guides/source/admin/metadatacustomization.rst b/doc/sphinx-guides/source/admin/metadatacustomization.rst index 9fb8626d4c4..8805665be53 100644 --- a/doc/sphinx-guides/source/admin/metadatacustomization.rst +++ b/doc/sphinx-guides/source/admin/metadatacustomization.rst @@ -13,15 +13,15 @@ Before you embark on customizing metadata in your Dataverse installation you sho Much more customization of metadata is possible, but this is an advanced topic so feedback on what is written below is very welcome. The possibilities for customization include: -- Editing and adding metadata fields +- Editing and adding metadata fields -- Editing and adding instructional text (field label tooltips and text box watermarks) +- Editing and adding instructional text (field label tooltips and text box watermarks) -- Editing and adding controlled vocabularies +- Editing and adding controlled vocabularies -- Changing which fields depositors must use in order to save datasets (see also :ref:`dataset-templates` section of the User Guide.) +- Changing which fields depositors must use in order to save datasets (see also :ref:`dataset-templates` section of the User Guide.) -- Changing how saved metadata values are displayed in the UI +- Changing how saved metadata values are displayed in the UI Generally speaking it is safer to create your own custom metadata block rather than editing metadata blocks that ship with the Dataverse Software, because changes to these blocks may be made in future releases. If you'd like to make improvements to any of the metadata blocks shipped with the Dataverse Software, please open an issue at https://github.com/IQSS/dataverse/issues so it can be discussed before a pull request is made. Please note that the metadata blocks shipped with the Dataverse Software are based on standards (e.g. DDI for social science) and you can learn more about these standards in the :doc:`/user/appendix` section of the User Guide. If you have developed your own custom metadata block that you think may be of interest to the Dataverse community, please create an issue and consider making a pull request as described in the :doc:`/developers/version-control` section of the Developer Guide. @@ -49,337 +49,77 @@ the metadata block TSV. 1. metadataBlock - - Purpose: Represents the metadata block being defined. + - Purpose: Represents the metadata block being defined. - - Cardinality: + - Cardinality: - - 0 or more per Dataverse installation + - 0 or more per Dataverse installation - - 1 per Metadata Block definition + - 1 per Metadata Block definition 2. datasetField - - Purpose: Each entry represents a metadata field to be defined + - Purpose: Each entry represents a metadata field to be defined within a metadata block. - - Cardinality: 1 or more per metadataBlock + - Cardinality: 1 or more per metadataBlock 3. controlledVocabulary - - Purpose: Each entry enumerates an allowed value for a given + - Purpose: Each entry enumerates an allowed value for a given datasetField. 
- - Cardinality: zero or more per datasetField + - Cardinality: zero or more per datasetField Each of the three main sections own sets of properties: #metadataBlock properties ~~~~~~~~~~~~~~~~~~~~~~~~~ -+----------------+---------------------------------------------------------+---------------------------------------------------------+ -| **Property** | **Purpose** | **Allowed values and restrictions** | -+----------------+---------------------------------------------------------+---------------------------------------------------------+ -| name | A user-definable string used to identify a | \• No spaces or punctuation, except underscore. | -| | #metadataBlock | | -| | | \• By convention, should start with a letter, and use | -| | | lower camel case [3]_ | -| | | | -| | | \• Must not collide with a field of the same name in | -| | | the same or any other #datasetField definition, | -| | | including metadata blocks defined elsewhere. [4]_ | -+----------------+---------------------------------------------------------+---------------------------------------------------------+ -| dataverseAlias | If specified, this metadata block will be available | Free text. For an example, see custom_hbgdki.tsv. | -| | only to the Dataverse collection designated here by | | -| | its alias and to children of that Dataverse collection. | | -+----------------+---------------------------------------------------------+---------------------------------------------------------+ -| displayName | Acts as a brief label for display related to this | Should be relatively brief. The limit is 256 character, | -| | #metadataBlock. | but very long names might cause display problems. | -+----------------+---------------------------------------------------------+---------------------------------------------------------+ -| blockURI | Associates the properties in a block with an external | The citation #metadataBlock has the blockURI | -| | URI. | https://dataverse.org/schema/citation/ which assigns a | -| | Properties will be assigned the | default global URI to terms such as | -| | global identifier blockURI in the OAI_ORE | https://dataverse.org/schema/citation/subtitle | -| | metadata and archival Bags | | -+----------------+---------------------------------------------------------+---------------------------------------------------------+ +.. csv-table:: + :header-rows: 1 + :widths: 20, 30, 65 + :delim: tab + :file: ../_static/admin/metadataBlockProperties.tsv #datasetField (field) properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| **Property** | **Purpose** | **Allowed values and restrictions** | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| name | A user-definable string used to identify a | \• (from DatasetFieldType.java) The internal DDI-like | | -| | #datasetField. Maps directly to field name used by | name, no spaces, etc. | | -| | Solr. | | | -| | | \• (from Solr) Field names should consist of | | -| | | alphanumeric or underscore characters only and not start | | -| | | with a digit. This is not currently strictly enforced, | | -| | | but other field names will not have first class | | -| | | support from all components and back compatibility | | -| | | is not guaranteed. 
| | -| | | Names with both leading and trailing underscores | | -| | | (e.g. \_version_) are reserved. | | -| | | | | -| | | \• Must not collide with a field of | | -| | | the same same name in another #metadataBlock | | -| | | definition or any name already included as a | | -| | | field in the Solr index. | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| title | Acts as a brief label for display | Should be relatively brief. | | -| | related to this #datasetField. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| description | Used to provide a description of the | Free text | | -| | field. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| watermark | A string to initially display in a field | Free text | | -| | as a prompt for what the user should enter. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| fieldType | Defines the type of content that the | | \• none | -| | field, if not empty, is meant to contain. | | \• date | -| | | | \• email | -| | | | \• text | -| | | | \• textbox | -| | | | \• url | -| | | | \• int | -| | | | \• float | -| | | | \• See below for | -| | | | fieldtype definitions | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| displayOrder | Controls the sequence in which the fields | Non-negative integer. | | -| | are displayed, both for input and | | | -| | presentation. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| displayFormat | Controls how the content is displayed | See below for displayFormat | | -| | for presentation (not entry). The value of | variables | | -| | this field may contain one or more | | | -| | special variables (enumerated below). | | | -| | HTML tags, likely in conjunction with one | | | -| | or more of these values, may be used | | | -| | to control the display of content in | | | -| | the web UI. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| advancedSearchField | Specify whether this field is available in | TRUE (available) or | | -| | advanced search. | FALSE (not available) | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| allowControlledVocabulary | Specify whether the possible values of | TRUE (controlled) or FALSE (not | | -| | this field are determined by values | controlled) | | -| | in the #controlledVocabulary section. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| allowmultiples | Specify whether this field is repeatable. 
| TRUE (repeatable) or FALSE (not | | -| | | repeatable) | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| facetable | Specify whether the field is facetable | TRUE (controlled) or FALSE (not | | -| | (i.e., if the expected values for | controlled) | | -| | this field are themselves useful | | | -| | search terms for this field). If a field is | | | -| | "facetable" (able to be faceted on), it | | | -| | appears under "Browse/Search | | | -| | Facets" when you edit | | | -| | "General Information" for a Dataverse | | | -| | collection. | | | -| | Setting this value to TRUE generally makes | | | -| | sense for enumerated or controlled | | | -| | vocabulary fields, fields representing | | | -| | identifiers (IDs, names, email | | | -| | addresses), and other fields that are | | | -| | likely to share values across | | | -| | entries. It is less likely to make sense | | | -| | for fields containing descriptions, | | | -| | floating point numbers, and other | | | -| | values that are likely to be unique. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| displayoncreate [5]_ | Designate fields that should display during | TRUE (display during creation) or FALSE | | -| | the creation of a new dataset, even before | (don’t display during creation) | | -| | the dataset is saved. | | | -| | Fields not so designated will not | | | -| | be displayed until the dataset has been | | | -| | saved. | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| required | For primitive fields, specify whether or not the | For primitive fields, TRUE | | -| | field is required. | (required) or FALSE (optional). | | -| | | | | -| | For compound fields, also specify if one or more | For compound fields: | | -| | subfields are required or conditionally required. At | | | -| | least one instance of a required field must be | \• To make one or more | | -| | present. More than one instance of a field may be | subfields optional, the parent | | -| | allowed, depending on the value of allowmultiples. | field and subfield(s) must be | | -| | | FALSE (optional). | | -| | | | | -| | | \• To make one or more subfields | | -| | | required, the parent field and | | -| | | the required subfield(s) must be | | -| | | TRUE (required). | | -| | | | | -| | | \• To make one or more subfields | | -| | | conditionally required, make the | | -| | | parent field FALSE (optional) | | -| | | and make TRUE (required) any | | -| | | subfield or subfields that are | | -| | | required if any other subfields | | -| | | are filled. | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| parent | For subfields, specify the name of the parent or | \• Must not result in a cyclical reference. | | -| | containing field. | | | -| | | \• Must reference an existing field in the same | | -| | | #metadataBlock. 
| | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| metadatablock_id | Specify the name of the #metadataBlock that contains | \• Must reference an existing #metadataBlock. | | -| | this field. | | | -| | | \• As a best practice, the value should reference the | | -| | | #metadataBlock in the current | | -| | | definition (it is technically | | -| | | possible to reference another | | -| | | existing metadata block.) | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ -| termURI | Specify a global URI identifying this term in an | For example, the existing citation | | -| | external community vocabulary. | #metadataBlock defines the property | | -| | | named 'title' as http://purl.org/dc/terms/title | | -| | This value overrides the default (created by appending | - i.e. indicating that it can | | -| | the property name to the blockURI defined for the | be interpreted as the Dublin Core term 'title' | | -| | #metadataBlock) | | | -+---------------------------+--------------------------------------------------------+----------------------------------------------------------+-----------------------+ +.. csv-table:: + :header-rows: 1 + :widths: auto + :class: longtable + :delim: tab + :file: ../_static/admin/datasetFieldProperties.tsv #controlledVocabulary (enumerated) properties ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -+--------------+--------------------------------------------+-----------------------------------------+ -| **Property** | **Purpose** | **Allowed values and restrictions** | -+--------------+--------------------------------------------+-----------------------------------------+ -| DatasetField | Specifies the #datasetField to which | Must reference an existing | -| | #datasetField to which this entry applies. | #datasetField. | -| | | As a best practice, the value should | -| | | reference a #datasetField in the | -| | | current metadata block definition. (It | -| | | is technically possible to reference | -| | | an existing #datasetField from | -| | | another metadata block.) | -+--------------+--------------------------------------------+-----------------------------------------+ -| Value | A short display string, representing | Free text | -| | an enumerated value for this field. If | | -| | the identifier property is empty, | | -| | this value is used as the identifier. | | -+--------------+--------------------------------------------+-----------------------------------------+ -| identifier | A string used to encode the selected | Free text | -| | enumerated value of a field. If this | | -| | property is empty, the value of the | | -| | “Value” field is used as the identifier. | | -+--------------+--------------------------------------------+-----------------------------------------+ -| displayOrder | Control the order in which the enumerated | Non-negative integer. | -| | values are displayed for selection. | | -+--------------+--------------------------------------------+-----------------------------------------+ +.. 
csv-table:: + :header-rows: 1 + :widths: 20, 50, 50 + :delim: tab + :file: ../_static/admin/controlledVocabularyProperties.tsv FieldType definitions ~~~~~~~~~~~~~~~~~~~~~ -+---------------+------------------------------------+ -| **Fieldtype** | **Definition** | -+---------------+------------------------------------+ -| none | Used for compound fields, in which | -| | case the parent field would have | -| | no value and display no data | -| | entry control. | -+---------------+------------------------------------+ -| date | A date, expressed in one of three | -| | resolutions of the form | -| | YYYY-MM-DD, YYYY-MM, or YYYY. | -+---------------+------------------------------------+ -| email | A valid email address. Not | -| | indexed for privacy reasons. | -+---------------+------------------------------------+ -| text | Any text other than newlines may | -| | be entered into this field. | -+---------------+------------------------------------+ -| textbox | Any text may be entered. For | -| | input, the Dataverse Software | -| | presents a | -| | multi-line area that accepts | -| | newlines. While any HTML is | -| | permitted, only a subset of HTML | -| | tags will be rendered in the UI. | -| | See the | -| | :ref:`supported-html-fields` | -| | section of the Dataset + File | -| | Management page in the User Guide. | -+---------------+------------------------------------+ -| url | If not empty, field must contain | -| | a valid URL. | -+---------------+------------------------------------+ -| int | An integer value destined for a | -| | numeric field. | -+---------------+------------------------------------+ -| float | A floating point number destined | -| | for a numeric field. | -+---------------+------------------------------------+ +.. csv-table:: + :header-rows: 1 + :widths: 20, 100 + :delim: tab + :file: ../_static/admin/fieldTypeDefinitions.tsv displayFormat variables ~~~~~~~~~~~~~~~~~~~~~~~ These are common ways to use the displayFormat to control how values are displayed in the UI. This list is not exhaustive. -+---------------------------------+--------------------------------------------------------+ -| **Variable** | **Description** | -+---------------------------------+--------------------------------------------------------+ -| (blank) | The displayFormat is left blank | -| | for primitive fields (e.g. | -| | subtitle) and fields that do not | -| | take values (e.g. author), since | -| | displayFormats do not work for | -| | these fields. | -+---------------------------------+--------------------------------------------------------+ -| #VALUE | The value of the field (instance level). | -+---------------------------------+--------------------------------------------------------+ -| #NAME | The name of the field (class level). | -+---------------------------------+--------------------------------------------------------+ -| #EMAIL | For displaying emails. | -+---------------------------------+--------------------------------------------------------+ -| #VALUE | For displaying the value as a | -| | link (if the value entered is a | -| | link). | -+---------------------------------+--------------------------------------------------------+ -| #VALUE | For displaying the value as a | -| | link, with the value included in | -| | the URL (e.g. 
if URL is | -| | \http://emsearch.rutgers.edu/atla\ | -| | \s/#VALUE_summary.html, | -| | and the value entered is 1001, | -| | the field is displayed as | -| | `1001 `__ | -| | (hyperlinked to | -| | http://emsearch.rutgers.edu/atlas/1001_summary.html)). | -+---------------------------------+--------------------------------------------------------+ -|
| entered image URL (used to | -| | display images in the producer | -| | and distributor logos metadata | -| | fields). | -+---------------------------------+--------------------------------------------------------+ -| #VALUE: | Appends and/or prepends | -| | characters to the value of the | -| \- #VALUE: | field. e.g. if the displayFormat | -| | for the distributorAffiliation is | -| (#VALUE) | (#VALUE) (wrapped with parens) | -| | and the value entered | -| | is University of North | -| | Carolina, the field is displayed | -| | in the UI as (University of | -| | North Carolina). | -+---------------------------------+--------------------------------------------------------+ -| ; | Displays the character (e.g. | -| | semicolon, comma) between the | -| : | values of fields within | -| | compound fields. For example, | -| , | if the displayFormat for the | -| | compound field “series” is a | -| | colon, and if the value | -| | entered for seriesName is | -| | IMPs and for | -| | seriesInformation is A | -| | collection of NMR data, the | -| | compound field is displayed in | -| | the UI as IMPs: A | -| | collection of NMR data. | -+---------------------------------+--------------------------------------------------------+ +.. csv-table:: + :header-rows: 1 + :widths: auto + :delim: tab + :file: ../_static/admin/displayFormatVariables.tsv Metadata Block Setup -------------------- @@ -505,7 +245,7 @@ the Solr schema configuration, including any enabled metadata schemas: ``curl "http://localhost:8080/api/admin/index/solr/schema"`` -You can use :download:`update-fields.sh <../../../../conf/solr/8.11.1/update-fields.sh>` to easily add these to the +You can use ``update-fields.sh`` under ``/conf/solr/8.11.1/update-fields.sh`` to easily add these to the Solr schema you installed for your Dataverse installation. The script needs a target XML file containing your Solr schema. (See the :doc:`/installation/prerequisites/` section of @@ -595,8 +335,8 @@ Footnotes https://www.iana.org/assignments/media-types/text/tab-separated-values .. [2] - Although the structure of the data, as you’ll see below, violates the - “Each record must have the same number of fields” tenet of TSV + Although the structure of the data, as you'll see below, violates the + "Each record must have the same number of fields" tenet of TSV .. [3] https://en.wikipedia.org/wiki/CamelCase diff --git a/doc/sphinx-guides/source/admin/timers.rst b/doc/sphinx-guides/source/admin/timers.rst index f9ac8d8a498..fc04d7a60ee 100644 --- a/doc/sphinx-guides/source/admin/timers.rst +++ b/doc/sphinx-guides/source/admin/timers.rst @@ -6,7 +6,7 @@ Dataverse Installation Application Timers Your Dataverse installation uses timers to automatically run scheduled Harvest and Metadata export jobs. .. contents:: |toctitle| - :local: + :local: Dedicated timer server in a Dataverse Installation server cluster ----------------------------------------------------------------- @@ -53,39 +53,39 @@ This timer is created automatically from an @Schedule annotation on the makeLink This timer runs a weekly job to create links for any saved searches that haven't been linked yet. -This job is automatically scheduled to run once a week at 12:30AM local time on Sunday. If really necessary, it is possible to change that time by deploying the application war file with an ejb-jar.xml file in the WEB-INF directory of the war file. A :download:`sample file <../_static/admin/ejb-jar.xml>` would run the job every Tuesday at 2:30PM. 
The schedule can be modified to your choice by editing the fields in the session section. If other EJBs require some form of configuration using an ejb-jar file, there should be one ejb-jar file for the entire application, which can have different sections for each EJB. Below are instructions for the simple case of adding the ejb-jar.xml for the first time and making a custom schedule for the saved search timer. - -* Create or edit dataverse/src/main/webapp/WEB-INF/ejb-jar.xml, following the :download:`sample file <../_static/admin/ejb-jar.xml>` provided. - -* Edit the parameters in the section of the ejb-jar file in the WEB-INF directory to suit your preferred schedule - - * The provided parameters in the sample file are , , and ; additional parameters are available - - * For a complete reference for calendar expressions that can be used to schedule Timer services see: https://docs.oracle.com/javaee/7/tutorial/ejb-basicexamples004.htm - -* Build and deploy the application - -* Alternatively, you can insert an ejb-jar.xml file into a provided Dataverse Software war file without building the application. - - * Check if there is already an ejb-jar.xml file in the war file - - * jar tvf $DATAVERSE-WAR-FILENAME | grep ejb-jar.xml - - * if the response includes " WEB-INF/ejb-jar.xml", you will need to extract the ejb-jar.xml file for editing - - * jar xvf $DATAVERSE-WAR-FILENAME WEB-INF/ejb-jar.xml - - * edit the extracted WEB-INF/ejb-jar.xml, following the :download:`sample file <../_static/admin/ejb-jar.xml>` provided. - - * if the response is empty, create a WEB-INF directory and create en ejb-jar.xml file in it, following the :download:`sample file <../_static/admin/ejb-jar.xml>` provided. - - * edit the parameters in the section of the WEB-INF/ejb-jar.xml to suit your preferred schedule - - * Insert the edited WEB-INF/ejb-jar.xml into the dataverse war file - - * jar uvf $DATAVERSE-WAR-FILENAME WEB-INF/ejb-jar.xml - - * Deploy the war file +This job is automatically scheduled to run once a week at 12:30AM local time on Sunday. If really necessary, it is possible to change that time by deploying the application war file with an ejb-jar.xml file in the WEB-INF directory of the war file. A sample file at ``/_static/admin/ejb-jar.xml`` would run the job every Tuesday at 2:30PM. The schedule can be modified to your choice by editing the fields in the session section. If other EJBs require some form of configuration using an ejb-jar file, there should be one ejb-jar file for the entire application, which can have different sections for each EJB. Below are instructions for the simple case of adding the ejb-jar.xml for the first time and making a custom schedule for the saved search timer. + + Create or edit ``dataverse/src/main/webapp/WEB-INF/ejb-jar.xml``, following the sample file provided at ``/_static/admin/ejb-jar.xml``. + + Edit the parameters in the section of the ejb-jar file in the WEB-INF directory to suit your preferred schedule. + + The provided parameters in the sample file are , , and ; additional parameters are available. + + For a complete reference for calendar expressions that can be used to schedule Timer services see: https://docs.oracle.com/javaee/7/tutorial/ejb-basicexamples004.htm + + Build and deploy the application. + + Alternatively, you can insert an ejb-jar.xml file into a provided Dataverse Software war file without building the application. 
+
+ Check if there is already an ejb-jar.xml file in the war file:
+
+ :command:`jar tvf $DATAVERSE-WAR-FILENAME | grep ejb-jar.xml`
+
+ If the response includes ``WEB-INF/ejb-jar.xml``, you will need to extract the ejb-jar.xml file for editing:
+
+ :command:`jar xvf $DATAVERSE-WAR-FILENAME WEB-INF/ejb-jar.xml`
+
+ Edit the extracted WEB-INF/ejb-jar.xml, following the sample file provided at ``/_static/admin/ejb-jar.xml``.
+
+ If the response is empty, create a WEB-INF directory and create an ejb-jar.xml file in it, following the sample file provided at ``/_static/admin/ejb-jar.xml``.
+
+ Edit the parameters in the section of the WEB-INF/ejb-jar.xml to suit your preferred schedule.
+
+ Insert the edited WEB-INF/ejb-jar.xml into the dataverse war file:
+
+ :command:`jar uvf $DATAVERSE-WAR-FILENAME WEB-INF/ejb-jar.xml`
+
+ Deploy the war file.

See also :ref:`saved-search` in the API Guide.

diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst
index 3cd469e3883..81872c7c92a 100644
--- a/doc/sphinx-guides/source/api/native-api.rst
+++ b/doc/sphinx-guides/source/api/native-api.rst
@@ -9,7 +9,7 @@ The Dataverse Software exposes most of its GUI functionality via a REST-based AP

.. _CORS: https://www.w3.org/TR/cors/

-.. warning:: The Dataverse Software's API is versioned at the URI - all API calls may include the version number like so: ``http://server-address/api/v1/...``. Omitting the ``v1`` part would default to the latest API version (currently 1). When writing scripts/applications that will be used for a long time, make sure to specify the API version, so they don't break when the API is upgraded.
+.. warning:: The Dataverse Software's API is versioned at the URI - all API calls may include the version number like so: :samp:`http://server-address/api/v1/...`. Omitting the :samp:`v1` part would default to the latest API version (currently 1). When writing scripts/applications that will be used for a long time, make sure to specify the API version, so they don't break when the API is upgraded.

.. contents:: |toctitle|
  :local:

@@ -30,7 +30,7 @@ The steps for creating a Dataverse collection are:
- Figure out the alias or database id of the "parent" Dataverse collection into which you will be creating your new Dataverse collection.
- Execute a curl command or equivalent.

-Download :download:`dataverse-complete.json <../_static/api/dataverse-complete.json>` file and modify it to suit your needs. The fields ``name``, ``alias``, and ``dataverseContacts`` are required. The controlled vocabulary for ``dataverseType`` is the following:
+Use the ``dataverse-complete.json`` file located at ``/_static/api/dataverse-complete.json`` and modify it to suit your needs. The fields ``name``, ``alias``, and ``dataverseContacts`` are required. The controlled vocabulary for ``dataverseType`` is the following:

- ``DEPARTMENT``
- ``JOURNALS``
@@ -227,7 +227,7 @@ The fully expanded example above (without environment variables) looks like this

  curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/facets --upload-file dataverse-facets.json

-Where :download:`dataverse-facets.json <../_static/api/dataverse-facets.json>` contains a JSON encoded list of metadata keys (e.g. ``["authorName","authorAffiliation"]``).
+Where ``dataverse-facets.json``, located at ``/_static/api/dataverse-facets.json``, contains a JSON encoded list of metadata keys (e.g. ``["authorName","authorAffiliation"]``).

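If you do not want to copy the file out of the source tree, one quick way to try this end to end is to create the list inline and post it with the same call shown above (the field names are simply the example values from this section):

.. code-block:: bash

  echo '["authorName","authorAffiliation"]' > dataverse-facets.json
  curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/dataverses/root/facets --upload-file dataverse-facets.json
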
List Metadata Block Facets Configured for a Dataverse Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -271,7 +271,7 @@ The fully expanded example above (without environment variables) looks like this

  curl -H X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST -H "Content-type:application/json" https://demo.dataverse.org/api/dataverses/root/metadatablockfacets --upload-file metadata-block-facets.json

-Where :download:`metadata-block-facets.json <../_static/api/metadata-block-facets.json>` contains a JSON encoded list of metadata block names (e.g. ``["socialscience","geospatial"]``). This endpoint supports an empty list (e.g. ``[]``)
+The ``metadata-block-facets.json`` file located at ``/_static/api/metadata-block-facets.json`` contains a JSON encoded list of metadata block names (e.g. ``["socialscience","geospatial"]``). This endpoint supports an empty list (e.g. ``[]``).

.. _metadata-block-facet-root-api:

@@ -319,8 +319,8 @@ Where ``roles.json`` looks like this::

    {
      "alias": "sys1",
-      "name": “Restricted System Role”,
-      "description": “A person who may only add datasets.”,
+      "name": "Restricted System Role",
+      "description": "A person who may only add datasets.",
      "permissions": [
        "AddDataset"
      ]
@@ -449,7 +449,7 @@ Define Metadata Blocks for a Dataverse Collection

You can define the metadata blocks available to authors within a Dataverse collection.

-The metadata blocks that are available with a default Dataverse installation are in :download:`define-metadatablocks.json <../_static/api/define-metadatablocks.json>` (also shown below) and you should download this file and edit it to meet your needs. Please note that the "citation" metadata block is required. You must have "EditDataverse" permission on the Dataverse collection.
+The metadata blocks that are available with a default Dataverse installation are in ``define-metadatablocks.json`` located at ``/_static/api/define-metadatablocks.json`` (also shown below) and you should copy this file and edit it to meet your needs. Please note that the "citation" metadata block is required. You must have "EditDataverse" permission on the Dataverse collection.

.. literalinclude:: ../_static/api/define-metadatablocks.json

@@ -526,7 +526,7 @@ To create a dataset, you must supply a JSON file that contains at least the foll
- Description Text
- Subject

-As a starting point, you can download :download:`dataset-finch1.json <../../../../scripts/search/tests/data/dataset-finch1.json>` and modify it to meet your needs. (:download:`dataset-finch1_fr.json <../../../../scripts/api/data/dataset-finch1_fr.json>` is a variant of this file that includes setting the metadata language (see :ref:`:MetadataLanguages`) to French (fr). In addition to this minimal example, you can download :download:`dataset-create-new-all-default-fields.json <../../../../scripts/api/data/dataset-create-new-all-default-fields.json>` which populates all of the metadata fields that ship with a Dataverse installation.)
+As a starting point, you can use ``dataset-finch1.json`` located at ``/scripts/search/tests/data/dataset-finch1.json`` and modify it to meet your needs. (``dataset-finch1_fr.json`` located at ``/scripts/api/data/dataset-finch1_fr.json`` is a variant of this file that includes setting the metadata language (see :ref:`:MetadataLanguages`) to French (fr). 
In addition to this minimal example, you can use ``dataset-create-new-all-default-fields.json`` located at ``/scripts/api/data/dataset-create-new-all-default-fields.json`` which populates all of the metadata fields that ship with a Dataverse installation.) The curl command below assumes you have kept the name "dataset-finch1.json" and that this file is in your current working directory. @@ -582,7 +582,7 @@ The optional ``release`` parameter tells the Dataverse installation to immediate The JSON format is the same as that supported by the native API's :ref:`create dataset command`, although it also allows packages. For example: -.. literalinclude:: ../../../../scripts/api/data/dataset-package-files.json +.. literalinclude:: ../_static/api/dataset-package-files.json Before calling the API, make sure the data files referenced by the ``POST``\ ed JSON are placed in the dataset directory with filenames matching their specified storage identifiers. In installations using POSIX storage, these files must be made readable by the app server user. @@ -621,9 +621,9 @@ The optional ``pid`` parameter holds a persistent identifier (such as a DOI or H The optional ``release`` parameter tells the Dataverse installation to immediately publish the dataset. If the parameter is changed to ``no``, the imported dataset will remain in ``DRAFT`` status. -The file is a DDI XML file. A sample DDI XML file may be downloaded here: :download:`ddi_dataset.xml <../_static/api/ddi_dataset.xml>` +The file is a DDI XML file. A sample DDI XML file ``ddi_dataset.xml`` is located at ``/_static/api/ddi_dataset.xml`` -Note that DDI XML does not have a field that corresponds to the "Subject" field in Dataverse. Therefore the "Import DDI" API endpoint populates the "Subject" field with ``N/A``. To update the "Subject" field one will need to call the :ref:`edit-dataset-metadata-api` API with a JSON file that contains an update to "Subject" such as :download:`subject-update-metadata.json <../_static/api/subject-update-metadata.json>`. Alternatively, the web interface can be used to add a subject. +Note that DDI XML does not have a field that corresponds to the "Subject" field in Dataverse. Therefore the "Import DDI" API endpoint populates the "Subject" field with ``N/A``. To update the "Subject" field one will need to call the :ref:`edit-dataset-metadata-api` API with a JSON file that contains an update to "Subject" such as ``subject-update-metadata.json`` located at ``/_static/api/subject-update-metadata.json``. Alternatively, the web interface can be used to add a subject. .. warning:: @@ -730,12 +730,13 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key:$API_TOKEN" https://demo.dataverse.org/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB -|CORS| Show the dataset whose database id is passed: + +|CORS| Show the dataset whose id is passed: .. code-block:: bash export SERVER_URL=https://demo.dataverse.org - export ID=24 + export ID=408730 curl $SERVER_URL/api/datasets/$ID @@ -743,7 +744,7 @@ The fully expanded example above (without environment variables) looks like this .. code-block:: bash - curl https://demo.dataverse.org/api/datasets/24 + curl https://demo.dataverse.org/api/datasets/408730 The dataset id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``). @@ -1033,7 +1034,7 @@ Updates the metadata for a dataset. 
If a draft of the dataset already exists, th You must download a JSON representation of the dataset, edit the JSON you download, and then send the updated JSON to the Dataverse installation. -For example, after making your edits, your JSON file might look like :download:`dataset-update-metadata.json <../_static/api/dataset-update-metadata.json>` which you would send to the Dataverse installation like this: +For example, after making your edits, your JSON file might look like ``dataset-update-metadata.json`` located at ``/_static/api/dataset-update-metadata.json`` which you would send to the Dataverse installation like this: .. code-block:: bash @@ -1108,7 +1109,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/editMetadata/?persistentId=doi:10.5072/FK2/BCCP9Z&replace=true --upload-file dataset-update-metadata.json -For these edits your JSON file need only include those dataset fields which you would like to edit. A sample JSON file may be downloaded here: :download:`dataset-edit-metadata-sample.json <../_static/api/dataset-edit-metadata-sample.json>` +For these edits your JSON file need only include those dataset fields which you would like to edit. A sample JSON file ``dataset-edit-metadata-sample.json`` can be found at ``/_static/api/dataset-edit-metadata-sample.json`` Delete Dataset Metadata ~~~~~~~~~~~~~~~~~~~~~~~ @@ -1129,7 +1130,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/datasets/:persistentId/deleteMetadata/?persistentId=doi:10.5072/FK2/BCCP9Z --upload-file dataset-delete-author-metadata.json -For these deletes your JSON file must include an exact match of those dataset fields which you would like to delete. A sample JSON file may be downloaded here: :download:`dataset-delete-author-metadata.json <../_static/api/dataset-delete-author-metadata.json>` +For these deletes your JSON file must include an exact match of those dataset fields which you would like to delete. A sample JSON file ``dataset-delete-author-metadata.json`` can be found at ``/_static/api/dataset-delete-author-metadata.json`` .. _publish-dataset-api: @@ -1512,38 +1513,6 @@ The fully expanded example above (without environment variables) looks like this curl -H X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X POST https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.5072/FK2/J8SJZB -F 'jsonData={"description":"A remote image.","storageIdentifier":"trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png","checksumType":"MD5","md5Hash":"509ef88afa907eaf2c17c1c8d8fde77e","label":"testlogo.png","fileName":"testlogo.png","mimeType":"image/png"}' -.. _cleanup-storage-api: - -Cleanup storage of a Dataset -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -This is an experimental feature and should be tested on your system before using it in production. -Also, make sure that your backups are up-to-date before using this on production servers. -It is advised to first call this method with the ``dryrun`` parameter set to ``true`` before actually deleting the files. -This will allow you to manually inspect the files that would be deleted if that parameter is set to ``false`` or is omitted (a list of the files that would be deleted is provided in the response). 
- -If your Dataverse installation has been configured to support direct uploads, or in some other situations, -you could end up with some files in the storage of a dataset that are not linked to that dataset directly. Most commonly, this could -happen when an upload fails in the middle of a transfer, i.e. if a user does a UI direct upload and leaves the page without hitting cancel or save, -Dataverse doesn't know and doesn't clean up the files. Similarly in the direct upload API, if the final /addFiles call isn't done, the files are abandoned. - -All the files stored in the Dataset storage location that are not in the file list of that Dataset (and follow the naming pattern of the dataset files) can be removed, as shown in the example below. - -.. code-block:: bash - - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export SERVER_URL=https://demo.dataverse.org - export PERSISTENT_ID=doi:10.5072/FK2/J8SJZB - export DRYRUN=true - - curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/cleanStorage?persistentId=$PERSISTENT_ID&dryrun=$DRYRUN" - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl -H X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -X GET https://demo.dataverse.org/api/datasets/:persistentId/cleanStorage?persistentId=doi:10.5072/FK2/J8SJZB&dryrun=true - Adding Files To a Dataset via Other Tools ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -1627,7 +1596,7 @@ Here's how curators can send a "reason for return" to the dataset authors. First .. literalinclude:: ../_static/api/reason-for-return.json -In the example below, the curator has saved the JSON file as :download:`reason-for-return.json <../_static/api/reason-for-return.json>` in their current working directory. Then, the curator sends this JSON file to the ``returnToAuthor`` API endpoint like this: +In the example below, the curator has saved the JSON file ``reason-for-return.json`` located at ``/_static/api/reason-for-return.json`` in their current working directory. Then, the curator sends this JSON file to the ``returnToAuthor`` API endpoint like this: .. code-block:: bash @@ -1725,7 +1694,7 @@ The API will output the list of locks, for example:: If the dataset is not locked (or if there is no lock of the requested type), the API will return an empty list. -The following API end point will lock a Dataset with a lock of specified type. Note that this requires “superuser” credentials: +The following API end point will lock a Dataset with a lock of specified type. Note that this requires "superuser" credentials: .. code-block:: bash @@ -1742,7 +1711,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/datasets/24/lock/Ingest -Use the following API to unlock the dataset, by deleting all the locks currently on the dataset. Note that this requires “superuser” credentials: +Use the following API to unlock the dataset, by deleting all the locks currently on the dataset. Note that this requires "superuser" credentials: .. code-block:: bash @@ -1758,7 +1727,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/24/locks -Or, to delete a lock of the type specified only. 
Note that this requires “superuser” credentials: +Or, to delete a lock of the type specified only. Note that this requires "superuser" credentials: .. code-block:: bash @@ -1782,7 +1751,7 @@ If the dataset is not locked (or if there is no lock of the specified type), the List Locks Across All Datasets ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Note that this API requires “superuser” credentials. You must supply the ``X-Dataverse-key`` header with the api token of an admin user (as in the example below). +Note that this API requires "superuser" credentials. You must supply the ``X-Dataverse-key`` header with the api token of an admin user (as in the example below). The output of this API is formatted identically to the API that lists the locks for a specific dataset, as in one of the examples above. @@ -2091,77 +2060,6 @@ The response is a JSON object described in the :doc:`/api/external-tools` sectio Files ----- -Get JSON Representation of a File -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. - -Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*: - -.. code-block:: bash - - export SERVER_URL=https://demo.dataverse.org - export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - curl -H "X-Dataverse-key:$API_TOKEN" $SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB - -You may get its draft version of an unpublished file if you pass an api token with view draft permissions: - -.. code-block:: bash - - export SERVER_URL=https://demo.dataverse.org - export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - curl -H "X-Dataverse-key:$API_TOKEN" $SERVER/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB - - -|CORS| Show the file whose id is passed: - -.. code-block:: bash - - export SERVER_URL=https://demo.dataverse.org - export ID=408730 - - curl $SERVER_URL/api/file/$ID - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl https://demo.dataverse.org/api/files/408730 - -You may get its draft version of an published file if you pass an api token with view draft permissions and use the draft path parameter: - -.. code-block:: bash - - export SERVER_URL=https://demo.dataverse.org - export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - - curl -H "X-Dataverse-key:$API_TOKEN" $SERVER/api/files/:persistentId/draft/?persistentId=$PERSISTENT_IDENTIFIER - -The fully expanded example above (without environment variables) looks like this: - -.. 
code-block:: bash - - curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" https://demo.dataverse.org/api/files/:persistentId/draft/?persistentId=doi:10.5072/FK2/J8SJZB - -The file id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``). - Adding Files ~~~~~~~~~~~~ @@ -2359,47 +2257,6 @@ Currently the following methods are used to detect file types: - The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. - The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``. -.. _extractNcml: - -Extract NcML -~~~~~~~~~~~~ - -As explained in the :ref:`netcdf-and-hdf5` section of the User Guide, when those file types are uploaded, an attempt is made to extract an NcML file from them and store it as an auxiliary file. - -This happens automatically but superusers can also manually trigger this NcML extraction process with the API endpoint below. - -Note that "true" will be returned if an NcML file was created. "false" will be returned if there was an error or if the NcML file already exists (check server.log for details). - -.. code-block:: bash - - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export SERVER_URL=https://demo.dataverse.org - export ID=24 - - curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$ID/extractNcml" - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/24/extractNcml - -A curl example using a PID: - -.. code-block:: bash - - export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx - export SERVER_URL=https://demo.dataverse.org - export PERSISTENT_ID=doi:10.5072/FK2/AAA000 - - curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/:persistentId/extractNcml?persistentId=$PERSISTENT_ID" - -The fully expanded example above (without environment variables) looks like this: - -.. code-block:: bash - - curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/extractNcml?persistentId=doi:10.5072/FK2/AAA000" - Replacing Files ~~~~~~~~~~~~~~~ @@ -2518,6 +2375,48 @@ The fully expanded example above (without environment variables) looks like this Note: The ``id`` returned in the json response is the id of the file metadata version. + +Adding File Metadata +~~~~~~~~~~~~~~~~~~~~ + +This API call requires a ``jsonString`` expressing the metadata of multiple files. It adds file metadata to the database table where the file has already been copied to the storage. + +The jsonData object includes values for: + +* "description" - A description of the file +* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset +* "storageIdentifier" - String +* "fileName" - String +* "mimeType" - String +* "fixity/checksum" either: + + * "md5Hash" - String with MD5 hash value, or + * "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings + +.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below. + +A curl example using an ``PERSISTENT_ID`` + +* ``SERVER_URL`` - e.g. 
https://demo.dataverse.org +* ``API_TOKEN`` - API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header. For more details, see the :doc:`auth` section. +* ``PERSISTENT_IDENTIFIER`` - Example: ``doi:10.5072/FK2/7U7YBV`` + +.. code-block:: bash + + export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx + export SERVER_URL=https://demo.dataverse.org + export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV + export JSON_DATA="[{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}, \ + {'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53', 'fileName':'file2.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123789'}}]" + + curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA" + +The fully expanded example above (without environment variables) looks like this: + +.. code-block:: bash + + curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST https://demo.dataverse.org/api/datasets/:persistentId/addFiles?persistentId=doi:10.5072/FK2/7U7YBV -F jsonData='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}, {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]' + Updating File Metadata ~~~~~~~~~~~~~~~~~~~~~~ @@ -2589,7 +2488,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT https://demo.dataverse.org/api/edit/24 --upload-file dct.xml -You can download :download:`dct.xml <../../../../src/test/resources/xml/dct.xml>` from the example above to see what the XML looks like. +You can use the ``dct.xml`` located at ``/src/test/resources/xml/dct.xml`` from the example above to see what the XML looks like. Provenance ~~~~~~~~~~ @@ -2742,7 +2641,7 @@ The fully expanded example above (without environment variables) looks like this curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/files/:persistentId/prov-freeform?persistentId=doi:10.5072/FK2/AAA000" -H "Content-type:application/json" --upload-file provenance.json -See a sample JSON file :download:`file-provenance.json <../_static/api/file-provenance.json>` from https://openprovenance.org (c.f. Huynh, Trung Dong and Moreau, Luc (2014) ProvStore: a public provenance repository. At 5th International Provenance and Annotation Workshop (IPAW'14), Cologne, Germany, 09-13 Jun 2014. pp. 275-277). +See a sample ``file-provenance.json`` located at ``/_static/api/file-provenance.json`` from http://openprovenance.org (c.f. Huynh, Trung Dong and Moreau, Luc (2014) ProvStore: a public provenance repository. 
At 5th International Provenance and Annotation Workshop (IPAW'14), Cologne, Germany, 09-13 Jun 2014. pp. 275-277). Delete Provenance JSON for an uploaded file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -2873,7 +2772,7 @@ Create a Builtin User For security reasons, builtin users cannot be created via API unless the team who runs the Dataverse installation has populated a database setting called ``BuiltinUsers.KEY``, which is described under :ref:`securing-your-installation` and :ref:`database-settings` sections of Configuration in the Installation Guide. You will need to know the value of ``BuiltinUsers.KEY`` before you can proceed. -To create a builtin user via API, you must first construct a JSON document. You can download :download:`user-add.json <../_static/api/user-add.json>` or copy the text below as a starting point and edit as necessary. +To create a builtin user via API, you must first construct a JSON document. You can use the ``user-add.json`` located at ``/_static/api/user-add.json`` or copy the text below as a starting point and edit as necessary. .. literalinclude:: ../_static/api/user-add.json @@ -2909,8 +2808,8 @@ Where ``roles.json`` looks like this:: { "alias": "sys1", - "name": “Restricted System Role”, - "description": “A person who may only add datasets.”, + "name": "Restricted System Role", + "description": "A person who may only add datasets.", "permissions": [ "AddDataset" ] @@ -3287,7 +3186,7 @@ Create a Harvesting Set To create a harvesting set you must supply a JSON file that contains the following fields: - Name: Alpha-numeric may also contain -, _, or %, but no spaces. Must also be unique in the installation. -- Definition: A search query to select the datasets to be harvested. For example, a query containing authorName:YYY would include all datasets where ‘YYY’ is the authorName. +- Definition: A search query to select the datasets to be harvested. For example, a query containing authorName:YYY would include all datasets where 'YYY' is the authorName. - Description: Text that describes the harvesting set. The description appears in the Manage Harvesting Sets dashboard and in API responses. This field is optional. An example JSON file would look like this:: @@ -3295,7 +3194,7 @@ An example JSON file would look like this:: { "name":"ffAuthor", "definition":"authorName:Finch, Fiona", - "description":"Fiona Finch’s Datasets" + "description":"Fiona Finch's Datasets" } .. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below. @@ -3320,7 +3219,7 @@ Modify an Existing Harvesting Set To modify a harvesting set, you must supply a JSON file that contains one or both of the following fields: -- Definition: A search query to select the datasets to be harvested. For example, a query containing authorName:YYY would include all datasets where ‘YYY’ is the authorName. +- Definition: A search query to select the datasets to be harvested. For example, a query containing authorName:YYY would include all datasets where 'YYY' is the authorName. - Description: Text that describes the harvesting set. The description appears in the Manage Harvesting Sets dashboard and in API responses. This field is optional. Note that you may not modify the name of an existing harvesting set. 
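As a rough sketch of the update call itself (the ``modify-set.json`` file name and the assumption that a set is addressed by its name under ``/api/harvest/sets`` are illustrative here, not taken from this section):

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org

  # "ffAuthor" is the set name from the example above; the endpoint path is an assumption
  curl -H "X-Dataverse-key:$API_TOKEN" -X PUT "$SERVER_URL/api/harvest/sets/ffAuthor" --upload-file modify-set.json
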
@@ -3329,7 +3228,7 @@ An example JSON file would look like this:: { "definition":"authorName:Finch, Fiona AND subject:trees", - "description":"Fiona Finch’s Datasets with subject of trees" + "description":"Fiona Finch's Datasets with subject of trees" } .. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below. @@ -3441,8 +3340,7 @@ The following optional fields are supported: - archiveDescription: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data." - set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything". - style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation). -- customHeaders: This can be used to configure this client with a specific HTTP header that will be added to every OAI request. This is to accommodate a use case where the remote server requires this header to supply some form of a token in order to offer some content not available to other clients. See the example below. Multiple headers can be supplied separated by `\\n` - actual "backslash" and "n" characters, not a single "new line" character. - + Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored. For example, as of writing this there is no way to configure a harvesting schedule via this API. An example JSON file would look like this:: @@ -3454,7 +3352,6 @@ An example JSON file would look like this:: "archiveUrl": "https://zenodo.org", "archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.", "metadataFormat": "oai_dc", - "customHeaders": "x-oai-api-key: xxxyyyzzz", "set": "user-lmops" } @@ -4329,7 +4226,7 @@ View the details of the standard license with the database ID specified in ``$ID curl $SERVER_URL/api/licenses/$ID -Superusers can add a new license by posting a JSON file adapted from this example :download:`add-license.json <../_static/api/add-license.json>`. The ``name`` and ``uri`` of the new license must be unique. Sort order field is mandatory. If you are interested in adding a Creative Commons license, you are encouarged to use the JSON files under :ref:`adding-creative-commons-licenses`: +Superusers can add a new license by posting a JSON file adapted from this example ``add-license.json`` file located at ``/_static/api/add-license.json``. The ``name`` and ``uri`` of the new license must be unique. If you are interested in adding a Creative Commons license, you are encouarged to use the JSON files under :ref:`adding-creative-commons-licenses`: .. code-block:: bash diff --git a/doc/sphinx-guides/source/container/base-image.rst b/doc/sphinx-guides/source/container/base-image.rst index 931c722f91b..1cca8e56072 100644 --- a/doc/sphinx-guides/source/container/base-image.rst +++ b/doc/sphinx-guides/source/container/base-image.rst @@ -10,7 +10,7 @@ at this layer, to make the application image focus on the app itself. **NOTE: The base image does not contain the Dataverse application itself.** -Within the main repository, you may find the base image's files at ``/modules/container-base``. +Within the main repository, you may find the base image's files at `/modules/container-base`. 
This Maven module uses the `Maven Docker Plugin `_ to build and ship the image. You may use, extend, or alter this image to your liking and/or host in some different registry if you want to. @@ -27,9 +27,9 @@ Development and maintenance of the `image's code `__) -- The ``stable`` tag corresponds to the ``master`` branch, where releases are cut from. +- The `stable` tag corresponds to the `master` branch, where releases are cut from. (`Dockerfile `__) @@ -41,7 +41,7 @@ The base image provides: - `Eclipse Temurin JRE using Java 11 `_ - `Payara Community Application Server `_ -- CLI tools necessary to run Dataverse (i. e. ``curl`` or ``jq`` - see also :doc:`../installation/prerequisites` in Installation Guide) +- CLI tools necessary to run Dataverse (i. e. `curl` or `jq` - see also :doc:`../installation/prerequisites` in Installation Guide) - Linux tools for analysis, monitoring and so on - `Jattach `__ (attach to running JVM) - `wait-for `__ (tool to "wait for" a service to be available) @@ -63,27 +63,27 @@ Assuming you have `Docker `_, `Docker D Simply execute the Maven modules packaging target with activated "container profile. Either from the projects Git root: -``mvn -Pct -f modules/container-base install`` +`mvn -Pct -f modules/container-base install` Or move to the module and execute: -``cd modules/container-base && mvn -Pct install`` +`cd modules/container-base && mvn -Pct install` Some additional notes, using Maven parameters to change the build and use ...: -- | ... a different tag only: add ``-Dbase.image.tag=tag``. - | *Note:* default is ``develop`` -- | ... a different image name and tag: add ``-Dbase.image=name:tag``. - | *Note:* default is ``gdcc/base:${base.image.tag}`` -- ... a different image registry than Docker Hub: add ``-Ddocker.registry=registry.example.org`` (see also +- | ... a different tag only: add `-Dbase.image.tag=tag`. + | *Note:* default is `develop` +- | ... a different image name and tag: add `-Dbase.image=name:tag`. + | *Note:* default is `gdcc/base:${base.image.tag}` +- ... a different image registry than Docker Hub: add `-Ddocker.registry=registry.example.org` (see also `DMP docs on registries `__) -- ... a different Payara version: add ``-Dpayara.version=V.YYYY.R``. -- | ... a different Temurin JRE version ``A``: add ``-Dtarget.java.version=A`` (i.e. ``11``, ``17``, ...). - | *Note:* must resolve to an available image tag ``A-jre`` of Eclipse Temurin! +- ... a different Payara version: add `-Dpayara.version=V.YYYY.R`. +- | ... a different Temurin JRE version `A`: add `-Dtarget.java.version=A` (i.e. `11`, `17`, ...). + | *Note:* must resolve to an available image tag `A-jre` of Eclipse Temurin! (See also `Docker Hub search example `_) -- ... a different Java Distribution: add ``-Djava.image="name:tag"`` with precise reference to an +- ... a different Java Distribution: add `-Djava.image="name:tag"` with precise reference to an image available local or remote. -- ... a different UID/GID for the ``payara`` user/group: add ``-Dbase.image.uid=1234`` (or ``.gid``) +- ... a different UID/GID for the `payara` user/group: add `-Dbase.image.uid=1234` (or `.gid`) Automated Builds & Publishing ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -103,14 +103,14 @@ Processor Architecture and Multiarch This image is created as a "multi-arch image", supporting the most common architectures Dataverse usually runs on: AMD64 (Windows/Linux/...) and ARM64 (Apple M1/M2), by using Maven Docker Plugin's *BuildX* mode. 
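If you want to check which architectures a published tag actually contains, one option is to inspect its manifest; the image name and tag below are just the defaults mentioned above, so adjust them to whatever you built or pulled:

.. code-block:: bash

  # lists the platforms (e.g. linux/amd64, linux/arm64) bundled in the multi-arch manifest
  docker manifest inspect gdcc/base:develop
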
-Building the image via ``mvn -Pct package`` or ``mvn -Pct install`` as above will only build for the architecture of +Building the image via `mvn -Pct package` or `mvn -Pct install` as above will only build for the architecture of the Docker maschine's CPU. -Only ``mvn -Pct deploy`` will trigger building on all enabled architectures. +Only `mvn -Pct deploy` will trigger building on all enabled architectures. Yet, to enable building with non-native code on your build machine, you will need to setup a cross-platform builder. On Linux, you should install `qemu-user-static `__ (preferably via -your package management) on the host and run ``docker run --rm --privileged multiarch/qemu-user-static --reset -p yes`` +your package management) on the host and run `docker run --rm --privileged multiarch/qemu-user-static --reset -p yes` to enable that builder. The Docker plugin will setup everything else for you. @@ -124,104 +124,16 @@ Many settings have been carefully selected for best performance and stability of As with any service, you should always monitor any metrics and make use of the tuning capabilities the base image provides. These are mostly based on environment variables (very common with containers) and provide sane defaults. -.. list-table:: - :align: left - :width: 100 - :widths: 10 10 10 50 +.. csv-table:: + :class: longtable :header-rows: 1 + :delim: tab + :file: ../_static/container/tunables.tsv + :widths: 35, 15, 15, 35 - * - Env. variable - - Default - - Type - - Description - * - ``DEPLOY_PROPS`` - - (empty) - - String - - Set to add arguments to generated `asadmin deploy` commands. - * - ``PREBOOT_COMMANDS`` - - [preboot]_ - - Abs. path - - Provide path to file with ``asadmin`` commands to run **before** boot of application server. - See also `Pre/postboot script docs`_. - * - ``POSTBOOT_COMMANDS`` - - [postboot]_ - - Abs. path - - Provide path to file with ``asadmin`` commands to run **after** boot of application server. - See also `Pre/postboot script docs`_. - * - ``JVM_ARGS`` - - (empty) - - String - - Additional arguments to pass to application server's JVM on start. - * - ``MEM_MAX_RAM_PERCENTAGE`` - - ``70.0`` - - Percentage - - Maximum amount of container's allocated RAM to be used as heap space. - Make sure to leave some room for native memory, OS overhead etc! - * - ``MEM_XSS`` - - ``512k`` - - Size - - Tune the maximum JVM stack size. - * - ``MEM_MIN_HEAP_FREE_RATIO`` - - ``20`` - - Integer - - Make the heap shrink aggressively and grow conservatively. See also `run-java-sh recommendations`_. - * - ``MEM_MAX_HEAP_FREE_RATIO`` - - ``40`` - - Integer - - Make the heap shrink aggressively and grow conservatively. See also `run-java-sh recommendations`_. - * - ``MEM_MAX_GC_PAUSE_MILLIS`` - - ``500`` - - Milliseconds - - Shorter pause times might result in lots of collections causing overhead without much gain. - This needs monitoring and tuning. It's a complex matter. - * - ``MEM_METASPACE_SIZE`` - - ``256m`` - - Size - - Initial size of memory reserved for class metadata, also used as trigger to run a garbage collection - once passing this size. - * - ``MEM_MAX_METASPACE_SIZE`` - - ``2g`` - - Size - - The metaspace's size will not outgrow this limit. - * - ``ENABLE_DUMPS`` - - ``0`` - - Bool, ``0|1`` - - If enabled, the argument(s) given in ``JVM_DUMP_ARG`` will be added to the JVM starting up. - This means it will enable dumping the heap to ``${DUMPS_DIR}`` (see below) in "out of memory" cases. 
- (You should back this location with disk space / ramdisk, so it does not write into an overlay filesystem!) - * - ``JVM_DUMPS_ARG`` - - [dump-option]_ - - String - - Can be fine tuned for more grained controls of dumping behaviour. - * - ``ENABLE_JMX`` - - ``0`` - - Bool, ``0|1`` - - Allow insecure JMX connections, enable AMX and tune all JMX monitoring levels to ``HIGH``. - See also `Payara Docs - Basic Monitoring `_. - A basic JMX service is enabled by default in Payara, exposing basic JVM MBeans, but especially no Payara MBeans. - * - ``ENABLE_JDWP`` - - ``0`` - - Bool, ``0|1`` - - Enable the "Java Debug Wire Protocol" to attach a remote debugger to the JVM in this container. - Listens on port 9009 when enabled. Search the internet for numerous tutorials to use it. - * - ``ENABLE_RELOAD`` - - ``0`` - - Bool, ``0|1`` - - Enable the dynamic "hot" reloads of files when changed in a deployment. Useful for development, - when new artifacts are copied into the running domain. - * - ``DATAVERSE_HTTP_TIMEOUT`` - - ``900`` - - Seconds - - See :ref:`:ApplicationServerSettings` ``http.request-timeout-seconds``. - - *Note:* can also be set using any other `MicroProfile Config Sources`_ available via ``dataverse.http.timeout``. - - -.. [preboot] ``${CONFIG_DIR}/pre-boot-commands.asadmin`` -.. [postboot] ``${CONFIG_DIR}/post-boot-commands.asadmin`` -.. [dump-option] ``-XX:+HeapDumpOnOutOfMemoryError`` - - +.. [preboot] `${CONFIG_DIR}/pre-boot-commands.asadmin` +.. [postboot] `${CONFIG_DIR}/post-boot-commands.asadmin` +.. [dump-option] `-XX:+HeapDumpOnOutOfMemoryError` Locations +++++++++ @@ -240,32 +152,31 @@ building upon it. You can also use these for references in scripts, etc. .. list-table:: :align: left - :width: 100 - :widths: 10 10 50 + :widths: 25 25 50 :header-rows: 1 * - Env. variable - Value - Description - * - ``HOME_DIR`` - - ``/opt/payara`` + * - `HOME_DIR` + - `/opt/payara` - Home base to Payara and the application - * - ``PAYARA_DIR`` - - ``${HOME_DIR}/appserver`` + * - `PAYARA_DIR` + - `${HOME_DIR}/appserver` - Installation directory of Payara server - * - ``SCRIPT_DIR`` - - ``${HOME_DIR}/scripts`` + * - `SCRIPT_DIR` + - `${HOME_DIR}/scripts` - Any scripts like the container entrypoint, init scripts, etc - * - ``CONFIG_DIR`` - - ``${HOME_DIR}/config`` + * - `CONFIG_DIR` + - `${HOME_DIR}/config` - Payara Server configurations like pre/postboot command files go here (Might be reused for Dataverse one day) - * - ``DEPLOY_DIR`` - - ``${HOME_DIR}/deployments`` + * - `DEPLOY_DIR` + - `${HOME_DIR}/deployments` - Any EAR or WAR file, exploded WAR directory etc are autodeployed on start - * - ``DOMAIN_DIR`` - - ``${PAYARA_DIR}/glassfish`` ``/domains/${DOMAIN_NAME}`` - - Path to root of the Payara domain applications will be deployed into. Usually ``${DOMAIN_NAME}`` will be ``domain1``. + * - `DOMAIN_DIR` + - `${PAYARA_DIR}/glassfish` `/domains/${DOMAIN_NAME}` + - Path to root of the Payara domain applications will be deployed into. Usually `${DOMAIN_NAME}` will be `domain1`. **Writeable at runtime:** @@ -287,18 +198,18 @@ named Docker volume in these places to avoid data loss, gain performance and/or * - Env. variable - Value - Description - * - ``STORAGE_DIR`` - - ``/dv`` + * - `STORAGE_DIR` + - `/dv` - This place is writeable by the Payara user, making it usable as a place to store research data, customizations or other. Images inheriting the base image should create distinct folders here, backed by different mounted volumes. 
- * - ``SECRETS_DIR`` - - ``/secrets`` + * - `SECRETS_DIR` + - `/secrets` - Mount secrets or other here, being picked up automatically by `Directory Config Source `_. See also various :doc:`../installation/config` options involving secrets. - * - ``DUMPS_DIR`` - - ``/dumps`` + * - `DUMPS_DIR` + - `/dumps` - Default location where heap dumps will be stored (see above). You should mount some storage here (disk or ephemeral). @@ -311,7 +222,7 @@ The default ports that are exposed by this image are: - 8080 - HTTP listener - 4848 - Admin Service HTTPS listener - 8686 - JMX listener -- 9009 - "Java Debug Wire Protocol" port (when ``ENABLE_JDWP=1``) +- 9009 - "Java Debug Wire Protocol" port (when `ENABLE_JDWP=1`) The HTTPS listener (on port 8181) becomes deactivated during the build, as we will always need to reverse-proxy the application server and handle SSL/TLS termination at this point. Save the memory and some CPU cycles! @@ -325,21 +236,21 @@ Entry & Extension Points The entrypoint shell script provided by this base image will by default ensure to: -- Run any scripts named ``${SCRIPT_DIR}/init_*`` or in ``${SCRIPT_DIR}/init.d/*`` directory for initialization +- Run any scripts named `${SCRIPT_DIR}/init_*` or in `${SCRIPT_DIR}/init.d/*` directory for initialization **before** the application server starts. -- Run an executable script ``${SCRIPT_DIR}/startInBackground.sh`` in the background - if present. -- Run the application server startup scripting in foreground (``${SCRIPT_DIR}/startInForeground.sh``). +- Run an executable script `${SCRIPT_DIR}/startInBackground.sh` in the background - if present. +- Run the application server startup scripting in foreground (`${SCRIPT_DIR}/startInForeground.sh`). If you need to create some scripting that runs in parallel under supervision of `dumb-init `_, e.g. to wait for the application to deploy before executing something, this is your point of extension: simply provide -the ``${SCRIPT_DIR}/startInBackground.sh`` executable script with your application image. +the `${SCRIPT_DIR}/startInBackground.sh` executable script with your application image. Other Hints +++++++++++ -By default, ``domain1`` is enabled to use the ``G1GC`` garbage collector. +By default, `domain1` is enabled to use the `G1GC` garbage collector. For running a Java application within a Linux based container, the support for CGroups is essential. It has been included and activated by default since Java 8u192, Java 11 LTS and later. If you are interested in more details, diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index 0a3dd23ed23..6bf62c4aa68 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -191,7 +191,7 @@ How a DCM reports checksum success or failure to your Dataverse Installation Once the user uploads files to a DCM, that DCM will perform checksum validation and report to your Dataverse installation the results of that validation. The DCM must be configured to pass the API token of a superuser. The implementation details, which are subject to change, are below. 
-The JSON that a DCM sends to your Dataverse installation on successful checksum validation looks something like the contents of :download:`checksumValidationSuccess.json <../_static/installation/files/root/big-data-support/checksumValidationSuccess.json>` below: +The JSON that a DCM sends to your Dataverse installation on successful checksum validation looks something like the contents of ``checksumValidationSuccess.json`` located at ``/_static/installation/files/root/big-data-support/checksumValidationSuccess.json`` below: .. literalinclude:: ../_static/installation/files/root/big-data-support/checksumValidationSuccess.json :language: json @@ -214,16 +214,16 @@ See instructions at https://github.com/sbgrid/data-capture-module/blob/master/do Add Dataverse Installation settings to use mock (same as using DCM, noted above): -- ``curl http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl -X PUT -d "http://localhost:5000"`` -- ``curl http://localhost:8080/api/admin/settings/:UploadMethods -X PUT -d "dcm/rsync+ssh"`` +- :command:`curl http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl -X PUT -d "http://localhost:5000"` +- :command:`curl http://localhost:8080/api/admin/settings/:UploadMethods -X PUT -d "dcm/rsync+ssh"` At this point you should be able to download a placeholder rsync script. Your Dataverse installation is then waiting for news from the DCM about if checksum validation has succeeded or not. First, you have to put files in place, which is usually the job of the DCM. You should substitute "X1METO" for the "identifier" of the dataset you create. You must also use the proper path for where you store files in your dev environment. -- ``mkdir /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO`` -- ``mkdir /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO`` -- ``cd /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO`` -- ``echo "hello" > file1.txt`` -- ``shasum file1.txt > files.sha`` +- :command:`mkdir /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO` +- :command:`mkdir /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO` +- :command:`cd /usr/local/payara5/glassfish/domains/domain1/files/10.5072/FK2/X1METO/X1METO` +- :command:`echo "hello" > file1.txt` +- :command:`shasum file1.txt > files.sha` @@ -265,11 +265,11 @@ Optional steps for setting up the S3 Docker DCM Variant - Set S3 as the storage driver - - ``cd /opt/payara5/bin/`` - - ``./asadmin delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"`` - - ``./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"`` - - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"`` - - ``./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"`` + - :command:`cd /opt/payara5/bin/` + - :command:`./asadmin delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"` + - :command:`./asadmin create-jvm-options "\-Ddataverse.files.storage-driver-id=s3"` + - :command:`./asadmin create-jvm-options "\-Ddataverse.files.s3.type=s3"` + - :command:`./asadmin create-jvm-options "\-Ddataverse.files.s3.label=s3"` - Add AWS bucket info to your Dataverse installation @@ -288,13 +288,13 @@ Optional steps for setting up the S3 Docker DCM Variant - S3 bucket for your Dataverse installation - - ``/usr/local/payara5/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=iqsstestdcmbucket"`` + - :command:`/usr/local/payara5/glassfish/bin/asadmin create-jvm-options 
"-Ddataverse.files.s3.bucket-name=iqsstestdcmbucket"` - S3 bucket for DCM (as your Dataverse installation needs to do the copy over) - - ``/usr/local/payara5/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.dcm-s3-bucket-name=test-dcm"`` + - :command:`/usr/local/payara5/glassfish/bin/asadmin create-jvm-options "-Ddataverse.files.dcm-s3-bucket-name=test-dcm"` - - Set download method to be HTTP, as DCM downloads through S3 are over this protocol ``curl -X PUT "http://localhost:8080/api/admin/settings/:DownloadMethods" -d "native/http"`` + - Set download method to be HTTP, as DCM downloads through S3 are over this protocol :command:`curl -X PUT "http://localhost:8080/api/admin/settings/:DownloadMethods" -d "native/http"` Using the DCM Docker Containers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -303,31 +303,31 @@ For using these commands, you will need to connect to the shell prompt inside va - Create a dataset and download rsync upload script - - connect to client container: ``docker exec -it dcm_client bash`` - - create dataset: ``cd /mnt ; ./create.bash`` ; this will echo the database ID to stdout - - download transfer script: ``./get_transfer.bash $database_id_from_create_script`` - - execute the transfer script: ``bash ./upload-${database_id_from-create_script}.bash`` , and follow instructions from script. + - connect to client container: :command:`docker exec -it dcm_client bash` + - create dataset: :command:`cd /mnt ; ./create.bash` ; this will echo the database ID to stdout + - download transfer script: :command:`./get_transfer.bash $database_id_from_create_script` + - execute the transfer script: :command:`bash ./upload-${database_id_from-create_script}.bash` , and follow instructions from script. - Run script - - e.g. ``bash ./upload-3.bash`` (``3`` being the database id from earlier commands in this example). + - e.g. :command:`bash ./upload-3.bash` (:samp:`3` being the database id from earlier commands in this example). - Manually run post upload script on dcmsrv - - for posix implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash`` - - for S3 implementation: ``docker exec -it dcmsrv /opt/dcm/scn/post_upload_s3.bash`` + - for posix implementation: :command:`docker exec -it dcmsrv /opt/dcm/scn/post_upload.bash` + - for S3 implementation: :command:`docker exec -it dcmsrv /opt/dcm/scn/post_upload_s3.bash` Additional DCM docker development tips ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - You can completely blow away all the docker images with these commands (including non DCM ones!) - - ``docker-compose -f docmer-compose.yml down -v`` + - :command:`docker-compose -f docmer-compose.yml down -v` - There are a few logs to tail - - dvsrv : ``tail -n 2000 -f /opt/payara5/glassfish/domains/domain1/logs/server.log`` - - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/breakage.log`` - - dcmsrv : ``tail -n 2000 -f /var/log/lighttpd/access.log`` + - dvsrv : :command:`tail -n 2000 -f /opt/payara5/glassfish/domains/domain1/logs/server.log` + - dcmsrv : :command:`tail -n 2000 -f /var/log/lighttpd/breakage.log` + - dcmsrv : :command:`tail -n 2000 -f /var/log/lighttpd/access.log` - You may have to restart the app server domain occasionally to deal with memory filling up. If deployment is getting reallllllly slow, its a good time. 
@@ -343,8 +343,8 @@ Using the RSAL Docker Containers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - Create a dataset (either with the procedure mentioned in DCM Docker Containers, or another process) -- Publish the dataset (from the client container): ``cd /mnt; ./publish_major.bash ${database_id}`` -- Run the RSAL component of the workflow (from the host): ``docker exec -it rsalsrv /opt/rsal/scn/pub.py`` +- Publish the dataset (from the client container): :command:`cd /mnt; ./publish_major.bash ${database_id}` +- Run the RSAL component of the workflow (from the host): :command:`docker exec -it rsalsrv /opt/rsal/scn/pub.py` - If desired, from the client container you can download the dataset following the instructions in the dataset access section of the dataset page. Configuring the RSAL Mock @@ -356,7 +356,7 @@ Also, to configure your Dataverse installation to use the new workflow you must 1. Configure the RSAL URL: -``curl -X PUT -d 'http://:5050' http://localhost:8080/api/admin/settings/:RepositoryStorageAbstractionLayerUrl`` +:command:`curl -X PUT -d 'http://:5050' http://localhost:8080/api/admin/settings/:RepositoryStorageAbstractionLayerUrl` 2. Update workflow json with correct URL information: @@ -364,63 +364,63 @@ Edit internal-httpSR-workflow.json and replace url and rollbackUrl to be the url 3. Create the workflow: -``curl http://localhost:8080/api/admin/workflows -X POST --data-binary @internal-httpSR-workflow.json -H "Content-type: application/json"`` +:command:`curl http://localhost:8080/api/admin/workflows -X POST --data-binary @internal-httpSR-workflow.json -H "Content-type: application/json"` 4. List available workflows: -``curl http://localhost:8080/api/admin/workflows`` +:command:`curl http://localhost:8080/api/admin/workflows` 5. Set the workflow (id) as the default workflow for the appropriate trigger: -``curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset -X PUT -d 2`` +:command:`curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset -X PUT -d 2` 6. Check that the trigger has the appropriate default workflow set: -``curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset`` +:command:`curl http://localhost:8080/api/admin/workflows/default/PrePublishDataset` 7. Add RSAL to whitelist 8. When finished testing, unset the workflow: -``curl -X DELETE http://localhost:8080/api/admin/workflows/default/PrePublishDataset`` +:command:`curl -X DELETE http://localhost:8080/api/admin/workflows/default/PrePublishDataset` Configuring download via rsync ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to see the rsync URLs, you must run this command: -``curl -X PUT -d 'rsal/rsync' http://localhost:8080/api/admin/settings/:DownloadMethods`` +:command:`curl -X PUT -d 'rsal/rsync' http://localhost:8080/api/admin/settings/:DownloadMethods` .. TODO: Document these in the Installation Guide once they're final. To specify replication sites that appear in rsync URLs: -Download :download:`add-storage-site.json <../../../../scripts/api/data/storageSites/add-storage-site.json>` and adjust it to meet your needs. The file should look something like this: +Adjust the ``add-storage-site.json`` located under ``/scripts/api/data/storageSites/add-storage-site.json`` to meet your needs. The file should look something like this: -.. literalinclude:: ../../../../scripts/api/data/storageSites/add-storage-site.json +.. 
literalinclude:: ../_static/api/add-storage-site.json Then add the storage site using curl: -``curl -H "Content-type:application/json" -X POST http://localhost:8080/api/admin/storageSites --upload-file add-storage-site.json`` +:command:`curl -H "Content-type:application/json" -X POST http://localhost:8080/api/admin/storageSites --upload-file add-storage-site.json` You make a storage site the primary site by passing "true". Pass "false" to make it not the primary site. (id "1" in the example): -``curl -X PUT -d true http://localhost:8080/api/admin/storageSites/1/primaryStorage`` +:command:`curl -X PUT -d true http://localhost:8080/api/admin/storageSites/1/primaryStorage` You can delete a storage site like this (id "1" in the example): -``curl -X DELETE http://localhost:8080/api/admin/storageSites/1`` +:command:`curl -X DELETE http://localhost:8080/api/admin/storageSites/1` You can view a single storage site like this: (id "1" in the example): -``curl http://localhost:8080/api/admin/storageSites/1`` +:command:`curl http://localhost:8080/api/admin/storageSites/1` You can view all storage site like this: -``curl http://localhost:8080/api/admin/storageSites`` +:command:`curl http://localhost:8080/api/admin/storageSites` In the GUI, this is called "Local Access". It's where you can compute on files on your cluster. -``curl http://localhost:8080/api/admin/settings/:LocalDataAccessPath -X PUT -d "/programs/datagrid"`` +:command:`curl http://localhost:8080/api/admin/settings/:LocalDataAccessPath -X PUT -d "/programs/datagrid"` diff --git a/doc/sphinx-guides/source/developers/dev-environment.rst b/doc/sphinx-guides/source/developers/dev-environment.rst index 3801dbe76fc..40b7ac5a29d 100755 --- a/doc/sphinx-guides/source/developers/dev-environment.rst +++ b/doc/sphinx-guides/source/developers/dev-environment.rst @@ -197,7 +197,7 @@ Create a Python virtual environment, activate it, then install dependencies: The installer will try to connect to the SMTP server you tell it to use. If you haven't used the Docker Compose option for setting up the dependencies, or you don't have a mail server handy, you can run ``nc -l 25`` in another terminal and choose "localhost" (the default) to get past this check. -Finally, run the installer (see also :download:`README_python.txt <../../../../scripts/installer/README_python.txt>` if necessary): +Finally, run the installer (see also ``README_python.txt`` located at ``/scripts/installer/README_python.txt`` if necessary): ``python3 install.py`` diff --git a/doc/sphinx-guides/source/developers/make-data-count.rst b/doc/sphinx-guides/source/developers/make-data-count.rst index a3c0d10dc5e..5e9bc3e62f8 100644 --- a/doc/sphinx-guides/source/developers/make-data-count.rst +++ b/doc/sphinx-guides/source/developers/make-data-count.rst @@ -30,13 +30,13 @@ Full Setup The recommended way to work on the Make Data Count feature is to spin up an EC2 instance that has both the Dataverse Software and Counter Processor installed. Go to the :doc:`deployment` page for details on how to spin up an EC2 instance and make sure that your Ansible file is configured to install Counter Processor before running the "create" script. -(Alternatively, you can try installing Counter Processor in Vagrant. :download:`setup-counter-processor.sh <../../../../scripts/vagrant/setup-counter-processor.sh>` might help you get it installed.) +(Alternatively, you can try installing Counter Processor in Vagrant. 
Using the ``setup-counter-processor.sh`` script located at ``/scripts/vagrant/setup-counter-processor.sh`` might help you get it installed.) After you have spun up your EC2 instance, set ``:MDCLogPath`` so that the Dataverse installation creates a log for Counter Processor to operate on. For more on this database setting, see the :doc:`/installation/config` section of the Installation Guide. Next you need to have the Dataverse installation add some entries to the log that Counter Processor will operate on. To do this, click on some published datasets and download some files. -Next you should run Counter Processor to convert the log into a SUSHI report, which is in JSON format. Before running Counter Processor, you need to put a configuration file into place. As a starting point use :download:`counter-processor-config.yaml <../../../../scripts/vagrant/counter-processor-config.yaml>` and edit the file, paying particular attention to the following settings: +Next you should run Counter Processor to convert the log into a SUSHI report, which is in JSON format. Before running Counter Processor, you need to put a configuration file into place. As a starting point use ``counter-processor-config.yaml`` located at ``/scripts/vagrant/counter-processor-config.yaml`` and edit the file, paying particular attention to the following settings: - ``log_name_pattern`` You might want something like ``/usr/local/payara5/glassfish/domains/domain1/logs/counter_(yyyy-mm-dd).log`` - ``year_month`` You should probably set this to the current month. diff --git a/doc/sphinx-guides/source/developers/testing.rst b/doc/sphinx-guides/source/developers/testing.rst index 4b3d5fd0a55..e8a2a860b37 100755 --- a/doc/sphinx-guides/source/developers/testing.rst +++ b/doc/sphinx-guides/source/developers/testing.rst @@ -5,16 +5,16 @@ Testing In order to keep our codebase healthy, the Dataverse Project encourages developers to write automated tests in the form of unit tests and integration tests. We also welcome ideas for how to improve our automated testing. .. contents:: |toctitle| - :local: + :local: The Health of a Codebase ------------------------ Before we dive into the nuts and bolts of testing, let's back up for a moment and think about why we write automated tests in the first place. Writing automated tests is an investment and leads to better quality software. Counterintuitively, writing tests and executing them regularly allows a project to move faster. Martin Fowler explains this well while talking about the health of a codebase: - "This is an economic judgment. Several times, many times, I run into teams that say something like, 'Oh well. Management isn't allowing us to do a quality job here because it will slow us down. And we've appealed to management and said we need to put more quality in the code, but they've said no, we need to go faster instead.' And my comment to that is well, as soon as you’re framing it in terms of code quality versus speed, you've lost. Because the whole point of refactoring is to go faster. + "This is an economic judgment. Several times, many times, I run into teams that say something like, 'Oh well. Management isn't allowing us to do a quality job here because it will slow us down. And we've appealed to management and said we need to put more quality in the code, but they've said no, we need to go faster instead.' And my comment to that is well, as soon as you're framing it in terms of code quality versus speed, you've lost. Because the whole point of refactoring is to go faster.
- "And this is why I quite like playing a bit more with the metaphor as the health of a codebase. If you keep yourself healthy then you'll be able to run faster. But if you just say, 'Well, I want to run a lot so I'm therefore going to run a whole load all the time and not eat properly and not pay attention about this shooting pain going up my leg,' then you’re not going to be able to run quickly very long. **You have to pay attention to your health. And same with the codebase. You have to continuously say, 'How do we keep it in a healthy state? Then we can go fast,' because we’re running marathons here with codebases. And if we neglect that internal quality of the codebase, it hits you surprisingly fast.**" + "And this is why I quite like playing a bit more with the metaphor as the health of a codebase. If you keep yourself healthy then you'll be able to run faster. But if you just say, 'Well, I want to run a lot so I'm therefore going to run a whole load all the time and not eat properly and not pay attention about this shooting pain going up my leg,' then you're not going to be able to run quickly very long. **You have to pay attention to your health. And same with the codebase. You have to continuously say, 'How do we keep it in a healthy state? Then we can go fast,' because we're running marathons here with codebases. And if we neglect that internal quality of the codebase, it hits you surprisingly fast.**" --Martin Fowler at https://devchat.tv/ruby-rogues/178-rr-book-club-refactoring-ruby-with-martin-fowler @@ -231,22 +231,22 @@ Before writing any new REST Assured tests, you should get the tests to pass in a You do not have to reinvent the wheel. There are many useful methods you can call in your own tests -- especially within UtilIT.java -- when you need your test to create and/or interact with generated accounts, files, datasets, etc. Similar methods can subsequently delete them to get them out of your way as desired before the test has concluded. -For example, if you’re testing your code’s operations with user accounts, the method ``UtilIT.createRandomUser();`` can generate an account for your test to work with. The same account can then be deleted by your program by calling the ``UtilIT.deleteUser();`` method on the imaginary friend your test generated. +For example, if you're testing your code's operations with user accounts, the method ``UtilIT.createRandomUser();`` can generate an account for your test to work with. The same account can then be deleted by your program by calling the ``UtilIT.deleteUser();`` method on the imaginary friend your test generated. -Remember, it’s only a test (and it's not graded)! Some guidelines to bear in mind: +Remember, it's only a test (and it's not graded)! Some guidelines to bear in mind: - Map out which logical functions you want to test -- Understand what’s being tested and ensure it’s repeatable +- Understand what's being tested and ensure it's repeatable - Assert the conditions of success / return values for each operation * A useful resource would be `HTTP status codes `_ - Let the code do the labor; automate everything that happens when you run your test file. -- Just as with any development, if you’re stuck: ask for help! +- Just as with any development, if you're stuck: ask for help! -To execute existing integration tests on your local Dataverse installation, a helpful command line tool to use is `Maven `_. 
You should have Maven installed as per the `Development Environment `_ guide, but if not it’s easily done via Homebrew: ``brew install maven``. +To execute existing integration tests on your local Dataverse installation, a helpful command line tool to use is `Maven `_. You should have Maven installed as per the `Development Environment `_ guide, but if not it's easily done via Homebrew: ``brew install maven``. Once installed, you may run commands with ``mvn [options] [] []``. -+ If you want to run just one particular API test, it’s as easy as you think: ++ If you want to run just one particular API test, it's as easy as you think: ``mvn test -Dtest=FileRecordJobIT`` @@ -258,7 +258,7 @@ Once installed, you may run commands with ``mvn [options] [] [` so that our automated testing knows about it. +If you are adding a new test class, be sure to add it to ``tests/integration-tests.txt`` located at ``/tests/integration-tests.txt`` so that our automated testing knows about it. Writing and Using a Testcontainers Test @@ -331,7 +331,7 @@ Note that we are running the following commands as the user "dataverse". In shor Add jacococli.jar to the WAR File ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As the "dataverse" user download :download:`instrument_war_jacoco.bash <../_static/util/instrument_war_jacoco.bash>` (or skip ahead to the "git clone" step to get the script that way) and give it two arguments: +As the "dataverse" user, use the ``instrument_war_jacoco.bash`` located at ``/_static/util/instrument_war_jacoco.bash`` (or skip ahead to the "git clone" step to get the script that way) and give it two arguments: - path to your pristine WAR file - path to the new WAR file the script will create with jacococli.jar in it @@ -391,12 +391,12 @@ Load/Performance Testing Locust ~~~~~~ -Load and performance testing is conducted on an as-needed basis but we're open to automating it. As of this writing Locust ( https://locust.io ) scripts at https://github.com/IQSS/dataverse-helper-scripts/tree/master/src/stress_tests have been used. +Load and performance testing is conducted on an as-needed basis but we're open to automating it. As of this writing Locust (https://locust.io) scripts at https://github.com/IQSS/dataverse-helper-scripts/tree/master/src/stress_tests have been used. download-files.sh script ~~~~~~~~~~~~~~~~~~~~~~~~ -One way of generating load is by downloading many files. You can download :download:`download-files.sh <../../../../tests/performance/download-files/download-files.sh>`, make it executable (``chmod 755``), and run it with ``--help``. You can use ``-b`` to specify the base URL of the Dataverse installation and ``-s`` to specify the number of seconds to wait between requests like this: +One way of generating load is by downloading many files. You can use ``download-files.sh`` located at ``/tests/performance/download-files/download-files.sh``, make it executable (``chmod 755``), and run it with ``--help``. 
You can use ``-b`` to specify the base URL of the Dataverse installation and ``-s`` to specify the number of seconds to wait between requests like this: ``./download-files.sh -b https://dev1.dataverse.org -s 2`` @@ -456,7 +456,7 @@ Accessibility Testing Accessibility Policy ~~~~~~~~~~~~~~~~~~~~ -The Dataverse Project aims to improve the user experience for those with disabilities, and are in the process of following the recommendations of the `Harvard University Digital Accessibility Policy `__, which use the Worldwide Web Consortium’s Web Content Accessibility Guidelines version 2.1, Level AA Conformance (WCAG 2.1 Level AA) as the standard. +The Dataverse Project aims to improve the user experience for those with disabilities, and are in the process of following the recommendations of the `Harvard University Digital Accessibility Policy `__, which use the Worldwide Web Consortium's Web Content Accessibility Guidelines version 2.1, Level AA Conformance (WCAG 2.1 Level AA) as the standard. To report an accessibility issue with the Dataverse Software, you can create a new issue in our GitHub repo at: https://github.com/IQSS/dataverse/issues/ diff --git a/doc/sphinx-guides/source/developers/troubleshooting.rst b/doc/sphinx-guides/source/developers/troubleshooting.rst index 832785f9860..7e142f4ef7d 100755 --- a/doc/sphinx-guides/source/developers/troubleshooting.rst +++ b/doc/sphinx-guides/source/developers/troubleshooting.rst @@ -5,7 +5,7 @@ Troubleshooting Over in the :doc:`dev-environment` section we described the "happy path" of when everything goes right as you set up your Dataverse Software development environment. Here are some common problems and solutions for when things go wrong. .. contents:: |toctitle| - :local: + :local: context-root in glassfish-web.xml Munged by Netbeans ---------------------------------------------------- @@ -65,7 +65,7 @@ mail.smtp.socketFactory.fallback false mail.smtp.socketFactory.class javax.net.ssl.SSLSocketFactory ====================================== ============================== -**\*WARNING**: Entering a password here will *not* conceal it on-screen. It’s recommended to use an *app password* (for smtp.gmail.com users) or utilize a dedicated/non-personal user account with SMTP server auths so that you do not risk compromising your password. +**\*WARNING**: Entering a password here will *not* conceal it on-screen. It's recommended to use an *app password* (for smtp.gmail.com users) or utilize a dedicated/non-personal user account with SMTP server auths so that you do not risk compromising your password. Save these changes at the top of the page and restart your app server to try it out. @@ -89,7 +89,7 @@ As another example, here is how to create a Mail Host via command line for Amazo Rebuilding Your Dev Environment ------------------------------- -A script called :download:`dev-rebuild.sh <../../../../scripts/dev/dev-rebuild.sh>` is available that does the following: +A script called ``dev-rebuild.sh`` located at ``/scripts/dev/dev-rebuild.sh`` is available that does the following: - Drops the database. - Clears our Solr. diff --git a/doc/sphinx-guides/source/developers/workflows.rst b/doc/sphinx-guides/source/developers/workflows.rst index 38ca6f4e141..3f52a7224b9 100644 --- a/doc/sphinx-guides/source/developers/workflows.rst +++ b/doc/sphinx-guides/source/developers/workflows.rst @@ -32,7 +32,7 @@ Administration A Dataverse installation stores a set of workflows in its database. 
Workflows can be managed using the ``api/admin/workflows/`` endpoints of the :doc:`/api/native-api`. Sample workflow files are available in ``scripts/api/data/workflows``. -At the moment, defining a workflow for each trigger is done for the entire instance, using the endpoint ``api/admin/workflows/default/«trigger type»``. +At the moment, defining a workflow for each trigger is done for the entire instance, using the endpoint ``api/admin/workflows/default/<trigger type>``. In order to prevent unauthorized resuming of workflows, the Dataverse installation maintains a "white list" of IP addresses from which resume requests are honored. This list is maintained using the ``/api/admin/workflows/ip-whitelist`` endpoint of the :doc:`/api/native-api`. By default, the Dataverse installation honors resume requests from localhost only (``127.0.0.1;::1``), so set-ups that use a single server work with no additional configuration. diff --git a/doc/sphinx-guides/source/installation/advanced.rst b/doc/sphinx-guides/source/installation/advanced.rst index 4f06ed37d01..5bc96a1c215 100644 --- a/doc/sphinx-guides/source/installation/advanced.rst +++ b/doc/sphinx-guides/source/installation/advanced.rst @@ -5,7 +5,7 @@ Advanced Installation Advanced installations are not officially supported but here we are at least documenting some tips and tricks that you might find helpful. You can find a diagram of an advanced installation in the :doc:`prep` section. .. contents:: |toctitle| - :local: + :local: Multiple App Servers -------------------- @@ -95,7 +95,7 @@ To install: tree. In the releases 5.0-5.9 it existed under the name ``ZipDownloadService-v1.0.0``. (A pre-built jar file was distributed under that name as part of the 5.0 release on GitHub. Aside from the name change, there have been no changes in the functionality of the tool). -2. Copy it, together with the shell script :download:`cgi-bin/zipdownload <../../../../scripts/zipdownload/cgi-bin/zipdownload>` +2. Copy it, together with the shell script ``cgi-bin/zipdownload`` located at ``/scripts/zipdownload/cgi-bin/zipdownload`` to the ``cgi-bin`` directory of the chosen Apache server (``/var/www/cgi-bin`` standard). 3. Make sure the shell script (``zipdownload``) is executable, and edit it to configure the database access credentials. Do note that the executable does not need access to the entire Dataverse installation database. A security-conscious diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst index d68a4eefc50..06e28b1a6e9 100644 --- a/doc/sphinx-guides/source/installation/config.rst +++ b/doc/sphinx-guides/source/installation/config.rst @@ -56,7 +56,7 @@ In this example, ``128.64.32.16`` is your remote address (that you should verify Once you are ready, enable the :ref:`JVM option `. Verify that the remote locations are properly tracked in your MDC metrics, and/or your IP groups are working. As a final test, if your Dataverse installation is allowing unrestricted localhost access to the admin API, imitate an attack in which a malicious request is pretending to be coming from ``127.0.0.1``. Try the following from a remote, insecure location: -``curl https://your.dataverse.edu/api/admin/settings --header "X-FORWARDED-FOR: 127.0.0.1"`` +:command:`curl https://your.dataverse.edu/api/admin/settings --header "X-FORWARDED-FOR: 127.0.0.1"` First of all, confirm that access is denied!
If you are in fact able to access the settings api from a location outside the proxy, **something is seriously wrong**, so please let us know, and stop using the JVM option. Otherwise check the access log entry for the header value. What you should see is something like ``"127.0.0.1, 128.64.32.16"``. Where the second address should be the real IP of your remote client. The fact that the "fake" ``127.0.0.1`` you sent over is present in the header is perfectly ok. This is the proper proxy behavior - it preserves any incoming values in the ``X-Forwarded-Header``, if supplied, and adds the detected incoming address to it, *on the right*. It is only this rightmost comma-separated value that Dataverse installation should ever be using. @@ -184,7 +184,7 @@ DataCite requires that you register for a test account, configured with your own and restart Payara. The prefix can be configured via the API (where it is referred to as "Authority"): -``curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority`` +:command:`curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority` EZID is available to University of California scholars and researchers. Testing can be done using the authority 10.5072 and shoulder FK2 with the "apitest" account (contact EZID for credentials) or an institutional account. Configuration in Dataverse is then analogous to using DataCite; @@ -1005,11 +1005,11 @@ The custom logo image file is expected to be small enough to fit comfortably in Given this location for the custom logo image file, run this curl command to add it to your settings: -``curl -X PUT -d '/logos/navbar/logo.png' http://localhost:8080/api/admin/settings/:LogoCustomizationFile`` +:command:`curl -X PUT -d '/logos/navbar/logo.png' http://localhost:8080/api/admin/settings/:LogoCustomizationFile` To revert to the default configuration and have the Dataverse Project icon be displayed, run the following command: -``curl -X DELETE http://localhost:8080/api/admin/settings/:LogoCustomizationFile`` +:command:`curl -X DELETE http://localhost:8080/api/admin/settings/:LogoCustomizationFile` About URL ######### @@ -1034,15 +1034,15 @@ Refer to :ref:`:SignUpUrl` and :ref:`conf-allow-signup` for setting a relative p Custom Header ^^^^^^^^^^^^^ -As a starting point you can download :download:`custom-header.html ` and place it at ``/var/www/dataverse/branding/custom-header.html``. +As a starting point you can copy the ``custom-header.html`` file located at ``/_static/installation/files/var/www/dataverse/branding/custom-header.html`` and place it at ``/var/www/dataverse/branding/custom-header.html``. Given this location for the custom header HTML file, run this curl command to add it to your settings: -``curl -X PUT -d '/var/www/dataverse/branding/custom-header.html' http://localhost:8080/api/admin/settings/:HeaderCustomizationFile`` +:command:`curl -X PUT -d '/var/www/dataverse/branding/custom-header.html' http://localhost:8080/api/admin/settings/:HeaderCustomizationFile` If you have enabled a custom header or navbar logo, you might prefer to disable the theme of the root dataverse. You can do so by setting ``:DisableRootDataverseTheme`` to ``true`` like this: -``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:DisableRootDataverseTheme`` +:command:`curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:DisableRootDataverseTheme` Please note: Disabling the display of the root Dataverse collection theme also disables your ability to edit it. 
Remember that Dataverse collection owners can set their Dataverse collections to "inherit theme" from the root. Those Dataverse collections will continue to inherit the root Dataverse collection theme (even though it no longer displays on the root). If you would like to edit the root Dataverse collection theme in the future, you will have to re-enable it first. @@ -1060,11 +1060,11 @@ Custom Homepage When you configure a custom homepage, it **replaces** the root Dataverse collection in the content block, serving as a welcome page. This allows for complete control over the look and feel of the content block for your installation's homepage. -As a starting point, download :download:`custom-homepage.html ` and place it at ``/var/www/dataverse/branding/custom-homepage.html``. +As a starting point, copy the ``custom-homepage.html`` located at ``/_static/installation/files/var/www/dataverse/branding/custom-homepage.html`` and place it at ``/var/www/dataverse/branding/custom-homepage.html``. Given this location for the custom homepage HTML file, run this curl command to add it to your settings: -``curl -X PUT -d '/var/www/dataverse/branding/custom-homepage.html' http://localhost:8080/api/admin/settings/:HomePageCustomizationFile`` +:command:`curl -X PUT -d '/var/www/dataverse/branding/custom-homepage.html' http://localhost:8080/api/admin/settings/:HomePageCustomizationFile` Note that the ``custom-homepage.html`` file provided has multiple elements that assume your root Dataverse collection still has an alias of "root". While you were branding your root Dataverse collection, you may have changed the alias to "harvard" or "librascholar" or whatever and you should adjust the custom homepage code as needed. @@ -1072,7 +1072,7 @@ Note: If you prefer to start with less of a blank slate, you can review the cust If you decide you'd like to remove this setting, use the following curl command: -``curl -X DELETE http://localhost:8080/api/admin/settings/:HomePageCustomizationFile`` +:command:`curl -X DELETE http://localhost:8080/api/admin/settings/:HomePageCustomizationFile` Footer Block ++++++++++++ @@ -1094,22 +1094,22 @@ Custom Footer As mentioned above, the custom footer appears below the default footer. -As a starting point, download :download:`custom-footer.html ` and place it at ``/var/www/dataverse/branding/custom-footer.html``. +As a starting point, copy the ``custom-footer.html`` located at ``/_static/installation/files/var/www/dataverse/branding/custom-footer.html`` and place it at ``/var/www/dataverse/branding/custom-footer.html``. Given this location for the custom footer HTML file, run this curl command to add it to your settings: -``curl -X PUT -d '/var/www/dataverse/branding/custom-footer.html' http://localhost:8080/api/admin/settings/:FooterCustomizationFile`` +:command:`curl -X PUT -d '/var/www/dataverse/branding/custom-footer.html' http://localhost:8080/api/admin/settings/:FooterCustomizationFile` Custom Stylesheet +++++++++++++++++ You can style your custom homepage, footer, and header content with a custom CSS file. With advanced CSS know-how, you can achieve custom branding and page layouts by utilizing ``position``, ``padding`` or ``margin`` properties. -As a starting point, download :download:`custom-stylesheet.css ` and place it at ``/var/www/dataverse/branding/custom-stylesheet.css``. 
+As a starting point, copy the ``custom-stylesheet.css`` located at ``/_static/installation/files/var/www/dataverse/branding/custom-stylesheet.css`` and place it at ``/var/www/dataverse/branding/custom-stylesheet.css``. Given this location for the custom CSS file, run this curl command to add it to your settings: -``curl -X PUT -d '/var/www/dataverse/branding/custom-stylesheet.css' http://localhost:8080/api/admin/settings/:StyleCustomizationFile`` +:command:`curl -X PUT -d '/var/www/dataverse/branding/custom-stylesheet.css' http://localhost:8080/api/admin/settings/:StyleCustomizationFile` .. _i18n: @@ -1160,42 +1160,42 @@ The Dataverse Software provides and API endpoint for adding languages using a zi First, clone the "dataverse-language-packs" git repo. -``git clone https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs.git`` +:command:`git clone https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs.git` Take a look at https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs/branches to see if the version of the Dataverse Software you're running has translations. Change to the directory for the git repo you just cloned. -``cd dataverse-language-packs`` +:command:`cd dataverse-language-packs` -Switch (``git checkout``) to the branch based on the Dataverse Software version you are running. The branch "dataverse-v4.13" is used in the example below. +Switch (:command:`git checkout`) to the branch based on the Dataverse Software version you are running. The branch "dataverse-v4.13" is used in the example below. -``export BRANCH_NAME=dataverse-v4.13`` +:command:`export BRANCH_NAME=dataverse-v4.13` -``git checkout $BRANCH_NAME`` +:command:`git checkout $BRANCH_NAME` Create a "languages" directory in "/tmp". -``mkdir /tmp/languages`` +:command:`mkdir /tmp/languages` Copy the properties files into the "languages" directory -``cp -R en_US/*.properties /tmp/languages`` +:command:`cp -R en_US/*.properties /tmp/languages` -``cp -R fr_CA/*.properties /tmp/languages`` +:command:`cp -R fr_CA/*.properties /tmp/languages` Create the zip file -``cd /tmp/languages`` +:command:`cd /tmp/languages` -``zip languages.zip *.properties`` +:command:`zip languages.zip *.properties` Load the languages.zip file into your Dataverse Installation ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Now that you have a "languages.zip" file, you can load it into your Dataverse installation with the command below. -``curl http://localhost:8080/api/admin/datasetfield/loadpropertyfiles -X POST --upload-file /tmp/languages/languages.zip -H "Content-Type: application/zip"`` +:command:`curl http://localhost:8080/api/admin/datasetfield/loadpropertyfiles -X POST --upload-file /tmp/languages/languages.zip -H "Content-Type: application/zip"` Click on the languages using the drop down in the header to try them out. 
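Taken together, the steps above can be scripted in one pass. The sketch below simply strings together the commands already shown, assuming the ``dataverse-v4.13`` branch and the default ``localhost:8080`` endpoint::

    # clone the language packs and switch to the branch matching your Dataverse Software version
    git clone https://github.com/GlobalDataverseCommunityConsortium/dataverse-language-packs.git
    cd dataverse-language-packs
    git checkout dataverse-v4.13
    # gather the properties files and zip them up
    mkdir /tmp/languages
    cp -R en_US/*.properties fr_CA/*.properties /tmp/languages
    cd /tmp/languages
    zip languages.zip *.properties
    # load the zip into the running Dataverse installation
    curl http://localhost:8080/api/admin/datasetfield/loadpropertyfiles -X POST --upload-file /tmp/languages/languages.zip -H "Content-Type: application/zip"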
@@ -1247,7 +1247,7 @@ Create your own ``analytics-code.html`` file using the analytics code snippet pr Once you have created the analytics file, run this curl command to add it to your settings (using the same file location as in the example above): -``curl -X PUT -d '/var/www/dataverse/branding/analytics-code.html' http://localhost:8080/api/admin/settings/:WebAnalyticsCode`` +:command:`curl -X PUT -d '/var/www/dataverse/branding/analytics-code.html' http://localhost:8080/api/admin/settings/:WebAnalyticsCode` Tracking Button Clicks ++++++++++++++++++++++ @@ -1256,7 +1256,7 @@ The basic analytics configuration above tracks page navigation. However, it does Both Google and Matomo provide the optional capability to track such events and the Dataverse Software has added CSS style classes (btn-compute, btn-contact, btn-download, btn-explore, btn-export, btn-preview, btn-request, btn-share, and downloadCitation) to it's HTML to facilitate it. -For Google Analytics, the example script at :download:`analytics-code.html ` will track both page hits and events within your Dataverse installation. You would use this file in the same way as the shorter example above, putting it somewhere outside your deployment directory, replacing ``YOUR ACCOUNT CODE`` with your actual code and setting :WebAnalyticsCode to reference it. +For Google Analytics, the example ``analytics-code.html`` script located at ``/_static/installation/files/var/www/dataverse/branding/analytics-code.html`` will track both page hits and events within your Dataverse installation. You would use this file in the same way as the shorter example above, putting it somewhere outside your deployment directory, replacing ``YOUR ACCOUNT CODE`` with your actual code and setting :WebAnalyticsCode to reference it. Once this script is running, you can look in the Google Analytics console (Realtime/Events or Behavior/Events) and view events by type and/or the Dataset or File the event involves. @@ -1304,13 +1304,13 @@ Adding Creative Common Licenses JSON files for `Creative Commons licenses `_ are provided below. Note that a new installation of Dataverse already includes CC0 and CC BY. 
-- :download:`licenseCC0-1.0.json <../../../../scripts/api/data/licenses/licenseCC0-1.0.json>` -- :download:`licenseCC-BY-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-4.0.json>` -- :download:`licenseCC-BY-SA-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-SA-4.0.json>` -- :download:`licenseCC-BY-NC-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-4.0.json>` -- :download:`licenseCC-BY-NC-SA-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-SA-4.0.json>` -- :download:`licenseCC-BY-ND-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-ND-4.0.json>` -- :download:`licenseCC-BY-NC-ND-4.0.json <../../../../scripts/api/data/licenses/licenseCC-BY-NC-ND-4.0.json>` +- ``licenseCC0-1.0.json`` located at ``/scripts/api/data/licenses/licenseCC0-1.0.json`` +- ``licenseCC-BY-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-4.0.json`` +- ``licenseCC-BY-SA-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-SA-4.0.json`` +- ``licenseCC-BY-NC-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-NC-4.0.json`` +- ``licenseCC-BY-NC-SA-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-NC-SA-4.0.json`` +- ``licenseCC-BY-ND-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-ND-4.0.json`` +- ``licenseCC-BY-NC-ND-4.0.json`` located at ``/scripts/api/data/licenses/licenseCC-BY-NC-ND-4.0.json`` .. _adding-custom-licenses: @@ -1394,23 +1394,23 @@ The minimal configuration to support an archiver integration involves adding a m \:ArchiverClassName - the fully qualified class to be used for archiving. For example: -``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.DuraCloudSubmitToArchiveCommand"` \:ArchiverSettings - the archiver class can access required settings including existing Dataverse installation settings and dynamically defined ones specific to the class. This setting is a comma-separated list of those settings. For example\: -``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext, :BagGeneratorThreads"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext, :BagGeneratorThreads"` The DPN archiver defines three custom settings, one of which is required (the others have defaults): \:DuraCloudHost - the URL for your organization's Duracloud site. For example: -``curl http://localhost:8080/api/admin/settings/:DuraCloudHost -X PUT -d "qdr.duracloud.org"`` +:command:`curl http://localhost:8080/api/admin/settings/:DuraCloudHost -X PUT -d "qdr.duracloud.org"` :DuraCloudPort and :DuraCloudContext are also defined if you are not using the defaults ("443" and "duracloud" respectively). (Note\: these settings are only in effect if they are listed in the \:ArchiverSettings. Otherwise, they will not be passed to the DuraCloud Archiver class.) It also can use one setting that is common to all Archivers: :BagGeneratorThreads -``curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads -X PUT -d '8'`` +:command:`curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads -X PUT -d '8'` By default, the Bag generator zips two datafiles at a time when creating the archival Bag. 
This setting can be used to lower that to 1, i.e. to decrease system load, or to increase it, e.g. to 4 or 8, to speed processing of many small files. @@ -1427,15 +1427,15 @@ Local Path Configuration ArchiverClassName - the fully qualified class to be used for archiving. For example\: -``curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName`` +:command:`curl -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.LocalSubmitToArchiveCommand" http://localhost:8080/api/admin/settings/:ArchiverClassName` \:BagItLocalPath - the path to where you want to store the archival Bags. For example\: -``curl -X PUT -d /home/path/to/storage http://localhost:8080/api/admin/settings/:BagItLocalPath`` +:command:`curl -X PUT -d /home/path/to/storage http://localhost:8080/api/admin/settings/:BagItLocalPath` \:ArchiverSettings - the archiver class can access required settings including existing Dataverse installation settings and dynamically defined ones specific to the class. This setting is a comma-separated list of those settings. For example\: -``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":BagItLocalPath, :BagGeneratorThreads"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":BagItLocalPath, :BagGeneratorThreads"` :BagItLocalPath is the file path that you've set in :ArchiverSettings. See the DuraCloud Configuration section for a description of :BagGeneratorThreads. @@ -1446,9 +1446,9 @@ Google Cloud Configuration The Google Cloud Archiver can send Dataverse Archival Bags to a bucket in Google's cloud, including those in the 'Coldline' storage class (cheaper, with slower access) -``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.GoogleCloudSubmitToArchiveCommand"` -``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":GoogleCloudBucket, :GoogleCloudProject, :BagGeneratorThreads"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":GoogleCloudBucket, :GoogleCloudProject, :BagGeneratorThreads"` The Google Cloud Archiver defines two custom settings, both are required. It can also use the :BagGeneratorThreads setting as described in the DuraCloud Configuration section above. The credentials for your account, in the form of a json key file, must also be obtained and stored locally (see below): @@ -1456,17 +1456,17 @@ In order to use the Google Cloud Archiver, you must have a Google account. You w \:GoogleCloudBucket - the name of the bucket to use. For example: -``curl http://localhost:8080/api/admin/settings/:GoogleCloudBucket -X PUT -d "qdr-archive"`` +:command:`curl http://localhost:8080/api/admin/settings/:GoogleCloudBucket -X PUT -d "qdr-archive"` \:GoogleCloudProject - the name of the project managing the bucket. For example: -``curl http://localhost:8080/api/admin/settings/:GoogleCloudProject -X PUT -d "qdr-project"`` +:command:`curl http://localhost:8080/api/admin/settings/:GoogleCloudProject -X PUT -d "qdr-project"` The Google Cloud Archiver also requires a key file that must be renamed to 'googlecloudkey.json' and placed in the directory identified by your 'dataverse.files.directory' jvm option. 
This file can be created in the Google Cloud Console. (One method: Navigate to your Project 'Settings'/'Service Accounts', create an account, give this account the 'Cloud Storage'/'Storage Admin' role, and once it's created, use the 'Actions' menu to 'Create Key', selecting the 'JSON' format option. Use this as the 'googlecloudkey.json' file.) For example: -``cp /usr/local/payara5/glassfish/domains/domain1/files/googlecloudkey.json`` +:command:`cp /usr/local/payara5/glassfish/domains/domain1/files/googlecloudkey.json` .. _S3 Archiver Configuration: @@ -1475,9 +1475,9 @@ S3 Configuration The S3 Archiver can send Dataverse Archival Bag to a bucket at any S3 endpoint. The configuration for the S3 Archiver is independent of any S3 store that may be configured in Dataverse and may, for example, leverage colder (cheaper, slower access) storage. -``curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.S3SubmitToArchiveCommand"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverClassName -X PUT -d "edu.harvard.iq.dataverse.engine.command.impl.S3SubmitToArchiveCommand"` -``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":S3ArchiverConfig, :BagGeneratorThreads"`` +:command:`curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":S3ArchiverConfig, :BagGeneratorThreads"` The S3 Archiver defines one custom setting, a required :S3ArchiverConfig. It can also use the :BagGeneratorThreads setting as described in the DuraCloud Configuration section above. @@ -1487,7 +1487,7 @@ The :S3ArchiverConfig setting is a JSON object that must include an "s3_bucket_n \:S3ArchiverConfig - minimally includes the name of the bucket to use. For example: -``curl http://localhost:8080/api/admin/settings/:S3ArchiverConfig -X PUT -d '{"s3_bucket_name":"archival-bucket"}'`` +:command:`curl http://localhost:8080/api/admin/settings/:S3ArchiverConfig -X PUT -d '{"s3_bucket_name":"archival-bucket"}'` \:S3ArchiverConfig - example to also set the name of an S3 profile to use. For example: @@ -1563,7 +1563,7 @@ Ensure robots.txt Is Not Blocking Search Engines ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For a public production Dataverse installation, it is probably desired that search agents be able to index published pages (AKA - pages that are visible to an unauthenticated user). -Polite crawlers usually respect the `Robots Exclusion Standard `_; we have provided an example of a production robots.txt :download:`here `). +Polite crawlers usually respect the `Robots Exclusion Standard `_; we have provided an example of a production robots.txt at ``/_static/util/robots.txt``). We **strongly recommend** using the crawler rules in the sample robots.txt linked above. Note that they make the Dataverse collection and dataset pages accessible to the search engine bots; but discourage them from actually crawling the site, by following any search links - facets and such - on the Dataverse collection pages. Such crawling is very inefficient in terms of system resources, and often results in confusing search results for the end users of the search engines (for example, when partial search results are indexed as individual pages). @@ -2363,16 +2363,14 @@ stored procedure or function (the assumed default setting is ``randomString``). In addition to this setting, a stored procedure or function must be created in the database. 
-As a first example, the script below (downloadable -:download:`here `) produces +As a first example, the ``createsequence.sql`` script located at ``/_static/util/createsequence.sql`` produces sequential numerical values. You may need to make some changes to suit your system setup, see the comments for more information: .. literalinclude:: ../_static/util/createsequence.sql :language: plpgsql -As a second example, the script below (downloadable -:download:`here `) produces +As a second example, the ``identifier_from_timestamp.sql`` script located at ``/_static/util/identifier_from_timestamp.sql`` produces sequential 8 character identifiers from a base36 representation of current timestamp. @@ -2909,7 +2907,7 @@ Recommended setting: 20. Changes the default info message displayed when a user is required to change their password on login. The default is: -``{0} Reset Password{1} – Our password requirements have changed. Please pick a strong password that matches the criteria below.`` +``{0} Reset Password{1} - Our password requirements have changed. Please pick a strong password that matches the criteria below.`` Where the {0} and {1} denote surrounding HTML **bold** tags. It's recommended to put a single space before your custom message for better appearance (as in the default message above). Including the {0} and {1} to bolden part of your message is optional. @@ -3174,7 +3172,7 @@ or Allows Cross-Origin Resource sharing(CORS). By default this setting is absent and the Dataverse Software assumes it to be true. -If you don’t want to allow CORS for your installation, set: +If you don't want to allow CORS for your installation, set: ``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:AllowCors`` @@ -3183,7 +3181,7 @@ If you don’t want to allow CORS for your installation, set: Unlike other facets, those indexed by Date/Year are sorted chronologically by default, with the most recent value first. To have them sorted by number of hits, e.g. with the year with the most results first, set this to false -If you don’t want date facets to be sorted chronologically, set: +If you don't want date facets to be sorted chronologically, set: ``curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:ChronologicalDateFacets`` @@ -3208,7 +3206,7 @@ To enable redirects to the zipper on a different server: Number of errors to display to the user when creating DataFiles from a file upload. It defaults to 5 errors. -``curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay`` +:command:`curl -X PUT -d '1' http://localhost:8080/api/admin/settings/:CreateDataFilesMaxErrorsToDisplay` .. _:BagItHandlerEnabled: @@ -3217,7 +3215,7 @@ Number of errors to display to the user when creating DataFiles from a file uplo Part of the database settings to configure the BagIt file handler. Enables the BagIt file handler. By default, the handler is disabled. -``curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:BagItHandlerEnabled`` +:command:`curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:BagItHandlerEnabled` .. _:BagValidatorJobPoolSize: @@ -3226,7 +3224,7 @@ Part of the database settings to configure the BagIt file handler. Enables the B Part of the database settings to configure the BagIt file handler. The number of threads the checksum validation class uses to validate a single zip file.
Defaults to 4 threads -``curl -X PUT -d '10' http://localhost:8080/api/admin/settings/:BagValidatorJobPoolSize`` +:command:`curl -X PUT -d '10' http://localhost:8080/api/admin/settings/:BagValidatorJobPoolSize` .. _:BagValidatorMaxErrors: @@ -3235,7 +3233,7 @@ Part of the database settings to configure the BagIt file handler. The number of Part of the database settings to configure the BagIt file handler. The maximum number of errors allowed before the validation job aborts execution. This is to avoid processing the whole BagIt package. Defaults to 5 errors. -``curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:BagValidatorMaxErrors`` +:command:`curl -X PUT -d '2' http://localhost:8080/api/admin/settings/:BagValidatorMaxErrors` .. _:BagValidatorJobWaitInterval: @@ -3244,7 +3242,7 @@ Part of the database settings to configure the BagIt file handler. The maximum n Part of the database settings to configure the BagIt file handler. This is the period in seconds to check for the number of errors during validation. Defaults to 10. -``curl -X PUT -d '60' http://localhost:8080/api/admin/settings/:BagValidatorJobWaitInterval`` +:command:`curl -X PUT -d '60' http://localhost:8080/api/admin/settings/:BagValidatorJobWaitInterval` :ArchiverClassName ++++++++++++++++++ @@ -3260,14 +3258,14 @@ For examples, see the specific configuration above in :ref:`BagIt Export`. Each Archiver class may have its own custom settings. Along with setting which Archiver class to use, one must use this setting to identify which setting values should be sent to it when it is invoked. The value should be a comma-separated list of setting names. For example, the LocalSubmitToArchiveCommand only uses the :BagItLocalPath setting. To allow the class to use that setting, this setting must set as: -``curl -X PUT -d ':BagItLocalPath' http://localhost:8080/api/admin/settings/:ArchiverSettings`` +:command:`curl -X PUT -d ':BagItLocalPath' http://localhost:8080/api/admin/settings/:ArchiverSettings` :BagGeneratorThreads ++++++++++++++++++++ An archiver setting shared by several implementations (e.g. DuraCloud, Google, and Local) that can make Bag generation use fewer or more threads in zipping datafiles that the default of 2 -``curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads -X PUT -d '8'`` +:command:`curl http://localhost:8080/api/admin/settings/:BagGeneratorThreads -X PUT -d '8'` :DuraCloudHost ++++++++++++++ @@ -3316,7 +3314,7 @@ In the DDI metadata exports, the default behavior is to always add the repositor A comma-separated list of field type names that should be 'withheld' when dataset access occurs via a Private Url with Anonymized Access (e.g. to support anonymized review). A suggested minimum includes author, datasetContact, and contributor, but additional fields such as depositor, grantNumber, and publication might also need to be included. 
-``curl -X PUT -d 'author, datasetContact, contributor, depositor, grantNumber, publication' http://localhost:8080/api/admin/settings/:AnonymizedFieldTypeNames`` +:command:`curl -X PUT -d 'author, datasetContact, contributor, depositor, grantNumber, publication' http://localhost:8080/api/admin/settings/:AnonymizedFieldTypeNames` :DatasetChecksumValidationSizeLimit +++++++++++++++++++++++++++++++++++ @@ -3325,7 +3323,7 @@ Setting ``DatasetChecksumValidationSizeLimit`` to a threshold in bytes, disables For example, if you want your Dataverse installation to skip validation for any dataset larger than 5 GB while publishing, use this setting: -``curl -X PUT -d 5000000000 http://localhost:8080/api/admin/settings/:DatasetChecksumValidationSizeLimit`` +:command:`curl -X PUT -d 5000000000 http://localhost:8080/api/admin/settings/:DatasetChecksumValidationSizeLimit` When this option is used to disable the checksum validation, it's strongly recommended to perform periodic asynchronous checks via the integrity API @@ -3340,7 +3338,7 @@ Setting ``DataFileChecksumValidationSizeLimit`` to a threshold in bytes, disable For example, if you want your Dataverse installation to skip validation for any data files larger than 2 GB while publishing, use this setting: -``curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:DataFileChecksumValidationSizeLimit`` +:command:`curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:DataFileChecksumValidationSizeLimit` When this option is used to disable the checksum validation, it's strongly recommended to perform periodic asynchronous checks via the integrity API @@ -3353,7 +3351,7 @@ Also refer to the "Datafile Integrity" API :ref:`datafile-integrity` A boolean setting that, if true, will send an email and notification to users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset. The intent of this functionality is to simplify tracking activity and planning to follow-up contact. -``curl -X PUT -d true http://localhost:8080/api/admin/settings/:SendNotificationOnDatasetCreation`` +:command:`curl -X PUT -d true http://localhost:8080/api/admin/settings/:SendNotificationOnDatasetCreation` .. _:CVocConf: @@ -3364,9 +3362,9 @@ A JSON-structured setting that configures Dataverse to associate specific metada Scripts that implement this association for specific service protocols are maintained at https://github.com/gdcc/dataverse-external-vocab-support. That repository also includes a json-schema for validating the structure required by this setting along with an example metadatablock and sample :CVocConf setting values associating entries in the example block with ORCID and SKOSMOS based services. -``wget https://gdcc.github.io/dataverse-external-vocab-support/examples/config/cvoc-conf.json`` +:command:`wget https://gdcc.github.io/dataverse-external-vocab-support/examples/config/cvoc-conf.json` -``curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf`` +:command:`curl -X PUT --upload-file cvoc-conf.json http://localhost:8080/api/admin/settings/:CVocConf` .. 
_:ControlledVocabularyCustomJavaScript: @@ -3377,11 +3375,11 @@ Scripts that implement this association for specific service protocols are maint To specify the URL for a custom script ``covoc.js`` to be loaded from an external site: -``curl -X PUT -d 'https://example.com/js/covoc.js' http://localhost:8080/api/admin/settings/:ControlledVocabularyCustomJavaScript`` +:command:`curl -X PUT -d 'https://example.com/js/covoc.js' http://localhost:8080/api/admin/settings/:ControlledVocabularyCustomJavaScript` To remove the custom script URL: -``curl -X DELETE http://localhost:8080/api/admin/settings/:ControlledVocabularyCustomJavaScript`` +:command:`curl -X DELETE http://localhost:8080/api/admin/settings/:ControlledVocabularyCustomJavaScript` Please note that :ref:`:CVocConf` is a better option if the list is large or needs to be searchable from an external service using protocols such as SKOSMOS. @@ -3397,7 +3395,7 @@ A dataset may only have one label at a time and if a label is set, it will be re This functionality is disabled when this setting is empty/not set. Each set of labels is identified by a curationLabelSet name and a JSON Array of the labels allowed in that set. -``curl -X PUT -d '{"Standard Process":["Author contacted", "Privacy Review", "Awaiting paper publication", "Final Approval"], "Alternate Process":["State 1","State 2","State 3"]}' http://localhost:8080/api/admin/settings/:AllowedCurationLabels`` +:command:`curl -X PUT -d '{"Standard Process":["Author contacted", "Privacy Review", "Awaiting paper publication", "Final Approval"], "Alternate Process":["State 1","State 2","State 3"]}' http://localhost:8080/api/admin/settings/:AllowedCurationLabels` If the Dataverse Installation supports multiple languages, the curation label translations should be added to the ``CurationLabels`` properties files. (See :ref:`i18n` for more on properties files and internationalization in general.) Since the Curation labels are free text, while creating the key, it has to be converted to lowercase, replace space with underscore. @@ -3414,7 +3412,7 @@ Example:: By default, custom terms of data use and access can be specified after selecting "Custom Terms" from the License/DUA dropdown on the Terms tab. When ``:AllowCustomTermsOfUse`` is set to ``false`` the "Custom Terms" item is not made available to the depositor. -``curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCustomTermsOfUse`` +:command:`curl -X PUT -d false http://localhost:8080/api/admin/settings/:AllowCustomTermsOfUse` .. _:MaxEmbargoDurationInMonths: @@ -3425,7 +3423,7 @@ This setting controls whether embargoes are allowed in a Dataverse instance and setting indicates embargoes are not supported. A value of -1 allows embargoes of any length. Any other value indicates the maximum number of months (from the current date) a user can enter for an embargo end date. This limit will be enforced in the popup dialog in which users enter the embargo date. 
For example, to set a two year maximum: -``curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths`` +:command:`curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths` :DataverseMetadataValidatorScript +++++++++++++++++++++++++++++++++ @@ -3434,7 +3432,7 @@ An optional external script that validates Dataverse collection metadata as it's For example, once the following setting is created: -``curl -X PUT -d /usr/local/bin/dv_validator.sh http://localhost:8080/api/admin/settings/:DataverseMetadataValidatorScript`` +:command:`curl -X PUT -d /usr/local/bin/dv_validator.sh http://localhost:8080/api/admin/settings/:DataverseMetadataValidatorScript` :DataverseMetadataPublishValidationFailureMsg +++++++++++++++++++++++++++++++++++++++++++++ @@ -3443,7 +3441,7 @@ Specifies a custom error message shown to the user when a Dataverse collection f For example: -``curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DataverseMetadataPublishValidationFailureMsg`` +:command:`curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DataverseMetadataPublishValidationFailureMsg` :DataverseMetadataUpdateValidationFailureMsg @@ -3459,7 +3457,7 @@ An optional external script that validates dataset metadata during publishing. T For example: -``curl -X PUT -d /usr/local/bin/ds_validator.sh http://localhost:8080/api/admin/settings/:DatasetMetadataValidatorScript`` +:command:`curl -X PUT -d /usr/local/bin/ds_validator.sh http://localhost:8080/api/admin/settings/:DatasetMetadataValidatorScript` In some ways this duplicates a workflow mechanism, since it is possible to define a workflow with additional validation steps. But please note that the important difference is that this external validation happens *synchronously*, while the user is wating; while a workflow is performed asynchronously with a lock placed on the dataset. This can be useful to some installations, in some situations. But it also means that the script provided should be expected to always work reasonably fast - ideally, in seconds, rather than minutes, etc. @@ -3470,7 +3468,7 @@ Specifies a custom error message shown to the user when a dataset fails an exter For example: -``curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DatasetMetadataValidationFailureMsg`` +:command:`curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DatasetMetadataValidationFailureMsg` :ExternalValidationAdminOverride @@ -3487,11 +3485,11 @@ This setting is a comma-separated list of the new tags. To override the default list with Docs, Data, Code, and Workflow: -``curl -X PUT -d 'Docs,Data,Code,Workflow' http://localhost:8080/api/admin/settings/:FileCategories`` +:command:`curl -X PUT -d 'Docs,Data,Code,Workflow' http://localhost:8080/api/admin/settings/:FileCategories` To remove the override and go back to the default list: -``curl -X PUT -d '' http://localhost:8080/api/admin/settings/:FileCategories`` +:command:`curl -X PUT -d '' http://localhost:8080/api/admin/settings/:FileCategories` .. 
diff --git a/doc/sphinx-guides/source/installation/installation-main.rst b/doc/sphinx-guides/source/installation/installation-main.rst
index 8559d6ce194..5dc94822d26 100755
--- a/doc/sphinx-guides/source/installation/installation-main.rst
+++ b/doc/sphinx-guides/source/installation/installation-main.rst
@@ -5,7 +5,7 @@ Installation
Now that the :doc:`prerequisites` are in place, we are ready to execute the Dataverse Software installation script (the "installer") and verify that the installation was successful by logging in with a "superuser" account.
.. contents:: |toctitle|
- :local:
+ :local:
.. _dataverse-installer:
@@ -40,7 +40,7 @@ Read the installer script directions like this::
  $ cd dvinstall
  $ less README_python.txt
-Alternatively you can download :download:`README_python.txt <../../../../scripts/installer/README_python.txt>` from this guides.
+Alternatively, you can use the ``README_python.txt`` file located at ``/scripts/installer/README_python.txt`` in the Dataverse source tree.
Follow the instructions in the text file.
@@ -66,7 +66,7 @@ The script will prompt you for some configuration values. If this is a test/eval
- Postgres admin password - We'll need it in order to create the database and user for the Dataverse Software installer to use, without having to run the installer as root. If you don't know your Postgres admin password, you may simply set the authorization level for localhost to "trust" in the PostgreSQL ``pg_hba.conf`` file (See the PostgreSQL section in the Prerequisites). If this is a production environment, you may want to change it back to something more secure, such as "password" or "md5", after the installation is complete.
- Network address of a remote Solr search engine service (if needed) - In most cases, you will be running your Solr server on the same host as the Dataverse Software application (then you will want to leave this set to the default value of ``LOCAL``). But in a serious production environment you may set it up on a dedicated separate server.
-If desired, these default values can be configured by creating a ``default.config`` (example :download:`here <../../../../scripts/installer/default.config>`) file in the installer's working directory with new values (if this file isn't present, the above defaults will be used).
+If desired, these default values can be configured by creating a ``default.config`` file (located at ``/scripts/installer/default.config``) in the installer's working directory with new values (if this file isn't present, the above defaults will be used).
This allows the installer to be run in non-interactive mode (with ``./install -y -f > install.out 2> install.err``), which can allow for easier interaction with automated provisioning tools.
@@ -163,15 +163,15 @@ For the Payara console, load a browser with your domain online, navigate to http
When fine tuning your JavaMail Session, there are a number of fields you can edit. The most important are:
-+ **Mail Host:** Desired mail host’s DNS address (e.g. smtp.gmail.com)
++ **Mail Host:** Desired mail host's DNS address (e.g. smtp.gmail.com)
+ **Default User:** Username mail host will recognize (e.g. user\@gmail.com)
+ **Default Sender Address:** Email address that your Dataverse installation will send mail from
Depending on the SMTP server you're using, you may need to add additional properties at the bottom of the page (below "Advanced").
-From the "Add Properties" utility at the bottom, use the “Add Property” button for each entry you need, and include the name / corresponding value as needed. Descriptions are optional, but can be used for your own organizational needs. +From the "Add Properties" utility at the bottom, use the "Add Property" button for each entry you need, and include the name / corresponding value as needed. Descriptions are optional, but can be used for your own organizational needs. -**Note:** These properties are just an example. You may need different/more/fewer properties all depending on the SMTP server you’re using. +**Note:** These properties are just an example. You may need different/more/fewer properties all depending on the SMTP server you're using. ============================== ============================== Name Value @@ -181,9 +181,9 @@ mail.smtp.password [Default User password*] mail.smtp.port [Port number to route through] ============================== ============================== -**\*WARNING**: Entering a password here will *not* conceal it on-screen. It’s recommended to use an *app password* (for smtp.gmail.com users) or utilize a dedicated/non-personal user account with SMTP server auths so that you do not risk compromising your password. +**\*WARNING**: Entering a password here will *not* conceal it on-screen. It's recommended to use an *app password* (for smtp.gmail.com users) or utilize a dedicated/non-personal user account with SMTP server auths so that you do not risk compromising your password. -If your installation’s mail host uses SSL (like smtp.gmail.com) you’ll need these name/value pair properties in place: +If your installation's mail host uses SSL (like smtp.gmail.com) you'll need these name/value pair properties in place: ====================================== ============================== Name Value diff --git a/doc/sphinx-guides/source/installation/prerequisites.rst b/doc/sphinx-guides/source/installation/prerequisites.rst index 59de507a264..facfc29fcb1 100644 --- a/doc/sphinx-guides/source/installation/prerequisites.rst +++ b/doc/sphinx-guides/source/installation/prerequisites.rst @@ -9,7 +9,7 @@ Before running the Dataverse Software installation script, you must install and After following all the steps below, you can proceed to the :doc:`installation-main` section. .. contents:: |toctitle| - :local: + :local: Linux ----- @@ -80,9 +80,9 @@ Launching Payara on System Boot The Dataverse Software installation script will start Payara if necessary, but you may find the following scripts helpful to launch Payara start automatically on boot. They were originally written for Glassfish but have been adjusted for Payara. -- This :download:`Systemd file<../_static/installation/files/etc/systemd/payara.service>` may be serve as a reference for systems using Systemd (such as RHEL/derivative or Debian 10, Ubuntu 16+) -- This :download:`init script<../_static/installation/files/etc/init.d/payara.init.service>` may be useful for RHEL/derivative or Ubuntu >= 14 if you're using a Payara service account, or -- This :download:`Payara init script <../_static/installation/files/etc/init.d/payara.init.root>` may be helpful if you're just going to run Payara as root (not recommended). 
+- The systemd service file located at ``/_static/installation/files/etc/systemd/payara.service`` may serve as a reference for systems using Systemd (such as RHEL/derivative or Debian 10, Ubuntu 16+)
+- The init script located at ``/_static/installation/files/etc/init.d/payara.init.service`` may be useful for RHEL/derivative or Ubuntu >= 14 if you're using a Payara service account, or
+- The Payara init script located at ``/_static/installation/files/etc/init.d/payara.init.root`` may be helpful if you're just going to run Payara as root (not recommended).
It is not necessary for Payara to be running before you execute the Dataverse Software installation script; it will start Payara for you.
@@ -211,14 +211,14 @@ Solr Init Script
Please choose the right option for your underlying Linux operating system. It will not be necessary to execute both!
-For systems running systemd (like RedHat or derivatives since 7, Debian since 9, Ubuntu since 15.04), as root, download :download:`solr.service<../_static/installation/files/etc/systemd/solr.service>` and place it in ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
+For systems running systemd (like RedHat or derivatives since 7, Debian since 9, Ubuntu since 15.04), as root, copy the ``solr.service`` file located at ``/_static/installation/files/etc/systemd/solr.service`` into ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
  cp /tmp/solr.service /etc/systemd/system
  systemctl daemon-reload
  systemctl start solr.service
  systemctl enable solr.service
-For systems using init.d (like CentOS 6), download this :download:`Solr init script <../_static/installation/files/etc/init.d/solr>` and place it in ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
+For systems using init.d (like CentOS 6), copy the Solr init script located at ``/_static/installation/files/etc/init.d/solr`` into ``/tmp``. Then start Solr and configure it to start at boot with the following commands::
  cp /tmp/solr /etc/init.d
  service start solr
@@ -305,7 +305,7 @@ Installing R
For RHEL/derivative, the EPEL distribution is strongly recommended:
-If :fixedwidthplain:`yum` isn't configured to use EPEL repositories ( https://fedoraproject.org/wiki/EPEL ):
+If :fixedwidthplain:`yum` isn't configured to use EPEL repositories (https://fedoraproject.org/wiki/EPEL):
RHEL8/derivative users can install the epel-release RPM::
@@ -325,8 +325,8 @@ Rocky or AlmaLinux 8.3+ users will need to enable the PowerTools repository::
RHEL 7 users will want to log in to their organization's respective RHN interface, find the particular machine in question and:
-• click on "Subscribed Channels: Alter Channel Subscriptions"
-• enable EPEL, Server Extras, Server Optional
+- click on "Subscribed Channels: Alter Channel Subscriptions"
+- enable EPEL, Server Extras, Server Optional
Finally, install R with :fixedwidthplain:`yum`::
@@ -377,8 +377,8 @@ for the daemon (:fixedwidthplain:`/etc/init.d/rserve`), so that it gets started
automatically when the system boots. This is an :fixedwidthplain:`init.d`-style startup file. If this is a RedHat/CentOS 7 system, you may want to use the
-:download:`rserve.service<../../../../scripts/r/rserve/rserve.service>`
-systemd unit file instead. Copy it into the /usr/lib/systemd/system/ directory, then::
+``/scripts/r/rserve/rserve.service`` systemd unit file instead.
+Copy it into the /usr/lib/systemd/system/ directory, then::
  # systemctl daemon-reload
  # systemctl enable rserve
diff --git a/doc/sphinx-guides/source/user/dataset-management.rst b/doc/sphinx-guides/source/user/dataset-management.rst
index af58cf3c81b..c0a35d0c175 100755
--- a/doc/sphinx-guides/source/user/dataset-management.rst
+++ b/doc/sphinx-guides/source/user/dataset-management.rst
@@ -174,7 +174,7 @@ BagIt Support
BagIt is a set of hierarchical file system conventions designed to support disk-based storage and network transfer of arbitrary digital content. It offers several benefits such as integration with digital libraries, easy implementation, and transfer validation. See `the Wikipedia article `__ for more information.
-If the Dataverse installation you are using has enabled BagIt file handling, when uploading BagIt files the repository will validate the checksum values listed in each BagIt’s manifest file against the uploaded files and generate errors about any mismatches. The repository will identify a certain number of errors, such as the first five errors in each BagIt file, before reporting the errors.
+If the Dataverse installation you are using has enabled BagIt file handling, when uploading BagIt files the repository will validate the checksum values listed in each BagIt's manifest file against the uploaded files and generate errors about any mismatches. The repository will identify a certain number of errors, such as the first five errors in each BagIt file, before reporting the errors.
|bagit-image1|

From 0fdfa1eb5096ad56e95e0dac2074223ad804488c Mon Sep 17 00:00:00 2001
From: "w. Patrick Gale"
Date: Mon, 20 Mar 2023 18:58:48 -0400
Subject: [PATCH 2/2] fix

---
 doc/sphinx-guides/source/container/base-image.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/sphinx-guides/source/container/base-image.rst b/doc/sphinx-guides/source/container/base-image.rst
index 1cca8e56072..62e75b1a636 100644
--- a/doc/sphinx-guides/source/container/base-image.rst
+++ b/doc/sphinx-guides/source/container/base-image.rst
@@ -135,6 +135,8 @@ provides. These are mostly based on environment variables (very common with cont
.. [postboot] `${CONFIG_DIR}/post-boot-commands.asadmin`
.. [dump-option] `-XX:+HeapDumpOnOutOfMemoryError`
+.. _base-locations:
Locations
+++++++++
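The ``base-locations`` anchor added in the second patch only pays off once other pages cross-reference it. Below is a minimal sketch of such a reference, assuming a hypothetical page elsewhere in the guides; only the standard Sphinx ``:ref:`` role is relied on.

```rst
.. Hypothetical usage in another guide page, e.g. a container-related chapter:

See :ref:`base-locations` for the filesystem paths used inside the container,
or with explicit link text: :ref:`container locations <base-locations>`.
```

Because the label sits directly above the ``Locations`` heading, the first form renders that heading's text ("Locations") as the link text, while the second form overrides it with the wording given in angle-bracket form.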