150 changes: 150 additions & 0 deletions doc/sphinx-guides/HowToSphinxDockerBuild.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
# About these instructions

The purpose of this document is to provide instructions on how to build a fresh copy of the Dataverse Sphinx documentation. It focuses on using [Docker](https://www.docker.com/) scripts to set up the build environment. If you need help with Sphinx, visit <https://www.sphinx-doc.org>.

The following instructions were written for a bash shell in a Linux environment (a WSL terminal on a Windows 11 machine), but should apply to most Unix-like environments.

Replace `/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides` with your absolute path to the `doc/sphinx-guides` directory on your computer.

## Configuring your environment variables

To simplify the instructions, first create a few environment variables.
Set your root Dataverse sphinx-guides path using a `ROOT_DATAVERSE_SG` variable:
`export ROOT_DATAVERSE_SG="/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides"`

Next create a variable `DKR_DV_PDF` to store the unique name for the Docker image being created to run the PDF Sphinx builds.
`export DKR_DV_PDF="sddi_pdf"`

Next create a variable `DKR_DV_HTML` to store the unique name for the Docker image being created to run the HTML Sphinx builds.
`export DKR_DV_HTML="sddi_html"`

The next variable, `DKR_DV_HTML_VIEW`, is only used if you wish to test the HTML documentation in a local Apache container within Docker.
`export DKR_DV_HTML_VIEW="sddi_html_view"`

Or, just run all of the commands at once (copy and paste into your terminal):

```s bash
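# run these in your current shell; wrapping them in ( ... ) would set them in a subshell only, and they would be lost afterwards
export ROOT_DATAVERSE_SG="/mnt/q/GitHubRepos/dataverse/doc/sphinx-guides"
export DKR_DV_PDF="sddi_pdf"
export DKR_DV_HTML="sddi_html"
export DKR_DV_HTML_VIEW="sddi_html_view"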
```
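
Since every later command depends on these four variables, it can help to confirm they are set before continuing. The following is a minimal sketch, not part of the repository: `check_sg_env` is a hypothetical helper name, and it only checks that the variables are non-empty and that the root path is an existing directory.

```bash
# Hypothetical helper (an assumption, not shipped with Dataverse):
# report any of the four variables that are unset or empty, and
# check that ROOT_DATAVERSE_SG points at an existing directory.
check_sg_env() {
    local status=0 name value
    for name in ROOT_DATAVERSE_SG DKR_DV_PDF DKR_DV_HTML DKR_DV_HTML_VIEW; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            status=1
        fi
    done
    if [ -n "${ROOT_DATAVERSE_SG:-}" ] && [ ! -d "$ROOT_DATAVERSE_SG" ]; then
        echo "not a directory: $ROOT_DATAVERSE_SG" >&2
        status=1
    fi
    return "$status"
}
```

Running `check_sg_env && echo "environment looks ready"` prints the message only when all four variables are set and the path exists.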

## PDF build scripts using Docker

First, generate the PDF version of the Dataverse documentation, because the HTML documentation links to this PDF file. Below are the bash commands to build a Docker image using Sphinx and LaTeX, and then to run the PDF build.

```s (bash)
# change to the `SphinxDocBuildPDF` directory
cd "$ROOT_DATAVERSE_SG/SphinxDocBuildPDF"
# build the Docker image from the Dockerfile
docker build -t "$DKR_DV_PDF" .
# change back to the root Dataverse documentation directory
cd "$ROOT_DATAVERSE_SG"
# create the PDF version of the Dataverse documentation (the HTML docs need it);
# if errors are thrown at this point, fix the documentation file in question and rerun this command
docker run --rm -v "$ROOT_DATAVERSE_SG:/docs" "$DKR_DV_PDF" make latexpdf
# copy the freshly built PDF file to the static files folder under source (the HTML build looks for it there)
cp "$ROOT_DATAVERSE_SG/build/latex/Dataverse.pdf" "$ROOT_DATAVERSE_SG/source/_static"
```
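
If the LaTeX build fails, the final `cp` fails too, or copies a stale PDF left over from an earlier run. As a small sketch under that assumption, the hypothetical helper `copy_built_pdf` below copies the file only when it exists and is non-empty. It does not detect a stale file; deleting `build/latex` before the build would cover that case.

```bash
# Hypothetical helper (an assumption, not part of the repository):
# copy the built PDF into source/_static, but only if the LaTeX
# build actually produced a non-empty Dataverse.pdf.
copy_built_pdf() {
    local root="${1:-}"
    local pdf="$root/build/latex/Dataverse.pdf"
    if [ ! -s "$pdf" ]; then
        echo "PDF not found or empty: $pdf" >&2
        return 1
    fi
    mkdir -p "$root/source/_static"
    cp "$pdf" "$root/source/_static/"
}
```

It would be called as `copy_built_pdf "$ROOT_DATAVERSE_SG"` in place of the plain `cp` command.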

## HTML build scripts using Docker

Next you will generate the Docker image for building the HTML version of the Dataverse documentation. Below are the bash commands to build a Docker image using Sphinx:

```s (bash)
cd "$ROOT_DATAVERSE_SG/SphinxDocBuildHtml"
docker build -t "$DKR_DV_HTML" .
```

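Lastly, you can make the Dataverse Sphinx documentation. ***Note: the `-v [project directory]:/docs` option below may give the impression that a `docs` folder should exist on your machine, but it does not need to. This is Docker's volume syntax: it mounts your project directory at `/docs` inside the container (the working directory set in the Dockerfile), and Sphinx reads its input from the `source` directory.***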

```s (bash)
cd "$ROOT_DATAVERSE_SG"
## remove the existing Docker image for HTML processing if needed
# docker image rm -f "$DKR_DV_HTML"
## if you are rerunning the HTML build, remove the build/html directory first so it can be recreated
# rm -r "$ROOT_DATAVERSE_SG/build/html"
docker run --rm -v "$ROOT_DATAVERSE_SG:/docs" "$DKR_DV_HTML" make html
```
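
The cleanup and the check on the result can be wrapped in two small functions. This is only a sketch: `clean_html_build` and `html_build_ok` are hypothetical names, and the guard against an empty path is an added precaution so that an unset variable never expands to `/build/html`.

```bash
# Hypothetical helper (an assumption): remove a previous HTML build, if any.
clean_html_build() {
    local root="${1:-}"
    # Refuse to run with an empty path, so we never touch /build/html.
    if [ -z "$root" ]; then
        echo "usage: clean_html_build <path to sphinx-guides>" >&2
        return 1
    fi
    # -f makes this succeed even when no earlier build exists.
    rm -rf -- "$root/build/html"
}

# Hypothetical helper (an assumption): succeed only if the build
# produced a non-empty index.html.
html_build_ok() {
    [ -s "${1:-}/build/html/index.html" ]
}
```

A typical sequence would be `clean_html_build "$ROOT_DATAVERSE_SG"`, then the `docker run ... make html` command, then `html_build_ok "$ROOT_DATAVERSE_SG" || echo "HTML build did not produce index.html"`.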

To see the built documentation, simply open the `build/html/index.html` file in your web browser. Alternatively, you can serve it from a local Apache container using the commands below:

```s (bash)
# copy `SphinxDocLocalHtmlImage/Dockerfile` to the build directory if you want to serve the generated documentation on localhost (Docker can only copy files from within the build context, here the directory containing the Dockerfile)
cp "$ROOT_DATAVERSE_SG/SphinxDocLocalHtmlImage/Dockerfile" "$ROOT_DATAVERSE_SG/build"
# copy the Font Awesome font files to the HTML build directory, since the `sphinxcontrib.icon` module does not include them
cp -r "$ROOT_DATAVERSE_SG/source/_font" "$ROOT_DATAVERSE_SG/build/html"
# change to the Sphinx build directory
cd "$ROOT_DATAVERSE_SG/build"
# create a Docker static documents image with a copy of the freshly built Sphinx docs
docker build -t "$DKR_DV_HTML_VIEW" .
# start an Apache container running the static documents image
docker run --publish 80:80 --detach --name localhost_sddi "$DKR_DV_HTML_VIEW"
# visit http://localhost/index.html in a browser to test the HTML documentation
```
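
Because the container is started with a fixed name (`localhost_sddi`), a second `docker run` with the same name fails until the first container is removed. A minimal sketch for stopping and removing it follows; `stop_preview` and the `DRY_RUN` switch are hypothetical additions for illustration, while `docker stop` and `docker rm` are the standard Docker commands.

```bash
# Hypothetical helper (an assumption): stop and remove the preview container.
# Set DRY_RUN=1 to print the commands instead of running them.
stop_preview() {
    local name="${1:-localhost_sddi}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "docker stop $name"
        echo "docker rm $name"
    else
        # Try to remove the container even if it was already stopped.
        docker stop "$name"
        docker rm "$name"
    fi
}
```

After `stop_preview`, the image can be rebuilt and the `docker run` command above issued again.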

## Issues with PDF output

**DO NOT nest documentation lists more than three deep (see the GitHub issue at <https://github.com/IQSS/dataverse/issues/9277>).**

If you see errors when building the PDF, you can copy the contents of the `Dataverse.tex` file under the `build/latex` directory into a LaTeX checker such as <https://www.dainiak.com/latexcheck>. For problems such as nested documentation lists, the error messages can be unhelpful, but the documentation file named in the error is likely causing the problem in some way.

### Checking the Dataverse.tex file with https://www.dainiak.com/latexcheck/
- One of the common problems is non-ASCII text; for example, the character U+2019 "’" can be confused with the character U+0060 "`", which is more common in source code.
- Also, do not include emojis in the documentation.
- To search for possibly problematic characters, run `LC_ALL=C find . -type f -exec grep -c -P "[^\x00-\x7F]" {} +` within the source directory. Each file is listed with its count of matching lines, so any file containing non-ASCII characters shows a number greater than zero. If, for example, the `./developers/dependencies.rst` file has non-ASCII characters, you can locate them using `LC_ALL=C grep --color='auto' -P -n "[\x80-\xFF]" ./developers/dependencies.rst`. Note: not all non-ASCII characters are problematic.
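
The two searches above can be combined into a single pass that prints the file, line number, and text of every suspect line. The sketch below is one possible variant, not the method used in this pull request: `list_non_ascii` is a hypothetical name, and it uses a POSIX character class instead of `-P`, so it also works where `grep` lacks Perl regex support.

```bash
# Hypothetical helper (an assumption): print file:line:text for every line
# under a directory that contains a byte outside printable ASCII and
# whitespace. Note: this also flags stray control characters, not only
# non-ASCII text. -I skips binary files such as images.
list_non_ascii() {
    local dir="${1:-.}"
    LC_ALL=C grep -rnI '[^[:print:][:space:]]' "$dir"
}
```

Running `list_non_ascii source` from the `sphinx-guides` directory lists the candidates; as noted above, not every hit is a problem for the PDF build.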

## Creating a starter Sphinx environment

If you are new to Sphinx, you can use the Docker image you just built to create a Sphinx project template:

```s (bash)
docker run -it --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_HTML sphinx-quickstart
```

At this point you have a boilerplate `source` folder with some `hello world` documentation. You can copy the Dataverse documentation source from the `dataverse/doc/sphinx-guides/source` GitHub directory and use it to replace the boilerplate `source` directory that Sphinx just created.

## Changelog for files

The following updates were made because these files threw errors during the PDF builds.

\doc\sphinx-guides\source\admin\metadatacustomization.rst
- converted the table to a CSV table, removed non-ASCII characters, and removed links to download files (these files are not stored in the document downloads, and it does not make sense to make them downloadable for those already working with the code)
- added metadataBlockProperties.tsv, datasetFieldProperties.tsv, controlledVocabularyProperties.tsv, fieldTypeDefinitions.tsv, displayFormatVariables.tsv

\doc\sphinx-guides\source\api\native-api.rst
- removed non-ASCII characters and links to download code files
- added \docs\source\_static\api\dataset-package-files.json

\doc\sphinx-guides\source\developers\big-data-support.rst
- explicitly stated commands and removed code file download
- added /docs/source/_static/api/add-storage-site.json

\doc\sphinx-guides\source\developers\dev-environment.rst
\doc\sphinx-guides\source\developers\make-data-count.rst
- removed code file download

\doc\sphinx-guides\source\developers\testing.rst
\doc\sphinx-guides\source\developers\troubleshooting.rst
- removed non-ASCII characters, and links to download files

\doc\sphinx-guides\source\installation\advanced.rst
- removed links to download files

\doc\sphinx-guides\source\installation\config.rst
- removed non-ASCII characters, explicitly stated commands, and removed links to download files

\doc\sphinx-guides\source\installation\installation-main.rst
\doc\sphinx-guides\source\installation\prerequisites.rst
- removed non-ASCII characters and links to download files

\doc\sphinx-guides\source\admin\timers.rst
- removed nested lists

\doc\sphinx-guides\source\developers\workflows.rst
- removed non-ASCII characters

\doc\sphinx-guides\source\container\base-image.rst
- tried `:widths: auto` for the csv-table, but the description column would not wrap
12 changes: 12 additions & 0 deletions doc/sphinx-guides/SphinxDocBuildHtml/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# get latest sphinx image
FROM sphinxdoc/sphinx:latest

RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && apt-get install --yes --no-install-recommends wget rsync git && \
    apt-get autoremove -y && \
    pip3 install --upgrade pip setuptools && \
    rm -rf /root/.cache

WORKDIR /docs
COPY requirements.txt /docs
RUN pip3 install -r requirements.txt
2 changes: 2 additions & 0 deletions doc/sphinx-guides/SphinxDocBuildHtml/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
sphinx_bootstrap_theme
sphinx-icon
12 changes: 12 additions & 0 deletions doc/sphinx-guides/SphinxDocBuildPDF/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# get latest sphinx image with LaTeX PDF support
FROM sphinxdoc/sphinx-latexpdf:latest

RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && apt-get install --yes --no-install-recommends wget rsync git && \
    apt-get autoremove -y && \
    pip3 install --upgrade pip setuptools && \
    rm -rf /root/.cache

WORKDIR /docs
COPY requirements.txt /docs
RUN pip3 install -r requirements.txt
3 changes: 3 additions & 0 deletions doc/sphinx-guides/SphinxDocBuildPDF/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
sphinx_bootstrap_theme
sphinx-icon
pdflatex
3 changes: 3 additions & 0 deletions doc/sphinx-guides/SphinxDocLocalHtmlImage/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# Note: this file MUST BE in the root build directory
FROM httpd:2.4
COPY ./html/ /usr/local/apache2/htdocs/
3 changes: 3 additions & 0 deletions doc/sphinx-guides/SphinxDocLocalHtmlImage/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# About this directory

The Dockerfile in this directory needs to be copied to the /build directory where the static documentation files reside. The Dockerfile copies the documentation files into a Docker image that can be run as a simple Apache container. See the instructions in [HowToSphinxDockerBuild.md](../HowToSphinxDockerBuild.md).
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Property Purpose Allowed values and restrictions
DatasetField Specifies the #datasetField to which this entry applies. "Must reference an existing #datasetField. As a best practice, the value should reference a #datasetField in the current metadata block definition. (It is technically possible to reference an existing #datasetField from another metadata block.)"
Value "A short display string, representing an enumerated value for this field. If the identifier property is empty, this value is used as the identifier." Free text
identifier "A string used to encode the selected enumerated value of a field. If this property is empty, the value of the “Value” field is used as the identifier." Free text
displayOrder Control the order in which the enumerated values are displayed for selection. Non-negative integer.
40 changes: 40 additions & 0 deletions doc/sphinx-guides/source/_static/admin/datasetFieldProperties.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
Property Purpose Allowed values and restrictions
name A user-definable string used to identify a #datasetField. Maps directly to field name used by Solr. "- (from DatasetFieldType.java) The internal DDI-like name, no spaces, etc.
- (from Solr) Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other field names will not have first class support from all components and back compatibility is not guaranteed. Names with both leading and trailing underscores (e.g. _version_) are reserved.
- Must not collide with a field of the same name in another #metadataBlock definition or any name already included as a field in the Solr index."
title Acts as a brief label for display related to this #datasetField. Should be relatively brief.
description Used to provide a description of the field. Free text
watermark A string to initially display in a field as a prompt for what the user should enter. Free text
fieldType "Defines the type of content that the field, if not empty, is meant to contain." "- none
- date
- email
- text
- textbox
- url
- int
- float
- See below for fieldtype definitions"
displayOrder "Controls the sequence in which the fields are displayed, both for input and presentation." Non-negative integer.
displayFormat "Controls how the content is displayed for presentation (not entry). The value of this field may contain one or more special variables (enumerated below). HTML tags, likely in conjunction with one or more of these values, may be used to control the display of content in the web UI." See below for displayFormat variables
advancedSearchField Specify whether this field is available in advanced search. TRUE (available) or FALSE (not available)
allowControlledVocabulary Specify whether the possible values of this field are determined by values in the #controlledVocabulary section. TRUE (controlled) or FALSE (not controlled)
allowmultiples Specify whether this field is repeatable. TRUE (repeatable) or FALSE (not repeatable)
facetable "Specify whether the field is facetable (i.e., if the expected values for this field are themselves useful search terms for this field). If a field is “facetable” (able to be faceted on), it appears under “Browse/Search Facets” when you edit “General Information” for a Dataverse collection. Setting this value to TRUE generally makes sense for enumerated or controlled vocabulary fields, fields representing identifiers (IDs, names, email addresses), and other fields that are likely to share values across entries. It is less likely to make sense for fields containing descriptions, floating point numbers, and other values that are likely to be unique." TRUE (facetable) or FALSE (not facetable)
displayoncreate [5]_ "Designate fields that should display during the creation of a new dataset, even before the dataset is saved. Fields not so designated will not be displayed until the dataset has been saved." TRUE (display during creation) or FALSE (don't display during creation)
required "For primitive fields, specify whether or not the field is required.

For compound fields, also specify if one or more subfields are required or conditionally required. At least one instance of a required field must be present. More than one instance of a field may be allowed, depending on the value of allowmultiples." "For primitive fields, TRUE (required) or FALSE (optional).

For compound fields:

- To make one or more subfields optional, the parent field and subfield(s) must be FALSE (optional).
- To make one or more subfields required, the parent field and the required subfield(s) must be TRUE (required).
- To make one or more subfields conditionally required, make the parent field FALSE (optional) and make TRUE (required) any subfield or subfields that are required if any other subfields are filled.
"
parent "For subfields, specify the name of the parent or containing field." "- Must not result in a cyclical reference.
- Must reference an existing field in the same #metadataBlock. "
metadatablock_id Specify the name of the #metadataBlock that contains this field. "- Must reference an existing #metadataBlock.
- As a best practice, the value should reference the #metadataBlock in the current definition (it is technically possible to reference another existing metadata block.)"
termURI "Specify a global URI identifying this term in an external community vocabulary.

This value overrides the default (created by appending the property name to the blockURI defined for the #metadataBlock)" "For example, the existing citation #metadataBlock defines the property named 'title' as http://purl.org/dc/terms/title - i.e. indicating that it can be interpreted as the Dublin Core term 'title'"
18 changes: 18 additions & 0 deletions doc/sphinx-guides/source/_static/admin/displayFormatVariables.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
Variable Description
(blank) "The displayFormat is left blank for primitive fields (e.g. subtitle) and fields that do not take values (e.g. author), since displayFormats do not work for these fields."
#VALUE The value of the field (instance level).
#NAME The name of the field (class level).
#EMAIL For displaying emails.
"<a href=""#VALUE"" >#VALUE</a>" For displaying the value as a link (if the value entered is a link).
"<a href=""/#VALUE"" >#VALUE</a>" "For displaying the value as a link, with the value included in the URL (e.g. if URL is \http://emsearch.rutgers.edu/atlas/#VALUE_summary.html, and the value entered is 1001, the field is displayed as `1001 <http://emsearch.rutgers.edu/atlas/1001_summary.html>`__ (hyperlinked to http://emsearch.rutgers.edu/atlas/1001_summary.html))."
"<img src=""#VALUE"" alt=""#NAME"" class=""metadata-logo""/><br/>" For displaying the image of an entered image URL (used to display images in the producer and distributor logos metadata fields).
"#VALUE:

#VALUE:

(#VALUE)" "Appends and/or prepends characters to the value of the field. e.g. if the displayFormat for the distributorAffiliation is (#VALUE) (wrapped with parens) and the value entered is University of North Carolina, the field is displayed in the UI as (University of North Carolina)."
";

:

," "Displays the character (e.g. semicolon, comma) between the values of fields within compound fields. For example, if the displayFormat for the compound field ""series"" is a colon, and if the value entered for seriesName is IMPs and for seriesInformation is A collection of NMR data, the compound field is displayed in the UI as IMPs: A collection of NMR data."
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
Fieldtype Definition
none "Used for compound fields, in which case the parent field would have no value and display no data entry control."
date "A date, expressed in one of three resolutions of the form YYYY-MM-DD, YYYY-MM, or YYYY."
email A valid email address. Not indexed for privacy reasons.
text Any text other than newlines may be entered into this field.
textbox "Any text may be entered. For input, the Dataverse Software presents a multi-line area that accepts newlines. While any HTML is permitted, only a subset of HTML tags will be rendered in the UI. See the :ref:`supported-html-fields` section of the Dataset + File Management page in the User Guide."
url "If not empty, field must contain a valid URL."
int An integer value destined for a numeric field.
float A floating point number destined for a numeric field.
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
Property Purpose Allowed values and restrictions
name A user-definable string used to identify a #metadataBlock "- No spaces or punctuation, except underscore
- By convention, should start with a letter, and use lower camel case [3]_
- Must not collide with a field of the same name in the same or any other #datasetField definition, including metadata blocks defined elsewhere [4]_"
dataverseAlias "If specified, this metadata block will be available only to the Dataverse collection designated here by its alias and to children of that Dataverse collection." "Free text. For an example, see ``/scripts/api/data/metadatablocks/custom_hbgdki.tsv``."
displayName Acts as a brief label for display related to this #metadataBlock. "Should be relatively brief. The limit is 256 characters, but very long names might cause display problems."
blockURI Associates the properties in a block with an external URI. Properties will be assigned the global identifier blockURI<name> in the OAI_ORE metadata and archival Bags The citation #metadataBlock has the blockURI https://dataverse.org/schema/citation/ which assigns a default global URI to terms such as https://dataverse.org/schema/citation/subtitle
6 changes: 6 additions & 0 deletions doc/sphinx-guides/source/_static/api/add-storage-site.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"hostname": "dataverse.librascholar.edu",
"name": "LibraScholar, USA",
"primaryStorage": true,
"transferProtocols": "rsync,posix,globus"
}