issue 9277 basics#9458

Closed
kuhlaid wants to merge 2 commits into IQSS:develop from
kuhlaid:kuhlaid-sphinx-9277-2023-01-13


@kuhlaid
Contributor

@kuhlaid kuhlaid commented Mar 20, 2023

What this PR does / why we need it:
This PR is an update to the documentation for a basic fix to #9277 to get the PDF to build. The HowToSphinxDockerBuild.md has instructions for the build using Docker.

Closes #9277

Special notes for your reviewer:

The following updates were performed due to the files throwing errors on the PDF builds.

\doc\sphinx-guides\source\admin\metadatacustomization.rst

  • replaced the table with a CSV table, and removed non-ASCII characters and links to download files (the files are not stored with the documentation downloads, and it does not make sense to offer them for download to readers who are already working with the code)
  • added metadataBlockProperties.tsv, datasetFieldProperties.tsv, controlledVocabularyProperties.tsv, fieldTypeDefinitions.tsv, displayFormatVariables.tsv

\doc\sphinx-guides\source\api\native-api.rst

  • removed non-ASCII characters and links to download code files
  • added \docs\source\_static\api\dataset-package-files.json

\doc\sphinx-guides\source\developers\big-data-support.rst

  • explicitly stated commands and removed code file download
  • added /docs/source/_static/api/add-storage-site.json

\doc\sphinx-guides\source\developers\dev-environment.rst
\doc\sphinx-guides\source\developers\make-data-count.rst

  • removed code file download

\doc\sphinx-guides\source\developers\testing.rst
\doc\sphinx-guides\source\developers\troubleshooting.rst

  • removed non-ASCII characters, and links to download files

\doc\sphinx-guides\source\installation\advanced.rst

  • removed links to download files

\doc\sphinx-guides\source\installation\config.rst

  • removed non-ASCII characters and links to download files, and explicitly stated commands

\doc\sphinx-guides\source\installation\installation-main.rst
\doc\sphinx-guides\source\installation\prerequisites.rst

  • removed non-ASCII characters and links to download files

\doc\sphinx-guides\source\admin\timers.rst

  • removed nested lists

\doc\sphinx-guides\source\developers\workflows.rst

  • removed non-ASCII characters

\doc\sphinx-guides\source\container\base-image.rst

  • tried :widths: auto for the csv-table, but the description column would not wrap

w. Patrick Gale added 2 commits March 20, 2023 17:41
@pdurbin
Member

pdurbin commented Mar 21, 2023

Wow! 577 pages of Documentation! I'm attaching the PDF I just built: Dataverse.pdf

@kuhlaid this is great. Thanks!

Quick question, do we really have to get rid of the :download: links? I'm rather fond of them.

@kuhlaid
Contributor Author

kuhlaid commented Mar 21, 2023

@pdurbin, with regard to the :download: links, I don't believe it is worth the trouble to copy a version of the code files into the sphinx-guides\source directory. Download references such as "entire IPv4 and IPv6 range that you can :download:`download <../_static/admin/ipGroupAll.json>`" seem to be fine and do not cause LaTeX to barf, because the files exist within the sphinx-guides\source directory (the PDF build simply ignores them). Trying to reference files outside of the Sphinx documentation directory, however, throws errors on the PDF build.

A problem with download links is that they just look bad in the PDF because they serve no function (they do not allow downloading a file). If you look at the "entire IPv4 and IPv6 range that you can" section of the PDF, it shows the download text with no functionality. It is poor form because the reader simply sees the word "download" where the path /_static/util/clear_timer.sh would be more useful, whether you can download the file or not. Again, if users of the guide are already working with the code base, they have the files and just need a reference or path name to where a file can be found, not a download link. This makes the text look cleaner across the HTML docs, the PDF, and the ePub.

I am unable to find any documentation that says the LaTeX builder supports the :download: role, but I have seen references stating that downloads are not fully supported by all Sphinx builders, and this is one example. Just something to keep in mind.
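To make the distinction above concrete, here is a minimal Python sketch that lists :download: targets resolving outside the Sphinx source directory. It is not part of this PR; the function names are hypothetical, and it assumes the role is written in the usual :download:`label <target>` or :download:`target` form.

```python
import os
import re

# Matches :download:`label <target>` and :download:`target`.
DOWNLOAD_ROLE = re.compile(r":download:`(?:[^`<]*<([^`>]+)>|([^`<]+))`")


def download_targets(rst_text):
    """Return the target path of every :download: role in the text."""
    return [(m.group(1) or m.group(2)).strip()
            for m in DOWNLOAD_ROLE.finditer(rst_text)]


def is_outside_source(rst_file, target, source_dir):
    """True if the target resolves to a location outside source_dir."""
    source_dir = os.path.abspath(source_dir)
    if target.startswith("/"):
        # Sphinx treats a leading slash as relative to the top source directory.
        resolved = os.path.join(source_dir, target.lstrip("/"))
    else:
        resolved = os.path.join(os.path.dirname(os.path.abspath(rst_file)), target)
    resolved = os.path.normpath(resolved)
    return os.path.commonpath([source_dir, resolved]) != source_dir


def find_outside_downloads(source_dir):
    """Walk source_dir and collect (rst_file, target) pairs that point outside it."""
    hits = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if not name.endswith(".rst"):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8") as handle:
                for target in download_targets(handle.read()):
                    if is_outside_source(path, target, source_dir):
                        hits.append((path, target))
    return hits
```

Calling find_outside_downloads("doc/sphinx-guides/source") from the repository root would list the candidates to review. Whether each one actually breaks a given PDF build still has to be confirmed by running the build.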

@kuhlaid
Contributor Author

kuhlaid commented Mar 21, 2023

I should also note that the https://github.com/kuhlaid/dataverse/tree/kuhlaid-sphinx-9277-2023-01-13 branch does not address the code needed to insert the PDF into the HTML docs (which are mainly some extra static files). I am not crazy about this branch because it does not provide a clean documentation build. While the earlier branch I sent has many more file updates to make, it is a more complete solution and will give you cleaner documentation as a starting point (such as doing away with those annoying console errors). Another thing to think about. :)

@pdurbin
Member

pdurbin commented Mar 22, 2023

Locally, I reverted one of the :download: links like this:

-Alternatively you can use the README_python.txt located at /scripts/installer/README_python.txt from this guide.
+Alternatively you can download :download:`README_python.txt <../../../../scripts/installer/README_python.txt>` from this guide.

Then I gave the build commands the root of the repo:

export ROOT_DATAVERSE_SG=/Users/pdurbin/github/iqss/dataverse
docker run --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_PDF bash -c "cd doc/sphinx-guides && make latexpdf"

This allowed the PDF to be built. Here's how that README_python.txt text looks:

[Screenshot, 2023-03-22 6:58:18 AM: the README_python.txt text as rendered in the PDF]

README_python.txt is not clickable so I agree that it's a bit of an odd, suboptimal experience. However, my thought has been to handle this in stages.

  • Get PDF building working again and put enough CI in place (GitHub Actions and/or Jenkins) to alert us if a PR breaks the PDF building.
  • Take a look at the PDF and decide if it's good enough to go ahead and link it from the main guides, possibly with a warning that the HTML version should be considered canonical. We could even explicitly call out issues like this :download: one.
  • Work with @donsizemore who operates the Jenkins server that builds the guides (not in Docker) to see about adding HTML building.
  • Over time, investigate how to improve the PDF version and try to fix things here and there, timeboxing each effort.

When I put a size estimate of 10 (about a day's effort) on the original PR (#9306) I was thinking about the first three items above. That is, I expect the PDF to be imperfect and to have a note that the HTML version is still canonical. A good "definition of done" for me is having the PDF available, even if it has some issues.

Does this make sense? I'm happy to talk this out over chat or zoom. Thanks! ❤️

@pdurbin
Member

pdurbin commented Mar 23, 2023

On my machine, the absolute minimum I need to get the PDF to build (using the Dockerfile provided by @kuhlaid, thank you!! ❤️ ) is to remove the deep nesting in admin/timers.rst (added in PR #7152) like this: ccdc37b

Thank you for identifying that the nesting is only in this file, @kuhlaid! I find the output confusing when it fails.

I definitely want a check, probably a GitHub Action, to make sure this deep nesting doesn't happen again (as it has again and again over the years). I'll probably seek advice from @poikilotherm or @donsizemore or @GPortas who know a lot more about GitHub Actions than I do.
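As a starting point for such a check, here is a rough Python sketch that estimates list nesting depth in .rst files from indentation alone. It is a heuristic, not a reStructuredText parser; the function names and the default limit of 4 are assumptions for illustration and would need tuning against what actually makes the LaTeX build fail.

```python
import os
import re

# A bullet ("-", "*", "+") or enumerator ("1.", "#.") followed by text.
LIST_ITEM = re.compile(r"^( *)(?:[-*+]|\d+\.|#\.)\s+\S")


def max_list_depth(rst_text):
    """Return the deepest list nesting level found, judged by indentation."""
    stack = []    # indentation widths of the list levels currently open
    deepest = 0
    for line in rst_text.splitlines():
        if not line.strip():
            continue  # blank lines do not close a list
        indent = len(line) - len(line.lstrip(" "))
        if LIST_ITEM.match(line):
            # Close levels that are indented further than this item.
            while stack and stack[-1] > indent:
                stack.pop()
            # Open a new level if this item is indented further than the last one.
            if not stack or stack[-1] < indent:
                stack.append(indent)
            deepest = max(deepest, len(stack))
        else:
            # Ordinary text at or left of a bullet's indent ends that list.
            while stack and stack[-1] >= indent:
                stack.pop()
    return deepest


def files_exceeding(source_dir, limit=4):
    """Return (path, depth) for each .rst file nested deeper than limit.

    The default limit is an assumption, not a value taken from this PR.
    """
    offenders = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.endswith(".rst"):
                path = os.path.join(root, name)
                with open(path, encoding="utf-8") as handle:
                    depth = max_list_depth(handle.read())
                if depth > limit:
                    offenders.append((path, depth))
    return offenders
```

A CI job could fail when files_exceeding("doc/sphinx-guides/source") returns anything. Running the real PDF build in CI remains the more reliable check, since this sketch only approximates list structure.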

For completeness, in addition to the change to admin/timers.rst above, I created the following files (copied directly from this PR, thanks again @kuhlaid ):

$ cat SphinxDocBuildPDF/Dockerfile 
# get latest sphinx image with latext pdf
FROM sphinxdoc/sphinx-latexpdf:latest

RUN export DEBIAN_FRONTEND=noninteractive \
  && apt-get update && apt-get install --yes --no-install-recommends wget rsync git && \
    apt-get autoremove -y && \
    pip3 install --upgrade pip setuptools && \
    rm -r /root/.cache

WORKDIR /docs
ADD requirements.txt /docs
RUN pip3 install -r requirements.txt
$ cat SphinxDocBuildPDF/requirements.txt 
sphinx_bootstrap_theme
sphinx-icon

Then I ran these commands:

export ROOT_DATAVERSE_SG=/Users/pdurbin/github/iqss/dataverse
export DKR_DV_PDF="sddi_pdf"
cd $ROOT_DATAVERSE_SG/SphinxDocBuildPDF
docker build -t $DKR_DV_PDF .
cd $ROOT_DATAVERSE_SG
docker run --rm -v $ROOT_DATAVERSE_SG:/docs $DKR_DV_PDF bash -c "cd doc/sphinx-guides && make latexpdf"

I guess I'll go ahead and upload the PDF that came out. Why not. 587 pages!! Dataverse.pdf

@pdurbin
Member

pdurbin commented Mar 24, 2023

I'm playing around with this...

docker run -it --rm -v $(pwd):/docs sphinxdoc/sphinx-latexpdf:latest bash -c "cd doc/sphinx-guides && pip3 install -r requirements.txt && make latexpdf"

... but I'm getting a strange error:

sphinx-build -b latex -d build/doctrees  -W source build/latex
Running Sphinx v5.3.0
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [latex]: all documents
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
processing Dataverse.tex... index user/index user/account user/find-use-data user/dataverse-management user/dataset-management user/tabulardataingest/index user/tabulardataingest/supportedformats user/tabulardataingest/ingestprocess user/tabulardataingest/spss user/tabulardataingest/stata user/tabulardataingest/rdata user/tabulardataingest/excel user/tabulardataingest/csv-tsv user/appendix admin/index admin/dashboard admin/external-tools admin/harvestclients admin/harvestserver admin/metadatacustomization admin/metadataexport admin/timers admin/make-data-count admin/integrations admin/user-administration admin/dataverses-datasets admin/solr-search-index admin/ip-groups admin/mail-groups admin/monitoring admin/reporting-tools-and-queries admin/maintenance admin/backups admin/troubleshooting api/index api/intro api/getting-started api/auth api/search api/dataaccess api/native-api api/metrics api/sword api/client-libraries api/external-tools api/curation-labels api/linkeddatanotification api/apps api/faq installation/index installation/intro installation/prep installation/prerequisites installation/installation-main installation/config installation/upgrading installation/shibboleth installation/oauth2 installation/oidc installation/external-tools installation/advanced developers/index developers/intro developers/dev-environment developers/windows developers/tips developers/troubleshooting developers/version-control developers/sql-upgrade-scripts developers/testing developers/documentation developers/security developers/dependencies developers/debugging developers/coding-style developers/configuration developers/deployment developers/containers developers/making-releases developers/tools developers/unf/index developers/unf/unf-v3 developers/unf/unf-v5 developers/unf/unf-v6 developers/make-data-count developers/remote-users developers/geospatial developers/selinux developers/big-data-support developers/aux-file-support developers/s3-direct-upload-api 
developers/dataset-semantic-metadata-api developers/dataset-migration-api developers/workflows developers/fontcustom container/index container/dev-usage container/base-image container/app-image style/index style/foundations style/patterns style/text 
resolving references...
done
writing... done
copying images... [100%] developers/unf/img/unf-diagram.png                               
copying TeX support files... copying TeX support files...
done
build succeeded.

The LaTeX files are in build/latex.
Run 'make' in that directory to run these through (pdf)latex
(use `make latexpdf' here to do that automatically).
Running LaTeX files through pdflatex...
make -C build/latex all-pdf
make[1]: Entering directory '/docs/doc/sphinx-guides/build/latex'
latexmk -pdf -dvi- -ps-  'Dataverse.tex'
Rc files read:
  /etc/LatexMk
  ./latexmkrc
Latexmk: This is Latexmk, John Collins, 29 September 2020, version: 4.70b.
Latexmk: applying rule 'pdflatex'...
Rule 'pdflatex': File changes, etc:
   Changed files, or newly in use since previous run(s):
      'Dataverse.tex'
------------
Run number 1 of rule 'pdflatex'
------------
------------
Running 'pdflatex   -recorder  "Dataverse.tex"'
------------
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020/Debian) (preloaded format=pdflatex)
 restricted \write18 enabled.
entering extended mode
(./Dataverse.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-01-09> xparse <2020-03-03> (./sphinxmanual.cls
Document Class: sphinxmanual 2019/12/01 v2.3.0 Document class (Sphinx manual)
(/usr/share/texlive/texmf-dist/tex/latex/base/report.cls
Document Class: report 2020/04/10 v1.4m Standard LaTeX document class
(/usr/share/texlive/texmf-dist/tex/latex/base/size10.clo)))
[snip]
(/usr/share/texlive/texmf-dist/tex/latex/oberdiek/hypcap.sty)
(./sphinxmessages.sty)
Writing index file Dataverse.idx
(/usr/share/texmf/tex/latex/tex-gyre/t1qtm.fd)
(/usr/share/texlive/texmf-dist/tex/latex/l3backend/l3backend-pdftex.def)
(./Dataverse.aux

! Package babel Error: You haven't defined the language * yet.
(babel)                Perhaps you misspelled it or your installation
(babel)                is not complete.

See the babel package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.19 \selectlanguage *
                      {english}
? 

Here's the complete output: make-pdf-fail.txt

@pdurbin
Member

pdurbin commented Mar 24, 2023

Duh. A make clean helped. The next error:

LaTeX Warning: Hyper reference `developers/tips:avoid-efficiency-issues-with-re
nder-logic-expressions' on page 482 undefined on input line 35523.


! Package inputenc Error: Unicode character ✅ (U+2705)
(inputenc)                not set up for use with LaTeX.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.35541 ✅
            Database Connection
? 

@kuhlaid
Contributor Author

kuhlaid commented Mar 24, 2023

Yeah, I had to remove those non-ASCII characters. The PDF build is a pain to work with unless issues beyond the nesting are also taken care of, because of the laundry list of errors that need to be addressed.
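For anyone hunting down the remaining characters, a small Python sketch like the following can list every non-ASCII character in a piece of text with its position and Unicode name. The function name is hypothetical. Note that not every non-ASCII character necessarily breaks pdflatex, so the output is a list of candidates rather than a list of confirmed errors.

```python
import unicodedata


def non_ascii_chars(text):
    """Yield (line_number, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


def report(path):
    """Print one line per non-ASCII character found in the file at path."""
    with open(path, encoding="utf-8") as handle:
        for line_no, col, ch, name in non_ascii_chars(handle.read()):
            print(f"{path}:{line_no}:{col}: U+{ord(ch):04X} {name}")
```

Running report() over each .rst file under doc/sphinx-guides/source would point at characters such as the U+2705 check mark from the error above.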

@pdurbin
Member

pdurbin commented Mar 24, 2023

@kuhlaid hi! I'm closing this pull request in favor of this one:

I realize my fix is much less ambitious than yours but in practice we've found that smaller chunks of work move more easily through the system. I hope you continue to engage and help us improve the PDF!

As I mentioned in that PR, we are now building it regularly (every time a PR is merged) at http://preview.guides.gdcc.io/_/downloads/en/develop/pdf/

I'm going to repeat this comment on your other open related PRs. Sorry! 😅

@pdurbin pdurbin closed this Mar 24, 2023

Development

Successfully merging this pull request may close these issues.

Correcting the Sphinx documentation PDF build

2 participants