build: add Travis job to check for dead URL links #27267
richardlau wants to merge 4 commits into nodejs:master
.travis.yml
Maybe just scanning all.html is a good idea? Or does that just not make a difference?
Can we make it only run when the docs are touched? (Like a make target that depends on docs?)
Can we use the same mechanism that adds the doc label to PRs?
Added a change that greps the PR's patch diff for *.md files in doc/api and only runs the scan if grep returns something.
For local files, linkinator works at the directory level, so scanning, say, just all.html would involve moving files into a temporary directory. If someone has time, maybe they can run linkinator on just all.html and see how many links it scans.
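The gating described above (grep the patch for doc/api Markdown files and only scan when something matches) can be sketched roughly as follows. This is a minimal sketch, not the PR's actual script: the function name `changed_api_docs` and the sample patch text are hypothetical, and the pattern is a tightened variant of the one in the PR that stops at whitespace so a match cannot span two paths on one line.

```shell
#!/bin/sh
# Hypothetical helper: print each doc/api Markdown path mentioned in a patch
# read from stdin, one per line, de-duplicated.
changed_api_docs() {
  # [^[:space:]]* keeps a match from running across several paths on one line.
  grep -o 'doc/api/[^[:space:]]*\.md' | sort -u
}

# Sample patch header lines (made up for illustration).
SAMPLE_PATCH='diff --git a/doc/api/fs.md b/doc/api/fs.md
--- a/doc/api/fs.md
+++ b/doc/api/fs.md
diff --git a/lib/fs.js b/lib/fs.js'

DOC_FILES=$(printf '%s\n' "$SAMPLE_PATCH" | changed_api_docs)

# Only run the (expensive) link scan when the patch touches the API docs.
if [ -n "$DOC_FILES" ]; then
  echo "docs changed: $DOC_FILES"
else
  echo "no doc changes, skipping link scan"
fi
```

In the Travis job the sample text would presumably be replaced by the downloaded `.patch` file piped into the function.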
Looks like some are false positives: https://travis-ci.com/nodejs/node/jobs/193473667#L688.
Like I said in the OP:
e.g. https://travis-ci.com/nodejs/node/jobs/193473667#L688 [404] https://chromedevtools.github.io/devtools-protocol/v8/Debugger
-bash-4.2$ curl --head https://chromedevtools.github.io/devtools-protocol/v8/Debugger
HTTP/1.1 404 Not Found
Server: GitHub.com
Content-Type: text/html; charset=utf-8
ETag: "5cb52d6a-1f0"
Access-Control-Allow-Origin: *
X-GitHub-Request-Id: 2612:7F6A:221DC7:2D2A1F:5CB6B0D2
Content-Length: 496
Accept-Ranges: bytes
Date: Wed, 17 Apr 2019 04:51:31 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-mdw17362-MDW
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1555476691.380921,VS0,VE29
Vary: Accept-Encoding
X-Fastly-Request-ID: 8ced15e319228e1c9c4666e404a8a66c20f7d9d8
-bash-4.2$
-bash-4.2$ curl https://chromedevtools.github.io/devtools-protocol/v8/Debugger
<!DOCTYPE html><meta charset="utf-8"><title>Chrome DevTools Protocol Viewer (redirecting)</title><script>// Copyright (c) 2016 Rafael Pedicini, licensed under the MIT License
var segmentCount=1,l=window.location;l.replace(l.protocol+'//'+l.hostname+(l.port?':'+l.port:'')+l.pathname.split('/').slice(0,1+segmentCount).join('/')+'/?p=/'+l.pathname.slice(1).split('/').slice(segmentCount).join('/').replace(/&/g,'~and~')+(l.search?'&q='+l.search.slice(1).replace(/&/g,'~and~'):'')+l.hash);</script>
-bash-4.2$
It seems very weird for a 404 Not Found response page to be redirecting; there are specific HTTP response codes for redirection.
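The session above shows why a checker that works at the HTTP level reports this link as broken: the status line says 404, and the redirect only happens in client-side script. A minimal sketch of that distinction, assuming response headers arrive on stdin; the helper names `status_code` and `classify_status` are hypothetical and are not part of linkinator.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# numeric status code from the first line (e.g. "HTTP/1.1 404 Not Found").
status_code() {
  head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Hypothetical helper: classify a status code the way a simple link checker
# might. Only 3xx counts as a redirect; a 404 is broken regardless of what
# the response body does in a browser.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    3??) echo redirect ;;
    *)   echo broken ;;
  esac
}

# First few headers copied from the curl --head output quoted above.
HEADERS='HTTP/1.1 404 Not Found
Server: GitHub.com
Content-Type: text/html; charset=utf-8'

CODE=$(printf '%s\n' "$HEADERS" | status_code)
echo "$CODE $(classify_status "$CODE")"
```

Against a live URL, `curl -s -o /dev/null -w '%{http_code}' URL` prints the status code directly, which avoids parsing the header text.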
3022b98 to 2c15115
  - npm install -g linkinator
script:
  - if [ "${TRAVIS_PULL_REQUEST}" != "false" ]; then
      DOC_FILES=`curl -sL https://github.com/nodejs/node/pull/${TRAVIS_PULL_REQUEST}.patch | grep -o '\bdoc/api/.*\.md\b'`;
- DOC_FILES=`curl -sL https://github.com/nodejs/node/pull/${TRAVIS_PULL_REQUEST}.patch | grep -o '\bdoc/api/.*\.md\b'`;
+ DOC_FILES=`git diff --name-only HEAD...$TRAVIS_BRANCH | grep -o '\bdoc/api/.*\.md\b'`;
Refs: https://stackoverflow.com/questions/41145041/list-files-modified-in-a-pull-request-within-travis
I think we had trouble with commit message linting before with $TRAVIS_BRANCH, which is why we ended up downloading the patch files, but I forget the details.
  - make doc-only
  - npm install -g linkinator
script:
  - if [ "${TRAVIS_PULL_REQUEST}" != "false" ]; then
Nice idea, but why not run it in our CI? (I'm thinking it might be a good idea anyway to move out the
Yeah, it probably makes more sense to run it on our CI. It could be run on the nightlies rather than on PRs; that way we won't be hammering remote web servers.
Refs: #27168 (comment)
Seems easy enough to put together.
This is a proof of concept of running https://github.com/JustinBeckwith/linkinator
on our built HTML docs to find broken URL links. I don't think we want to
run this for every PR: when I run it locally it checks 900+ links, which could
be seen as a denial-of-service attack if we run it too often.
It also flags some URLs as broken that I can navigate to in a web browser
(although there does appear to be some redirecting going on). If anyone wants to
look into that or adopt this PR, great, as I probably won't be spending much
time on it.
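To get a rough sense of how many remote requests one scan implies, the distinct external link targets in a built HTML file can be counted before running any checker. A minimal sketch, assuming the HTML arrives on stdin and links use double-quoted `href` attributes; the helper name `external_links` and the sample markup are hypothetical, and this is not how linkinator itself discovers links.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct http(s) link targets found in HTML
# read from stdin, one per line. Relative links such as "fs.html" are ignored.
external_links() {
  grep -o 'href="https\{0,1\}://[^"]*"' | sed 's/^href="//; s/"$//' | sort -u
}

# Sample markup (made up for illustration): one duplicate, one local link.
SAMPLE_HTML='<a href="https://example.com/a">a</a> <a href="https://example.com/a">again</a>
<a href="http://example.org/">b</a> <a href="fs.html">local</a>'

# Print the number of distinct external URLs a scan would have to fetch.
printf '%s\n' "$SAMPLE_HTML" | external_links | wc -l
```

Piping a real built file into `external_links` would presumably give a number comparable to the link count mentioned above.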
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes