ci: add doc-validation hooks (markdownlint, typos, placeholder linter, lychee) #18
Merged
Conversation
…, lychee)

Adds a minimal doc-validation layer to pre-commit + a lychee link-check workflow. Catches the bug classes review currently has to find by eye:

- markdownlint-cli2 with a tight config (MD051 broken anchors, MD053 dangling link refs); style rules off so the diff stays small
- typos with project-term allowlist in .typos.toml (CNA, Vulnogram, ponymail, mis-, Nd, pre-empted)
- tools/dev/check-placeholders.sh refuses hardcoded apache/airflow / Apache Airflow inside .claude/skills/ and tools/*.md (PR #1 already had to scrub these once)
- lychee runs in a separate workflow on PR + daily cron; informational only today (continue-on-error: true) because the existing tree has 24 pre-existing broken refs to files that have not landed yet (config/, projects/airflow/, the issue-template YAML); flips to a hard gate once the baseline reaches zero

Wiring this up surfaced five real broken anchors I fixed along the way:

- AGENTS.md missing "the project's" prefix in #point-reporters-to-the-security-model-dont-re-explain-it
- tools/ponymail/operations.md anchors #get-email and #get-thread pointed at headings that are actually "Get an email" / "Get a thread"
- projects/_template/scope-labels.md and tools/github/issue-template.md carried headings with a literal "→" that GitHub URL-encodes into unresolvable slugs; renamed to "to" and re-ran doctoc

Signed-off-by: André Ahlert <andre@aex.partners>
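
The broken-anchor class described above (a `#fragment` link that matches no heading) can be illustrated with a small Python sketch. This is not how markdownlint's MD051 is implemented; `slugify` is a simplified, ASCII-oriented approximation of GitHub's heading slugs, and the function names and sample document are hypothetical.

```python
import re

# ATX headings only ("## Title"); headings inside code fences are not excluded here.
HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.*?)[ \t]*$", re.MULTILINE)
# Same-file links of the form [text](#fragment).
ANCHOR_LINK_RE = re.compile(r"\]\(#([^)\s]+)\)")


def slugify(heading: str) -> str:
    """Approximate a GitHub heading slug (assumption: simplified rules)."""
    text = heading.strip().lower()
    # Drop everything except word characters, spaces, and hyphens.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


def broken_anchors(markdown: str) -> list[str]:
    """Return #fragment targets that match no heading slug in the same text."""
    slugs = {slugify(h) for h in HEADING_RE.findall(markdown)}
    return [a for a in ANCHOR_LINK_RE.findall(markdown) if a not in slugs]


# Hypothetical sample modelled on the ponymail case from this commit.
doc = """\
## Get an email

## Get a thread

See [email](#get-email) and [thread](#get-a-thread).
"""
print(broken_anchors(doc))
```

In the sample, `#get-email` is reported because the heading "Get an email" slugs to `get-an-email`, which mirrors the `tools/ponymail/operations.md` fix listed above.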
d8540ff to
53fa19e
Compare
…on SHA

PR apache#18's first CI run failed three checks; this fixup commit addresses all three:

- prek / markdownlint: markdownlint-cli2 v0.22.1 requires Node ≥ 20 (its string-width dep uses the regex `/v` flag), but `default_language_version.node` was pinned to 18.6.0. Bumped to 22.11.0 (current active LTS).
- asf-allowlist-check: `lycheeverse/lychee-action@82202e5e…` (v2.6.1) is not on the ASF infrastructure-actions allowlist. Re-pinned to the allowlisted v2.8.0 SHA `8646ba30535128ac92d33dfc9133794bfdd9b411`. Comment in the workflow now explains the allowlist requirement so future bumps go through the same check.
- zizmor (ref-version-mismatch): the v2.6.1-comment + actual-SHA combination from the original pin was flagged because the SHA pointed to v2.4.1, not v2.6.1. The new v2.8.0 SHA correctly maps to its tag, so the warning disappears with the same change.

prek + zizmor verified clean locally.

Generated-by: Claude Code (Opus 4.7)
potiuk
approved these changes
May 1, 2026
Member
Fixed all failures. Merging. Yep. Improvements welcome @andreahlert :)
4 tasks
potiuk
added a commit
that referenced
this pull request
May 4, 2026
…sedes #30) (#44)

* docs: enable markdownlint MD040 and tag all fenced code blocks

Follow-up to #18. Flips MD040 from `false` to `true` and tags the 64 previously untagged fences across the tree.

Most fences ended up `text` (MCP call sketches, URL examples, dir trees, plain output, commit trailers). 3 got `html` for HTML-comment idempotency markers and the <details> envelope, 3 `markdown` for the AI-disclosure block and rollup body samples, 1 `yaml` for the subagent return block in sync-security-issue.

One nested-fence case in allocate-cve/SKILL.md needed the outer fence promoted from 3 to 4 backticks so the inner 3-backtick block renders as an actual nested code block instead of breaking the outer one.

Signed-off-by: André Ahlert <andre@aex.partners>

* docs: extend MD040 tagging to skills added since #30

#30 was opened against an earlier tree state; the pr-management skill family (lifted in #33, renamed to type-what-action in #35) added 9 new skill supporting files with 20 untagged fences that fail markdownlint MD040 once it's enabled.

This commit applies the same tagging convention #30 established for the security family + tools to the new pr-management files:

    pr-management-triage/fetch-and-batch.md          1 fence
    pr-management-triage/comment-templates.md        2 fences → markdown
    pr-management-triage/interaction-loop.md         4 fences → text (UI mockups)
    pr-management-triage/workflow-approval.md        2 fences → text (UI mockups)
    pr-management-stats/fetch.md                     3 fences → text (search queries)
    pr-management-stats/render.md                    3 fences → text (output samples)
    pr-management-stats/classify.md                  3 fences → text (pseudocode)
    pr-management-code-review/review-flow.md         1 fence → text (CLI mockup)
    pr-management-code-review/prerequisites.md       1 fence → text (HTTP error)

Most fences ended up `text` (the same catch-all #30's commit message used for "MCP call sketches, URL examples, dir trees, plain output"). Two `markdown` fences in `pr-management-triage/comment-templates.md` because the content is a markdown link / list item example that GitHub should render as markdown.

prek run --all-files clean. MD040 reports zero violations across the tree.

Generated-by: Claude Code (Claude Opus 4.7)

---------

Signed-off-by: André Ahlert <andre@aex.partners>
Co-authored-by: André Ahlert <andre@aex.partners>
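
To make the MD040 rule and the nested-fence case above concrete, here is a rough Python sketch of an untagged-fence finder. It is not markdownlint's implementation; the function name and sample are hypothetical, and it only follows the basic CommonMark rule that a closing fence must use the same character, be at least as long as the opener, and carry no info string.

```python
import re

# A fence line: up to 3 spaces of indent, then 3+ backticks or tildes, then an info string.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def untagged_fences(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    hits = []
    open_fence = None  # fence string of the block we are currently inside, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        m = FENCE_RE.match(line)
        if not m:
            continue
        fence, info = m.group(1), m.group(2).strip()
        if open_fence is None:
            open_fence = fence
            if not info:
                hits.append(lineno)
        elif fence[0] == open_fence[0] and len(fence) >= len(open_fence) and not info:
            open_fence = None  # this line closes the current block
        # Otherwise the line is a shorter or tagged fence inside the block: literal content.
    return hits


# Build the sample without writing literal fence lines in this example.
TICKS3, TICKS4 = "`" * 3, "`" * 4
sample = "\n".join([
    TICKS4 + "markdown",  # outer fence, 4 backticks, tagged
    TICKS3,               # inner fence: literal content of the outer block
    "inner block",
    TICKS3,
    TICKS4,               # closes the outer fence
    "",
    TICKS3,               # untagged opener: the kind of fence MD040 reports
    "plain output",
    TICKS3,
])
print(untagged_fences(sample))
```

If the outer fence in the sample used only 3 backticks, the first inner fence line would close it, which is the breakage the 3-to-4 backtick promotion avoids.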
potiuk
pushed a commit
that referenced
this pull request
Jul 3, 2026
* feat(validator): add branch-name confidentiality check (#18, SOFT advisory)

Adds check #17 to skill-and-tool-validator: scans git checkout -b and git switch -c examples inside fenced code blocks (across skills/ and docs/) and flags any concrete branch name that contains an embargo-breaking term: CVE IDs (CVE-YYYY-NNNNN), security, vulnerability/vuln, or advisory. Pre-disclosure public branch names must not reveal embargo context; neutral descriptive slugs are the safe alternative.

Lines explicitly marked as bad examples (**bad**, bad:) are exempt, and placeholder branch names (<fix-slug>, $VAR) are silently skipped. The check is SOFT-advisory only (never blocks the run).

14 unit tests cover CVE IDs, security framing, vuln/advisory terms, placeholder exemptions, neutral names, and bad-example exemptions. The full codebase currently produces zero new violations.

Generated-by: Claude (Opus 4.7)

* fix for tool directories

* change regular expression
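
The rules in this commit message can be sketched roughly as follows. This is an illustration only, not the validator's actual code: the regexes, function names, and sample lines (including the made-up CVE ID) are assumptions, and the input is assumed to be lines already extracted from fenced code blocks.

```python
import re

# Commands that create a branch; the captured group is the branch name.
BRANCH_CMD_RE = re.compile(r"git\s+(?:checkout\s+-b|switch\s+-c)\s+(\S+)")
# Terms the commit message lists as embargo-breaking (assumed patterns).
SENSITIVE_RE = re.compile(
    r"CVE-\d{4}-\d{4,}|security|vulnerability|vuln|advisory", re.IGNORECASE
)
# Lines explicitly marked as bad examples are exempt.
BAD_EXAMPLE_RE = re.compile(r"\*\*bad\*\*|\bbad:", re.IGNORECASE)


def is_placeholder(name: str) -> bool:
    """Treat <angle-bracket> names and $VARIABLES as placeholders."""
    return name.startswith("$") or (name.startswith("<") and name.endswith(">"))


def flag_branch_names(lines: list[str]) -> list[str]:
    """Return concrete branch names that contain a sensitive term."""
    flagged = []
    for line in lines:
        if BAD_EXAMPLE_RE.search(line):
            continue
        for name in BRANCH_CMD_RE.findall(line):
            if not is_placeholder(name) and SENSITIVE_RE.search(name):
                flagged.append(name)
    return flagged


# Hypothetical lines, as if taken from fenced code blocks in skill docs.
examples = [
    "git checkout -b fix-cve-2099-00001",
    "git switch -c security-fix",
    "git checkout -b <fix-slug>",
    "git switch -c $BRANCH",
    "git checkout -b tighten-input-validation",
    "git checkout -b vuln-patch  # bad: reveals embargo context",
]
for name in flag_branch_names(examples):
    print(f"advisory: branch name may reveal embargo context: {name}")
```

Because the check is advisory, the sketch only reports names and never raises or exits non-zero.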
potiuk
pushed a commit
to justinmclean/airflow-steward
that referenced
this pull request
Jul 3, 2026
…OFT advisory) Reads the Axis 1 (skill) and Axis 2 (tool) capability vocabulary tables from docs/labels-and-capabilities.md and verifies every taxonomy entry appears in at least one mapping-table row; entries marked *(reserved)* or *(future)* are exempt. Cross-checks the SKILL_CAPABILITIES / TOOL_CAPABILITIES code constants against the parsed vocabulary. The reserved/future marker accepts an elaborated parenthetical (e.g. *(future work)*, *(reserved for #999)*), not only the exact forms. Co-authored-by: Justin McLean <justin@classsoftware.com>
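
A small sketch of the exemption rule described above, in Python. The regex and function are assumptions about how the reserved/future marker could be matched, including the elaborated forms; parsing the tables in docs/labels-and-capabilities.md is left out, and the entry names are hypothetical.

```python
import re

# Matches *(reserved)*, *(future)*, and elaborated forms such as
# *(future work)* or *(reserved for #999)*. Assumed pattern, not the validator's.
EXEMPT_MARKER_RE = re.compile(r"\*\((?:reserved|future)\b[^)]*\)\*", re.IGNORECASE)


def unmapped_entries(vocabulary: dict[str, str], mapped: set[str]) -> list[str]:
    """Return vocabulary entries that appear in no mapping row and are not exempt.

    vocabulary maps an entry name to the text of its table row;
    mapped is the set of entry names used in at least one mapping-table row.
    """
    return sorted(
        name
        for name, row_text in vocabulary.items()
        if name not in mapped and not EXEMPT_MARKER_RE.search(row_text)
    )


# Hypothetical vocabulary rows.
vocabulary = {
    "capability:alpha": "Used by several skills",
    "capability:beta": "*(reserved for #999)*",
    "capability:gamma": "Planned *(future work)*",
    "capability:delta": "No mapping row yet",
}
print(unmapped_entries(vocabulary, mapped={"capability:alpha"}))
```

Only `capability:delta` is reported in this sample: it has no mapping row and carries no reserved/future marker.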
potiuk
pushed a commit
to justinmclean/airflow-steward
that referenced
this pull request
Jul 3, 2026
…SOFT advisory) contract:mail-source and contract:mail-archive adapter READMEs must declare that fetched mail content is external data (not instructions) and mention the prompt-injection risk in embedded mail content. Both are SOFT advisories. Co-authored-by: Justin McLean <justin@classsoftware.com>
potiuk
pushed a commit
to justinmclean/airflow-steward
that referenced
this pull request
Jul 3, 2026
…OFT advisory) Reads the Axis 1 (skill) and Axis 2 (tool) capability vocabulary tables from docs/labels-and-capabilities.md and verifies every taxonomy entry appears in at least one mapping-table row; entries marked *(reserved)* or *(future)* are exempt. Cross-checks the SKILL_CAPABILITIES / TOOL_CAPABILITIES code constants against the parsed vocabulary. The reserved/future marker accepts an elaborated parenthetical (e.g. *(future work)*, *(reserved for #999)*), not only the exact forms. Co-authored-by: Justin McLean <justin@classsoftware.com>
Adds doc validation hooks to pre-commit and a lychee link-check workflow. Catches the kind of bugs review currently has to find by eye.
The hooks:
- `markdownlint-cli2` with a tight config (`MD051` for broken anchors, `MD053` for dangling link refs). Style rules off so the diff stays small and existing prose isn't churned.
- `typos` with project terms allowlisted in `.typos.toml` (`CNA`, `Vulnogram`, `ponymail`, `mis-`, `Nd`, `pre-empted`).
- `tools/dev/check-placeholders.sh`, a small bash linter that refuses hardcoded `apache/airflow` / `Apache Airflow` inside `.claude/skills/` and `tools/*.md`. PR #1 (docs: tighten Airflow references to placeholders across framework files) already had to scrub these once. Cheaper to prevent the regression than redo the cleanup.

Lychee runs in a separate workflow on PRs and on a daily cron. Marked `continue-on-error: true` for now because the tree has 24 pre-existing broken refs to files that haven't landed yet (`config/`, `projects/airflow/`, the issue-template YAML). When the baseline hits zero we flip it to a hard gate.

Wiring this up surfaced five real broken anchors I fixed along the way:

- `AGENTS.md` link to `#point-reporters-to-the-security-model-dont-re-explain-it` was missing the "project's" prefix the actual heading uses.
- `tools/ponymail/operations.md` (`#get-email`, `#get-thread`) pointed at headings that are actually `## Get an email` and `## Get a thread`.
- `→` in `projects/_template/scope-labels.md` and `tools/github/issue-template.md` produce URL-encoded slugs that GitHub doesn't resolve. Renamed to `to` and re-ran doctoc.

Files
Local run
Or one at a time:
Test plan
- `npx markdownlint-cli2 "**/*.md"` clean.
- `typos` clean.
- `tools/dev/check-placeholders.sh` clean.
- `lychee --offline .` reports the 24 pre-existing breakages, no new ones from this PR.
- `prek run --all-files` green.

Out of scope
@potiuk, thinking about picking up a couple more fronts after this one if it lands well. Nothing huge, just smoothing edges:
- `MD040` (fenced code language tags). 62 untagged fences in the tree, mechanical fix but it would balloon the diff, better as its own PR.
- `MD038` (no-space-in-code). Most hits are intentional literal markdown samples like ` # `, ` ### `, ` - `. Re-enabling means escaping those consistently, again its own PR.

Happy to do any or none of those, whatever fits the direction you have in mind.
Was generative AI tooling used to co-author this PR?