Skip to content

Add a docs-review Claude skill to enforce style guide and documentation quality #4420

@andygrove

Description

@andygrove

Motivation

Comet's docs are growing — user guide, contributor guide, compatibility tables, operator references, plus per-release changelogs. Quality varies, terminology drifts, and once the nomenclature style guide from #4419 lands, we need a way to keep new and existing docs aligned with it. PR reviewers shouldn't have to be the last line of defense for prose quality, and contributors shouldn't have to internalize a multi-page style guide before opening a docs PR.

A sibling skill to review-comet-pr, focused on documentation, would let reviewers (and contributors during self-review) get a docs-expert pass over a PR or an existing doc. The skill produces advisory feedback — it does not edit files or post comments directly, mirroring review-comet-pr.

What the skill should check

Style guide adherence (post #4419)

  • Bare "native" used as a vague adjective (e.g. "runs natively", "the native path") — flag and suggest the specific axis: Rust-implemented, Arrow-native, Comet pipeline.
  • Permitted compounds (native Rust, Arrow-native, native shuffle in shuffle-pair context) recognized and not flagged.
  • "Vectorized" used as a synonym for columnar — flag.
  • "Falls back to Spark" / "Spark fallback" used consistently for the same concept.
  • Operator names match plan output exactly (e.g. CometProject, not CometProjectExec or Comet Project).

Information architecture

  • Audience is identifiable from the first paragraph — user, contributor, or operator.
  • Heading hierarchy is correct (no jumps from H2 to H4, no multiple H1s).
  • Tables used for structured comparisons (operators, configs, support matrices) rather than bulleted prose.
  • Long pages have a brief overview / table of contents at the top.

Completeness

  • New user-guide pages have: overview, prerequisites or version applicability, at least one runnable example, links to related docs.
  • New contributor-guide pages have: scope/audience, prerequisites, the procedure, how to verify it worked.
  • New configs in prose are also added to configs.md with key, default, type, version added.
  • New operators in prose are also added to the operator reference in understanding-comet-plans.md and operators.md.
  • New expressions are reflected in spark_expressions_support.md.

Accuracy

  • Config keys mentioned exist in code (grep spark.comet.* against the repo).
  • Operator names mentioned exist as classes.
  • Spark version claims ("Spark 3.4+", "Spark 4.0 only") match the version profiles the surrounding code lives in.
  • Code samples are syntactically valid for the language they claim (Scala / Python / SQL / shell).
  • Links resolve (relative paths exist; external links return 200).

Voice and tone

  • Present tense, active voice, second person where appropriate.
  • No marketing fluff ("blazing fast", "seamlessly", "powerful").
  • No future-tense aspirations in user-facing docs ("will support", "is planned to") — those belong in the roadmap, not the user guide.
  • No stale TODOs or XXX markers in user-visible files.

Apache project hygiene

  • New .md files have the ASF license header.
  • No copyrighted third-party content without attribution.
  • Diagrams/images referenced exist in the repo.

Comet-specific conventions

  • User-guide content lives under docs/source/user-guide/latest/, contributor-guide content under docs/source/contributor-guide/. Flag misplacements.
  • Spark version-specific content notes the version profile clearly.
  • Plan-output examples use real plan formatting (tree-form with +- connectors), not invented prose.
  • Cross-references use relative links within docs/source/.

Scope

The skill should support two modes:

  1. PR review mode — given a PR number or branch, surface docs changes (and prose changes in code comments / config descriptions / Scaladoc) and audit them.
  2. Standalone audit mode — given a file or directory under docs/source/, audit it as-is.

In both modes the skill returns a prioritized list (must-fix / should-fix / nit) suitable for a reviewer to paste or paraphrase into review comments.

Output format

Mirror review-comet-pr: produce structured advisory feedback for the human, not direct edits. Categorize findings by severity. For each finding:

  • File and line reference.
  • The rule that was violated (link to the style guide section).
  • A concrete suggested rewrite when one is obvious.

The skill must not edit files, post PR comments, or push branches.

Open questions

  • Should the skill enforce stricter rules on user-guide content (lower tolerance for jargon, mandatory examples) than contributor-guide content (where domain-specific vocabulary is fine)?
  • Should the skill check generated docs (e.g. expression support tables generated by make) or only hand-written prose?
  • Auto-detection of the style guide source — pin to a specific doc path (e.g. docs/source/contributor-guide/style_guide.md once Establish nomenclature style guide and audit operator names for clarity #4419 lands), or read the latest committed version on each invocation?
  • Is there value in a separate audit-comet-docs skill for full-site audits versus a single skill that handles both PR review and standalone audit modes?

Dependencies

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions