W3C-standard HTML annotator with inline JSON-LD storage #843

@waleedkadous

Summary

Add a W3C Web Annotation-compliant annotator for HTML files, complementing the existing line-based annotator (packages/codev/templates/open.html). Annotations are anchored to text ranges (not line numbers) and stored inline in the HTML file as embedded JSON-LD, so the file remains a single source of truth — no sidecar annotation server required.

Motivation

The current annotator is line-based and stores comments as inline source comments (<!-- REVIEW: ... --> for HTML/MD, // REVIEW: for JS, etc.). This works well for code review but is awkward for prose-style HTML content where annotations need to attach to arbitrary text ranges, not whole lines.

W3C Web Annotation Data Model provides the right primitive for this:

  • TextQuoteSelector with prefix/exact/suffix tolerates minor drift
  • TextPositionSelector as a fallback for disambiguation
  • JSON-LD is the canonical serialization

Storing the annotation set inline (in a <script type="application/ld+json"> block at the end of <body>) keeps the file self-contained and editable in any text editor. This is the same ergonomic win as the existing inline-comment approach, just with W3C-standard data structures.
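As a rough sketch of what one entry in that block could look like, the helper below builds a single annotation carrying both selectors from the document's text content and a pair of character offsets. The field names follow the W3C Web Annotation Data Model; the function name `buildAnnotation`, its parameters, and the 32-character context window are assumptions made for illustration, not part of the existing annotator.

```typescript
// Selector and annotation shapes follow the W3C Web Annotation Data Model.
interface TextQuoteSelector {
  type: "TextQuoteSelector";
  exact: string;
  prefix: string;
  suffix: string;
}

interface TextPositionSelector {
  type: "TextPositionSelector";
  start: number;
  end: number;
}

interface Annotation {
  "@context": "http://www.w3.org/ns/anno.jsonld";
  id: string;
  type: "Annotation";
  body: { type: "TextualBody"; value: string };
  target: {
    source: string;
    selector: [TextQuoteSelector, TextPositionSelector];
  };
}

// Hypothetical helper: `text` is the document's text content, and
// `start`/`end` are character offsets of the selection within it.
// `contextLen` (how much prefix/suffix to keep) is an assumed default.
function buildAnnotation(
  text: string,
  start: number,
  end: number,
  comment: string,
  id: string,
  source: string,
  contextLen = 32
): Annotation {
  if (start < 0 || end > text.length || start >= end) {
    throw new RangeError("invalid selection range");
  }
  return {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    id,
    type: "Annotation",
    body: { type: "TextualBody", value: comment },
    target: {
      source,
      selector: [
        {
          type: "TextQuoteSelector",
          exact: text.slice(start, end),
          prefix: text.slice(Math.max(0, start - contextLen), start),
          suffix: text.slice(end, end + contextLen),
        },
        { type: "TextPositionSelector", start, end },
      ],
    },
  };
}
```

Serializing an array of these objects with `JSON.stringify` would give the content of the embedded block. Whether both selectors sit side by side in one `selector` array or are nested some other way is a question for the spec phase.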

Goals

  1. Render HTML files in the existing sandboxed iframe with annotated ranges highlighted (<mark> overlays applied from JSON-LD on load).
  2. Capture user selections in the iframe via window.getSelection() and convert ranges to W3C TextQuoteSelector + TextPositionSelector.
  3. Persist annotations to the embedded <script type="application/ld+json" id="annotations"> block in the HTML file, using the same write path as the current annotator.
  4. Re-anchor annotations on load, tolerating minor edits to the underlying HTML. Use Apache Annotator (@apache-annotator/dom) for matching — do not roll our own.
  5. Reuse UI: the annotations panel, comment dialog, triple-enter-to-submit, etc., from open.html should carry over largely unchanged. Only the anchoring layer (line → text-range) and the storage layer (inline comment → embedded JSON-LD) change.
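
Goal 3 can be sketched as a pure string transformation over the file contents. The version below is a minimal illustration, assuming the block is always written with exactly the opening tag shown in the goal. The function names `upsertAnnotationBlock` and `readAnnotationBlock` are hypothetical, and the real write path in the current annotator may work differently.

```typescript
// Assumed canonical opening tag; a real implementation may need to
// tolerate attribute reordering or extra whitespace.
const OPEN_TAG = '<script type="application/ld+json" id="annotations">';
const CLOSE_TAG = "</script>";
const BLOCK_RE =
  /<script type="application\/ld\+json" id="annotations">[\s\S]*?<\/script>/;

// Replace the existing annotations block, or insert one just before
// the last </body> (or append it if the file has no </body>).
function upsertAnnotationBlock(html: string, annotations: object[]): string {
  // Escape "<" so a comment containing "</script>" cannot end the block early.
  const json = JSON.stringify(annotations, null, 2).replace(/</g, "\\u003c");
  const block = `${OPEN_TAG}\n${json}\n${CLOSE_TAG}`;
  if (BLOCK_RE.test(html)) {
    // Function replacer avoids "$" in the JSON being read as a pattern.
    return html.replace(BLOCK_RE, () => block);
  }
  const bodyClose = html.lastIndexOf("</body>");
  if (bodyClose === -1) {
    return html + "\n" + block + "\n";
  }
  return html.slice(0, bodyClose) + block + "\n" + html.slice(bodyClose);
}

// Read the annotations back; returns an empty list when no block exists.
function readAnnotationBlock(html: string): unknown[] {
  const match = html.match(BLOCK_RE);
  if (!match) return [];
  const inner = match[0].slice(OPEN_TAG.length, -CLOSE_TAG.length);
  return JSON.parse(inner) as unknown[];
}
```

Escaping `<` as `\u003c` keeps the JSON valid while ensuring user comments cannot break out of the script element, which matters because the file is meant to stay hand-editable and renderable.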

Non-goals

  • An annotation server / sync protocol. This is purely a file-based annotator.
  • Annotating non-HTML files with this new model. The existing line-based annotator keeps owning MD/code files.
  • Cross-document annotation graphs.
  • Author identity / multi-user merge resolution beyond what the W3C model implies for free.

End-to-end usability check

Before approving, a human must be able to:

  1. Open an HTML file via afx open path/to/file.html
  2. Select a text range in the rendered iframe
  3. Enter a comment
  4. Save — verify the JSON-LD <script> block now contains the annotation
  5. Close and reopen the file — the annotation is re-anchored and highlighted
  6. Edit the surrounding HTML in any text editor and reopen — the annotation either re-anchors (minor edit) or is reported as orphaned (major edit)

This is the headline path. Unit tests alone are insufficient — verify by hand before tagging.
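
To make the "re-anchors or is reported as orphaned" outcome in step 6 concrete, here is a deliberately simplified illustration of quote-based matching. It is not a proposal to replace Apache Annotator, which the goals above require for the real matching; it only handles edits outside the quoted text and does no fuzzy matching of `exact`. The names `reanchor`, `Quote`, and `Anchor` are hypothetical.

```typescript
type Anchor =
  | { status: "anchored"; start: number; end: number }
  | { status: "orphaned" };

interface Quote {
  exact: string;
  prefix: string;
  suffix: string;
}

// Number of characters on which the ends of `a` and `b` agree.
function commonSuffixLen(a: string, b: string): number {
  let n = 0;
  while (
    n < a.length &&
    n < b.length &&
    a[a.length - 1 - n] === b[b.length - 1 - n]
  ) {
    n++;
  }
  return n;
}

// Number of characters on which the starts of `a` and `b` agree.
function commonPrefixLen(a: string, b: string): number {
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

// Pick the occurrence of `exact` whose surroundings best match the stored
// prefix/suffix; ties go to the occurrence nearest the old position.
function reanchor(text: string, quote: Quote, hintStart: number): Anchor {
  if (quote.exact.length === 0) return { status: "orphaned" };
  let best: { start: number; score: number; distance: number } | null = null;
  for (
    let i = text.indexOf(quote.exact);
    i !== -1;
    i = text.indexOf(quote.exact, i + 1)
  ) {
    const before = text.slice(0, i);
    const after = text.slice(i + quote.exact.length);
    const score =
      commonSuffixLen(before, quote.prefix) +
      commonPrefixLen(after, quote.suffix);
    const distance = Math.abs(i - hintStart);
    if (
      best === null ||
      score > best.score ||
      (score === best.score && distance < best.distance)
    ) {
      best = { start: i, score, distance };
    }
  }
  if (best === null) return { status: "orphaned" };
  return {
    status: "anchored",
    start: best.start,
    end: best.start + quote.exact.length,
  };
}
```

In this sketch the stored position only breaks ties, which mirrors the role the motivation section gives to TextPositionSelector as a disambiguation fallback.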

Open questions for the spec phase

  • File-extension trigger: is this the default for .html in afx open, or opt-in via a flag / file-extension list? (The current annotator already handles .html in line mode.)
  • How are orphaned annotations surfaced when re-anchoring fails?
  • Should the JSON-LD block include author info (pull from git config user.name like the current annotator presumably does)?
