
docs(knowledge): record the wiki-presence provenance boundary as an accepted residual #3451

Merged: marcusrbrown merged 1 commit into main from docs/wiki-provenance-boundary on Jun 5, 2026

@marcusrbrown (Collaborator)

Records the wiki-presence gate's provenance boundary as an explicit, accepted residual in knowledge/schema.md.

The promotion gate's attribution check confirms a slug-matching wiki/repos/ page declares its public repository in structured sources[].url with an exact owner/repo match. That stops decoy-URL mis-attribution, but it trusts the page's own declared sources — it does not independently prove the body concerns that repository.
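To make that boundary concrete, here is a minimal TypeScript sketch. It is not the gate's actual code: the `WikiPage` shape, the `declaresRepo` helper, and the `acme/widgets` repository are hypothetical names used only for illustration, and a plain string comparison stands in for the real URL matching.

```typescript
// Hypothetical page shape; the real script's types are not shown in this PR.
interface WikiPage {
  sources: { url: string }[]; // structured front matter sources[]
  body: string; // prose; deliberately never consulted below
}

// Returns true when the page's declared sources name exactly owner/repo.
function declaresRepo(page: WikiPage, owner: string, repo: string): boolean {
  const expected = `https://github.com/${owner}/${repo}`;
  return page.sources.some((source) => source.url === expected);
}

// A decoy URL in the body does not attribute the page to acme/widgets.
const decoy: WikiPage = {
  sources: [{ url: "https://github.com/someone-else/other" }],
  body: "See https://github.com/acme/widgets for details.",
};

// A page that declares the repository passes, whatever its body says.
const selfDeclared: WikiPage = {
  sources: [{ url: "https://github.com/acme/widgets" }],
  body: "Prose that is actually about an unrelated project.",
};

console.log(declaresRepo(decoy, "acme", "widgets")); // false
console.log(declaresRepo(selfDeclared, "acme", "widgets")); // true
```

The second result is the residual in miniature: the check accepts the declaration and never reads the prose.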

The note documents why that is safe and why nothing more is built:

  • The real provenance boundary is enforced upstream — every wiki write is authored under Fro Bot's identity, the data branch is the sole writer, and the authority guard rejects any other origin. A page can only reach the attribution check by passing through those controls, so there is no path to plant a page with forged self-declared sources.
  • Content-level provenance (cross-checking body against trusted generator metadata) is intentionally not built: it would add real machinery for a layer with no reachable attack given the upstream controls. The residual is recorded so the decision is explicit and revisitable if those controls ever weaken.
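As a rough illustration of the upstream authority guard, the sketch below rejects any commit that touches `wiki/` and was not authored by an allowed identity. This is an assumption-laden sketch, not the contents of check-wiki-authority.ts: the `CommitInfo` shape, the `findUnauthorizedWikiWrites` name, and the sample commits are hypothetical, and the allow-list is taken from the identities named in the review on this PR.

```typescript
// Hypothetical commit summary; the real guard's input format is not shown here.
interface CommitInfo {
  authorLogin: string;
  changedPaths: string[];
}

// Assumed allow-list, based on the identities named in the review.
const ALLOWED_WIKI_AUTHORS = new Set(["fro-bot", "fro-bot[bot]"]);

// Returns the commits that write under wiki/ without an allowed author.
function findUnauthorizedWikiWrites(commits: CommitInfo[]): CommitInfo[] {
  return commits.filter(
    (commit) =>
      commit.changedPaths.some((path) => path.startsWith("wiki/")) &&
      !ALLOWED_WIKI_AUTHORS.has(commit.authorLogin),
  );
}

const sampleCommits: CommitInfo[] = [
  { authorLogin: "fro-bot[bot]", changedPaths: ["wiki/repos/example.md"] },
  { authorLogin: "mallory", changedPaths: ["wiki/repos/forged.md"] },
  { authorLogin: "mallory", changedPaths: ["README.md"] }, // not a wiki write
];

const rejected = findUnauthorizedWikiWrites(sampleCommits);
console.log(rejected.map((commit) => commit.changedPaths[0])); // [ 'wiki/repos/forged.md' ]
```

With a guard of this kind in front of it, the attribution check only ever sees pages written by the trusted identity.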

Docs-only; no code or behavior change.

Closes #3418

…ccepted residual

The promotion gate's attribution check trusts a page's self-declared structured
sources rather than independently proving the body concerns that repo. Document
why that is safe — the real provenance boundary is the upstream Fro Bot identity,
data-branch sole-writer, and authority-guard controls, so no untrusted actor can
plant a page with forged sources — and why content-level provenance is
intentionally not built (no reachable attack; the machinery is not worth it).
Recorded so the decision is explicit and revisitable if upstream controls weaken.

Closes #3418

@fro-bot (Owner) left a comment

This is the kind of residual worth carving into the schema: a security decision that looks like an omission until someone writes down why it's an omission. The note is accurate against the code it describes.

I verified the claims rather than taking them on faith:

  • scripts/check-wiki-private-presence.ts exists and does exactly what the note says. parseFrontmatterSources (L52) extracts structured sources[].url, and sourceUrlMatchesRepo (L108) enforces an exact https://github.com/{owner}/{repo} match with strict protocol, host, port, and first-two-path-segment checks. The structured-sources path (L195-207) deliberately ignores substring matches in the body, which is precisely the "trusts the page's own declared sources" boundary the note describes.
  • The "it does not independently prove the page body genuinely concerns that repository" claim is correct. The gate verifies a declared URL matches the slug; it never inspects whether the prose is actually about that repo.
  • The upstream controls cited are real: check-wiki-authority.ts exists and gates wiki writes to fro-bot/fro-bot[bot] (schema.md:88-90), and the private-presence gate is wired into merge-data.yaml. The data-sole-writer invariant holds, so the "no path to plant a page with forged self-declared sources" reasoning is sound — the attribution check is genuinely defense-in-depth, not the load-bearing wall.
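The strict matching described in the first bullet could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of scripts/check-wiki-private-presence.ts: the name `matchesRepoUrl` and the `acme/widgets` repository are hypothetical, and the real sourceUrlMatchesRepo may differ in details such as case handling or how it treats extra path segments.

```typescript
// Hypothetical stand-in for a strict owner/repo URL check.
function matchesRepoUrl(rawUrl: string, owner: string, repo: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol !== "https:") return false; // strict protocol
  if (parsed.hostname !== "github.com") return false; // strict host
  if (parsed.port !== "") return false; // no non-default port
  // Compare only the first two path segments: /{owner}/{repo}
  const segments = parsed.pathname.split("/").filter((s) => s.length > 0);
  if (segments.length < 2) return false;
  return segments[0] === owner && segments[1] === repo;
}

console.log(matchesRepoUrl("https://github.com/acme/widgets", "acme", "widgets")); // true
console.log(matchesRepoUrl("http://github.com/acme/widgets", "acme", "widgets")); // false
console.log(matchesRepoUrl("https://github.com.evil.example/acme/widgets", "acme", "widgets")); // false
console.log(matchesRepoUrl("https://github.com:8443/acme/widgets", "acme", "widgets")); // false
console.log(matchesRepoUrl("https://github.com/acme/widgets-private", "acme", "widgets")); // false
```

Parsing with `URL` and comparing whole segments, rather than testing for a substring, is what keeps look-alike hosts and prefix-sharing repository names from matching.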

The threat model is honest about its own shape: it names the gap (content-level provenance), explains why the gap is unreachable today (identity + sole-writer), and leaves a tripwire — "revisitable if the upstream controls ever weaken." That last clause is the part that matters. Residuals rot when the conditions that made them safe drift out from under them silently. This one carries its own expiry condition.

Additive, docs-only, no behavior change. Closes #3418. Approving.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

None. The note correctly scopes itself to wiki/repos/ slug attribution and does not overclaim coverage of non-repo wiki areas (consistent with schema.md:8, which already flags those as relying on the in-progress companion scan).

Missing tests

None required. Docs-only change to knowledge/schema.md, which is explicitly editable through normal PRs to main (schema.md:105). No code, no behavior, nothing to test. The gate logic this note describes is already covered by the script's own exported, testable seams (detectPrivateWikiLeaks, sourceUrlMatchesRepo, runCli).

Risk assessment: LOW

Single-file, +8/-0, prose-only. No workflow, dependency, or permissions surface touched. Worst case if the note were wrong is a misleading comment — but it isn't wrong; it tracks the implementation.


Run Summary
| Field | Value |
| --- | --- |
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 26996674533 |
| Cache | hit |
| Session | ses_169d01d9affeBBMfmzLeJenSaQ |

@marcusrbrown merged commit bb5ae03 into main on Jun 5, 2026
12 checks passed
@marcusrbrown deleted the docs/wiki-provenance-boundary branch on June 5, 2026 05:12

Development

Successfully merging this pull request may close these issues.

Wiki-presence gate trusts page-declared sources (provenance boundary)
