feat(privacy): wire per-PR private-leak gate via trusted workflow_run#3452
Conversation
… PAT-scoped resolver Replace the pull_request event reader in main() with a workflow_run payload reader that resolves PR identity via the GitHub API (pull_requests[] with a head-SHA fallback), validates base repo/branch and head-SHA match, and requires exactly one PR to survive validation or fails closed. Wire FRO_BOT_POLL_PAT into the node-id resolver (previously unwired) and isolate it from the diff subprocess, mirroring the promotion path. The --promotion entrypoint is unchanged.
… topology Add a no-op pull_request sentinel that fires a privileged workflow_run job. The privileged job runs the default-branch definition, checks out main only (never PR head), installs with no restored cache, and scopes FRO_BOT_POLL_PAT to the scan step. It posts a security/check-private-leak commit status to the PR head SHA, fail-closed if the status POST fails. github.token handles diff/status; the PAT resolves private names only. Required-check registration is deferred to a fork-head attachment proof.
…able head SHA Align the per-PR gate's access-lost handling with promotion mode: an unresolvable private node_id (deleted or token-cannot-see) now blocks rather than skips, so a mis-scoped PAT can no longer empty the token set and pass everything. Fetch the scanned diff from the GitHub compare API pinned to the workflow_run head SHA the status posts to, closing the force-push window between scan and status. Narrow gh API JSON instead of asserting its shape.
fro-bot
left a comment
There was a problem hiding this comment.
The topology here is sound: the sentinel fires a no-op pull_request job, workflow_run runs the default-branch definition so the broad-scope PAT never touches PR-author code, the PR head is never checked out, the cache is never restored, and the resolver subprocess gets the PAT under a scrubbed env while github.token handles diff/status. The credential split is real, not theater — and the tests prove it: capturedEnvs asserts the compare-API call never carries FRO_BOT_POLL_PAT. 83 tests green, covering the fail-closed matrix (access-lost, missing node_id, ambiguous PR resolution, base-repo/branch mismatch, head-SHA mismatch) and the force-push TOCTOU regression lock.
The strongest move is pinning the diff to the immutable workflow_run.head_sha via the compare API instead of the live PR head. That closes the force-push desync between the scanned diff and the status target — a page that gets force-pushed out still can't slip past the SHA the status lands on. The regression test at check-private-leak.test.ts:2306 nails this exact ghost.
Verdict: CONDITIONAL
Blocking issues
None.
Non-blocking concerns
-
Doc/comment drift misdescribes the diff mechanism — and the lie is security-relevant. Two places claim the diff is fetched via
gh pr diff, but the code uses the compare API pinned to the immutable SHA (check-private-leak.ts:417-428,repos/{owner}/{repo}/compare/main...${headSha}):.github/workflows/check-private-leak.yaml:10—"via the GitHub API (\gh pr diff`), never from a checked-out tree."`metadata/README.md:98—"The PR diff is fetched as pure data via \gh pr diff` using the default `GITHUB_TOKEN`"`
This isn't a cosmetic nit.
gh pr diffresolves the live PR head — the mutable target the pinned-SHA design exists to avoid. A future maintainer who trusts the comment and "simplifies" the compare call intogh pr diffwould silently reintroduce the force-push TOCTOU thatcheck-private-leak.test.ts:2306was written to lock out. Fix: replace bothgh pr diffreferences with the compare-API/pinned-SHA description. Remediation impact: prevents a security regression disguised as a cleanup. -
fetchDiffForShatruncation risk on large diffs (verify, don't assume). The compare API caps at 300 files and 3000-line patches per the documented limits; beyond that the.diffmedia type returns a truncated body. A PR large enough to exceed that could carry a private name in the truncated tail and pass. I can't confirm from the diff whether any guard exists. Evidence missing: behavior on a >300-file PR. If unhandled, fail-closed when the response signals truncation (the JSONfiles/total_commitsshape exposes it; the raw.diffdoes not). Defer if you judge a 300-file metadata/wiki PR implausible, but state it.
Missing tests
None. Coverage is the most complete surface in this PR — happy path, every fail-closed branch, redaction, PAT-isolation env capture, and the TOCTOU regression lock are all present.
Risk assessment: LOW
The code is fail-closed by construction, the credential isolation is test-enforced, actions are full-SHA-pinned (actions/checkout@de0fac2e..., jdx/mise-action@dba19683...), permissions are minimal (contents: read, statuses: write, pull-requests: read), the sentinel runs permissions: {}, and there is no untrusted interpolation into any run: block (head_sha is hex; title/branch never reach shell). The residual risk is entirely in the deferred required-check registration for fork heads — which you've correctly flagged as the open empirical question. The doc drift is the one thing worth correcting before this becomes load-bearing folklore.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 27001427088 |
| Cache | hit |
| Session | ses_169566a18ffeSHoLzS0k3zN3sR |
…diff-fetch docs Detect compare-API truncation (300-file cap or an omitted per-file patch) and fail closed rather than scanning a partial diff that could hide a private name in the truncated tail. Correct two comments that described the diff fetch as gh pr diff; the gate fetches the compare API pinned to the immutable head SHA, and the stale wording risked a maintainer reintroducing the force-push race.
fro-bot
left a comment
There was a problem hiding this comment.
Both concerns from the prior review are addressed.
Doc drift: The gh pr diff references in .github/workflows/check-private-leak.yaml:10 and metadata/README.md:98 are corrected to describe the compare-API/pinned-SHA mechanism accurately.
Truncation gap: assertCompareNotTruncated (exported, tested) fires on two signals — files.length >= 300 and any file missing a patch field — and throws so main()'s try/catch routes to process.exit(1). The check runs as step 1 of fetchDiffForSha before the raw diff fetch, meaning a truncated response never reaches the scanner. Twelve new tests cover both signals plus the happy path. Filename is intentionally withheld from the truncation error message (a knowledge/wiki/repos/ path could embed a private slug) — that's the right call.
95/95 tests pass.
Verdict: PASS
Blocking issues
None.
Non-blocking concerns
None.
Missing tests
None.
Risk assessment: LOW
The two open items from the prior review are closed. The truncation guard is fail-closed, exported, and covered at the boundary. The stale doc was corrected before it could become load-bearing folklore. Required-check registration for fork heads remains correctly deferred — that's the only open question, and it's documented as such.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 27003037822 |
| Cache | hit |
| Session | ses_169566a18ffeSHoLzS0k3zN3sR |
Wires the private-leak scan as a per-PR gate using a trusted
workflow_runtopology, so the broad-scope token that resolves private repo names never executes PR-author code.Why
The scan already gates the
data→mainpromotion path. This adds the per-PR layer: a PR tomainthat introduces a private repository's name in its diff is now caught before merge. The first attempt at this ran the scan in apull_requestjob that checked out PR-author code with the resolution token in scope — a token-exfiltration vector on a public repo. That wiring was removed; this rebuilds it safely.How
pull_requestsentinel fires a privilegedworkflow_runjob. Becauseworkflow_runalways runs the default-branch workflow definition, the privileged job never executes PR-author code.mainonly (never the PR head), installs with no restored cache, and scopes the resolution token to the scan step alone.github.tokenhandles the diff and the commit status; the resolution token is confined to the name-resolver subprocess.workflow_runpayload (with a head-SHA API fallback), validates the base repo/branch and that exactly one PR matches the scanned SHA, fetches the diff pinned to that immutable SHA, and posts asecurity/check-private-leakcommit status to it.[allow-private-leak]operator override (title prefix, permitted login) still applies.Required-check registration is deferred until the commit status is proven to attach and block for fork PR heads.
Closes #3407