Skip to content

feat(privacy): wire per-PR private-leak gate via trusted workflow_run#3452

Merged
marcusrbrown merged 4 commits into
mainfrom
feat/wire-private-leak-pr-gate
Jun 5, 2026
Merged

feat(privacy): wire per-PR private-leak gate via trusted workflow_run#3452
marcusrbrown merged 4 commits into
mainfrom
feat/wire-private-leak-pr-gate

Conversation

@marcusrbrown

Copy link
Copy Markdown
Collaborator

Wires the private-leak scan as a per-PR gate using a trusted workflow_run topology, so the broad-scope token that resolves private repo names never executes PR-author code.

Why

The scan already gates the data→main promotion path. This adds the per-PR layer: a PR to main that introduces a private repository's name in its diff is now caught before merge. The first attempt at this ran the scan in a pull_request job that checked out PR-author code with the resolution token in scope — a token-exfiltration vector on a public repo. That wiring was removed; this rebuilds it safely.

How

  • A no-op pull_request sentinel fires a privileged workflow_run job. Because workflow_run always runs the default-branch workflow definition, the privileged job never executes PR-author code.
  • The privileged job checks out main only (never the PR head), installs with no restored cache, and scopes the resolution token to the scan step alone. github.token handles the diff and the commit status; the resolution token is confined to the name-resolver subprocess.
  • The scan resolves PR identity from the workflow_run payload (with a head-SHA API fallback), validates the base repo/branch and that exactly one PR matches the scanned SHA, fetches the diff pinned to that immutable SHA, and posts a security/check-private-leak commit status to it.
  • It fails closed throughout: unresolvable identity, a private repo the token can't resolve, a scan error, or a status-post failure all block rather than pass. The [allow-private-leak] operator override (title prefix, permitted login) still applies.

Required-check registration is deferred until the commit status is proven to attach and block for fork PR heads.

Closes #3407

… PAT-scoped resolver

Replace the pull_request event reader in main() with a workflow_run payload
reader that resolves PR identity via the GitHub API (pull_requests[] with a
head-SHA fallback), validates base repo/branch and head-SHA match, and requires
exactly one PR to survive validation or fails closed. Wire FRO_BOT_POLL_PAT into
the node-id resolver (previously unwired) and isolate it from the diff subprocess,
mirroring the promotion path. The --promotion entrypoint is unchanged.
… topology

Add a no-op pull_request sentinel that fires a privileged workflow_run job. The
privileged job runs the default-branch definition, checks out main only (never PR
head), installs with no restored cache, and scopes FRO_BOT_POLL_PAT to the scan
step. It posts a security/check-private-leak commit status to the PR head SHA,
fail-closed if the status POST fails. github.token handles diff/status; the PAT
resolves private names only. Required-check registration is deferred to a fork-head
attachment proof.
…able head SHA

Align the per-PR gate's access-lost handling with promotion mode: an unresolvable
private node_id (deleted or token-cannot-see) now blocks rather than skips, so a
mis-scoped PAT can no longer empty the token set and pass everything. Fetch the
scanned diff from the GitHub compare API pinned to the workflow_run head SHA the
status posts to, closing the force-push window between scan and status. Narrow
gh API JSON instead of asserting its shape.

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The topology here is sound: the sentinel fires a no-op pull_request job, workflow_run runs the default-branch definition so the broad-scope PAT never touches PR-author code, the PR head is never checked out, the cache is never restored, and the resolver subprocess gets the PAT under a scrubbed env while github.token handles diff/status. The credential split is real, not theater — and the tests prove it: capturedEnvs asserts the compare-API call never carries FRO_BOT_POLL_PAT. 83 tests green, covering the fail-closed matrix (access-lost, missing node_id, ambiguous PR resolution, base-repo/branch mismatch, head-SHA mismatch) and the force-push TOCTOU regression lock.

The strongest move is pinning the diff to the immutable workflow_run.head_sha via the compare API instead of the live PR head. That closes the force-push desync between the scanned diff and the status target — a page that gets force-pushed out still can't slip past the SHA the status lands on. The regression test at check-private-leak.test.ts:2306 nails this exact ghost.

Verdict: CONDITIONAL

Blocking issues

None.

Non-blocking concerns

  • Doc/comment drift misdescribes the diff mechanism — and the lie is security-relevant. Two places claim the diff is fetched via gh pr diff, but the code uses the compare API pinned to the immutable SHA (check-private-leak.ts:417-428, repos/{owner}/{repo}/compare/main...${headSha}):

    • .github/workflows/check-private-leak.yaml:10"via the GitHub API (\gh pr diff`), never from a checked-out tree."`
    • metadata/README.md:98"The PR diff is fetched as pure data via \gh pr diff` using the default `GITHUB_TOKEN`"`

    This isn't a cosmetic nit. gh pr diff resolves the live PR head — the mutable target the pinned-SHA design exists to avoid. A future maintainer who trusts the comment and "simplifies" the compare call into gh pr diff would silently reintroduce the force-push TOCTOU that check-private-leak.test.ts:2306 was written to lock out. Fix: replace both gh pr diff references with the compare-API/pinned-SHA description. Remediation impact: prevents a security regression disguised as a cleanup.

  • fetchDiffForSha truncation risk on large diffs (verify, don't assume). The compare API caps at 300 files and 3000-line patches per the documented limits; beyond that the .diff media type returns a truncated body. A PR large enough to exceed that could carry a private name in the truncated tail and pass. I can't confirm from the diff whether any guard exists. Evidence missing: behavior on a >300-file PR. If unhandled, fail-closed when the response signals truncation (the JSON files/total_commits shape exposes it; the raw .diff does not). Defer if you judge a 300-file metadata/wiki PR implausible, but state it.

Missing tests

None. Coverage is the most complete surface in this PR — happy path, every fail-closed branch, redaction, PAT-isolation env capture, and the TOCTOU regression lock are all present.

Risk assessment: LOW

The code is fail-closed by construction, the credential isolation is test-enforced, actions are full-SHA-pinned (actions/checkout@de0fac2e..., jdx/mise-action@dba19683...), permissions are minimal (contents: read, statuses: write, pull-requests: read), the sentinel runs permissions: {}, and there is no untrusted interpolation into any run: block (head_sha is hex; title/branch never reach shell). The residual risk is entirely in the deferred required-check registration for fork heads — which you've correctly flagged as the open empirical question. The doc drift is the one thing worth correcting before this becomes load-bearing folklore.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 27001427088
Cache hit
Session ses_169566a18ffeSHoLzS0k3zN3sR

…diff-fetch docs

Detect compare-API truncation (300-file cap or an omitted per-file patch) and fail
closed rather than scanning a partial diff that could hide a private name in the
truncated tail. Correct two comments that described the diff fetch as gh pr diff;
the gate fetches the compare API pinned to the immutable head SHA, and the stale
wording risked a maintainer reintroducing the force-push race.

@fro-bot fro-bot left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both concerns from the prior review are addressed.

Doc drift: The gh pr diff references in .github/workflows/check-private-leak.yaml:10 and metadata/README.md:98 are corrected to describe the compare-API/pinned-SHA mechanism accurately.

Truncation gap: assertCompareNotTruncated (exported, tested) fires on two signals — files.length >= 300 and any file missing a patch field — and throws so main()'s try/catch routes to process.exit(1). The check runs as step 1 of fetchDiffForSha before the raw diff fetch, meaning a truncated response never reaches the scanner. Twelve new tests cover both signals plus the happy path. Filename is intentionally withheld from the truncation error message (a knowledge/wiki/repos/ path could embed a private slug) — that's the right call.

95/95 tests pass.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

None.

Missing tests

None.

Risk assessment: LOW

The two open items from the prior review are closed. The truncation guard is fail-closed, exported, and covered at the boundary. The stale doc was corrected before it could become load-bearing folklore. Required-check registration for fork heads remains correctly deferred — that's the only open question, and it's documented as such.


Run Summary
Field Value
Event pull_request
Repository fro-bot/.github
Run ID 27003037822
Cache hit
Session ses_169566a18ffeSHoLzS0k3zN3sR

@marcusrbrown marcusrbrown merged commit a78e122 into main Jun 5, 2026
13 checks passed
@marcusrbrown marcusrbrown deleted the feat/wire-private-leak-pr-gate branch June 5, 2026 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire Check Private Leak in a trusted workflow_run topology

2 participants