Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 30 additions & 13 deletions review/agent.ts
Original file line number Diff line number Diff line change
Expand Up @@ -203,19 +203,7 @@ export function labelNames(labels: unknown): string[] {
async function reviewAndFix(ctx: WorkforceCtx, pr: Pr): Promise<void> {
const run = await ctx.harness.run({
cwd: ctx.sandbox.cwd,
prompt: [
`Review pull request #${pr.number} in ${pr.owner}/${pr.repo}. The PR code is checked out in the current directory.`,
`Focus on the actual PR changes: read .workforce/pr.diff first, then .workforce/changed-files.txt and .workforce/context.json.`,
`Use the checked-out repo to trace the impact of this diff across callers, types, tests, config, and related files.`,
`Flag and fix breakage even when the affected file is outside the changed-file set, but do not do an unrelated full-repo audit.`,
`Then proactively FIX everything that needs changing — your own findings and any other bot reviews on the PR —`,
`and resolve failing CI checks and merge conflicts by editing the code. Don't use git or the gh CLI; cloud commits`,
`and pushes your file edits to the PR after this run. In your output, do not claim that fixes were pushed,`,
`a GitHub review was submitted, or CI was verified; those are post-harness actions that cloud reports separately.`,
`Only end your output with READY on its own last line when the PR genuinely needs a human now — meaning you have`,
`resolved or addressed every bot and reviewer comment, there are no failing checks left that you could fix, and the`,
`remaining decision requires human judgment. If anything is still red, unresolved, or in-progress, do NOT print READY.`
].join('\n')
prompt: reviewHarnessPrompt(pr)
});

const exitCode = (run as { exitCode?: unknown }).exitCode;
Expand Down Expand Up @@ -250,6 +238,35 @@ async function reviewAndFix(ctx: WorkforceCtx, pr: Pr): Promise<void> {
}
}

export function reviewHarnessPrompt(pr: { owner: string; repo: string; number: number }): string {
return [
`Review pull request #${pr.number} in ${pr.owner}/${pr.repo}. The PR code is checked out in the current directory.`,
`Focus on the actual PR changes: read .workforce/pr.diff first, then .workforce/changed-files.txt and .workforce/context.json.`,
`Use the checked-out repo to trace the impact of this diff across callers, types, tests, config, and related files.`,
`Flag and fix breakage even when the affected file is outside the changed-file set, but do not do an unrelated full-repo audit.`,
`Then proactively FIX everything that needs changing — your own findings and any other bot reviews on the PR —`,
`and resolve failing CI checks and merge conflicts by editing the code. Don't use git or the gh CLI; cloud commits`,
`and pushes your file edits to the PR after this run. In your output, do not claim that fixes were pushed,`,
`a GitHub review was submitted, or CI was verified; those are post-harness actions that cloud reports separately.`,
`Validate every finding — yours or another bot's — against the CURRENT checkout before editing: review comments`,
`are often stale (already fixed by a later push). Reproduce the problem in the code as it is now, or skip it.`,
`Make the smallest fix that addresses a demonstrated problem. Do not rewrite, restructure, or "harden" working`,
`code beyond what the finding requires.`,
`Verify every edit before you finish: run the repo's tests for the files you touched (install dependencies if`,
`needed). When you change code that GENERATES commands, scripts, or queries, also execute a sample of the`,
`generated output against a throwaway fixture — tests that only assert on the generated string prove nothing`,
`about its behavior.`,
`If you cannot verify an edit (tests cannot run in this sandbox and you cannot make them run), do not leave it`,
`in the working tree: discard it with "git restore <file>" — the one exception to the no-git rule, because`,
`rewriting a file back from memory is error-prone — delete files you created, and present the proposed change as`,
`advisory text in your review instead. Anything left in the working tree is committed and pushed to the PR after`,
`you exit — an unverified push is worse than no push.`,
`Only end your output with READY on its own last line when the PR genuinely needs a human now — meaning you have`,
`resolved or addressed every bot and reviewer comment, there are no failing checks left that you could fix, and the`,
`remaining decision requires human judgment. If anything is still red, unresolved, or in-progress, do NOT print READY.`
].join('\n');
}

async function failReviewRun(ctx: WorkforceCtx, pr: Pr, reason: string): Promise<never> {
const message = [
`pr-reviewer could not complete review for #${pr.number} in ${pr.owner}/${pr.repo}.`,
Expand Down
12 changes: 12 additions & 0 deletions tests/review-agent.test.mjs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import {
labelNames,
readPr,
resolveAuthorLogin,
reviewHarnessPrompt,
reviewAuthorAllowlistDecision,
} from '../.test-build/review/agent.js';

Expand Down Expand Up @@ -101,6 +102,17 @@ test('labelNames normalizes github label arrays defensively', () => {
assert.deepEqual(labelNames(undefined), []);
});

test('reviewHarnessPrompt forbids git except the explicit restore-only carve-out', () => {
const prompt = reviewHarnessPrompt({ owner: 'AgentWorkforce', repo: 'agents', number: 47 });
assert.match(prompt, /Don't use git or the gh CLI/);
// "git restore <file>" is deliberately permitted for discarding unverified
// edits (agents#47 review): rewriting a file back from memory is error-prone,
// a restore from HEAD is not. It must be framed as the exception...
assert.match(prompt, /git restore <file>.*exception to the no-git rule/);
// ...and no destructive/state-mutating git verb may creep in.
assert.doesNotMatch(prompt, /\bgit\s+(checkout|reset|clean|commit|push|add|fetch|pull|rebase|merge|stash)\b/);
});

// Cloud only mounts an integration's relayfile subtree from its `scope` (or
// from triggers — and this persona has github triggers only). A scope-less
// `slack: {}` mounts nothing, so every slackClient() post was written to
Expand Down