Add eval harness + cases for all 10 agents by khaliqgant · Pull Request #58 · AgentWorkforce/agents

khaliqgant · 2026-06-11T08:13:04Z

Repeatable dry-runs of the showcase agents, ported from the watchdog-agents eval harness. Each case fires one event at one agent's real handler and asserts routing + side effects. Complements the existing npm test unit suite (which uses hand-rolled ctx spies) by running the real handler through the runtime's simulation API and, in live mode, against an actual cheap model.

What's here

scripts/evals/run-evals.mjs — two executors:
- simulate (default): free, offline. Runs each handler through the runtime's simulateInvocation against an in-memory VFS; harness.run/llm.complete are stubbed but recorded as side effects. Asserts status, eventSource, sideEffectsAll/Any, logsAny.
- live (--live): backs the model calls with a cheap opencode model (gpt-5-nano); --judge grades chat replies against the case rubric (LLM-as-judge).
- Flat/nested agent layout; seeds materialized to both ctx.files and the disk mount so relayClient reads (linearClient().getIssue) resolve; new expect.status:"failed" + expect.errorIncludes for required-input guard cases.
evals/cases.jsonl — 11 cases covering all 10 agents (linear-slack, linear, review, repo-hygiene, hn-monitor, spotify-releases, vendor-monitor, granola, both cloud-team members). 11/11 green in simulate.
evals/seeds/*, evals/README.md — fixtures + docs.
package.json: evals / evals:live scripts (compile personas first); tsx devDep. .gitignore: .evals/ + provider draft-tree safety net.
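
To make the assertion keys listed above concrete, here is a minimal sketch of how a case's `expect` block could be evaluated against a recorded outcome. The function name `checkExpect` and the `outcome` shape (`status`, `eventSource`, `sideEffects`, `logs`, `error`) are assumptions for illustration, not the runner's actual code, and substring matching for `logsAny` and `errorIncludes` is likewise assumed.

```javascript
// Hypothetical sketch: evaluate a case's `expect` block against a recorded outcome.
// The `outcome` shape used here is an assumption, not the real runner's record format.
function checkExpect(expect, outcome) {
  const checks = [];
  const add = (name, pass) => checks.push({ name, pass });
  const effects = outcome.sideEffects ?? [];
  const logs = outcome.logs ?? [];

  if (expect.status) add('status', outcome.status === expect.status);
  if (expect.eventSource) add('eventSource', outcome.eventSource === expect.eventSource);
  if (expect.sideEffectsAll) {
    // Every listed side effect must have been recorded.
    add('sideEffectsAll', expect.sideEffectsAll.every((e) => effects.includes(e)));
  }
  if (expect.sideEffectsAny) {
    // At least one listed side effect must have been recorded.
    add('sideEffectsAny', expect.sideEffectsAny.some((e) => effects.includes(e)));
  }
  if (expect.logsAny) {
    // Assumed: a log line matches when it contains the expected text.
    add('logsAny', expect.logsAny.some((needle) => logs.some((line) => line.includes(needle))));
  }
  if (expect.errorIncludes) {
    add('errorIncludes', String(outcome.error ?? '').includes(expect.errorIncludes));
  }
  return checks;
}

const sampleChecks = checkExpect(
  { status: 'succeeded', eventSource: 'slack', sideEffectsAll: ['harness.run'] },
  { status: 'succeeded', eventSource: 'slack', sideEffects: ['harness.run'], logs: [] },
);
console.log(sampleChecks.every((c) => c.pass)); // true
```

A guard case would pass `{ status: 'failed', errorIncludes: '...' }` as `expect`, so the same function covers both the success and the expected-failure paths.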

How to run

npm run evals                 # simulate, all cases
npm run evals -- --list
npm run evals:live -- --judge # real cheap-model replies + judge

Verification

tsc --noEmit clean · all 10 personas compile · npm run evals → 11/11.

🤖 Generated with Claude Code

Summary by cubic

Adds a repeatable eval harness that runs real agent handlers in simulate or live mode, with 11 cases across all 10 showcase agents. Improves coverage for routing, side effects, and chat reply quality.

New Features
- scripts/evals/run-evals.mjs: simulate (offline, stubbed harness.run/llm.complete) and live (--live via opencode, optional --judge); supports flat and nested agent dirs.
- Seeds written to in-memory VFS and disk mount so client reads resolve; supports short and exact VFS-path seeds.
- 11 cases in evals/cases.jsonl with fixtures and expectations, including required-input guard checks (expect.status: "failed" + errorIncludes).
- Artifacts saved to .evals/runs/<stamp>/{result.json,summary.md}; evals/README.md documents usage and seeding; .gitignore ignores .evals/ and adds a provider draft-tree safety net.
- package.json: evals / evals:live scripts (compile personas first) and tsx dev dependency.
Bug Fixes
- Updated tests/linear-slack-agent.test.mjs to map SLACK_CHANNEL to TEST_SLACK_CHANNEL via inputSpecs, preventing env collisions and aligning with the harness env mapping.

Written for commit ad556b5. Summary will update on new commits.

Repeatable dry-runs of the showcase agents, ported from the watchdog-agents eval harness. Each case fires one event at one agent's handler and asserts routing + side effects.

- scripts/evals/run-evals.mjs: simulate (free, offline, stubbed harness/llm) and live (cheap opencode model + LLM-as-judge) executors. Flat/nested agent layout; seeds materialized to both ctx.files and the disk mount so relayClient reads (linearClient().getIssue) resolve. New expect.status: "failed" + expect.errorIncludes for required-input guard cases.
- evals/cases.jsonl: 11 cases covering linear-slack, linear, review, repo-hygiene, hn-monitor, spotify-releases, vendor-monitor, granola, and the two cloud-team members. 11/11 green in simulate.
- evals/seeds/*: linear board fixtures + PR meta, granola note, issue alias.
- evals/README.md: documents the two executors, _index.json seeding, what simulate can/can't observe, and the guard-case pattern.
- package.json: `evals` / `evals:live` scripts (compile personas first); tsx devDep. .gitignore: .evals/ + provider draft-tree safety net.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-11T08:13:13Z

Warning

Review limit reached

@agent-relay-code[bot], we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 16 minutes and 27 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more credits in the billing tab to continue.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5a1cbcfd-5f9c-4c9e-8fe0-d6ca76b28a67

📥 Commits

Reviewing files that changed from the base of the PR and between 7c0e59f and ad556b5.

📒 Files selected for processing (1)

scripts/evals/run-evals.mjs

📝 Walkthrough

This PR introduces a comprehensive evaluation framework for agent showcase cases. It includes detailed documentation of the eval workflow, 11 structured test cases with supporting seed data, a Node.js eval runner supporting both simulate (deterministic, stubbed) and live (real handler, opencode model) execution modes, and integration into the development workflow via npm scripts and environment configuration.

Changes

Agent Evaluation Framework

| Layer / File(s) | Summary |
| --- | --- |
| Evaluation documentation and architecture `evals/README.md` | README explains repeatable evaluation workflow, distinguishes simulate (deterministic stubs) and live (real handler + opencode model) modes, details VFS seeding via `_index.json`, documents recorded vs. untracked side effects, and specifies `cases.jsonl` schema with fixture, inputs, seeds, expect, and rubric fields. |
| Test cases and seed data `evals/cases.jsonl`, `evals/seeds/*` | 11 structured evaluation cases covering chat, triage, guard, and scheduled agents with fixtures (Slack, Linear, GitHub, cron, Granola), plus supporting seed data: GitHub PR diffs with `exportCsv` function, Linear issues/projects/teams/workflow states, Slack users, and Granola prospect records. |
| Eval runner script implementation `scripts/evals/run-evals.mjs` | Node.js runner with CLI parsing, simulate executor (seed mounts, stubbed model behavior), live executor (real handler + opencode integration), expectation validation, optional rubric-based grading for live chat cases, and orchestration with `result.json`/`summary.md` output and exit code handling. |
| Project configuration and integration `.gitignore`, `package.json`, `tests/linear-slack-agent.test.mjs` | Updates `.gitignore` for eval artifacts and provider drafts, adds `compile`/`evals`/`evals:live` npm scripts, extends `devDependencies` with `tsx` and pins `@relayfile/adapter-linear` to 0.3.11, and wires `SLACK_CHANNEL` via `TEST_SLACK_CHANNEL` in linear-slack test context. |
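
As an illustration of the `cases.jsonl` schema summarized in the table, the sketch below parses JSONL text into case objects. The helper name `parseCases`, the choice of which fields are treated as required, and the defaults for `inputs` and `seeds` are assumptions; the real runner may validate differently.

```javascript
// Hypothetical sketch: parse JSONL case text into case objects.
// Which fields are required is an assumption based on the cases shown in this PR.
function parseCases(text) {
  const required = ['id', 'agent', 'kind', 'fixture', 'expect'];
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line, i) => {
      const testCase = JSON.parse(line); // throws on malformed JSON
      for (const key of required) {
        // `i` counts non-empty lines, not raw file lines
        if (!(key in testCase)) throw new Error(`case ${i + 1}: missing "${key}"`);
      }
      // Assumed defaults for the optional fields.
      return { inputs: {}, seeds: [], ...testCase };
    });
}

const sample = [
  '{"id":"review.skip-label","agent":"review","kind":"triage","fixture":{"type":"github.pull_request.opened"},"expect":{"status":"succeeded"}}',
  '',
].join('\n');
console.log(parseCases(sample).map((c) => c.id)); // [ 'review.skip-label' ]
```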

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

AgentWorkforce/agents#53: The update to tests/linear-slack-agent.test.mjs wiring SLACK_CHANNEL via TEST_SLACK_CHANNEL directly aligns with the newly added linear-slack persona/agent that depends on this environment variable for Slack-channel scoping.

Poem

🐰 Eval cases hop and skip so fine,
Seeds germinate in JSON line,
Simulate runs and live runs dance,
Agents showcase their best stance,
Results bloom in summary's light!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly matches the main change: adding an eval harness and test cases for all 10 agents, which is the primary purpose of this PR. |
| Description check | ✅ Passed | The description is comprehensively related to the changeset, detailing the eval harness structure, execution modes, cases, seeds, and how to run them. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


gemini-code-assist

Code Review

This pull request introduces a repeatable evaluation suite for showcase agents, adding a runner script (run-evals.mjs), test cases, mock seed data, and npm scripts for both simulated and live runs. The review feedback highlights several key improvements for the runner script: fixing the premature cleanup of the RELAYFILE_MOUNT_ROOT environment variable, adding support for named handler exports, implementing error handling for the opencode process execution, and ensuring that malformed or failed LLM judgments correctly fail the test cases instead of silently passing.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-11T08:15:16Z

+    const mod = await tsImport(pathToFileURL(agentEntry(testCase.agent)).href, import.meta.url);
+    if (!event) throw new Error('envelopeToAgentEvent returned null (unsupported envelope)');
+    const handler = mod.default?.handler ?? mod.default;
+    await withCaseEnv(personaSpec, testCase.inputs ?? {}, { RELAYFILE_MOUNT_ROOT: mount }, () => handler(ctx, event));


The comment on lines 268-273 states that RELAYFILE_MOUNT_ROOT is pinned for the rest of the process to allow fire-and-forget draft writes to complete without falling back to the current working directory. However, because it is passed as part of extraEnv to withCaseEnv, it gets deleted or restored in the finally block of withCaseEnv as soon as the handler promise resolves. This defeats the purpose of pinning it.

To fix this, set process.env.RELAYFILE_MOUNT_ROOT = mount; globally before calling withCaseEnv, and pass an empty object {} for extraEnv so it is not cleaned up prematurely.

process.env.RELAYFILE_MOUNT_ROOT = mount;
await withCaseEnv(personaSpec, testCase.inputs ?? {}, {}, () => handler(ctx, event));
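
The failure mode described in this comment can be reproduced in isolation. The sketch below uses a simplified stand-in, `withEnv`, for the runner's `withCaseEnv` (assumed behaviour: apply extra variables, then restore them in `finally`), and a hypothetical variable `DEMO_MOUNT_ROOT` so that it does not touch the real `RELAYFILE_MOUNT_ROOT`. It is not the runner's code.

```javascript
// Simplified stand-in for withCaseEnv (assumption: it applies extra env vars
// for the duration of fn and restores the previous values in `finally`).
async function withEnv(extraEnv, fn) {
  const saved = {};
  for (const [key, value] of Object.entries(extraEnv)) {
    saved[key] = process.env[key];
    process.env[key] = value;
  }
  try {
    return await fn();
  } finally {
    for (const [key, previous] of Object.entries(saved)) {
      if (previous === undefined) delete process.env[key];
      else process.env[key] = previous;
    }
  }
}

// A handler that resolves at once but leaves a fire-and-forget task behind.
async function demo() {
  delete process.env.DEMO_MOUNT_ROOT;
  let lateRead;
  const handler = async () => {
    // Not awaited by the handler: reads the env var on a later tick.
    lateRead = new Promise((resolve) => {
      setTimeout(() => resolve(process.env.DEMO_MOUNT_ROOT), 10);
    });
  };
  await withEnv({ DEMO_MOUNT_ROOT: '/tmp/demo-mount' }, handler);
  return lateRead; // resolves only after the variable has been restored
}

demo().then((seen) => console.log(seen)); // undefined
```

The late read sees `undefined` because the `finally` block ran as soon as the handler promise resolved, which is why the suggestion sets the variable on `process.env` directly instead of routing it through the scoped helper.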

gemini-code-assist · 2026-06-11T08:15:16Z

+    const rec = await withCaseEnv(persona, testCase.inputs ?? {}, { RELAYFILE_MOUNT_ROOT: tmp }, () =>
+      simulateInvocation({
+        persona,
+        handler: mod.default?.handler ?? mod.default,


The current handler resolution only supports default exports (mod.default?.handler ?? mod.default). If an agent uses a named export (e.g., export const handler = ...), mod.default will be undefined, causing the handler to resolve to undefined and crash at runtime.

Updating this to support named exports improves robustness and prevents future agents from failing. Please apply this same change to line 282 in runLive as well.

Suggested change

-        handler: mod.default?.handler ?? mod.default,
+        handler: mod.handler ?? mod.default?.handler ?? mod.default,
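
For reference, the suggested resolution order can be wrapped in a small helper. The sketch below is hypothetical: the function body and the explicit `typeof` guard are assumptions added for illustration, not code taken from the runner.

```javascript
// Hypothetical helper applying the suggested resolution order:
// named export first, then default.handler, then the default export itself.
function resolveHandler(mod) {
  const handler = mod.handler ?? mod.default?.handler ?? mod.default;
  if (typeof handler !== 'function') {
    // Assumed guard: fail early with a clear message instead of crashing later.
    throw new Error('agent module exports no handler function');
  }
  return handler;
}

const named = { handler: () => 'named' };
const defaultObject = { default: { handler: () => 'default.handler' } };
const defaultFunction = { default: () => 'default' };

console.log(resolveHandler(named)()); // named
console.log(resolveHandler(defaultObject)()); // default.handler
console.log(resolveHandler(defaultFunction)()); // default
```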

gemini-code-assist · 2026-06-11T08:15:16Z

+    maxBuffer: 16 * 1024 * 1024,
+    env: { ...process.env },
+  });
+  const raw = (res.stdout ?? '').replace(/\x1b\[[0-9;]*m/g, '');


spawnSync is called without checking for execution errors or non-zero exit codes. If the opencode binary is missing, misconfigured, or fails during execution, the function will fail silently and return an empty string, leading to hard-to-debug test failures or false positives.

Adding explicit error handling ensures that any issues with the LLM runner are surfaced immediately.

if (res.error) {
  throw new Error(`Failed to execute opencode: ${res.error.message}`);
}
if (res.status !== 0) {
  throw new Error(`opencode exited with code ${res.status}: ${res.stderr || res.stdout}`);
}
const raw = (res.stdout ?? '').replace(/\x1b\[[0-9;]*m/g, '');
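
The same checks can be factored into a helper that takes the object returned by `child_process.spawnSync`, which makes them testable without the opencode binary. The helper name `readSpawnOutput` and its `label` parameter are hypothetical.

```javascript
// Hypothetical helper: validate a spawnSync result object and return its
// stdout with ANSI colour sequences stripped.
function readSpawnOutput(res, label = 'opencode') {
  // `error` is set when the process could not be spawned (e.g. binary missing).
  if (res.error) throw new Error(`Failed to execute ${label}: ${res.error.message}`);
  // `status` is the exit code; it is null when the process was killed by a signal.
  if (res.status !== 0) {
    throw new Error(`${label} exited with code ${res.status}: ${res.stderr || res.stdout}`);
  }
  return String(res.stdout ?? '').replace(/\x1b\[[0-9;]*m/g, '');
}

const clean = readSpawnOutput({ status: 0, stdout: '\x1b[32mok\x1b[0m', stderr: '' });
console.log(clean); // ok
```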

gemini-code-assist · 2026-06-11T08:15:16Z

+    // A case may deliberately expect a failure (e.g. a required-input guard throw);
+    // only treat an unexpected failed status as an automatic fail.
+    const expectsFailure = (testCase.expect?.status ?? null) === 'failed';
+    const passed = checks.every((c) => c.pass) && (expectsFailure || outcome.status !== 'failed') && (verdict ? verdict.pass !== false : true);


If the LLM judge fails to parse the response (e.g., due to invalid JSON), judge returns { pass: null, reason: ... }. The check verdict ? verdict.pass !== false : true evaluates to true when verdict.pass is null. This means a failed or malformed judgment silently passes the test instead of failing it.

Changing this to verdict.pass === true ensures that any non-true judgment (including parsing errors) correctly fails the test case.

Suggested change

-    const passed = checks.every((c) => c.pass) && (expectsFailure || outcome.status !== 'failed') && (verdict ? verdict.pass !== false : true);
+    const passed = checks.every((c) => c.pass) && (expectsFailure || outcome.status !== 'failed') && (verdict ? verdict.pass === true : true);
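
The difference between the two predicates is easiest to see by applying both to every verdict shape. This is a standalone sketch; the `reason` strings are made up for illustration.

```javascript
// The verdict clause before and after the suggested change.
const lenient = (verdict) => (verdict ? verdict.pass !== false : true);
const strict = (verdict) => (verdict ? verdict.pass === true : true);

const verdicts = [
  undefined, // no verdict (judge not run)
  { pass: true, reason: 'grounded answer' },
  { pass: false, reason: 'fabricated issue ref' },
  { pass: null, reason: 'unparseable judge output' },
];

for (const v of verdicts) {
  console.log(JSON.stringify(v), lenient(v), strict(v));
}
// Only the `pass: null` row differs: lenient yields true, strict yields false.
```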

agent-relay-code · 2026-06-11T08:20:07Z

Reviewed PR #58 and made scoped fixes.

Changes made:

Fixed eval sandbox seeding by setting both RELAYFILE_MOUNT_ROOT and WORKSPACE_ROOT, since relayfile resolves WORKSPACE_ROOT first: scripts/evals/run-evals.mjs
Pinned @relayfile/adapter-linear to 0.3.11 so the lockfile no longer installs the older top-level adapter: package.json
Isolated Linear Slack tests from ambient SLACK_CHANNEL env so CI/runtime env does not skip the handler path under test: tests/linear-slack-agent.test.mjs

Validation run locally:

npm ci
npm test passed, 60/60
npm run typecheck passed
npm run evals passed, 11/11
npm ls @relayfile/adapter-linear @relayfile/adapter-core @relayfile/relay-helpers passed

Addressed comments

No bot or reviewer comments were present in the provided .workforce artifacts, so there were no external review threads to validate or address.

Advisory Notes

I could not verify GitHub-side CI status or mergeability from this checkout without using GitHub tooling, so I am not marking this as READY.

agent-relay-code · 2026-06-11T08:26:33Z

Reviewed PR #58 and made one scoped runner fix pass in scripts/evals/run-evals.mjs.

Changes made:

Added named handler export support for simulate/live eval execution.
Made opencode execution failures explicit in live mode.
Kept live-mode RELAYFILE_MOUNT_ROOT and WORKSPACE_ROOT pinned after handler return so async writebacks do not fall back to cwd.
Made malformed/non-true LLM judge verdicts fail instead of passing.

Validation run locally:

npm ci passed
npm test passed, 60/60
npm run typecheck passed
npm run evals passed, 11/11
npm ls @relayfile/adapter-linear @relayfile/adapter-core @relayfile/relay-helpers passed

Addressed comments

gemini-code-assist[bot]: live-mode mount env was restored immediately despite the comment saying it stays pinned; fixed in scripts/evals/run-evals.mjs.
gemini-code-assist[bot]: handler resolution ignored named handler exports; fixed with resolveHandler in scripts/evals/run-evals.mjs, used by both the simulate and live executors.
gemini-code-assist[bot]: spawnSync failures from opencode were swallowed; fixed in scripts/evals/run-evals.mjs.
gemini-code-assist[bot]: malformed judge output with pass: null could pass; fixed in scripts/evals/run-evals.mjs.
coderabbitai[bot]: processing/status comment only; no concrete code finding to address.
agent-relay-code[bot]: prior summary noted scoped fixes already present in the current checkout; I validated them and only changed the still-reproducible Gemini findings above.

Advisory Notes

I am not printing READY: GitHub API currently reports combined status pending because CodeRabbit is pending, and PR mergeable_state is unstable, even though local validation is green.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

scripts/evals/run-evals.mjs (1)
268-273: 💤 Low value

Temp directory leak is documented but may accumulate.

The comment explains that the mount directory is intentionally not cleaned up to avoid breaking async draft writes, but this means every live eval run leaves a directory in /tmp. Consider tracking these directories and cleaning up older runs (e.g., older than 1 hour) to prevent unbounded accumulation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/evals/run-evals.mjs` around lines 268 - 273, The temp mount
directories created and intentionally not removed (see RELAYFILE_MOUNT_ROOT and
local variable mount) can accumulate; add a short cleanup routine at the top of
the run-evals flow that scans the same temp-root pattern used for mount (e.g.,
RELAYFILE_MOUNT_ROOT/* or whatever naming convention creates the per-run mount
dirs), checks mtime/ctime, and deletes any directories older than a threshold
(suggest 1 hour) using fs/stat and fs.rm or rimraf; invoke this cleanup before
creating a new mount to avoid unbounded accumulation while preserving recent
mounts used by running drafts.
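
A minimal sketch of the selection step in such a cleanup routine follows. It is kept as a pure function so the age threshold can be tested without touching the filesystem. The name `selectStaleDirs`, the `{ path, mtimeMs }` entry shape, and the example directory names are assumptions, not part of the runner.

```javascript
// Hypothetical sketch: pick mount directories whose mtime is older than maxAgeMs.
// `entries` is assumed to be [{ path, mtimeMs }], e.g. built from fs.statSync().
function selectStaleDirs(entries, nowMs, maxAgeMs = 60 * 60 * 1000) {
  return entries
    .filter((entry) => nowMs - entry.mtimeMs > maxAgeMs)
    .map((entry) => entry.path);
}

const now = Date.now();
const stale = selectStaleDirs(
  [
    { path: '/tmp/evals-mount-old', mtimeMs: now - 2 * 60 * 60 * 1000 }, // 2 hours old
    { path: '/tmp/evals-mount-new', mtimeMs: now - 5 * 60 * 1000 }, // 5 minutes old
  ],
  now,
);
console.log(stale); // [ '/tmp/evals-mount-old' ]
```

Each returned path could then be removed with `fs.rmSync(path, { recursive: true, force: true })`, leaving recent mounts in place for drafts that are still being written.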



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8cbecaaf-f1a8-4e0d-bd04-2bc29897ec9a

📥 Commits

Reviewing files that changed from the base of the PR and between 946c200 and 7c0e59f.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (14)

.gitignore
evals/README.md
evals/cases.jsonl
evals/seeds/github-pr-widget-7-meta.json
evals/seeds/granola-note-prospect.json
evals/seeds/linear-issue-1.json
evals/seeds/linear-issues.json
evals/seeds/linear-projects.json
evals/seeds/linear-teams.json
evals/seeds/linear-workflow-states.json
evals/seeds/slack-users.json
package.json
scripts/evals/run-evals.mjs
tests/linear-slack-agent.test.mjs

coderabbitai · 2026-06-11T08:31:04Z

+{"id":"linear-slack.chat","agent":"linear-slack","kind":"chat","fixture":{"type":"slack.message.created","resource":{"channel":"C0TEST","ts":"100.1","text":"What's open on the board for the export work?","user":"U1"}},"inputs":{"SLACK_CHANNEL":"C0TEST"},"seeds":["linear/projects","linear/issues","linear/teams"],"expect":{"status":"succeeded","eventSource":"slack","sideEffectsAll":["harness.run"]},"rubric":"A grounded Slack answer about open Linear issues for the export work, citing real issues from the board. Read-only unless asked to create; must not fabricate issue refs."}
+{"id":"linear.chat","agent":"linear","kind":"chat","fixture":{"type":"linear.AgentSessionEvent.prompted","resource":{"payload":{"agentSession":{"id":"session-1","issue":{"id":"issue-1"}},"agentActivity":{"body":"What's the current status of this issue?"}}}},"inputs":{},"seeds":[{"vfs":"/linear/issues/by-uuid/issue-1.json","file":"linear-issue-1.json"}],"expect":{"status":"succeeded","eventSource":"linear","sideEffectsAll":["llm.complete"],"logsAny":["linear event"]},"rubric":"A grounded conversational status reply about the issue. Read-only: must not claim to have edited or closed anything."}
+{"id":"review.review","agent":"review","kind":"triage","fixture":{"type":"github.pull_request.opened","resource":{"pull_request":{"number":7,"html_url":"https://github.com/acme/widget/pull/7","user":{"login":"alice"},"head":{"sha":"abc123"},"state":"open","draft":false},"repository":{"name":"widget","owner":{"login":"acme"}}}},"inputs":{},"seeds":[],"expect":{"status":"succeeded","eventSource":"github","sideEffectsAll":["harness.run"]},"rubric":"A code review that runs the harness against the PR diff and surfaces real issues (e.g. the unpaginated export OOM)."}
+{"id":"review.skip-label","agent":"review","kind":"triage","fixture":{"type":"github.pull_request.opened","resource":{"pull_request":{"number":8,"html_url":"https://github.com/acme/widget/pull/8","user":{"login":"alice"},"labels":[{"name":"no-agent-relay-review"}],"head":{"sha":"def456"},"state":"open","draft":false},"repository":{"name":"widget","owner":{"login":"acme"}}}},"inputs":{},"seeds":[],"expect":{"status":"succeeded","eventSource":"github","logsAny":["pr-reviewer skipped"]},"rubric":"A PR carrying the opt-out label must be skipped without running the review harness."}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Rubric requires “no harness run,” but this case cannot enforce it.

Line 4 says skip must happen without running review, but expect only checks status/logs. With current runner checks, this can still pass even if harness.run is called.

Proposed direction (cases + runner)

-{"id":"review.skip-label",...,"expect":{"status":"succeeded","eventSource":"github","logsAny":["pr-reviewer skipped"]},...}
+{"id":"review.skip-label",...,"expect":{"status":"succeeded","eventSource":"github","logsAny":["pr-reviewer skipped"],"sideEffectsNone":["harness.run"]},...}

And in scripts/evals/run-evals.mjs, add a corresponding negative assertion in checkExpectations(...) for sideEffectsNone.
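
A negative assertion of this kind might look like the sketch below. It follows the array form of `sideEffectsNone` from the proposed case change above. The function name and the flat list of recorded side-effect names are assumptions about how the runner stores them.

```javascript
// Hypothetical negative assertion: none of the listed side effects may be recorded.
function checkSideEffectsNone(forbidden, recordedEffects) {
  const hits = forbidden.filter((effect) => recordedEffects.includes(effect));
  return { name: 'sideEffectsNone', pass: hits.length === 0, hits };
}

// A skipped PR that recorded nothing passes; one that ran the harness fails.
console.log(checkSideEffectsNone(['harness.run'], []).pass); // true
console.log(checkSideEffectsNone(['harness.run'], ['harness.run']).pass); // false
```

Returning the offending `hits` alongside the boolean would let a summary report say which forbidden side effect fired.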

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/cases.jsonl` at line 4, The test case currently can't assert "no harness run" because the expectation only checks status/logs; update the case in evals/cases.jsonl to include a new boolean flag (e.g., "sideEffectsNone": true) and then modify the runner's checkExpectations function in scripts/evals/run-evals.mjs to assert that when sideEffectsNone is true no side-effecting methods were invoked (specifically ensure harness.run was not called) by adding a negative assertion path that fails if harness.run or equivalent side-effect markers were observed; reference the expectation key "sideEffectsNone", the runner helper checkExpectations, and the side-effecting method harness.run to locate where to add the check.

gemini-code-assist Bot reviewed Jun 11, 2026

chore: apply pr-reviewer fixes for #58

7c0e59f

chore: apply pr-reviewer fixes for #58

ad556b5

coderabbitai Bot reviewed Jun 11, 2026

khaliqgant merged commit b109a4d into main Jun 11, 2026
2 checks passed

khaliqgant deleted the feat/eval-harness branch June 11, 2026 08:47

coderabbitai Bot mentioned this pull request Jun 22, 2026

feat: conversational Slack chat bots (inbox-buddy + joke-bot) + @mention wake-gate #85

Merged
