feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs by khaliqgant · Pull Request #375 · AgentWorkforce/pear

khaliqgant · 2026-06-17T11:24:15Z

Summary

Eval harness (evals/): spawns real agents via the broker in a temp fixture dir with a fake .integrations/ mount, scores whether the agent wrote the correct path + valid JSON. 6 scenarios × 5 variants, runnable with npm run eval.
prescriptiveSpawnInstructions() on IntegrationsManager: compact write-path lookup table derived from real mount data — same source as initialSpawnInstructions, different format. No discovery reads required from the model.
broker:spawn-agent routing: non-claude CLIs (opencode, etc.) get the prescriptive format; claude gets the existing narrative full-inject format.
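
The scoring rule in the first bullet (correct path plus valid JSON) can be sketched as a pure function. This is a minimal illustration, not the harness's actual scorer: `ScoreResult`, `scoreMountWrite`, and the idea of passing new files as a path-to-content map are all assumptions made for the example.

```typescript
// Hypothetical result shape; the real harness types are not shown in this PR page.
interface ScoreResult {
  pass: boolean
  reason: string
}

/**
 * Scores one run from the files that appeared in the mount during the run.
 * `newFiles` maps a path (relative to the fixture dir) to the file contents.
 * A run passes when at least one .json file under `expectedDir` parses to a JSON object.
 */
function scoreMountWrite(newFiles: Record<string, string>, expectedDir: string): ScoreResult {
  const prefix = expectedDir.endsWith("/") ? expectedDir : `${expectedDir}/`
  const candidates = Object.entries(newFiles).filter(
    ([file]) => file.startsWith(prefix) && file.endsWith(".json"),
  )
  if (candidates.length === 0) {
    return { pass: false, reason: `no .json file written under ${expectedDir}` }
  }
  for (const [file, content] of candidates) {
    try {
      const parsed: unknown = JSON.parse(content)
      if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
        return { pass: true, reason: `valid JSON object at ${file}` }
      }
    } catch {
      // Invalid JSON: keep checking the remaining candidates.
    }
  }
  return { pass: false, reason: "files were written but none held a valid JSON object" }
}
```

A real scorer would likely also validate the payload against the scenario's schema; this sketch only covers the two checks named in the summary.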

Eval results

| variant | deepseek-v4-flash-free | gpt-5.4-nano |
| --- | --- | --- |
| bare | 0% | 0% |
| slim-inject | 100% | 100% |
| full-inject | 100% | 100% |
| prescriptive | 100% (fastest) | 100% |

All free/Chinese/OpenAI models tested via opencode pass at 100% on prescriptive. bare always fails — no model self-discovers paths without guidance.

Key implementation notes

opencode headless transport: spawnCli({ transport: 'headless' }) + skipRelayPrompt: true (opencode has no agent-relay agent)
Default eval model: opencode/deepseek-v4-flash-free (free, ~18s/run)
Non-claude CLIs receive absolute fixture paths in the task prefix — opencode's cwd doesn't always match the project dir
recordSpawnInstructionDelivery guarded to narrative path only to avoid stale snippet cross-contamination
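
As an illustration of the absolute-path note above, a task prefix could be built roughly like this. `absoluteTaskPrefix` and its wording are hypothetical; the sketch assumes `fixtureDir` is already absolute and uses POSIX separators.

```typescript
/**
 * Builds a task prefix that names each mount directory by absolute path, so the
 * agent's writes land in the fixture even if the CLI starts in another directory.
 * Hypothetical helper: the harness's real prefix text is not shown in this PR page.
 */
function absoluteTaskPrefix(fixtureDir: string, mountRelativePaths: string[]): string {
  // Drop trailing slashes from the root and any leading "./" or "/" from each entry.
  const root = fixtureDir.replace(/\/+$/, "")
  const lines = mountRelativePaths.map((rel) => `  ${root}/${rel.replace(/^\.?\/+/, "")}`)
  return ["Write integration files under these absolute paths:", ...lines].join("\n")
}
```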

Test plan

npx vitest run src/main/integrations.test.ts — 54 tests pass (2 new for prescriptiveSpawnInstructions)
npm run eval -- --cli=opencode --variant=prescriptive --repeat=3 — 18/18
Spawn opencode agent in pear with a Slack/Linear integration connected — verify the prescriptive table appears in the task instead of the <integrations-update> block

🤖 Generated with Claude Code

…on-claude CLIs

Adds an eval harness (evals/) that spawns real agents via the broker in a fixture dir with a fake .integrations/ mount, then scores whether the agent wrote to the correct path with a valid JSON payload. Covers 6 scenarios (Slack channel/DM, Linear create/update/comment/delete) across 5 guidance variants (bare, claude-md, slim-inject, full-inject, prescriptive).

Key findings from eval runs:
- `prescriptive` variant achieves 18/18 (100%) across all free and Chinese models (deepseek-v4-flash-free, mimo, nemotron, north-mini-code, gpt-5.4-nano, gpt-5.4-mini, gpt-5.1-codex-mini, gpt-5.5), so it is reliable for non-claude CLIs
- `full-inject` and `slim-inject` also reach 100% once absolute paths are injected for CLIs whose cwd doesn't match the project fixture dir
- `bare` fails universally: no model self-discovers integration paths

Harness changes:
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt`
- Default opencode model is `opencode/deepseek-v4-flash-free` (free, fast)
- Non-claude CLIs receive absolute fixture paths in the task prefix so writes land in the correct temp dir regardless of CLI cwd detection

Production wiring:
- `IntegrationsManager.prescriptiveSpawnInstructions()`: derives the lookup table from real `writebackCommandMountPaths` (same data as `initialSpawnInstructions`, compact format instead of narrative prose)
- `broker:spawn-agent` IPC handler routes `cli !== 'claude'` to prescriptive; `recordSpawnInstructionDelivery` guarded to narrative path only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-17T11:24:42Z

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d0ba5dc-ac8d-4a4f-af0f-065387c9c65f

📥 Commits

Reviewing files that changed from the base of the PR and between 0518e02 and eb1def0.

📒 Files selected for processing (2)

src/main/integrations.test.ts
src/main/integrations.ts

📝 Walkthrough

Adds prescriptiveSpawnInstructions to IntegrationsManager and routes non-claude CLI agent spawns through it in the broker:spawn-agent IPC handler. Introduces a complete eval harness under evals/ with discovery fixture schemas, six mount scenarios, five task-prefix variants, a harness driver wrapper, a CLI runner with pass/fail scoring, and an HTML/JSON report generator.

Changes

Prescriptive Spawn Instructions (production)

| Layer / File(s) | Summary |
| --- | --- |
| `IntegrationsManager.prescriptiveSpawnInstructions` method and tests `src/main/integrations.ts`, `src/main/integrations.test.ts` | Adds the public method that builds compact, provider-specific JSON write-path instruction strings for Slack, Linear, and generic providers, returning `undefined` when no integrations are visible; three new tests assert content, DM shape, and the `undefined` return. |
| IPC handler routing and delivery-tracking gate `src/main/ipc-handlers.ts`, `src/main/ipc-handlers.test.ts` | Introduces a `usePrescriptive` flag in `broker:spawn-agent` derived from `input.cli`, switches between the two instruction generators, and gates `recordSpawnInstructionDelivery` to non-prescriptive spawns; two new tests verify non-`claude` vs `claude` branching. |
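
The routing and delivery gate described in the second row can be sketched as follows. This is a simplified stand-in, not the handler from `src/main/ipc-handlers.ts`: the `InstructionSource` interface, the method signatures, `SpawnInput`, and `buildSpawnTask` are assumptions; only the three method names and the `usePrescriptive` flag come from the PR.

```typescript
// Hypothetical, trimmed-down interface: only the three methods named in the PR.
// The parameter lists are assumptions.
interface InstructionSource {
  initialSpawnInstructions(projectId: string): string | undefined
  prescriptiveSpawnInstructions(projectId: string): string | undefined
  recordSpawnInstructionDelivery(projectId: string, agentName: string): void
}

interface SpawnInput {
  cli: string
  projectId: string
  name: string
  task: string
}

function buildSpawnTask(input: SpawnInput, integrations: InstructionSource): string {
  // Every CLI other than claude gets the compact lookup table.
  const usePrescriptive = input.cli !== "claude"
  const instructions = usePrescriptive
    ? integrations.prescriptiveSpawnInstructions(input.projectId)
    : integrations.initialSpawnInstructions(input.projectId)

  // Delivery is tracked only for the narrative format, so later update
  // snippets are not computed against a table the agent never received.
  if (!usePrescriptive && instructions !== undefined) {
    integrations.recordSpawnInstructionDelivery(input.projectId, input.name)
  }
  return instructions === undefined ? input.task : `${instructions}\n\n${input.task}`
}
```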

Eval Harness Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| Discovery fixture schemas, examples, and adapter docs `evals/fixtures/discovery/linear/...`, `evals/fixtures/discovery/slack/...` | Adds JSON Schemas, create/update/delete example JSON fixtures, and adapter markdown contracts for Linear issues (including comments) and Slack channel/DM messages, defining the writeback payload shapes agents must follow. |
| Fixture helper, variants, and scenario definitions `evals/fixture.ts`, `evals/variants.ts`, `evals/scenarios/*` | `createFixture` scaffolds a temp directory with discovery schemas and writable provider paths; `variants.ts` defines five task-prefix variants; six `MountScenario` exports cover Slack post, Slack DM, and Linear create/update/comment/delete; `scenarios/index.ts` aggregates them. |
| Harness driver wrapper `evals/harness.ts` | Caches a Relaycast workspace key, implements `runEval` which spawns a `HarnessDriverClient` and an agent (headless `opencode` or PTY), collects broker events with a 3-minute timeout, and returns `{ agentName, exit, events, durationMs }`. |
| CLI runner, report generator, and package wiring `evals/runner.ts`, `evals/report.ts`, `package.json`, `.gitignore` | `runner.ts` parses CLI flags, iterates scenario/variant/repeat cells, creates fixtures, runs evals, scores outputs, and prints a results table; `report.ts` writes timestamped JSON and HTML reports; `package.json` adds `eval`/`eval:quick` scripts, `@agent-relay/evals`, and `tsx`; `.gitignore` excludes report output files. |

Incident Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Mount-root invariant incident record `memory/INCIDENT-20260617T122713Z.md` | Records a missing local mount root failure with recovery instructions for `--reset-after-clobber` / `RELAYFILE_RESET_AFTER_CLOBBER=1`. |

Sequence Diagram

sequenceDiagram
  participant CLI as CLI (non-claude)
  participant IPC as broker:spawn-agent
  participant IM as IntegrationsManager
  participant HarnessDriver as HarnessDriverClient
  participant Agent as opencode / PTY agent
  participant Mount as .integrations/ filesystem

  CLI->>IPC: spawn-agent { cli, projectId, task }
  IPC->>IM: prescriptiveSpawnInstructions(projectId)
  IM-->>IPC: write-path instruction string
  IPC->>HarnessDriver: spawn agent with task + instructions
  HarnessDriver->>Agent: start headless/PTY process
  Agent->>Mount: write JSON file under .integrations/<provider>/
  Mount-->>HarnessDriver: file system event
  HarnessDriver-->>IPC: AgentExitInfo + BrokerEvents
  IPC-->>CLI: EvalRunResult { exit, events, durationMs }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Comprehensive harness evals for pear — .integrations, agent lifecycle, and multi-model coverage #269: This PR directly implements the .integrations evaluation harness infrastructure requested in that issue, including scenario definitions (s01–s06 covering Slack and Linear), fixture setup, runner, variant testing, and report generation.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: introducing integration mount evaluation infrastructure and adaptive routing of prescriptive spawn instructions for non-Claude CLIs. |
| Description check | ✅ Passed | The description thoroughly documents the eval harness, prescriptive spawn instructions feature, broker routing logic, evaluation results, implementation notes, and test plan, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

gemini-code-assist

Code Review

This pull request introduces a new 'prescriptive' onboarding variant for integration mount evaluations, designed to provide a compact, explicit write-path lookup table with exact paths and minimal payload schemas for non-Claude CLIs. Key feedback on these changes includes: adding the missing 'prescriptive' variant to the VARIANT_ORDER array in evals/report.ts to fix HTML report sorting; removing a redundant explanation about <channelDir> in prescriptiveSpawnInstructions since the path is already fully resolved; and removing the redundant userId field from the Slack DM payload templates in both src/main/integrations.ts and evals/variants.ts to align with the actual schema.

gemini-code-assist · 2026-06-17T11:25:56Z

+  console.log(`HTML report: ${htmlPath}`)
+}
+
+const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']


The newly introduced prescriptive variant is missing from the VARIANT_ORDER array. This causes the prescriptive column to be sorted incorrectly (at the very beginning, before bare) in the generated HTML report. Adding it to the end of the array maintains the intended order from lightest to heaviest guidance.

Suggested change

-const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
+const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive']
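
Why a missing entry would sort first can be shown with a small sketch. It assumes the report orders columns by `indexOf` position in `VARIANT_ORDER`; the actual comparator in `evals/report.ts` is not shown here, and `sortVariants` is a hypothetical name.

```typescript
const VARIANT_ORDER = ["bare", "claude-md", "slim-inject", "full-inject"]

// Assumed comparator: order columns by their position in the given list.
function sortVariants(variants: string[], order: string[]): string[] {
  return [...variants].sort((a, b) => order.indexOf(a) - order.indexOf(b))
}

const seen = ["full-inject", "prescriptive", "bare"]

// "prescriptive" is absent, so indexOf returns -1 and it sorts ahead of "bare" (index 0).
const before = sortVariants(seen, VARIANT_ORDER)

// With the variant appended to the order list, it takes the last position.
const after = sortVariants(seen, [...VARIANT_ORDER, "prescriptive"])
```

Under that assumption, `before` is `["prescriptive", "bare", "full-inject"]` and `after` is `["bare", "full-inject", "prescriptive"]`.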

gemini-code-assist · 2026-06-17T11:25:56Z

+          lines.push(`  Slack channel message → ${p}/<name>.json`)
+          lines.push(`    <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
+          lines.push(`    payload: {"text":"<message>","channelId":"<channelId>"}`)


In prescriptiveSpawnInstructions, the path p is already a fully resolved, concrete path (e.g., .integrations/slack/channels/C12345__general/messages). It does not contain the <channelDir> placeholder. Therefore, the explanation line about <channelDir> is redundant and potentially confusing to the agent, and should be removed.

Suggested change

-          lines.push(`  Slack channel message → ${p}/<name>.json`)
-          lines.push(`    <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
-          lines.push(`    payload: {"text":"<message>","channelId":"<channelId>"}`)
+          lines.push(`  Slack channel message → ${p}/<name>.json`)
+          lines.push(`    payload: {"text":"<message>","channelId":"<channelId>"}`)

gemini-code-assist · 2026-06-17T11:25:56Z

+        if (dmPaths.length > 0 || channelPaths.length === 0) {
+          const dmBase = dmPaths[0] ?? `${PROJECT_INTEGRATIONS_LINK_NAME}/slack/users/<userId>/messages`
+          lines.push(`  Slack DM → ${dmBase}/<name>.json`)
+          lines.push(`    payload: {"text":"<message>","userId":"<id>"}`)


The Slack direct message schema (evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json) and its create example do not contain a userId property in the payload, as the user ID is already determined from the path. Specifying "userId":"<id>" in the payload description is redundant and deviates from the schema.

Suggested change

-          lines.push(`    payload: {"text":"<message>","userId":"<id>"}`)
+          lines.push(`    payload: {"text":"<message>"}`)

gemini-code-assist · 2026-06-17T11:25:56Z

+    <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)
+    payload: {"text":"<message>","channelId":"<channelId>"}
+  Slack DM             → ${base}/slack/users/<userId>/messages/<name>.json
+    payload: {"text":"<message>","userId":"<id>"}


To keep the prescriptive instructions consistent with the Slack DM schema and create example (which do not use a userId property in the payload), the "userId":"<id>" field should be removed from the payload template.

Suggested change

-    payload: {"text":"<message>","userId":"<id>"}
+    payload: {"text":"<message>"}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdfda7cbee

chatgpt-codex-connector · 2026-06-17T11:28:22Z

+        lines.push(`    payload: {"id":"<issueId>","_action":"update",<fields to change>}`)
+        lines.push(`  Linear delete issue → ${issueBase}/<name>.json`)
+        lines.push(`    payload: {"id":"<issueId>","_action":"delete"}`)
+        lines.push(`  Linear comment → ${issueBase}/<issueId>/comments/<name>.json`)


Use the canonical Linear issue file for comments

For non-Claude agents with a Linear integration, this instruction is the exact path they are told to use, but Linear comment writeback is keyed off the canonical issue resource filename returned from /linear/issues (for example KEY-123__uuid.json), not an arbitrary <issueId> directory. The existing linearIssueCommentRemotePath helper only accepts /linear/issues/<identifier>__<uuid>.json/comments/... and rejects UUID-only issue paths, so agents following this prompt can write files that the local mount accepts but that do not create visible Linear comments.
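
The path shape the comment describes can be illustrated with a small check. This is not the existing `linearIssueCommentRemotePath` helper, whose source is not shown here; the regular expression and `isCanonicalLinearCommentPath` are assumptions based only on the `<identifier>__<uuid>.json` pattern named above.

```typescript
// Assumed pattern: /linear/issues/<KEY>-<num>__<uuid>.json/comments/<name>.json
const UUID = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
const COMMENT_PATH = new RegExp(
  `^/linear/issues/[A-Za-z][A-Za-z0-9]*-\\d+__${UUID}\\.json/comments/[^/]+\\.json$`,
  "i",
)

/** Returns true only when the issue segment is the canonical issue resource file. */
function isCanonicalLinearCommentPath(remotePath: string): boolean {
  return COMMENT_PATH.test(remotePath)
}
```

A UUID-only issue directory, such as the `<issueId>` segment in the original instruction, fails this check, which mirrors the rejection the comment describes.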

chatgpt-codex-connector · 2026-06-17T11:28:22Z

    "protobufjs": "8.5.0"
  },
  "devDependencies": {
+    "@agent-relay/evals": "file:../relay/packages/evals",


Replace the local evals file dependency

This makes every fresh Pear install depend on a sibling checkout at ../relay/packages/evals; in a standalone clone or CI workspace without that directory, npm cannot resolve @agent-relay/evals, so the newly added eval scripts and even a normal dev install fail before they can run. Since the eval files import this package directly, this needs to be a published/versioned dependency or vendored into this repo rather than a machine-local path.

CI fix:
- ipc-handlers.test.ts mocked integrationsManager was missing prescriptiveSpawnInstructions/recordSpawnInstructionDelivery, so the broker:spawn-agent test (cli: 'codex') threw "not a function". Added the mocks plus focused routing tests (non-claude → prescriptive + no delivery record; claude → narrative + delivery record).

Review feedback:
- package.json: depend on published @agent-relay/evals@^8.8.2 instead of file:../relay/packages/evals so fresh clones / CI can resolve it (codex P2)
- integrations.ts: Linear comment path now references the canonical issue resource file (<KEY>-<num>__<uuid>.json) instead of a bare <issueId> dir. The local mount's linearIssueCommentRemotePath rejects UUID-less paths, so the old instruction produced files that never became visible comments (codex P1)
- integrations.ts + variants.ts: drop redundant "userId" from the Slack DM payload (path-derived; matches the discovery schema) (gemini)
- integrations.ts: remove the stale <channelDir> note, since the emitted path is already concrete (gemini)
- report.ts: add 'prescriptive' to VARIANT_ORDER so the HTML report column sorts last instead of first (gemini)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ter-driven

Removes the hardcoded per-provider branches (and the interim curated payload map) from prescriptiveSpawnInstructions. Writable resources + path templates now come entirely from each provider's discovery `.adapter.md` (shipped by relayfile-adapters), and payload shape is pointed at that resource's discovery `.create.example.json`. No per-provider knowledge lives in pear, so a new integration works with zero code change here.

- Parse the adapter doc's "Writable resources" section (provider-agnostic)
- Resolve each resource's concrete, in-scope path from the integration's writeback mount roots; preserve {id} placeholders for nested resources
- Point at the adapter's create example for fields instead of inlining payloads
- Graceful fallback to a discovery pointer when the adapter doc isn't mounted

Known gap tracked upstream: the local mount currently serves discovery inferred from synced read records (which omits required write fields like Slack's `text`), so the pointed-at example is imperfect until that's fixed (AgentWorkforce/relayfile#299). The adapters already publish correct write-shaped discovery; the fix belongs in the mount/sync pipeline, not pear.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
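
The adapter-driven parsing step in this commit could look roughly like the sketch below. It is an assumption-heavy illustration: `WritableResource` and `parseWritableResources` are hypothetical names, the heading and bullet layout is taken from the Slack adapter fixture quoted in this PR's review (a `### name` heading followed by a dash and a backticked path, then `- schema:` and `- create example:` bullets), and the sketch does not restrict itself to the "Writable resources" section the way the commit describes.

```typescript
// Hypothetical shape for one parsed resource entry.
interface WritableResource {
  name: string
  pathTemplate: string
  schema?: string
  createExample?: string
}

/** Parses resource headings and their schema/example bullets from an adapter doc. */
function parseWritableResources(adapterDoc: string): WritableResource[] {
  const resources: WritableResource[] = []
  let current: WritableResource | undefined
  for (const line of adapterDoc.split("\n")) {
    // Heading: "### <name> <dash> `<path template>`" (accepts U+2014 or a hyphen).
    const heading = /^###\s+(\S+)\s+[\u2014-]\s+`([^`]+)`\s*$/.exec(line)
    if (heading) {
      current = { name: heading[1], pathTemplate: heading[2] }
      resources.push(current)
      continue
    }
    // Bullets attach to the most recent heading.
    const bullet = /^-\s+(schema|create example):\s+`([^`]+)`\s*$/.exec(line)
    if (bullet && current) {
      if (bullet[1] === "schema") current.schema = bullet[2]
      else current.createExample = bullet[2]
    }
  }
  return resources
}
```

Keeping `{id}`-style placeholders verbatim in `pathTemplate` matches the commit's note that placeholders are preserved for nested resources; resolving them against real mount roots is out of scope for this sketch.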

agent-relay-code[bot] pushed an unrelated mount-root "incident" note (from its own Daytona sandbox) onto this PR; it references a non-existent doc and has nothing to do with the prescriptive-spawn/evals change. Removing it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

src/main/integrations.test.ts (1)

1543-1568: ⚡ Quick win

Add a prescriptive DM-path test for /slack/dms/<id>/messages.

Current coverage validates only the /users/ DM variant. Add a sibling case for a /dms/ mount to lock the intended routing and prevent fallback-path regressions.
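
A minimal sketch of the DM-path detection this test would lock in, assuming the mount may expose DMs under either `/users/` or `/dms/`. `isSlackDmPath` is a hypothetical helper name, and the sample paths are illustrative rather than taken from the test file.

```typescript
// Hypothetical predicate: treat both Slack DM mount layouts as DM paths.
function isSlackDmPath(mountPath: string): boolean {
  return mountPath.includes("/slack/users/") || mountPath.includes("/slack/dms/")
}

// Illustrative writeback paths (IDs are made up for the example).
const writebackPaths = [
  ".integrations/slack/channels/C12345__general/messages",
  ".integrations/slack/users/U67890EVAL/messages",
  ".integrations/slack/dms/D24680EVAL/messages",
]

const dmPaths = writebackPaths.filter(isSlackDmPath)
const channelPaths = writebackPaths.filter((p) => !isSlackDmPath(p))
```

With a filter that only checked `/users/`, the `/dms/` entry would be classified as a non-DM path, which is the regression the suggested test guards against.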

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/integrations.test.ts` around lines 1543 - 1568, Add a new sibling
test case after the existing test 'emits a text-only Slack DM payload when a
user mount is present'. The new test should follow the same structure and
assertions but use a mountPath with the `/slack/dms/<id>/messages` variant
instead of `/slack/users/U67890EVAL/messages` to ensure both DM path formats are
properly validated and prevent regressions. Verify that the IntegrationsManager
correctly generates the prescriptive spawn instructions for the DMs path
variant, including the text-only payload structure and absence of userId field.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/fixture.ts`:
- Around line 54-55: The re-export statement is exporting snapshotMount and
newMountFiles, but these functions cannot be verified as public APIs in the
`@agent-relay/evals` v8.8.2 package. Verify that both snapshotMount and
newMountFiles actually exist as exported functions in the
`@agent-relay/evals/scoring/mount` module (or wherever they are currently imported
from). If they don't exist as public exports, either locate where these
utilities are actually defined and update the import path accordingly, or remove
the re-exports if they are not intended for external use.

In `@evals/fixtures/discovery/slack/.adapter.md`:
- Around line 14-23: The adapter references reactions and replies resources with
corresponding schema and create example fixture files, but these files do not
exist in the repository. Either create the missing fixture files for the
reactions and replies resources by adding .schema.json and .create.example.json
files for each resource following the same structure as the existing messages
fixtures, or remove the reactions and replies resource sections from the adapter
documentation to ensure all advertised resources have their corresponding
fixture files present.

In `@evals/runner.ts`:
- Around line 50-59: The variants variable at line 50 uses unsafe casting (as
Variant[]) without validating that the split variant strings are actually valid
Variant types, and the repeat variable at line 58 uses parseInt without checking
for invalid results like NaN, zero, or negative numbers. Validate each variant
string against the known VARIANTS list similar to how scenarioById validates
scenarios, throwing an error if an unknown variant is provided. For the repeat
value, validate the parsed integer result to ensure it is a positive number
greater than zero, throwing an error if the parsed value is invalid.

In `@memory/INCIDENT-20260617T122713Z.md`:
- Line 4: The incident documentation file contains a hardcoded
developer-specific filesystem path on line 4 that exposes the username "daytona"
and filesystem structure. Replace the hardcoded path
`/home/daytona/workspace/memory/workspace` with a generic placeholder that
represents the computed mount root location, such as referencing the
`integrationMountRootForWorkspace(workspaceId)` function pattern or a
descriptive placeholder like `{computed_mount_root}/workspace` to make the
documentation reusable across different developer environments without exposing
personal filesystem details.
- Around line 17-18: The documentation reference in INCIDENT-20260617T122713Z.md
on lines 17-18 points to a non-existent file
`docs/architecture/mount-invariants.md` and the `docs/architecture` directory
does not exist. Either create the missing documentation file with the
appropriate content about protected invariants and recovery procedures, or
remove the broken reference from lines 17-18 of the incident report. Choose the
option that aligns with your documentation strategy.

In `@src/main/integrations.ts`:
- Around line 2310-2319: The dmPaths filter at line 2310 only checks for paths
containing /users/, which misses DM paths using the /dms/ pattern in the Slack
mount. Update the filter to also match paths that include /dms/ so that when the
connected Slack mount resolves to .integrations/slack/dms/<id>/messages, it is
properly detected as a DM path instead of falling back to the incorrect
/users/<userId>/messages path. Modify the filter predicate on writebackPaths to
check for both /users/ and /dms/ patterns.

---

Nitpick comments:
In `@src/main/integrations.test.ts`:
- Around line 1543-1568: Add a new sibling test case after the existing test
'emits a text-only Slack DM payload when a user mount is present'. The new test
should follow the same structure and assertions but use a mountPath with the
`/slack/dms/<id>/messages` variant instead of `/slack/users/U67890EVAL/messages`
to ensure both DM path formats are properly validated and prevent regressions.
Verify that the IntegrationsManager correctly generates the prescriptive spawn
instructions for the DMs path variant, including the text-only payload structure
and absence of userId field.
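
The `evals/runner.ts` finding in the prompt above (validate variant names and the repeat count instead of casting) could be addressed along these lines. This is a sketch under assumptions: `parseVariants` and `parseRepeat` are hypothetical names, and the variant list is taken from the five variants named in the PR description rather than from the real `VARIANTS` constant.

```typescript
// Assumed to mirror the five variants named in the PR description.
const VARIANTS = ["bare", "claude-md", "slim-inject", "full-inject", "prescriptive"] as const
type Variant = (typeof VARIANTS)[number]

/** Parses a comma-separated --variant value, rejecting unknown names. */
function parseVariants(raw: string): Variant[] {
  return raw.split(",").map((name) => {
    const trimmed = name.trim()
    const match = VARIANTS.find((v) => v === trimmed)
    if (match === undefined) {
      throw new Error(`Unknown variant "${trimmed}". Known variants: ${VARIANTS.join(", ")}`)
    }
    return match
  })
}

/** Parses a --repeat value, accepting only positive integers. */
function parseRepeat(raw: string): number {
  // The digit check rejects inputs like "3abc", which parseInt alone would accept as 3.
  if (!/^\d+$/.test(raw.trim())) {
    throw new Error(`--repeat must be a positive integer, got "${raw}"`)
  }
  const n = Number.parseInt(raw, 10)
  if (n <= 0) {
    throw new Error(`--repeat must be greater than zero, got "${raw}"`)
  }
  return n
}
```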

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 85145ef4-22cd-4785-ab14-2b9c1e2512ac

📥 Commits

Reviewing files that changed from the base of the PR and between 5e2489f and 0518e02.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (32)

.gitignore
evals/fixture.ts
evals/fixtures/discovery/linear/.adapter.md
evals/fixtures/discovery/linear/issues/.create.example.json
evals/fixtures/discovery/linear/issues/.delete.example.json
evals/fixtures/discovery/linear/issues/.schema.json
evals/fixtures/discovery/linear/issues/.update.example.json
evals/fixtures/discovery/linear/issues/{issueId}/comments/.create.example.json
evals/fixtures/discovery/linear/issues/{issueId}/comments/.schema.json
evals/fixtures/discovery/slack/.adapter.md
evals/fixtures/discovery/slack/channels/{channelId}/messages/.create.example.json
evals/fixtures/discovery/slack/channels/{channelId}/messages/.schema.json
evals/fixtures/discovery/slack/users/{userId}/messages/.create.example.json
evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json
evals/harness.ts
evals/report.ts
evals/reports/.gitkeep
evals/runner.ts
evals/scenarios/index.ts
evals/scenarios/s01-slack-post.ts
evals/scenarios/s02-slack-dm.ts
evals/scenarios/s03-linear-create.ts
evals/scenarios/s04-linear-update.ts
evals/scenarios/s05-linear-comment.ts
evals/scenarios/s06-linear-delete.ts
evals/variants.ts
memory/INCIDENT-20260617T122713Z.md
package.json
src/main/integrations.test.ts
src/main/integrations.ts
src/main/ipc-handlers.test.ts
src/main/ipc-handlers.ts

coderabbitai · 2026-06-17T12:38:49Z

+// Re-export for convenience so callers don't need to import from two places
+export { snapshotMount, newMountFiles }


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What functions does @agent-relay/evals v8.8.2 export from the /scoring/mount subpath?

💡 Result:

The @agent-relay/evals package (part of the Agent Relay monorepo) exports several functions from its /scoring/mount subpath, which are utilized for evaluating agent interactions within isolated or configured environments [1]. Based on the recent development history for @agent-relay/evals (which tracks alongside the broader Agent Relay v8.x series), the /scoring/mount subpath provides access to logic handling fixture-backed and provider-backed mounts for evaluation suites [1][2]. While exact function signatures are subject to the specific internal implementation in v8.8.2, the exports generally include utilities for mount lifecycle and protocol validation, such as:

- Mount lifecycle management helpers [1]
- Stream cleaning and protocol verification tools [1]
- Integration-specific check functions (e.g., toolcheck, phantom, and native-subagent protocols) [1]

These exports are part of the shared evaluation harness designed to support deterministic VFS (Virtual File System) and ACL testing in agent scenarios [1][2]. For the most accurate and precise list of exported identifiers in v8.8.2, you should consult the packages/evals/package.json file within the official AgentWorkforce/relay repository, which defines these subpath exports, or inspect the src/scoring/mount/index.ts file in the same package [1].

Citations:

1: AgentWorkforce/relay@87c05f1

2: https://github.com/AgentWorkforce/relayfile/

🌐 Web query:

snapshotMount newMountFiles @agent-relay/evals v8.8.2

💡 Result:

The terms "snapshotMount" and "newMountFiles" are not standard public APIs or documented functions within the @agent-relay/evals package or the broader AgentWorkforce/relay ecosystem as of June 2026. Based on current documentation:

1. The @agent-relay/evals package is primarily a shared test harness package within the Relay monorepo, used for running deterministic evaluation suites and managing tool-based evaluations [1].
2. Relayfile (the integration filesystem for AI agents) uses a different set of APIs for managing mounts, such as ensureMountedWorkspace and mountWorkspace [2][3][4].
3. The names snapshotMount and newMountFiles likely represent internal identifiers, private methods, or custom utilities specific to a local implementation or a transient experimental branch within the AgentWorkforce/relay codebase, rather than stable public interfaces [1].

If you are encountering these terms in a stack trace or code, they are most likely internal implementation details within the @agent-relay/evals harness driver or a specific test case runner [1]. You should check the repository's source code directly at github.com/AgentWorkforce/relay for the most accurate context regarding these specific identifiers [1][5].

Citations:

1: AgentWorkforce/relay@87c05f1

2: https://github.com/AgentWorkforce/relayfile/

3: https://github.com/AgentWorkforce/relayfile/blob/main/docs/guides/post-auth-mount-session.md

4: https://agentrelay.com/file

5: https://github.com/AgentWorkforce/relay/

🌐 Web query:

@agent-relay/evals scoring mount index.ts snapshotMount newMountFiles

💡 Result:

In the @agent-relay/evals package, the terms snapshotMount and newMountFiles relate to the state management and file-system handling within the framework's evaluation harnesses [1]. The @agent-relay/evals package is part of the Agent Relay ecosystem [1]. Its scoring infrastructure includes integration tests and harness utilities designed to handle VFS (Virtual Filesystem) and mount-backed scenarios [1].

1. snapshotMount: This refers to the mechanism used to capture the state of a filesystem mount, ensuring that evaluation cases can run deterministically [2]. By creating a snapshot, the system can verify filesystem changes, ACLs, and concurrency behaviors in an isolated, repeatable environment [2].
2. newMountFiles: This typically represents the set of files or changes generated or modified during an evaluation run [1][3]. The framework manages these by comparing the state after an agent's execution against the initial snapshot [1]. The system uses these to perform diffing, validation, or to sync writable changes back to the source project after an evaluation or agent task is complete [3].

These functions are part of the broader Agent Relay strategy to provide agents with a reliable, file-based interface for interacting with integrated systems (like Notion, Linear, GitHub, etc.) [4], where complex API interactions are normalized into a local filesystem [5][4]. The evaluation harness specifically leverages these patterns to ensure that agent behaviors (such as reading, writing, and coordinating through these mounts) are correctly scored and verified [1][6].

Citations:

1: AgentWorkforce/relay@87c05f1

2: https://github.com/AgentWorkforce/relayfile/

3: https://npmx.dev/package/@relayfile/local-mount

4: https://agentrelay.com/file

5: https://github.com/AgentWorkforce/skills/blob/main/skills/setting-up-relayfile/SKILL.md

6: https://github.com/tangle-network/agent-eval

Fix the re-exports: snapshotMount and newMountFiles are not documented or standard public APIs in @agent-relay/evals v8.8.2.

The re-export at lines 54-55 references functions that appear to be internal/custom utilities rather than stable public exports from @agent-relay/evals/scoring/mount. These function names are not confirmed to exist as public APIs in v8.8.2. Verify that the source functions actually exist and are properly exported, or update the import path to the correct internal modules where these are defined.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/fixture.ts` around lines 54 - 55, The re-export statement is exporting snapshotMount and newMountFiles, but these functions cannot be verified as public APIs in the `@agent-relay/evals` v8.8.2 package. Verify that both snapshotMount and newMountFiles actually exist as exported functions in the `@agent-relay/evals/scoring/mount` module (or wherever they are currently imported from). If they don't exist as public exports, either locate where these utilities are actually defined and update the import path accordingly, or remove the re-exports if they are not intended for external use.
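One way to run the verification requested above is to load the module and check that each re-exported name is actually a function. The following is a minimal sketch, not the harness's own code: `missingFunctionExports` is a hypothetical helper, and the module specifier in the usage comment is taken from the review comment without being verified here.

```typescript
// Hypothetical helper (not part of @agent-relay/evals): list the expected names
// that a loaded module namespace does not expose as functions.
function missingFunctionExports(
  moduleNamespace: Record<string, unknown>,
  expectedNames: readonly string[],
): string[] {
  return expectedNames.filter((name) => typeof moduleNamespace[name] !== 'function')
}

// Usage sketch. The specifier below is the one named in the review comment;
// whether it resolves in v8.8.2 is exactly what this check is meant to find out.
//
// const mod = await import('@agent-relay/evals/scoring/mount')
// const missing = missingFunctionExports(mod, ['snapshotMount', 'newMountFiles'])
// if (missing.length) throw new Error(`Missing exports: ${missing.join(', ')}`)
```

If the dynamic import itself rejects, the import path is wrong; if it resolves but the helper returns names, the functions live elsewhere or are not exported.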

coderabbitai · 2026-06-17T12:38:49Z

+### reactions — `/slack/channels/{channelId}/messages/{messageTs}/reactions`
+
+- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json`
+- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json`
+
+### replies — `/slack/channels/{channelId}/messages/{messageTs}/replies`
+
+- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json`
+- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json`
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: ensure every path referenced in the Slack adapter exists.
set -euo pipefail
adapter="evals/fixtures/discovery/slack/.adapter.md"
# Extract backticked discovery paths and check existence under evals/fixtures.
python - <<'PY'
import re
from pathlib import Path

adapter = Path("evals/fixtures/discovery/slack/.adapter.md").read_text()
paths = re.findall(r'`(discovery/slack/[^`]+)`', adapter)
missing = []
for p in paths:
    candidate = Path("evals/fixtures") / p
    if not candidate.exists():
        missing.append(str(candidate))

print("Referenced paths:")
for p in paths:
    print(" -", p)
if missing:
    print("\nMissing paths:")
    for m in missing:
        print(" -", m)
    raise SystemExit(1)
print("\nAll referenced paths exist.")
PY

Repository: AgentWorkforce/pear

Length of output: 1105

Add missing reactions and replies fixture files referenced in the adapter.

The adapter advertises reactions/replies resources with references to schema and example files, but verification confirms these fixtures do not exist:

evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json

evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json

evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json

evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json

Only the messages and users/{userId}/messages fixtures are present. Create the missing reactions and replies fixtures or remove their references from the adapter.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/fixtures/discovery/slack/.adapter.md` around lines 14 - 23, The adapter references reactions and replies resources with corresponding schema and create example fixture files, but these files do not exist in the repository. Either create the missing fixture files for the reactions and replies resources by adding .schema.json and .create.example.json files for each resource following the same structure as the existing messages fixtures, or remove the reactions and replies resource sections from the adapter documentation to ensure all advertised resources have their corresponding fixture files present.
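The read-only check from the analysis script could also live next to the harness as a TypeScript helper, so the adapter and its fixtures cannot drift apart unnoticed. This is a sketch under assumptions: `referencedDiscoveryPaths` and `missingFixtures` are hypothetical names, and the `exists` parameter is added only so the logic can be exercised without touching the disk.

```typescript
import { existsSync } from 'node:fs'
import { join } from 'node:path'

// Collect every backticked `discovery/slack/...` path mentioned in the adapter text.
function referencedDiscoveryPaths(adapterMarkdown: string): string[] {
  const pattern = /`(discovery\/slack\/[^`]+)`/g
  const paths: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(adapterMarkdown)) !== null) {
    paths.push(String(match[1]))
  }
  return paths
}

// Return the referenced paths that do not exist under fixturesRoot.
// `exists` defaults to the real filesystem check; it is injectable for testing.
function missingFixtures(
  adapterMarkdown: string,
  fixturesRoot: string,
  exists: (path: string) => boolean = existsSync,
): string[] {
  return referencedDiscoveryPaths(adapterMarkdown)
    .map((relativePath) => join(fixturesRoot, relativePath))
    .filter((candidate) => !exists(candidate))
}
```

Called with the contents of `evals/fixtures/discovery/slack/.adapter.md` and `evals/fixtures` as the root, a non-empty result would correspond to the missing reactions and replies fixtures listed above.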

coderabbitai · 2026-06-17T12:38:49Z

+  const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
+  const scenarios = scenarioArg
+    ? scenarioArg.split(',').map((id) => {
+        const s = scenarioById(id)
+        if (!s) throw new Error(`Unknown scenario: ${id}`)
+        return s
+      })
+    : SCENARIOS
+  const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
+  const model = modelArg ?? undefined


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate --variant and --repeat instead of unsafe casting.

At Line 50 and Line 58, raw CLI input is accepted without validation (as Variant[], unchecked parseInt). Unknown variants silently flow through, and invalid repeats (NaN, 0, negative) can produce invalid/empty cells and misleading percentages.

Suggested fix

- const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
+ const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS]
+ const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant))
+ if (invalidVariants.length) {
+   throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
+ }
+ const variants = parsedVariants as Variant[]
@@
- const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
+ const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3
+ if (!Number.isFinite(repeat) || repeat < 1) {
+   throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`)
+ }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-  const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
-  const scenarios = scenarioArg
-    ? scenarioArg.split(',').map((id) => {
-        const s = scenarioById(id)
-        if (!s) throw new Error(`Unknown scenario: ${id}`)
-        return s
-      })
-    : SCENARIOS
-  const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
-  const model = modelArg ?? undefined
+  const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS]
+  const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant))
+  if (invalidVariants.length) {
+    throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
+  }
+  const variants = parsedVariants as Variant[]
+  const scenarios = scenarioArg
+    ? scenarioArg.split(',').map((id) => {
+        const s = scenarioById(id)
+        if (!s) throw new Error(`Unknown scenario: ${id}`)
+        return s
+      })
+    : SCENARIOS
+  const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3
+  if (!Number.isFinite(repeat) || repeat < 1) {
+    throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`)
+  }
+  const model = modelArg ?? undefined

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@evals/runner.ts` around lines 50 - 59, The variants variable at line 50 uses unsafe casting (as Variant[]) without validating that the split variant strings are actually valid Variant types, and the repeat variable at line 58 uses parseInt without checking for invalid results like NaN, zero, or negative numbers. Validate each variant string against the known VARIANTS list similar to how scenarioById validates scenarios, throwing an error if an unknown variant is provided. For the repeat value, validate the parsed integer result to ensure it is a positive number greater than zero, throwing an error if the parsed value is invalid.
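The validation described above can also be factored into two small parsers with a type guard in place of the `as Variant[]` cast. This is a hedged sketch, not the code that was merged: `isVariant`, `parseVariants`, and `parseRepeat` are hypothetical names, and the variant list is copied from the five variants named in this PR. It is slightly stricter than the suggested fix, because `Number.parseInt('3abc', 10)` returns `3` and would pass a bare `repeat < 1` check.

```typescript
// The five guidance variants named in this PR.
const VARIANTS = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive'] as const
type Variant = (typeof VARIANTS)[number]

// Type guard: narrows an arbitrary string to Variant without an unsafe cast.
function isVariant(value: string): value is Variant {
  return (VARIANTS as readonly string[]).includes(value)
}

// Parse a comma-separated --variant value; fall back to all variants when absent.
function parseVariants(arg: string | undefined): Variant[] {
  if (!arg) return [...VARIANTS]
  const parsed = arg.split(',').map((v) => v.trim())
  const invalid = parsed.filter((v) => !isVariant(v))
  if (invalid.length) {
    throw new Error(`Unknown variant(s): ${invalid.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
  }
  return parsed.filter(isVariant)
}

// Parse --repeat as a positive integer; fall back to the default when absent.
function parseRepeat(arg: string | undefined, fallback = 3): number {
  if (!arg) return fallback
  const trimmed = arg.trim()
  // Require digits only, so inputs like "3abc" or "-2" are rejected outright.
  const value = /^\d+$/.test(trimmed) ? Number(trimmed) : Number.NaN
  if (!Number.isSafeInteger(value) || value < 1) {
    throw new Error(`--repeat must be a positive integer, received: ${arg}`)
  }
  return value
}
```

Whether to reject trailing garbage such as `3abc` is a design choice; the review only asks for `NaN`, zero, and negative values to be caught.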

coderabbitai · 2026-06-17T12:38:50Z

+# relayfile mount-root invariant incident
+
+- timestamp: 20260617T122713Z
+- local root: /home/daytona/workspace/memory/workspace


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace hardcoded developer path with a generic placeholder.

Line 4 hardcodes /home/daytona/workspace/memory/workspace, which exposes a specific developer's username and filesystem structure. This reduces the reusability of the incident documentation for other developers and poses unnecessary path exposure. Replace it with a generic placeholder that reflects the computed mount root location.

💡 Suggested replacement

- local root: /home/daytona/workspace/memory/workspace
+ local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations

Alternatively, reference the integrationMountRootForWorkspace(workspaceId) function or a similar pattern from the runtime code to make the path generic and descriptive.

📝 Committable suggestion


Suggested change

- local root: /home/daytona/workspace/memory/workspace

- local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@memory/INCIDENT-20260617T122713Z.md` at line 4, The incident documentation file contains a hardcoded developer-specific filesystem path on line 4 that exposes the username "daytona" and filesystem structure. Replace the hardcoded path `/home/daytona/workspace/memory/workspace` with a generic placeholder that represents the computed mount root location, such as referencing the `integrationMountRootForWorkspace(workspaceId)` function pattern or a descriptive placeholder like `{computed_mount_root}/workspace` to make the documentation reusable across different developer environments without exposing personal filesystem details.

gemini-code-assist Bot reviewed Jun 17, 2026


chatgpt-codex-connector Bot reviewed Jun 17, 2026


khaliqgant and others added 4 commits June 17, 2026 13:50

chore: apply pr-reviewer fixes for #375

0518e02

coderabbitai Bot reviewed Jun 17, 2026


khaliqgant merged commit 835651d into main Jun 17, 2026
5 checks passed

khaliqgant deleted the ar-267-relayfile-sdk-bump branch June 17, 2026 17:51

coderabbitai Bot mentioned this pull request Jun 18, 2026

Build integrations-update from @agent-relay/integration-prompts + suppress idle integrations #380

Merged

	const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
	const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive']

	lines.push(` payload: {"text":"<message>","userId":"<id>"}`)
	lines.push(` payload: {"text":"<message>"}`)

	payload: {"text":"<message>","userId":"<id>"}
	payload: {"text":"<message>"}

		// Re-export for convenience so callers don't need to import from two places
		export { snapshotMount, newMountFiles }

	- local root: /home/daytona/workspace/memory/workspace
	- local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations
