feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs #375

Merged: khaliqgant merged 5 commits into main from ar-267-relayfile-sdk-bump on Jun 17, 2026
@khaliqgant khaliqgant commented Jun 17, 2026

Summary

  • Eval harness (evals/): spawns real agents via the broker in a temp fixture dir with a fake .integrations/ mount, scores whether the agent wrote the correct path + valid JSON. 6 scenarios × 5 variants, runnable with npm run eval.
  • prescriptiveSpawnInstructions() on IntegrationsManager: compact write-path lookup table derived from real mount data — same source as initialSpawnInstructions, different format. No discovery reads required from the model.
  • broker:spawn-agent routing: non-claude CLIs (opencode, etc.) get the prescriptive format; claude gets the existing narrative full-inject format.
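
The routing in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the handler from this PR: `InstructionSource`, `pickSpawnInstructions`, `SpawnInstructionChoice`, and `stubSource` are hypothetical names, and the `projectId` parameter on both generators is an assumption. Only the `cli !== 'claude'` rule and the two method names come from the description.

```typescript
// Hypothetical stand-in for the two IntegrationsManager generators named in this PR.
// The (projectId) parameter is an assumption made for illustration.
interface InstructionSource {
  initialSpawnInstructions(projectId: string): string | undefined
  prescriptiveSpawnInstructions(projectId: string): string | undefined
}

interface SpawnInstructionChoice {
  format: 'narrative' | 'prescriptive'
  instructions: string | undefined
}

// Any CLI other than 'claude' gets the compact lookup table;
// 'claude' keeps the narrative full-inject format.
function pickSpawnInstructions(
  source: InstructionSource,
  cli: string,
  projectId: string
): SpawnInstructionChoice {
  const usePrescriptive = cli !== 'claude'
  return usePrescriptive
    ? { format: 'prescriptive', instructions: source.prescriptiveSpawnInstructions(projectId) }
    : { format: 'narrative', instructions: source.initialSpawnInstructions(projectId) }
}

// Stub source used only to exercise both branches.
const stubSource: InstructionSource = {
  initialSpawnInstructions: () => 'narrative instructions',
  prescriptiveSpawnInstructions: () => 'prescriptive lookup table'
}
```

Returning the chosen format alongside the string would let a caller decide whether to record delivery (narrative only) without re-deriving the branch.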

Eval results

| variant | deepseek-v4-flash-free | gpt-5.4-nano |
| --- | --- | --- |
| bare | 0% | 0% |
| slim-inject | 100% | 100% |
| full-inject | 100% | 100% |
| prescriptive | 100% (fastest) | 100% |

All free/Chinese/OpenAI models tested via opencode pass at 100% on prescriptive. bare always fails — no model self-discovers paths without guidance.

Key implementation notes

  • opencode headless transport: spawnCli({ transport: 'headless' }) + skipRelayPrompt: true (opencode has no agent-relay agent)
  • Default eval model: opencode/deepseek-v4-flash-free (free, ~18s/run)
  • Non-claude CLIs receive absolute fixture paths in the task prefix — opencode's cwd doesn't always match the project dir
  • recordSpawnInstructionDelivery guarded to narrative path only to avoid stale snippet cross-contamination

Test plan

  • npx vitest run src/main/integrations.test.ts — 54 tests pass (2 new for prescriptiveSpawnInstructions)
  • npm run eval -- --cli=opencode --variant=prescriptive --repeat=3 — 18/18
  • Spawn opencode agent in pear with a Slack/Linear integration connected — verify the prescriptive table appears in the task instead of the <integrations-update> block

🤖 Generated with Claude Code

…on-claude CLIs

Adds an eval harness (evals/) that spawns real agents via the broker in a
fixture dir with a fake .integrations/ mount, then scores whether the agent
wrote to the correct path with a valid JSON payload. Covers 6 scenarios
(Slack channel/DM, Linear create/update/comment/delete) across 5 guidance
variants (bare, claude-md, slim-inject, full-inject, prescriptive).
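
The scoring rule described above (correct path plus a valid JSON payload) could look something like the sketch below. `WrittenFile`, `Score`, and `scoreMountWrite` are hypothetical names, and the real harness may check more than this, for example required fields per scenario.

```typescript
// Hypothetical shapes for illustration; not the harness's actual types.
interface WrittenFile {
  path: string
  content: string
}

interface Score {
  pass: boolean
  reason: string
}

// Pass when at least one new .json file sits under the expected directory
// and its content parses to a JSON object.
function scoreMountWrite(files: WrittenFile[], expectedDir: string): Score {
  const dir = expectedDir.endsWith('/') ? expectedDir : `${expectedDir}/`
  const candidates = files.filter((f) => f.path.startsWith(dir) && f.path.endsWith('.json'))
  if (candidates.length === 0) {
    return { pass: false, reason: 'no .json file under expected path' }
  }
  for (const file of candidates) {
    try {
      const parsed: unknown = JSON.parse(file.content)
      if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
        return { pass: true, reason: `valid JSON object at ${file.path}` }
      }
    } catch {
      // Not valid JSON; keep checking the remaining candidates.
    }
  }
  return { pass: false, reason: 'file written but payload is not a JSON object' }
}
```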

Key findings from eval runs:
- `prescriptive` variant achieves 18/18 (100%) across all free and Chinese
  models (deepseek-v4-flash-free, mimo, nemotron, north-mini-code, gpt-5.4-nano,
  gpt-5.4-mini, gpt-5.1-codex-mini, gpt-5.5) — reliable for non-claude CLIs
- `full-inject` and `slim-inject` also reach 100% once absolute paths are
  injected for CLIs whose cwd doesn't match the project fixture dir
- `bare` fails universally — no model self-discovers integration paths

Harness changes:
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt`
- Default opencode model is `opencode/deepseek-v4-flash-free` (free, fast)
- Non-claude CLIs receive absolute fixture paths in the task prefix so writes
  land in the correct temp dir regardless of CLI cwd detection

Production wiring:
- `IntegrationsManager.prescriptiveSpawnInstructions()`: derives the lookup
  table from real `writebackCommandMountPaths` — same data as
  `initialSpawnInstructions`, compact format instead of narrative prose
- `broker:spawn-agent` IPC handler routes `cli !== 'claude'` to prescriptive;
  `recordSpawnInstructionDelivery` guarded to narrative path only

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 17, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5d0ba5dc-ac8d-4a4f-af0f-065387c9c65f

📥 Commits

Reviewing files that changed from the base of the PR and between 0518e02 and eb1def0.

📒 Files selected for processing (2)
  • src/main/integrations.test.ts
  • src/main/integrations.ts
📝 Walkthrough

Adds prescriptiveSpawnInstructions to IntegrationsManager and routes non-claude CLI agent spawns through it in the broker:spawn-agent IPC handler. Introduces a complete eval harness under evals/ with discovery fixture schemas, six mount scenarios, five task-prefix variants, a harness driver wrapper, a CLI runner with pass/fail scoring, and an HTML/JSON report generator.

Changes

Prescriptive Spawn Instructions (production)

| Layer / File(s) | Summary |
| --- | --- |
| IntegrationsManager.prescriptiveSpawnInstructions method and tests (`src/main/integrations.ts`, `src/main/integrations.test.ts`) | Adds the public method that builds compact, provider-specific JSON write-path instruction strings for Slack, Linear, and generic providers, returning undefined when no integrations are visible; three new tests assert content, DM shape, and the undefined return. |
| IPC handler routing and delivery-tracking gate (`src/main/ipc-handlers.ts`, `src/main/ipc-handlers.test.ts`) | Introduces a usePrescriptive flag in broker:spawn-agent derived from input.cli, switches between the two instruction generators, and gates recordSpawnInstructionDelivery to non-prescriptive spawns; two new tests verify non-claude vs claude branching. |

Eval Harness Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| Discovery fixture schemas, examples, and adapter docs (`evals/fixtures/discovery/linear/...`, `evals/fixtures/discovery/slack/...`) | Adds JSON Schemas, create/update/delete example JSON fixtures, and adapter markdown contracts for Linear issues (including comments) and Slack channel/DM messages, defining the writeback payload shapes agents must follow. |
| Fixture helper, variants, and scenario definitions (`evals/fixture.ts`, `evals/variants.ts`, `evals/scenarios/*`) | createFixture scaffolds a temp directory with discovery schemas and writable provider paths; variants.ts defines five task-prefix variants; six MountScenario exports cover Slack post, Slack DM, and Linear create/update/comment/delete; scenarios/index.ts aggregates them. |
| Harness driver wrapper (`evals/harness.ts`) | Caches a Relaycast workspace key, implements runEval which spawns a HarnessDriverClient and an agent (headless opencode or PTY), collects broker events with a 3-minute timeout, and returns { agentName, exit, events, durationMs }. |
| CLI runner, report generator, and package wiring (`evals/runner.ts`, `evals/report.ts`, `package.json`, `.gitignore`) | runner.ts parses CLI flags, iterates scenario/variant/repeat cells, creates fixtures, runs evals, scores outputs, and prints a results table; report.ts writes timestamped JSON and HTML reports; package.json adds eval/eval:quick scripts, @agent-relay/evals, and tsx; .gitignore excludes report output files. |

Incident Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Mount-root invariant incident record (`memory/INCIDENT-20260617T122713Z.md`) | Records a missing local mount root failure with recovery instructions for --reset-after-clobber / RELAYFILE_RESET_AFTER_CLOBBER=1. |

Sequence Diagram

sequenceDiagram
  participant CLI as CLI (non-claude)
  participant IPC as broker:spawn-agent
  participant IM as IntegrationsManager
  participant HarnessDriver as HarnessDriverClient
  participant Agent as opencode / PTY agent
  participant Mount as .integrations/ filesystem

  CLI->>IPC: spawn-agent { cli, projectId, task }
  IPC->>IM: prescriptiveSpawnInstructions(projectId)
  IM-->>IPC: write-path instruction string
  IPC->>HarnessDriver: spawn agent with task + instructions
  HarnessDriver->>Agent: start headless/PTY process
  Agent->>Mount: write JSON file under .integrations/<provider>/
  Mount-->>HarnessDriver: file system event
  HarnessDriver-->>IPC: AgentExitInfo + BrokerEvents
  IPC-->>CLI: EvalRunResult { exit, events, durationMs }

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main changes: introducing integration mount evaluation infrastructure and adaptive routing of prescriptive spawn instructions for non-Claude CLIs. |
| Description check | ✅ Passed | The description thoroughly documents the eval harness, prescriptive spawn instructions feature, broker routing logic, evaluation results, implementation notes, and test plan, all directly related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new 'prescriptive' onboarding variant for integration mount evaluations, designed to provide a compact, explicit write-path lookup table with exact paths and minimal payload schemas for non-Claude CLIs. Key feedback on these changes includes: adding the missing 'prescriptive' variant to the VARIANT_ORDER array in evals/report.ts to fix HTML report sorting; removing a redundant explanation about <channelDir> in prescriptiveSpawnInstructions since the path is already fully resolved; and removing the redundant userId field from the Slack DM payload templates in both src/main/integrations.ts and evals/variants.ts to align with the actual schema.

Comment thread evals/report.ts Outdated
console.log(`HTML report: ${htmlPath}`)
}

const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']

medium

The newly introduced prescriptive variant is missing from the VARIANT_ORDER array. This causes the prescriptive column to be sorted incorrectly (at the very beginning, before bare) in the generated HTML report. Adding it to the end of the array maintains the intended order from lightest to heaviest guidance.

Suggested change
- const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
+ const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive']
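
One plausible reason a missing entry sorts first: if the report orders columns by `indexOf` against `VARIANT_ORDER`, an unlisted variant yields `-1`, which ranks below every listed index. The comparator below is an assumption about how evals/report.ts sorts, not a copy of it; `sortVariants`, `OLD_ORDER`, and `NEW_ORDER` are illustrative names.

```typescript
const OLD_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
const NEW_ORDER = [...OLD_ORDER, 'prescriptive']

// Hypothetical comparator: rank each variant by its position in the order array.
// indexOf returns -1 for names that are not listed, so they sort to the front.
function sortVariants(variants: string[], order: string[]): string[] {
  return [...variants].sort((a, b) => order.indexOf(a) - order.indexOf(b))
}

const seen = ['full-inject', 'prescriptive', 'bare', 'slim-inject']
const before = sortVariants(seen, OLD_ORDER)
const after = sortVariants(seen, NEW_ORDER)
console.log(before.join(', ')) // prescriptive, bare, slim-inject, full-inject
console.log(after.join(', ')) // bare, slim-inject, full-inject, prescriptive
```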

Comment thread src/main/integrations.ts Outdated
Comment on lines +2312 to +2314
lines.push(` Slack channel message → ${p}/<name>.json`)
lines.push(` <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)

medium

In prescriptiveSpawnInstructions, the path p is already a fully resolved, concrete path (e.g., .integrations/slack/channels/C12345__general/messages). It does not contain the <channelDir> placeholder. Therefore, the explanation line about <channelDir> is redundant and potentially confusing to the agent, and should be removed.

Suggested change
- lines.push(` Slack channel message → ${p}/<name>.json`)
- lines.push(` <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
- lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)
+ lines.push(` Slack channel message → ${p}/<name>.json`)
+ lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)

Comment thread src/main/integrations.ts Outdated
if (dmPaths.length > 0 || channelPaths.length === 0) {
const dmBase = dmPaths[0] ?? `${PROJECT_INTEGRATIONS_LINK_NAME}/slack/users/<userId>/messages`
lines.push(` Slack DM → ${dmBase}/<name>.json`)
lines.push(` payload: {"text":"<message>","userId":"<id>"}`)

medium

The Slack direct message schema (evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json) and its create example do not contain a userId property in the payload, as the user ID is already determined from the path. Specifying "userId":"<id>" in the payload description is redundant and deviates from the schema.

Suggested change
- lines.push(` payload: {"text":"<message>","userId":"<id>"}`)
+ lines.push(` payload: {"text":"<message>"}`)

Comment thread evals/variants.ts Outdated
<channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)
payload: {"text":"<message>","channelId":"<channelId>"}
Slack DM → ${base}/slack/users/<userId>/messages/<name>.json
payload: {"text":"<message>","userId":"<id>"}

medium

To keep the prescriptive instructions consistent with the Slack DM schema and create example (which do not use a userId property in the payload), the "userId":"<id>" field should be removed from the payload template.

Suggested change
- payload: {"text":"<message>","userId":"<id>"}
+ payload: {"text":"<message>"}

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdfda7cbee

Comment thread src/main/integrations.ts Outdated
lines.push(` payload: {"id":"<issueId>","_action":"update",<fields to change>}`)
lines.push(` Linear delete issue → ${issueBase}/<name>.json`)
lines.push(` payload: {"id":"<issueId>","_action":"delete"}`)
lines.push(` Linear comment → ${issueBase}/<issueId>/comments/<name>.json`)

P1: Use the canonical Linear issue file for comments

For non-Claude agents with a Linear integration, this instruction is the exact path they are told to use, but Linear comment writeback is keyed off the canonical issue resource filename returned from /linear/issues (for example KEY-123__uuid.json), not an arbitrary <issueId> directory. The existing linearIssueCommentRemotePath helper only accepts /linear/issues/<identifier>__<uuid>.json/comments/... and rejects UUID-only issue paths, so agents following this prompt can write files that the local mount accepts but that do not create visible Linear comments.
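
To make the shape concrete, here is a rough sketch of a check for the canonical comment path. It is not the existing `linearIssueCommentRemotePath` helper: `isCanonicalLinearCommentPath` and `CANONICAL_ISSUE_FILE` are hypothetical names, and the exact identifier and UUID patterns are assumptions inferred from the `KEY-123__uuid.json` example.

```typescript
// Assumed shape: <TEAMKEY>-<number>__<uuid>.json (patterns are illustrative guesses).
const CANONICAL_ISSUE_FILE =
  /^[A-Za-z][A-Za-z0-9]*-\d+__[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\.json$/i

// Hypothetical validator: accepts /linear/issues/<identifier>__<uuid>.json/comments/<name>.json
// and rejects UUID-only or bare-id issue segments.
function isCanonicalLinearCommentPath(remotePath: string): boolean {
  const parts = remotePath.split('/').filter(Boolean)
  if (parts.length !== 5) return false
  const [provider, collection, issueFile = '', comments, name = ''] = parts
  return (
    provider === 'linear' &&
    collection === 'issues' &&
    CANONICAL_ISSUE_FILE.test(issueFile) &&
    comments === 'comments' &&
    name.endsWith('.json')
  )
}
```

A path built from a bare issue UUID directory fails this check, which matches the failure mode described above: the local mount accepts the write, but no comment appears in Linear.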

Comment thread package.json Outdated
"protobufjs": "8.5.0"
},
"devDependencies": {
"@agent-relay/evals": "file:../relay/packages/evals",

P2: Replace the local evals file dependency

This makes every fresh Pear install depend on a sibling checkout at ../relay/packages/evals; in a standalone clone or CI workspace without that directory, npm cannot resolve @agent-relay/evals, so the newly added eval scripts and even a normal dev install fail before they can run. Since the eval files import this package directly, this needs to be a published/versioned dependency or vendored into this repo rather than a machine-local path.

khaliqgant and others added 4 commits June 17, 2026 13:50
CI fix:
- ipc-handlers.test.ts mocked integrationsManager was missing
  prescriptiveSpawnInstructions/recordSpawnInstructionDelivery, so the
  broker:spawn-agent test (cli: 'codex') threw "not a function". Added the
  mocks plus focused routing tests (non-claude → prescriptive + no delivery
  record; claude → narrative + delivery record).

Review feedback:
- package.json: depend on published @agent-relay/evals@^8.8.2 instead of
  file:../relay/packages/evals so fresh clones / CI can resolve it (codex P2)
- integrations.ts: Linear comment path now references the canonical issue
  resource file (<KEY>-<num>__<uuid>.json) instead of a bare <issueId> dir —
  the local mount's linearIssueCommentRemotePath rejects UUID-less paths, so
  the old instruction produced files that never became visible comments (codex P1)
- integrations.ts + variants.ts: drop redundant "userId" from the Slack DM
  payload (path-derived; matches the discovery schema) (gemini)
- integrations.ts: remove the stale <channelDir> note — the emitted path is
  already concrete (gemini)
- report.ts: add 'prescriptive' to VARIANT_ORDER so the HTML report column
  sorts last instead of first (gemini)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ter-driven

Removes the hardcoded per-provider branches (and the interim curated payload
map) from prescriptiveSpawnInstructions. Writable resources + path templates
now come entirely from each provider's discovery `.adapter.md` (shipped by
relayfile-adapters), and payload shape is pointed at that resource's discovery
`.create.example.json`. No per-provider knowledge lives in pear, so a new
integration works with zero code change here.

- Parse the adapter doc's "Writable resources" section (provider-agnostic)
- Resolve each resource's concrete, in-scope path from the integration's
  writeback mount roots; preserve {id} placeholders for nested resources
- Point at the adapter's create example for fields instead of inlining payloads
- Graceful fallback to a discovery pointer when the adapter doc isn't mounted

Known gap tracked upstream: the local mount currently serves discovery inferred
from synced read records (which omits required write fields like Slack's
`text`), so the pointed-at example is imperfect until that's fixed —
AgentWorkforce/relayfile#299. The adapters already publish correct write-shaped
discovery; the fix belongs in the mount/sync pipeline, not pear.
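
A rough sketch of the provider-agnostic parsing step, under a loud assumption: the layout of `.adapter.md` is not shown in this PR, so the example assumes a "Writable resources" heading followed by bullets whose first backticked span is a path template. `parseWritableResources` and `sampleDoc` are hypothetical.

```typescript
// Assumed adapter doc layout (not confirmed by this PR):
//   ## Writable resources
//   - `channels/{channelId}/messages` ...
function parseWritableResources(adapterDoc: string): string[] {
  const templates: string[] = []
  let inSection = false
  for (const line of adapterDoc.split(/\r?\n/)) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line)
    if (heading) {
      // Enter the section on its heading; leave it on any other heading.
      inSection = (heading[1] ?? '').trim().toLowerCase() === 'writable resources'
      continue
    }
    if (!inSection) continue
    // Take the first backticked span of each bullet as the path template.
    const bullet = /^\s*[-*]\s+`([^`]+)`/.exec(line)
    if (bullet && bullet[1]) templates.push(bullet[1])
  }
  return templates
}

const sampleDoc = [
  '# Slack adapter',
  '## Writable resources',
  '- `channels/{channelId}/messages` post a message',
  '- `users/{userId}/messages` send a DM',
  '## Read-only resources',
  '- `channels/{channelId}/members`'
].join('\n')

console.log(parseWritableResources(sampleDoc))
```

Keeping `{id}`-style placeholders intact in the returned templates leaves room for the later step that resolves them against the integration's writeback mount roots.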

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
agent-relay-code[bot] pushed an unrelated mount-root "incident" note (from its
own Daytona sandbox) onto this PR; it references a non-existent doc and has
nothing to do with the prescriptive-spawn/evals change. Removing it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (1)
src/main/integrations.test.ts (1)

1543-1568: ⚡ Quick win

Add a prescriptive DM-path test for /slack/dms/<id>/messages.

Current coverage validates only the /users/ DM variant. Add a sibling case for a /dms/ mount to lock the intended routing and prevent fallback-path regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/integrations.test.ts` around lines 1543 - 1568, Add a new sibling
test case after the existing test 'emits a text-only Slack DM payload when a
user mount is present'. The new test should follow the same structure and
assertions but use a mountPath with the `/slack/dms/<id>/messages` variant
instead of `/slack/users/U67890EVAL/messages` to ensure both DM path formats are
properly validated and prevent regressions. Verify that the IntegrationsManager
correctly generates the prescriptive spawn instructions for the DMs path
variant, including the text-only payload structure and absence of userId field.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@evals/fixture.ts`:
- Around line 54-55: The re-export statement is exporting snapshotMount and
newMountFiles, but these functions cannot be verified as public APIs in the
`@agent-relay/evals` v8.8.2 package. Verify that both snapshotMount and
newMountFiles actually exist as exported functions in the
`@agent-relay/evals/scoring/mount` module (or wherever they are currently imported
from). If they don't exist as public exports, either locate where these
utilities are actually defined and update the import path accordingly, or remove
the re-exports if they are not intended for external use.

In `@evals/fixtures/discovery/slack/.adapter.md`:
- Around line 14-23: The adapter references reactions and replies resources with
corresponding schema and create example fixture files, but these files do not
exist in the repository. Either create the missing fixture files for the
reactions and replies resources by adding .schema.json and .create.example.json
files for each resource following the same structure as the existing messages
fixtures, or remove the reactions and replies resource sections from the adapter
documentation to ensure all advertised resources have their corresponding
fixture files present.

In `@evals/runner.ts`:
- Around line 50-59: The variants variable at line 50 uses unsafe casting (as
Variant[]) without validating that the split variant strings are actually valid
Variant types, and the repeat variable at line 58 uses parseInt without checking
for invalid results like NaN, zero, or negative numbers. Validate each variant
string against the known VARIANTS list similar to how scenarioById validates
scenarios, throwing an error if an unknown variant is provided. For the repeat
value, validate the parsed integer result to ensure it is a positive number
greater than zero, throwing an error if the parsed value is invalid.
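
A minimal sketch of the validation this finding asks for. `parseVariants` and `parseRepeat` are hypothetical helper names, and the `VARIANTS` list is taken from the five variant names in the PR description; the real runner may define it differently.

```typescript
const VARIANTS = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive'] as const
type Variant = (typeof VARIANTS)[number]

// Validate each comma-separated name instead of casting with `as Variant[]`.
function parseVariants(raw: string): Variant[] {
  return raw
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((name) => {
      const match = VARIANTS.find((v) => v === name)
      if (!match) {
        throw new Error(`Unknown variant "${name}". Known: ${VARIANTS.join(', ')}`)
      }
      return match
    })
}

// Reject NaN, zero, and negative values rather than passing them to the loop.
function parseRepeat(raw: string): number {
  const n = Number.parseInt(raw, 10)
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`--repeat must be a positive integer, got "${raw}"`)
  }
  return n
}
```

Note that `Number.parseInt('3x', 10)` still returns 3; a stricter check (for example a `/^\d+$/` test on the raw string) would be needed to reject trailing garbage.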

In `@memory/INCIDENT-20260617T122713Z.md`:
- Line 4: The incident documentation file contains a hardcoded
developer-specific filesystem path on line 4 that exposes the username "daytona"
and filesystem structure. Replace the hardcoded path
`/home/daytona/workspace/memory/workspace` with a generic placeholder that
represents the computed mount root location, such as referencing the
`integrationMountRootForWorkspace(workspaceId)` function pattern or a
descriptive placeholder like `{computed_mount_root}/workspace` to make the
documentation reusable across different developer environments without exposing
personal filesystem details.
- Around line 17-18: The documentation reference in INCIDENT-20260617T122713Z.md
on lines 17-18 points to a non-existent file
`docs/architecture/mount-invariants.md` and the `docs/architecture` directory
does not exist. Either create the missing documentation file with the
appropriate content about protected invariants and recovery procedures, or
remove the broken reference from lines 17-18 of the incident report. Choose the
option that aligns with your documentation strategy.

In `@src/main/integrations.ts`:
- Around line 2310-2319: The dmPaths filter at line 2310 only checks for paths
containing /users/, which misses DM paths using the /dms/ pattern in the Slack
mount. Update the filter to also match paths that include /dms/ so that when the
connected Slack mount resolves to .integrations/slack/dms/<id>/messages, it is
properly detected as a DM path instead of falling back to the incorrect
/users/<userId>/messages path. Modify the filter predicate on writebackPaths to
check for both /users/ and /dms/ patterns.
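
The filter change in the last finding could look roughly like this. `splitSlackWritebackPaths` is a hypothetical helper, not the code at lines 2310-2319, and the substring checks simply mirror the `/users/` test the finding describes.

```typescript
interface SlackPathGroups {
  dmPaths: string[]
  channelPaths: string[]
}

// Treat both /users/ and /dms/ mounts as DM paths so a /dms/<id>/messages
// mount is not replaced by the /users/<userId>/messages fallback.
function splitSlackWritebackPaths(writebackPaths: string[]): SlackPathGroups {
  const isDm = (p: string): boolean => p.includes('/users/') || p.includes('/dms/')
  return {
    dmPaths: writebackPaths.filter(isDm),
    channelPaths: writebackPaths.filter((p) => p.includes('/channels/'))
  }
}

const groups = splitSlackWritebackPaths([
  '.integrations/slack/channels/C12345__general/messages',
  '.integrations/slack/dms/D999/messages',
  '.integrations/slack/users/U67890EVAL/messages'
])
console.log(groups.dmPaths.length) // 2
```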

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 85145ef4-22cd-4785-ab14-2b9c1e2512ac

📥 Commits

Reviewing files that changed from the base of the PR and between 5e2489f and 0518e02.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (32)
  • .gitignore
  • evals/fixture.ts
  • evals/fixtures/discovery/linear/.adapter.md
  • evals/fixtures/discovery/linear/issues/.create.example.json
  • evals/fixtures/discovery/linear/issues/.delete.example.json
  • evals/fixtures/discovery/linear/issues/.schema.json
  • evals/fixtures/discovery/linear/issues/.update.example.json
  • evals/fixtures/discovery/linear/issues/{issueId}/comments/.create.example.json
  • evals/fixtures/discovery/linear/issues/{issueId}/comments/.schema.json
  • evals/fixtures/discovery/slack/.adapter.md
  • evals/fixtures/discovery/slack/channels/{channelId}/messages/.create.example.json
  • evals/fixtures/discovery/slack/channels/{channelId}/messages/.schema.json
  • evals/fixtures/discovery/slack/users/{userId}/messages/.create.example.json
  • evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json
  • evals/harness.ts
  • evals/report.ts
  • evals/reports/.gitkeep
  • evals/runner.ts
  • evals/scenarios/index.ts
  • evals/scenarios/s01-slack-post.ts
  • evals/scenarios/s02-slack-dm.ts
  • evals/scenarios/s03-linear-create.ts
  • evals/scenarios/s04-linear-update.ts
  • evals/scenarios/s05-linear-comment.ts
  • evals/scenarios/s06-linear-delete.ts
  • evals/variants.ts
  • memory/INCIDENT-20260617T122713Z.md
  • package.json
  • src/main/integrations.test.ts
  • src/main/integrations.ts
  • src/main/ipc-handlers.test.ts
  • src/main/ipc-handlers.ts

Comment thread evals/fixture.ts
Comment on lines +54 to +55
// Re-export for convenience so callers don't need to import from two places
export { snapshotMount, newMountFiles }

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

What functions does @agent-relay/evals v8.8.2 export from the /scoring/mount subpath?

💡 Result:

The @agent-relay/evals package (part of the Agent Relay monorepo) exports several functions from its /scoring/mount subpath, which are utilized for evaluating agent interactions within isolated or configured environments [1]. Based on the recent development history for @agent-relay/evals (which tracks alongside the broader Agent Relay v8.x series), the /scoring/mount subpath provides access to logic handling fixture-backed and provider-backed mounts for evaluation suites [1][2]. While exact function signatures are subject to the specific internal implementation in v8.8.2, the exports generally include utilities for mount lifecycle and protocol validation, such as:

  • Mount lifecycle management helpers [1]
  • Stream cleaning and protocol verification tools [1]
  • Integration-specific check functions (e.g., toolcheck, phantom, and native-subagent protocols) [1]

These exports are part of the shared evaluation harness designed to support deterministic VFS (Virtual File System) and ACL testing in agent scenarios [1][2]. For the most accurate and precise list of exported identifiers in v8.8.2, you should consult the packages/evals/package.json file within the official AgentWorkforce/relay repository, which defines these subpath exports, or inspect the src/scoring/mount/index.ts file in the same package [1].

🌐 Web query:

snapshotMount newMountFiles @agent-relay/evals v8.8.2

💡 Result:

The terms "snapshotMount" and "newMountFiles" are not standard public APIs or documented functions within the @agent-relay/evals package or the broader AgentWorkforce/relay ecosystem as of June 2026. Based on current documentation:

  1. The @agent-relay/evals package is primarily a shared test harness package within the Relay monorepo, used for running deterministic evaluation suites and managing tool-based evaluations [1].
  2. Relayfile (the integration filesystem for AI agents) uses a different set of APIs for managing mounts, such as ensureMountedWorkspace and mountWorkspace [2][3][4].
  3. The names snapshotMount and newMountFiles likely represent internal identifiers, private methods, or custom utilities specific to a local implementation or a transient experimental branch within the AgentWorkforce/relay codebase, rather than stable public interfaces [1].

If you are encountering these terms in a stack trace or code, they are most likely internal implementation details within the @agent-relay/evals harness driver or a specific test case runner [1]. You should check the repository's source code directly at github.com/AgentWorkforce/relay for the most accurate context regarding these specific identifiers [1][5].


🌐 Web query:

@agent-relay/evals scoring mount index.ts snapshotMount newMountFiles

💡 Result:

In the @agent-relay/evals package, the terms snapshotMount and newMountFiles relate to the state management and file-system handling within the framework's evaluation harnesses [1]. The @agent-relay/evals package is part of the Agent Relay ecosystem [1]. Its scoring infrastructure includes integration tests and harness utilities designed to handle VFS (Virtual Filesystem) and mount-backed scenarios [1].

1. snapshotMount: This refers to the mechanism used to capture the state of a filesystem mount, ensuring that evaluation cases can run deterministically [2]. By creating a snapshot, the system can verify filesystem changes, ACLs, and concurrency behaviors in an isolated, repeatable environment [2].
2. newMountFiles: This typically represents the set of files or changes generated or modified during an evaluation run [1][3]. The framework manages these by comparing the state after an agent's execution against the initial snapshot [1]. The system uses these to perform diffing, validation, or to sync writable changes back to the source project after an evaluation or agent task is complete [3].

These functions are part of the broader Agent Relay strategy to provide agents with a reliable, file-based interface for interacting with integrated systems (like Notion, Linear, GitHub, etc.) [4], where complex API interactions are normalized into a local filesystem [5][4]. The evaluation harness specifically leverages these patterns to ensure that agent behaviors (such as reading, writing, and coordinating through these mounts) are correctly scored and verified [1][6].


Fix the re-exports: snapshotMount and newMountFiles are not documented or standard public APIs in @agent-relay/evals v8.8.2.

The re-export at lines 54-55 references functions that appear to be internal/custom utilities rather than stable public exports from @agent-relay/evals/scoring/mount. These function names are not confirmed to exist as public APIs in v8.8.2. Verify that the source functions actually exist and are properly exported, or update the import path to the correct internal modules where these are defined.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/fixture.ts` around lines 54 - 55, The re-export statement is exporting
snapshotMount and newMountFiles, but these functions cannot be verified as
public APIs in the `@agent-relay/evals` v8.8.2 package. Verify that both
snapshotMount and newMountFiles actually exist as exported functions in the
`@agent-relay/evals/scoring/mount` module (or wherever they are currently imported
from). If they don't exist as public exports, either locate where these
utilities are actually defined and update the import path accordingly, or remove
the re-exports if they are not intended for external use.
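
One way to run the verification described above is a small runtime check on the imported module object. This is a minimal sketch, not part of the PR: `missingFunctionExports` is a hypothetical helper, and `fakeMountModule` is a stand-in for whatever `await import('@agent-relay/evals/scoring/mount')` actually resolves to, whose real shape is not confirmed here.

```typescript
// Hypothetical helper: list the expected names that a module namespace
// object does not expose as functions.
function missingFunctionExports(
  moduleNamespace: Record<string, unknown>,
  expectedNames: readonly string[],
): string[] {
  return expectedNames.filter((name) => typeof moduleNamespace[name] !== 'function')
}

// Stand-in for the real module namespace (an assumption for illustration only).
// In the repository this would be the result of a dynamic import of the
// `@agent-relay/evals/scoring/mount` subpath.
const fakeMountModule: Record<string, unknown> = {
  snapshotMount: () => new Map<string, string>(),
}

const missing = missingFunctionExports(fakeMountModule, ['snapshotMount', 'newMountFiles'])
console.log(missing)
```

An empty result would indicate both names are exported as functions; any name returned needs either a corrected import path or removal from the re-export.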

Comment on lines +14 to +23
### reactions — `/slack/channels/{channelId}/messages/{messageTs}/reactions`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json`

### replies — `/slack/channels/{channelId}/messages/{messageTs}/replies`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json`

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification: ensure every path referenced in the Slack adapter exists.
set -euo pipefail

adapter="evals/fixtures/discovery/slack/.adapter.md"

# Extract backticked discovery paths and check existence under evals/fixtures.
python - <<'PY'
import re
from pathlib import Path

adapter = Path("evals/fixtures/discovery/slack/.adapter.md").read_text()
paths = re.findall(r'`(discovery/slack/[^`]+)`', adapter)

missing = []
for p in paths:
    candidate = Path("evals/fixtures") / p
    if not candidate.exists():
        missing.append(str(candidate))

print("Referenced paths:")
for p in paths:
    print(" -", p)

if missing:
    print("\nMissing paths:")
    for m in missing:
        print(" -", m)
    raise SystemExit(1)

print("\nAll referenced paths exist.")
PY

Repository: AgentWorkforce/pear

Length of output: 1105


Add missing reactions and replies fixture files referenced in the adapter.

The adapter advertises reactions/replies resources with references to schema and example files, but verification confirms these fixtures do not exist:

  • evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json
  • evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json
  • evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json
  • evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json

Only the messages and users/{userId}/messages fixtures are present. Create the missing reactions and replies fixtures or remove their references from the adapter.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/fixtures/discovery/slack/.adapter.md` around lines 14 - 23, The adapter
references reactions and replies resources with corresponding schema and create
example fixture files, but these files do not exist in the repository. Either
create the missing fixture files for the reactions and replies resources by
adding .schema.json and .create.example.json files for each resource following
the same structure as the existing messages fixtures, or remove the reactions
and replies resource sections from the adapter documentation to ensure all
advertised resources have their corresponding fixture files present.
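
The same check could live next to the eval harness as a TypeScript helper, so the adapter and its fixtures are validated together. The sketch below is an assumption-laden illustration: `extractDiscoveryPaths` and `findMissingFixtures` are hypothetical names, and the existence check is injected as a callback (here backed by an in-memory set) rather than touching the real filesystem.

```typescript
// Pull every backticked `discovery/slack/...` path out of adapter markdown.
function extractDiscoveryPaths(markdown: string): string[] {
  const pattern = /`(discovery\/slack\/[^`]+)`/g
  const paths: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(markdown)) !== null) {
    const captured = match[1]
    if (captured !== undefined) paths.push(captured)
  }
  return paths
}

// Return the full fixture paths for which the injected `exists` check fails.
function findMissingFixtures(
  paths: readonly string[],
  fixtureRoot: string,
  exists: (fullPath: string) => boolean,
): string[] {
  return paths.map((p) => `${fixtureRoot}/${p}`).filter((full) => !exists(full))
}

// Two lines copied from the adapter section under review.
const adapterExcerpt = [
  '- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json`',
  '- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json`',
].join('\n')

// Empty on purpose, mirroring the finding that these fixtures are absent.
const presentFiles = new Set<string>()

const missingFixtures = findMissingFixtures(
  extractDiscoveryPaths(adapterExcerpt),
  'evals/fixtures',
  (fullPath) => presentFiles.has(fullPath),
)
console.log(missingFixtures)
```

In a real test the callback would presumably wrap a filesystem existence check; injecting it keeps the path-extraction logic testable on its own.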

Comment thread evals/runner.ts
Comment on lines +50 to +59
const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
const scenarios = scenarioArg
? scenarioArg.split(',').map((id) => {
const s = scenarioById(id)
if (!s) throw new Error(`Unknown scenario: ${id}`)
return s
})
: SCENARIOS
const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
const model = modelArg ?? undefined

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate --variant and --repeat instead of unsafe casting.

At Line 50 and Line 58, raw CLI input is accepted without validation (as Variant[], unchecked parseInt). Unknown variants silently flow through, and invalid repeats (NaN, 0, negative) can produce invalid/empty cells and misleading percentages.

Suggested fix
-  const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
+  const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS]
+  const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant))
+  if (invalidVariants.length) {
+    throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
+  }
+  const variants = parsedVariants as Variant[]
@@
-  const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
+  const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3
+  if (!Number.isFinite(repeat) || repeat < 1) {
+    throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`)
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
- const scenarios = scenarioArg
-   ? scenarioArg.split(',').map((id) => {
-       const s = scenarioById(id)
-       if (!s) throw new Error(`Unknown scenario: ${id}`)
-       return s
-     })
-   : SCENARIOS
- const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
- const model = modelArg ?? undefined
+ const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS]
+ const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant))
+ if (invalidVariants.length) {
+   throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
+ }
+ const variants = parsedVariants as Variant[]
+ const scenarios = scenarioArg
+   ? scenarioArg.split(',').map((id) => {
+       const s = scenarioById(id)
+       if (!s) throw new Error(`Unknown scenario: ${id}`)
+       return s
+     })
+   : SCENARIOS
+ const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3
+ if (!Number.isFinite(repeat) || repeat < 1) {
+   throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`)
+ }
+ const model = modelArg ?? undefined
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@evals/runner.ts` around lines 50 - 59, The variants variable at line 50 uses
unsafe casting (as Variant[]) without validating that the split variant strings
are actually valid Variant types, and the repeat variable at line 58 uses
parseInt without checking for invalid results like NaN, zero, or negative
numbers. Validate each variant string against the known VARIANTS list similar to
how scenarioById validates scenarios, throwing an error if an unknown variant is
provided. For the repeat value, validate the parsed integer result to ensure it
is a positive number greater than zero, throwing an error if the parsed value is
invalid.
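
The validation described in this prompt can also be factored into two small pure functions, which makes it unit-testable without invoking the runner. The following is a sketch under stated assumptions: `parseVariants`, `parseRepeat`, and `isVariant` are hypothetical names, and the `VARIANTS` list contains only the four variants named in the PR's results table (the summary mentions five, so the real list differs). Unlike the `parseInt` check in the suggestion, this version also rejects input with trailing junk such as `3abc`.

```typescript
// Illustrative subset only; the real VARIANTS constant lives in the eval harness.
const VARIANTS = ['bare', 'slim-inject', 'full-inject', 'prescriptive'] as const
type Variant = (typeof VARIANTS)[number]

// Type guard so no unchecked `as Variant[]` cast is needed.
function isVariant(value: string): value is Variant {
  return (VARIANTS as readonly string[]).includes(value)
}

// Parse a comma-separated --variant value; a missing or empty value means "all variants".
function parseVariants(arg: string | undefined): Variant[] {
  if (!arg) return [...VARIANTS]
  const parsed = arg.split(',').map((v) => v.trim())
  const invalid = parsed.filter((v) => !isVariant(v))
  if (invalid.length > 0) {
    throw new Error(`Unknown variant(s): ${invalid.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
  }
  return parsed.filter(isVariant)
}

// Parse --repeat; only plain positive decimal integers are accepted.
function parseRepeat(arg: string | undefined, fallback = 3): number {
  if (!arg) return fallback
  const trimmed = arg.trim()
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`--repeat must be a positive integer, received: ${arg}`)
  }
  return Number.parseInt(trimmed, 10)
}

console.log(parseVariants('bare, prescriptive'), parseRepeat('3'))
```

Whether rejecting `3abc` is desirable is a judgment call; the regular-expression check simply trades leniency for a clearer error.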

Comment thread memory/INCIDENT-20260617T122713Z.md Outdated
# relayfile mount-root invariant incident

- timestamp: 20260617T122713Z
- local root: /home/daytona/workspace/memory/workspace

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Replace hardcoded developer path with a generic placeholder.

Line 4 hardcodes /home/daytona/workspace/memory/workspace, which exposes a specific developer's username and filesystem structure. This reduces the reusability of the incident documentation for other developers and poses unnecessary path exposure. Replace it with a generic placeholder that reflects the computed mount root location.

💡 Suggested replacement
- local root: /home/daytona/workspace/memory/workspace
+ local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations

Alternatively, reference the integrationMountRootForWorkspace(workspaceId) function or a similar pattern from the runtime code to make the path generic and descriptive.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- local root: /home/daytona/workspace/memory/workspace
+ local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@memory/INCIDENT-20260617T122713Z.md` at line 4, The incident documentation
file contains a hardcoded developer-specific filesystem path on line 4 that
exposes the username "daytona" and filesystem structure. Replace the hardcoded
path `/home/daytona/workspace/memory/workspace` with a generic placeholder that
represents the computed mount root location, such as referencing the
`integrationMountRootForWorkspace(workspaceId)` function pattern or a
descriptive placeholder like `{computed_mount_root}/workspace` to make the
documentation reusable across different developer environments without exposing
personal filesystem details.
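
If incident files like this one are generated by code, the path could be made generic at write time instead of being edited by hand. The sketch below is only an illustration of that idea: `redactHomePath` is a hypothetical helper that is not in the PR, and it produces a `~`-relative path rather than the `<workspace-id>` placeholder suggested above.

```typescript
// Hypothetical helper: replace a leading home directory with `~` so that
// generated documents do not embed a specific user's filesystem layout.
function redactHomePath(path: string, homeDir: string): string {
  const home = homeDir.replace(/\/+$/, '') // drop any trailing slashes
  if (home === '') return path
  if (path === home) return '~'
  return path.startsWith(home + '/') ? '~' + path.slice(home.length) : path
}

console.log(redactHomePath('/home/daytona/workspace/memory/workspace', '/home/daytona'))
// prints "~/workspace/memory/workspace"
```

Paths outside the given home directory are returned unchanged, so the helper is safe to apply to every path written into a report.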

Comment thread memory/INCIDENT-20260617T122713Z.md Outdated
Comment thread src/main/integrations.ts Outdated
@khaliqgant khaliqgant merged commit 835651d into main Jun 17, 2026
5 checks passed
@khaliqgant khaliqgant deleted the ar-267-relayfile-sdk-bump branch June 17, 2026 17:51