feat: integration mount evals + prescriptive spawn instructions for non-claude CLIs #375
Adds an eval harness (evals/) that spawns real agents via the broker in a
fixture dir with a fake .integrations/ mount, then scores whether the agent
wrote to the correct path with a valid JSON payload. Covers 6 scenarios
(Slack channel/DM, Linear create/update/comment/delete) across 5 guidance
variants (bare, claude-md, slim-inject, full-inject, prescriptive).
Key findings from eval runs:
- `prescriptive` variant achieves 18/18 (100%) across all free and Chinese
models (deepseek-v4-flash-free, mimo, nemotron, north-mini-code, gpt-5.4-nano,
gpt-5.4-mini, gpt-5.1-codex-mini, gpt-5.5) — reliable for non-claude CLIs
- `full-inject` and `slim-inject` also reach 100% once absolute paths are
injected for CLIs whose cwd doesn't match the project fixture dir
- `bare` fails universally — no model self-discovers integration paths
Harness changes:
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt`
- Default opencode model is `opencode/deepseek-v4-flash-free` (free, fast)
- Non-claude CLIs receive absolute fixture paths in the task prefix so writes
land in the correct temp dir regardless of CLI cwd detection
Production wiring:
- `IntegrationsManager.prescriptiveSpawnInstructions()`: derives the lookup
table from real `writebackCommandMountPaths` — same data as
`initialSpawnInstructions`, compact format instead of narrative prose
- `broker:spawn-agent` IPC handler routes `cli !== 'claude'` to prescriptive;
`recordSpawnInstructionDelivery` guarded to narrative path only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
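The routing rule in "Production wiring" can be sketched roughly as follows. This is a minimal illustration, not the actual handler: the `SpawnInstructionSource` interface, the `instructionsForCli` function, and all method signatures are assumptions; only the three method names come from the commit message.

```typescript
// Hypothetical shape for illustration; the real IntegrationsManager may differ.
interface SpawnInstructionSource {
  initialSpawnInstructions(projectId: string): string
  prescriptiveSpawnInstructions(projectId: string): string
  recordSpawnInstructionDelivery(projectId: string): void
}

// Hypothetical helper: picks the instruction style based on the CLI name.
function instructionsForCli(
  manager: SpawnInstructionSource,
  cli: string,
  projectId: string
): string {
  if (cli !== 'claude') {
    // Non-claude CLIs get the compact lookup table and no delivery record.
    return manager.prescriptiveSpawnInstructions(projectId)
  }
  // Claude keeps the narrative instructions; delivery is recorded on this path only.
  const narrative = manager.initialSpawnInstructions(projectId)
  manager.recordSpawnInstructionDelivery(projectId)
  return narrative
}
```

Keeping the delivery record inside the narrative branch is what "guarded to narrative path only" amounts to in this sketch.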
📝 Walkthrough
Changes
Prescriptive Spawn Instructions (production)
Eval Harness Infrastructure
Incident Documentation
Sequence Diagram
sequenceDiagram
participant CLI as CLI (non-claude)
participant IPC as broker:spawn-agent
participant IM as IntegrationsManager
participant HarnessDriver as HarnessDriverClient
participant Agent as opencode / PTY agent
participant Mount as .integrations/ filesystem
CLI->>IPC: spawn-agent { cli, projectId, task }
IPC->>IM: prescriptiveSpawnInstructions(projectId)
IM-->>IPC: write-path instruction string
IPC->>HarnessDriver: spawn agent with task + instructions
HarnessDriver->>Agent: start headless/PTY process
Agent->>Mount: write JSON file under .integrations/<provider>/
Mount-->>HarnessDriver: file system event
HarnessDriver-->>IPC: AgentExitInfo + BrokerEvents
IPC-->>CLI: EvalRunResult { exit, events, durationMs }
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Code Review
This pull request introduces a new 'prescriptive' onboarding variant for integration mount evaluations, designed to provide a compact, explicit write-path lookup table with exact paths and minimal payload schemas for non-Claude CLIs. Key feedback on these changes includes: adding the missing 'prescriptive' variant to the VARIANT_ORDER array in evals/report.ts to fix HTML report sorting; removing a redundant explanation about <channelDir> in prescriptiveSpawnInstructions since the path is already fully resolved; and removing the redundant userId field from the Slack DM payload templates in both src/main/integrations.ts and evals/variants.ts to align with the actual schema.
console.log(`HTML report: ${htmlPath}`)
}

const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
The newly introduced prescriptive variant is missing from the VARIANT_ORDER array. This causes the prescriptive column to be sorted incorrectly (at the very beginning, before bare) in the generated HTML report. Adding it to the end of the array maintains the intended order from lightest to heaviest guidance.
- const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']
+ const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive']
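A small sketch of why a missing entry can sort first, assuming the report orders columns by each variant's index in the array (the comparator `byVariantOrder` and the sample `columns` list are hypothetical; evals/report.ts may do this differently). `indexOf` returns -1 for an unknown value, which places it ahead of index 0.

```typescript
const VARIANT_ORDER = ['bare', 'claude-md', 'slim-inject', 'full-inject']

// Hypothetical comparator: order values by their position in `order`.
// A value that is not listed gets indexOf === -1 and therefore sorts first.
function byVariantOrder(order: readonly string[]) {
  return (a: string, b: string): number => order.indexOf(a) - order.indexOf(b)
}

const columns = ['prescriptive', 'full-inject', 'bare', 'slim-inject', 'claude-md']

// With the original array, 'prescriptive' lands before 'bare'.
const before = [...columns].sort(byVariantOrder(VARIANT_ORDER))

// With the suggested array, 'prescriptive' lands after 'full-inject'.
const after = [...columns].sort(byVariantOrder([...VARIANT_ORDER, 'prescriptive']))

console.log(before.join(' < '))
console.log(after.join(' < '))
```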
lines.push(` Slack channel message → ${p}/<name>.json`)
lines.push(` <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)
In prescriptiveSpawnInstructions, the path p is already a fully resolved, concrete path (e.g., .integrations/slack/channels/C12345__general/messages). It does not contain the <channelDir> placeholder. Therefore, the explanation line about <channelDir> is redundant and potentially confusing to the agent, and should be removed.
- lines.push(` Slack channel message → ${p}/<name>.json`)
- lines.push(` <channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)`)
- lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)
+ lines.push(` Slack channel message → ${p}/<name>.json`)
+ lines.push(` payload: {"text":"<message>","channelId":"<channelId>"}`)
if (dmPaths.length > 0 || channelPaths.length === 0) {
  const dmBase = dmPaths[0] ?? `${PROJECT_INTEGRATIONS_LINK_NAME}/slack/users/<userId>/messages`
  lines.push(` Slack DM → ${dmBase}/<name>.json`)
  lines.push(` payload: {"text":"<message>","userId":"<id>"}`)
The Slack direct message schema (evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json) and its create example do not contain a userId property in the payload, as the user ID is already determined from the path. Specifying "userId":"<id>" in the payload description is redundant and deviates from the schema.
- lines.push(` payload: {"text":"<message>","userId":"<id>"}`)
+ lines.push(` payload: {"text":"<message>"}`)
<channelDir> is the mount directory name provided in the task (format: <channelId>__<slug>)
payload: {"text":"<message>","channelId":"<channelId>"}
Slack DM → ${base}/slack/users/<userId>/messages/<name>.json
payload: {"text":"<message>","userId":"<id>"}
To keep the prescriptive instructions consistent with the Slack DM schema and create example (which do not use a userId property in the payload), the "userId":"<id>" field should be removed from the payload template.
- payload: {"text":"<message>","userId":"<id>"}
+ payload: {"text":"<message>"}
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fdfda7cbee
lines.push(` payload: {"id":"<issueId>","_action":"update",<fields to change>}`)
lines.push(` Linear delete issue → ${issueBase}/<name>.json`)
lines.push(` payload: {"id":"<issueId>","_action":"delete"}`)
lines.push(` Linear comment → ${issueBase}/<issueId>/comments/<name>.json`)
Use the canonical Linear issue file for comments
For non-Claude agents with a Linear integration, this instruction is the exact path they are told to use, but Linear comment writeback is keyed off the canonical issue resource filename returned from /linear/issues (for example KEY-123__uuid.json), not an arbitrary <issueId> directory. The existing linearIssueCommentRemotePath helper only accepts /linear/issues/<identifier>__<uuid>.json/comments/... and rejects UUID-only issue paths, so agents following this prompt can write files that the local mount accepts but that do not create visible Linear comments.
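To make the accepted path shape concrete, here is a rough check for the `<identifier>__<uuid>.json/comments/...` form described above. It is a hypothetical stand-in, not the real `linearIssueCommentRemotePath` helper; the regular expression, the function name `isCanonicalCommentPath`, and the assumption that the comment file itself ends in `.json` are all illustrative.

```typescript
// Hypothetical validator for illustration only.
// Assumed shape: /linear/issues/<KEY>-<num>__<uuid>.json/comments/<name>.json
const UUID = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
const CANONICAL_COMMENT_PATH = new RegExp(
  `^/linear/issues/[A-Z][A-Z0-9]*-\\d+__${UUID}\\.json/comments/[^/]+\\.json$`,
  'i'
)

function isCanonicalCommentPath(remotePath: string): boolean {
  return CANONICAL_COMMENT_PATH.test(remotePath)
}

// Canonical issue resource file followed by /comments/: accepted.
const canonical =
  '/linear/issues/KEY-123__123e4567-e89b-12d3-a456-426614174000.json/comments/note.json'

// Bare UUID directory, as the original instruction implied: rejected.
const uuidOnly =
  '/linear/issues/123e4567-e89b-12d3-a456-426614174000/comments/note.json'

console.log(isCanonicalCommentPath(canonical), isCanonicalCommentPath(uuidOnly))
```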
"protobufjs": "8.5.0"
},
"devDependencies": {
"@agent-relay/evals": "file:../relay/packages/evals",
Replace the local evals file dependency
This makes every fresh Pear install depend on a sibling checkout at ../relay/packages/evals; in a standalone clone or CI workspace without that directory, npm cannot resolve @agent-relay/evals, so the newly added eval scripts and even a normal dev install fail before they can run. Since the eval files import this package directly, this needs to be a published/versioned dependency or vendored into this repo rather than a machine-local path.
CI fix:
- ipc-handlers.test.ts mocked integrationsManager was missing prescriptiveSpawnInstructions/recordSpawnInstructionDelivery, so the broker:spawn-agent test (cli: 'codex') threw "not a function". Added the mocks plus focused routing tests (non-claude → prescriptive + no delivery record; claude → narrative + delivery record).
Review feedback:
- package.json: depend on published @agent-relay/evals@^8.8.2 instead of file:../relay/packages/evals so fresh clones / CI can resolve it (codex P2)
- integrations.ts: Linear comment path now references the canonical issue resource file (<KEY>-<num>__<uuid>.json) instead of a bare <issueId> dir — the local mount's linearIssueCommentRemotePath rejects UUID-less paths, so the old instruction produced files that never became visible comments (codex P1)
- integrations.ts + variants.ts: drop redundant "userId" from the Slack DM payload (path-derived; matches the discovery schema) (gemini)
- integrations.ts: remove the stale <channelDir> note — the emitted path is already concrete (gemini)
- report.ts: add 'prescriptive' to VARIANT_ORDER so the HTML report column sorts last instead of first (gemini)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ter-driven
Removes the hardcoded per-provider branches (and the interim curated payload
map) from prescriptiveSpawnInstructions. Writable resources + path templates
now come entirely from each provider's discovery `.adapter.md` (shipped by
relayfile-adapters), and payload shape is pointed at that resource's discovery
`.create.example.json`. No per-provider knowledge lives in pear, so a new
integration works with zero code change here.
- Parse the adapter doc's "Writable resources" section (provider-agnostic)
- Resolve each resource's concrete, in-scope path from the integration's
writeback mount roots; preserve {id} placeholders for nested resources
- Point at the adapter's create example for fields instead of inlining payloads
- Graceful fallback to a discovery pointer when the adapter doc isn't mounted
Known gap tracked upstream: the local mount currently serves discovery inferred
from synced read records (which omits required write fields like Slack's
`text`), so the pointed-at example is imperfect until that's fixed —
AgentWorkforce/relayfile#299. The adapters already publish correct write-shaped
discovery; the fix belongs in the mount/sync pipeline, not pear.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
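A rough sketch of the provider-agnostic parsing step described in this commit. The adapter doc layout is an assumption inferred from the fixture excerpt quoted in the review (a `### name` heading followed by a dash and a backticked path template, then `- create example:` bullets); the `WritableResource` type and `parseWritableResources` function are hypothetical and not the code in src/main/integrations.ts.

```typescript
// Hypothetical result shape for one writable resource.
interface WritableResource {
  name: string
  pathTemplate: string
  createExample?: string
}

// Assumed heading form: "### replies <dash> `/slack/.../replies`" where <dash> is U+2014.
const HEADING = /^###\s+(\S+)\s+\u2014\s+`([^`]+)`/
// Assumed bullet form: "- create example: `discovery/.../.create.example.json`"
const CREATE_EXAMPLE = /^-\s+create example:\s+`([^`]+)`/

function parseWritableResources(adapterDoc: string): WritableResource[] {
  const resources: WritableResource[] = []
  let current: WritableResource | undefined
  for (const line of adapterDoc.split('\n')) {
    const heading = HEADING.exec(line)
    if (heading) {
      const [, name = '', pathTemplate = ''] = heading
      current = { name, pathTemplate }
      resources.push(current)
      continue
    }
    const example = CREATE_EXAMPLE.exec(line)
    // Attach the example pointer to the most recent resource heading.
    if (example && current) current.createExample = example[1]
  }
  return resources
}
```

Because the parser only looks at headings and bullets, it carries no provider names, which is the property the commit message is after.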
agent-relay-code[bot] pushed an unrelated mount-root "incident" note (from its own Daytona sandbox) onto this PR; it references a non-existent doc and has nothing to do with the prescriptive-spawn/evals change. Removing it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Actionable comments posted: 6
🧹 Nitpick comments (1)
src/main/integrations.test.ts (1)
1543-1568: ⚡ Quick win: Add a prescriptive DM-path test for /slack/dms/<id>/messages.
Current coverage validates only the /users/ DM variant. Add a sibling case for a /dms/ mount to lock the intended routing and prevent fallback-path regressions.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/integrations.test.ts` around lines 1543 - 1568, Add a new sibling test case after the existing test 'emits a text-only Slack DM payload when a user mount is present'. The new test should follow the same structure and assertions but use a mountPath with the `/slack/dms/<id>/messages` variant instead of `/slack/users/U67890EVAL/messages` to ensure both DM path formats are properly validated and prevent regressions. Verify that the IntegrationsManager correctly generates the prescriptive spawn instructions for the DMs path variant, including the text-only payload structure and absence of userId field.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@evals/fixture.ts`:
- Around line 54-55: The re-export statement is exporting snapshotMount and
newMountFiles, but these functions cannot be verified as public APIs in the
`@agent-relay/evals` v8.8.2 package. Verify that both snapshotMount and
newMountFiles actually exist as exported functions in the
`@agent-relay/evals/scoring/mount` module (or wherever they are currently imported
from). If they don't exist as public exports, either locate where these
utilities are actually defined and update the import path accordingly, or remove
the re-exports if they are not intended for external use.
In `@evals/fixtures/discovery/slack/.adapter.md`:
- Around line 14-23: The adapter references reactions and replies resources with
corresponding schema and create example fixture files, but these files do not
exist in the repository. Either create the missing fixture files for the
reactions and replies resources by adding .schema.json and .create.example.json
files for each resource following the same structure as the existing messages
fixtures, or remove the reactions and replies resource sections from the adapter
documentation to ensure all advertised resources have their corresponding
fixture files present.
In `@evals/runner.ts`:
- Around line 50-59: The variants variable at line 50 uses unsafe casting (as
Variant[]) without validating that the split variant strings are actually valid
Variant types, and the repeat variable at line 58 uses parseInt without checking
for invalid results like NaN, zero, or negative numbers. Validate each variant
string against the known VARIANTS list similar to how scenarioById validates
scenarios, throwing an error if an unknown variant is provided. For the repeat
value, validate the parsed integer result to ensure it is a positive number
greater than zero, throwing an error if the parsed value is invalid.
In `@memory/INCIDENT-20260617T122713Z.md`:
- Line 4: The incident documentation file contains a hardcoded
developer-specific filesystem path on line 4 that exposes the username "daytona"
and filesystem structure. Replace the hardcoded path
`/home/daytona/workspace/memory/workspace` with a generic placeholder that
represents the computed mount root location, such as referencing the
`integrationMountRootForWorkspace(workspaceId)` function pattern or a
descriptive placeholder like `{computed_mount_root}/workspace` to make the
documentation reusable across different developer environments without exposing
personal filesystem details.
- Around line 17-18: The documentation reference in INCIDENT-20260617T122713Z.md
on lines 17-18 points to a non-existent file
`docs/architecture/mount-invariants.md` and the `docs/architecture` directory
does not exist. Either create the missing documentation file with the
appropriate content about protected invariants and recovery procedures, or
remove the broken reference from lines 17-18 of the incident report. Choose the
option that aligns with your documentation strategy.
In `@src/main/integrations.ts`:
- Around line 2310-2319: The dmPaths filter at line 2310 only checks for paths
containing /users/, which misses DM paths using the /dms/ pattern in the Slack
mount. Update the filter to also match paths that include /dms/ so that when the
connected Slack mount resolves to .integrations/slack/dms/<id>/messages, it is
properly detected as a DM path instead of falling back to the incorrect
/users/<userId>/messages path. Modify the filter predicate on writebackPaths to
check for both /users/ and /dms/ patterns.
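The widened predicate could look roughly like this. It is a sketch under assumptions: `slackDmPaths` is a hypothetical helper, and the real filter in src/main/integrations.ts operates on `writebackPaths` in its own context.

```typescript
// Hypothetical helper: treat both /users/ and /dms/ Slack mounts as DM paths.
function slackDmPaths(writebackPaths: readonly string[]): string[] {
  return writebackPaths.filter(
    (p) => p.includes('/slack/') && (p.includes('/users/') || p.includes('/dms/'))
  )
}

// Illustrative inputs only; real values come from the connected integration.
const sample = [
  '.integrations/slack/channels/C12345__general/messages',
  '.integrations/slack/users/U67890EVAL/messages',
  '.integrations/slack/dms/D11111/messages',
]

console.log(slackDmPaths(sample))
```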
⛔ Files ignored due to path filters (1)
package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (32)
.gitignore
evals/fixture.ts
evals/fixtures/discovery/linear/.adapter.md
evals/fixtures/discovery/linear/issues/.create.example.json
evals/fixtures/discovery/linear/issues/.delete.example.json
evals/fixtures/discovery/linear/issues/.schema.json
evals/fixtures/discovery/linear/issues/.update.example.json
evals/fixtures/discovery/linear/issues/{issueId}/comments/.create.example.json
evals/fixtures/discovery/linear/issues/{issueId}/comments/.schema.json
evals/fixtures/discovery/slack/.adapter.md
evals/fixtures/discovery/slack/channels/{channelId}/messages/.create.example.json
evals/fixtures/discovery/slack/channels/{channelId}/messages/.schema.json
evals/fixtures/discovery/slack/users/{userId}/messages/.create.example.json
evals/fixtures/discovery/slack/users/{userId}/messages/.schema.json
evals/harness.ts
evals/report.ts
evals/reports/.gitkeep
evals/runner.ts
evals/scenarios/index.ts
evals/scenarios/s01-slack-post.ts
evals/scenarios/s02-slack-dm.ts
evals/scenarios/s03-linear-create.ts
evals/scenarios/s04-linear-update.ts
evals/scenarios/s05-linear-comment.ts
evals/scenarios/s06-linear-delete.ts
evals/variants.ts
memory/INCIDENT-20260617T122713Z.md
package.json
src/main/integrations.test.ts
src/main/integrations.ts
src/main/ipc-handlers.test.ts
src/main/ipc-handlers.ts
// Re-export for convenience so callers don't need to import from two places
export { snapshotMount, newMountFiles }
🧩 Analysis chain
🌐 Web query:
What functions does @agent-relay/evals v8.8.2 export from the /scoring/mount subpath?
💡 Result:
The @agent-relay/evals package (part of the Agent Relay monorepo) exports several functions from its /scoring/mount subpath, which are utilized for evaluating agent interactions within isolated or configured environments [1]. Based on the recent development history for @agent-relay/evals (which tracks alongside the broader Agent Relay v8.x series), the /scoring/mount subpath provides access to logic handling fixture-backed and provider-backed mounts for evaluation suites [1][2]. While exact function signatures are subject to the specific internal implementation in v8.8.2, the exports generally include utilities for mount lifecycle and protocol validation, such as: * Mount lifecycle management helpers [1] * Stream cleaning and protocol verification tools [1] * Integration-specific check functions (e.g., toolcheck, phantom, and native-subagent protocols) [1] These exports are part of the shared evaluation harness designed to support deterministic VFS (Virtual File System) and ACL testing in agent scenarios [1][2]. For the most accurate and precise list of exported identifiers in v8.8.2, you should consult the packages/evals/package.json file within the official AgentWorkforce/relay repository, which defines these subpath exports, or inspect the src/scoring/mount/index.ts file in the same package [1].
Citations:
🌐 Web query:
snapshotMount newMountFiles @agent-relay/evals v8.8.2
💡 Result:
The terms "snapshotMount" and "newMountFiles" are not standard public APIs or documented functions within the @agent-relay/evals package or the broader AgentWorkforce/relay ecosystem as of June 2026. Based on current documentation: 1. The @agent-relay/evals package is primarily a shared test harness package within the Relay monorepo, used for running deterministic evaluation suites and managing tool-based evaluations [1]. 2. Relayfile (the integration filesystem for AI agents) uses a different set of APIs for managing mounts, such as ensureMountedWorkspace and mountWorkspace [2][3][4]. 3. The names snapshotMount and newMountFiles likely represent internal identifiers, private methods, or custom utilities specific to a local implementation or a transient experimental branch within the AgentWorkforce/relay codebase, rather than stable public interfaces [1]. If you are encountering these terms in a stack trace or code, they are most likely internal implementation details within the @agent-relay/evals harness driver or a specific test case runner [1]. You should check the repository's source code directly at github.com/AgentWorkforce/relay for the most accurate context regarding these specific identifiers [1][5].
Citations:
- 1: AgentWorkforce/relay@87c05f1
- 2: https://github.com/AgentWorkforce/relayfile/
- 3: https://github.com/AgentWorkforce/relayfile/blob/main/docs/guides/post-auth-mount-session.md
- 4: https://agentrelay.com/file
- 5: https://github.com/AgentWorkforce/relay/
🌐 Web query:
@agent-relay/evals scoring mount index.ts snapshotMount newMountFiles
💡 Result:
In the @agent-relay/evals package, the terms snapshotMount and newMountFiles relate to the state management and file-system handling within the framework's evaluation harnesses [1]. The @agent-relay/evals package is part of the Agent Relay ecosystem [1]. Its scoring infrastructure includes integration tests and harness utilities designed to handle VFS (Virtual Filesystem) and mount-backed scenarios [1]. 1. snapshotMount: This refers to the mechanism used to capture the state of a filesystem mount, ensuring that evaluation cases can run deterministically [2]. By creating a snapshot, the system can verify filesystem changes, ACLs, and concurrency behaviors in an isolated, repeatable environment [2]. 2. newMountFiles: This typically represents the set of files or changes generated or modified during an evaluation run [1][3]. The framework manages these by comparing the state after an agent's execution against the initial snapshot [1]. The system uses these to perform diffing, validation, or to sync writable changes back to the source project after an evaluation or agent task is complete [3]. These functions are part of the broader Agent Relay strategy to provide agents with a reliable, file-based interface for interacting with integrated systems (like Notion, Linear, GitHub, etc.) [4], where complex API interactions are normalized into a local filesystem [5][4]. The evaluation harness specifically leverages these patterns to ensure that agent behaviors—such as reading, writing, and coordinating through these mounts—are correctly scored and verified [1][6].
Citations:
- 1: AgentWorkforce/relay@87c05f1
- 2: https://github.com/AgentWorkforce/relayfile/
- 3: https://npmx.dev/package/@relayfile/local-mount
- 4: https://agentrelay.com/file
- 5: https://github.com/AgentWorkforce/skills/blob/main/skills/setting-up-relayfile/SKILL.md
- 6: https://github.com/tangle-network/agent-eval
Fix the re-exports: snapshotMount and newMountFiles are not documented or standard public APIs in @agent-relay/evals v8.8.2.
The re-export at lines 54-55 references functions that appear to be internal/custom utilities rather than stable public exports from @agent-relay/evals/scoring/mount. These function names are not confirmed to exist as public APIs in v8.8.2. Verify that the source functions actually exist and are properly exported, or update the import path to the correct internal modules where these are defined.
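For readers unfamiliar with the pattern these names suggest, a before/after mount diff can be sketched with Node's standard `fs` module. The functions below (`snapshotFiles`, `newFilesSince`) are hypothetical re-implementations for illustration and make no claim about what `@agent-relay/evals` actually exports or how its helpers behave.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join, relative } from 'node:path'

// Hypothetical: record every file under `root` as a path relative to it.
function snapshotFiles(root: string): Set<string> {
  const seen = new Set<string>()
  const walk = (dir: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const full = join(dir, entry.name)
      if (entry.isDirectory()) walk(full)
      else seen.add(relative(root, full))
    }
  }
  walk(root)
  return seen
}

// Hypothetical: files present now that were absent from the earlier snapshot.
function newFilesSince(before: Set<string>, root: string): string[] {
  return Array.from(snapshotFiles(root))
    .filter((file) => !before.has(file))
    .sort()
}

// Usage: snapshot an empty temp mount, simulate an agent write, then diff.
const root = mkdtempSync(join(tmpdir(), 'mount-'))
const before = snapshotFiles(root)
mkdirSync(join(root, 'slack'), { recursive: true })
writeFileSync(join(root, 'slack', 'msg.json'), '{"text":"hi"}')
const created = newFilesSince(before, root)
console.log(created)
```

A scorer built on this idea would then check each new file's path and parse its JSON payload.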
### reactions — `/slack/channels/{channelId}/messages/{messageTs}/reactions`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json`

### replies — `/slack/channels/{channelId}/messages/{messageTs}/replies`

- schema: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json`
- create example: `discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json`
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification: ensure every path referenced in the Slack adapter exists.
set -euo pipefail
adapter="evals/fixtures/discovery/slack/.adapter.md"
# Extract backticked discovery paths and check existence under evals/fixtures.
python - <<'PY'
import re
from pathlib import Path
adapter = Path("evals/fixtures/discovery/slack/.adapter.md").read_text()
paths = re.findall(r'`(discovery/slack/[^`]+)`', adapter)
missing = []
for p in paths:
    candidate = Path("evals/fixtures") / p
    if not candidate.exists():
        missing.append(str(candidate))
print("Referenced paths:")
for p in paths:
    print(" -", p)
if missing:
    print("\nMissing paths:")
    for m in missing:
        print(" -", m)
    raise SystemExit(1)
print("\nAll referenced paths exist.")
PY
Repository: AgentWorkforce/pear
Length of output: 1105
Add missing reactions and replies fixture files referenced in the adapter.
The adapter advertises reactions/replies resources with references to schema and example files, but verification confirms these fixtures do not exist:
evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.schema.json
evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/reactions/.create.example.json
evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.schema.json
evals/fixtures/discovery/slack/channels/{channelId}/messages/{messageTs}/replies/.create.example.json
Only the messages and users/{userId}/messages fixtures are present. Create the missing reactions and replies fixtures or remove their references from the adapter.
const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
const scenarios = scenarioArg
  ? scenarioArg.split(',').map((id) => {
      const s = scenarioById(id)
      if (!s) throw new Error(`Unknown scenario: ${id}`)
      return s
    })
  : SCENARIOS
const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
const model = modelArg ?? undefined
Validate --variant and --repeat instead of unsafe casting.
At Line 50 and Line 58, raw CLI input is accepted without validation (as Variant[], unchecked parseInt). Unknown variants silently flow through, and invalid repeats (NaN, 0, negative) can produce invalid/empty cells and misleading percentages.
Suggested fix
- const variants = (variantArg ? variantArg.split(',') : [...VARIANTS]) as Variant[]
+ const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS]
+ const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant))
+ if (invalidVariants.length) {
+   throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
+ }
+ const variants = parsedVariants as Variant[]
@@
- const repeat = repeatArg ? parseInt(repeatArg, 10) : 3
+ const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3
+ if (!Number.isFinite(repeat) || repeat < 1) {
+   throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`)
+ }
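One caveat worth noting about the suggested fix: `Number.parseInt` reads leading digits and ignores the rest, so inputs such as `3abc` or `2.5` would still pass the `Number.isFinite` check as 3 and 2. A stricter variant, sketched here with a hypothetical `parsePositiveInt` helper that is not part of the PR, validates the whole string first.

```typescript
// Hypothetical helper: accept only strings that are entirely a positive integer.
function parsePositiveInt(raw: string, flag: string): number {
  const trimmed = raw.trim()
  // No leading zero, no sign, no decimal point, no trailing characters.
  if (!/^[1-9]\d*$/.test(trimmed)) {
    throw new Error(`${flag} must be a positive integer, received: ${raw}`)
  }
  return Number(trimmed)
}

// Helper for demonstration: report whether a value is rejected.
function isRejected(raw: string): boolean {
  try {
    parsePositiveInt(raw, '--repeat')
    return false
  } catch {
    return true
  }
}

console.log(parsePositiveInt('3', '--repeat'), isRejected('3abc'), isRejected('0'))
```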
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const parsedVariants = variantArg ? variantArg.split(',').map((v) => v.trim()) : [...VARIANTS] | |
| const invalidVariants = parsedVariants.filter((v) => !VARIANTS.includes(v as Variant)) | |
| if (invalidVariants.length) { | |
| throw new Error(`Unknown variant(s): ${invalidVariants.join(', ')}. Valid: ${VARIANTS.join(', ')}`) | |
| } | |
| const variants = parsedVariants as Variant[] | |
| const scenarios = scenarioArg | |
| ? scenarioArg.split(',').map((id) => { | |
| const s = scenarioById(id) | |
| if (!s) throw new Error(`Unknown scenario: ${id}`) | |
| return s | |
| }) | |
| : SCENARIOS | |
| const repeat = repeatArg ? Number.parseInt(repeatArg, 10) : 3 | |
| if (!Number.isFinite(repeat) || repeat < 1) { | |
| throw new Error(`--repeat must be a positive integer, received: ${repeatArg}`) | |
| } | |
| const model = modelArg ?? undefined |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@evals/runner.ts` around lines 50 - 59, The variants variable at line 50 uses
unsafe casting (as Variant[]) without validating that the split variant strings
are actually valid Variant types, and the repeat variable at line 58 uses
parseInt without checking for invalid results like NaN, zero, or negative
numbers. Validate each variant string against the known VARIANTS list similar to
how scenarioById validates scenarios, throwing an error if an unknown variant is
provided. For the repeat value, validate the parsed integer result to ensure it
is a positive number greater than zero, throwing an error if the parsed value is
invalid.
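A self-contained sketch of the validation described above, for illustration only. The `VARIANTS` array here is a stand-in built from the five variant names listed in the PR description, and `isVariant`, `parseVariants`, and `parseRepeat` are hypothetical helper names, not functions from `evals/runner.ts`. It swaps the `as Variant[]` cast for a type guard and rejects non-digit `--repeat` input outright, which is stricter than the `parseInt`-based suggestion (that would accept `3abc` as 3).

```typescript
// Stand-in for the VARIANTS constant in evals/runner.ts (assumed shape).
const VARIANTS = ['bare', 'claude-md', 'slim-inject', 'full-inject', 'prescriptive'] as const
type Variant = (typeof VARIANTS)[number]

// Type guard: narrows a raw string to Variant without an `as` cast.
function isVariant(value: string): value is Variant {
  return VARIANTS.some((v) => v === value)
}

// Parse a comma-separated --variant value; undefined means "all variants".
function parseVariants(arg: string | undefined): Variant[] {
  if (arg === undefined) return [...VARIANTS]
  const parts = arg.split(',').map((v) => v.trim())
  const invalid = parts.filter((v) => !isVariant(v))
  if (invalid.length > 0) {
    throw new Error(`Unknown variant(s): ${invalid.join(', ')}. Valid: ${VARIANTS.join(', ')}`)
  }
  return parts.filter(isVariant)
}

// Parse --repeat strictly: digits only, so "3abc", "2.5" and "-1" are
// rejected instead of being truncated or passed through.
function parseRepeat(arg: string | undefined, fallback: number = 3): number {
  if (arg === undefined) return fallback
  const trimmed = arg.trim()
  if (!/^[0-9]+$/.test(trimmed)) {
    throw new Error(`--repeat must be a positive integer, received: ${arg}`)
  }
  const n = Number(trimmed)
  if (n < 1) {
    throw new Error(`--repeat must be a positive integer, received: ${arg}`)
  }
  return n
}
```

Whether the stricter digit-only check is wanted depends on how forgiving the runner should be; the reviewer's `Number.parseInt` version is the smaller change.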
| # relayfile mount-root invariant incident | ||
|
|
||
| - timestamp: 20260617T122713Z | ||
| - local root: /home/daytona/workspace/memory/workspace |
Replace hardcoded developer path with a generic placeholder.
Line 4 hardcodes /home/daytona/workspace/memory/workspace, which exposes a specific developer's username and filesystem structure. This reduces the reusability of the incident documentation for other developers and poses unnecessary path exposure. Replace it with a generic placeholder that reflects the computed mount root location.
💡 Suggested replacement
- local root: /home/daytona/workspace/memory/workspace
+ local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations
Alternatively, reference the integrationMountRootForWorkspace(workspaceId) function or a similar pattern from the runtime code to make the path generic and descriptive.
📝 Committable suggestion
| - local root: /home/daytona/workspace/memory/workspace | |
| - local root: ~/.agentworkforce/pear/relayfile/<workspace-id>/integrations |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@memory/INCIDENT-20260617T122713Z.md` at line 4, The incident documentation
file contains a hardcoded developer-specific filesystem path on line 4 that
exposes the username "daytona" and filesystem structure. Replace the hardcoded
path `/home/daytona/workspace/memory/workspace` with a generic placeholder that
represents the computed mount root location, such as referencing the
`integrationMountRootForWorkspace(workspaceId)` function pattern or a
descriptive placeholder like `{computed_mount_root}/workspace` to make the
documentation reusable across different developer environments without exposing
personal filesystem details.
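One way to keep developer-specific paths out of future incident files is to redact the home-directory prefix before the path is written. The sketch below is an assumption about how that could look: `redactHomePrefix` is a hypothetical helper, not part of this PR or of the runtime code, and it takes the home directory as a parameter rather than reading it from the environment.

```typescript
// Hypothetical helper: replace a home-directory prefix with "~" before a
// path is written into an incident report. Paths outside the home
// directory are returned unchanged.
function redactHomePrefix(path: string, homeDir: string): string {
  // Drop trailing slashes so "/home/user/" and "/home/user" behave the same.
  const home = homeDir.replace(/\/+$/, '')
  if (home.length === 0) return path
  if (path === home) return '~'
  // Require a "/" after the prefix so "/home/user2" is not treated as
  // being inside "/home/user".
  if (path.indexOf(home + '/') === 0) return '~' + path.slice(home.length)
  return path
}
```

For example, `redactHomePrefix('/home/daytona/workspace/memory/workspace', '/home/daytona')` yields `~/workspace/memory/workspace`.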
Summary
- Eval harness (`evals/`): spawns real agents via the broker in a temp fixture dir with a fake `.integrations/` mount, then scores whether the agent wrote to the correct path with valid JSON. 6 scenarios × 5 variants, runnable with `npm run eval`.
- `prescriptiveSpawnInstructions()` on `IntegrationsManager`: compact write-path lookup table derived from real mount data; same source as `initialSpawnInstructions`, different format. No discovery reads required from the model.
- `broker:spawn-agent` routing: non-claude CLIs (opencode, etc.) get the prescriptive format; claude gets the existing narrative `full-inject` format.

Eval results
All free/Chinese/OpenAI models tested via opencode pass at 100% on `prescriptive`. `bare` always fails: no model self-discovers paths without guidance.

Key implementation notes
- opencode uses `spawnCli({ transport: 'headless' })` + `skipRelayPrompt: true` (opencode has no `agent-relay` agent)
- Default opencode model: `opencode/deepseek-v4-flash-free` (free, ~18s/run)
- `recordSpawnInstructionDelivery` guarded to narrative path only to avoid stale snippet cross-contamination

Test plan
- `npx vitest run src/main/integrations.test.ts`: 54 tests pass (2 new for `prescriptiveSpawnInstructions`)
- `npm run eval -- --cli=opencode --variant=prescriptive --repeat=3`: 18/18
- `<integrations-update>` block

🤖 Generated with Claude Code
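The scoring rule in the summary (correct path plus valid JSON) and the 18/18 figure (6 scenarios × 3 repeats for a single variant) can be sketched roughly as follows. This is an illustration under assumptions, not the harness's actual code: the `RunResult` shape, its field names, and the `passes` and `summarize` helpers are all hypothetical.

```typescript
// Hypothetical record of one eval run; field names are assumptions.
interface RunResult {
  expectedPath: string
  writtenPath: string | null // null when the agent wrote nothing
  payload: string // raw contents of the file the agent wrote
}

// True when the text parses as JSON.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text)
    return true
  } catch (e) {
    return false
  }
}

// A run passes only if the agent wrote to the expected path AND the
// payload is valid JSON.
function passes(run: RunResult): boolean {
  return run.writtenPath === run.expectedPath && isValidJson(run.payload)
}

// Summarize a set of runs as "passed/total (pct%)".
function summarize(runs: RunResult[]): string {
  const passed = runs.filter(passes).length
  const pct = runs.length === 0 ? 0 : Math.round((passed / runs.length) * 100)
  return `${passed}/${runs.length} (${pct}%)`
}
```

Guarding the empty case in `summarize` avoids a `NaN%` cell when no runs were recorded, which is the same failure mode the `--repeat` validation comment is concerned with.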