Skip to content

feat(runtime): invocation dry-run/simulation with Cloud-compatible run records (#186 P1)#187

Merged
kjgbot merged 1 commit into
mainfrom
feat/invocation-simulation
Jun 3, 2026
Merged

feat(runtime): invocation dry-run/simulation with Cloud-compatible run records (#186 P1)#187
kjgbot merged 1 commit into
mainfrom
feat/invocation-simulation

Conversation

@kjgbot

@kjgbot kjgbot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

User description

Summary

P1 of #186 — a TRUE invocation dry-run/simulation, distinct from the deploy-preflight --dry-run (which validates persona/config and exits without ever invoking the handler).

  • createSimulationSubsystems() (packages/runtime/src/simulate/subsystems.ts): recording, no-side-effect implementations of every ctx channel a handler can reach — harness.run, llm.complete, sandbox.exec/readFile/writeFile, files.read/write, memory.save/recall, workflow.run/status, schedule.at/cancel, log. Nothing leaves the process; every call is recorded ({kind, at, args, simulatedResult}) with the inert result the handler received instead. File writes land in an in-memory VFS (seedable via files, persists across envelopes); reads of unseeded paths fail with a message naming the seeding option.
  • simulateInvocation() (simulate.ts): replays fixture RawGatewayEnvelopes through the same shimEnvelope the runner uses, with runner-equivalent dispatch semantics (unsupported envelopes skipped + reported, per-envelope error isolation — one throwing handler never aborts the replay).
  • Cloud-compatible run records: one per dispatched envelope, mirroring Cloud's hosted compact run shape field-for-field (cloud#1788 compactBase + detail logs) with origin: "local_dry_run" — the origin value Cloud reserved for exactly this ingestion. summary is populated from the handler's returned value (string or {summary}), falling back to the error message on failure. failureClass uses Cloud's exact 7-value union (locally only success/runner_error occur, but the type stays identical). Simulation-only data (recorded side effects, captured logs) nests under an additive simulation key so the core shape stays byte-compatible.
  • Exported from the @agentworkforce/runtime root and a new ./simulate subpath.

exitCode: 0 when every dispatched envelope succeeded, 1 when any failed; unsupported envelopes are reported but are not failures (mirrors the runner's warn-and-continue).

Why

Cloud#1783 shipped the hosted run-observability side; per its division of labor the simulation half belongs here. This gives developers "if this agent received this event, what would happen?" with captured ctx.log(...), summary, and predicted (recorded-not-executed) side effects — and a record Cloud can later ingest alongside hosted runs without redesign.

Tests

  • 11 new node:test cases: success-path golden record shape (field-for-field), handler-throw isolation + failure-class invariant (an error can never read as success), all 13 side-effect channel recordings, unseeded-read failure hint, VFS seeding + cross-envelope persistence, unsupported-envelope reporting, workspaceId fallback, explicit triggerKind override, raw-handler acceptance, per-run sink attribution.
  • Full workspace gate green: pnpm run check (lint + typecheck incl. examples + tests across all packages; runtime suite 49/49, CLI 211/211).

Out of scope (P2, after this review)

workforce invoke --persona <path> --fixture <event.json|.ndjsonl> CLI surface + docs/README distinguishing the two dry-run kinds.

Refs #186, AgentWorkforce/cloud#1783, AgentWorkforce/cloud#1788

🤖 Generated with Claude Code


CodeAnt-AI Description

Add a true invocation simulation that replays fixture events and returns Cloud-style run records

What Changed

  • Added a new simulation mode that runs a persona handler against fixture envelopes without executing external side effects
  • Each dispatched envelope now produces a run record with status, timing, trigger, logs, and a simulation summary
  • Unsupported fixture envelopes are reported and skipped instead of stopping the replay; handler failures are isolated to the envelope that caused them
  • File reads now require seeded content or a prior write, and simulation keeps written files available for later envelopes
  • Exported the simulation API from the runtime package root and a new ./simulate entry point

Impact

✅ Safer invocation dry-runs
✅ Clearer replay results for fixture runs
✅ Fewer surprises from unsupported events

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

…n records (#186)

P1 of workforce#186: a TRUE invocation dry-run, distinct from the
deploy-preflight --dry-run (which validates persona/config and exits
without invoking the handler).

- simulate/subsystems.ts: createSimulationSubsystems() — recording,
  no-side-effect implementations of every ctx channel a handler can
  reach (harness.run, llm.complete, sandbox exec/read/write, files
  read/write, memory save/recall, workflow run/status, schedule
  at/cancel, log). Nothing leaves the process; every call is recorded
  with the inert simulated result the handler received. File writes go
  to an in-memory VFS (seedable, persists across envelopes); reads of
  unseeded paths fail with a seeding hint.
- simulate/simulate.ts: simulateInvocation() — replays fixture
  RawGatewayEnvelopes through the same shimEnvelope the runner uses,
  dispatches to the handler with runner-equivalent semantics
  (unsupported envelopes skipped + reported, per-envelope error
  isolation), and emits one run record per dispatched envelope.
- Run records mirror Cloud's hosted compact run shape field-for-field
  (cloud#1788 compactBase + detail logs) with origin:"local_dry_run" —
  the origin value Cloud reserved for this ingestion. summary is
  populated from the handler's returned value (or the error message).
  failureClass uses Cloud's exact union; simulation-only data nests
  under an additive `simulation` key.
- Exported via @agentworkforce/runtime root and a ./simulate subpath.

11 new node:test cases; full workspace check (lint+typecheck+test)
green. P2 (CLI `workforce invoke --fixture` + docs) follows after P1
review.

Refs #186, AgentWorkforce/cloud#1783, AgentWorkforce/cloud#1788

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI is reviewing your PR.

@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d53079dc-fd0f-4f70-8b26-a07b03581046

📥 Commits

Reviewing files that changed from the base of the PR and between 1e41e47 and 6eef4d1.

📒 Files selected for processing (8)
  • packages/runtime/package.json
  • packages/runtime/src/index.ts
  • packages/runtime/src/simulate/failure-class.ts
  • packages/runtime/src/simulate/index.ts
  • packages/runtime/src/simulate/simulate.test.ts
  • packages/runtime/src/simulate/simulate.ts
  • packages/runtime/src/simulate/subsystems.ts
  • packages/runtime/src/simulate/types.ts

📝 Walkthrough

Walkthrough

This PR adds a comprehensive local simulation/dry-run API to the runtime package. It enables deterministic envelope replay with side-effect recording, logging capture, and Cloud-compatible run records, including type contracts, recording subsystems, core invocation logic, and public exports.

Changes

Simulation and Dry-Run API

Layer / File(s) Summary
Type Contracts and Failure Classification
packages/runtime/src/simulate/types.ts, packages/runtime/src/simulate/failure-class.ts, simulate.test.ts (failure class unit test)
Core type contracts define RecordedSideEffect, CapturedLogLine, SimulatedRunRecord, UnsupportedEnvelope, SimulationResult, and SimulateInvocationOptions. Failure classification types and deriveSimulatedRunFailureClass function handle run success/failure inference from handler status and error presence.
Simulation Subsystems Factory
packages/runtime/src/simulate/subsystems.ts, simulate.test.ts (side effects, VFS, sink tests)
Recording-only implementations of sandbox, file, LLM, memory, workflow, schedule, logging, and harness contexts. Each records side effects with timestamps into an active sink; in-memory VFS is seeded and persisted; useSink swaps sink attribution per envelope; contexts return inert simulated results without executing dangerous operations.
Simulation Invocation and Run Recording
packages/runtime/src/simulate/simulate.ts, simulate.test.ts (invocation tests)
simulateInvocation accepts handler and replay envelopes, invokes handler per envelope with simulation subsystems, derives run summaries and trigger kinds, renders logs split by level, and aggregates Cloud-compatible run records. Unsupported envelopes are recorded separately without failing the overall result. Handler failures continue replay to subsequent envelopes.
Public API and Package Exports
packages/runtime/src/simulate/index.ts, packages/runtime/src/index.ts, packages/runtime/package.json
Simulate module entrypoint consolidates all public APIs; runtime index re-exports to top level; package.json declares ./simulate export path with TypeScript and JavaScript entry points.

Sequence Diagram

sequenceDiagram
  participant Handler as User Handler
  participant Invocation as simulateInvocation
  participant Subsystems as SimulationSubsystems
  participant Sink as SimulationSink
  participant Record as SimulatedRunRecord
  Invocation->>Invocation: Validate handler & materialize envelopes
  loop Per envelope
    Invocation->>Subsystems: Create/swap sink
    Invocation->>Subsystems: Shim envelope to WorkforceEvent
    Invocation->>Handler: Invoke with subsystems context
    Subsystems->>Sink: Record side effects & logs
    Handler-->>Invocation: Return summary or error
    Invocation->>Record: Aggregate run record with logs & side effects
  end
  Invocation->>Record: Include unsupported envelopes
  Invocation-->>Invocation: Return SimulationResult with exitCode
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The PR introduces a new, substantial simulation API with multiple interdependent modules, comprehensive type contracts, and detailed subsystem implementations. The logic is dense in places (especially simulateInvocation and subsystems factories) but well-structured across separate concerns. The test suite is thorough and covers both happy and failure paths, aiding review understanding.

Possibly related issues

  • #186: Implements the requested local dry-run/simulation invocation API with deterministic envelope replay, ctx.log capture, Cloud-compatible run records, and comprehensive side-effect recording across all subsystem channels.

Poem

🐰 A rabbit hops through simulations bright,
No side effects harmed in testing's light.
Envelopes shimmed, handlers invoked with care,
Cloud-compatible records float through the air!
Dry-runs complete—logs captured with delight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding invocation dry-run/simulation with Cloud-compatible run records. It is specific, on-topic, and reflects the core objective of the PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset. It explains the simulation mode, key components (subsystems, simulation invocation), Cloud-compatible records, and testing approach, all of which align with the actual changes in the PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/invocation-simulation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai Bot added the size:XL This PR changes 500-999 lines, ignoring generated files label Jun 3, 2026
Comment on lines +69 to +73
const deployment: WorkforceDeploymentContext = {
id: options.deployment?.id ?? 'sim-deployment',
triggerKind: options.deployment?.triggerKind ?? 'inbox',
parentDeploymentId: options.deployment?.parentDeploymentId ?? null
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Architect Review — HIGH

When deployment metadata is omitted, ctx.deployment.triggerKind is always synthesized as 'inbox' even for cron envelopes, while the emitted run records derive trigger.kind as 'clock', so simulation produces an inconsistent trigger kind between handler context and run records compared to production.

Suggestion: Derive the synthesized deployment triggerKind from the replayed envelopes' sources (e.g. 'clock' for cron events, 'inbox' for provider events), or require an explicit deployment.triggerKind, so ctx.deployment.triggerKind and the run-record trigger.kind remain aligned with each other and with production semantics.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is an **Architect / Logical Review** comment left during a code review. These reviews are first-class, important findings — not optional suggestions. Do NOT dismiss this as a 'big architectural change' just because the title says architect review; most of these can be resolved with a small, localized fix once the intent is understood.

**Path:** packages/runtime/src/simulate/simulate.ts
**Line:** 69:73
**Comment:**
	*HIGH: When deployment metadata is omitted, `ctx.deployment.triggerKind` is always synthesized as `'inbox'` even for cron envelopes, while the emitted run records derive `trigger.kind` as `'clock'`, so simulation produces an inconsistent trigger kind between handler context and run records compared to production.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
If a suggested approach is provided above, use it as the authoritative instruction. If no explicit code suggestion is given, you MUST still draft and apply your own minimal, localized fix — do not punt back with 'no suggestion provided, review manually'. Keep the change as small as possible: add a guard clause, gate on a loading state, reorder an await, wrap in a conditional, etc. Do not refactor surrounding code or expand scope beyond the finding.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix

Comment on lines +91 to +94
record('sandbox.exec', { cmd, ...(opts?.cwd ? { cwd: opts.cwd } : {}) }, {
...result,
note: SIMULATED
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The simulated sandbox.exec recording ignores env and timeoutMs, so execution traces omit key execution context that callers passed. Record these options as part of the side-effect args to avoid incorrect or incomplete simulation observability. [incomplete implementation]

Severity Level: Major ⚠️
- ⚠️ Sandbox exec traces omit env and timeout configuration.
- ⚠️ Reproducing simulated shell behavior from logs becomes harder.
Steps of Reproduction ✅
1. The sandbox execution options are defined as `SandboxExecArgs` in
`packages/runtime/src/types.ts:202-206` with optional `cwd`, `env`, and `timeoutMs`
fields.

2. In simulation, `createSimulationSubsystems` defines `sandbox.exec` at
`subsystems.ts:87-96`, which runs `record('sandbox.exec', { cmd, ...(opts?.cwd ? { cwd:
opts.cwd } : {}) }, { ...result, note: SIMULATED })` at `subsystems.ts:91-94` to store
call metadata in `RecordedSideEffect.args` (see `record` at `subsystems.ts:61-71` and
`RecordedSideEffect` at `simulate/types.ts:16-37`).

3. When a simulated handler (following the pattern in `simulate.test.ts:122-138`, where
`await ctx.sandbox.exec('rm -rf / --no-preserve-root');` is used) is updated to call
`ctx.sandbox.exec('cmd', { cwd: '/x', env: { FOO: 'bar' }, timeoutMs: 5000 })` using the
`SandboxExecArgs` type, the `opts.env` and `opts.timeoutMs` values are visible to the
handler but never included in the `record` call at `subsystems.ts:91-94`.

4. Running `simulateInvocation` with such a handler and inspecting
`result.runs[0].simulation.sideEffects.find(e => e.kind === 'sandbox.exec')?.args` shows
only `cmd` (and optionally `cwd`), with no `env` or `timeoutMs`, so the dry-run
side-effect trace lacks key execution context the caller actually supplied.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** packages/runtime/src/simulate/subsystems.ts
**Line:** 91:94
**Comment:**
	*Incomplete Implementation: The simulated `sandbox.exec` recording ignores `env` and `timeoutMs`, so execution traces omit key execution context that callers passed. Record these options as part of the side-effect args to avoid incorrect or incomplete simulation observability.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +183 to +186
record('harness.run', {
promptChars: args.prompt.length,
...(args.cwd ? { cwd: args.cwd } : {})
}, { exitCode: 0, note: SIMULATED });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The simulated harness.run recording drops inputs and env, so the recorded args no longer reflect the actual invocation contract and can produce misleading run records. Include all provided harness arguments in the recorded payload so dry-run traces stay faithful to real handler calls. [incomplete implementation]

Severity Level: Major ⚠️
- ⚠️ Harness dry-run traces omit `inputs` and `env` configuration.
- ⚠️ Harder to debug harness behavior from simulation records.
Steps of Reproduction ✅
1. The harness invocation contract is defined in `packages/runtime/src/types.ts:191-199`
as `HarnessRunArgs` with fields `prompt`, optional `cwd`, `inputs`, and `env`.

2. In simulation, `createSimulationSubsystems` at `subsystems.ts:50-56` defines
`harnessRunner` at `subsystems.ts:181-188`, which the ctx builder wires into
`ctx.harness.run` via `buildCtx` usage at `simulate.ts:84-97`.

3. When `ctx.harness.run` is called in a simulated handler (pattern shown in
`simulate.test.ts:122-138`, where `await ctx.harness.run({ prompt: 'do the thing' });` is
used), the simulation's `harnessRunner` receives the full `HarnessRunArgs`, but the
recording call at `subsystems.ts:183-186` only persists `{ promptChars:
args.prompt.length, ...(args.cwd ? { cwd: args.cwd } : {}) }` into
`RecordedSideEffect.args` (see `record` implementation at `subsystems.ts:61-71` and
`RecordedSideEffect.args` definition at `simulate/types.ts:16-37`).

4. To observe the loss of fidelity, extend the existing side-effect test handler at
`simulate.test.ts:124-138` to pass `inputs` and `env` into `ctx.harness.run` (using the
contract from `types.ts:191-199`), run `simulateInvocation`, and inspect
`result.runs[0].simulation.sideEffects.find(e => e.kind === 'harness.run')?.args`: it will
only contain `promptChars` (and maybe `cwd`), with no representation of the `inputs` or
`env` actually used by the handler.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** packages/runtime/src/simulate/subsystems.ts
**Line:** 183:186
**Comment:**
	*Incomplete Implementation: The simulated `harness.run` recording drops `inputs` and `env`, so the recorded `args` no longer reflect the actual invocation contract and can produce misleading run records. Include all provided harness arguments in the recorded payload so dry-run traces stay faithful to real handler calls.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +199 to +201
useSink(next) {
sink = next;
},

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Using a single mutable sink that is swapped between envelopes can misattribute side effects/logs when a handler leaves background async work running after it returns. Late async emissions from the previous invocation will be written into the next envelope's sink. Use per-invocation bound record/log functions (or an immutable sink captured at dispatch time) instead of a globally swapped sink reference. [race condition]

Severity Level: Major ⚠️
- ❌ Late async logs attached to the wrong simulated run.
- ⚠️ Multi-envelope dry-run debugging sees misleading per-run logs.
Steps of Reproduction ✅
1. Entry point `simulateInvocation` at `packages/runtime/src/simulate/simulate.ts:36-38`
accepts a `handler` and fixtures, then creates shared subsystems via
`createSimulationSubsystems` at `simulate.ts:76-79`.

2. A single `ctx` is built at `simulate.ts:84-97` with `log` and other channels wired to
the simulation subsystems from `subsystems.ts:87-198`; the log implementation closes over
the mutable `sink` variable defined at `subsystems.ts:56` and mutated in `useSink` at
`subsystems.ts:199-201`.

3. For each envelope, `simulateInvocation` iterates at `simulate.ts:103-156`, creating a
fresh `sink: SimulationSink` at `simulate.ts:110`, then calling `subsystems.useSink(sink)`
at `simulate.ts:111` before awaiting `handlerFn(ctx, event)` at `simulate.ts:117`.

4. If a user handler (following the pattern in `simulate.test.ts:52-62` where
`ctx.log('info', 'saw event', ...)` is used) starts background async work that logs after
it returns (e.g. `setTimeout(() => ctx.log('info','late'), 100)`), then: after the first
envelope's `await handlerFn` completes, the loop advances, `useSink` is called for the
second envelope (`simulate.ts:110-112`), mutating `sink`. When the background task from
the first envelope finally calls `ctx.log`, the log implementation at
`subsystems.ts:172-178` pushes into the *current* `sink.logs` (second envelope),
misattributing the side effect to the wrong run.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** packages/runtime/src/simulate/subsystems.ts
**Line:** 199:201
**Comment:**
	*Race Condition: Using a single mutable sink that is swapped between envelopes can misattribute side effects/logs when a handler leaves background async work running after it returns. Late async emissions from the previous invocation will be written into the next envelope's sink. Use per-invocation bound record/log functions (or an immutable sink captured at dispatch time) instead of a globally swapped sink reference.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +84 to +97
const ctx = buildCtx({
persona: options.persona,
agent,
deployment,
workspaceId,
sandbox: subsystems.sandbox,
files: subsystems.files,
llm: subsystems.llm,
memory: subsystems.memory,
workflow: subsystems.workflow,
schedule: subsystems.schedule,
log: subsystems.log,
harnessRunner: subsystems.harnessRunner
});

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Simulation still builds ctx through buildCtx without replacing the credentials subsystem, so ctx.credentials remains environment-backed and can throw missing-credential errors during dry-run. That breaks simulation determinism and violates the "recording/no-side-effect local run" behavior when handlers read credentials. Inject a simulated credentials context (or extend buildCtx to accept one) so simulation does not depend on host env secrets. [incomplete implementation]

Severity Level: Major ⚠️
- ❌ Simulation fails when handlers require credentials without env set.
- ⚠️ Local dry-run unusable on machines lacking Cloud tokens.
- ⚠️ Breaks documented no-side-effect local simulation expectations.
Steps of Reproduction ✅
1. In `packages/runtime/src/simulate/simulate.ts`, `simulateInvocation` constructs the
handler context once via `const ctx = buildCtx({ ... })` at lines 84–97, passing
`persona`, `agent`, `deployment`, `workspaceId`, `sandbox`, `files`, `llm`, `memory`,
`workflow`, `schedule`, `log`, and `harnessRunner`, but not providing any override for
credentials.

2. Inspect `buildCtx` in `packages/runtime/src/ctx.ts`: at lines 118–148 it builds the
`WorkforceCtx` and unconditionally sets `credentials: credentialsFromEnv()` at line 143,
meaning every context (including simulation contexts) uses process-environment-backed
credentials.

3. `credentialsFromEnv` in `ctx.ts` lines 203–219 returns an object whose getters delegate
to `requireRuntimeCredentials(processEnv)`, and `requireRuntimeCredentials` at lines
221–225 calls `readRuntimeCredentialSnapshot(processEnv)` and throws `new Error("Runtime
credentials are required: missing ...")` when any of the expected RELAYFILE_* or
CLOUD_API_* environment variables are absent (as exercised by the `ctx.credentials` tests
in `packages/runtime/src/ctx.test.ts:24–45`).

4. Run `simulateInvocation` (exported for consumers via
`packages/runtime/src/index.ts:5–22`) in a local environment that lacks full runtime
credentials (e.g. no `RELAYFILE_TOKEN` or `CLOUD_API_ACCESS_TOKEN`) with a handler that
reads credentials, such as `handler(async (ctx) => { ctx.credentials.require(); })`; when
`simulateInvocation` calls `buildCtx` (simulate.ts:84–97), `ctx.credentials` is bound to
`credentialsFromEnv`, and the first call to `ctx.credentials.require()` inside the
simulated handler immediately throws the same "Runtime credentials are required: missing
…" error as production, causing the simulation run to fail even though the rest of the
simulation subsystems (`llm`, `memory`, `workflow`, `schedule`, `sandbox`,
`harnessRunner`) are designed to be no-side-effect and independent of cloud configuration.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** packages/runtime/src/simulate/simulate.ts
**Line:** 84:97
**Comment:**
	*Incomplete Implementation: Simulation still builds `ctx` through `buildCtx` without replacing the credentials subsystem, so `ctx.credentials` remains environment-backed and can throw missing-credential errors during dry-run. That breaks simulation determinism and violates the "recording/no-side-effect local run" behavior when handlers read credentials. Inject a simulated credentials context (or extend `buildCtx` to accept one) so simulation does not depend on host env secrets.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Comment on lines +110 to +120
const sink: SimulationSink = { sideEffects: [], logs: [] };
subsystems.useSink(sink);

const runStarted = now();
let summary: string | null = null;
let error: string | null = null;
try {
const returned: unknown = await handlerFn(ctx, event);
summary = summaryFromHandlerReturn(returned);
} catch (err) {
error = err instanceof Error ? err.message : String(err);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Side-effect/log attribution is race-prone because a single mutable sink is swapped between envelopes and reused by the same ctx. If a handler starts background async work (e.g. timer/unawaited promise) that calls ctx.log or other ctx.* methods after the handler returns, those records will be written into the next envelope's sink, corrupting run records. Use per-envelope subsystem instances (or bind immutable per-run sink closures) so late async emissions cannot cross-contaminate runs. [race condition]

Severity Level: Major ⚠️
- ❌ Simulated run logs mis-attributed between envelopes in replay.
- ⚠️ Side-effect records from one run appear under another.
- ⚠️ Cloud run records become unreliable for debugging simulations.
Steps of Reproduction ✅
1. Observe the simulation subsystems implementation in
`packages/runtime/src/simulate/subsystems.ts`: `createSimulationSubsystems` declares a
single mutable variable `let sink: SimulationSink = { sideEffects: [], logs: [] };` at
line 56 and exposes `useSink(next)` at lines 199–201, which simply reassigns this shared
`sink` variable.

2. Note that all simulated side effects and logs are written through closures that capture
this mutable `sink`: `record(...)` at lines 61–71 pushes into `sink.sideEffects`, and the
`log` implementation at lines 172–179 pushes into `sink.logs`, both referencing whatever
`sink` object `useSink` last assigned.

3. In `packages/runtime/src/simulate/simulate.ts`, `simulateInvocation` builds a single
`ctx` once at lines 84–97 using `buildCtx(...)`, then in the per-envelope loop at lines
103–112 it creates a fresh `sink` object (`const sink: SimulationSink = { sideEffects: [],
logs: [] };`) and calls `subsystems.useSink(sink);` before awaiting `handlerFn(ctx,
event)` at lines 116–119, so the same `ctx` instance is reused for every envelope while
only the shared `sink` pointer is swapped.

4. Define a handler using the public API (patterned on `simulateInvocation` tests in
`packages/runtime/src/simulate/simulate.test.ts:52–63`) such as `handler(async (ctx,
event) => { setTimeout(() => ctx.log('info', 'late log', { id: event.id }), 0); })`, then
call `simulateInvocation` with two envelopes; after the first `handlerFn` returns,
`simulateInvocation` advances to the next envelope and calls
`subsystems.useSink(nextSink)` (simulate.ts:110–112), so when the `setTimeout` callback
eventually runs it uses the same `ctx.log` closure but now writes into the second run's
`sink.logs`, causing the first envelope's late log to appear under the second run's
`simulation.capturedLogs` and leaving the first run's logs incomplete.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** packages/runtime/src/simulate/simulate.ts
**Line:** 110:120
**Comment:**
	*Race Condition: Side-effect/log attribution is race-prone because a single mutable sink is swapped between envelopes and reused by the same `ctx`. If a handler starts background async work (e.g. timer/unawaited promise) that calls `ctx.log` or other `ctx.*` methods after the handler returns, those records will be written into the next envelope's sink, corrupting run records. Use per-envelope subsystem instances (or bind immutable per-run sink closures) so late async emissions cannot cross-contaminate runs.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

@codeant-ai

codeant-ai Bot commented Jun 3, 2026

Copy link
Copy Markdown

CodeAnt AI finished reviewing your PR.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 8 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/runtime/src/simulate/subsystems.ts">

<violation number="1" location="packages/runtime/src/simulate/subsystems.ts:173">
P2: Captured `ctx.log` attrs are stored as-is, so non-JSON values can crash log rendering and abort simulation replay.</violation>

<violation number="2" location="packages/runtime/src/simulate/subsystems.ts:200">
P2: Per-envelope attribution is fragile because a global mutable sink can misattribute late async writes to the wrong run.</violation>
</file>

<file name="packages/runtime/src/simulate/simulate.ts">

<violation number="1" location="packages/runtime/src/simulate/simulate.ts:71">
P2: Defaulting `ctx.deployment.triggerKind` to `"inbox"` makes cron simulations report the wrong trigger kind inside handler context.</violation>

<violation number="2" location="packages/runtime/src/simulate/simulate.ts:84">
P1: Inject a simulated credentials implementation when building the simulation context; otherwise handlers that call `ctx.credentials` can fail based on host environment secrets, which breaks deterministic dry-run behavior.</violation>
</file>

<file name="packages/runtime/src/simulate/types.ts">

<violation number="1" location="packages/runtime/src/simulate/types.ts:77">
P2: `trigger.kind` is typed too loosely (`string`) instead of the documented trigger-kind union, which weakens Cloud-shape compatibility guarantees.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

// buildCtx may throw for unresolved required persona inputs — that is a
// simulation setup error (seed inputs via `agent.inputValues`), surfaced
// to the caller before any run records exist.
const ctx = buildCtx({

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Inject a simulated credentials implementation when building the simulation context; otherwise handlers that call ctx.credentials can fail based on host environment secrets, which breaks deterministic dry-run behavior.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/runtime/src/simulate/simulate.ts, line 84:

<comment>Inject a simulated credentials implementation when building the simulation context; otherwise handlers that call `ctx.credentials` can fail based on host environment secrets, which breaks deterministic dry-run behavior.</comment>

<file context>
@@ -0,0 +1,236 @@
+  // buildCtx may throw for unresolved required persona inputs — that is a
+  // simulation setup error (seed inputs via `agent.inputValues`), surfaced
+  // to the caller before any run records exist.
+  const ctx = buildCtx({
+    persona: options.persona,
+    agent,
</file context>

log,
harnessRunner,
useSink(next) {
sink = next;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Per-envelope attribution is fragile because a global mutable sink can misattribute late async writes to the wrong run.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/runtime/src/simulate/subsystems.ts, line 200:

<comment>Per-envelope attribution is fragile because a global mutable sink can misattribute late async writes to the wrong run.</comment>

<file context>
@@ -0,0 +1,206 @@
+    log,
+    harnessRunner,
+    useSink(next) {
+      sink = next;
+    },
+    vfsSnapshot() {
</file context>

};

const log: WorkforceCtx['log'] = (level, message, attrs) => {
sink.logs.push({

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Captured ctx.log attrs are stored as-is, so non-JSON values can crash log rendering and abort simulation replay.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/runtime/src/simulate/subsystems.ts, line 173:

<comment>Captured `ctx.log` attrs are stored as-is, so non-JSON values can crash log rendering and abort simulation replay.</comment>

<file context>
@@ -0,0 +1,206 @@
+  };
+
+  const log: WorkforceCtx['log'] = (level, message, attrs) => {
+    sink.logs.push({
+      t: now().toISOString(),
+      level,
</file context>


const deployment: WorkforceDeploymentContext = {
id: options.deployment?.id ?? 'sim-deployment',
triggerKind: options.deployment?.triggerKind ?? 'inbox',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Defaulting ctx.deployment.triggerKind to "inbox" makes cron simulations report the wrong trigger kind inside handler context.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/runtime/src/simulate/simulate.ts, line 71:

<comment>Defaulting `ctx.deployment.triggerKind` to `"inbox"` makes cron simulations report the wrong trigger kind inside handler context.</comment>

<file context>
@@ -0,0 +1,236 @@
+
+  const deployment: WorkforceDeploymentContext = {
+    id: options.deployment?.id ?? 'sim-deployment',
+    triggerKind: options.deployment?.triggerKind ?? 'inbox',
+    parentDeploymentId: options.deployment?.parentDeploymentId ?? null
+  };
</file context>

durationMs: number;
trigger: {
/** Workforce trigger vocabulary (`inbox` | `clock` | `radio`). */
kind: string;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: trigger.kind is typed too loosely (string) instead of the documented trigger-kind union, which weakens Cloud-shape compatibility guarantees.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/runtime/src/simulate/types.ts, line 77:

<comment>`trigger.kind` is typed too loosely (`string`) instead of the documented trigger-kind union, which weakens Cloud-shape compatibility guarantees.</comment>

<file context>
@@ -0,0 +1,155 @@
+  durationMs: number;
+  trigger: {
+    /** Workforce trigger vocabulary (`inbox` | `clock` | `radio`). */
+    kind: string;
+    eventSource: string;
+  };
</file context>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant