Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions specs/auto-fix-synthetic-step-ids.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Spec: auto-fix loop must not forward synthetic stage IDs as `startFromStep`

## Problem

When `ricky --mode local --spec-file <path> --run` (or any `runLocal` invocation that lands in the auto-fix loop) hits a launch-phase failure — i.e. the spawned workflow process exits non-zero before any Agent Relay SDK step starts — Ricky reports the failure with a synthetic step id and the auto-fix loop then forwards that id back to the SDK as `startFromStep` on every retry. The SDK rejects it because the id is not a real workflow step name, and all retries fail identically.

Concrete repro from a real run on `cloud@codex/msd-shared-sandbox-review-runtime-spec`:

```
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
```
Comment on lines +9 to +18

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block to satisfy markdown linting.

This block should include a language (e.g., text or bash) to address MD040.

Suggested markdown fix
-```
+```text
 ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
 …
 Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
 …
 Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
 …
 Execution: blocked — MISSING_BINARY at runtime-launch
 Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/auto-fix-synthetic-step-ids.md` around lines 9 - 18, The fenced code
block showing the ricky run output is missing a language tag and triggers MD040;
update the triple-backtick fence that wraps the sample log to include a language
identifier (e.g., "text" or "bash") so it becomes ```text (or ```bash) and
ensure the block content remains unchanged so linting passes and formatting is
preserved.


The synthetic ids come from three places inside Ricky:

- `src/local/entrypoint.ts:2013` — `runtime-precheck` ("Local runtime precheck"), reported when the precheck phase fails.
- `src/local/entrypoint.ts:2085` — `runtime-launch` ("Local runtime execution"), reported when the launch phase fails.
- `src/local/auto-fix-loop.ts:1643` — `local-runtime` fallback in `localResponseToWorkflowRunEvidence` when no `failed_step.id` is reported.

All three are Ricky-internal labels for stage-level failures, not real Agent Relay SDK step names.

The auto-fix loop then propagates that id to the SDK:

- `src/local/auto-fix-loop.ts:148` — `failedStepFromEvidence` reads back the synthetic id as `failedStep`.
- `src/local/auto-fix-loop.ts:362, 387, 475` — passes `startFromStep: failedStep` into the retry request.
- `src/local/entrypoint.ts:401, 532, 798` — that becomes a `--start-from <synthetic>` CLI arg or a `START_FROM` env var on the next workflow run.
- `@agent-relay/sdk WorkflowRunner.execute` validates `startFrom` against the workflow's actual step names and throws `startFrom step "<x>" not found in workflow`. The seven retries all fail the same way; the user sees seven copies of the same stack trace and a misleading `MISSING_BINARY at runtime-launch` final blocker.

## Behavior we want

A launch-phase failure (process never reached an SDK step) must not produce a `startFromStep` on retry. The retry should re-run from the beginning — with `previousRunId` reused if available — exactly as the spec already says ("If no step granularity is reported … `--start-from` is omitted and we just retry the whole run with `--previous-run-id`." — `specs/cli-auto-fix-and-resume.md:59`).

After this fix, the same repro must:

- Not emit any `startFrom step "..." not found in workflow` errors on retry.
- Either succeed within the budget if the underlying failure is repairable, or terminate with the real classified blocker (e.g. `MISSING_BINARY` for an actually-missing binary) without burning retries on a phantom-step error.
- Surface the real failed step in the per-attempt summary only when one exists; otherwise omit `failed_step` instead of synthesizing `runtime-launch` / `local-runtime`.

## Implementation surface

The fix is contained and surgical. Two coordinated changes:

1. **Stop synthesizing fake step ids as resume targets.** Introduce a single source of truth listing the synthetic stage labels Ricky uses internally:

```ts
// src/local/synthetic-step-ids.ts
export const SYNTHETIC_LOCAL_STAGE_IDS = new Set([
'runtime-precheck',
'runtime-launch',
'local-runtime',
]);
export function isSyntheticStageId(id: string | undefined): boolean { … }
```

The label may still appear in human-facing evidence (`evidence.failed_step.name = 'Local runtime execution'`) and in attempt summary output for diagnostics, but it must not be forwarded to the SDK as a `startFrom` target.

2. **Filter at the boundary, not at the source.** The cleanest place is `failedStepFromEvidence` in `src/local/auto-fix-loop.ts`: return `undefined` when the only failed step id is synthetic. That keeps the existing evidence/event shape intact (other consumers, tests, and the persona-repair prompt may still want to know "the launch phase failed") while making the auto-fix retry path correctly treat it as a no-step-granularity failure.

Concretely:

```ts
// src/local/auto-fix-loop.ts
function failedStepFromEvidence(evidence: WorkflowRunEvidence): string | undefined {
const real = evidence.steps.find(
(step) => step.status === 'failed' && !isSyntheticStageId(step.stepId),
);
return real?.stepId;
}
```

Every existing call site that reads `failedStep` (`src/local/auto-fix-loop.ts:148, 362, 387, 475` and the attempt-summary builder at `:155`) keeps working unchanged — they already guard with `failedStep ? { startFromStep: failedStep } : {}`, and after this filter `failedStep` will be `undefined` for launch-phase failures.

3. **Keep the synthetic id out of evidence.steps[].stepId for resume purposes only.** Leave `src/local/entrypoint.ts:2104` (`workflow_steps[].id = 'runtime-launch'` in the success branch) untouched — that is observability metadata for a successful run and does not flow into auto-fix. The fallback at `src/local/auto-fix-loop.ts:1643` (`stepId: failedStepId ?? 'local-runtime'`) is the one that produces the bogus retry target; it should keep using the synthetic id for the *fallback step name* so output remains readable, but `failedStepFromEvidence` is the only place that decides resume targets, so this is already covered by change 2.

## Test cases

Unit tests in `src/local/auto-fix-loop.test.ts` (extend the existing file):

1. **Synthetic launch-phase failure does not propagate `startFromStep`** — first attempt returns a `LocalResponse` with `execution.evidence.failed_step = { id: 'runtime-launch', name: 'Local runtime execution' }`. The retry request built by the loop must have `retry.startFromStep === undefined`. The retry must still set `retry.previousRunId` when a run id is available.
2. **Synthetic fallback id does not propagate either** — `LocalResponse` with no `failed_step` field; the fallback path stamps `stepId: 'local-runtime'` into evidence. Same assertion as (1).
3. **Real failed step still propagates** — evidence has a real failed step (`stepId: 'aggregate-drift'`, `status: 'failed'`). Retry request must carry `startFromStep: 'aggregate-drift'`. (Regression guard for the existing happy path.)
4. **Mixed evidence prefers a real failed step over a synthetic one** — if `evidence.steps` contains both `{ stepId: 'runtime-launch', status: 'failed' }` and `{ stepId: 'aggregate-drift', status: 'failed' }`, the loop forwards `aggregate-drift`. (Defends against future changes that emit both.)
5. **Attempt summary still records the synthetic name** — so users / logs can still see "Local runtime execution failed" even though no `startFromStep` was forwarded. The `failed_step` field on the summary is allowed to be omitted, but the human-readable summary line / evidence narrative must still mention the launch-phase failure.

End-to-end (manual): rerun the original repro (`ricky --mode local --spec-file <some.md> --run` against a workflow that fails to launch — e.g. arrange for a `MISSING_BINARY`). Verify zero `startFrom step "..." not found in workflow` errors across all retries; verify the final summary cites the real underlying blocker.

## Out of scope

- Changing the `MISSING_BINARY` classification or the auto-fix attempt budget.
- Renaming or removing the synthetic stage labels in observability output (`evidence.failed_step.name`, attempt summary text). They remain useful for humans reading logs.
- Detecting the *real* failed step from stderr when the SDK never reported one. The retry just re-runs from the start, which is correct and safe.
- Cloud-run retry semantics. This is local-only.

## Acceptance

- The original repro (`ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run`, run from the `cloud` repo) emits no `startFrom step "..." not found in workflow` errors. The final blocker, if any, reflects the real classified failure (e.g. `MISSING_BINARY` with the genuinely missing binary), not `runtime-launch`.
- `npm test` passes, including the new auto-fix-loop test cases above.
- No regression in `specs/cli-auto-fix-and-resume.md` acceptance criteria — `--auto-fix` with `MISSING_BINARY` + `npm install` still resumes correctly when a real step fails.
63 changes: 56 additions & 7 deletions src/local/auto-fix-loop.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -93,8 +93,8 @@ describe('runWithAutoFix', () => {
it('passes failed repair history into the next workflow repair attempt', async () => {
const runSingleAttempt = vi
.fn()
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'runtime-launch'))
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-2', 'runtime-launch'))
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'install-deps'))
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-2', 'install-deps'))
.mockResolvedValueOnce(successResponse('run-3'));
const workflowRepairer = vi
.fn()
Expand Down Expand Up @@ -123,7 +123,7 @@ describe('runWithAutoFix', () => {
retryAttempt: 2,
outcome: expect.objectContaining({
status: 'blocker',
failedStep: 'runtime-launch',
failedStep: 'install-deps',
blockerCode: 'MISSING_ENV_VAR',
runId: 'run-2',
debuggerSummary: 'Set TEST_TOKEN before retrying.',
Expand Down Expand Up @@ -521,7 +521,7 @@ describe('runWithAutoFix', () => {
it('uses the persona repair path even when the debugger recommends guided repair', async () => {
const runSingleAttempt = vi
.fn()
.mockResolvedValueOnce(blockerResponse('INVALID_ARTIFACT', 'run-1', 'runtime-launch'))
.mockResolvedValueOnce(blockerResponse('INVALID_ARTIFACT', 'run-1', 'install-deps'))
.mockResolvedValueOnce(successResponse('run-2'));
const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('guided repair workflow'));

Expand All @@ -542,7 +542,7 @@ describe('runWithAutoFix', () => {
it('repairs missing environment prerequisites in the workflow artifact before retrying', async () => {
const runSingleAttempt = vi
.fn()
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'runtime-launch'))
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'install-deps'))
.mockResolvedValueOnce(successResponse('run-2'));
const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('repaired env workflow'));

Expand All @@ -557,7 +557,7 @@ describe('runWithAutoFix', () => {

expect(runSingleAttempt).toHaveBeenCalledTimes(2);
expect(workflowRepairer).toHaveBeenCalledWith(expect.objectContaining({
failedStep: 'runtime-launch',
failedStep: 'install-deps',
runId: 'run-1',
response: expect.objectContaining({
execution: expect.objectContaining({
Expand All @@ -568,7 +568,7 @@ describe('runWithAutoFix', () => {
expect(runSingleAttempt.mock.calls[1][0].retry).toMatchObject({
previousRunId: 'run-1',
retryOfRunId: 'run-1',
startFromStep: 'runtime-launch',
startFromStep: 'install-deps',
});
expect(result.ok).toBe(true);
expect(result.auto_fix).toMatchObject({
Expand Down Expand Up @@ -718,6 +718,55 @@ describe('runWithAutoFix', () => {
expect(runSingleAttempt.mock.calls[1][0].retry.previousRunId).toBeUndefined();
expect(result.warnings).toContain('Auto-fix retry could not resolve a previous run id; retrying without step-level resume.');
});

it.each([
['runtime-launch'],
['local-runtime'],
['runtime-precheck'],
])('does not forward synthetic stage id %s as startFromStep on retry', async (syntheticId) => {
const runSingleAttempt = vi
.fn()
.mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', syntheticId))
.mockResolvedValueOnce(successResponse('run-2'));
const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('repaired workflow'));

const result = await runWithAutoFix(baseRequest, {
maxAttempts: 3,
runSingleAttempt,
classifyFailure: fakeClassification,
debugWorkflowRun: directDebugger,
workflowRepairer,
artifactWriter: vi.fn().mockResolvedValue(undefined),
});

expect(result.ok).toBe(true);
expect(runSingleAttempt).toHaveBeenCalledTimes(2);
expect(runSingleAttempt.mock.calls[1][0].retry).toMatchObject({
attempt: 2,
previousRunId: 'run-1',
});
expect(runSingleAttempt.mock.calls[1][0].retry.startFromStep).toBeUndefined();
expect(workflowRepairer.mock.calls[0][0].failedStep).toBeUndefined();
});

});

describe('isSyntheticStageId', () => {
it('matches the runtime-launch / local-runtime / runtime-precheck labels', async () => {
const { isSyntheticStageId, SYNTHETIC_LOCAL_STAGE_IDS } = await import('./synthetic-step-ids.js');
expect(isSyntheticStageId('runtime-launch')).toBe(true);
expect(isSyntheticStageId('local-runtime')).toBe(true);
expect(isSyntheticStageId('runtime-precheck')).toBe(true);
expect(SYNTHETIC_LOCAL_STAGE_IDS.size).toBe(3);
});

it('does not match real workflow step ids', async () => {
const { isSyntheticStageId } = await import('./synthetic-step-ids.js');
expect(isSyntheticStageId('install-deps')).toBe(false);
expect(isSyntheticStageId('aggregate-drift')).toBe(false);
expect(isSyntheticStageId(undefined)).toBe(false);
expect(isSyntheticStageId('')).toBe(false);
});
});

function successResponse(runId: string): LocalResponse {
Expand Down
6 changes: 5 additions & 1 deletion src/local/auto-fix-loop.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ import type { FailureClassification } from '../runtime/failure/types.js';
import { debugWorkflowRun as defaultDebugWorkflowRun } from '../product/specialists/debugger/debugger.js';
import type { DebuggerResult } from '../product/specialists/debugger/types.js';
import type { WorkflowRunEvidence, WorkflowStepEvidence } from '../shared/models/workflow-evidence.js';
import { isSyntheticStageId } from './synthetic-step-ids.js';
import { repairWorkflowWithWorkforcePersona } from '../product/generation/workforce-persona-repairer.js';
import type { WorkforcePersonaRepairAttempt } from '../product/generation/workforce-persona-repairer.js';
import {
Expand Down Expand Up @@ -1628,7 +1629,10 @@ function resolveRunId(response: LocalResponse): string | undefined {
}

function failedStepFromEvidence(evidence: WorkflowRunEvidence): string | undefined {
return evidence.steps.find((step) => step.status === 'failed')?.stepId;
const real = evidence.steps.find(
(step) => step.status === 'failed' && !isSyntheticStageId(step.stepId),
);
return real?.stepId;
}

function localResponseToWorkflowRunEvidence(response: LocalResponse, attempt: number): WorkflowRunEvidence {
Expand Down
69 changes: 69 additions & 0 deletions src/local/entrypoint.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2381,6 +2381,75 @@ describe('runLocal', () => {
expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(false);
});

// Regression: when an auto-fix retry synthesizes a workflow-artifact
// handoff (retryBaseRequest sets metadata.autoFixGeneratedFrom) pointing
// at a path that matches isExecutableWorkflowPath but does not exist on
// disk, and the spec content routes to a non-'generate' intent like
// 'debug', the executor used to skip generation and let runtime-precheck
// fail INVALID_ARTIFACT every retry. The fix re-enters the generation
// block so the artifact is actually rendered and written, breaking the
// 7/7 INVALID_ARTIFACT loop.
it('regression: auto-fix retry re-enters generation when synthesized workflowFile is missing on disk', async () => {
const { mkdtemp, rm } = await import('node:fs/promises');
const { tmpdir } = await import('node:os');
const { join } = await import('node:path');
const tempDir = await mkdtemp(join(tmpdir(), 'ricky-missing-workflow-'));

try {
const localExecutor = memoryLocalExecutorOptions({ cwd: tempDir });
const debugSpec = [
'# Spec: handle failing checks',
'',
'When checks have failed or stack traces appear, route the failure',
'to the remediation pipeline. We need to fix the broken state.',
].join('\n');
const phantomArtifactPath = 'workflows/generated/never-written.workflow.ts';
const result = await runLocal(
{
source: 'workflow-artifact',
artifactPath: phantomArtifactPath,
// Marker that retryBaseRequest stamps on auto-fix-synthesized retries.
metadata: { autoFixGeneratedFrom: 'cli' },
},
{ localExecutor, artifactReader: mockArtifactReader(debugSpec) },
);

expect(localExecutor.writes.length).toBeGreaterThanOrEqual(1);
expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(true);
expect(result.logs.some((l) => l.includes('is not readable on disk after auto-fix retry'))).toBe(true);
} finally {
await rm(tempDir, { recursive: true, force: true });
}
});

// Companion: a user-provided workflow-artifact handoff that names a
// missing file should still surface the runtime-precheck blocker rather
// than silently regenerate. (This guards against over-broadening the fix.)
it('regression: user-provided missing workflow-artifact path does NOT regenerate', async () => {
const { mkdtemp, rm } = await import('node:fs/promises');
const { tmpdir } = await import('node:os');
const { join } = await import('node:path');
const tempDir = await mkdtemp(join(tmpdir(), 'ricky-user-missing-'));

try {
const localExecutor = memoryLocalExecutorOptions({ cwd: tempDir });
const result = await runLocal(
{
source: 'workflow-artifact',
artifactPath: 'workflows/generated/never-written.workflow.ts',
},
{ localExecutor, artifactReader: mockArtifactReader('import { workflow } from "@agent-relay/sdk/workflows";') },
);

// No generation, no write — the runtime path runs (commandRunner is
// mocked so precheck is bypassed), but the gate did not regenerate.
expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(false);
expect(localExecutor.writes).toHaveLength(0);
} finally {
await rm(tempDir, { recursive: true, force: true });
}
});

it('reads workflow artifact input and routes the artifact path to the local runtime without Cloud fallback', async () => {
const localExecutor = memoryLocalExecutorOptions({ stdout: ['artifact run ok'] });
const artifactReader = recordingArtifactReader('import { workflow } from "@agent-relay/sdk/workflows";');
Expand Down
Loading
Loading