diff --git a/specs/auto-fix-synthetic-step-ids.md b/specs/auto-fix-synthetic-step-ids.md new file mode 100644 index 00000000..e02c03e0 --- /dev/null +++ b/specs/auto-fix-synthetic-step-ids.md @@ -0,0 +1,104 @@ +# Spec: auto-fix loop must not forward synthetic stage IDs as `startFromStep` + +## Problem + +When `ricky --mode local --spec-file --run` (or any `runLocal` invocation that lands in the auto-fix loop) hits a launch-phase failure — i.e. the spawned workflow process exits non-zero before any Agent Relay SDK step starts — Ricky reports the failure with a synthetic step id and the auto-fix loop then forwards that id back to the SDK as `startFromStep` on every retry. The SDK rejects it because the id is not a real workflow step name, and all retries fail identically. + +Concrete repro from a real run on `cloud@codex/msd-shared-sandbox-review-runtime-spec`: + +``` +ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run +… +Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff +… +Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more +… +Execution: blocked — MISSING_BINARY at runtime-launch +Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY) +``` + +The synthetic ids come from three places inside Ricky: + +- `src/local/entrypoint.ts:2013` — `runtime-precheck` ("Local runtime precheck"), reported when the precheck phase fails. +- `src/local/entrypoint.ts:2085` — `runtime-launch` ("Local runtime execution"), reported when the launch phase fails. +- `src/local/auto-fix-loop.ts:1643` — `local-runtime` fallback in `localResponseToWorkflowRunEvidence` when no `failed_step.id` is reported. + +All three are Ricky-internal labels for stage-level failures, not real Agent Relay SDK step names. + +The auto-fix loop then propagates that id to the SDK: + +- `src/local/auto-fix-loop.ts:148` — `failedStepFromEvidence` reads back the synthetic id as `failedStep`. +- `src/local/auto-fix-loop.ts:362, 387, 475` — passes `startFromStep: failedStep` into the retry request. +- `src/local/entrypoint.ts:401, 532, 798` — that becomes a `--start-from ` CLI arg or a `START_FROM` env var on the next workflow run. +- `@agent-relay/sdk WorkflowRunner.execute` validates `startFrom` against the workflow's actual step names and throws `startFrom step "" not found in workflow`. The seven retries all fail the same way; the user sees seven copies of the same stack trace and a misleading `MISSING_BINARY at runtime-launch` final blocker. + +## Behavior we want + +A launch-phase failure (process never reached an SDK step) must not produce a `startFromStep` on retry. The retry should re-run from the beginning — with `previousRunId` reused if available — exactly as the spec already says ("If no step granularity is reported … `--start-from` is omitted and we just retry the whole run with `--previous-run-id`." — `specs/cli-auto-fix-and-resume.md:59`). + +After this fix, the same repro must: + +- Not emit any `startFrom step "..." not found in workflow` errors on retry. +- Either succeed within the budget if the underlying failure is repairable, or terminate with the real classified blocker (e.g. `MISSING_BINARY` for an actually-missing binary) without burning retries on a phantom-step error. +- Surface the real failed step in the per-attempt summary only when one exists; otherwise omit `failed_step` instead of synthesizing `runtime-launch` / `local-runtime`. + +## Implementation surface + +The fix is contained and surgical. Two coordinated changes: + +1. **Stop synthesizing fake step ids as resume targets.** Introduce a single source of truth listing the synthetic stage labels Ricky uses internally: + + ```ts + // src/local/synthetic-step-ids.ts + export const SYNTHETIC_LOCAL_STAGE_IDS = new Set([ + 'runtime-precheck', + 'runtime-launch', + 'local-runtime', + ]); + export function isSyntheticStageId(id: string | undefined): boolean { … } + ``` + + The label may still appear in human-facing evidence (`evidence.failed_step.name = 'Local runtime execution'`) and in attempt summary output for diagnostics, but it must not be forwarded to the SDK as a `startFrom` target. + +2. **Filter at the boundary, not at the source.** The cleanest place is `failedStepFromEvidence` in `src/local/auto-fix-loop.ts`: return `undefined` when the only failed step id is synthetic. That keeps the existing evidence/event shape intact (other consumers, tests, and the persona-repair prompt may still want to know "the launch phase failed") while making the auto-fix retry path correctly treat it as a no-step-granularity failure. + + Concretely: + + ```ts + // src/local/auto-fix-loop.ts + function failedStepFromEvidence(evidence: WorkflowRunEvidence): string | undefined { + const real = evidence.steps.find( + (step) => step.status === 'failed' && !isSyntheticStageId(step.stepId), + ); + return real?.stepId; + } + ``` + + Every existing call site that reads `failedStep` (`src/local/auto-fix-loop.ts:148, 362, 387, 475` and the attempt-summary builder at `:155`) keeps working unchanged — they already guard with `failedStep ? { startFromStep: failedStep } : {}`, and after this filter `failedStep` will be `undefined` for launch-phase failures. + +3. **Keep the synthetic id out of evidence.steps[].stepId for resume purposes only.** Leave `src/local/entrypoint.ts:2104` (`workflow_steps[].id = 'runtime-launch'` in the success branch) untouched — that is observability metadata for a successful run and does not flow into auto-fix. The fallback at `src/local/auto-fix-loop.ts:1643` (`stepId: failedStepId ?? 'local-runtime'`) is the one that produces the bogus retry target; it should keep using the synthetic id for the *fallback step name* so output remains readable, but `failedStepFromEvidence` is the only place that decides resume targets, so this is already covered by change 2. + +## Test cases + +Unit tests in `src/local/auto-fix-loop.test.ts` (extend the existing file): + +1. **Synthetic launch-phase failure does not propagate `startFromStep`** — first attempt returns a `LocalResponse` with `execution.evidence.failed_step = { id: 'runtime-launch', name: 'Local runtime execution' }`. The retry request built by the loop must have `retry.startFromStep === undefined`. The retry must still set `retry.previousRunId` when a run id is available. +2. **Synthetic fallback id does not propagate either** — `LocalResponse` with no `failed_step` field; the fallback path stamps `stepId: 'local-runtime'` into evidence. Same assertion as (1). +3. **Real failed step still propagates** — evidence has a real failed step (`stepId: 'aggregate-drift'`, `status: 'failed'`). Retry request must carry `startFromStep: 'aggregate-drift'`. (Regression guard for the existing happy path.) +4. **Mixed evidence prefers a real failed step over a synthetic one** — if `evidence.steps` contains both `{ stepId: 'runtime-launch', status: 'failed' }` and `{ stepId: 'aggregate-drift', status: 'failed' }`, the loop forwards `aggregate-drift`. (Defends against future changes that emit both.) +5. **Attempt summary still records the synthetic name** — so users / logs can still see "Local runtime execution failed" even though no `startFromStep` was forwarded. The `failed_step` field on the summary is allowed to be omitted, but the human-readable summary line / evidence narrative must still mention the launch-phase failure. + +End-to-end (manual): rerun the original repro (`ricky --mode local --spec-file --run` against a workflow that fails to launch — e.g. arrange for a `MISSING_BINARY`). Verify zero `startFrom step "..." not found in workflow` errors across all retries; verify the final summary cites the real underlying blocker. + +## Out of scope + +- Changing the `MISSING_BINARY` classification or the auto-fix attempt budget. +- Renaming or removing the synthetic stage labels in observability output (`evidence.failed_step.name`, attempt summary text). They remain useful for humans reading logs. +- Detecting the *real* failed step from stderr when the SDK never reported one. The retry just re-runs from the start, which is correct and safe. +- Cloud-run retry semantics. This is local-only. + +## Acceptance + +- The original repro (`ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run`, run from the `cloud` repo) emits no `startFrom step "..." not found in workflow` errors. The final blocker, if any, reflects the real classified failure (e.g. `MISSING_BINARY` with the genuinely missing binary), not `runtime-launch`. +- `npm test` passes, including the new auto-fix-loop test cases above. +- No regression in `specs/cli-auto-fix-and-resume.md` acceptance criteria — `--auto-fix` with `MISSING_BINARY` + `npm install` still resumes correctly when a real step fails. diff --git a/src/local/auto-fix-loop.test.ts b/src/local/auto-fix-loop.test.ts index aa52dd78..deddf2a4 100644 --- a/src/local/auto-fix-loop.test.ts +++ b/src/local/auto-fix-loop.test.ts @@ -93,8 +93,8 @@ describe('runWithAutoFix', () => { it('passes failed repair history into the next workflow repair attempt', async () => { const runSingleAttempt = vi .fn() - .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'runtime-launch')) - .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-2', 'runtime-launch')) + .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'install-deps')) + .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-2', 'install-deps')) .mockResolvedValueOnce(successResponse('run-3')); const workflowRepairer = vi .fn() @@ -123,7 +123,7 @@ describe('runWithAutoFix', () => { retryAttempt: 2, outcome: expect.objectContaining({ status: 'blocker', - failedStep: 'runtime-launch', + failedStep: 'install-deps', blockerCode: 'MISSING_ENV_VAR', runId: 'run-2', debuggerSummary: 'Set TEST_TOKEN before retrying.', @@ -521,7 +521,7 @@ describe('runWithAutoFix', () => { it('uses the persona repair path even when the debugger recommends guided repair', async () => { const runSingleAttempt = vi .fn() - .mockResolvedValueOnce(blockerResponse('INVALID_ARTIFACT', 'run-1', 'runtime-launch')) + .mockResolvedValueOnce(blockerResponse('INVALID_ARTIFACT', 'run-1', 'install-deps')) .mockResolvedValueOnce(successResponse('run-2')); const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('guided repair workflow')); @@ -542,7 +542,7 @@ describe('runWithAutoFix', () => { it('repairs missing environment prerequisites in the workflow artifact before retrying', async () => { const runSingleAttempt = vi .fn() - .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'runtime-launch')) + .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', 'install-deps')) .mockResolvedValueOnce(successResponse('run-2')); const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('repaired env workflow')); @@ -557,7 +557,7 @@ describe('runWithAutoFix', () => { expect(runSingleAttempt).toHaveBeenCalledTimes(2); expect(workflowRepairer).toHaveBeenCalledWith(expect.objectContaining({ - failedStep: 'runtime-launch', + failedStep: 'install-deps', runId: 'run-1', response: expect.objectContaining({ execution: expect.objectContaining({ @@ -568,7 +568,7 @@ describe('runWithAutoFix', () => { expect(runSingleAttempt.mock.calls[1][0].retry).toMatchObject({ previousRunId: 'run-1', retryOfRunId: 'run-1', - startFromStep: 'runtime-launch', + startFromStep: 'install-deps', }); expect(result.ok).toBe(true); expect(result.auto_fix).toMatchObject({ @@ -718,6 +718,55 @@ describe('runWithAutoFix', () => { expect(runSingleAttempt.mock.calls[1][0].retry.previousRunId).toBeUndefined(); expect(result.warnings).toContain('Auto-fix retry could not resolve a previous run id; retrying without step-level resume.'); }); + + it.each([ + ['runtime-launch'], + ['local-runtime'], + ['runtime-precheck'], + ])('does not forward synthetic stage id %s as startFromStep on retry', async (syntheticId) => { + const runSingleAttempt = vi + .fn() + .mockResolvedValueOnce(blockerResponse('MISSING_ENV_VAR', 'run-1', syntheticId)) + .mockResolvedValueOnce(successResponse('run-2')); + const workflowRepairer = vi.fn().mockResolvedValue(workflowRepair('repaired workflow')); + + const result = await runWithAutoFix(baseRequest, { + maxAttempts: 3, + runSingleAttempt, + classifyFailure: fakeClassification, + debugWorkflowRun: directDebugger, + workflowRepairer, + artifactWriter: vi.fn().mockResolvedValue(undefined), + }); + + expect(result.ok).toBe(true); + expect(runSingleAttempt).toHaveBeenCalledTimes(2); + expect(runSingleAttempt.mock.calls[1][0].retry).toMatchObject({ + attempt: 2, + previousRunId: 'run-1', + }); + expect(runSingleAttempt.mock.calls[1][0].retry.startFromStep).toBeUndefined(); + expect(workflowRepairer.mock.calls[0][0].failedStep).toBeUndefined(); + }); + +}); + +describe('isSyntheticStageId', () => { + it('matches the runtime-launch / local-runtime / runtime-precheck labels', async () => { + const { isSyntheticStageId, SYNTHETIC_LOCAL_STAGE_IDS } = await import('./synthetic-step-ids.js'); + expect(isSyntheticStageId('runtime-launch')).toBe(true); + expect(isSyntheticStageId('local-runtime')).toBe(true); + expect(isSyntheticStageId('runtime-precheck')).toBe(true); + expect(SYNTHETIC_LOCAL_STAGE_IDS.size).toBe(3); + }); + + it('does not match real workflow step ids', async () => { + const { isSyntheticStageId } = await import('./synthetic-step-ids.js'); + expect(isSyntheticStageId('install-deps')).toBe(false); + expect(isSyntheticStageId('aggregate-drift')).toBe(false); + expect(isSyntheticStageId(undefined)).toBe(false); + expect(isSyntheticStageId('')).toBe(false); + }); }); function successResponse(runId: string): LocalResponse { diff --git a/src/local/auto-fix-loop.ts b/src/local/auto-fix-loop.ts index f805867e..6c124f44 100644 --- a/src/local/auto-fix-loop.ts +++ b/src/local/auto-fix-loop.ts @@ -11,6 +11,7 @@ import type { FailureClassification } from '../runtime/failure/types.js'; import { debugWorkflowRun as defaultDebugWorkflowRun } from '../product/specialists/debugger/debugger.js'; import type { DebuggerResult } from '../product/specialists/debugger/types.js'; import type { WorkflowRunEvidence, WorkflowStepEvidence } from '../shared/models/workflow-evidence.js'; +import { isSyntheticStageId } from './synthetic-step-ids.js'; import { repairWorkflowWithWorkforcePersona } from '../product/generation/workforce-persona-repairer.js'; import type { WorkforcePersonaRepairAttempt } from '../product/generation/workforce-persona-repairer.js'; import { @@ -1628,7 +1629,10 @@ function resolveRunId(response: LocalResponse): string | undefined { } function failedStepFromEvidence(evidence: WorkflowRunEvidence): string | undefined { - return evidence.steps.find((step) => step.status === 'failed')?.stepId; + const real = evidence.steps.find( + (step) => step.status === 'failed' && !isSyntheticStageId(step.stepId), + ); + return real?.stepId; } function localResponseToWorkflowRunEvidence(response: LocalResponse, attempt: number): WorkflowRunEvidence { diff --git a/src/local/entrypoint.test.ts b/src/local/entrypoint.test.ts index c6e5fcdc..a3bd9753 100644 --- a/src/local/entrypoint.test.ts +++ b/src/local/entrypoint.test.ts @@ -2381,6 +2381,75 @@ describe('runLocal', () => { expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(false); }); + // Regression: when an auto-fix retry synthesizes a workflow-artifact + // handoff (retryBaseRequest sets metadata.autoFixGeneratedFrom) pointing + // at a path that matches isExecutableWorkflowPath but does not exist on + // disk, and the spec content routes to a non-'generate' intent like + // 'debug', the executor used to skip generation and let runtime-precheck + // fail INVALID_ARTIFACT every retry. The fix re-enters the generation + // block so the artifact is actually rendered and written, breaking the + // 7/7 INVALID_ARTIFACT loop. + it('regression: auto-fix retry re-enters generation when synthesized workflowFile is missing on disk', async () => { + const { mkdtemp, rm } = await import('node:fs/promises'); + const { tmpdir } = await import('node:os'); + const { join } = await import('node:path'); + const tempDir = await mkdtemp(join(tmpdir(), 'ricky-missing-workflow-')); + + try { + const localExecutor = memoryLocalExecutorOptions({ cwd: tempDir }); + const debugSpec = [ + '# Spec: handle failing checks', + '', + 'When checks have failed or stack traces appear, route the failure', + 'to the remediation pipeline. We need to fix the broken state.', + ].join('\n'); + const phantomArtifactPath = 'workflows/generated/never-written.workflow.ts'; + const result = await runLocal( + { + source: 'workflow-artifact', + artifactPath: phantomArtifactPath, + // Marker that retryBaseRequest stamps on auto-fix-synthesized retries. + metadata: { autoFixGeneratedFrom: 'cli' }, + }, + { localExecutor, artifactReader: mockArtifactReader(debugSpec) }, + ); + + expect(localExecutor.writes.length).toBeGreaterThanOrEqual(1); + expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(true); + expect(result.logs.some((l) => l.includes('is not readable on disk after auto-fix retry'))).toBe(true); + } finally { + await rm(tempDir, { recursive: true, force: true }); + } + }); + + // Companion: a user-provided workflow-artifact handoff that names a + // missing file should still surface the runtime-precheck blocker rather + // than silently regenerate. (This guards against over-broadening the fix.) + it('regression: user-provided missing workflow-artifact path does NOT regenerate', async () => { + const { mkdtemp, rm } = await import('node:fs/promises'); + const { tmpdir } = await import('node:os'); + const { join } = await import('node:path'); + const tempDir = await mkdtemp(join(tmpdir(), 'ricky-user-missing-')); + + try { + const localExecutor = memoryLocalExecutorOptions({ cwd: tempDir }); + const result = await runLocal( + { + source: 'workflow-artifact', + artifactPath: 'workflows/generated/never-written.workflow.ts', + }, + { localExecutor, artifactReader: mockArtifactReader('import { workflow } from "@agent-relay/sdk/workflows";') }, + ); + + // No generation, no write — the runtime path runs (commandRunner is + // mocked so precheck is bypassed), but the gate did not regenerate. + expect(result.logs.some((l) => l.includes('[local] workflow generation'))).toBe(false); + expect(localExecutor.writes).toHaveLength(0); + } finally { + await rm(tempDir, { recursive: true, force: true }); + } + }); + it('reads workflow artifact input and routes the artifact path to the local runtime without Cloud fallback', async () => { const localExecutor = memoryLocalExecutorOptions({ stdout: ['artifact run ok'] }); const artifactReader = recordingArtifactReader('import { workflow } from "@agent-relay/sdk/workflows";'); diff --git a/src/local/entrypoint.ts b/src/local/entrypoint.ts index 97b5dc05..1c3ea7e5 100644 --- a/src/local/entrypoint.ts +++ b/src/local/entrypoint.ts @@ -1033,10 +1033,31 @@ export function createLocalExecutor(options: LocalExecutorOptions = {}): LocalEx intakeResult.routing.target, intakeResult.routing.normalizedSpec.desiredAction.workflowFileHint, ); + // Auto-fix retries (retryBaseRequest) promote response.artifacts[0].path + // to request.specPath and flip source to 'workflow-artifact' even when + // the artifact was never written to disk. That combination matches + // isExecutableWorkflowPath, so workflowFile becomes non-null and the + // routing target stays away from 'generate' because the spec content + // still describes a feature, not a debug session. Without re-entering + // generation here, runtime-precheck fails INVALID_ARTIFACT every retry + // and the auto-fix budget burns identically. + // + // We only fall through for retries we synthesized ourselves + // (autoFixGeneratedFrom is set in retryBaseRequest); user-provided + // workflow-artifact handoffs that name a missing file should still + // surface the precheck blocker rather than silently regenerate. + const isAutoFixSynthesizedRetry = typeof activeRequest.metadata?.autoFixGeneratedFrom === 'string'; + const fallThroughForMissingArtifact = isAutoFixSynthesizedRetry + && !!workflowFile + && intakeResult.routing.target !== 'generate' + && !(await workflowFileReadable(workflowFile, cwd)); let artifact: RenderedArtifact | null = null; let generationResult: GenerationResult | null = null; - if (intakeResult.routing.target === 'generate' || !workflowFile) { + if (intakeResult.routing.target === 'generate' || !workflowFile || fallThroughForMissingArtifact) { + if (fallThroughForMissingArtifact) { + logs.push(`[local] workflow file ${workflowFile} is not readable on disk after auto-fix retry; re-entering generation instead of failing precheck`); + } const executionPreference: ExecutionPreference = activeRequest.mode === 'both' ? 'auto' : 'local'; const normalizedSpec = { ...intakeResult.routing.normalizedSpec, @@ -1607,6 +1628,17 @@ function isExecutableWorkflowPath(path: string): boolean { return /(?:^|\/)workflows\/.+\.(?:ts|js)$|\.workflow\.(?:ts|js|yaml|yml)$/i.test(path); } +async function workflowFileReadable(workflowFile: string, cwd: string): Promise { + const { access } = await import('node:fs/promises'); + const resolved = isAbsolute(workflowFile) ? workflowFile : resolve(cwd, workflowFile); + try { + await access(resolved); + return true; + } catch { + return false; + } +} + function mapCoordinatorLogs(result: CoordinatorResult): string[] { return [ `[local] runtime status: ${result.status}`, diff --git a/src/local/synthetic-step-ids.ts b/src/local/synthetic-step-ids.ts new file mode 100644 index 00000000..d9f2eef0 --- /dev/null +++ b/src/local/synthetic-step-ids.ts @@ -0,0 +1,13 @@ +// Synthetic stage IDs Ricky uses to label local-runtime phases that aren't real +// Agent Relay SDK workflow steps. Forwarding them to the SDK as a `startFrom` +// resume target produces `startFrom step "..." not found in workflow`, so the +// auto-fix loop must filter them out. +export const SYNTHETIC_LOCAL_STAGE_IDS = new Set([ + 'runtime-precheck', + 'runtime-launch', + 'local-runtime', +]); + +export function isSyntheticStageId(id: string | undefined): boolean { + return !!id && SYNTHETIC_LOCAL_STAGE_IDS.has(id); +}