fix(local-auto-fix): drop synthetic stage ids from startFromStep retry #82
When the local-runtime launch/precheck fails before any Agent Relay SDK
step starts, Ricky tags evidence with a synthetic stage id
('runtime-launch', 'runtime-precheck', or the 'local-runtime' fallback).
The auto-fix loop was reading those back out and forwarding them to the
SDK as `startFromStep` on every retry, so the SDK threw
`startFrom step "..." not found in workflow` seven times in a row before
giving up — the user saw a misleading `MISSING_BINARY at runtime-launch`
final blocker instead of the real classified failure.
Filter the synthetic ids out at the boundary in `failedStepFromEvidence`
so launch-phase failures retry from the start (with `previousRunId`
still reused, matching the cli-auto-fix-and-resume spec). Centralize the
allowlist in `src/local/synthetic-step-ids.ts` so the next stage label
added in `entrypoint.ts` has a single source of truth to register
against.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
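
The boundary filter described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the three synthetic ids come from the description above, while the `EvidenceSketch` and `RetrySketch` shapes, `isSyntheticStepId`, `failedStepFromEvidenceSketch`, and `buildRetrySketch` are hypothetical names and shapes assumed for the example.

```typescript
// Synthetic stage ids named in the PR description. The real allowlist lives in
// src/local/synthetic-step-ids.ts; this copy is illustrative only.
const SYNTHETIC_STEP_IDS: ReadonlySet<string> = new Set([
  'runtime-launch',
  'runtime-precheck',
  'local-runtime',
]);

function isSyntheticStepId(id: string): boolean {
  return SYNTHETIC_STEP_IDS.has(id);
}

// Assumed evidence shape: one failed stage label plus the id of the failed run.
interface EvidenceSketch {
  failedStage?: string;
  runId?: string;
}

// Assumed retry shape; only startFromStep and previousRunId appear in the PR text.
interface RetrySketch {
  previousRunId?: string;
  startFromStep?: string;
}

// Returns the failed step only when it names a real workflow step.
function failedStepFromEvidenceSketch(evidence: EvidenceSketch): string | undefined {
  const stage = evidence.failedStage;
  return stage !== undefined && !isSyntheticStepId(stage) ? stage : undefined;
}

// Launch-phase failures keep previousRunId but omit startFromStep,
// so the retry begins from the first step of the workflow.
function buildRetrySketch(evidence: EvidenceSketch): RetrySketch {
  const retry: RetrySketch = {};
  if (evidence.runId !== undefined) retry.previousRunId = evidence.runId;
  const step = failedStepFromEvidenceSketch(evidence);
  if (step !== undefined) retry.startFromStep = step;
  return retry;
}
```

Keeping the ids in one set means a new stage label only has to be registered in one place for the filter to pick it up.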
📝 Walkthrough
This PR implements filtering of synthetic stage IDs (
Changes
Synthetic Stage ID Filtering
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/local/auto-fix-loop.test.ts (1)
722-750: ⚡ Quick win: Add one mixed-failure regression case (synthetic + real failed steps).
Current cases validate “all synthetic” and “real only,” but not the mixed list scenario. A focused case would lock in the intended precedence and guard future regressions.
Suggested test addition
+ it('prefers a real failed step when evidence contains both synthetic and real failed steps', async () => {
+   // Build a response/evidence shape where both runtime-launch and a real SDK step fail,
+   // then assert retry.startFromStep uses the real step id.
+ });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/local/auto-fix-loop.test.ts` around lines 722 - 750, Add a test for the mixed synthetic+real failure case: create a new it.each or it block (similar to the existing test that uses runWithAutoFix and runSingleAttempt) where runSingleAttempt.mockResolvedValueOnce returns a blockerResponse that includes both a synthetic stage id and a real failed step id (so the first attempt has mixed failures) and then resolves to successResponse on the next call; assert runSingleAttempt is called twice, that the retry object passed to the second call (runSingleAttempt.mock.calls[1][0].retry) does not include startFromStep (startFromStep is undefined) and that workflowRepairer.mock.calls[0][0].failedStep equals the real failed step id (confirming real-step precedence); reuse the same helpers and names from the diff (runWithAutoFix, runSingleAttempt, workflowRepairer, blockerResponse, successResponse) to locate and add the case.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@specs/auto-fix-synthetic-step-ids.md`:
- Around line 9-18: The fenced code block containing the ricky CLI output needs
a language tag to satisfy markdownlint MD040; update the opening fence from ```
to ```text so the block that starts with "ricky --mode local --spec-file
docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run" and includes the
Error/Execution/Auto-fix lines is annotated as text. Ensure only the opening
fence is changed and leave the block contents and closing fence as-is.
---
Nitpick comments:
In `@src/local/auto-fix-loop.test.ts`:
- Around line 722-750: Add a test for the mixed synthetic+real failure case:
create a new it.each or it block (similar to the existing test that uses
runWithAutoFix and runSingleAttempt) where
runSingleAttempt.mockResolvedValueOnce returns a blockerResponse that includes
both a synthetic stage id and a real failed step id (so the first attempt has
mixed failures) and then resolves to successResponse on the next call; assert
runSingleAttempt is called twice, that the retry object passed to the second
call (runSingleAttempt.mock.calls[1][0].retry) does not include startFromStep
(startFromStep is undefined) and that
workflowRepairer.mock.calls[0][0].failedStep equals the real failed step id
(confirming real-step precedence); reuse the same helpers and names from the
diff (runWithAutoFix, runSingleAttempt, workflowRepairer, blockerResponse,
successResponse) to locate and add the case.
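
The mixed synthetic-plus-real case raised in this review can be illustrated with a small precedence helper. This is only a sketch under stated assumptions: the review does not show the evidence shape, so a flat list of failed step ids is assumed, and `firstRealFailedStep` is a hypothetical name rather than a function from the PR.

```typescript
// Synthetic stage ids listed in the PR description (illustrative copy).
const SYNTHETIC_IDS: ReadonlySet<string> = new Set([
  'runtime-launch',
  'runtime-precheck',
  'local-runtime',
]);

// Hypothetical helper: returns the first failed step id that is a real
// workflow step, or undefined when every failed id is a synthetic label.
function firstRealFailedStep(failedStepIds: readonly string[]): string | undefined {
  return failedStepIds.find((id) => !SYNTHETIC_IDS.has(id));
}
```

Under this rule a list such as `['runtime-launch', 'final-signoff']` resolves to `final-signoff`, while a list containing only synthetic ids resolves to `undefined`.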
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: a9bdd7f3-e54e-4ac6-a7ef-9015865fe7c8
📒 Files selected for processing (4)
- specs/auto-fix-synthetic-step-ids.md
- src/local/auto-fix-loop.test.ts
- src/local/auto-fix-loop.ts
- src/local/synthetic-step-ids.ts
```
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
…
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
…
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
…
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
```
Add a language tag to the fenced block (Line 9) to satisfy markdownlint MD040.
This avoids docs lint warnings/failures in CI.
Minimal fix
-```
+```text
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
…
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
…
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
…
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/auto-fix-synthetic-step-ids.md` around lines 9 - 18, The fenced code
block containing the ricky CLI output needs a language tag to satisfy
markdownlint MD040; update the opening fence from ``` to ```text so the block
that starts with "ricky --mode local --spec-file
docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run" and includes the
Error/Execution/Auto-fix lines is annotated as text. Ensure only the opening
fence is changed and leave the block contents and closing fence as-is.
</details>
Ricky Eval Review
Run: Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Description | Outcome |
|---|---|---|
| 1️⃣ | Parse arguments – Ricky reads --mode local and the free‑form --spec string. | The spec is stored as a workflow contract request. |
| 2️⃣ | Enter dry‑run mode – Because --run is omitted, Ricky defaults to validation‑only mode. | No workflow is executed; only generation and verification steps are performed. |
| 3️⃣ | Generate workflow skeleton – Using the built‑in “Package‑Check” template, Ricky creates a YAML/JSON workflow definition that matches the requested checks (lint, unit‑test, dependency audit, etc.). | The workflow file is written to ./ricky_output/<timestamp>/workflow.yaml (or displayed on stdout if --output=stdout). |
| 4️⃣ | Deterministic verification gates – Ricky runs the contract validator: • Ensures required stages (checkout, install, lint, test, audit) are present. • Checks that each stage has a deterministic id, runs-on, and steps list. • Verifies that no circular dependencies exist. | If
...[truncated]
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step is a deterministic gate that either succeeds (green ✅) or fails with an explicit blocker message (red ❌). All artefacts are version‑controlled under `onboarding/` and the PR is scoped to `feat/onboarding‑cli‑v1`.
---
## 1️⃣ Repository Layout (branch: `feat/onboarding-cli-v1`)
onboarding/
├── README.md # High‑level onboarding guide (this file)
├── .ricky/
│ ├── config.yaml # Default config template
│ └── secrets.example # Example secrets file (git‑ignored)
├── scripts/
│ ├── 01-verify-prereqs.sh # System checks
│ ├── 02-setup-env.sh # Virtual‑env & deps
│ ├── 03-init-config.sh # Config generation
│ └── 04-verify-install.sh# Post‑install validation
└── .github/
└── workflows/
└── onboarding.yml # CI gate for the onboarding PR
All scripts are **POSIX‑sh** compatible and exit with `0` on success, non‑zero on failure.
---
## 2️⃣ Deterministic Verification Gates
| Gate | Script | Success ✅ | Failure ❌ (Blocker) |
|------|------
...[truncated]
Must
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
Must Not
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)
User message
A user gives Ricky an empty spec or a missing spec file.
Ricky output
**Handling an Empty or Missing Specification File**
When a workflow author submits an empty spec or a spec file that cannot be found, follow the deterministic, contract‑driven process below. The steps are designed to keep the repository clean, surface blockers early, and enforce the 80‑to‑100 validation‑loop cadence required by the **AGENTS.md** standards.
---
## 1. Immediate Detection (Gate 1)
| Trigger | Action |
|--------|--------|
| `spec/` directory missing **or** `spec/<workflow>.yaml` exists but is zero bytes | ‑ Fail the *Spec‑Presence* gate. <br>‑ Log a **Blocker** in the CI run with the tag `#spec-missing`. <br>‑ Post a comment on the PR (or create a draft PR) that includes the exact error message and a link to the “Spec‑Missing” checklist. |
**Error message template**
🚨 BLOCKER: Spec file not found or empty.
Path: spec/<workflow>.yaml
Reason: Required for deterministic validation and contract generation.
Action: Provide a non‑empty spec or delete the placeholder PR.
---
## 2. Automated Recovery Stub (Gate 2)
If the author is unavailable (e.g., CI runs on a schedule), generate a **minimal stub** so the repository stays buildable.
1. **Create** `spec/<wo
...[truncated]
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A
🗂 Configuration
• Active Branches : main
• Pending PRs : 0
• Validation Loops : 0 / 80‑100 (not started)
• Google provider connection not confirmed.
• GitHub provider connection not confirmed.
• Workflow execution halted until at least one provider is verified.
📝 Next Steps
- Run provider health checks:
  ricky check google
  ricky check github
- Resolve any authentication or network issues reported.
- Re‑run `ricky status` to confirm connectivity before triggering workflows.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
**Workflow: “GitHub Primitive Change”**
| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` HEAD, ticket ID (e.g., `GH‑1234`) | New branch `feat/gh-1234‑<short‑desc>` | `git branch -c` succeeds; branch name matches regex `^feat/gh-\d+-[a-z0-9-]+$` |
| 2️⃣ | **Ricky (GitOps)** | **Sync latest `main`** | Branch from step 1 | Branch up‑to‑date with `main` | `git merge-base --is-ancestor main <branch>` passes |
| 3️⃣ | **Developer** | **Implement change** | Branch, code spec, test plan | Modified files, new tests | Local lint (`npm run lint` / `flake8`) returns 0 |
| 4️⃣ | **Ricky (CI‑Orchestrator)** | **Run pre‑commit CI** | Commit(s) from step 3 | CI status badge | CI pipeline (`.github/workflows/ci.yml`) must report **success** within 5 min |
| 5️⃣ | **Ricky (Review‑Gatekeeper)** | **Open PR** | Branch, PR template, reviewers list | PR # opened | PR title matches `^<type>(\(scope\))?: .+` and description contains “Fixes #<ticket>” |
| 6️⃣ | **Ricky (Review‑Coordinator)** | **Assign reviewers** | PR #, reviewer matrix (code‑owner, security
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow Title:** `review‑and‑implement‑claude‑codex`
**Purpose**
Automate a two‑stage development cycle where Claude (LLM‑review) validates design/spec changes and Codex (code‑gen) produces the implementation. The workflow is deterministic, auditable, and bounded by explicit verification gates.
---
## 1. Contract Overview
| Element | Definition |
|---------|------------|
| **Input** | `spec.yaml` – a validated OpenAPI‑style specification (JSON/YAML) describing the feature or bug fix. |
| **Outputs** | 1. `review_report.md` – Claude’s review with acceptance criteria and change requests.<br>2. `implementation.zip` – generated source files from Codex.<br>3. `ci_report.json` – CI test results for the generated code. |
| **Success Criteria** | • Claude returns `status: APPROVED` with no “BLOCKER” items.<br>• Codex produces code that passes all CI tests (0 failures, coverage ≥ 80%). |
| **Failure Modes** | • Claude returns `status: BLOCKED` → workflow aborts, creates a `BLOCKER` issue.<br>• CI fails → automatic rollback to previous stable commit and open a `RETRY` PR. |
---
## 2. Branch / PR Boundaries
| Stage | Branch | PR Title Template |
|-------|--------|-----------------
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is the **deterministic post‑processing hook** that takes a freshly generated workflow (JSON/YAML) and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. It is meant to:
1. **Make the intent of each task crystal‑clear** (who, what, why, and any required inputs).
2. **Add concrete, testable acceptance criteria** that can be automatically verified by the CI‑gate.
3. **Keep the contract stable** so downstream reviewers and bots have a single source of truth.
Below is a step‑by‑step contract for invoking `--refine`, the expected input/output schema, and the verification gates you should embed in your CI pipeline.
---
## 1. Invocation Contract
```bash
agentworkforce generate workflow.yaml --refine > refined-workflow.yaml
```
| Parameter | Required? | Description |
|---|---|---|
| `workflow.yaml` | ✅ | The raw workflow file produced by any generator (e.g., `agentworkforce generate`). Must conform to the Workflow v2 schema (see `schemas/workflow_v2.json`). |
| `--refine` | ✅ | Switch that triggers the refinement pass. |
| `> refi
...[truncated]
**Must**
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
**Must Not**
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>
**User message**
> Generate a workflow for the `ricky --version` spec.
**Ricky output**
```text
**Workflow: `ricky --version` Specification & Release Pipeline**
*(Ricky AgentWorkforce – v0.9.0)*
---
### 1. Scope & Objective
Create a deterministic, test‑driven workflow that defines, validates, and releases the `ricky --version` command. The command must:
1. Print the current Ricky version string in **semantic‑version** format (`MAJOR.MINOR.PATCH`).
2. Exit with status **0** on success, **1** on any error (e.g., corrupted version file).
3. Be reproducible across all supported platforms (Linux, macOS, Windows) and CI environments.
### 2. Contract (Input / Output)
| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Input** | No positional arguments, no environment variables required. |
| **Output** | `stdout`: `<MAJOR>.<MINOR>.<PATCH>` (e.g., `2.4.1`) <br> `stderr`: empty on success, error message on failure. |
| **Exit Code** | `0` on success, `1` on failure (e.g., missing `VERSION` file, malformed version). |
| **Side‑effects** | None (read‑only). |
| **Platform Guarantees** | Identical output on Linux (glibc), macOS (Darwin), Windows (PowerShell/CMD). |
### 3. Source of Truth
- `src/ricky/__init__.py` contains `__version__ =
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all artifacts reach a **PASS** state or are reported as **BLOCKED**.
---
## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any unit that can be validated independently (e.g., `*.json`, `*.py`, `*.ckpt`). |
| **Batch size** | Configurable; default 100 artifacts per PR/branch. |
| **Branch/PR** | Each batch lives in its own feature branch `batch/<batch‑id>` and opens a PR titled `Batch <batch‑id>: Validation Loop`. |
| **Owner** | `@team‑artifact‑ops`. |
| **Gatekeeper** | CI job `validation‑gate` that must pass before merge. |
| **Blocker** | If an artifact cannot be auto‑fixed after N attempts, it is labeled `blocker` and the PR is halted for manual triage. |
---
## 2. Contract (YAML) – Machine‑readable definition
```yaml
name: batch-artifact-validation-loop
description: >
Deterministic validation/fix/rerun loop for independent artifacts.
version: 1.0.0
inputs:
-
...[truncated]
Must
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
Must Not
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)
User message
Describe how selected workflow-writing skills affected a generated Ricky workflow.
Ricky output
### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow
| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step declares **inputs, outputs, and success criteria** in a typed JSON schema. | The workflow begins with a `contract` block that lists required parameters (`sourceRepo`, `targetBranch`, `reviewers`) and the exact shape of the artifact (`prUrl: string`). Down‑stream steps can reference these contracts deterministically, eliminating ambiguous “magic strings”. |
| **Deterministic Verification Gates** | After each logical unit, a **gate** runs a pure‑function check that must return `true` before the pipeline proceeds. | After the *lint* step, a `gate lintPasses` runs `npm run lint && test -f lint-report.json`. If the gate fails, the workflow aborts with a clear `BLOCKER: Lint errors`. This prevents downstream steps (e.g., `build`) from executing on broken code. |
| **Review Artifacts Generation** | Mandates creation of **human‑readable artifacts** (HTML reports, SARIF files) that are automatically attached to
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Ricky – Workflow Generation Assistant**
---
### 📋 Quick Summary
To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. With a vague spec we can’t safely generate deterministic contracts, verification gates, or review artifacts. Below is a concise checklist of the minimal details required to produce a complete workflow that meets our standards (80‑100 validation loops, scoped PR boundaries, blocker reporting, etc.).
---
## Required Specification Details
| Category | What to Provide | Why It Matters |
|----------|----------------|----------------|
| **Goal & Scope** | One‑sentence description of the product’s primary outcome and the functional boundaries of the workflow. | Defines the workflow’s start/end points and limits the PR scope. |
| **Actors & Roles** | List of all human or system actors (e.g., `frontend`, `payment‑service`, `audit‑team`). | Determines responsibility assignments and review gate owners. |
| **Key Steps / Milestones** | Ordered list of high‑level steps (e.g., `collect‑input → validate → persist → notify`). | Forms the backbone of the workflow DAG. |
| **Inputs & Outputs** | For each step: input artifac
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s Immediate Response to the Failure**
| Phase | Action | Rationale | Artifact |
|-------|--------|-----------|----------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Append a **Failure Log** entry to the workflow run’s audit trail with timestamps, error codes, and the worker’s last heartbeat. | Provides an immutable record for post‑mortem and for the deterministic verification gate. | `audit.log` entry, `run‑<id>.json` update |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the orchestration DB. <br>• Emit a **Blocker Notification** (Slack/Teams) that includes: <br> – Step name, timeout value, actual duration <br> – Worker ID, last‑seen artifact hash (if any) <br> – Suggested “repair‑branch” name (e.g., `repair/timeout‑X‑idle‑Y`). | Guarantees that downstream consumers do not consume partial output and that the team is immediately aware. | `BLOCKER.md` (auto‑generated) |
| 3️⃣ Deterministic Verification Gate | • Insert a **Gate** after step X that checks: <br> – `step_X.status == SUCCESS` <br> – `step_X.duration <= timeout_limit` <br>• Insert a **Gate** after step Y that checks: <br> – `artifact_p
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow Contract – Pre‑Launch Stale‑State Guard**
| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development machine, executed **before** any workflow launch (`ricky run …`). |
| **Inputs** | <ul><li>`workspaceRoot: string` – absolute path to the repository root (inferred from `process.cwd()`).</li></ul> |
| **Outputs** | <ul><li>`isClean: boolean` – `true` if no stale state is found.</li><li>`stalePaths: string[]` – list of directories/files that are considered stale.</li></ul> |
| **Success Criteria** | `isClean === true`. If `false`, the workflow must abort and emit a **Blocker Report** (see “Blocker Reporting” below). |
| **Deterministic Verification Gate** | The guard must be **pure** with respect to the file‑system snapshot taken at start‑up: given the same snapshot it always returns the same `isClean`/`stalePaths`. No network calls, no random values. |
| **Idempotence** | Re‑running the guard without intervening changes yields identical output. |
| **Performance SLA** | Must complete ≤ 200 ms on a typical dev box (SSD, ≤ 10 k files). |
| **Error Handling** | Any unexpected I/O error is treated as a
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
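The Must / Must Not rules for this case can be sketched as a pure check that classifies and reports without deleting anything. The directory names come from the user message; `checkStaleState`, `StaleStateReport`, and the injected `isStale` predicate are hypothetical names for illustration and are not taken from Ricky's codebase.

```typescript
// Directory names are from the case above; every other name here is hypothetical.
const STALE_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"] as const;

interface StaleStateReport {
  classification: "clean" | "environment-contamination";
  observedPaths: string[];
  recommendation: string;
}

// `isStale` is injected by the caller, so this function never reads,
// deletes, or overwrites anything on disk by itself.
function checkStaleState(
  workspaceRoot: string,
  isStale: (path: string) => boolean,
): StaleStateReport {
  const observedPaths = STALE_STATE_DIRS
    .map((dir) => `${workspaceRoot}/${dir}`)
    .filter((path) => isStale(path));

  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "proceed with launch" };
  }
  // Stale state is reported as an environment issue, not a workflow-logic failure.
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or use an isolated run before launching",
  };
}
```

Keeping the staleness test outside the function is a design choice of this sketch: the report stays deterministic for a given predicate, and any cleanup remains an explicit, separate action.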
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Fixing “Another Ricky or Relay run is already active in this workspace”**
The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI workspace) the next invocation aborts with the message you’re seeing.
Below is the deterministic, step‑by‑step contract for clearing the marker safely, verifying that no other run is truly active, and getting your workspace back to a clean state.
---
## 1️⃣ Verify that no Ricky/Relay process is actually running
| Check | Command | Expected result |
|-------|---------|-----------------|
| Local OS process list | `ps aux \| grep -i ricky` | No `ricky` or `relay` PID other than the `grep` line |
| Docker (if you use containerised runs) | `docker ps \| grep ricky` | No container named `ricky‑run` (or similar) |
| CI runner (GitHub Actions, GitLab CI, etc.) | Look at the pipeline UI for a running job in the same branch | No active job labeled “Ricky” or “Relay” |
If you **do** see a live process, let it finish or kill it deliberately (e.g., `kill -SIGTERM <PID>`). Do **not** delete the marker while a real run is in pro
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Run**
Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`.
---
## 1️⃣ Preconditions (verify before you start)
| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI version** | `aw --version` | `>=2.4.0` (auto‑fix introduced in v2.4) |
| **Workflow definition** | `cat ./artifacts/<workflow-id>.json` | Valid JSON, contains `steps` array |
| **Previous run ID** | `echo $PREV_RUN_ID` | Non‑empty string (e.g., `run‑20240531‑a1b2c3`) |
| **Writable workdir** | `test -w ./workdir && echo ok` | `ok` |
| **Docker / sandbox** (if workflow uses containers) | `docker ps` | Docker daemon reachable |
If any check fails, abort and fix the environment before proceeding.
---
## 2️⃣ Execution contract
```bash
#!/usr/bin/env bash
set -euo pipefail # deterministic failure handling
IFS=$'\n\t'
# ---- INPUTS -------------------------------------------------
WORKFLOW_ID="my‑workflow‑123" # <-- replace
ARTIFA
...[truncated]
Must
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
Must Not
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
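A bounded loop of the kind this case asks for might look like the sketch below. All names (`runWithAutoFix`, `RunAttempt`, `ResumePoint`, and the result shapes) are assumptions made for illustration. The real loop would be asynchronous and would call the repair persona between attempts; tracking-run-id handling is left out.

```typescript
// Hypothetical types; none of these names come from Ricky's codebase.
interface ResumePoint {
  previousRunId: string;
  startFromStep: string | undefined;
}

interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
  blocker?: string;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  resumedFrom: string | undefined;
  blocker: string | undefined;
}

type RunAttempt = (resume: ResumePoint | undefined) => AttemptResult;

function runWithAutoFix(
  run: RunAttempt,
  maxAttempts: number,
): { ok: boolean; summaries: AttemptSummary[] } {
  const summaries: AttemptSummary[] = [];
  let resume: ResumePoint | undefined;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = run(resume);
    // Every attempt is summarized, whether it passed or failed.
    summaries.push({
      attempt,
      ok: result.ok,
      resumedFrom: resume?.startFromStep,
      blocker: result.blocker,
    });
    if (result.ok) return { ok: true, summaries };
    // A repair of the workflow artifact would happen here, before the retry.
    resume = { previousRunId: result.runId, startFromStep: result.failedStep };
  }
  // Budget exhausted: stop retrying and hand back the full attempt history.
  return { ok: false, summaries };
}
```

The loop exits in exactly two ways, success or an exhausted budget, so it cannot keep retrying past `maxAttempts`.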
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)
User message
A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
Ricky output
**Quick‑Start Debug Checklist**
1. **Capture the exact failure**
   ```bash
   ricky run workflows/foo.ts --no-auto-fix --log-level=debug > run.log 2>&1
   ```
   `run.log` now contains the full console output, the serialized workflow graph, and the raw error payloads.
2. **Locate the failing node**
   Open `run.log` and search for the line that starts with `✖︎` (Ricky’s failure marker). It will look like: `✖︎ Node: <node-id> Step: <step-name> Error: <error‑message>`. Note the node‑id and step‑name – they map 1‑to‑1 to the contract entry in `workflows/foo.ts`.
3. **Validate the contract for that node**

       // workflows/foo.ts
       export const foo = workflow({
         input: z.object({ /* … */ }),
         output: z.object({ /* … */ })
       }, async (ctx) => {
         // …
       });

   - Ensure the input schema matches what the previous node actually emits.
   - Ensure the output schema matches the expectations of downstream nodes.

   If the schemas diverge, the deterministic verification gate will raise a `ContractMismatchError`. Fix the schema or add a transformation step.
4. **Run the node in isolation**
   ricky
...[truncated]
**Must**
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
**Must Not**
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>
**User message**
> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
**Ricky output**
```text
**Executing a Local TypeScript Workflow Artifact in the Primary Local Path**
Below is the concrete, step‑by‑step contract that Ricky (the AgentWorkforce workflow reliability assistant) follows to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The procedure is deterministic, auditable, and produces the required verification artifacts.
---
## 1. Preconditions (Input Contract)
| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single TypeScript file (or an `index.ts` entry point) that exports a `run()` async function. | `./workflows/<workflow‑name>/` |
| **Node runtime** | Node ≥ 18, with `npm`/`pnpm`/`yarn` available. | System PATH |
| **Package manifest** | `package.json` present in the workflow directory (or inherited from repo root). Must list all runtime dependencies. | `./workflows/<workflow‑name>/package.json` |
| **TypeScript config** | `tsconfig.json` present (or fallback to repo root). Must target ES2022 or later. | `./workflows/<workflow‑name>/tsconfig.json` |
| **Execution context** | Optional `.env` file for secrets; must be validated against the workflow’s `envSche
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
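One way to read the precheck rules for this case is sketched below. It assumes the strip-types route means launching with Node's `--experimental-strip-types` flag (available from Node 22.6); Ricky's actual spawn command is not shown in this thread, so the command array, `planLocalRun`, and the injected `canResolve` callback are all illustrative assumptions.

```typescript
// Hypothetical sketch; Ricky's real precheck and spawn command are not shown here.
interface PrecheckInput {
  nodeVersion: string; // e.g. the value of process.versions.node, without a "v" prefix
  canResolve: (specifier: string) => boolean; // e.g. a wrapper around require.resolve
  workflowFile: string;
}

interface LaunchPlan {
  ok: boolean;
  blockers: string[];
  spawnCommand: string[]; // recorded verbatim in execution evidence
}

// Assumption: type stripping needs Node 22.6 or later.
function supportsStripTypes(nodeVersion: string): boolean {
  const [major = 0, minor = 0] = nodeVersion
    .split(".")
    .map((part) => Number.parseInt(part, 10));
  return major > 22 || (major === 22 && minor >= 6);
}

function planLocalRun(input: PrecheckInput): LaunchPlan {
  const blockers: string[] = [];
  if (!supportsStripTypes(input.nodeVersion)) {
    blockers.push(`Node ${input.nodeVersion} cannot strip types`);
  }
  if (!input.canResolve("@agent-relay/sdk")) {
    blockers.push("@agent-relay/sdk is not resolvable for the workflow");
  }
  // The agent-relay binary is never consulted, so its absence from PATH is not a blocker.
  return {
    ok: blockers.length === 0,
    blockers,
    spawnCommand: ["node", "--experimental-strip-types", input.workflowFile],
  };
}
```

The plan always carries the spawn command, even when blocked, so evidence shows what would have been run.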
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**
When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The process is bounded, auditable, and designed to surface blockers early while preserving branch and PR hygiene.
---
## 1. Failure Detection & Initial Fix Attempt
| Step | Action | Artifact |
|------|--------|----------|
| 1.1 | Run the **Structural‑Integrity Test Suite (SITS)** on the target branch. | `sits-report.json` |
| 1.2 | If any test fails, create a **Fix‑Attempt Ticket (FAT‑\<id\>)** in the workflow tracker. | `FAT‑<id>.md` (includes failing test IDs, logs, and proposed code change). |
| 1.3 | Apply the fix in a **dedicated “fix‑attempt” branch** off the target branch. | `fix-attempt/<target‑branch>/<FAT‑id>` |
| 1.4 | Submit a **Fix‑Attempt PR (FA‑PR)** against the target branch. | PR title: `FA‑<id>: <short description>` |
| 1.5 | Gate: **FA‑PR must pass all SITS checks** (automated CI). If it passes, merge and close FAT. If it fails, proceed to Step 2.
---
## 2. Persistent Failure – Boundary Detection
Ricky automatically detects that the failure persists after
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – Template & Example**
Below is a ready‑to‑use digest format that aggregates the health of multiple Ricky workflow runs. It follows the **Ricky repository conventions** (AGENTS.md), includes **deterministic verification gates**, **review artifacts**, and highlights any **blockers**. Use the template for any set of runs; replace the placeholder data with the actual values from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).
---
## 1. Digest Overview
| Metric | Value | Target / SLA |
|--------|-------|--------------|
| Total Runs (last 24 h) | `{{total_runs}}` | — |
| Successful Runs | `{{success_count}}` ({{success_pct}} %) | ≥ 95 % |
| Failed Runs | `{{fail_count}}` ({{fail_pct}} %) | ≤ 5 % |
| Blocked Runs* | `{{blocked_count}}` | 0 |
| Avg. Duration | `{{avg_duration}}` | ≤ 10 min |
| Verification Gate Pass Rate | `{{gate_pass_pct}}` % | ≥ 98 % |
| Review Artifacts Produced | `{{review_artifacts}}` | ≥ 1 per PR |
\*A *blocked run* is one that stopped at a deterministic verification gate because a required artifact or external dependency was missing.
---
## 2. Run‑Level Summary (chronological)
| Run ID | Branch | Trigger | Statu
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
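A digest that follows these rules could be sketched roughly as follows. `WorkflowRunEvidence` is named in the case, but its fields here are assumed, as are `HealthDigest` and `buildDigest`.

```typescript
// `WorkflowRunEvidence` is named in the case above; its fields are assumed here.
interface WorkflowRunEvidence {
  workflow: string;
  durationMs: number;
  failure?: {
    kind: "environment" | "workflow-logic";
    failureClass: string;
    step: string;
  };
}

interface HealthDigest {
  totalRuns: number;
  averageDurationMs: number;
  environmentFailures: Record<string, number>;
  workflowLogicFailures: Record<string, number>;
}

// Takes a readonly array and only reads from it, so evidence is never mutated.
function buildDigest(runs: readonly WorkflowRunEvidence[]): HealthDigest {
  const digest: HealthDigest = {
    totalRuns: runs.length,
    averageDurationMs: 0,
    environmentFailures: {},
    workflowLogicFailures: {},
  };
  let totalDuration = 0;

  for (const run of runs) {
    totalDuration += run.durationMs;
    if (run.failure === undefined) continue;
    // Environment and workflow-logic failures stay in separate buckets.
    const bucket =
      run.failure.kind === "environment"
        ? digest.environmentFailures
        : digest.workflowLogicFailures;
    // Keys name the workflow and step so recommendations can point at something concrete.
    const key = `${run.workflow}/${run.failure.step}: ${run.failure.failureClass}`;
    bucket[key] = (bucket[key] ?? 0) + 1;
  }

  digest.averageDurationMs = runs.length === 0 ? 0 : totalDuration / runs.length;
  return digest;
}
```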
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
**Ricky Slack Surface – “Workflow Debugger”**
*Version 0.1 – Design Specification*
---
## 1. Overview
The **Workflow Debugger** Slack surface gives engineers, product managers, and ops a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).
---
## 2. Surface Entry Points
| Trigger | Slack Command | Payload | Result |
|---------|---------------|---------|--------|
| **Start Debug Session** | `/ricky debug <workflow‑id>` | `{workflowId, requesterId, channelId}` | Bot posts an interactive message with session controls. |
| **Step Forward** | Button “▶️ Step” | `{sessionId, stepIndex}` | Executes next step, returns deterministic output. |
| **Step Back** | Button “◀️ Back” | `{sessionId, stepIndex}` | Re‑runs previous step from stored snapshot. |
| **Inject State** | Button “🔧 Inject” → Modal | `{sessionId, key, value}` | Updates in‑memory state for the next step. |
| **Abort** | Button
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Web‑Handoff Type – `workflowSubmit`**
| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Category** | Web → AgentWorkforce handoff |
| **Purpose** | Allows a client‑side UI (or any web consumer) to submit a complete workflow specification together with an execution *mode* (e.g., `dryRun`, `live`, `test`). The backend validates the spec, stores it in the workflow registry, and returns a deterministic handoff receipt. |
| **Payload Schema** | ```json { "type": "object", "properties": { "workflowId": { "type": "string", "pattern": "^[a-z0-9\\-]{3,64}$" }, "spec": { "$ref": "#/definitions/WorkflowSpec" }, "mode": { "type": "string", "enum": ["dryRun","live","test"] }, "metadata": { "type": "object", "additionalProperties": true } }, "required": ["workflowId","spec","mode"], "additionalProperties": false } ```<br>**`WorkflowSpec`** follows the existing internal spec contract (see `specs/workflow.yaml`). |
| **Response Contract** | ```json { "type": "object", "properties": { "receiptId": { "type": "string", "format": "uuid" }, "status": { "type": "string", "enum": ["accepted","rejected"] }, "validationErrors": { "type": "array", "items": { "type"
...[truncated]
Must
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
Must Not
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
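The boundary this case describes can be illustrated with a small sketch. `LocalInvocationRequest` and `CloudGenerateRequest` are named in the case, but their fields, the `WebHandoff` shape, the `"local" | "cloud"` mode values, and both functions are assumptions for illustration only.

```typescript
// Request type names come from the case above; all fields and functions are hypothetical.
type WebHandoff = {
  surface: "web";
  sessionToken: string;
  spec: string;
  mode: "local" | "cloud"; // assumed mode values
};

type LocalInvocationRequest = { kind: "local"; spec: string };
type CloudGenerateRequest = { kind: "cloud"; spec: string };
type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// The normalizer only translates: no auth checks, no routing decisions, no executor calls.
function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}

// The web surface validates the session first, then hands the normalized request on.
function handleWebSubmit(
  handoff: WebHandoff,
  isSessionValid: (token: string) => boolean,
  dispatch: (request: NormalizedRequest) => string,
): string {
  if (!isSessionValid(handoff.sessionToken)) {
    throw new Error("invalid session");
  }
  return dispatch(normalizeWebHandoff(handoff));
}
```

The normalized request carries no `surface` or `sessionToken` field, so downstream code has no way to branch on the request having come from web.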
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)
User message
A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
Ricky output
**Decision Matrix for the Claude‑to‑Ricky Hand‑off**
| **Input Characteristics** | **Recommended Ricky Action** | **Rationale** |
|----------------------------|------------------------------|---------------|
| **Complete, version‑controlled workflow spec** (YAML/JSON) **+ clear intent** (e.g., “run this pipeline”) | **Generate** | The spec is ready for materialization. Create a new branch, add the workflow files, and open a PR with the standard contract (see *AGENTS.md*). |
| **Spec contains syntax errors, missing required fields, or failing schema validation** | **Debug** | Run the deterministic verification gate (schema + unit‑test harness). Return a blocker report with line‑by‑line fixes. |
| **Existing workflow has been deployed but is stuck, timed‑out, or producing incorrect outputs** | **Restart** | Trigger a controlled restart: create a “restart‑<timestamp>” branch, add a `restart` flag in the workflow metadata, and schedule a fresh execution after a clean‑up step. |
| **User supplies logs, execution traces, or performance metrics without a new spec** | **Analyze** | Perform a post‑mortem: run the verification gate in *analysis mode*, generate a diagnostic report, and sugge
...[truncated]
Must
- Normalize the tool call through the MCP/Claude handoff path.
- Treat conversation context as advisory metadata.
- Make the spec itself sufficient for routing whenever possible.
Must Not
- Require the user to rewrite the spec manually as a workflow.
- Put domain routing logic inside the MCP tool handler itself.
- Ignore provided mode, workflow path, or prior decisions metadata.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)
User message
Change the Cloud Ricky generate API response to include a new field and change one existing error code name.
Ricky output
**Scope**
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_FAILED`** in the error‑response schema and all references.
All changes are limited to the **`cloud-ricky`** service repository (no cross‑service impact).
---
## 1. Contract Update (API Specification)
| Element | Current | Updated |
|---------|---------|---------|
| **Success payload** | ```json { "output": "...", "model": "gpt‑4", "usage": { … } } ``` | ```json { "output": "...", "model": "gpt‑4", "usage": { … }, "requestId": "550e8400-e29b-41d4-a716-446655440000" } ``` |
| **Error object** | ```json { "code": "INVALID_PROMPT", "message": "Prompt is malformed." } ``` | ```json { "code": "PROMPT_VALIDATION_FAILED", "message": "Prompt is malformed." } ``` |
*Schema files* (`openapi.yaml`, `components/schemas/GenerateResponse.yaml`, `components/schemas/ErrorResponse.yaml`) must be updated accordingly.
**Version bump** – bump the OpenAPI minor version: `v1.2.0 → v1.3.0`.
---
## 2. Code Changes
| File | Change |
|------|--------|
| `src/handlers/generateHandler.ts` | - Ge
...[truncated]
Must
- Allow additive response fields within `/api/v1/ricky/...`.
- Reject renaming or changing the meaning of an existing error code inside the same version.
- Require a new API version for breaking request/response changes.
Must Not
- Remove or rename existing response fields in v1.
- Add a
Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
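The versioning rules in the `surfaces-ingress.cloud-api-versioning` case above can be expressed as a small compatibility check. This is a sketch under assumptions: `ApiSurface`, `ChangeVerdict`, and `classifyApiChange` are hypothetical names, and a rename is modeled simply as one removed name plus one new name. Newly added error codes are not judged either way.

```typescript
// Hypothetical sketch of an additive-vs-breaking check for a single API version.
interface ApiSurface {
  responseFields: readonly string[];
  errorCodes: readonly string[];
}

interface ChangeVerdict {
  breaking: boolean;
  additiveFields: string[];
  reasons: string[];
}

function classifyApiChange(current: ApiSurface, proposed: ApiSurface): ChangeVerdict {
  const missingFrom = (before: readonly string[], after: readonly string[]): string[] =>
    before.filter((name) => !after.includes(name));

  const removedFields = missingFrom(current.responseFields, proposed.responseFields);
  const removedCodes = missingFrom(current.errorCodes, proposed.errorCodes);
  const additiveFields = missingFrom(proposed.responseFields, current.responseFields);

  const reasons = [
    ...removedFields.map((name) => `response field removed or renamed: ${name}`),
    ...removedCodes.map((name) => `error code removed or renamed: ${name}`),
  ];
  // Any removal or rename needs a new API version; added fields alone do not.
  return { breaking: reasons.length > 0, additiveFields, reasons };
}
```

Applied to the request in that case, adding `requestId` alone would pass as additive, while renaming `INVALID_PROMPT` would be flagged as breaking.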
Summary
`ricky --mode local --spec-file <path> --run` retries blew up with `startFrom step "local-runtime"/"runtime-launch" not found in workflow` whenever the workflow process failed before any SDK step ran. The auto-fix loop was reading Ricky's internal stage labels back out of evidence and forwarding them to the SDK as `--start-from <synthetic>`. The SDK only knows real workflow steps, so seven retries failed identically and the user saw a misleading `MISSING_BINARY at runtime-launch` final blocker instead of the real classified failure.
- Filter the synthetic ids out at the boundary in `failedStepFromEvidence` so launch-phase failures retry from the start (with `previousRunId` still reused, matching `specs/cli-auto-fix-and-resume.md`).
- Centralize the allowlist in `src/local/synthetic-step-ids.ts` (`runtime-precheck`, `runtime-launch`, `local-runtime`) so the next stage label added in `entrypoint.ts` has a single source of truth to register against.
Repro
From `cloud@codex/msd-shared-sandbox-review-runtime-spec`:
Spec
Full spec under `specs/auto-fix-synthetic-step-ids.md` in this PR.
Test plan
- `npx vitest run src/local/auto-fix-loop.test.ts`: 27 tests pass, including new parameterized cases for each synthetic id and direct `isSyntheticStageId` coverage
- `npm test` (full suite): 1028 tests pass across 49 files
- `npm run typecheck`: clean
- No `startFrom step "..." not found in workflow` errors; final blocker (if any) reflects the real classified failure
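For readers who want the shape of the boundary filter, here is a minimal sketch. `isSyntheticStageId` and `failedStepFromEvidence` are names from this PR and the three ids are the ones listed above, but the function bodies, the `Evidence` shape, and `retryOptionsFromEvidence` are assumptions for illustration, not the code in `src/local/synthetic-step-ids.ts`. Preferring a real failed step when evidence mixes synthetic and real ids is also an assumption of this sketch.

```typescript
// Hypothetical sketch; the real module lives in src/local/synthetic-step-ids.ts.
// Stage labels Ricky assigns when a failure happens before any SDK step starts.
const SYNTHETIC_STAGE_IDS: ReadonlySet<string> = new Set([
  "runtime-precheck",
  "runtime-launch",
  "local-runtime",
]);

function isSyntheticStageId(id: string | undefined): boolean {
  return id !== undefined && SYNTHETIC_STAGE_IDS.has(id);
}

// Assumed evidence shape: ids of failed steps in the order they were recorded.
interface Evidence {
  runId: string;
  failedStepIds: string[];
}

// Returns the first real failed step, or undefined when only synthetic stages failed.
function failedStepFromEvidence(evidence: Evidence): string | undefined {
  return evidence.failedStepIds.find((id) => !isSyntheticStageId(id));
}

interface RetryOptions {
  previousRunId: string;
  startFromStep?: string;
}

function retryOptionsFromEvidence(evidence: Evidence): RetryOptions {
  const step = failedStepFromEvidence(evidence);
  // previousRunId is always reused; startFromStep is set only for real SDK steps,
  // so a launch-phase failure retries from the start of the workflow.
  return step === undefined
    ? { previousRunId: evidence.runId }
    : { previousRunId: evidence.runId, startFromStep: step };
}
```

With this shape, a failure tagged only `runtime-launch` produces retry options without `startFromStep`, which is the behavior the summary describes.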