fix(auto-fix): re-enter generation when synthesized retry artifact is missing #83
kjgbot wants to merge 2 commits
When the local-runtime launch/precheck fails before any Agent Relay SDK
step starts, Ricky tags evidence with a synthetic stage id
('runtime-launch', 'runtime-precheck', or the 'local-runtime' fallback).
The auto-fix loop was reading those back out and forwarding them to the
SDK as `startFromStep` on every retry, so the SDK threw
`startFrom step "..." not found in workflow` seven times in a row before
giving up — the user saw a misleading `MISSING_BINARY at runtime-launch`
final blocker instead of the real classified failure.
Filter the synthetic ids out at the boundary in `failedStepFromEvidence`
so launch-phase failures retry from the start (with `previousRunId`
still reused, matching the cli-auto-fix-and-resume spec). Centralize the
allowlist in `src/local/synthetic-step-ids.ts` so the next stage label
added in `entrypoint.ts` has a single source of truth to register
against.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
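The boundary filter this commit describes can be sketched roughly as follows. The three synthetic ids come from the commit message; the `Evidence` shape, the `isSyntheticStepId` helper, and the body of `failedStepFromEvidence` are illustrative assumptions, not the code in this PR.

```typescript
// Synthetic stage ids Ricky tags on launch-phase evidence (listed in the commit message).
// They never name a real workflow step, so the SDK cannot resume from them.
const SYNTHETIC_STEP_IDS: ReadonlySet<string> = new Set([
  "runtime-launch",
  "runtime-precheck",
  "local-runtime",
]);

// Hypothetical evidence shape, for illustration only.
interface Evidence {
  stepId?: string;
  status: "passed" | "failed";
}

// Hypothetical helper name; the PR may expose the allowlist differently.
function isSyntheticStepId(id: string): boolean {
  return SYNTHETIC_STEP_IDS.has(id);
}

// Returns the step to resume from, or undefined to retry from the start.
function failedStepFromEvidence(evidence: Evidence[]): string | undefined {
  const failed = evidence.find((e) => e.status === "failed");
  const stepId = failed?.stepId;
  if (stepId === undefined || isSyntheticStepId(stepId)) return undefined;
  return stepId;
}
```

Returning `undefined` for a synthetic id means the caller has nothing to forward as `startFromStep`, so the retry starts from the beginning of the workflow.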
… missing

When `ricky --mode local --spec-file <foo>.md --run` runs against a spec whose content the parser scores as `debug` intent (any spec describing "failed" / "failure" / "broken" without explicit "create/build/author workflow" phrasing), and attempt 1 produces a generation artifact in memory but doesn't actually write it to disk (e.g. validation failed, workforce-persona writer threw, an early-return path fired), the auto-fix loop synthesizes a retry via retryBaseRequest that:

- flips `source` to 'workflow-artifact'
- sets `specPath` to the never-written `workflows/generated/...ts`
- stamps `metadata.autoFixGeneratedFrom = <prior source>`

On the retry, `workflowFileForRoute` returns specPath because it matches `isExecutableWorkflowPath`, the routing target stays at 'debug' (spec content is unchanged), and the gate `target === 'generate' || !workflowFile` evaluates false / SKIPS generation. Runtime-precheck then fails INVALID_ARTIFACT because the file isn't on disk. All 7 retries fail identically.

Repro from cwd /Users/khaliqgant/Projects/My-Senior-Dev/app:

    ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
    Generation: ok - workflows/generated/ricky-...ts
    Execution: blocked - INVALID_ARTIFACT at runtime-precheck
    Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

Fix: when the request was synthesized by an auto-fix retry (autoFixGeneratedFrom is set) AND the workflowFile doesn't exist on disk AND routing isn't already 'generate', fall through to the generation block instead of letting precheck fail. User-provided workflow-artifact handoffs that name a missing file still surface the runtime-precheck blocker (preserved by the existing artifact-run regression tests at src/local/entrypoint.test.ts:3703 and :3862). Two new regression tests cover both branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
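The three-part condition in the fix can be read as one predicate. The sketch below is a minimal illustration under assumptions: `shouldGenerate`, `RouteTarget`, `RetryRequest`, and the injected `fileExists` callback are hypothetical names, and only the gate expression and the `autoFixGeneratedFrom` field come from the commit message.

```typescript
// Hypothetical types for illustration; only 'generate' and 'debug' appear in the commit message.
type RouteTarget = "generate" | "debug";

interface RetryRequest {
  metadata?: { autoFixGeneratedFrom?: string };
}

// Decides whether the generation block should run for this request.
// fileExists is injected so the check can be exercised without touching disk.
function shouldGenerate(
  target: RouteTarget,
  workflowFile: string | undefined,
  request: RetryRequest,
  fileExists: (path: string) => boolean,
): boolean {
  // Pre-existing gate quoted in the commit message.
  if (target === "generate" || !workflowFile) return true;

  // New fall-through: only for requests synthesized by an auto-fix retry
  // whose artifact never reached disk.
  const synthesizedByAutoFix = request.metadata?.autoFixGeneratedFrom !== undefined;
  return synthesizedByAutoFix && !fileExists(workflowFile);
}
```

A user-provided workflow-artifact handoff has no `autoFixGeneratedFrom` stamp, so a missing file in that case still returns `false` here and reaches the runtime-precheck blocker, as the commit message requires.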
📝 Walkthrough
This PR prevents synthetic local stage IDs (
Changes: Fix Synthetic Step ID Propagation in Auto-Fix Retries
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/local/entrypoint.test.ts (1)
2417-2417: ⚡ Quick win: Tighten the assertion to prove a workflow artifact was written.
The current check passes on any write; this regression is stronger if it asserts a generated workflow artifact write specifically.
Proposed test assertion tweak
- expect(localExecutor.writes.length).toBeGreaterThanOrEqual(1);
+ expect(workflowArtifactWrites(localExecutor.writes)).toHaveLength(1);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/local/entrypoint.test.ts` at line 2417, Replace the loose length check on localExecutor.writes with an assertion that specifically verifies a workflow artifact was written: inspect localExecutor.writes for an entry whose filename/path matches the generated workflow artifact (e.g., contains "workflow" or the known artifact extension produced by your workflow generator) and assert that such an entry exists (use localExecutor.writes and an array/object matching assertion against the entry.path or entry.name to prove the workflow artifact was created).
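The suggested assertion relies on a `workflowArtifactWrites` helper. If the test file does not already define one, a possible shape is sketched below. The `RecordedWrite` type and the path convention (a `workflows/generated/` segment plus a `.ts` suffix, taken from the repro output in the PR description) are assumptions.

```typescript
// Hypothetical shape of an entry in localExecutor.writes.
interface RecordedWrite {
  path: string;
  content: string;
}

// Keeps only writes that look like generated workflow artifacts.
function workflowArtifactWrites(writes: RecordedWrite[]): RecordedWrite[] {
  return writes.filter(
    (w) => w.path.includes("workflows/generated/") && w.path.endsWith(".ts"),
  );
}
```

With a filter like this, an unrelated write (a log or metadata file, say) no longer satisfies the regression test on its own.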
📒 Files selected for processing (6)
specs/auto-fix-synthetic-step-ids.md
src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts
src/local/entrypoint.test.ts
src/local/entrypoint.ts
src/local/synthetic-step-ids.ts
```
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
…
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
…
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
…
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
```
Add a language to the fenced code block to satisfy markdown linting.
This block should include a language (e.g., text or bash) to address MD040.
Suggested markdown fix
-```
+```text
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
…
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
…
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
…
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 9-9: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@specs/auto-fix-synthetic-step-ids.md` around lines 9 - 18, The fenced code
block showing the ricky run output is missing a language tag and triggers MD040;
update the triple-backtick fence that wraps the sample log to include a language
identifier (e.g., "text" or "bash") so it becomes ```text (or ```bash) and
ensure the block content remains unchanged so linting passes and formatting is
preserved.
Ricky Eval Review
Run: Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0
Human Review Cases
These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Action | Outcome |
|---|---|---|
| Parse arguments | --mode local tells Ricky to operate against the local repository; --spec supplies the natural‑language request. | Ricky loads the spec‑to‑workflow compiler. |
| Spec translation | The spec string ("generate a workflow for package checks") is sent to the Spec‑Engine which produces a workflow contract (YAML/JSON) that follows the repository’s workflow/ schema. | You receive a draft workflow file (e.g., wf-package-checks.yaml) placed in workflows/drafts/. |
| No execution | Because --run is omitted, Ricky stops after generation. No validation, no CI jobs, no side‑effects on the codebase. | The workflow is only written to disk; nothing is applied, merged, or executed. |
| Exit status | Returns 0 on successful generation, non‑zero if the spec could not be parsed or the contract violates schema rules. | You can check $? to confirm success. |
2. What you ge
...[truncated]
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, copy‑and‑paste ready onboarding flow that a new user should run the first time they invoke the `ricky` command‑line interface. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).
---
## 1️⃣ Create a Scoped Onboarding Branch
```bash
git checkout -b onboarding/first‑run‑<your‑username>
All changes made by the onboarding script are confined to this branch, making review and rollback trivial.
2️⃣ Run the Interactive Bootstrap Script
ricky onboarding --init
What the script does (contract)
| Step | Action | Deterministic verification gate | Expected output |
|---|---|---|---|
| 1️⃣ | Detect OS & architecture | ricky env --check returns OK | Environment OK |
| 2️⃣ | Install required runtimes (Python 3.11+, Node 20) | ricky runtime --install exits 0 | Runtimes installed |
| 3️⃣ | Create a local config directory (~/.ricky) | test -d ~/.ricky → | |
...[truncated]
**Must**
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
**Must Not**
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>
**User message**
> A user gives Ricky an empty spec or a missing spec file.
**Ricky output**
```text
**Ricky’s Standard Response to an Empty or Missing Spec File**
---
### 1. Immediate Detection & Blocker Report
1. **Detect** – As soon as the workflow runner attempts to load the spec (`workflow.yaml` or `spec.json`) and receives a *null* or *file‑not‑found* response, raise a **Blocker**.
2. **Report** – Emit a deterministic `BLOCKER` event with the payload:
```json
{
"type": "MISSING_SPEC",
"detail": "Spec file not found or empty at path: ./specs/<workflow‑id>.yaml",
"action": "Create placeholder spec and request author input"
}
This event is logged to the Workflow‑Gate dashboard and sent to the assigned reviewer (owner of the repo branch).
2. Automated Placeholder Generation (Deterministic Contract)
| Step | Action | Output Artifact |
|---|---|---|
| 2.1 | Create a minimal stub in the same location with the required top‑level keys (name, description, steps). | specs/<workflow‑id>.yaml |
| 2.2 | Populate stub with TODO markers and a unique spec‑id (UUID). | |
| 2.3 | Add a # GENERATED BY RICKY header and a timestamp. | |
| 2.4 | Open a draft PR titled `WIP: placeholder sp | |
...[truncated]
**Must**
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
**Must Not**
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>
**User message**
> Render `ricky status` when no provider checks have proven Google or GitHub are connected.
**Ricky output**
```text
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Logger : RUNNING
🔧 Configuration
• Active Branches : main, workflow‑dev
• Last Deploy : 2026‑04‑28 14:12 UTC
• Feature Flags : all defaults
• No successful health‑check for Google provider.
• No successful health‑check for GitHub provider.
• Dependent workflows that require either provider are currently blocked.
🗒️ Next Steps (recommended)
- Run `ricky provider check google` and `ricky provider check github` to trigger connectivity verification.
- Review provider credentials (OAuth tokens, API keys) in `config/providers.yaml`.
- If credentials are missing or expired, update them and re‑run the checks.
- Once both checks return CONNECTED, re‑enable any blocked workflows.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
**GitHub Primitive‑Change Workflow**
| Phase | Step | Action | Owner | Artifact | Verification Gate |
|-------|------|--------|-------|----------|--------------------|
| **1️⃣ Planning** | 1.1 | Open a **Feature‑Branch Ticket** in the project board (e.g., `feat/issue‑#123‑update‑readme`). | Product Owner | Ticket (Jira/Linear) | Ticket must contain: title, description, acceptance criteria, and linked issue. |
| | 1.2 | Create a **branch** from `main` using the ticket ID (`git checkout -b feat/issue-123-update-readme`). | Developer | Branch name follows `feat/<ticket‑id>-<short‑desc>` pattern. | Branch name passes regex validation. |
| **2️⃣ Development** | 2.1 | Make the primitive change (e.g., edit `README.md`). | Developer | Modified file(s). | Lint/format checks (`pre‑commit`) must pass. |
| | 2.2 | Run **local tests** (`npm test` / `go test` etc.). | Developer | Test report. | All tests must pass (0 failures). |
| | 2.3 | Commit with **conventional commit** format (`git commit -m "feat: update README with new badge"`). | Developer | Commit SHA. | Commit message passes `commitlint` rules. |
| **3️⃣ Pull Request (PR) Creation** | 3.1 | Push branch to origin and open a PR targeti
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow: `review‑then‑implement`**
| **Stage** | **Agent** | **Purpose** | **Inputs** | **Outputs** | **Verification Gate** |
|-----------|-----------|-------------|------------|-------------|-----------------------|
| 1️⃣ | **Claude** (LLM‑review) | Perform a technical and style review of the design spec. | - `spec.md` (markdown) <br> - `review‑criteria.yml` (see below) | - `review‑report.json` (structured findings) <br> - `approved‑spec.md` (spec with inline annotations) | **Gate‑R1**: All required criteria must be marked *PASS* in `review‑report.json`. If any *FAIL*, workflow halts and reports blocker. |
| 2️⃣ | **Codex** (code‑gen) | Generate implementation code based on the *approved* spec. | - `approved‑spec.md` <br> - `code‑template/` (optional scaffolding) | - `src/` (generated source tree) <br> - `generation‑log.txt` | **Gate‑C1**: Automated lint + unit‑test compilation must succeed (0 errors). If failures, Codex re‑runs up to 3 retries; after that, blocker is raised. |
| 3️⃣ | **CI‑Validate** (deterministic) | Run full test suite and static analysis on generated code. | - `src/` | - `test‑results.xml` <br> - `static‑analysis-report.json` | **Gate‑V1**: ≥ 90 % test cov
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is the **deterministic post‑processing flag** that takes a freshly generated workflow (JSON/YAML) and runs a second, focused LLM pass that:
1. **Tightens the natural‑language task description** for each step (removes ambiguity, adds concrete verbs, references inputs/outputs).
2. **Adds or sharpens acceptance‑gate predicates** (boolean expressions, schema checks, or test‑script snippets) so every step can be automatically verified before the next step runs.
Below is a **step‑by‑step contract** you can copy‑paste into your CI pipeline or local dev script.
---
### 1. Contract Overview
| Field | Type | Required? | Meaning |
|-------|------|-----------|---------|
| `workflow_path` | `string` | ✅ | Path to the generated workflow file (JSON or YAML). |
| `refine_schema` | `object` | ✅ | JSON‑Schema that defines the *desired* shape of each step after refinement (see §2). |
| `output_path` | `string` | ✅ | Where the refined workflow will be written. |
| `max_iterations` | `integer` | ❌ (default = 1) | How many refinement passes to run (useful for 80‑100 % validation loops). |
| `dry_run` | `boolean`
...[truncated]
Must
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
Must Not
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)
User message
Generate a workflow for the `ricky --version` spec.
Ricky output
**Workflow: `ricky --version` Feature Implementation & Release**
| **Section** | **Details** |
|-------------|--------------|
| **Goal** | Add a `--version` flag to the `ricky` CLI that prints the current package version (semantic‑version string) and exits with status 0. |
| **Owner** | `@ricky-devlead` |
| **Stakeholders** | QA, Docs, Release Engineering, CI/CD |
| **Branching Model** | <ul><li>`feature/ricky-version` – feature branch created from `main`.</li><li>When ready, open PR `feat: add --version flag` targeting `main`.</li></ul> |
| **Contract (API)** | <ul><li>Command: `ricky --version` (or `ricky -v`).</li><li>Output: `<package-name> vX.Y.Z` (e.g., `ricky v1.3.2`).</li><li>Exit code: `0`.</li><li>No side‑effects (no file writes, no network calls).</li></ul> |
| **Acceptance Criteria** | 1. Running `ricky --version` prints exactly one line with the version string.<br>2. The version matches the value in `pyproject.toml` (or `package.json` for JS).<br>3. Flag is discoverable via `ricky --help` under “Options”.<br>4. Unit tests cover parsing, output, and exit code.<br>5. Integration test runs on CI across supported OSes (Linux, macOS, Windows). |
| **Implementation Steps**
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through deterministic validation, automatic remediation, and controlled re‑execution until all pass or a hard‑stop blocker is raised.
---
## 1. High‑Level Diagram
┌─────────────┐ 1. Discover ┌─────────────┐
│ Artifact │ ─────────────► │ Queue │
│ Source (S3/│ │ (Batch) │
│ Git, …) │ ◄───────────── │ │
└─────┬───────┘ 7. Report └─────┬───────┘
│ │
│ │
▼ ▼
┌─────────────┐ 2. Pull ┌─────────────┐ 4. Validate ┌─────────────┐
│ Worker Pool │ ◀────────── │ Dispatcher │ ─────────────► │ Validator │
│ (N parallel)│ │ (Task Split)│ │ (Determin‑ │
└─────┬───────┘ └─────┬───────┘ │ ised Rules)│
│ │ └─────┬─────┘
│ 5. Fix (if needed) │ │
▼ ▼ ▼
┌─────────────┐ 6. Re‑run ┌──
...[truncated]
**Must**
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
**Must Not**
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>
**User message**
> Describe how selected workflow-writing skills affected a generated Ricky workflow.
**Ricky output**
```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
| Skill | What the skill does | Direct effect on the generated workflow | Example of the resulting artifact |
|-------|---------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas (JSON Schema, OpenAPI, or TypedDict) for every step. | • Every node has a **contract block** that the engine validates before execution.<br>• Guarantees type‑safe hand‑offs, preventing silent data‑corruption. | ```yaml\nsteps:\n - id: fetch_user\n contract:\n input: {user_id: string}\n output: {name: string, email: string}\n``` |
| **Verification Gate Insertion** | Inserts automated checks (unit test, schema validation, policy lint) after each critical step. | • The workflow pauses at a **gate** until the check passes, turning runtime failures into compile‑time feedback.<br>• Enables “fail‑fast” behavior and clear error attribution. | ```yaml\n - id: validate_user\n type: gate\n run: python -m jsonschema -i {{fetch_user.output}} schema/user.json\n``` |
| **Review Artifact G
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Workflow Draft – Request for Clarification**
Below is a minimal “contract‑first” scaffold that follows our AgentWorkforce standards (deterministic verification gates, review artifacts, 80‑100 validation loops, scoped PR boundaries, etc.). Because the product specification you provided is vague, I’ve highlighted the sections that need concrete details before the workflow can be finalized and committed.
| **Section** | **What’s Needed** | **Placeholder / Example** |
|-------------|-------------------|---------------------------|
| **Workflow Name & ID** | A concise, unique name and identifier. | `wf‑<product‑name>-v1` |
| **Purpose / Goal** | One‑sentence description of the business outcome. | “Automate onboarding of new SaaS customers.” |
| **Inputs** | List of required input artifacts (e.g., JSON payload, file, API call). | `customerInfo: JSON` |
| **Outputs** | Expected deliverables after successful run. | `onboardedAccountId: string` |
| **Stages & Tasks** | For each stage, define: <br>• Task name <br>• Owner (human or service) <br>• Deterministic verification gate (e.g., schema validation, unit test, contract check) <br>• Success/Failure criteria <br>• Review artifact (e.g.,
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s Immediate Action Plan**
| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ | **Capture the failure context** – pull the run‑log, step‑metadata, and worker heartbeat for the timed‑out step and the “alive‑but‑idle” worker. | Provides a deterministic snapshot that can be reproduced and audited. | `failure_context.json` (log excerpts, timestamps, worker IDs, resource usage). |
| 2️⃣ | **Mark a hard blocker** in the workflow state machine. Set the run status to **BLOCKED** and attach the `failure_context.json`. | Guarantees downstream steps do not proceed on incomplete data and signals the coordination layer that human/automated intervention is required. | Updated workflow state entry (`status: BLOCKED`). |
| 3️⃣ | **Create a repair branch** off the latest stable commit of the workflow definition (e.g., `repair/timeout‑2023‑05‑09`). | Isolates the fix work, prevents race conditions with other PRs, and keeps the production branch clean. | Git branch `repair/timeout‑2023‑05‑09`. |
| 4️⃣ | **Generate a deterministic verification gate** that will run before any merge of the repair branch. The gate must: <br>• Replay the t
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow Contract – Pre‑Launch Stale‑State Guard**
| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development runs of any `.agent‑workflow/` (including `ricky run`, `ricky test`, `ricky debug`). |
| **Inputs** | • Working directory (root of the repo) <br>• Optional config `ricky.yaml` key `staleGuard.enabled` (default = true). |
| **Outputs** | • `status: PASS | FAIL` <br>• `stalePaths: []` (list of stale directories detected) <br>• `message: string` (human‑readable report). |
| **Success Criteria** | `status == PASS`. All detected stale directories must be either empty or explicitly ignored via config. |
| **Failure Action** | Abort the launch and emit a **Blocker Report** (see “Verification Gate” below). |
---
### 1. Detection Logic
```python
# pseudo‑code executed as the first step of every local workflow
def detect_stale_state(root: Path) -> Tuple[bool, List[Path]]:
    stale_dirs = [".agent-relay", ".relay", ".trajectories"]
    found = []
    for d in stale_dirs:
        dir_path = root / d
        if dir_path.is_dir():
            # consider stale if it contains any file *other* than .gitkeep
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Quick Fix for “Another Ricky/Relay Run Is Already Active”**
The marker you’re seeing is a lock file that Ricky creates to prevent overlapping executions. It means a previous run didn’t clean up properly (crashed, was cancelled, or left a stray process). Follow the deterministic clean‑up steps below; each step is a verified gate that you can run in isolation and confirm before moving to the next.
---
## 1️⃣ Verify the Current State
| Command | Expected Output |
|---------|-----------------|
| `ricky status` | Shows **ACTIVE** if a lock is present, otherwise **IDLE**. |
| `ps aux \| grep -i ricky` | Lists any lingering Ricky/Relay processes. |
| `cat .ricky/run.lock` *(if it exists)* | Shows the PID and timestamp of the lock holder. |
If `ricky status` reports **ACTIVE** **and** there is **no** matching PID in the process list, the lock is stale.
---
## 2️⃣ Safe Lock Removal (Deterministic Gate)
> **Goal:** Remove the stale lock *only* when you’re sure no legitimate run is in progress.
1. **Confirm no active PID**
```bash
LOCK_PID=$(cat .ricky/run.lock | cut -d' ' -f1)
if ps -p $LOCK_PID > /dev/null; then
echo "Process $LOCK_PID still running – aborting l
...[truncated]
Must
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
Must Not
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
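The rubric for this case (report the marker, never silently start a competing run) can be sketched as a small classifier. This is a minimal sketch, not Ricky's implementation: the lock format (PID followed by a timestamp) is taken from the unverified Ricky output quoted above, and `classifyRunLock`, `describeLock`, and the injected `isAlive` predicate are hypothetical names. In a real CLI the predicate would typically wrap `process.kill(pid, 0)`.

```typescript
// Hypothetical lock states; the real run marker format is not confirmed by the source.
type LockState =
  | { kind: "idle" }
  | { kind: "active"; pid: number; since: string }
  | { kind: "stale"; pid: number; since: string }
  | { kind: "unreadable"; raw: string };

// Classify the marker contents without touching the filesystem.
// `lockContent` is null when no marker file exists.
function classifyRunLock(
  lockContent: string | null,
  isAlive: (pid: number) => boolean,
): LockState {
  if (lockContent === null) return { kind: "idle" };
  const parts = lockContent.trim().split(/\s+/);
  const pidText = parts[0] ?? "";
  if (!/^\d+$/.test(pidText)) return { kind: "unreadable", raw: lockContent };
  const pid = parseInt(pidText, 10);
  const since = parts[1] ?? "unknown";
  return isAlive(pid) ? { kind: "active", pid, since } : { kind: "stale", pid, since };
}

// Turn the classification into a user-facing message. Only the idle state
// permits a launch; every other state asks the user to decide.
function describeLock(state: LockState): string {
  switch (state.kind) {
    case "idle":
      return "No run marker found; safe to launch.";
    case "active":
      return `Run marker held by live PID ${state.pid} since ${state.since}; wait for it or stop it explicitly.`;
    case "stale":
      return `Run marker names PID ${state.pid} (since ${state.since}), which is not running; clear it explicitly to continue.`;
    case "unreadable":
      return `Run marker could not be parsed (${JSON.stringify(state.raw)}); inspect it before launching.`;
  }
}
```

Keeping the liveness check injectable means the stale-versus-active decision can be exercised in tests without real processes.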
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)
User message
Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
Ricky output
**Local Workflow Execution – Auto‑Fix Run**
Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.
---
## 1️⃣ Prerequisites (verify once)
| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI version | `aw --version` | `>=2.4.0` |
| Repository root (contains `.aw/` config) | `git rev-parse --show-toplevel` | Path printed |
| Docker (if workflow uses containers) | `docker version` | Client/Server version printed |
| Environment variables | `echo $AW_WORKSPACE` | Non‑empty path to workspace |
If any check fails, resolve it before proceeding.
---
## 2️⃣ Input contract (what you must provide)
| Variable | Description | Example |
|----------|-------------|---------|
| `FAILED_STEP` | Exact name of the step that failed in the first attempt | `build-image` |
| `PREV_RUN_ID` | Run identifier of the previous (failed) execution | `run-20260508-01` |
| `WORKFLOW_FILE`
...[truncated]
Must
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
Must Not
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
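The bounded-loop rubric above, combined with the synthetic stage ids named in this PR's first commit message (`runtime-launch`, `runtime-precheck`, `local-runtime`), can be sketched as follows. This is an illustrative sketch under assumptions, not the actual auto-fix loop: `runWithAutoFix`, `RunAttempt`, and the outcome shapes are hypothetical, and the repair step is reduced to a comment.

```typescript
// Synthetic stage ids from the PR description; they are not real SDK steps.
const SYNTHETIC_STEP_IDS = ["runtime-launch", "runtime-precheck", "local-runtime"];

// Hypothetical shapes for one attempt and the options passed to a retry.
interface AttemptOutcome {
  ok: boolean;
  runId: string;
  failedStep?: string;
  blockerCode?: string;
}

interface RetryOptions {
  startFromStep?: string;
  previousRunId?: string;
}

interface AttemptSummary extends AttemptOutcome {
  attempt: number;
}

type RunAttempt = (options: RetryOptions) => AttemptOutcome;

// Run at most `maxAttempts` times and keep a summary of every attempt.
function runWithAutoFix(run: RunAttempt, maxAttempts: number): AttemptSummary[] {
  const summaries: AttemptSummary[] = [];
  let options: RetryOptions = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(options);
    summaries.push({ attempt, ...outcome });
    if (outcome.ok) break;
    // (A repair of the workflow artifact would happen here.)
    options = { previousRunId: outcome.runId };
    const step = outcome.failedStep;
    // Resume from the failed step only when it is a real workflow step;
    // launch-phase failures restart from the beginning.
    if (step !== undefined && SYNTHETIC_STEP_IDS.indexOf(step) === -1) {
      options.startFromStep = step;
    }
  }
  return summaries;
}
```

The loop stops on the first success or when the budget is spent, so the number of summaries never exceeds `maxAttempts`.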
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)
User message
A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
Ricky output
**Quick‑Start Debug Checklist**
| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣ | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, failed validation IDs, and any “blocker” messages. |
| 2️⃣ | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace (including input‑output payloads) – keep this run separate from the original “no‑auto‑fix” run. |
| 3️⃣ | Open the workflow contract file `workflows/foo.contract.ts` (or `foo.schema.json` if you use JSON schema). | Verify that **all required inputs** are declared, types match, and any **deterministic verification gates** (e.g., `gate: "hasValidAuthToken"`) are satisfied. |
| 4️⃣ | Run the contract validator directly: <br>`ricky validate contracts/foo.contract.ts --input ./sample-data/foo.input.json` | Should return **PASS** for every gate. If any gate fails, note the gate name and the failing payload. |
| 5️⃣ | Check the **review artifact** for the most recent PR that touched `foo.ts`. <br>`ricky pr view <last‑pr‑id> --artifacts` | Look for “blocked by” comments, lint failures, or missing test coverage. |
| 6️⃣ | Run the unit‑test suite for the workflow: <br>`r
...[truncated]
Must
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
Must Not
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)
User message
Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
Ricky output
**Ricky – Local Execution of a TypeScript Workflow Artifact**
*(Primary local path = `./workflows` – the repository‑root default for all workflow source files.)*
---
## 1. Preconditions (contract “local‑exec‑pre‑check”)
| Item | Requirement | How Ricky verifies |
|------|-------------|--------------------|
| **Node runtime** | `node >=18` and `npm >=9` installed | `node -v` / `npm -v`; abort with *BLOCKER* if missing |
| **TypeScript compiler** | `tsc` available (via project dev‑dependency) | `npm ls typescript` → version check |
| **Workflow manifest** | `workflow.json` present in the same directory as the `.ts` file | `fs.existsSync('workflow.json')` |
| **Dependency lock** | `package-lock.json` up‑to‑date | `npm ci` succeeds; otherwise run `npm ci` automatically |
| **Clean build output** | No stale `dist/` artifacts | If `dist/` exists, run `npm run clean` (script must be defined) |
| **Execution entry point** | Exported async function named `run` (or as defined in `workflow.json.entry`) | Static analysis (`ts-node -e "import('./src/foo.ts').then(m=>console.log(typeof m.run))"` ) |
If any check fails, Ricky emits a **BLOCKER** report with a concise remediation guide and h
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky’s Boundary‑Handling Protocol for a Persistent Structural Failure**
When a structural failure remains after an initial fix attempt, Ricky must treat the situation as a **hard blocker** and transition to a controlled “boundary” state. The following contract‑driven workflow guarantees deterministic hand‑off, traceability, and rapid recovery.
---
## 1. Immediate Blocker Declaration (Deterministic Gate)
| Step | Action | Artifact | Owner |
|------|--------|----------|-------|
| 1.1 | **Raise a Blocker Issue** in the `#workflow‑issues` repo with the label `blocker‑structural‑failure`. | `ISSUE-<NNN>` – includes: <br>• Failure ID (hash) <br>• Original fix PR reference <br>• Re‑run logs showing unchanged error <br>• Impact assessment (services, SLA) | Ricky (automated) |
| 1.2 | **Add a “Boundary” tag** to the issue to signal that normal repair loops are suspended. | Tag `boundary‑active` | Ricky |
| 1.3 | **Notify** the designated Incident Lead (via Slack `#incidents`) and the Architecture Review Board (ARB). | Slack message with link to ISSUE‑NNN | Ricky |
*Gate condition*: No further automated repair attempts are launched until the blocker is cleared.
---
## 2. Scoped Iso
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – Template & Example**
Below is a ready‑to‑use digest format that aggregates the health of multiple Ricky workflow runs. It follows the **Ricky repository conventions** (AGENTS.md), includes **deterministic verification gates**, **review artifacts**, and highlights any **blockers**. Use the template for any set of runs; replace the placeholder values with the actual data from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).
---
## 1. Digest Header
| Field | Value |
|-------|-------|
| **Report Date** | `YYYY‑MM‑DD HH:MM UTC` |
| **Generated By** | `ricky‑health‑digest` (CLI vX.Y.Z) |
| **Scope** | `workflow‑run‑ids: [1234, 1235, 1236, …]` |
| **Environment** | `prod / staging / dev` |
| **Total Runs Analyzed** | `N` |
| **Overall Status** | `✅ Healthy` / `⚠️ Degraded` / `❌ Critical` |
---
## 2. Summary Statistics
| Metric | Value | Target / SLA |
|--------|-------|--------------|
| **Successful Runs** | `S / N` | ≥ 95 % |
| **Failed Runs** | `F / N` | ≤ 5 % |
| **Runs with Blockers** | `B / N` | 0 |
| **Average Duration** | `⌀ T minutes` | ≤ 30 min |
| **Verification Gate Pass Rate** | `V / N` | 100 % |
| **Review Artifact Coverage**
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
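As a rough illustration of the rubric above (structured evidence in, per-workflow findings out, failure classes kept separate, input left unmodified), a digest could be aggregated like this. The `WorkflowRunEvidence` fields shown here are assumptions made for the example; the real type may look different, and `buildDigest` is a hypothetical helper.

```typescript
// Assumed evidence shape; only the type name comes from the rubric.
interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: "environment" | "workflow-logic" | "timeout" | "unknown";
  readonly durationMs: number;
}

interface WorkflowDigest {
  runs: number;
  failures: Record<string, number>; // count per failure class
  meanDurationMs: number;
}

interface Accumulator {
  runs: number;
  failures: Record<string, number>;
  durationMs: number;
}

// Aggregate per workflow. The input array is only read, never mutated.
function buildDigest(evidence: readonly WorkflowRunEvidence[]): Record<string, WorkflowDigest> {
  const totals: Record<string, Accumulator> = Object.create(null);
  for (const run of evidence) {
    const entry = totals[run.workflow] ?? { runs: 0, failures: {}, durationMs: 0 };
    entry.runs += 1;
    entry.durationMs += run.durationMs;
    if (run.status === "failed") {
      // Keep environment and workflow-logic failures in separate buckets.
      const cls = run.failureClass ?? "unknown";
      entry.failures[cls] = (entry.failures[cls] ?? 0) + 1;
    }
    totals[run.workflow] = entry;
  }
  const digest: Record<string, WorkflowDigest> = Object.create(null);
  for (const workflow of Object.keys(totals)) {
    const t = totals[workflow];
    if (t === undefined) continue;
    digest[workflow] = {
      runs: t.runs,
      failures: t.failures,
      meanDurationMs: t.durationMs / t.runs,
    };
  }
  return digest;
}
```

Because each finding is keyed by workflow name and failure class, a recommendation built on top of it can cite a specific workflow and metric instead of generic advice.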
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
**Ricky Slack Surface – “Workflow Debugger”**
*Version 0.1 – Design Draft*
---
## 1. Overview
The **Workflow Debugger** Slack surface gives agents a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It is built as a **Slack App** that talks to the Ricky Core via the existing **Ricky‑API** (REST + WebSocket) and surfaces a **state‑machine view** of any workflow instance.
| Feature | Description |
|---------|-------------|
| **Live Instance Selector** | `/ricky debug list` → interactive dropdown of active workflow IDs (filtered by user’s org/team). |
| **State‑Tree Viewer** | Rendered as a Slack Block Kit “section + context” hierarchy showing current node, pending actions, and completed steps. |
| **Step‑Through Controls** | Buttons: **Pause**, **Resume**, **Step‑Forward**, **Step‑Back**, **Jump‑to‑Node**. |
| **Inline Patch** | Text‑area modal to edit a node’s input payload; submit triggers a **deterministic verification gate** before applying. |
| **Error‑Overlay** | If a node fails, a red banner appears with the exception stack, logs, and a **“Create Fix PR”** shortcut. |
| **Audit Trail**
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Hand‑off Type:** `web_submit_spec`
| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends to the workflow engine. |
| **payload.spec** | `object` | Full **WorkflowSpec** JSON (see `specs/workflow/v1.json`). |
| **payload.mode** | `string` (enum) | Execution mode – `"dry-run"` | `"run"` | `"preview"` |
| **payload.metadata** *(optional)* | `object` | Arbitrary key‑value pairs for tracing (e.g., `requestId`, `userId`). |
---
## 1. Contract Definition (YAML)
```yaml
# contracts/handoffs/web_submit_spec.yaml
type: object
required:
  - type
  - payload
properties:
  type:
    const: web_submit_spec
  payload:
    type: object
    required:
      - spec
      - mode
    properties:
      spec:
        $ref: "../specs/workflow/v1.json"
      mode:
        type: string
        enum: [dry-run, run, preview]
      metadata:
        type: object
        additionalProperties: true
additionalProperties: false
```
The contract lives under `contracts/handoffs/` and is version‑controlled with the rest of the repo.
2
...[truncated]
**Must**
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
**Must Not**
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>surfaces-ingress.mcp-claude-context-is-metadata</code> (surfaces-ingress/manual)</summary>
**User message**
> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
**Ricky output**
```text
**Decision:** Generate a new workflow draft.
**Rationale**
1. **Input Type** – You supplied a *workflow specification* together with *conversation context* but no existing workflow artifact to debug, restart, or analyze. The spec describes the intended steps, inputs, outputs, and verification gates, which is the prerequisite for authoring a fresh workflow.
2. **Action Mapping**
- **Generate** – Create a workflow from a specification.
- **Debug** – Requires an existing workflow with failing verification.
- **Restart** – Requires a halted execution instance.
- **Analyze** – Requires execution logs or performance data.
- **Coordinate** – Requires multiple dependent workflows or external agents needing orchestration.
3. **Current Need** – The spec is the source material; the appropriate next step is to author a concrete workflow contract that includes:
- Deterministic verification gates (pre‑, post‑, and invariant checks).
- Scoped branch/PR boundaries (feature branch, PR title, reviewers).
- A validation loop count (target 80–100 iterations) for automated testing.
- Blocker reporting template (if any required resources are missing).
**Ne
...[truncated]
Must
- Normalize the tool call through the MCP/Claude handoff path.
- Treat conversation context as advisory metadata.
- Make the spec itself sufficient for routing whenever possible.
Must Not
- Require the user to rewrite the spec manually as a workflow.
- Put domain routing logic inside the MCP tool handler itself.
- Ignore provided mode, workflow path, or prior decisions metadata.
Deterministic checks
- PASS status: case executed
- PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)
User message
Change the Cloud Ricky generate API response to include a new field and change one existing error code name.
Ricky output
**Scope**
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.
- Rename error code **`INVALID_PROMPT`** to **`BAD_PROMPT`** (same numeric value).
- All changes are limited to the **`cloud-ricky`** service repository; no cross‑service impact.
---
## 1. Contract Change (API Specification)
| Endpoint | Method | Success Payload (200) | New Field | Error Mapping |
|----------|--------|-----------------------|-----------|---------------|
| `/v1/generate` | POST | `{ "output": string, "metadata": object, "requestId": string }` | **`requestId`** – UUID v4, globally unique per request. | `BAD_PROMPT` (formerly `INVALID_PROMPT`) – 400, `{ "code":"BAD_PROMPT","message":"Prompt validation failed." }` |
*Version bump*: **v1.2.0** (semantic minor bump – backward‑compatible addition).
The OpenAPI fragment to be merged:
```yaml
paths:
/v1/generate:
post:
summary: Generate content
operationId: generateContent
requestBody:
$ref: '#/components/requestBodies/GenerateRequest'
responses:
'200':
description: Successful generation
content:
application/js
...[truncated]
Must
- Allow additive response fields within `/api/v1/ricky/...`.
- Reject renaming or c
Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
Closing in favor of #84
I dug deeper after a follow-up question and the gate-side fix in this PR doesn't actually solve the underlying bug — it only makes auto-fix retries try to re-render, but the workforce-persona writer would just keep failing the same way.
The real root cause: in `src/product/generation/pipeline.ts`, when the workforce-persona writer throws (opencode/claude CLI errored, returned non-completed status, malformed output), `generateWithWorkforcePersona`'s catch block returns `success: false` with a blocking `WORKFORCE_PERSONA_WRITER_FAILED` issue. That discards the perfectly valid `baseResult.artifact` (deterministic template render) we'd already produced. `entrypoint.execute()` then early-returns at the `!success || !artifact` check at `src/local/entrypoint.ts:1083` without writing anything to disk. Auto-fix loops until the budget burns, all for the same INVALID_ARTIFACT reason.
End-to-end trace from the user's repro that proves this:
The fix in #84 makes the catch block fall back to the deterministic `baseResult`: `success: true`, artifact intact, persona failure surfaced as a warning. This mirrors the existing pre-write validation fallback at `pipeline.ts:154-166`, which already does exactly this when the persona's output fails repair-attempt validation.
Closing this without merging.
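For readers who have not opened #84, the fallback described above might look roughly like the sketch below. It is not the code from `pipeline.ts`: the `GenerationResult` and `GenerationIssue` shapes, the `PersonaWriter` type, and the function name `generateWithPersonaFallback` are assumptions for illustration, and the writer is kept synchronous for brevity. Only the issue code `WORKFORCE_PERSONA_WRITER_FAILED` and the "keep the base artifact, downgrade to a warning" behavior come from the comment.

```typescript
// Assumed shapes; the real types in the generation pipeline may differ.
interface GenerationIssue {
  code: string;
  severity: "blocking" | "warning";
  message: string;
}

interface GenerationResult {
  success: boolean;
  artifact?: { path: string; content: string };
  issues: GenerationIssue[];
}

// Hypothetical stand-in for the workforce-persona writer.
type PersonaWriter = (base: GenerationResult) => GenerationResult;

function generateWithPersonaFallback(
  baseResult: GenerationResult,
  writer: PersonaWriter,
): GenerationResult {
  try {
    return writer(baseResult);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Keep the deterministic render instead of discarding it; the persona
    // failure is reported as a warning rather than a blocking issue.
    return {
      ...baseResult,
      success: baseResult.artifact !== undefined,
      issues: [
        ...baseResult.issues,
        { code: "WORKFORCE_PERSONA_WRITER_FAILED", severity: "warning", message },
      ],
    };
  }
}
```

With a result like this, a caller that checks `!success || !artifact` would proceed to write the deterministic artifact to disk, so a later retry would find the file it expects.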