fix(auto-fix): re-enter generation when synthesized retry artifact is missing by kjgbot · Pull Request #83 · AgentWorkforce/ricky

kjgbot · 2026-05-09T17:24:03Z

Closing in favor of #84

I dug deeper after a follow-up question and the gate-side fix in this PR doesn't actually solve the underlying bug — it only makes auto-fix retries try to re-render, but the workforce-persona writer would just keep failing the same way.

The real root cause: in src/product/generation/pipeline.ts, when the workforce-persona writer throws (the opencode/claude CLI errored, returned a non-completed status, or produced malformed output), the catch block in generateWithWorkforcePersona returns success: false with a blocking WORKFORCE_PERSONA_WRITER_FAILED issue. That discards the perfectly valid baseResult.artifact (the deterministic template render) we'd already produced. entrypoint.execute() then early-returns at the !success || !artifact check at src/local/entrypoint.ts:1083 without writing anything to disk, so every auto-fix retry fails for the same INVALID_ARTIFACT reason until the budget is exhausted.

End-to-end trace from the user's repro that proves this:

[trace] gen call start workforcePersona=true
[trace] gen call returned success=false hasArtifact=true artifactPath=workflows/generated/ricky-...ts validErrors=1
[trace] gen EARLY-RETURN no write success=false hasArtifact=true firstErr=Workforce persona writer did not complete: failed.

The fix in #84 makes the catch block fall back to the deterministic baseResult — success: true, artifact intact, persona failure surfaced as a warning. This mirrors the existing pre-write validation fallback at pipeline.ts:154-166, which already does exactly this when the persona's output fails repair-attempt validation.
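As a rough illustration of that fallback, here is a minimal sketch. The `GenerationResult` and `GenerationIssue` shapes, the `PersonaWriter` type, and the function name `generateWithPersonaFallback` are assumptions made for this example, not the real types in pipeline.ts, and the writer is synchronous here only for brevity.

```typescript
// Hypothetical shapes for illustration; the real types in
// src/product/generation/pipeline.ts may differ.
interface GenerationIssue {
  code: string;
  severity: "blocking" | "warning";
  message: string;
}

interface GenerationResult {
  success: boolean;
  artifact?: { path: string; content: string };
  issues: GenerationIssue[];
}

// Hypothetical persona writer: takes the deterministic render and may throw.
type PersonaWriter = (base: GenerationResult) => GenerationResult;

function generateWithPersonaFallback(
  baseResult: GenerationResult,
  writePersona: PersonaWriter,
): GenerationResult {
  try {
    return writePersona(baseResult);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Fall back to the deterministic render: keep its success flag and
    // artifact, and surface the persona failure as a non-blocking warning.
    return {
      ...baseResult,
      issues: [
        ...baseResult.issues,
        { code: "WORKFORCE_PERSONA_WRITER_FAILED", severity: "warning", message },
      ],
    };
  }
}
```

With a result like this, the `!success || !artifact` check in the entrypoint would no longer early-return, so the artifact can be written to disk.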

Closing this without merging.

When the local-runtime launch/precheck fails before any Agent Relay SDK step starts, Ricky tags evidence with a synthetic stage id ('runtime-launch', 'runtime-precheck', or the 'local-runtime' fallback). The auto-fix loop was reading those back out and forwarding them to the SDK as `startFromStep` on every retry, so the SDK threw `startFrom step "..." not found in workflow` seven times in a row before giving up; the user saw a misleading `MISSING_BINARY at runtime-launch` final blocker instead of the real classified failure.

Filter the synthetic ids out at the boundary in `failedStepFromEvidence` so launch-phase failures retry from the start (with `previousRunId` still reused, matching the cli-auto-fix-and-resume spec). Centralize the allowlist in `src/local/synthetic-step-ids.ts` so the next stage label added in `entrypoint.ts` has a single source of truth to register against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
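The filter described in this commit message could look roughly like the sketch below. The names `SYNTHETIC_LOCAL_STAGE_IDS`, `isSyntheticStageId`, and `failedStepFromEvidence` come from the PR, but their signatures, the `StepEvidence` shape, and the `retryOptions` helper are assumptions for illustration only.

```typescript
// The three ids are the ones named in the commit message.
const SYNTHETIC_LOCAL_STAGE_IDS: ReadonlySet<string> = new Set([
  "runtime-launch",
  "runtime-precheck",
  "local-runtime",
]);

function isSyntheticStageId(stepId: string): boolean {
  return SYNTHETIC_LOCAL_STAGE_IDS.has(stepId);
}

// Hypothetical, simplified evidence record.
interface StepEvidence {
  stepId: string;
  status: "passed" | "failed";
}

// Returns the first failed step that is a real workflow step, or undefined
// when only synthetic stages failed.
function failedStepFromEvidence(steps: readonly StepEvidence[]): string | undefined {
  return steps.find((s) => s.status === "failed" && !isSyntheticStageId(s.stepId))?.stepId;
}

// Hypothetical helper: omit startFromStep entirely for synthetic-only
// failures so the retry starts from the beginning, still reusing the run id.
function retryOptions(
  steps: readonly StepEvidence[],
  previousRunId: string,
): { previousRunId: string; startFromStep?: string } {
  const startFromStep = failedStepFromEvidence(steps);
  return startFromStep === undefined ? { previousRunId } : { previousRunId, startFromStep };
}
```

Keeping the allowlist in one exported set means a new stage label only has to be registered in a single place.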

… missing

When `ricky --mode local --spec-file <foo>.md --run` runs against a spec whose content the parser scores as `debug` intent (any spec describing "failed" / "failure" / "broken" without explicit "create/build/author workflow" phrasing), and attempt 1 produces a generation artifact in memory but doesn't actually write it to disk (e.g. validation failed, workforce-persona writer threw, an early-return path fired), the auto-fix loop synthesizes a retry via retryBaseRequest that:

- flips `source` to 'workflow-artifact'
- sets `specPath` to the never-written `workflows/generated/...ts`
- stamps `metadata.autoFixGeneratedFrom = <prior source>`

On the retry, `workflowFileForRoute` returns specPath because it matches `isExecutableWorkflowPath`, the routing target stays at 'debug' (spec content is unchanged), and the gate `target === 'generate' || !workflowFile` evaluates false / SKIPS generation. Runtime-precheck then fails INVALID_ARTIFACT because the file isn't on disk. All 7 retries fail identically.

Repro from cwd /Users/khaliqgant/Projects/My-Senior-Dev/app:

    ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
    Generation: ok - workflows/generated/ricky-...ts
    Execution: blocked - INVALID_ARTIFACT at runtime-precheck
    Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

Fix: when the request was synthesized by an auto-fix retry (autoFixGeneratedFrom is set) AND the workflowFile doesn't exist on disk AND routing isn't already 'generate', fall through to the generation block instead of letting precheck fail. User-provided workflow-artifact handoffs that name a missing file still surface the runtime-precheck blocker (preserved by the existing artifact-run regression tests at src/local/entrypoint.test.ts:3703 and :3862).

Two new regression tests cover both branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
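The gate change described in this commit message can be sketched as a single predicate. This is an assumption-laden illustration: `shouldRunGeneration`, the `RetryRequest` shape, and the injected `fileExists` callback are hypothetical names, and the real entrypoint uses `fallThroughForMissingArtifact` and `workflowFileReadable()` with details not shown in this PR page.

```typescript
// Only the two routing targets mentioned in the commit message; the real
// union may contain more.
type RouteTarget = "generate" | "debug";

// Hypothetical, simplified request shape.
interface RetryRequest {
  workflowFile?: string;
  metadata?: { autoFixGeneratedFrom?: string };
}

function shouldRunGeneration(
  target: RouteTarget,
  request: RetryRequest,
  fileExists: (path: string) => boolean,
): boolean {
  const { workflowFile } = request;
  // Original gate: generate when routed to 'generate' or no workflow file is named.
  if (target === "generate" || !workflowFile) return true;
  // New fall-through: only for requests synthesized by an auto-fix retry
  // whose named artifact was never written to disk.
  const synthesizedByAutoFix = request.metadata?.autoFixGeneratedFrom !== undefined;
  return synthesizedByAutoFix && !fileExists(workflowFile);
}
```

Passing the existence check in as a callback keeps the predicate pure, so both branches (auto-fix retry versus user-provided artifact) can be covered without touching the filesystem.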

coderabbitai · 2026-05-09T17:24:14Z

📝 Walkthrough


This PR prevents synthetic local stage IDs (runtime-launch, local-runtime, runtime-precheck) from being forwarded as resume targets on auto-fix retries. It introduces a centralized synthetic ID definition, filters synthetic-only failures at the retry boundary, and re-enters generation when an auto-fix retry references a missing workflow artifact.

Changes

Fix Synthetic Step ID Propagation in Auto-Fix Retries

| Layer / File(s) | Summary |
|-----------------|---------|
| Synthetic ID Contract `src/local/synthetic-step-ids.ts` | Exports `SYNTHETIC_LOCAL_STAGE_IDS` set and `isSyntheticStageId()` predicate as a centralized source-of-truth for detecting synthetic step IDs. |
| Specification `specs/auto-fix-synthetic-step-ids.md` | Documents the bug where synthetic IDs cause "step not found" errors on retry, defines target behavior (omit startFromStep for synthetic-only failures), and outlines the fix: filter at failedStepFromEvidence boundary. |
| Auto-Fix Loop Filter `src/local/auto-fix-loop.ts` | Updates `failedStepFromEvidence` to skip synthetic step IDs and return the first real failed step; omits startFromStep from retry when only synthetic failures exist. |
| Workflow Artifact Re-Entry `src/local/entrypoint.ts` | Adds `fallThroughForMissingArtifact` logic to re-enter generation phase when auto-fix generated retries reference unreadable workflow artifact files; introduces `workflowFileReadable()` helper. |
| Auto-Fix Loop Tests `src/local/auto-fix-loop.test.ts` | Replaces runtime-launch with install-deps in existing fixtures; adds parameterized test verifying synthetic IDs are not forwarded as startFromStep; introduces isSyntheticStageId unit test block. |
| Entrypoint Tests `src/local/entrypoint.test.ts` | Adds regression tests: auto-fix synthesized retries re-enter generation when artifact is missing; user-provided artifact paths do not trigger regeneration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

AgentWorkforce/ricky#49: Both PRs modify the local auto-fix retry logic and how failed steps drive retries and repairs.
AgentWorkforce/ricky#69: Both PRs modify src/local/auto-fix-loop.ts to improve the auto-fix retry loop logic.

Poem

🐰 Synthetic steps tried to masquerade,
As real ones in the retry parade,
But now we filter them out with care,
And artifacts re-gen when they're not there—
No "step not found" errors left to spare! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main fix: re-entering generation when a synthesized auto-fix retry's workflow artifact is missing on disk. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the bug, root cause, fix, and testing approach with clear context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/local/entrypoint.test.ts (1)

2417-2417: ⚡ Quick win

Tighten the assertion to prove a workflow artifact was written.

The current check passes on any write; this regression is stronger if it asserts a generated workflow artifact write specifically.

Proposed test assertion tweak

-      expect(localExecutor.writes.length).toBeGreaterThanOrEqual(1);
+      expect(workflowArtifactWrites(localExecutor.writes)).toHaveLength(1);
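The suggested assertion relies on a `workflowArtifactWrites` helper whose definition is not shown in this review. A minimal sketch of what such a helper might look like follows; the `RecordedWrite` shape and the path convention (`workflows/generated/` with a `.ts` extension, taken from the repro output) are assumptions.

```typescript
// Hypothetical write record; the real executor stub in entrypoint.test.ts
// may record more fields.
interface RecordedWrite {
  path: string;
}

// Assumption: generated workflow artifacts live under workflows/generated/
// and end in .ts, as in the repro output quoted in this PR.
function workflowArtifactWrites(writes: readonly RecordedWrite[]): RecordedWrite[] {
  return writes.filter(
    (w) => w.path.includes("workflows/generated/") && w.path.endsWith(".ts"),
  );
}
```

Filtering before asserting keeps the test from passing on unrelated writes such as logs or evidence files.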

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/entrypoint.test.ts` at line 2417, Replace the loose length check on
localExecutor.writes with an assertion that specifically verifies a workflow
artifact was written: inspect localExecutor.writes for an entry whose
filename/path matches the generated workflow artifact (e.g., contains "workflow"
or the known artifact extension produced by your workflow generator) and assert
that such an entry exists (use localExecutor.writes and an array/object matching
assertion against the entry.path or entry.name to prove the workflow artifact
was created).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/auto-fix-synthetic-step-ids.md`:
- Around line 9-18: The fenced code block showing the ricky run output is
missing a language tag and triggers MD040; update the triple-backtick fence that
wraps the sample log to include a language identifier (e.g., "text" or "bash")
so it becomes ```text (or ```bash) and ensure the block content remains
unchanged so linting passes and formatting is preserved.

---

Nitpick comments:
In `@src/local/entrypoint.test.ts`:
- Line 2417: Replace the loose length check on localExecutor.writes with an
assertion that specifically verifies a workflow artifact was written: inspect
localExecutor.writes for an entry whose filename/path matches the generated
workflow artifact (e.g., contains "workflow" or the known artifact extension
produced by your workflow generator) and assert that such an entry exists (use
localExecutor.writes and an array/object matching assertion against the
entry.path or entry.name to prove the workflow artifact was created).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 126dcff0-1cda-4217-8d47-a01a0740a216

📥 Commits

Reviewing files that changed from the base of the PR and between 36199fc and f1bb839.

📒 Files selected for processing (6)

specs/auto-fix-synthetic-step-ids.md
src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts
src/local/entrypoint.test.ts
src/local/entrypoint.ts
src/local/synthetic-step-ids.ts

coderabbitai · 2026-05-09T17:26:12Z

+```
+ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
+…
+Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
+…
+Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
+…
+Execution: blocked — MISSING_BINARY at runtime-launch
+Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block to satisfy markdown linting.

This block should include a language (e.g., text or bash) to address MD040.

Suggested markdown fix

-```
+```text
 ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
 …
 Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
 …
 Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
 …
 Execution: blocked — MISSING_BINARY at runtime-launch
 Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)


🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@specs/auto-fix-synthetic-step-ids.md` around lines 9 - 18, The fenced code block showing the ricky run output is missing a language tag and triggers MD040; update the triple-backtick fence that wraps the sample log to include a language identifier (e.g., "text" or "bash") so it becomes ```text (or ```bash) and ensure the block content remains unchanged so linting passes and formatting is preserved.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

github-actions · 2026-05-09T18:05:26Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T17-24-31-983Z-HEAD-provider
Mode: provider
Git SHA: c6f1625

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deep Integration with Agent Assistant**

Below is a complete, deterministic workflow for adding the “deep Agent Assistant integration” claim to the Ricky repository. The plan follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes concrete contracts, verification gates, review artifacts, and a bounded PR scope.

---

## 1. Scope of the Change
| Area | What is changed |
|------|-----------------|
| **README / docs** | Add a “Deep Integration with Agent Assistant” section describing how Ricky delegates reliability checks, coordination, and authoring to the Agent Assistant service. |
| **`ricky/__init__.py`** | Expose a new constant `AGENT_ASSISTANT_INTEGRATION_LEVEL = "deep"` and update the module docstring. |
| **`ricky/config.py`** | Add a config flag `USE_DEEP_AGENT_ASSISTANT = True` with a default that can be overridden via environment variable `RICKY_DEEP_AGENT_ASSISTANT`. |
| **`ricky/engine.py`** | Insert a call to `AgentAssistant.enrich_workflow()` before any workflow is executed. |
| **Tests** | Add unit‑test `test_deep_integration_flag.py` that asserts the flag is tru
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑09)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents must declare a contract. | Identify the 6 agents lacking contracts; add a `turn‑context` section (input schema, output schema, version). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` JSON against the schema. | Every PR affecting `turn‑context` must pass a gate. | Add the `verify-turn-context` GitHub Action to the remaining 10 repos. |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” checklist item. | 100 % compliance. | Update the shared PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the checklist mandatory. |
| **Validation loops** | Average of 42 validation loops per turn‑context change (target 80‑100). | 80‑100 loops to ensure stability. | Introduce a “stress‑test” stage that runs 50 additional synthetic turn sequences before merge. |
| **Blocker reporting** | 3 blockers log
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → concrete workflow). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on deterministic contracts. |
| **Validation** | Validation gates (type‑checking, schema compliance, deterministic‑state checks) are the first line of defense before a workflow is persisted. | Early failure detection reduces downstream debugging cycles and keeps the 80‑100 validation loop tight. |
| **Debugging** | Debug utilities (step‑by‑step execution traces, state diffs, deterministic replay) are tightly coupled to the generation output. Housing them together enables the Assistant t
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2026*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** (≈ 150 users) to author, test, and run reliability‑focused workflows with deterministic verification and built‑in blocker reporting.

Success criteria (KPIs):

| KPI | Target |
|-----|--------|
| **On‑boarded pilot teams** | 10 |
| **Workflows shipped to pilot** | ≥ 30 (average ≥ 3 per team) |
| **Deterministic verification pass rate** | ≥ 95 % on first gate |
| **Mean Time to Blocker Resolution** | ≤ 4 h |
| **User‑reported friction score** | ≤ 2/5 (Likert) |
| **Documentation coverage** | 100 % of new contracts & gates |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky v2.0 core engine (stable)  <br>• New **Workflow Contract** DSL for reliability (Ricky‑Contract‑v1) <br>• Deterministic verification gate (VC‑Determinism) <br>• Blocker‑reporting UI component <br>• Pilot‑team onboarding playbook <br>• Automated 80‑loop validation harness (Ricky‑Loop‑80) | • Major engine
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package specification** that lets Ricky surface workflow‑reliability assistance inside Slack (or any web‑chat widget) while preserving the strict AGENTS.md contract model, deterministic verification gates, and the 80‑to‑100 validation‑loop cadence required for all AgentWorkforce components.

---

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Key Interfaces |
|-----------|----------------|---------|----------------|
| **Ricky Core** | Workflow authoring, reliability checks, blocker reporting | Python 3.11 (AWS Lambda) | `ricky.core.*` |
| **Slack Bridge** | Event ingestion, message formatting, interactive UI (modals, buttons) | Node 20 (AWS Fargate) | Slack Events API, Interactivity API |
| **Web Bridge** | Same contract as Slack Bridge but uses WebSocket + REST for a custom web widget | Node 20 (AWS Fargate) | Socket.io, HTTP POST |
| **Agent‑Assistant SDK** | Shared client library for both bridges (type‑safe, OpenAPI‑generated) | TypeScript 5 | `assistClient.*` |
| **Verification Engine** | Deterministic gate runner, test harness, contract validat
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag?

1. Ricky’s execution path in local mode

Step	Action	Outcome
Parse arguments	`--mode local` tells Ricky to operate against the local repository; `--spec` supplies the natural‑language request.	Ricky loads the spec‑to‑workflow compiler.
Spec translation	The spec string (`"generate a workflow for package checks"`) is sent to the Spec‑Engine which produces a workflow contract (YAML/JSON) that follows the repository’s `workflow/` schema.	You receive a draft workflow file (e.g., `wf-package-checks.yaml`) placed in `workflows/drafts/`.
No execution	Because `--run` is omitted, Ricky stops after generation. No validation, no CI jobs, no side‑effects on the codebase.	The workflow is only written to disk; nothing is applied, merged, or executed.
Exit status	Returns `0` on successful generation, non‑zero if the spec could not be parsed or the contract violates schema rules.	You can check `$?` to confirm success.

2. What you ge

...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding flow that a new user should run the first time they invoke the `ricky` command‑line interface. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1️⃣  Create a Scoped Onboarding Branch

```bash
git checkout -b onboarding/first‑run‑<your‑username>
```

All changes made by the onboarding script are confined to this branch, making review and rollback trivial.

2️⃣ Run the Interactive Bootstrap Script

ricky onboarding --init

What the script does (contract)

Step	Action	Deterministic verification gate	Expected output
1️⃣	Detect OS & architecture	`ricky env --check` returns `OK`	`Environment OK`
2️⃣	Install required runtimes (Python 3.11+, Node 20)	`ricky runtime --install` exits `0`	`Runtimes installed`
3️⃣	Create a local config directory (`~/.ricky`)	`test -d ~/.ricky` →
...[truncated]


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
1. **Detect** – As soon as the workflow runner attempts to load the spec (`workflow.yaml` or `spec.json`) and receives a *null* or *file‑not‑found* response, raise a **Blocker**.
2. **Report** – Emit a deterministic `BLOCKER` event with the payload:  

   ```json
   {
     "type": "MISSING_SPEC",
     "detail": "Spec file not found or empty at path: ./specs/<workflow‑id>.yaml",
     "action": "Create placeholder spec and request author input"
   }

This event is logged to the Workflow‑Gate dashboard and sent to the assigned reviewer (owner of the repo branch).

2. Automated Placeholder Generation (Deterministic Contract)

Step	Action	Output Artifact
2.1	Create a minimal stub in the same location with the required top‑level keys (`name`, `description`, `steps`).	`specs/<workflow‑id>.yaml`
2.2	Populate stub with TODO markers and a unique spec‑id (UUID).
2.3	Add a `# GENERATED BY RICKY` header and a timestamp.
2.4	Open a draft PR titled `WIP: placeholder sp
...[truncated]


**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

```text
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Logger : RUNNING

🔧 Configuration
• Active Branches : main, workflow‑dev
• Last Deploy : 2026‑04‑28 14:12 UTC
• Feature Flags : all defaults

⚠️ Alerts & Blockers
• No successful health‑check for Google provider.
• No successful health‑check for GitHub provider.
• Dependent workflows that require either provider are currently blocked.

🗒️ Next Steps (recommended)

Run ricky provider check google and ricky provider check github to trigger connectivity verification.
Review provider credentials (OAuth tokens, API keys) in config/providers.yaml.
If credentials are missing or expired, update them and re‑run the checks.
Once both checks return CONNECTED, re‑enable any blocked workflows.

--- End of Report ---

Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**GitHub Primitive‑Change Workflow**

| Phase | Step | Action | Owner | Artifact | Verification Gate |
|-------|------|--------|-------|----------|--------------------|
| **1️⃣ Planning** | 1.1 | Open a **Feature‑Branch Ticket** in the project board (e.g., `feat/issue‑#123‑update‑readme`). | Product Owner | Ticket (Jira/Linear) | Ticket must contain: title, description, acceptance criteria, and linked issue. |
| | 1.2 | Create a **branch** from `main` using the ticket ID (`git checkout -b feat/issue-123-update-readme`). | Developer | Branch name follows `feat/<ticket‑id>-<short‑desc>` pattern. | Branch name passes regex validation. |
| **2️⃣ Development** | 2.1 | Make the primitive change (e.g., edit `README.md`). | Developer | Modified file(s). | Lint/format checks (`pre‑commit`) must pass. |
| | 2.2 | Run **local tests** (`npm test` / `go test` etc.). | Developer | Test report. | All tests must pass (0 failures). |
| | 2.3 | Commit with **conventional commit** format (`git commit -m "feat: update README with new badge"`). | Developer | Commit SHA. | Commit message passes `commitlint` rules. |
| **3️⃣ Pull Request (PR) Creation** | 3.1 | Push branch to origin and open a PR targeti
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
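
A minimal sketch of registry-backed selection with ranked matches, trigger evidence, and a graceful fallback. The registry shape (`SkillEntry` with a `triggers` list), the keyword-overlap scoring, and the `0.5` default threshold are assumptions made for illustration; the report does not show the real matcher.

```typescript
// Hypothetical shapes: the real registry format is not shown in this report.
interface SkillEntry {
  id: string;
  triggers: string[]; // keywords that suggest the skill applies
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of triggers found in the spec, 0..1
  evidence: string[]; // the triggers that actually matched
}

interface SkillSelection {
  matches: SkillMatch[]; // ranked, highest confidence first
  warnings: string[];
}

function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  if (!registry || registry.length === 0) {
    // A missing registry is a warning, never a generation failure.
    return { matches: [], warnings: ["skill registry missing; generated without skills"] };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  const warnings = matches.length === 0 ? ["no skill cleared the confidence threshold"] : [];
  return { matches, warnings };
}
```

Returning `evidence` alongside `confidence` is what lets the selection be written into artifacts or JSON output instead of being hidden.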

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑then‑implement`**

| **Stage** | **Agent** | **Purpose** | **Inputs** | **Outputs** | **Verification Gate** |
|-----------|-----------|-------------|------------|-------------|-----------------------|
| 1️⃣ | **Claude** (LLM‑review) | Perform a technical and style review of the design spec. | - `spec.md` (markdown) <br> - `review‑criteria.yml` (see below) | - `review‑report.json` (structured findings) <br> - `approved‑spec.md` (spec with inline annotations) | **Gate‑R1**: All required criteria must be marked *PASS* in `review‑report.json`. If any *FAIL*, workflow halts and reports blocker. |
| 2️⃣ | **Codex** (code‑gen) | Generate implementation code based on the *approved* spec. | - `approved‑spec.md` <br> - `code‑template/` (optional scaffolding) | - `src/` (generated source tree) <br> - `generation‑log.txt` | **Gate‑C1**: Automated lint + unit‑test compilation must succeed (0 errors). If failures, Codex re‑runs up to 3 retries; after that, blocker is raised. |
| 3️⃣ | **CI‑Validate** (deterministic) | Run full test suite and static analysis on generated code. | - `src/` | - `test‑results.xml` <br> - `static‑analysis-report.json` | **Gate‑V1**: ≥ 90 % test cov
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
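
The precedence described above (explicit spec hint first, then skill metadata, then project default) can be sketched as follows. `selectRunner`, `ToolDecision`, and the step names are hypothetical; the runner names are taken from the CLIs mentioned on this page.

```typescript
type Runner = "claude" | "codex" | "opencode";
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: Runner;
  source: DecisionSource;
  reason: string; // always recorded so the choice can be audited
}

function selectRunner(
  step: string,
  specHints: Partial<Record<string, Runner>>,
  skillDefaults: Partial<Record<string, Runner>>,
  projectDefault: Runner,
): ToolDecision {
  const hinted = specHints[step];
  if (hinted) {
    // Explicit hints win over every default.
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const fromSkill = skillDefaults[step];
  if (fromSkill) {
    return {
      step,
      runner: fromSkill,
      source: "skill-default",
      reason: `spec is silent; skill metadata suggests ${fromSkill}`,
    };
  }
  return {
    step,
    runner: projectDefault,
    source: "project-default",
    reason: "no spec hint or skill default; using project default",
  };
}
```

An array of these decisions, one per step, is the kind of record that could be serialized as tool-selection.json.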

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a freshly generated workflow (JSON/YAML) and runs a second, focused LLM pass that:

1. **Tightens the natural‑language task description** for each step (removes ambiguity, adds concrete verbs, references inputs/outputs).  
2. **Adds or sharpens acceptance‑gate predicates** (boolean expressions, schema checks, or test‑script snippets) so every step can be automatically verified before the next step runs.  

Below is a **step‑by‑step contract** you can copy‑paste into your CI pipeline or local dev script.

---

### 1. Contract Overview  

| Field | Type | Required? | Meaning |
|-------|------|-----------|---------|
| `workflow_path` | `string` | ✅ | Path to the generated workflow file (JSON or YAML). |
| `refine_schema` | `object` | ✅ | JSON‑Schema that defines the *desired* shape of each step after refinement (see §2). |
| `output_path` | `string` | ✅ | Where the refined workflow will be written. |
| `max_iterations` | `integer` | ❌ (default = 1) | How many refinement passes to run (useful for 80‑100 % validation loops). |
| `dry_run` | `boolean`
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
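
A sketch of the opt-in, fall-back-on-failure behavior described above. The step shape and function names are hypothetical, and timeout and token budgets are assumed to be enforced inside the `refine` callback (which signals them by throwing); only the fallback logic is shown.

```typescript
interface WorkflowStep {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string; // editable by refinement
  gate: string; // editable by refinement
}

interface RefineOutcome {
  steps: WorkflowStep[];
  refined: boolean;
  warnings: string[];
}

// Everything except `task` and `gate` must be identical after refinement.
function sameGraph(a: WorkflowStep[], b: WorkflowStep[]): boolean {
  if (a.length !== b.length) return false;
  return a.every((step, i) => {
    const other = b[i];
    return (
      other !== undefined &&
      step.id === other.id &&
      step.agent === other.agent &&
      step.dependsOn.join(",") === other.dependsOn.join(",")
    );
  });
}

function refineOrFallback(
  base: WorkflowStep[],
  refineRequested: boolean,
  refine: (steps: WorkflowStep[]) => WorkflowStep[], // may throw on timeout or provider error
): RefineOutcome {
  // Deterministic, unrefined output stays the default.
  if (!refineRequested) return { steps: base, refined: false, warnings: [] };
  try {
    // Pass copies so the callback cannot mutate the deterministic artifact.
    const candidate = refine(base.map((s) => ({ ...s, dependsOn: [...s.dependsOn] })));
    if (!sameGraph(base, candidate)) {
      return {
        steps: base,
        refined: false,
        warnings: ["refinement changed the graph or agents; kept deterministic artifact"],
      };
    }
    return { steps: candidate, refined: true, warnings: [] };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return {
      steps: base,
      refined: false,
      warnings: [`refinement failed (${message}); kept deterministic artifact`],
    };
  }
}
```

A real implementation would also re-run validation on `candidate` before accepting it; that step is omitted here because the validator's interface is not shown in this report.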

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Feature Implementation & Release**

| **Section** | **Details** |
|-------------|--------------|
| **Goal** | Add a `--version` flag to the `ricky` CLI that prints the current package version (semantic‑version string) and exits with status 0. |
| **Owner** | `@ricky-devlead` |
| **Stakeholders** | QA, Docs, Release Engineering, CI/CD |
| **Branching Model** | <ul><li>`feature/ricky-version` – feature branch created from `main`.</li><li>When ready, open PR `feat: add --version flag` targeting `main`.</li></ul> |
| **Contract (API)** | <ul><li>Command: `ricky --version` (or `ricky -v`).</li><li>Output: `<package-name> vX.Y.Z` (e.g., `ricky v1.3.2`).</li><li>Exit code: `0`.</li><li>No side‑effects (no file writes, no network calls).</li></ul> |
| **Acceptance Criteria** | 1. Running `ricky --version` prints exactly one line with the version string.<br>2. The version matches the value in `pyproject.toml` (or `package.json` for JS).<br>3. Flag is discoverable via `ricky --help` under “Options”.<br>4. Unit tests cover parsing, output, and exit code.<br>5. Integration test runs on CI across supported OSes (Linux, macOS, Windows). |
| **Implementation Steps** 
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
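
To illustrate the difference between a behavior-grounded gate and a source-shape grep, here is a sketch of a gate that judges the captured result of running ricky --version. The `CommandResult` shape and `versionGate` are hypothetical, and accepting either `X.Y.Z` or `vX.Y.Z` as a whitespace-separated token is an assumption, since the spec's exact output format is not shown here.

```typescript
interface CommandResult {
  stdout: string;
  exitCode: number;
}

interface GateResult {
  passed: boolean;
  reason: string;
}

// Passes only when the command exited 0 and its output reports the expected version.
function versionGate(result: CommandResult, expectedVersion: string): GateResult {
  if (result.exitCode !== 0) {
    return { passed: false, reason: `exit code ${result.exitCode}, expected 0` };
  }
  const tokens = result.stdout.trim().split(/\s+/);
  const found = tokens.some((t) => t === expectedVersion || t === `v${expectedVersion}`);
  return found
    ? { passed: true, reason: `output reports version ${expectedVersion}` }
    : { passed: false, reason: `output does not report version ${expectedVersion}` };
}
```

Output that merely looks like source code (for example a file containing `export function`) fails this gate, which is the point: the proof is the observed CLI behavior, not the syntax of the bundle.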

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through deterministic validation, automatic remediation, and controlled re‑execution until all pass or a hard‑stop blocker is raised.

---

## 1. High‑Level Diagram

┌─────────────┐   1. Discover    ┌─────────────┐
│  Artifact   │  ─────────────►  │    Queue    │
│ Source (S3/ │                  │   (Batch)   │
│  Git, …)    │  ◄─────────────  │             │
└─────┬───────┘    7. Report     └─────┬───────┘
      │                                │
      │                                │
      ▼                                ▼
┌─────────────┐     2. Pull      ┌─────────────┐   4. Validate    ┌─────────────┐
│ Worker Pool │   ◀──────────    │ Dispatcher  │  ─────────────►  │  Validator  │
│ (N parallel)│                  │ (Task Split)│                  │ (Determin‑  │
└─────┬───────┘                  └─────┬───────┘                  │ ised Rules) │
      │                                │                          └─────┬───────┘
      │ 5. Fix (if needed)             │                                │
      ▼                                ▼                                ▼
┌─────────────┐    6. Re‑run     ┌──
...[truncated]


**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
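
A sketch of deliberate pattern selection with a recorded rationale. Only the `dag` rule (validation/fix/rerun loops where dependencies matter) comes from the checklist above; the conditions used here for `supervisor` and `pipeline`, the `WorkShape` fields, and `choosePattern` itself are assumptions made for illustration.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical description of the work; not a type from the Ricky codebase.
interface WorkShape {
  independentArtifacts: number; // units that can be processed in parallel
  hasCrossStepDependencies: boolean;
  hasFixRerunLoop: boolean;
}

interface PatternChoice {
  pattern: Pattern;
  rationale: string; // written into artifact metadata
}

function choosePattern(shape: WorkShape): PatternChoice {
  if (shape.hasFixRerunLoop && shape.hasCrossStepDependencies) {
    return {
      pattern: "dag",
      rationale: "validation/fix/rerun loop with dependencies between steps",
    };
  }
  if (shape.independentArtifacts > 1) {
    // Assumption: independent units map to one worker per artifact under a supervisor.
    return {
      pattern: "supervisor",
      rationale: `${shape.independentArtifacts} independent artifacts; one task per artifact`,
    };
  }
  // Assumption: a single linear sequence is expressed as a pipeline.
  return { pattern: "pipeline", rationale: "single linear sequence of steps" };
}
```

Because every branch returns a rationale, the choice is never silent, and `dag` is reached only when the work shape calls for it.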

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the skill does | Direct effect on the generated workflow | Example of the resulting artifact |
|-------|---------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas (JSON Schema, OpenAPI, or TypedDict) for every step. | • Every node has a **contract block** that the engine validates before execution.<br>• Guarantees type‑safe hand‑offs, preventing silent data‑corruption. | ```yaml\nsteps:\n  - id: fetch_user\n    contract:\n      input: {user_id: string}\n      output: {name: string, email: string}\n``` |
| **Verification Gate Insertion** | Inserts automated checks (unit test, schema validation, policy lint) after each critical step. | • The workflow pauses at a **gate** until the check passes, turning runtime failures into compile‑time feedback.<br>• Enables “fail‑fast” behavior and clear error attribution. | ```yaml\n  - id: validate_user\n    type: gate\n    run: python -m jsonschema -i {{fetch_user.output}} schema/user.json\n``` |
| **Review Artifact G
...[truncated]
```

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Workflow Draft – Request for Clarification**

Below is a minimal “contract‑first” scaffold that follows our AgentWorkforce standards (deterministic verification gates, review artifacts, 80‑100 validation loops, scoped PR boundaries, etc.). Because the product specification you provided is vague, I’ve highlighted the sections that need concrete details before the workflow can be finalized and committed.

| **Section** | **What’s Needed** | **Placeholder / Example** |
|-------------|-------------------|---------------------------|
| **Workflow Name & ID** | A concise, unique name and identifier. | `wf‑<product‑name>-v1` |
| **Purpose / Goal** | One‑sentence description of the business outcome. | “Automate onboarding of new SaaS customers.” |
| **Inputs** | List of required input artifacts (e.g., JSON payload, file, API call). | `customerInfo: JSON` |
| **Outputs** | Expected deliverables after successful run. | `onboardedAccountId: string` |
| **Stages & Tasks** | For each stage, define: <br>• Task name <br>• Owner (human or service) <br>• Deterministic verification gate (e.g., schema validation, unit test, contract check) <br>• Success/Failure criteria <br>• Review artifact (e.g.,
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Action Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣  | **Capture the failure context** – pull the run‑log, step‑metadata, and worker heartbeat for the timed‑out step and the “alive‑but‑idle” worker. | Provides a deterministic snapshot that can be reproduced and audited. | `failure_context.json` (log excerpts, timestamps, worker IDs, resource usage). |
| 2️⃣  | **Mark a hard blocker** in the workflow state machine. Set the run status to **BLOCKED** and attach the `failure_context.json`. | Guarantees downstream steps do not proceed on incomplete data and signals the coordination layer that human/automated intervention is required. | Updated workflow state entry (`status: BLOCKED`). |
| 3️⃣  | **Create a repair branch** off the latest stable commit of the workflow definition (e.g., `repair/timeout‑2023‑05‑09`). | Isolates the fix work, prevents race conditions with other PRs, and keeps the production branch clean. | Git branch `repair/timeout‑2023‑05‑09`. |
| 4️⃣  | **Generate a deterministic verification gate** that will run before any merge of the repair branch. The gate must: <br>• Replay the t
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
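
The classify-before-retry rule can be sketched for the scenario in the user message (one step timed out, another worker stayed alive without artifacts). The failure class names come from the checklist above; the `StepObservation` fields, the two detection rules, and the policy of refusing retries for `unknown` are assumptions.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical observation record for one step.
interface StepObservation {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  producedArtifacts: boolean;
  rawLog: string; // preserved verbatim as evidence
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  retryAllowed: boolean;
  evidence: string;
}

// Only the two classes observable in this scenario are detected; the rest need other signals.
function detectClass(obs: StepObservation): FailureClass {
  if (obs.timedOut) return "timeout";
  if (obs.workerAlive && !obs.producedArtifacts) return "agent-runtime-opacity";
  return "unknown";
}

function classifyStep(obs: StepObservation): Classification {
  const failureClass = detectClass(obs);
  return {
    stepId: obs.stepId,
    failureClass,
    // Assumed policy: never retry while the class is still unknown.
    retryAllowed: failureClass !== "unknown",
    evidence: obs.rawLog,
  };
}
```

Carrying `rawLog` through unchanged keeps the uncertainty visible when the class is `unknown`, rather than forcing a guess.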

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development runs of any `.agent‑workflow/` (including `ricky run`, `ricky test`, `ricky debug`). |
| **Inputs** | • Working directory (root of the repo) <br>• Optional config `ricky.yaml` key `staleGuard.enabled` (default = true). |
| **Outputs** | • `status: PASS \| FAIL` <br>• `stalePaths: []` (list of stale directories detected) <br>• `message: string` (human‑readable report). |
| **Success Criteria** | `status == PASS`. All detected stale directories must be either empty or explicitly ignored via config. |
| **Failure Action** | Abort the launch and emit a **Blocker Report** (see “Verification Gate” below). |

---

### 1. Detection Logic

```python
# pseudo‑code executed as the first step of every local workflow
def detect_stale_state(root: Path) -> Tuple[bool, List[Path]]:
    stale_dirs = [".agent-relay", ".relay", ".trajectories"]
    found = []
    for d in stale_dirs:
        dir_path = root / d
        if dir_path.is_dir():
            # consider stale if it contains any file *other* than .gitkeep
   
...[truncated]
```

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
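
A sketch of a stale-state assessment that classifies and recommends without touching the filesystem. The three directory names come from the user message; `StaleStateReport`, `assessStaleState`, and the idea of passing in the list of non-empty directories are assumptions.

```typescript
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observedPaths: string[];
  recommendation: string;
  launchAllowed: boolean;
}

// `existingNonEmptyDirs` is supplied by the caller (for example from a directory scan),
// so this function stays pure and can never delete or overwrite state.
function assessStaleState(existingNonEmptyDirs: string[]): StaleStateReport {
  const observedPaths = RUNTIME_STATE_DIRS.filter((d) => existingNonEmptyDirs.includes(d));
  if (observedPaths.length === 0) {
    return {
      classification: "clean",
      observedPaths: [],
      recommendation: "none",
      launchAllowed: true,
    };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or use an isolated run directory before launching",
    launchAllowed: false,
  };
}
```

Separating detection from any cleanup keeps the "do not delete without explicit user intent" rule structural: there is no code path here that could remove state.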

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Quick Fix for “Another Ricky/Relay Run Is Already Active”**

The marker you’re seeing is a lock file that Ricky creates to prevent overlapping executions. It means a previous run didn’t clean up properly (crashed, was cancelled, or left a stray process). Follow the deterministic clean‑up steps below; each step is a verified gate that you can run in isolation and confirm before moving to the next.

---

## 1️⃣ Verify the Current State

| Command | Expected Output |
|---------|-----------------|
| `ricky status` | Shows **ACTIVE** if a lock is present, otherwise **IDLE**. |
| `ps aux \| grep -i ricky` | Lists any lingering Ricky/Relay processes. |
| `cat .ricky/run.lock` *(if it exists)* | Shows the PID and timestamp of the lock holder. |

If `ricky status` reports **ACTIVE** **and** there is **no** matching PID in the process list, the lock is stale.

---

## 2️⃣ Safe Lock Removal (Deterministic Gate)

> **Goal:** Remove the stale lock *only* when you’re sure no legitimate run is in progress.

1. **Confirm no active PID**  
   ```bash
   LOCK_PID=$(cat .ricky/run.lock | cut -d' ' -f1)
   if ps -p $LOCK_PID > /dev/null; then
       echo "Process $LOCK_PID still running – aborting l
...[truncated]
   ```

Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
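
The conflict handling above can be sketched as a decision function that never launches while a marker is present unless the user has explicitly cleared it. `RunMarker`, `LaunchDecision`, and `decideLaunch` are hypothetical names, and the marker fields are assumptions.

```typescript
// Hypothetical marker contents; the real marker format is not shown in this report.
interface RunMarker {
  runId: string;
  statusPath: string;
}

type LaunchDecision =
  | { action: "launch" }
  | { action: "blocked"; runId: string; statusPath: string; options: string[] };

function decideLaunch(marker: RunMarker | undefined, userClearedMarker: boolean): LaunchDecision {
  if (marker === undefined || userClearedMarker) {
    return { action: "launch" };
  }
  // Surface the marker and give the user explicit recovery options.
  return {
    action: "blocked",
    runId: marker.runId,
    statusPath: marker.statusPath,
    options: ["inspect the active run", "wait for it to finish", "explicitly clear the marker"],
  };
}
```

The blocked branch always carries the run id and status path, so the conflict is never reported as a generic failure.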

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI version | `aw --version` | `>=2.4.0` |
| Repository root (contains `.aw/` config) | `git rev-parse --show-toplevel` | Path printed |
| Docker (if workflow uses containers) | `docker version` | Client/Server version printed |
| Environment variables | `echo $AW_WORKSPACE` | Non‑empty path to workspace |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input contract (what you must provide)

| Variable | Description | Example |
|----------|-------------|---------|
| `FAILED_STEP` | Exact name of the step that failed in the first attempt | `build-image` |
| `PREV_RUN_ID` | Run identifier of the previous (failed) execution | `run-20260508-01` |
| `WORKFLOW_FILE`
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
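
A sketch of a bounded repair/resume loop that keeps one tracking run id and resumes from the failed step with the previous run id. It also filters the synthetic stage ids named in this PR's description so they are never forwarded as a resume step. The callback signatures and type names are hypothetical, and the loop is synchronous for brevity.

```typescript
// Synthetic stage ids from this PR's description; they are not real workflow steps.
const SYNTHETIC_STEP_IDS = new Set(["runtime-launch", "runtime-precheck", "local-runtime"]);

interface AttemptRequest {
  trackingRunId: string;
  previousRunId: string | undefined;
  startFromStep: string | undefined;
}

interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

interface AttemptSummary {
  attempt: number;
  ok: boolean;
  previousRunId: string | undefined;
  startFromStep: string | undefined;
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (req: AttemptRequest) => AttemptResult,
  repairArtifact: (failedStep: string | undefined) => void,
): { ok: boolean; trackingRunId: string; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let request: AttemptRequest = { trackingRunId, previousRunId: undefined, startFromStep: undefined };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt(request);
    attempts.push({
      attempt,
      ok: result.ok,
      previousRunId: request.previousRunId,
      startFromStep: request.startFromStep,
    });
    if (result.ok) return { ok: true, trackingRunId, attempts };
    if (attempt === maxAttempts) break; // budget exhausted: no repair after the final attempt
    repairArtifact(result.failedStep);
    const resumable = result.failedStep !== undefined && !SYNTHETIC_STEP_IDS.has(result.failedStep);
    request = {
      trackingRunId, // the single tracking id never changes
      previousRunId: result.runId,
      startFromStep: resumable ? result.failedStep : undefined,
    };
  }
  return { ok: false, trackingRunId, attempts };
}
```

When the failed step is synthetic, the retry restarts from the beginning but still reuses the previous run id, which matches the behavior described for launch-phase failures.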

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, failed validation IDs, and any “blocker” messages. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace (including input‑output payloads) – keep this run separate from the original “no‑auto‑fix” run. |
| 3️⃣  | Open the workflow contract file `workflows/foo.contract.ts` (or `foo.schema.json` if you use JSON schema). | Verify that **all required inputs** are declared, types match, and any **deterministic verification gates** (e.g., `gate: "hasValidAuthToken"`) are satisfied. |
| 4️⃣  | Run the contract validator directly: <br>`ricky validate contracts/foo.contract.ts --input ./sample-data/foo.input.json` | Should return **PASS** for every gate. If any gate fails, note the gate name and the failing payload. |
| 5️⃣  | Check the **review artifact** for the most recent PR that touched `foo.ts`. <br>`ricky pr view <last‑pr‑id> --artifacts` | Look for “blocked by” comments, lint failures, or missing test coverage. |
| 6️⃣  | Run the unit‑test suite for the workflow: <br>`r
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
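
A small sketch of how disabling auto-fix could collapse the attempt budget to one while still returning a full diagnosis. `attemptBudget`, `failureWithoutAutoFix`, and the `FailureOutcome` fields are hypothetical; the exit code value `1` is an assumption standing in for "non-zero".

```typescript
interface FailureOutcome {
  exitCode: number;
  attempts: number;
  repairAttempted: boolean;
  blocker: string;
  diagnosis: string;
  recoverySteps: string[];
  note: string;
}

// With auto-fix disabled the budget is exactly one attempt, whatever the configured maximum.
function attemptBudget(autoFixEnabled: boolean, configuredMax: number): number {
  return autoFixEnabled ? Math.max(1, configuredMax) : 1;
}

function failureWithoutAutoFix(
  blocker: string,
  diagnosis: string,
  recoverySteps: string[],
): FailureOutcome {
  return {
    exitCode: 1, // assumed value; the requirement is only that it is non-zero
    attempts: 1,
    repairAttempted: false,
    blocker,
    diagnosis,
    recoverySteps,
    note: "auto-fix was disabled with --no-auto-fix; no repair or resume was attempted",
  };
}
```

The diagnosis and recovery steps are still populated, so skipping the repair loop does not mean suppressing what was learned from the single attempt.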

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Local Execution of a TypeScript Workflow Artifact**  
*(Primary local path = `./workflows` – the repository‑root default for all workflow source files.)*

---

## 1. Preconditions (contract “local‑exec‑pre‑check”)

| Item | Requirement | How Ricky verifies |
|------|-------------|--------------------|
| **Node runtime** | `node >=18` and `npm >=9` installed | `node -v` / `npm -v`; abort with *BLOCKER* if missing |
| **TypeScript compiler** | `tsc` available (via project dev‑dependency) | `npm ls typescript` → version check |
| **Workflow manifest** | `workflow.json` present in the same directory as the `.ts` file | `fs.existsSync('workflow.json')` |
| **Dependency lock** | `package-lock.json` up‑to‑date | `npm ci` succeeds; otherwise run `npm ci` automatically |
| **Clean build output** | No stale `dist/` artifacts | If `dist/` exists, run `npm run clean` (script must be defined) |
| **Execution entry point** | Exported async function named `run` (or as defined in `workflow.json.entry`) | Static analysis (`ts-node -e "import('./src/foo.ts').then(m=>console.log(typeof m.run))"` ) |

If any check fails, Ricky emits a **BLOCKER** report with a concise remediation guide and h
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
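
A sketch of route selection that prefers the Node strip-types route and records the spawn command as evidence. `planSpawn` and its types are hypothetical. The `--experimental-strip-types` flag is a real Node.js option, but whether Ricky passes exactly this flag is an assumption, and the binary fallback command is supplied by the caller because its arguments are not shown in this report.

```typescript
interface RuntimePrecheck {
  nodeAvailable: boolean;
  sdkResolvable: boolean; // can @agent-relay/sdk be resolved for the workflow?
  relayBinaryOnPath: boolean;
}

type SpawnPlan =
  | { route: "node-strip-types"; command: string[] }
  | { route: "agent-relay-binary"; command: string[] }
  | { route: "blocked"; missing: string[] };

function planSpawn(
  workflowPath: string,
  pre: RuntimePrecheck,
  binaryFallbackCommand: string[],
): SpawnPlan {
  if (pre.nodeAvailable && pre.sdkResolvable) {
    // Primary route: the binary being absent from PATH is irrelevant here.
    return {
      route: "node-strip-types",
      command: ["node", "--experimental-strip-types", workflowPath],
    };
  }
  if (pre.relayBinaryOnPath) {
    return { route: "agent-relay-binary", command: binaryFallbackCommand };
  }
  const missing: string[] = [];
  if (!pre.nodeAvailable) missing.push("node");
  if (!pre.sdkResolvable) missing.push("@agent-relay/sdk");
  missing.push("agent-relay binary");
  return { route: "blocked", missing };
}
```

The returned `command` array is the value to store in execution evidence, kept separate from whatever reproduction command is shown to the user.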

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for a Persistent Structural Failure**

When a structural failure remains after an initial fix attempt, Ricky must treat the situation as a **hard blocker** and transition to a controlled “boundary” state. The following contract‑driven workflow guarantees deterministic hand‑off, traceability, and rapid recovery.

---

## 1. Immediate Blocker Declaration (Deterministic Gate)

| Step | Action | Artifact | Owner |
|------|--------|----------|-------|
| 1.1 | **Raise a Blocker Issue** in the `#workflow‑issues` repo with the label `blocker‑structural‑failure`. | `ISSUE-<NNN>` – includes: <br>• Failure ID (hash) <br>• Original fix PR reference <br>• Re‑run logs showing unchanged error <br>• Impact assessment (services, SLA) | Ricky (automated) |
| 1.2 | **Add a “Boundary” tag** to the issue to signal that normal repair loops are suspended. | Tag `boundary‑active` | Ricky |
| 1.3 | **Notify** the designated Incident Lead (via Slack `#incidents`) and the Architecture Review Board (ARB). | Slack message with link to ISSUE‑NNN | Ricky |

*Gate condition*: No further automated repair attempts are launched until the blocker is cleared.

---

## 2. Scoped Iso
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Template & Example**

Below is a ready‑to‑use digest format that aggregates the health of multiple Ricky workflow runs. It follows the **Ricky repository conventions** (AGENTS.md), includes **deterministic verification gates**, **review artifacts**, and highlights any **blockers**. Use the template for any set of runs; replace the placeholder values with the actual data from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).

---

## 1. Digest Header
| Field | Value |
|-------|-------|
| **Report Date** | `YYYY‑MM‑DD HH:MM UTC` |
| **Generated By** | `ricky‑health‑digest` (CLI vX.Y.Z) |
| **Scope** | `workflow‑run‑ids: [1234, 1235, 1236, …]` |
| **Environment** | `prod / staging / dev` |
| **Total Runs Analyzed** | `N` |
| **Overall Status** | `✅ Healthy` / `⚠️ Degraded` / `❌ Critical` |

---

## 2. Summary Statistics
| Metric | Value | Target / SLA |
|--------|-------|--------------|
| **Successful Runs** | `S / N` | ≥ 95 % |
| **Failed Runs** | `F / N` | ≤ 5 % |
| **Runs with Blockers** | `B / N` | 0 |
| **Average Duration** | `⌀ T minutes` | ≤ 30 min |
| **Verification Gate Pass Rate** | `V / N` | 100 % |
| **Review Artifact Coverage**
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
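
A sketch of deriving recurring failures from normalized evidence without mutating it. `WorkflowRunEvidence` is named in the checklist above, but the fields used here, `DigestEntry`, and `recurringFailures` are assumptions.

```typescript
// Hypothetical field layout; only the type name comes from the checklist.
interface WorkflowRunEvidence {
  workflow: string;
  failureClass?: string; // absent when the run succeeded
  failedStep?: string;
  durationMs: number;
}

interface DigestEntry {
  workflow: string;
  failureClass: string;
  step: string;
  count: number;
}

// `readonly` on the input documents that analysis never mutates evidence.
function recurringFailures(
  runs: readonly WorkflowRunEvidence[],
  minCount = 2,
): DigestEntry[] {
  const counts = new Map<string, DigestEntry>();
  for (const run of runs) {
    if (run.failureClass === undefined) continue;
    const step = run.failedStep ?? "(unknown step)";
    // Keying by class keeps environment and workflow-logic failures in separate buckets.
    const key = `${run.workflow}|${run.failureClass}|${step}`;
    const entry =
      counts.get(key) ?? { workflow: run.workflow, failureClass: run.failureClass, step, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return Array.from(counts.values())
    .filter((e) => e.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Each entry names a specific workflow, step, and failure class, which is what a recommendation needs to point at instead of generic advice.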

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives agents a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It is built as a **Slack App** that talks to the Ricky Core via the existing **Ricky‑API** (REST + WebSocket) and surfaces a **state‑machine view** of any workflow instance.

| Feature | Description |
|---------|-------------|
| **Live Instance Selector** | `/ricky debug list` → interactive dropdown of active workflow IDs (filtered by user’s org/team). |
| **State‑Tree Viewer** | Rendered as a Slack Block Kit “section + context” hierarchy showing current node, pending actions, and completed steps. |
| **Step‑Through Controls** | Buttons: **Pause**, **Resume**, **Step‑Forward**, **Step‑Back**, **Jump‑to‑Node**. |
| **Inline Patch** | Text‑area modal to edit a node’s input payload; submit triggers a **deterministic verification gate** before applying. |
| **Error‑Overlay** | If a node fails, a red banner appears with the exception stack, logs, and a **“Create Fix PR”** shortcut. |
| **Audit Trail**
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

Put workflow generation or diagnosis domain logic directly in the Slack handler.
Degrade Slack to a developer shortcut with weaker routing than CLI.
Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends to the workflow engine. |
| **payload.spec** | `object` | Full **WorkflowSpec** JSON (see `specs/workflow/v1.json`). |
| **payload.mode** | `string` (enum) | Execution mode – one of `"dry-run"`, `"run"`, or `"preview"`. |
| **payload.metadata** *(optional)* | `object` | Arbitrary key‑value pairs for tracing (e.g., `requestId`, `userId`). |

---

## 1. Contract Definition (YAML)

```yaml
# contracts/handoffs/web_submit_spec.yaml
type: object
required:
  - type
  - payload
properties:
  type:
    const: web_submit_spec
  payload:
    type: object
    required:
      - spec
      - mode
    properties:
      spec:
        $ref: "../specs/workflow/v1.json"
      mode:
        type: string
        enum: [dry-run, run, preview]
      metadata:
        type: object
        additionalProperties: true
additionalProperties: false
```

The contract lives under contracts/handoffs/ and is version‑controlled with the rest of the repo.

2

...[truncated]


**Must**

- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.

**Must Not**

- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
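
The Must / Must Not criteria for this case describe a normalizer that is the only translation boundary between the web surface and the executors. The following is a minimal TypeScript sketch of that shape, not Ricky's implementation. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from the criteria; the fields on them, the `WebSubmitSpecHandoff` type, and the `normalizeWebHandoff` function are hypothetical.

```typescript
// Hypothetical shapes: only the two request type names appear in the criteria.
type Mode = "local" | "cloud";

interface WebSubmitSpecHandoff {
  kind: "web_submit_spec"; // illustrative discriminator for the web handoff
  spec: string;
  mode: Mode;
  sessionValidated: boolean; // set by the web surface after its own auth check
}

interface LocalInvocationRequest {
  target: "local";
  spec: string;
}

interface CloudGenerateRequest {
  target: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// The normalizer only translates. It never calls an executor, and the value it
// returns carries no marker that the request originated from the web surface.
function normalizeWebHandoff(handoff: WebSubmitSpecHandoff): NormalizedRequest {
  if (!handoff.sessionValidated) {
    throw new Error("web surface must validate auth/session before normalization");
  }
  if (handoff.spec.trim() === "") {
    throw new Error("spec must not be empty");
  }
  return handoff.mode === "local"
    ? { target: "local", spec: handoff.spec }
    : { target: "cloud", spec: handoff.spec };
}
```

Under these assumptions, downstream code would accept only `NormalizedRequest`, so it has no way to branch on the originating surface.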

</details>

<details>
<summary><strong>REVIEW</strong> <code>surfaces-ingress.mcp-claude-context-is-metadata</code> (surfaces-ingress/manual)</summary>

**User message**

> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

**Ricky output**

```text
**Decision:** Generate a new workflow draft.

**Rationale**

1. **Input Type** – You supplied a *workflow specification* together with *conversation context* but no existing workflow artifact to debug, restart, or analyze. The spec describes the intended steps, inputs, outputs, and verification gates, which is the prerequisite for authoring a fresh workflow.

2. **Action Mapping**  
   - **Generate** – Create a workflow from a specification.  
   - **Debug** – Requires an existing workflow with failing verification.  
   - **Restart** – Requires a halted execution instance.  
   - **Analyze** – Requires execution logs or performance data.  
   - **Coordinate** – Requires multiple dependent workflows or external agents needing orchestration.

3. **Current Need** – The spec is the source material; the appropriate next step is to author a concrete workflow contract that includes:
   - Deterministic verification gates (pre‑, post‑, and invariant checks).  
   - Scoped branch/PR boundaries (feature branch, PR title, reviewers).  
   - A validation loop count (target 80–100 iterations) for automated testing.  
   - Blocker reporting template (if any required resources are missing).

**Ne
...[truncated]
```

**Must**

- Normalize the tool call through the MCP/Claude handoff path.
- Treat conversation context as advisory metadata.
- Make the spec itself sufficient for routing whenever possible.

**Must Not**

- Require the user to rewrite the spec manually as a workflow.
- Put domain routing logic inside the MCP tool handler itself.
- Ignore provided mode, workflow path, or prior decisions metadata.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
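
One way to read the criteria for this case is that the MCP tool handler should stay thin: it copies the spec and any explicit hints through unchanged and sets the conversation context aside as advisory data. The sketch below assumes that reading. Every name in it (`McpToolCall`, `NormalizedHandoff`, `normalizeMcpToolCall`, and all fields) is hypothetical and not taken from Ricky's source.

```typescript
// Hypothetical types for illustration only.
type Mode = "local" | "cloud";

interface McpToolCall {
  spec: string;
  conversationContext?: string;
  mode?: Mode;
  workflowPath?: string;
}

interface NormalizedHandoff {
  spec: string;                // primary routing input
  mode: Mode | null;           // explicit hint, preserved when provided
  workflowPath: string | null; // explicit hint, preserved when provided
  advisory: { conversationContext: string | null }; // never required for routing
}

// The handler makes no generate/debug/restart decision. It only reshapes input
// so that a downstream router can decide from the spec and the explicit hints.
function normalizeMcpToolCall(call: McpToolCall): NormalizedHandoff {
  return {
    spec: call.spec,
    mode: call.mode ?? null,
    workflowPath: call.workflowPath ?? null,
    advisory: { conversationContext: call.conversationContext ?? null },
  };
}
```

Keeping the context under a separate `advisory` key is a design choice of this sketch: it makes it visible at the type level that routing should not depend on it.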

</details>

**REVIEW** `surfaces-ingress.cloud-api-versioning` (surfaces-ingress/manual)

**User message**

> Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

**Ricky output**

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`BAD_PROMPT`** (same numeric value).  
- All changes are limited to the **`cloud-ricky`** service repository; no cross‑service impact.

---

## 1. Contract Change (API Specification)

| Endpoint | Method | Success Payload (200) | New Field | Error Mapping |
|----------|--------|-----------------------|-----------|---------------|
| `/v1/generate` | POST | `{ "output": string, "metadata": object, "requestId": string }` | **`requestId`** – UUID v4, globally unique per request. | `BAD_PROMPT` (formerly `INVALID_PROMPT`) – 400, `{ "code":"BAD_PROMPT","message":"Prompt validation failed." }` |

*Version bump*: **v1.2.0** (semantic minor bump – backward‑compatible addition).

The OpenAPI fragment to be merged:

```yaml
paths:
  /v1/generate:
    post:
      summary: Generate content
      operationId: generateContent
      requestBody:
        $ref: '#/components/requestBodies/GenerateRequest'
      responses:
        '200':
          description: Successful generation
          content:
            application/js
...[truncated]
```

**Must**

- Allow additive response fields within `/api/v1/ricky/...`.
- Reject renaming or c

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

kjgbot and others added 2 commits May 9, 2026 19:10

coderabbitai Bot reviewed May 9, 2026

devin-ai-integration Bot reviewed May 9, 2026

kjgbot closed this May 9, 2026

kjgbot mentioned this pull request May 9, 2026

fix(generation): fall back to deterministic renderer when persona writer throws #84

Merged

4 tasks

kjgbot wants to merge 2 commits into `main` from `fix/skip-generation-only-when-workflow-on-disk`

coderabbitai Bot commented May 9, 2026

❌ Failed checks (1 warning)

devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found
