fix(local-auto-fix): drop synthetic stage ids from startFromStep retry#82

Merged
khaliqgant merged 1 commit into main from fix/auto-fix-loop-synthetic-step-ids on May 9, 2026
Conversation

@kjgbot kjgbot commented May 9, 2026

Summary

  • ricky --mode local --spec-file <path> --run retries blew up with startFrom step "local-runtime"/"runtime-launch" not found in workflow whenever the workflow process failed before any SDK step ran. The auto-fix loop was reading Ricky's internal stage labels back out of evidence and forwarding them to the SDK as --start-from <synthetic>. The SDK only knows real workflow steps, so seven retries failed identically and the user saw a misleading MISSING_BINARY at runtime-launch final blocker instead of the real classified failure.
  • Filter synthetic stage ids out at the boundary in failedStepFromEvidence so launch-phase failures retry from the start (with previousRunId still reused, matching specs/cli-auto-fix-and-resume.md).
  • Centralize the allowlist in src/local/synthetic-step-ids.ts (runtime-precheck, runtime-launch, local-runtime) so the next stage label added in entrypoint.ts has a single source of truth to register against.
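
The shared module named in the last bullet might look roughly like the following. This is a minimal sketch, not the merged file: the export names `SYNTHETIC_LOCAL_STAGE_IDS` and `isSyntheticStageId` and the three ids come from this PR, while the exact types and the handling of `undefined` are assumptions.

```typescript
// Sketch of src/local/synthetic-step-ids.ts (shape assumed; not copied from the PR diff).

// Stage labels Ricky attaches to evidence before any SDK workflow step has started.
export const SYNTHETIC_LOCAL_STAGE_IDS: ReadonlySet<string> = new Set([
  "runtime-precheck",
  "runtime-launch",
  "local-runtime",
]);

// True when `id` is one of Ricky's internal stage labels rather than a real workflow step.
// Accepting `undefined` is an assumption made for caller convenience.
export function isSyntheticStageId(id: string | undefined): boolean {
  return id !== undefined && SYNTHETIC_LOCAL_STAGE_IDS.has(id);
}
```

Keeping the set and the predicate in one file means a new stage label only has to be registered in one place for every caller to pick it up.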

Repro

From cloud@codex/msd-shared-sandbox-review-runtime-spec:

ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
…
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
…
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
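
The repeated error is easier to see with a toy model of the lookup. The function below is a hypothetical stand-in for the SDK's start-from validation, not its real API; only the error text and the two step names are taken from the output above, and the elided middle of the step list is left out rather than guessed.

```typescript
// Hypothetical stand-in for the SDK's startFrom validation (the real SDK code is not shown in this PR).
function resolveStartIndex(workflowSteps: readonly string[], startFrom?: string): number {
  if (startFrom === undefined) return 0; // no resume target: run from the first step
  const index = workflowSteps.indexOf(startFrom);
  if (index === -1) {
    throw new Error(
      `startFrom step "${startFrom}" not found in workflow. Available steps: ${workflowSteps.join(", ")}`,
    );
  }
  return index;
}

// Only the first and last step names appear in the repro output; the rest is elided there.
const steps = ["prepare-context", "final-signoff"];

// A synthetic stage label is never a workflow step, so each retry fails the same way.
let message = "";
try {
  resolveStartIndex(steps, "runtime-launch");
} catch (error) {
  message = error instanceof Error ? error.message : String(error);
}
```

Because the resume target is wrong rather than the workflow itself, retrying with the same target cannot succeed, which matches the seven identical failures in the repro.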

Spec

Full spec under specs/auto-fix-synthetic-step-ids.md in this PR.

Test plan

  • npx vitest run src/local/auto-fix-loop.test.ts — 27 tests pass, including new parameterized cases for each synthetic id and direct isSyntheticStageId coverage
  • npm test (full suite) — 1028 tests pass across 49 files
  • npm run typecheck — clean
  • Manual rerun of the original repro: zero startFrom step "..." not found in workflow errors; final blocker (if any) reflects the real classified failure

When the local-runtime launch/precheck fails before any Agent Relay SDK
step starts, Ricky tags evidence with a synthetic stage id
('runtime-launch', 'runtime-precheck', or the 'local-runtime' fallback).
The auto-fix loop was reading those back out and forwarding them to the
SDK as `startFromStep` on every retry, so the SDK threw
`startFrom step "..." not found in workflow` seven times in a row before
giving up — the user saw a misleading `MISSING_BINARY at runtime-launch`
final blocker instead of the real classified failure.

Filter the synthetic ids out at the boundary in `failedStepFromEvidence`
so launch-phase failures retry from the start (with `previousRunId`
still reused, matching the cli-auto-fix-and-resume spec). Centralize the
allowlist in `src/local/synthetic-step-ids.ts` so the next stage label
added in `entrypoint.ts` has a single source of truth to register
against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 9, 2026

Walkthrough

This PR implements filtering of synthetic stage IDs (runtime-launch, local-runtime, runtime-precheck) to prevent their propagation as retry resume targets in the auto-fix loop. It adds a canonical synthetic-ID set, filters failed-step derivation to exclude synthetic IDs, updates existing tests, and adds new validation tests for the filtering behavior and synthetic-ID predicate.

Changes

Synthetic Stage ID Filtering

| Layer / File(s) | Summary |
|-----------------|---------|
| Specification<br>`specs/auto-fix-synthetic-step-ids.md` | Specification defining the issue, desired behavior, implementation approach with a canonical synthetic-ID set and filtering, unit/acceptance test cases, and acceptance criteria. |
| Data Contract<br>`src/local/synthetic-step-ids.ts` | Exports `SYNTHETIC_LOCAL_STAGE_IDS` set and `isSyntheticStageId(id)` predicate as single source of truth for synthetic local-runtime stage identifiers. |
| Core Implementation<br>`src/local/auto-fix-loop.ts` | Imports `isSyntheticStageId` and updates `failedStepFromEvidence` to return only non-synthetic failed steps, preventing synthetic IDs from becoming `startFromStep` resume targets. |
| Tests & Validation<br>`src/local/auto-fix-loop.test.ts` | Updates existing test scenarios from `runtime-launch` to `install-deps` as failed step, adds parameterized test ensuring synthetic IDs are not forwarded as `startFromStep` or `failedStep`, and adds test suite validating `isSyntheticStageId` predicate. |
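
The change to `failedStepFromEvidence` summarized above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `src/local/auto-fix-loop.ts`: the `Evidence` and `RetryOptions` shapes and the `buildRetry` helper are hypothetical, and only the function name, the three ids, and the `previousRunId` / `startFromStep` field names come from the PR.

```typescript
// Minimal sketch of the boundary filter; the Evidence shape below is hypothetical.
const SYNTHETIC_LOCAL_STAGE_IDS: ReadonlySet<string> = new Set([
  "runtime-precheck",
  "runtime-launch",
  "local-runtime",
]);

interface Evidence {
  failedStep?: string; // assumed field name for the stage or step that failed
}

interface RetryOptions {
  previousRunId: string;
  startFromStep?: string;
}

// Returns the failed step only when it is a real workflow step.
function failedStepFromEvidence(evidence: Evidence): string | undefined {
  const id = evidence.failedStep;
  return id !== undefined && !SYNTHETIC_LOCAL_STAGE_IDS.has(id) ? id : undefined;
}

// Hypothetical helper: previousRunId is always reused, startFromStep only for real steps.
function buildRetry(previousRunId: string, evidence: Evidence): RetryOptions {
  const startFromStep = failedStepFromEvidence(evidence);
  return startFromStep === undefined ? { previousRunId } : { previousRunId, startFromStep };
}

const launchFailure = buildRetry("run-1", { failedStep: "runtime-launch" });
const stepFailure = buildRetry("run-1", { failedStep: "install-deps" });
```

In this sketch a launch-phase failure yields a retry with no `startFromStep`, so the run restarts from the first step, while a failure in a real step such as `install-deps` still resumes from that step.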

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • AgentWorkforce/ricky#47: Both modify auto-fix loop failure propagation; this PR filters synthetic stage IDs, while #47 stops retries for external setup blockers.
  • AgentWorkforce/ricky#49: Both modify src/local/auto-fix-loop.ts and its test suite with changes to failure handling and retry logic.

Poem

🐰 ✨
Synthetic steps that shouldn't resume,
Now filtered out before the boom—
Real steps shine through the haze,
No phantom step errors for days! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title clearly summarizes the main fix: filtering synthetic stage IDs from the startFromStep retry parameter. |
| Description check | ✅ Passed | The description is detailed and directly related to the changeset, explaining the bug, the fix, and the test coverage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/local/auto-fix-loop.test.ts (1)

722-750: ⚡ Quick win

Add one mixed-failure regression case (synthetic + real failed steps).

Current cases validate “all synthetic” and “real only,” but not the mixed list scenario. A focused case would lock in the intended precedence and guard future regressions.

Suggested test addition
+  it('prefers a real failed step when evidence contains both synthetic and real failed steps', async () => {
+    // Build a response/evidence shape where both runtime-launch and a real SDK step fail,
+    // then assert retry.startFromStep uses the real step id.
+  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/auto-fix-loop.test.ts` around lines 722 - 750, Add a test for the
mixed synthetic+real failure case: create a new it.each or it block (similar to
the existing test that uses runWithAutoFix and runSingleAttempt) where
runSingleAttempt.mockResolvedValueOnce returns a blockerResponse that includes
both a synthetic stage id and a real failed step id (so the first attempt has
mixed failures) and then resolves to successResponse on the next call; assert
runSingleAttempt is called twice, that the retry object passed to the second
call (runSingleAttempt.mock.calls[1][0].retry) does not include startFromStep
(startFromStep is undefined) and that
workflowRepairer.mock.calls[0][0].failedStep equals the real failed step id
(confirming real-step precedence); reuse the same helpers and names from the
diff (runWithAutoFix, runSingleAttempt, workflowRepairer, blockerResponse,
successResponse) to locate and add the case.
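
One way to pin down the mixed case outside the full test harness is a small pure helper. This is an illustrative sketch only: `firstRealFailedStep` and the list-shaped input are hypothetical, and this page does not show whether the real `failedStepFromEvidence` sees one id or several.

```typescript
// The three synthetic ids named in this PR.
const SYNTHETIC = new Set(["runtime-precheck", "runtime-launch", "local-runtime"]);

// Hypothetical helper: return the first failed step that is a real workflow step, if any.
function firstRealFailedStep(failedSteps: readonly string[]): string | undefined {
  return failedSteps.find((id) => !SYNTHETIC.has(id));
}

// Mixed evidence: the synthetic label is skipped and the real step wins.
const mixed = firstRealFailedStep(["runtime-launch", "install-deps"]);

// All-synthetic evidence: nothing usable as a resume target.
const allSynthetic = firstRealFailedStep(["runtime-launch", "local-runtime"]);
```

A regression test along the lines suggested above could assert the same two outcomes against the real loop once the intended precedence is confirmed.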
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/auto-fix-synthetic-step-ids.md`:
- Around line 9-18: The fenced code block containing the ricky CLI output needs
a language tag to satisfy markdownlint MD040; update the opening fence from ```
to ```text so the block that starts with "ricky --mode local --spec-file
docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run" and includes the
Error/Execution/Auto-fix lines is annotated as text. Ensure only the opening
fence is changed and leave the block contents and closing fence as-is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a9bdd7f3-e54e-4ac6-a7ef-9015865fe7c8

📥 Commits

Reviewing files that changed from the base of the PR and between 36199fc and 28b55c2.

📒 Files selected for processing (4)
  • specs/auto-fix-synthetic-step-ids.md
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts
  • src/local/synthetic-step-ids.ts

Comment on lines +9 to +18
```
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
```

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced block (Line 9) to satisfy markdownlint MD040.

This avoids docs lint warnings/failures in CI.

Minimal fix
-```
+```text
 ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
 …
 Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
 …
 Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
 …
 Execution: blocked — MISSING_BINARY at runtime-launch
 Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 9-9: Fenced code blocks should have a language specified (MD040, fenced-code-language)


@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

github-actions Bot commented May 9, 2026

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T17-10-52-003Z-HEAD-provider
Mode: provider
Git SHA: 99d9ec3

Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope & Goal**  
Update the Ricky repository so that the public documentation, inline code comments, and the `README.md` all state that Ricky now **leverages the Agent Assistant (AA) framework as a core dependency** for workflow orchestration, validation, and self‑healing. The change must be deterministic, testable, and isolated to a single feature branch.

---

## 1️⃣ Workflow Contract (RFC‑001)

| Item | Description |
|------|-------------|
| **Feature** | Deep integration of Agent Assistant (AA) into Ricky |
| **Owner** | `ricky-dev` |
| **Branch** | `feature/aa‑integration` |
| **PR Target** | `main` |
| **Artifacts** | 1. Updated `README.md` 2. Updated `docs/architecture.md` 3. Updated module doc‑strings in `ricky/engine.py` 4. New unit test `tests/test_aa_integration.py` |
| **Verification Gates** | 1. CI lint passes (`flake8`, `black --check`) 2. Unit test suite ≥ 95 % pass 3. Integration test `test_aa_integration` asserts that `RickyEngine` loads `AgentAssistant` via `AAProvider` 4. Documentation build (`mkdocs build`) succeeds |
| **Rollback** | If any gate fails, the PR is blocked; revert branch to `origin/main` and open a blocker issue (see §5). |
| **Reviewers** | `@a
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑09)**  

| Metric | Current State | Target / Trend | Comments |
|--------|---------------|----------------|----------|
| **Workflow contracts defined** | 12 of 18 active agents have a formal `turn‑context` contract (JSON‑Schema v1.3) | 100 % by Q4 2026 | 6 agents still use ad‑hoc context passing; they lack versioned schema and validation hooks. |
| **Deterministic verification gates** | 9 agents enforce a pre‑turn gate (`validateTurnContext`) and post‑turn gate (`assertContextIntegrity`) | 12 agents by Q3 2026 | Missing gates cause occasional “stale‑context” bugs in the recommendation engine. |
| **Review artifacts** | 78 % of PRs include a `turn‑context` impact checklist; 22 % omit it | 100 % compliance | The checklist is auto‑generated by the `ricky‑ci` plugin; failures are flagged as blockers. |
| **Validation loops** | Average of 62 validation loops per release cycle (target 80‑100) | Increase to ≥80 by adding “context‑drift” tests | Current loops focus on schema compliance; add runtime drift detection (e.g., stale user ID). |
| **Blocker reporting** | 3 blockers logged in the last sprint (all related to missi
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move the listed capabilities into the Agent Assistant**

| Capability | Fit with Agent Assistant | Rationale | Migration Path |
|------------|--------------------------|-----------|----------------|
| **Workflow generation** | ✅ Core | The Assistant already owns the “author‑as‑you‑type” experience and can emit concrete workflow contracts (YAML/JSON) on demand. Adding generation keeps the authoring loop in a single place and reduces hand‑off latency. | - Extend the `generateWorkflow` contract to accept a high‑level intent and return a validated contract.<br>- Add deterministic unit‑test scaffolding (e.g., `workflow.validate()` hook). |
| **Workflow validation** | ✅ Core | Validation is the deterministic gate that the Assistant already enforces before presenting a workflow to the user. Centralising it eliminates duplicate validators in downstream tools. | - Implement a `validateWorkflow` step that runs the 80‑100 loop internally and returns a structured report (pass/fail, line numbers, suggestions). |
| **Workflow debugging** | ✅ Core | Debugging is the natural follow‑up to validation. The Assistant can surface blocker evidence, suggest fixes, and optionally auto‑ap
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (Q4 2026)*  

| Phase | Duration | Goal | Concrete Deliverables | Contract & Verification | Review Artifacts | Blocker Policy |
|-------|----------|------|-----------------------|--------------------------|------------------|----------------|
| **1️⃣ Scope & Baseline** | wk 1‑2 | Define the slice scope, capture current reliability metrics, and lock the contract surface. | • Scope doc (features, metrics, stakeholder list)  <br>• Baseline KPI report (MTTR, success‑rate, latency)  <br>• Contract file `ricky/v2.0_contract.yaml` (inputs, outputs, SLAs) | Automated test‑suite validates that `ricky/v2.0_contract.yaml` is syntactically correct and all required fields are present. | • Scope review PR (owner → PM, QA)  <br>• Baseline KPI dashboard (shared link) | Any missing KPI source or undefined SLA → raise **Blocker‑001** and halt progression. |
| **2️⃣ Architecture & CI/CD** | wk 3‑4 | Build deterministic pipelines and verification gates. | • Terraform/Helm manifests for a dedicated “Ricky‑v2” namespace.  <br>• GitHub Actions workflow `ricky-ci.yml` with: <br> - Lint → `yamllint`, `golint` <br> - Unit 
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Slack / Web Support Architecture  
*Using the Agent Assistant (AA) package ecosystem and adhering to the Ricky repository conventions (AGENTS.md, workflow standards, shared authoring rules).*

---

### 1. High‑Level Overview  

| Component | Role | AA Package | Primary Interface |
|-----------|------|------------|-------------------|
| **Ricky‑Core** | Deterministic workflow engine (state machine, validation gates) | `aa-workflow` | Internal API |
| **Ricky‑Slack‑Bot** | Real‑time Slack interaction (slash commands, message actions) | `aa-slack` | Slack Events API / Bolt |
| **Ricky‑Web‑Portal** | Self‑service UI for ticket creation, status, and review | `aa-web` (React + FastAPI) | HTTPS (REST/GraphQL) |
| **Ricky‑Auth** | Central auth & RBAC (OAuth2, SSO) | `aa-auth` | JWT / OIDC |
| **Ricky‑Docs** | Auto‑generated contract & run‑book docs | `aa-docgen` | Markdown/HTML |
| **Ricky‑CI** | CI/CD pipeline with deterministic verification gates | `aa-ci` | GitHub Actions / GitLab CI |
| **Ricky‑Observability** | Metrics, logs, trace, alerting | `aa-telemetry` | Prometheus, Loki, Grafana |

All components are version‑locked via a **monorepo** (`/ricky`) with a **single
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the `--run` flag?

| Step | Description | Outcome |
|------|-------------|---------|
| 1️⃣ | Parse arguments – Ricky reads `--mode local` and the free‑form `--spec` string. | The spec is stored as a workflow contract request. |
| 2️⃣ | Enter dry‑run mode – Because `--run` is omitted, Ricky defaults to validation‑only mode. | No workflow is executed; only generation and verification steps are performed. |
| 3️⃣ | Generate workflow skeleton – Using the built‑in “Package‑Check” template, Ricky creates a YAML/JSON workflow definition that matches the requested checks (lint, unit‑test, dependency audit, etc.). | The workflow file is written to `./ricky_output/<timestamp>/workflow.yaml` (or displayed on stdout if `--output=stdout`). |
| 4️⃣ | Deterministic verification gates – Ricky runs the contract validator:<br>• Ensures required stages (checkout, install, lint, test, audit) are present.<br>• Checks that each stage has a deterministic id, runs-on, and steps list.<br>• Verifies that no circular dependencies exist. | If
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.first-run-copy-is-compact-and-truthful (cli-behavior/manual)

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step is a deterministic gate that either succeeds (green ✅) or fails with an explicit blocker message (red ❌). All artefacts are version‑controlled under `onboarding/` and the PR is scoped to `feat/onboarding‑cli‑v1`.

---

## 1️⃣ Repository Layout (branch: `feat/onboarding-cli-v1`)

onboarding/
├── README.md                  # High‑level onboarding guide (this file)
├── .ricky/
│   ├── config.yaml            # Default config template
│   └── secrets.example        # Example secrets file (git‑ignored)
├── scripts/
│   ├── 01-verify-prereqs.sh   # System checks
│   ├── 02-setup-env.sh        # Virtual‑env & deps
│   ├── 03-init-config.sh      # Config generation
│   └── 04-verify-install.sh   # Post‑install validation
└── .github/
    └── workflows/
        └── onboarding.yml     # CI gate for the onboarding PR


All scripts are **POSIX‑sh** compatible and exit with `0` on success, non‑zero on failure.

---

## 2️⃣ Deterministic Verification Gates

| Gate | Script | Success ✅ | Failure ❌ (Blocker) |
|------|------
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Handling an Empty or Missing Specification File**

When a workflow author submits an empty spec or a spec file that cannot be found, follow the deterministic, contract‑driven process below. The steps are designed to keep the repository clean, surface blockers early, and enforce the 80‑to‑100 validation‑loop cadence required by the **AGENTS.md** standards.

---

## 1. Immediate Detection (Gate 1)

| Trigger | Action |
|--------|--------|
| `spec/` directory missing **or** `spec/<workflow>.yaml` exists but is zero bytes | ‑ Fail the *Spec‑Presence* gate. <br>‑ Log a **Blocker** in the CI run with the tag `#spec-missing`. <br>‑ Post a comment on the PR (or create a draft PR) that includes the exact error message and a link to the “Spec‑Missing” checklist. |

**Error message template**

🚨 BLOCKER: Spec file not found or empty.
Path: spec/.yaml
Reason: Required for deterministic validation and contract generation.
Action: Provide a non‑empty spec or delete the placeholder PR.


---

## 2. Automated Recovery Stub (Gate 2)

If the author is unavailable (e.g., CI runs on a schedule), generate a **minimal stub** so the repository stays buildable.

1. **Create** `spec/<wo
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A

🗂 Configuration
• Active Branches : main
• Pending PRs : 0
• Validation Loops : 0 / 80‑100 (not started)

⚠️ Blockers
• Google provider connection not confirmed.
• GitHub provider connection not confirmed.
• Workflow execution halted until at least one provider is verified.

📝 Next Steps

  1. Run provider health checks:
    • ricky check google
    • ricky check github
  2. Resolve any authentication or network issues reported.
  3. Re‑run ricky status to confirm connectivity before triggering workflows.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` HEAD, ticket ID (e.g., `GH‑1234`) | New branch `feat/gh-1234‑<short‑desc>` | `git branch -c` succeeds; branch name matches regex `^feat/gh-\d+-[a-z0-9-]+$` |
| 2️⃣ | **Ricky (GitOps)** | **Sync latest `main`** | Branch from step 1 | Branch up‑to‑date with `main` | `git merge-base --is-ancestor main <branch>` passes |
| 3️⃣ | **Developer** | **Implement change** | Branch, code spec, test plan | Modified files, new tests | Local lint (`npm run lint` / `flake8`) returns 0 |
| 4️⃣ | **Ricky (CI‑Orchestrator)** | **Run pre‑commit CI** | Commit(s) from step 3 | CI status badge | CI pipeline (`.github/workflows/ci.yml`) must report **success** within 5 min |
| 5️⃣ | **Ricky (Review‑Gatekeeper)** | **Open PR** | Branch, PR template, reviewers list | PR # opened | PR title matches `^<type>(\(scope\))?: .+` and description contains “Fixes #<ticket>” |
| 6️⃣ | **Ricky (Review‑Coordinator)** | **Assign reviewers** | PR #, reviewer matrix (code‑owner, security
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow Title:** `review‑and‑implement‑claude‑codex`

**Purpose**  
Automate a two‑stage development cycle where Claude (LLM‑review) validates design/spec changes and Codex (code‑gen) produces the implementation. The workflow is deterministic, auditable, and bounded by explicit verification gates.

---

## 1. Contract Overview

| Element | Definition |
|---------|------------|
| **Input** | `spec.yaml` – a validated OpenAPI‑style specification (JSON/YAML) describing the feature or bug fix. |
| **Outputs** | 1. `review_report.md` – Claude’s review with acceptance criteria and change requests.<br>2. `implementation.zip` – generated source files from Codex.<br>3. `ci_report.json` – CI test results for the generated code. |
| **Success Criteria** | • Claude returns `status: APPROVED` with no “BLOCKER” items.<br>• Codex produces code that passes all CI tests (0 failures, coverage ≥ 80%). |
| **Failure Modes** | • Claude returns `status: BLOCKED` → workflow aborts, creates a `BLOCKER` issue.<br>• CI fails → automatic rollback to previous stable commit and open a `RETRY` PR. |

---

## 2. Branch / PR Boundaries

| Stage | Branch | PR Title Template |
|-------|--------|-----------------
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
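
One way to read the precedence these criteria imply is: explicit spec hint first, then skill metadata, then the project default, with a reason recorded for every decision. The sketch below assumes hypothetical names (`ToolDecision`, `selectRunner`) and a flat step-to-runner map; the real `tool-selection.json` schema is not shown in this report.

```typescript
// Hypothetical shapes; the real tool-selection.json schema is not shown in this report.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string; // every decision carries its justification for the audit file
}

function selectRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string
): ToolDecision {
  const hinted = specHints[step];
  if (hinted !== undefined) {
    // Explicit spec hints always win over skill defaults.
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  const skill = skillDefaults[step];
  if (skill !== undefined) {
    return { step, runner: skill, source: "skill-default", reason: `spec is silent; skill metadata suggests ${skill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: "no hint or skill default applies" };
}

// Example: the spec says "use Claude to review and Codex to implement".
const hints = { review: "claude", implement: "codex" };
const skills = { review: "codex", test: "codex" };
const decisions = ["review", "implement", "test", "signoff"].map((s) =>
  selectRunner(s, hints, skills, "claude")
);
```

Here the `review` step stays on Claude even though the assumed skill metadata prefers Codex, which is the override direction the "Must Not" list forbids reversing.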

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing hook** that takes a freshly generated workflow (JSON/YAML) and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. It is meant to:

1. **Make the intent of each task crystal‑clear** (who, what, why, and any required inputs).  
2. **Add concrete, testable acceptance criteria** that can be automatically verified by the CI‑gate.  
3. **Keep the contract stable** so downstream reviewers and bots have a single source of truth.

Below is a step‑by‑step contract for invoking `--refine`, the expected input/output schema, and the verification gates you should embed in your CI pipeline.

---

## 1. Invocation Contract  

```bash
agentworkforce generate workflow.yaml --refine > refined-workflow.yaml
Parameter Required? Description
workflow.yaml The raw workflow file produced by any generator (e.g., agentworkforce generate). Must conform to the Workflow v2 schema (see schemas/workflow_v2.json).
--refine Switch that triggers the refinement pass.
`> refi
...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Release Pipeline**  
*(Ricky AgentWorkforce – v0.9.0)*  

---  

### 1. Scope & Objective
Create a deterministic, test‑driven workflow that defines, validates, and releases the `ricky --version` command. The command must:

1. Print the current Ricky version string in **semantic‑version** format (`MAJOR.MINOR.PATCH`).
2. Exit with status **0** on success, **1** on any error (e.g., corrupted version file).
3. Be reproducible across all supported platforms (Linux, macOS, Windows) and CI environments.

### 2. Contract (Input / Output)

| Element | Description |
|---------|-------------|
| **Command** | `ricky --version` |
| **Input** | No positional arguments, no environment variables required. |
| **Output** | `stdout`: `<MAJOR>.<MINOR>.<PATCH>` (e.g., `2.4.1`) <br> `stderr`: empty on success, error message on failure. |
| **Exit Code** | `0` on success, `1` on failure (e.g., missing `VERSION` file, malformed version). |
| **Side‑effects** | None (read‑only). |
| **Platform Guarantees** | Identical output on Linux (glibc), macOS (Darwin), Windows (PowerShell/CMD). |

### 3. Source of Truth
- `src/ricky/__init__.py` contains `__version__ = 
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all artifacts reach a **PASS** state or are reported as **BLOCKED**.

---

## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any unit that can be validated independently (e.g., `*.json`, `*.py`, `*.ckpt`). |
| **Batch size** | Configurable; default 100 artifacts per PR/branch. |
| **Branch/PR** | Each batch lives in its own feature branch `batch/<batch‑id>` and opens a PR titled `Batch <batch‑id>: Validation Loop`. |
| **Owner** | `@team‑artifact‑ops`. |
| **Gatekeeper** | CI job `validation‑gate` that must pass before merge. |
| **Blocker** | If an artifact cannot be auto‑fixed after N attempts, it is labeled `blocker` and the PR is halted for manual triage. |

---

## 2. Contract (YAML) – Machine‑readable definition

```yaml
name: batch-artifact-validation-loop
description: >
  Deterministic validation/fix/rerun loop for independent artifacts.
version: 1.0.0
inputs:
  -
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step declares **inputs, outputs, and success criteria** in a typed JSON schema. | The workflow begins with a `contract` block that lists required parameters (`sourceRepo`, `targetBranch`, `reviewers`) and the exact shape of the artifact (`prUrl: string`). Down‑stream steps can reference these contracts deterministically, eliminating ambiguous “magic strings”. |
| **Deterministic Verification Gates** | After each logical unit, a **gate** runs a pure‑function check that must return `true` before the pipeline proceeds. | After the *lint* step, a `gate lintPasses` runs `npm run lint && test -f lint-report.json`. If the gate fails, the workflow aborts with a clear `BLOCKER: Lint errors`. This prevents downstream steps (e.g., `build`) from executing on broken code. |
| **Review Artifacts Generation** | Mandates creation of **human‑readable artifacts** (HTML reports, SARIF files) that are automatically attached to 
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Summary  
To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. With a vague spec we can’t safely generate deterministic contracts, verification gates, or review artifacts. Below is a concise checklist of the minimal details required to produce a complete workflow that meets our standards (80‑100 validation loops, scoped PR boundaries, blocker reporting, etc.).

---

## Required Specification Details

| Category | What to Provide | Why It Matters |
|----------|----------------|----------------|
| **Goal & Scope** | One‑sentence description of the product’s primary outcome and the functional boundaries of the workflow. | Defines the workflow’s start/end points and limits the PR scope. |
| **Actors & Roles** | List of all human or system actors (e.g., `frontend`, `payment‑service`, `audit‑team`). | Determines responsibility assignments and review gate owners. |
| **Key Steps / Milestones** | Ordered list of high‑level steps (e.g., `collect‑input → validate → persist → notify`). | Forms the backbone of the workflow DAG. |
| **Inputs & Outputs** | For each step: input artifac
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response to the Failure**

| Phase | Action | Rationale | Artifact |
|-------|--------|-----------|----------|
| 1️⃣ Detect & Log | • Capture the timeout event (step X) and the “alive‑but‑idle” worker (step Y). <br>• Append a **Failure Log** entry to the workflow run’s audit trail with timestamps, error codes, and the worker’s last heartbeat. | Provides an immutable record for post‑mortem and for the deterministic verification gate. | `audit.log` entry, `run‑<id>.json` update |
| 2️⃣ Blocker Report | • Mark the run as **BLOCKED** in the orchestration DB. <br>• Emit a **Blocker Notification** (Slack/Teams) that includes: <br>  – Step name, timeout value, actual duration <br>  – Worker ID, last‑seen artifact hash (if any) <br>  – Suggested “repair‑branch” name (e.g., `repair/timeout‑X‑idle‑Y`). | Guarantees that downstream consumers do not consume partial output and that the team is immediately aware. | `BLOCKER.md` (auto‑generated) |
| 3️⃣ Deterministic Verification Gate | • Insert a **Gate** after step X that checks: <br>  – `step_X.status == SUCCESS` <br>  – `step_X.duration <= timeout_limit` <br>• Insert a **Gate** after step Y that checks: <br>  – `artifact_p
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development machine, executed **before** any workflow launch (`ricky run …`). |
| **Inputs** | <ul><li>`workspaceRoot: string` – absolute path to the repository root (inferred from `process.cwd()`).</li></ul> |
| **Outputs** | <ul><li>`isClean: boolean` – `true` if no stale state is found.</li><li>`stalePaths: string[]` – list of directories/files that are considered stale.</li></ul> |
| **Success Criteria** | `isClean === true`. If `false`, the workflow must abort and emit a **Blocker Report** (see “Blocker Reporting” below). |
| **Deterministic Verification Gate** | The guard must be **pure** with respect to the file‑system snapshot taken at start‑up: given the same snapshot it always returns the same `isClean`/`stalePaths`. No network calls, no random values. |
| **Idempotence** | Re‑running the guard without intervening changes yields identical output. |
| **Performance SLA** | Must complete ≤ 200 ms on a typical dev box (SSD, ≤ 10 k files). |
| **Error Handling** | Any unexpected I/O error is treated as a
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a safety file created by the **Ricky** CLI to prevent overlapping executions. When the file is left behind (e.g., a crash, a killed process, or a stale CI workspace) the next invocation aborts with the message you’re seeing.

Below is the deterministic, step‑by‑step contract for clearing the marker safely, verifying that no other run is truly active, and getting your workspace back to a clean state.

---

## 1️⃣ Verify that no Ricky/Relay process is actually running

| Check | Command | Expected result |
|-------|---------|-----------------|
| Local OS process list | `ps aux \| grep -i ricky` | No `ricky` or `relay` PID other than the `grep` line |
| Docker (if you use containerised runs) | `docker ps \| grep ricky` | No container named `ricky‑run` (or similar) |
| CI runner (GitHub Actions, GitLab CI, etc.) | Look at the pipeline UI for a running job in the same branch | No active job labeled “Ricky” or “Relay” |

If you **do** see a live process, let it finish or kill it deliberately (e.g., `kill -SIGTERM <PID>`). Do **not** delete the marker while a real run is in pro
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script). It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failing workflow artifact is stored in `./artifacts/<workflow‑id>.json`.  

---

## 1️⃣ Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI version** | `aw --version` | `>=2.4.0` (auto‑fix introduced in v2.4) |
| **Workflow definition** | `cat ./artifacts/<workflow-id>.json` | Valid JSON, contains `steps` array |
| **Previous run ID** | `echo $PREV_RUN_ID` | Non‑empty string (e.g., `run‑20240531‑a1b2c3`) |
| **Writable workdir** | `test -w ./workdir && echo ok` | `ok` |
| **Docker / sandbox** (if workflow uses containers) | `docker ps` | Docker daemon reachable |

If any check fails, abort and fix the environment before proceeding.

---

## 2️⃣ Execution contract

```bash
#!/usr/bin/env bash
set -euo pipefail   # deterministic failure handling
IFS=$'\n\t'

# ---- INPUTS -------------------------------------------------
WORKFLOW_ID="my‑workflow‑123"                     # <-- replace
ARTIFA
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
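
The bounded-retry and resume rules above, combined with this PR's fix, can be sketched as follows. The three synthetic stage ids are the ones listed in the PR summary, and `isSyntheticStageId` is named there too; the types, the synchronous `RunAttempt` callback, and `runWithAutoFix` are assumptions for illustration and do not reproduce the code in `src/local/auto-fix-loop.ts`. The repair step (asking the workflow persona to fix the artifact) is left out to keep the sketch short.

```typescript
// Sketch only. The ids below are the ones listed in this PR's summary;
// everything else (types, function shapes) is assumed for illustration.
const SYNTHETIC_STAGE_IDS = new Set(["runtime-precheck", "runtime-launch", "local-runtime"]);

function isSyntheticStageId(id: string): boolean {
  return SYNTHETIC_STAGE_IDS.has(id);
}

interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string; // may be a Ricky stage label rather than a real workflow step
}

interface RetryOptions {
  startFromStep?: string;
  previousRunId?: string;
}

type RunAttempt = (opts: RetryOptions) => AttemptResult;

function runWithAutoFix(run: RunAttempt, maxAttempts: number): AttemptResult[] {
  const attempts: AttemptResult[] = [];
  let opts: RetryOptions = {};
  for (let i = 0; i < maxAttempts; i++) {
    const result = run(opts);
    attempts.push(result); // keep every attempt so the summary can list them all
    if (result.ok) break;
    // Resume from the failed step only when it is a real workflow step.
    // Launch-phase failures retry from the start but still reuse the run id.
    const step = result.failedStep;
    opts =
      step !== undefined && !isSyntheticStageId(step)
        ? { startFromStep: step, previousRunId: result.runId }
        : { previousRunId: result.runId };
  }
  return attempts;
}
```

With this shape, a failure reported at `runtime-launch` produces a retry with no `startFromStep`, so a step name the SDK has never heard of is not forwarded to it, and the loop still stops once `maxAttempts` is reached.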

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

1. **Capture the exact failure**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix --log-level=debug > run.log 2>&1
  • run.log now contains the full console output, the serialized workflow graph, and the raw error payloads.
  1. Locate the failing node
    Open run.log and search for the line that starts with ✖︎ (Ricky’s failure marker). It will look like:

    ✖︎  Node: <node-id>  Step: <step-name>
    Error: <error‑message>
    

    Note the node‑id and step‑name – they map 1‑to‑1 to the contract entry in workflows/foo.ts.

  2. Validate the contract for that node

    // workflows/foo.ts
    export const foo = workflow({
      input: z.object({ /* … */ }),
      output: z.object({ /* … */ })
    }, async (ctx) => {
      // …
    });
    • Ensure the input schema matches what the previous node actually emits.
    • Ensure the output schema matches the expectations of downstream nodes.

    If the schemas diverge, the deterministic verification gate will raise a ContractMismatchError. Fix the schema or add a transformation step.

  3. Run the node in isolation

    ricky

...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Executing a Local TypeScript Workflow Artifact in the Primary Local Path**

Below is the concrete, step‑by‑step contract that Ricky (the AgentWorkforce workflow reliability assistant) follows to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The procedure is deterministic, auditable, and produces the required verification artifacts.

---

## 1. Preconditions (Input Contract)

| Item | Requirement | Source |
|------|-------------|--------|
| **Workflow source** | A single TypeScript file (or an `index.ts` entry point) that exports a `run()` async function. | `./workflows/<workflow‑name>/` |
| **Node runtime** | Node ≥ 18, with `npm`/`pnpm`/`yarn` available. | System PATH |
| **Package manifest** | `package.json` present in the workflow directory (or inherited from repo root). Must list all runtime dependencies. | `./workflows/<workflow‑name>/package.json` |
| **TypeScript config** | `tsconfig.json` present (or fallback to repo root). Must target ES2022 or later. | `./workflows/<workflow‑name>/tsconfig.json` |
| **Execution context** | Optional `.env` file for secrets; must be validated against the workflow’s `envSche
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.
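
A rough sketch of the precheck these criteria describe is shown below. The `Probe` interface, `SpawnPlan`, and `planLocalRun` are hypothetical, and the use of Node's `--experimental-strip-types` flag is an assumption about what "the Node strip-types route" means here; the actual spawn command Ricky records is not shown in this report.

```typescript
// Assumed shapes for illustration; Ricky's real precheck and evidence types are not shown here.
interface SpawnPlan {
  route: "node-strip-types" | "blocked";
  command: string[]; // recorded verbatim in execution evidence
  reason: string;
}

interface Probe {
  canResolve(specifier: string): boolean; // e.g. wraps a module resolution attempt
  onPath(binary: string): boolean; // e.g. wraps a PATH lookup
}

function planLocalRun(workflowPath: string, probe: Probe): SpawnPlan {
  const missing: string[] = [];
  if (!probe.onPath("node")) missing.push("node");
  if (!probe.canResolve("@agent-relay/sdk")) missing.push("@agent-relay/sdk");

  if (missing.length === 0) {
    // The agent-relay binary is deliberately not consulted on this route.
    return {
      route: "node-strip-types",
      command: ["node", "--experimental-strip-types", workflowPath],
      reason: "Node and @agent-relay/sdk resolved; agent-relay binary not required",
    };
  }
  return {
    route: "blocked",
    command: [],
    reason: `missing prerequisites: ${missing.join(", ")}`,
  };
}
```

Injecting the probe keeps the decision testable without touching the real filesystem, and the returned `command` array is the value that would be written to evidence.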

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The process is bounded, auditable, and designed to surface blockers early while preserving branch and PR hygiene.

---

## 1. Failure Detection & Initial Fix Attempt

| Step | Action | Artifact |
|------|--------|----------|
| 1.1 | Run the **Structural‑Integrity Test Suite (SITS)** on the target branch. | `sits-report.json` |
| 1.2 | If any test fails, create a **Fix‑Attempt Ticket (FAT‑\<id\>)** in the workflow tracker. | `FAT‑<id>.md` (includes failing test IDs, logs, and proposed code change). |
| 1.3 | Apply the fix in a **dedicated “fix‑attempt” branch** off the target branch. | `fix-attempt/<target‑branch>/<FAT‑id>` |
| 1.4 | Submit a **Fix‑Attempt PR (FA‑PR)** against the target branch. | PR title: `FA‑<id>: <short description>` |
| 1.5 | Gate: **FA‑PR must pass all SITS checks** (automated CI). If it passes, merge and close FAT. If it fails, proceed to Step 2.

---

## 2. Persistent Failure – Boundary Detection

Ricky automatically detects that the failure persists after 
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Template & Example**

Below is a ready‑to‑use digest format that aggregates the health of multiple Ricky workflow runs. It follows the **Ricky repository conventions** (AGENTS.md), includes **deterministic verification gates**, **review artifacts**, and highlights any **blockers**. Use the template for any set of runs; replace the placeholder data with the actual values from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).

---

## 1. Digest Overview

| Metric | Value | Target / SLA |
|--------|-------|--------------|
| Total Runs (last 24 h) | `{{total_runs}}` | — |
| Successful Runs | `{{success_count}}` ({{success_pct}} %) | ≥ 95 % |
| Failed Runs | `{{fail_count}}` ({{fail_pct}} %) | ≤ 5 % |
| Blocked Runs* | `{{blocked_count}}` | 0 |
| Avg. Duration | `{{avg_duration}}` | ≤ 10 min |
| Verification Gate Pass Rate | `{{gate_pass_pct}}` % | ≥ 98 % |
| Review Artifacts Produced | `{{review_artifacts}}` | ≥ 1 per PR |

\*A *blocked run* is one that stopped at a deterministic verification gate because a required artifact or external dependency was missing.

---

## 2. Run‑Level Summary (chronological)

| Run ID | Branch | Trigger | Statu
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
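
As an illustration of these criteria, the sketch below aggregates recurring failure classes from normalized evidence without mutating it. `WorkflowRunEvidence` is the type named above, but every field on it is assumed, as are `FailureClass`, `Recommendation`, and `digest`.

```typescript
// `WorkflowRunEvidence` is named in the criteria above; its fields here are assumed.
type FailureClass = "environment" | "workflow-logic" | "timeout";

interface WorkflowRunEvidence {
  readonly workflow: string;
  readonly status: "passed" | "failed";
  readonly failureClass?: FailureClass;
  readonly durationMs: number;
}

interface Recommendation {
  workflow: string;
  failureClass: FailureClass;
  count: number;
  note: string;
}

// Groups failures by workflow and class so environment and logic failures stay separate.
function digest(runs: readonly WorkflowRunEvidence[], minRecurrence = 2): Recommendation[] {
  const counts = new Map<string, Recommendation>();
  for (const run of runs) {
    if (run.status !== "failed" || run.failureClass === undefined) continue;
    const key = `${run.workflow}|${run.failureClass}`;
    const entry =
      counts.get(key) ?? { workflow: run.workflow, failureClass: run.failureClass, count: 0, note: "" };
    entry.count += 1;
    counts.set(key, entry);
  }
  return Array.from(counts.values())
    .filter((r) => r.count >= minRecurrence)
    .map((r) => ({ ...r, note: `${r.workflow}: ${r.count} ${r.failureClass} failures` }));
}
```

The input is typed `readonly` end to end, so the analysis can read evidence but a write would be a compile error, and each recommendation names the workflow and class it refers to.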

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Specification*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers, product managers, and ops a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. Surface Entry Points

| Trigger | Slack Command | Payload | Result |
|---------|---------------|---------|--------|
| **Start Debug Session** | `/ricky debug <workflow‑id>` | `{workflowId, requesterId, channelId}` | Bot posts an interactive message with session controls. |
| **Step Forward** | Button “▶️ Step” | `{sessionId, stepIndex}` | Executes next step, returns deterministic output. |
| **Step Back** | Button “◀️ Back” | `{sessionId, stepIndex}` | Re‑runs previous step from stored snapshot. |
| **Inject State** | Button “🔧 Inject” → Modal | `{sessionId, key, value}` | Updates in‑memory state for the next step. |
| **Abort** | Button
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Category** | Web → AgentWorkforce handoff |
| **Purpose** | Allows a client‑side UI (or any web consumer) to submit a complete workflow specification together with an execution *mode* (e.g., `dryRun`, `live`, `test`). The backend validates the spec, stores it in the workflow registry, and returns a deterministic handoff receipt. |
| **Payload Schema** | ```json { "type": "object", "properties": { "workflowId": { "type": "string", "pattern": "^[a-z0-9\\-]{3,64}$" }, "spec": { "$ref": "#/definitions/WorkflowSpec" }, "mode": { "type": "string", "enum": ["dryRun","live","test"] }, "metadata": { "type": "object", "additionalProperties": true } }, "required": ["workflowId","spec","mode"], "additionalProperties": false } ```<br>**`WorkflowSpec`** follows the existing internal spec contract (see `specs/workflow.yaml`). |
| **Response Contract** | ```json { "type": "object", "properties": { "receiptId": { "type": "string", "format": "uuid" }, "status": { "type": "string", "enum": ["accepted","rejected"] }, "validationErrors": { "type": "array", "items": { "type"
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
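A response that meets these criteria might look roughly like the sketch below. Only the type names `LocalInvocationRequest` and `CloudGenerateRequest` come from the criteria above; their fields, the `WebWorkflowHandoff` variant, the `"local" | "cloud"` mode values, and both function names are assumptions for illustration.

```typescript
// Only the names LocalInvocationRequest and CloudGenerateRequest come from the
// review criteria; every field below is a hypothetical placeholder.
export interface LocalInvocationRequest { kind: "local"; spec: string; }
export interface CloudGenerateRequest { kind: "cloud"; spec: string; }
export type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff variant carrying a workflow spec and a mode.
export interface WebWorkflowHandoff {
  surface: "web";
  spec: string;
  mode: "local" | "cloud";
}

// The normalizer is the only translation boundary: it drops the surface tag,
// so downstream stages cannot tell that the request came from web.
export function normalizeWebHandoff(handoff: WebWorkflowHandoff): NormalizedRequest {
  if (handoff.spec.trim() === "") throw new Error("workflow spec is empty");
  return handoff.mode === "cloud"
    ? { kind: "cloud", spec: handoff.spec }
    : { kind: "local", spec: handoff.spec };
}

// Web surface: validates the session, then hands off. It calls no executor
// and defines no request model of its own.
export function handleWebSubmit(
  session: { authenticated: boolean },
  body: { spec: string; mode: "local" | "cloud" },
): NormalizedRequest {
  if (!session.authenticated) throw new Error("unauthenticated web handoff");
  return normalizeWebHandoff({ surface: "web", spec: body.spec, mode: body.mode });
}
```

The mapping from mode to request type lives in the normalizer rather than the handler, which keeps routing semantics out of the web surface.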

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| **Input Characteristics** | **Recommended Ricky Action** | **Rationale** |
|----------------------------|------------------------------|---------------|
| **Complete, version‑controlled workflow spec** (YAML/JSON) **+ clear intent** (e.g., “run this pipeline”) | **Generate** | The spec is ready for materialization. Create a new branch, add the workflow files, and open a PR with the standard contract (see *AGENTS.md*). |
| **Spec contains syntax errors, missing required fields, or failing schema validation** | **Debug** | Run the deterministic verification gate (schema + unit‑test harness). Return a blocker report with line‑by‑line fixes. |
| **Existing workflow has been deployed but is stuck, timed‑out, or producing incorrect outputs** | **Restart** | Trigger a controlled restart: create a “restart‑<timestamp>” branch, add a `restart` flag in the workflow metadata, and schedule a fresh execution after a clean‑up step. |
| **User supplies logs, execution traces, or performance metrics without a new spec** | **Analyze** | Perform a post‑mortem: run the verification gate in *analysis mode*, generate a diagnostic report, and sugge
...[truncated]

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.
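One way to read these criteria is sketched below, under the assumption that the MCP tool handler does nothing but normalize. The payload shape, the field names, and `normalizeClaudeHandoff` are all hypothetical; the point is only that the spec travels as the primary input while conversation context and prior decisions ride along as metadata.

```typescript
type Mode = "local" | "cloud";

// Hypothetical MCP/Claude handoff payload; none of these field names are
// confirmed by the source.
interface ClaudeHandoff {
  spec: string;
  mode?: Mode;
  workflowPath?: string;
  priorDecisions?: string[];
  conversationContext?: string[];
}

interface NormalizedHandoff {
  spec: string;
  mode: Mode | undefined;
  workflowPath: string | undefined;
  // Advisory only: downstream routing may consult it but must not require it.
  metadata: { priorDecisions: string[]; conversationContext: string[] };
}

// The tool handler only normalizes. Choosing between generate, debug, restart,
// analyze, or coordinate is left to the shared routing stage.
function normalizeClaudeHandoff(handoff: ClaudeHandoff): NormalizedHandoff {
  if (handoff.spec.trim() === "") throw new Error("handoff has no spec");
  return {
    spec: handoff.spec, // passed through unchanged; the user never rewrites it
    mode: handoff.mode, // provided mode is preserved, not ignored
    workflowPath: handoff.workflowPath,
    metadata: {
      priorDecisions: handoff.priorDecisions ?? [],
      conversationContext: handoff.conversationContext ?? [],
    },
  };
}
```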

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_FAILED`** in the error‑response schema and all references.  

All changes are limited to the **`cloud-ricky`** service repository (no cross‑service impact).  

---  

## 1. Contract Update (API Specification)

| Element | Current | Updated |
|---------|---------|---------|
| **Success payload** | ```json { "output": "...", "model": "gpt‑4", "usage": { … } } ``` | ```json { "output": "...", "model": "gpt‑4", "usage": { … }, "requestId": "550e8400-e29b-41d4-a716-446655440000" } ``` |
| **Error object** | ```json { "code": "INVALID_PROMPT", "message": "Prompt is malformed." } ``` | ```json { "code": "PROMPT_VALIDATION_FAILED", "message": "Prompt is malformed." } ``` |

*Schema files* (`openapi.yaml`, `components/schemas/GenerateResponse.yaml`, `components/schemas/ErrorResponse.yaml`) must be updated accordingly.  

**Version bump** – bump the OpenAPI minor version: `v1.2.0 → v1.3.0`.  

---  

## 2. Code Changes  

| File | Change |
|------|--------|
| `src/handlers/generateHandler.ts` | - Ge
...[truncated]

Must

  • Allow additive response fields within /api/v1/ricky/....
  • Reject renaming or changing the meaning of an existing error code inside the same version.
  • Require a new API version for breaking request/response changes.
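The additive-versus-breaking rule above lends itself to a mechanical check. The sketch below is a hedged illustration: `ApiContract` and `breakingChanges` are hypothetical names, and the check only catches removed or renamed identifiers. A change in the meaning of an existing error code cannot be detected this way and still needs human review.

```typescript
// Hypothetical summary of a response contract for one API version.
interface ApiContract {
  version: string;
  responseFields: string[];
  errorCodes: string[];
}

// Lists changes that are breaking within a single API version.
// Added fields and added error codes are treated as additive and allowed.
function breakingChanges(prev: ApiContract, next: ApiContract): string[] {
  // A new API version is where breaking changes belong, so nothing is flagged.
  if (prev.version !== next.version) return [];

  const problems: string[] = [];
  const fields = new Set(next.responseFields);
  const codes = new Set(next.errorCodes);

  for (const field of prev.responseFields) {
    if (!fields.has(field)) problems.push(`response field removed or renamed: ${field}`);
  }
  for (const code of prev.errorCodes) {
    if (!codes.has(code)) problems.push(`error code removed or renamed: ${code}`);
  }
  return problems;
}
```

Applied to the request in this case, adding `requestId` passes, while renaming `INVALID_PROMPT` to `PROMPT_VALIDATION_FAILED` under the same version is flagged.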

Must Not

  • Remove or rename existing response fields in v1.
  • Add a

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

@khaliqgant khaliqgant merged commit bffb063 into main May 9, 2026
3 checks passed
@khaliqgant khaliqgant deleted the fix/auto-fix-loop-synthetic-step-ids branch May 9, 2026 17:55