
fix(auto-fix): re-enter generation when synthesized retry artifact is missing #83

Closed
kjgbot wants to merge 2 commits into main from fix/skip-generation-only-when-workflow-on-disk


@kjgbot kjgbot commented May 9, 2026

Contributor

Closing in favor of #84

I dug deeper after a follow-up question and the gate-side fix in this PR doesn't actually solve the underlying bug — it only makes auto-fix retries try to re-render, but the workforce-persona writer would just keep failing the same way.

The real root cause: in src/product/generation/pipeline.ts, when the workforce-persona writer throws (opencode/claude CLI errored, returned non-completed status, malformed output), generateWithWorkforcePersona's catch block returns success: false with a blocking WORKFORCE_PERSONA_WRITER_FAILED issue. That discards the perfectly valid baseResult.artifact (deterministic template render) we'd already produced. entrypoint.execute() then early-returns at the !success || !artifact check at src/local/entrypoint.ts:1083 without writing anything to disk. Auto-fix loops until the budget burns, all for the same INVALID_ARTIFACT reason.

End-to-end trace from the user's repro that proves this:

[trace] gen call start workforcePersona=true
[trace] gen call returned success=false hasArtifact=true artifactPath=workflows/generated/ricky-...ts validErrors=1
[trace] gen EARLY-RETURN no write success=false hasArtifact=true firstErr=Workforce persona writer did not complete: failed.

The fix in #84 makes the catch block fall back to the deterministic baseResult — success: true, artifact intact, persona failure surfaced as a warning. This mirrors the existing pre-write validation fallback at pipeline.ts:154-166, which already does exactly this when the persona's output fails repair-attempt validation.
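
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from #84: the `GenerationResult` and `GenerationIssue` shapes, the `PersonaWriter` type, and the function name `generateWithPersonaFallback` are all assumptions made for the example, since the real types in `src/product/generation/pipeline.ts` are not shown in this thread.

```typescript
// Hypothetical shapes; the real types in pipeline.ts are not shown in this PR.
interface GenerationIssue {
  code: string;
  message: string;
  severity: "blocking" | "warning";
}

interface GenerationResult {
  success: boolean;
  artifact?: { path: string; content: string };
  issues: GenerationIssue[];
}

// Stand-in for the workforce-persona writer call; it may reject.
type PersonaWriter = (base: GenerationResult) => Promise<GenerationResult>;

async function generateWithPersonaFallback(
  baseResult: GenerationResult,
  runPersonaWriter: PersonaWriter,
): Promise<GenerationResult> {
  try {
    return await runPersonaWriter(baseResult);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Keep the deterministic render (success flag and artifact untouched)
    // and surface the persona failure as a warning instead of a blocker.
    return {
      ...baseResult,
      issues: [
        ...baseResult.issues,
        { code: "WORKFORCE_PERSONA_WRITER_FAILED", message, severity: "warning" },
      ],
    };
  }
}
```

With this shape, a writer failure no longer trips the `!success || !artifact` early return, so the deterministic artifact can still be written to disk.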

Closing this without merging.

kjgbot and others added 2 commits May 9, 2026 19:10
When the local-runtime launch/precheck fails before any Agent Relay SDK
step starts, Ricky tags evidence with a synthetic stage id
('runtime-launch', 'runtime-precheck', or the 'local-runtime' fallback).
The auto-fix loop was reading those back out and forwarding them to the
SDK as `startFromStep` on every retry, so the SDK threw
`startFrom step "..." not found in workflow` seven times in a row before
giving up — the user saw a misleading `MISSING_BINARY at runtime-launch`
final blocker instead of the real classified failure.

Filter the synthetic ids out at the boundary in `failedStepFromEvidence`
so launch-phase failures retry from the start (with `previousRunId`
still reused, matching the cli-auto-fix-and-resume spec). Centralize the
allowlist in `src/local/synthetic-step-ids.ts` so the next stage label
added in `entrypoint.ts` has a single source of truth to register
against.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
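
A rough sketch of the boundary filter this commit describes, for readers without the diff open. The three stage ids and the names `SYNTHETIC_LOCAL_STAGE_IDS`, `isSyntheticStageId`, and `failedStepFromEvidence` come from the commit message and walkthrough; the `StepEvidence` shape and the function bodies are assumptions for illustration only.

```typescript
// Stage ids Ricky tags onto evidence before any SDK step has started.
const SYNTHETIC_LOCAL_STAGE_IDS: ReadonlySet<string> = new Set([
  "runtime-launch",
  "runtime-precheck",
  "local-runtime",
]);

function isSyntheticStageId(stepId: string | undefined): boolean {
  return stepId !== undefined && SYNTHETIC_LOCAL_STAGE_IDS.has(stepId);
}

// Hypothetical evidence shape; the real one lives in src/local/auto-fix-loop.ts.
interface StepEvidence {
  stepId: string;
  status: "passed" | "failed";
}

// Returns the first real failed step. Returns undefined when only synthetic
// stages failed, so the caller omits startFromStep and retries from the start.
function failedStepFromEvidence(steps: StepEvidence[]): string | undefined {
  const failed = steps.find(
    (step) => step.status === "failed" && !isSyntheticStageId(step.stepId),
  );
  return failed?.stepId;
}
```

Keeping the allowlist in one exported set means a new stage label only has to be registered in a single place.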
… missing

When `ricky --mode local --spec-file <foo>.md --run` runs against a spec
whose content the parser scores as `debug` intent (any spec describing
"failed" / "failure" / "broken" without explicit "create/build/author
workflow" phrasing), and attempt 1 produces a generation artifact in
memory but doesn't actually write it to disk (e.g. validation failed,
workforce-persona writer threw, an early-return path fired), the
auto-fix loop synthesizes a retry via retryBaseRequest that:

- flips `source` to 'workflow-artifact'
- sets `specPath` to the never-written `workflows/generated/...ts`
- stamps `metadata.autoFixGeneratedFrom = <prior source>`

On the retry, `workflowFileForRoute` returns specPath because it
matches `isExecutableWorkflowPath`, the routing target stays at 'debug'
(spec content is unchanged), and the gate
`target === 'generate' || !workflowFile` evaluates false / SKIPS
generation. Runtime-precheck then fails INVALID_ARTIFACT because the
file isn't on disk. All 7 retries fail identically.

Repro from cwd /Users/khaliqgant/Projects/My-Senior-Dev/app:
  ricky --mode local --spec-file docs/plans/2026-05-05-proactive-pr-remediation-spec.md --run
  Generation: ok - workflows/generated/ricky-...ts
  Execution: blocked - INVALID_ARTIFACT at runtime-precheck
  Auto-fix: stopped after 7/7 attempt(s) (INVALID_ARTIFACT)

Fix: when the request was synthesized by an auto-fix retry
(autoFixGeneratedFrom is set) AND the workflowFile doesn't exist on
disk AND routing isn't already 'generate', fall through to the
generation block instead of letting precheck fail. User-provided
workflow-artifact handoffs that name a missing file still surface the
runtime-precheck blocker (preserved by the existing artifact-run
regression tests at src/local/entrypoint.test.ts:3703 and :3862).

Two new regression tests cover both branches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
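
The gate change in this commit can be sketched as a pure predicate. The original condition `target === 'generate' || !workflowFile` and the `autoFixGeneratedFrom` metadata key are taken from the commit message; the `GateInput` shape, the function name `shouldEnterGeneration`, and the injected `fileExists` callback are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical input shape; field names mirror the commit message where possible.
interface GateInput {
  target: string;
  workflowFile?: string;
  metadata?: { autoFixGeneratedFrom?: string };
}

function shouldEnterGeneration(
  input: GateInput,
  fileExists: (path: string) => boolean,
): boolean {
  // Original gate: generate when routed there or when no workflow file is named.
  if (input.target === "generate" || !input.workflowFile) {
    return true;
  }
  // New branch: only requests synthesized by an auto-fix retry may fall
  // through, and only when the named artifact was never written to disk.
  const synthesizedByAutoFix = input.metadata?.autoFixGeneratedFrom !== undefined;
  return synthesizedByAutoFix && !fileExists(input.workflowFile);
}
```

A user-provided handoff that names a missing file carries no `autoFixGeneratedFrom` marker, so it still returns `false` here and reaches the runtime-precheck blocker.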

coderabbitai Bot commented May 9, 2026

Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR prevents synthetic local stage IDs (runtime-launch, local-runtime, runtime-precheck) from being forwarded as resume targets on auto-fix retries. It introduces a centralized synthetic ID definition, filters synthetic-only failures at retry boundary, and re-enters generation for missing workflow artifacts in auto-fix scenarios.

Changes

Fix Synthetic Step ID Propagation in Auto-Fix Retries

| Layer | File(s) | Summary |
|-------|---------|---------|
| Synthetic ID Contract | src/local/synthetic-step-ids.ts | Exports SYNTHETIC_LOCAL_STAGE_IDS set and isSyntheticStageId() predicate as a centralized source-of-truth for detecting synthetic step IDs. |
| Specification | specs/auto-fix-synthetic-step-ids.md | Documents the bug where synthetic IDs cause "step not found" errors on retry, defines target behavior (omit startFromStep for synthetic-only failures), and outlines the fix: filter at failedStepFromEvidence boundary. |
| Auto-Fix Loop Filter | src/local/auto-fix-loop.ts | Updates failedStepFromEvidence to skip synthetic step IDs and return the first real failed step; omits startFromStep from retry when only synthetic failures exist. |
| Workflow Artifact Re-Entry | src/local/entrypoint.ts | Adds fallThroughForMissingArtifact logic to re-enter generation phase when auto-fix generated retries reference unreadable workflow artifact files; introduces workflowFileReadable() helper. |
| Auto-Fix Loop Tests | src/local/auto-fix-loop.test.ts | Replaces runtime-launch with install-deps in existing fixtures; adds parameterized test verifying synthetic IDs are not forwarded as startFromStep; introduces isSyntheticStageId unit test block. |
| Entrypoint Tests | src/local/entrypoint.test.ts | Adds regression tests: auto-fix synthesized retries re-enter generation when artifact is missing; user-provided artifact paths do not trigger regeneration. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • AgentWorkforce/ricky#49: Both PRs modify the local auto-fix retry logic and how failed steps drive retries and repairs.
  • AgentWorkforce/ricky#69: Both PRs modify src/local/auto-fix-loop.ts to improve the auto-fix retry loop logic.

Poem

🐰 Synthetic steps tried to masquerade,
As real ones in the retry parade,
But now we filter them out with care,
And artifacts re-gen when they're not there—
No "step not found" errors left to spare!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main fix: re-entering generation when a synthesized auto-fix retry's workflow artifact is missing on disk. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the bug, root cause, fix, and testing approach with clear context. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🧹 Nitpick comments (1)
src/local/entrypoint.test.ts (1)

2417-2417: ⚡ Quick win

Tighten the assertion to prove a workflow artifact was written.

The current check passes on any write; this regression is stronger if it asserts a generated workflow artifact write specifically.

Proposed test assertion tweak
-      expect(localExecutor.writes.length).toBeGreaterThanOrEqual(1);
+      expect(workflowArtifactWrites(localExecutor.writes)).toHaveLength(1);
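
The suggested assertion relies on a `workflowArtifactWrites` helper that is not shown in this thread. If no such helper exists in the test file, one possible sketch is below. The `RecordedWrite` shape and the `workflows/generated/` plus `.ts` path convention are assumptions inferred from the artifact paths quoted elsewhere in this PR.

```typescript
// Hypothetical record of a write captured by the test executor.
interface RecordedWrite {
  path: string;
  content: string;
}

// Keeps only writes that look like generated workflow artifacts,
// assuming they land under workflows/generated/ with a .ts extension.
function workflowArtifactWrites(writes: RecordedWrite[]): RecordedWrite[] {
  return writes.filter(
    (write) => write.path.startsWith("workflows/generated/") && write.path.endsWith(".ts"),
  );
}
```

Filtering by path keeps the regression from passing on unrelated writes such as logs or evidence files.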
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/local/entrypoint.test.ts` at line 2417, Replace the loose length check on
localExecutor.writes with an assertion that specifically verifies a workflow
artifact was written: inspect localExecutor.writes for an entry whose
filename/path matches the generated workflow artifact (e.g., contains "workflow"
or the known artifact extension produced by your workflow generator) and assert
that such an entry exists (use localExecutor.writes and an array/object matching
assertion against the entry.path or entry.name to prove the workflow artifact
was created).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@specs/auto-fix-synthetic-step-ids.md`:
- Around line 9-18: The fenced code block showing the ricky run output is
missing a language tag and triggers MD040; update the triple-backtick fence that
wraps the sample log to include a language identifier (e.g., "text" or "bash")
so it becomes ```text (or ```bash) and ensure the block content remains
unchanged so linting passes and formatting is preserved.

---

Nitpick comments:
In `@src/local/entrypoint.test.ts`:
- Line 2417: Replace the loose length check on localExecutor.writes with an
assertion that specifically verifies a workflow artifact was written: inspect
localExecutor.writes for an entry whose filename/path matches the generated
workflow artifact (e.g., contains "workflow" or the known artifact extension
produced by your workflow generator) and assert that such an entry exists (use
localExecutor.writes and an array/object matching assertion against the
entry.path or entry.name to prove the workflow artifact was created).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 126dcff0-1cda-4217-8d47-a01a0740a216

📥 Commits

Reviewing files that changed from the base of the PR and between 36199fc and f1bb839.

📒 Files selected for processing (6)
  • specs/auto-fix-synthetic-step-ids.md
  • src/local/auto-fix-loop.test.ts
  • src/local/auto-fix-loop.ts
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
  • src/local/synthetic-step-ids.ts

Comment on lines +9 to +18
```
ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff # ×6 more
Execution: blocked — MISSING_BINARY at runtime-launch
Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
```

Contributor


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language to the fenced code block to satisfy markdown linting.

This block should include a language (e.g., text or bash) to address MD040.

Suggested markdown fix
-```
+```text
 ricky --mode local --spec-file docs/runtimes/specs/msd-shared-sandbox-review-runtime.md --run
 …
 Error: startFrom step "local-runtime" not found in workflow. Available steps: prepare-context, …, final-signoff
 …
 Error: startFrom step "runtime-launch" not found in workflow. Available steps: prepare-context, …, final-signoff   # ×6 more
 …
 Execution: blocked — MISSING_BINARY at runtime-launch
 Auto-fix: stopped after 7/7 attempt(s) (MISSING_BINARY)
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 9-9: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@specs/auto-fix-synthetic-step-ids.md` around lines 9 - 18, The fenced code
block showing the ricky run output is missing a language tag and triggers MD040;
update the triple-backtick fence that wraps the sample log to include a language
identifier (e.g., "text" or "bash") so it becomes ```text (or ```bash) and
ensure the block content remains unchanged so linting passes and formatting is
preserved.

@devin-ai-integration devin-ai-integration Bot left a comment

Contributor


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.



github-actions Bot commented May 9, 2026

Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T17-24-31-983Z-HEAD-provider
Mode: provider
Git SHA: c6f1625

Passed: 5 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deep Integration with Agent Assistant**

Below is a complete, deterministic workflow for adding the “deep Agent Assistant integration” claim to the Ricky repository. The plan follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes concrete contracts, verification gates, review artifacts, and a bounded PR scope.

---

## 1. Scope of the Change
| Area | What is changed |
|------|-----------------|
| **README / docs** | Add a “Deep Integration with Agent Assistant” section describing how Ricky delegates reliability checks, coordination, and authoring to the Agent Assistant service. |
| **`ricky/__init__.py`** | Expose a new constant `AGENT_ASSISTANT_INTEGRATION_LEVEL = "deep"` and update the module docstring. |
| **`ricky/config.py`** | Add a config flag `USE_DEEP_AGENT_ASSISTANT = True` with a default that can be overridden via environment variable `RICKY_DEEP_AGENT_ASSISTANT`. |
| **`ricky/engine.py`** | Insert a call to `AgentAssistant.enrich_workflow()` before any workflow is executed. |
| **Tests** | Add unit‑test `test_deep_integration_flag.py` that asserts the flag is tru
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑09)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents must declare a contract. | Identify the 6 agents lacking contracts; add a `turn‑context` section (input schema, output schema, version). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` JSON against the schema. | Every PR affecting `turn‑context` must pass a gate. | Add the `verify-turn-context` GitHub Action to the remaining 10 repos. |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” checklist item. | 100 % compliance. | Update the shared PR template (`.github/PULL_REQUEST_TEMPLATE.md`) to make the checklist mandatory. |
| **Validation loops** | Average of 42 validation loops per turn‑context change (target 80‑100). | 80‑100 loops to ensure stability. | Introduce a “stress‑test” stage that runs 50 additional synthetic turn sequences before merge. |
| **Blocker reporting** | 3 blockers log
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow generation** | Agent Assistant already owns the *authoring contract* (input schema → concrete workflow). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on deterministic contracts. |
| **Validation** | Validation gates (type‑checking, schema compliance, deterministic‑state checks) are the first line of defense before a workflow is persisted. | Early failure detection reduces downstream debugging cycles and keeps the 80‑100 validation loop tight. |
| **Debugging** | Debug utilities (step‑by‑step execution traces, state diffs, deterministic replay) are tightly coupled to the generation output. Housing them together enables the Assistant t
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2026*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** (≈ 150 users) to author, test, and run reliability‑focused workflows with deterministic verification and built‑in blocker reporting.

Success criteria (KPIs):

| KPI | Target |
|-----|--------|
| **On‑boarded pilot teams** | 10 |
| **Workflows shipped to pilot** | ≥ 30 (average ≥ 3 per team) |
| **Deterministic verification pass rate** | ≥ 95 % on first gate |
| **Mean Time to Blocker Resolution** | ≤ 4 h |
| **User‑reported friction score** | ≤ 2/5 (Likert) |
| **Documentation coverage** | 100 % of new contracts & gates |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Ricky v2.0 core engine (stable)  <br>• New **Workflow Contract** DSL for reliability (Ricky‑Contract‑v1) <br>• Deterministic verification gate (VC‑Determinism) <br>• Blocker‑reporting UI component <br>• Pilot‑team onboarding playbook <br>• Automated 80‑loop validation harness (Ricky‑Loop‑80) | • Major engine
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package specification** that lets Ricky surface workflow‑reliability assistance inside Slack (or any web‑chat widget) while preserving the strict AGENTS.md contract model, deterministic verification gates, and the 80‑to‑100 validation‑loop cadence required for all AgentWorkforce components.

---

### 1. High‑Level Architecture  

| Component | Responsibility | Runtime | Key Interfaces |
|-----------|----------------|---------|----------------|
| **Ricky Core** | Workflow authoring, reliability checks, blocker reporting | Python 3.11 (AWS Lambda) | `ricky.core.*` |
| **Slack Bridge** | Event ingestion, message formatting, interactive UI (modals, buttons) | Node 20 (AWS Fargate) | Slack Events API, Interactivity API |
| **Web Bridge** | Same contract as Slack Bridge but uses WebSocket + REST for a custom web widget | Node 20 (AWS Fargate) | Socket.io, HTTP POST |
| **Agent‑Assistant SDK** | Shared client library for both bridges (type‑safe, OpenAPI‑generated) | TypeScript 5 | `assistClient.*` |
| **Verification Engine** | Deterministic gate runner, test harness, contract validat
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag?


1. Ricky’s execution path in local mode

Step Action Outcome
Parse arguments --mode local tells Ricky to operate against the local repository; --spec supplies the natural‑language request. Ricky loads the spec‑to‑workflow compiler.
Spec translation The spec string ("generate a workflow for package checks") is sent to the Spec‑Engine which produces a workflow contract (YAML/JSON) that follows the repository’s workflow/ schema. You receive a draft workflow file (e.g., wf-package-checks.yaml) placed in workflows/drafts/.
No execution Because --run is omitted, Ricky stops after generation. No validation, no CI jobs, no side‑effects on the codebase. The workflow is only written to disk; nothing is applied, merged, or executed.
Exit status Returns 0 on successful generation, non‑zero if the spec could not be parsed or the contract violates schema rules. You can check $? to confirm success.

2. What you ge

...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding flow that a new user should run the first time they invoke the `ricky` command‑line interface. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, honest blocker reporting).

---

## 1️⃣  Create a Scoped Onboarding Branch

```bash
git checkout -b onboarding/first‑run‑<your‑username>

All changes made by the onboarding script are confined to this branch, making review and rollback trivial.


2️⃣ Run the Interactive Bootstrap Script

ricky onboarding --init

What the script does (contract)

Step Action Deterministic verification gate Expected output
1️⃣ Detect OS & architecture ricky env --check returns OK Environment OK
2️⃣ Install required runtimes (Python 3.11+, Node 20) ricky runtime --install exits 0 Runtimes installed
3️⃣ Create a local config directory (~/.ricky) test -d ~/.ricky
...[truncated]

**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
1. **Detect** – As soon as the workflow runner attempts to load the spec (`workflow.yaml` or `spec.json`) and receives a *null* or *file‑not‑found* response, raise a **Blocker**.
2. **Report** – Emit a deterministic `BLOCKER` event with the payload:  

   ```json
   {
     "type": "MISSING_SPEC",
     "detail": "Spec file not found or empty at path: ./specs/<workflow‑id>.yaml",
     "action": "Create placeholder spec and request author input"
   }

This event is logged to the Workflow‑Gate dashboard and sent to the assigned reviewer (owner of the repo branch).


2. Automated Placeholder Generation (Deterministic Contract)

Step Action Output Artifact
2.1 Create a minimal stub in the same location with the required top‑level keys (name, description, steps). specs/<workflow‑id>.yaml
2.2 Populate stub with TODO markers and a unique spec‑id (UUID).
2.3 Add a # GENERATED BY RICKY header and a timestamp.
2.4 Open a draft PR titled `WIP: placeholder sp
...[truncated]

**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

```text
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Logger : RUNNING

🔧 Configuration
• Active Branches : main, workflow‑dev
• Last Deploy : 2026‑04‑28 14:12 UTC
• Feature Flags : all defaults

⚠️ Alerts & Blockers
• No successful health‑check for Google provider.
• No successful health‑check for GitHub provider.
• Dependent workflows that require either provider are currently blocked.

🗒️ Next Steps (recommended)

  1. Run ricky provider check google and ricky provider check github to trigger connectivity verification.
  2. Review provider credentials (OAuth tokens, API keys) in config/providers.yaml.
  3. If credentials are missing or expired, update them and re‑run the checks.
  4. Once both checks return CONNECTED, re‑enable any blocked workflows.

--- End of Report ---
```

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**GitHub Primitive‑Change Workflow**

| Phase | Step | Action | Owner | Artifact | Verification Gate |
|-------|------|--------|-------|----------|--------------------|
| **1️⃣ Planning** | 1.1 | Open a **Feature‑Branch Ticket** in the project board (e.g., `feat/issue‑#123‑update‑readme`). | Product Owner | Ticket (Jira/Linear) | Ticket must contain: title, description, acceptance criteria, and linked issue. |
| | 1.2 | Create a **branch** from `main` using the ticket ID (`git checkout -b feat/issue-123-update-readme`). | Developer | Branch name follows `feat/<ticket‑id>-<short‑desc>` pattern. | Branch name passes regex validation. |
| **2️⃣ Development** | 2.1 | Make the primitive change (e.g., edit `README.md`). | Developer | Modified file(s). | Lint/format checks (`pre‑commit`) must pass. |
| | 2.2 | Run **local tests** (`npm test` / `go test` etc.). | Developer | Test report. | All tests must pass (0 failures). |
| | 2.3 | Commit with **conventional commit** format (`git commit -m "feat: update README with new badge"`). | Developer | Commit SHA. | Commit message passes `commitlint` rules. |
| **3️⃣ Pull Request (PR) Creation** | 3.1 | Push branch to origin and open a PR targeti
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: `review‑then‑implement`**

| **Stage** | **Agent** | **Purpose** | **Inputs** | **Outputs** | **Verification Gate** |
|-----------|-----------|-------------|------------|-------------|-----------------------|
| 1️⃣ | **Claude** (LLM‑review) | Perform a technical and style review of the design spec. | - `spec.md` (markdown) <br> - `review‑criteria.yml` (see below) | - `review‑report.json` (structured findings) <br> - `approved‑spec.md` (spec with inline annotations) | **Gate‑R1**: All required criteria must be marked *PASS* in `review‑report.json`. If any *FAIL*, workflow halts and reports blocker. |
| 2️⃣ | **Codex** (code‑gen) | Generate implementation code based on the *approved* spec. | - `approved‑spec.md` <br> - `code‑template/` (optional scaffolding) | - `src/` (generated source tree) <br> - `generation‑log.txt` | **Gate‑C1**: Automated lint + unit‑test compilation must succeed (0 errors). If failures, Codex re‑runs up to 3 retries; after that, blocker is raised. |
| 3️⃣ | **CI‑Validate** (deterministic) | Run full test suite and static analysis on generated code. | - `src/` | - `test‑results.xml` <br> - `static‑analysis-report.json` | **Gate‑V1**: ≥ 90 % test cov
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write `tool-selection.json` or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is the **deterministic post‑processing flag** that takes a freshly generated workflow (JSON/YAML) and runs a second, focused LLM pass that:

1. **Tightens the natural‑language task description** for each step (removes ambiguity, adds concrete verbs, references inputs/outputs).  
2. **Adds or sharpens acceptance‑gate predicates** (boolean expressions, schema checks, or test‑script snippets) so every step can be automatically verified before the next step runs.  

Below is a **step‑by‑step contract** you can copy‑paste into your CI pipeline or local dev script.

---

### 1. Contract Overview  

| Field | Type | Required? | Meaning |
|-------|------|-----------|---------|
| `workflow_path` | `string` | ✅ | Path to the generated workflow file (JSON or YAML). |
| `refine_schema` | `object` | ✅ | JSON‑Schema that defines the *desired* shape of each step after refinement (see §2). |
| `output_path` | `string` | ✅ | Where the refined workflow will be written. |
| `max_iterations` | `integer` | ❌ (default = 1) | How many refinement passes to run (useful for 80‑100 % validation loops). |
| `dry_run` | `boolean`
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the `ricky --version` spec.

Ricky output

**Workflow: `ricky --version` Feature Implementation & Release**

| **Section** | **Details** |
|-------------|--------------|
| **Goal** | Add a `--version` flag to the `ricky` CLI that prints the current package version (semantic‑version string) and exits with status 0. |
| **Owner** | `@ricky-devlead` |
| **Stakeholders** | QA, Docs, Release Engineering, CI/CD |
| **Branching Model** | <ul><li>`feature/ricky-version` – feature branch created from `main`.</li><li>When ready, open PR `feat: add --version flag` targeting `main`.</li></ul> |
| **Contract (API)** | <ul><li>Command: `ricky --version` (or `ricky -v`).</li><li>Output: `<package-name> vX.Y.Z` (e.g., `ricky v1.3.2`).</li><li>Exit code: `0`.</li><li>No side‑effects (no file writes, no network calls).</li></ul> |
| **Acceptance Criteria** | 1. Running `ricky --version` prints exactly one line with the version string.<br>2. The version matches the value in `pyproject.toml` (or `package.json` for JS).<br>3. Flag is discoverable via `ricky --help` under “Options”.<br>4. Unit tests cover parsing, output, and exit code.<br>5. Integration test runs on CI across supported OSes (Linux, macOS, Windows). |
| **Implementation Steps** 
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through deterministic validation, automatic remediation, and controlled re‑execution until all pass or a hard‑stop blocker is raised.

---

## 1. High‑Level Diagram  

[ASCII diagram, box layout not recoverable. Nodes: Artifact Source (S3/Git, …), Queue (Batch), Worker Pool (N parallel), Dispatcher (Task Split), Validator (Determinised Rules). Edge labels: 1. Discover, 2. Pull, 4. Validate, 5. Fix (if needed), 6. Re‑run, 7. Report.]
...[truncated]


**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

```text
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the skill does | Direct effect on the generated workflow | Example of the resulting artifact |
|-------|---------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas (JSON Schema, OpenAPI, or TypedDict) for every step. | • Every node has a **contract block** that the engine validates before execution.<br>• Guarantees type‑safe hand‑offs, preventing silent data‑corruption. | ```yaml\nsteps:\n  - id: fetch_user\n    contract:\n      input: {user_id: string}\n      output: {name: string, email: string}\n``` |
| **Verification Gate Insertion** | Inserts automated checks (unit test, schema validation, policy lint) after each critical step. | • The workflow pauses at a **gate** until the check passes, turning runtime failures into compile‑time feedback.<br>• Enables “fail‑fast” behavior and clear error attribution. | ```yaml\n  - id: validate_user\n    type: gate\n    run: python -m jsonschema -i {{fetch_user.output}} schema/user.json\n``` |
| **Review Artifact G
...[truncated]
```

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Workflow Draft – Request for Clarification**

Below is a minimal “contract‑first” scaffold that follows our AgentWorkforce standards (deterministic verification gates, review artifacts, 80‑100 validation loops, scoped PR boundaries, etc.). Because the product specification you provided is vague, I’ve highlighted the sections that need concrete details before the workflow can be finalized and committed.

| **Section** | **What’s Needed** | **Placeholder / Example** |
|-------------|-------------------|---------------------------|
| **Workflow Name & ID** | A concise, unique name and identifier. | `wf‑<product‑name>-v1` |
| **Purpose / Goal** | One‑sentence description of the business outcome. | “Automate onboarding of new SaaS customers.” |
| **Inputs** | List of required input artifacts (e.g., JSON payload, file, API call). | `customerInfo: JSON` |
| **Outputs** | Expected deliverables after successful run. | `onboardedAccountId: string` |
| **Stages & Tasks** | For each stage, define: <br>• Task name <br>• Owner (human or service) <br>• Deterministic verification gate (e.g., schema validation, unit test, contract check) <br>• Success/Failure criteria <br>• Review artifact (e.g.,
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Action Plan**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣  | **Capture the failure context** – pull the run‑log, step‑metadata, and worker heartbeat for the timed‑out step and the “alive‑but‑idle” worker. | Provides a deterministic snapshot that can be reproduced and audited. | `failure_context.json` (log excerpts, timestamps, worker IDs, resource usage). |
| 2️⃣  | **Mark a hard blocker** in the workflow state machine. Set the run status to **BLOCKED** and attach the `failure_context.json`. | Guarantees downstream steps do not proceed on incomplete data and signals the coordination layer that human/automated intervention is required. | Updated workflow state entry (`status: BLOCKED`). |
| 3️⃣  | **Create a repair branch** off the latest stable commit of the workflow definition (e.g., `repair/timeout‑2023‑05‑09`). | Isolates the fix work, prevents race conditions with other PRs, and keeps the production branch clean. | Git branch `repair/timeout‑2023‑05‑09`. |
| 4️⃣  | **Generate a deterministic verification gate** that will run before any merge of the repair branch. The gate must: <br>• Replay the t
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
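
The classify-before-retry criteria above could look roughly like the sketch below. The class names come from the criteria; `StepEvidence`, `Classification`, and `classifyStep` are hypothetical, and the two rules shown cover only the scenario in the user message (one timed-out step, one live worker with no artifacts). Anything else stays `unknown` with its raw evidence attached, so no retry decision is made on a guess.

```typescript
// Failure classes named in the criteria, plus an explicit "unknown".
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape for one step; field names are assumptions.
interface StepEvidence {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  artifactsProduced: number;
  rawLogExcerpt: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  rawEvidence: StepEvidence; // kept verbatim for later review
  canDecideRetry: boolean; // false while the class is unknown
}

function classifyStep(evidence: StepEvidence): Classification {
  let failureClass: FailureClass = "unknown";
  if (evidence.timedOut) {
    failureClass = "timeout";
  } else if (evidence.workerAlive && evidence.artifactsProduced === 0) {
    // Worker is alive but silent: the runtime is opaque, not proven broken.
    failureClass = "agent-runtime-opacity";
  }
  return {
    stepId: evidence.stepId,
    failureClass,
    rawEvidence: evidence,
    canDecideRetry: failureClass !== "unknown",
  };
}
```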

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

Ricky output

**Workflow Contract – Pre‑Launch Stale‑State Guard**

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `guard‑stale‑state` |
| **Scope** | Local development runs of any `.agent‑workflow/` (including `ricky run`, `ricky test`, `ricky debug`). |
| **Inputs** | • Working directory (root of the repo) <br>• Optional config `ricky.yaml` key `staleGuard.enabled` (default = true). |
| **Outputs** | • `status: PASS | FAIL` <br>• `stalePaths: []` (list of stale directories detected) <br>• `message: string` (human‑readable report). |
| **Success Criteria** | `status == PASS`. All detected stale directories must be either empty or explicitly ignored via config. |
| **Failure Action** | Abort the launch and emit a **Blocker Report** (see “Verification Gate” below). |

---

### 1. Detection Logic

```python
# pseudo‑code executed as the first step of every local workflow
def detect_stale_state(root: Path) -> Tuple[bool, List[Path]]:
    stale_dirs = [".agent-relay", ".relay", ".trajectories"]
    found = []
    for d in stale_dirs:
        dir_path = root / d
        if dir_path.is_dir():
            # consider stale if it contains any file *other* than .gitkeep
   
...[truncated]
```

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Quick Fix for “Another Ricky/Relay Run Is Already Active”**

The marker you’re seeing is a lock file that Ricky creates to prevent overlapping executions. It means a previous run didn’t clean up properly (crashed, was cancelled, or left a stray process). Follow the deterministic clean‑up steps below; each step is a verified gate that you can run in isolation and confirm before moving to the next.

---

## 1️⃣ Verify the Current State

| Command | Expected Output |
|---------|-----------------|
| `ricky status` | Shows **ACTIVE** if a lock is present, otherwise **IDLE**. |
| `ps aux \| grep -i ricky` | Lists any lingering Ricky/Relay processes. |
| `cat .ricky/run.lock` *(if it exists)* | Shows the PID and timestamp of the lock holder. |

If `ricky status` reports **ACTIVE** **and** there is **no** matching PID in the process list, the lock is stale.

---

## 2️⃣ Safe Lock Removal (Deterministic Gate)

> **Goal:** Remove the stale lock *only* when you’re sure no legitimate run is in progress.

1. **Confirm no active PID**  
   ```bash
   LOCK_PID=$(cat .ricky/run.lock | cut -d' ' -f1)
   if ps -p $LOCK_PID > /dev/null; then
       echo "Process $LOCK_PID still running – aborting l
...[truncated]
   ```

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` CLI version | `aw --version` | `>=2.4.0` |
| Repository root (contains `.aw/` config) | `git rev-parse --show-toplevel` | Path printed |
| Docker (if workflow uses containers) | `docker version` | Client/Server version printed |
| Environment variables | `echo $AW_WORKSPACE` | Non‑empty path to workspace |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input contract (what you must provide)

| Variable | Description | Example |
|----------|-------------|---------|
| `FAILED_STEP` | Exact name of the step that failed in the first attempt | `build-image` |
| `PREV_RUN_ID` | Run identifier of the previous (failed) execution | `run-20260508-01` |
| `WORKFLOW_FILE`
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
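
A bounded repair/resume loop matching the criteria above might be shaped like this. It is a synchronous sketch under assumptions: `runWithAutoFix`, `AttemptResult`, `AttemptSummary`, and `ResumePoint` are hypothetical names, and the real runner and persona repair are presumably asynchronous. One tracking id is carried through every attempt, each attempt is summarized, and the loop never repairs after the last allowed attempt.

```typescript
// Hypothetical shapes; the real Ricky types are not shown in this report.
interface AttemptResult {
  runId: string; // id of this individual attempt
  ok: boolean;
  failedStep?: string;
}

interface ResumePoint {
  failedStep: string;
  previousRunId: string;
}

interface AttemptSummary {
  attempt: number;
  trackingRunId: string; // stays the same across all attempts
  runId: string;
  resumedFrom: string | undefined;
  ok: boolean;
}

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  runAttempt: (resume?: ResumePoint) => AttemptResult,
  repairArtifact: (failedStep: string) => boolean,
): { ok: boolean; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let resume: ResumePoint | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = runAttempt(resume);
    attempts.push({
      attempt,
      trackingRunId,
      runId: result.runId,
      resumedFrom: resume ? resume.failedStep : undefined,
      ok: result.ok,
    });
    if (result.ok) return { ok: true, attempts };
    // Budget exhausted: stop without spending another repair.
    if (attempt === maxAttempts) break;
    // Nothing to resume from, or the artifact could not be repaired.
    if (!result.failedStep || !repairArtifact(result.failedStep)) break;
    resume = { failedStep: result.failedStep, previousRunId: result.runId };
  }
  return { ok: false, attempts };
}
```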

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 200` | Recent error stack, failed validation IDs, and any “blocker” messages. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Full step‑by‑step trace (including input‑output payloads) – keep this run separate from the original “no‑auto‑fix” run. |
| 3️⃣  | Open the workflow contract file `workflows/foo.contract.ts` (or `foo.schema.json` if you use JSON schema). | Verify that **all required inputs** are declared, types match, and any **deterministic verification gates** (e.g., `gate: "hasValidAuthToken"`) are satisfied. |
| 4️⃣  | Run the contract validator directly: <br>`ricky validate contracts/foo.contract.ts --input ./sample-data/foo.input.json` | Should return **PASS** for every gate. If any gate fails, note the gate name and the failing payload. |
| 5️⃣  | Check the **review artifact** for the most recent PR that touched `foo.ts`. <br>`ricky pr view <last‑pr‑id> --artifacts` | Look for “blocked by” comments, lint failures, or missing test coverage. |
| 6️⃣  | Run the unit‑test suite for the workflow: <br>`r
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite `--no-auto-fix`.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Local Execution of a TypeScript Workflow Artifact**  
*(Primary local path = `./workflows` – the repository‑root default for all workflow source files.)*

---

## 1. Preconditions (contract “local‑exec‑pre‑check”)

| Item | Requirement | How Ricky verifies |
|------|-------------|--------------------|
| **Node runtime** | `node >=18` and `npm >=9` installed | `node -v` / `npm -v`; abort with *BLOCKER* if missing |
| **TypeScript compiler** | `tsc` available (via project dev‑dependency) | `npm ls typescript` → version check |
| **Workflow manifest** | `workflow.json` present in the same directory as the `.ts` file | `fs.existsSync('workflow.json')` |
| **Dependency lock** | `package-lock.json` up‑to‑date | `npm ci` succeeds; otherwise run `npm ci` automatically |
| **Clean build output** | No stale `dist/` artifacts | If `dist/` exists, run `npm run clean` (script must be defined) |
| **Execution entry point** | Exported async function named `run` (or as defined in `workflow.json.entry`) | Static analysis (`ts-node -e "import('./src/foo.ts').then(m=>console.log(typeof m.run))"` ) |

If any check fails, Ricky emits a **BLOCKER** report with a concise remediation guide and h
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
  • Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because `agent-relay` is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
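
The precheck and evidence criteria above could be sketched as a pure planning step. This assumes the "strip-types route" means Node's `--experimental-strip-types` flag; `planLocalRun`, `SpawnPlan`, and the injected `canResolve` callback are hypothetical names used only for illustration. The plan carries the exact command line so it can be recorded in execution evidence, and the `agent-relay` binary is never consulted.

```typescript
// Hypothetical shapes for illustration; not Ricky's actual runner API.
interface SpawnPlan {
  command: string;
  args: string[];
  evidenceLine: string; // the exact command to record in evidence
}

type LocalRunPrecheck =
  | { ok: true; plan: SpawnPlan }
  | { ok: false; blocker: string };

function planLocalRun(
  workflowPath: string,
  checks: {
    nodeAvailable: boolean;
    // Injected resolver, e.g. a wrapper around module resolution.
    canResolve: (specifier: string) => boolean;
  },
  nodePath: string = "node",
): LocalRunPrecheck {
  if (!checks.nodeAvailable) {
    return { ok: false, blocker: "Node is not available for the local run" };
  }
  if (!checks.canResolve("@agent-relay/sdk")) {
    return { ok: false, blocker: "@agent-relay/sdk is not resolvable for this workflow" };
  }
  // Assumption: the strip-types route maps to this Node flag.
  const args = ["--experimental-strip-types", workflowPath];
  return {
    ok: true,
    plan: { command: nodePath, args, evidenceLine: [nodePath, ...args].join(" ") },
  };
}
```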

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for a Persistent Structural Failure**

When a structural failure remains after an initial fix attempt, Ricky must treat the situation as a **hard blocker** and transition to a controlled “boundary” state. The following contract‑driven workflow guarantees deterministic hand‑off, traceability, and rapid recovery.

---

## 1. Immediate Blocker Declaration (Deterministic Gate)

| Step | Action | Artifact | Owner |
|------|--------|----------|-------|
| 1.1 | **Raise a Blocker Issue** in the `#workflow‑issues` repo with the label `blocker‑structural‑failure`. | `ISSUE-<NNN>` – includes: <br>• Failure ID (hash) <br>• Original fix PR reference <br>• Re‑run logs showing unchanged error <br>• Impact assessment (services, SLA) | Ricky (automated) |
| 1.2 | **Add a “Boundary” tag** to the issue to signal that normal repair loops are suspended. | Tag `boundary‑active` | Ricky |
| 1.3 | **Notify** the designated Incident Lead (via Slack `#incidents`) and the Architecture Review Board (ARB). | Slack message with link to ISSUE‑NNN | Ricky |

*Gate condition*: No further automated repair attempts are launched until the blocker is cleared.

---

## 2. Scoped Iso
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Template & Example**

Below is a ready‑to‑use digest format that aggregates the health of multiple Ricky workflow runs. It follows the **Ricky repository conventions** (AGENTS.md), includes **deterministic verification gates**, **review artifacts**, and highlights any **blockers**. Use the template for any set of runs; replace the placeholder values with the actual data from your CI/CD system (GitHub Actions, GitLab CI, Jenkins, etc.).

---

## 1. Digest Header
| Field | Value |
|-------|-------|
| **Report Date** | `YYYY‑MM‑DD HH:MM UTC` |
| **Generated By** | `ricky‑health‑digest` (CLI vX.Y.Z) |
| **Scope** | `workflow‑run‑ids: [1234, 1235, 1236, …]` |
| **Environment** | `prod / staging / dev` |
| **Total Runs Analyzed** | `N` |
| **Overall Status** | `✅ Healthy` / `⚠️ Degraded` / `❌ Critical` |

---

## 2. Summary Statistics
| Metric | Value | Target / SLA |
|--------|-------|--------------|
| **Successful Runs** | `S / N` | ≥ 95 % |
| **Failed Runs** | `F / N` | ≤ 5 % |
| **Runs with Blockers** | `B / N` | 0 |
| **Average Duration** | `⌀ T minutes` | ≤ 30 min |
| **Verification Gate Pass Rate** | `V / N` | 100 % |
| **Review Artifact Coverage**
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
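
One way to read the analytics criteria above is as a read-only aggregation over normalized evidence. In this sketch only the name `WorkflowRunEvidence` comes from the criteria; its fields, `RecurringFailure`, and `recurringFailures` are assumptions. Failures are grouped per workflow, step, and class, so environment failures and workflow-logic failures stay in separate buckets and each finding points at a specific step.

```typescript
// Field names are assumptions; only the type name appears in the criteria.
interface WorkflowRunEvidence {
  workflowId: string;
  runId: string;
  durationMs: number;
  failedStep?: string;
  failureClass?: string;
}

interface RecurringFailure {
  workflowId: string;
  failedStep: string;
  failureClass: string;
  count: number;
}

// Reads the evidence without mutating it and returns failures seen at
// least `minCount` times, most frequent first.
function recurringFailures(
  evidence: readonly WorkflowRunEvidence[],
  minCount: number = 2,
): RecurringFailure[] {
  const byKey: Record<string, RecurringFailure> = {};
  for (const run of evidence) {
    if (!run.failedStep || !run.failureClass) continue; // successful run
    const key = JSON.stringify([run.workflowId, run.failedStep, run.failureClass]);
    const entry = byKey[key] ?? {
      workflowId: run.workflowId,
      failedStep: run.failedStep,
      failureClass: run.failureClass,
      count: 0,
    };
    entry.count += 1;
    byKey[key] = entry;
  }
  return Object.keys(byKey)
    .map((key) => byKey[key])
    .filter((failure) => failure.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```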

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**  
*Version 0.1 – Design Draft*  

---

## 1. Overview
The **Workflow Debugger** Slack surface gives agents a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It is built as a **Slack App** that talks to the Ricky Core via the existing **Ricky‑API** (REST + WebSocket) and surfaces a **state‑machine view** of any workflow instance.

| Feature | Description |
|---------|-------------|
| **Live Instance Selector** | `/ricky debug list` → interactive dropdown of active workflow IDs (filtered by user’s org/team). |
| **State‑Tree Viewer** | Rendered as a Slack Block Kit “section + context” hierarchy showing current node, pending actions, and completed steps. |
| **Step‑Through Controls** | Buttons: **Pause**, **Resume**, **Step‑Forward**, **Step‑Back**, **Jump‑to‑Node**. |
| **Inline Patch** | Text‑area modal to edit a node’s input payload; submit triggers a **deterministic verification gate** before applying. |
| **Error‑Overlay** | If a node fails, a red banner appears with the exception stack, logs, and a **“Create Fix PR”** shortcut. |
| **Audit Trail**
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Field | Type | Description |
|-------|------|-------------|
| **type** | `string` (enum) | Fixed value `"web_submit_spec"` – identifies this hand‑off. |
| **payload** | `object` | Container for the data the UI sends to the workflow engine. |
| **payload.spec** | `object` | Full **WorkflowSpec** JSON (see `specs/workflow/v1.json`). |
| **payload.mode** | `string` (enum) | Execution mode – `"dry-run"` | `"run"` | `"preview"` |
| **payload.metadata** *(optional)* | `object` | Arbitrary key‑value pairs for tracing (e.g., `requestId`, `userId`). |

---

## 1. Contract Definition (YAML)

```yaml
# contracts/handoffs/web_submit_spec.yaml
type: object
required:
  - type
  - payload
properties:
  type:
    const: web_submit_spec
  payload:
    type: object
    required:
      - spec
      - mode
    properties:
      spec:
        $ref: "../specs/workflow/v1.json"
      mode:
        type: string
        enum: [dry-run, run, preview]
      metadata:
        type: object
        additionalProperties: true
additionalProperties: false
```

The contract lives under `contracts/handoffs/` and is version‑controlled with the rest of the repo.


2

...[truncated]


**Must**

- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.

**Must Not**

- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
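
A rough sketch of the translation boundary these criteria describe, offered as an illustration rather than Ricky's actual code. Only the type names `LocalInvocationRequest` and `CloudGenerateRequest` come from the rubric above. Every field, the `WebHandoff` shape, the `local`/`cloud` mode values, and `normalizeWebHandoff` are assumptions.

```typescript
// Hypothetical shapes: only the two request type names appear in the rubric.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}

interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}

type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

/** What the web surface hands over after it has checked auth/session itself. */
interface WebHandoff {
  surface: "web";
  sessionValidated: boolean;
  spec: string;
  mode: "local" | "cloud"; // assumed mode values, for illustration only
}

/**
 * Pure translation: no executor calls and no routing decisions beyond
 * mapping the submitted mode onto the shared request types.
 */
export function normalizeWebHandoff(handoff: WebHandoff): NormalizedRequest {
  if (!handoff.sessionValidated) {
    throw new Error("web surface must validate the session before normalization");
  }
  const spec = handoff.spec.trim();
  if (spec.length === 0) {
    throw new Error("spec must not be empty");
  }
  // The result carries no `surface` field, so downstream stages
  // cannot branch on where the request came from.
  return handoff.mode === "cloud" ? { kind: "cloud", spec } : { kind: "local", spec };
}
```

Because the output type is the same union every other surface produces, executors need no web-specific code path.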

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

**REVIEW** `surfaces-ingress.mcp-claude-context-is-metadata` (surfaces-ingress/manual)

**User message**

> A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

**Ricky output**

```text
**Decision:** Generate a new workflow draft.

**Rationale**

1. **Input Type** – You supplied a *workflow specification* together with *conversation context* but no existing workflow artifact to debug, restart, or analyze. The spec describes the intended steps, inputs, outputs, and verification gates, which is the prerequisite for authoring a fresh workflow.

2. **Action Mapping**  
   - **Generate** – Create a workflow from a specification.  
   - **Debug** – Requires an existing workflow with failing verification.  
   - **Restart** – Requires a halted execution instance.  
   - **Analyze** – Requires execution logs or performance data.  
   - **Coordinate** – Requires multiple dependent workflows or external agents needing orchestration.

3. **Current Need** – The spec is the source material; the appropriate next step is to author a concrete workflow contract that includes:
   - Deterministic verification gates (pre‑, post‑, and invariant checks).  
   - Scoped branch/PR boundaries (feature branch, PR title, reviewers).  
   - A validation loop count (target 80–100 iterations) for automated testing.  
   - Blocker reporting template (if any required resources are missing).

**Ne
...[truncated]
```

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`BAD_PROMPT`** (same numeric value).  
- All changes are limited to the **`cloud-ricky`** service repository; no cross‑service impact.

---

## 1. Contract Change (API Specification)

| Endpoint | Method | Success Payload (200) | New Field | Error Mapping |
|----------|--------|-----------------------|-----------|---------------|
| `/v1/generate` | POST | `{ "output": string, "metadata": object, "requestId": string }` | **`requestId`** – UUID v4, globally unique per request. | `BAD_PROMPT` (formerly `INVALID_PROMPT`) – 400, `{ "code":"BAD_PROMPT","message":"Prompt validation failed." }` |

*Version bump*: **v1.2.0** (semantic minor bump – backward‑compatible addition).

The OpenAPI fragment to be merged:

```yaml
paths:
  /v1/generate:
    post:
      summary: Generate content
      operationId: generateContent
      requestBody:
        $ref: '#/components/requestBodies/GenerateRequest'
      responses:
        '200':
          description: Successful generation
          content:
            application/js
...[truncated]
```

Must

  • Allow additive response fields within /api/v1/ricky/....
  • Reject renaming or c

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
