fix(local): inline Ricky-local personas into dist bundle #107

Merged
kjgbot merged 1 commit into main from fix/bundle-personas-into-dist
May 15, 2026
@kjgbot kjgbot commented May 15, 2026


Summary

Make npm install -g @agentworkforce/ricky self-contained for the writer step, regardless of how the published tarball is assembled. Persona JSONs under personas/ are now inlined into dist/ricky.js at bundle time via esbuild's JSON loader, with the filesystem retained as a developer-time override.

The bug this fixes

Between v0.1.38 and v0.1.51, package.json's files array did not include "personas", so the published tarballs for those versions shipped without the directory. As a result, rickyLocalPersonaDir() walked up from dist/ricky.js without ever hitting the personas/agent-relay-workflow.json sentinel, and the resolver fell through to a package fallback that doesn't carry the agent-relay-workflow persona.
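To make the failure mode concrete, here is a minimal sketch of a sentinel walk of the kind described above. The helper name `findDirWithSentinel` and the temp-directory layouts are hypothetical; the real rickyLocalPersonaDir() is not shown in this PR and may differ in what it returns and where it stops.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper (not the real rickyLocalPersonaDir): walk up from
// `startDir` until a directory containing the relative path `sentinel` is
// found. Returns undefined once the filesystem root is reached.
function findDirWithSentinel(startDir: string, sentinel: string): string | undefined {
  let current = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(current, sentinel))) {
      return current;
    }
    const parent = path.dirname(current);
    if (parent === current) {
      return undefined; // reached the root without finding the sentinel
    }
    current = parent;
  }
}

// Simulate two install layouts inside a throwaway temp directory.
const layoutRoot = fs.mkdtempSync(path.join(os.tmpdir(), "ricky-layout-"));
const sentinelPath = path.join("personas", "agent-relay-workflow.json");

// Layout A: the tarball shipped personas/ next to dist/.
const withPersonas = path.join(layoutRoot, "with-personas");
fs.mkdirSync(path.join(withPersonas, "dist"), { recursive: true });
fs.mkdirSync(path.join(withPersonas, "personas"), { recursive: true });
fs.writeFileSync(path.join(withPersonas, sentinelPath), "{}");

// Layout B: personas/ was left out of the tarball.
const withoutPersonas = path.join(layoutRoot, "without-personas");
fs.mkdirSync(path.join(withoutPersonas, "dist"), { recursive: true });

const foundWithPersonas = findDirWithSentinel(path.join(withPersonas, "dist"), sentinelPath);
const foundWithoutPersonas = findDirWithSentinel(path.join(withoutPersonas, "dist"), sentinelPath);

console.log({ foundWithPersonas, foundWithoutPersonas });
```

In layout B the walk runs all the way to the filesystem root and comes back empty, which is the point at which a resolver has to fall back to something else.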

The symptom on a v0.1.38 install (caught yesterday by a user trying to author a multi-PR workflow):

$ ricky --mode local --spec-file specs/foo.md --run --workforce-persona --best-judgement --yes --no-auto-fix
Generation: failed (status: error).
Workflow target: workflows/generated/...ts
Artifact written: no
Reason: Workforce persona writer did not complete: failed.

With auto-fix enabled, the user instead saw a "success", but the artifact was a 3-step dummy stub (prepare-context, runtime-precheck: true, final-signoff: echo placeholder) because auto-fix's recovery strategy replaces the (never-written) workflow with the smallest thing that passes the builder's structural validator. The real failure was upstream: no persona to drive the writer.

files has since been corrected (v0.1.51+ ships personas/), but the runtime path is still brittle: a single mis-merge in package.json would silently regress global installs again, and the symptom shows up only at writer time, deep in a wrapped WorkforcePersonaWriterError with no surface-level hint that the install layout is the cause.

The fix

  • src/product/generation/bundled-personas.ts — statically imports each personas/*.json so esbuild's JSON loader inlines them into dist/ricky.js. The imports are typed as RickyLocalPersonaSpec so the structural contract is enforced at compile time, not just at runtime.
  • loadAllRickyLocalPersonas(dir) — now seeds the cache from BUNDLED_RICKY_LOCAL_PERSONAS first, then layers any personas/*.json read from dir on top so a developer-edited file overrides the bundled default for the same intent. The bundle is the source of truth at runtime; the filesystem is a developer-time override mechanism.
  • bundled-personas.test.ts — contract test that enforces parity between the personas/ directory and the bundle. Adding a new persona JSON without a matching import will fail CI rather than silently regress on the next release.
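The bundle-first, filesystem-override layering can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `PersonaSpec` is a cut-down stand-in for RickyLocalPersonaSpec, `layerPersonas` is a hypothetical name, the intent strings are placeholders, and the inline literals stand in for the static JSON imports that esbuild would inline.

```typescript
// Cut-down stand-in for RickyLocalPersonaSpec (assumption: the real type
// carries more fields; only `intent` drives the layering shown here).
interface PersonaSpec {
  id: string;
  intent: string;
  description?: string;
}

// Stand-ins for the statically imported personas/*.json files. The intent
// values are placeholders chosen for this sketch.
const BUNDLED_SKETCH: readonly PersonaSpec[] = [
  { id: "agent-relay-workflow", intent: "agent-relay-workflow", description: "bundled default" },
  { id: "agent-relay-workflow-review", intent: "agent-relay-workflow-review", description: "bundled default" },
];

// Hypothetical helper: seed from the bundle, then let any persona read from
// disk overwrite the bundled entry with the same intent.
function layerPersonas(
  bundled: readonly PersonaSpec[],
  fromDisk: readonly PersonaSpec[],
): Map<string, PersonaSpec> {
  const byIntent = new Map<string, PersonaSpec>();
  for (const spec of bundled) {
    byIntent.set(spec.intent, spec);
  }
  for (const spec of fromDisk) {
    byIntent.set(spec.intent, spec); // developer-time override wins
  }
  return byIntent;
}

// Global install with no personas/ directory: the bundle alone is enough.
const bundleOnly = layerPersonas(BUNDLED_SKETCH, []);

// Developer checkout with one edited persona file on disk.
const withDevOverride = layerPersonas(BUNDLED_SKETCH, [
  { id: "agent-relay-workflow", intent: "agent-relay-workflow", description: "developer edit" },
]);

console.log(bundleOnly.size, withDevOverride.get("agent-relay-workflow")?.description);
```

Because later `set` calls replace earlier ones for the same key, the order of the two loops is what encodes "bundle is the default, filesystem is the override".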

Verification

$ npm run typecheck   # clean for these files (pre-existing scheduled-agent.ts errors unrelated)
$ npx vitest run src/product/generation
 Test Files  7 passed (7)
      Tests  129 passed (129)

$ npm run bundle
$ grep -c "agent_relay_workflow_default = {" dist/ricky.js
1
$ grep -c "agent_relay_workflow_review_default = {" dist/ricky.js
1

The persona bodies are inlined at ~line 7816 / 7860 of the post-build bundle.

Test plan

  • Reviewer pulls this branch, runs npm run bundle, and confirms dist/ricky.js contains both persona bodies inline.
  • Reviewer temporarily removes personas/ from the working tree, runs the bundle, and confirms ricky --mode local --spec-file <any-spec>.md --workforce-persona still resolves an agent-relay-workflow persona (the bundled fallback).
  • CI confirms bundled-personas.test.ts passes and would fail if a new personas/*.json were added without updating bundled-personas.ts.
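The parity condition in the last bullet can be expressed as a small set comparison. The sketch below is an assumption about the shape of such a check, not the contents of bundled-personas.test.ts; `checkPersonaParity` and the example ids are hypothetical.

```typescript
interface ParityReport {
  missingFromBundle: string[]; // on disk, but no matching static import
  missingFromDisk: string[]; // bundled, but no personas/*.json file
}

// Hypothetical helper: compare persona ids found on disk with ids exported
// by the bundle module.
function checkPersonaParity(
  diskIds: readonly string[],
  bundledIds: readonly string[],
): ParityReport {
  const disk = new Set(diskIds);
  const bundled = new Set(bundledIds);
  return {
    missingFromBundle: diskIds.filter((id) => !bundled.has(id)).sort(),
    missingFromDisk: bundledIds.filter((id) => !disk.has(id)).sort(),
  };
}

// A new persona file was added without a matching import.
const parityDrift = checkPersonaParity(
  ["agent-relay-workflow", "agent-relay-workflow-review", "new-persona"],
  ["agent-relay-workflow", "agent-relay-workflow-review"],
);

// Directory and bundle agree.
const parityClean = checkPersonaParity(
  ["agent-relay-workflow", "agent-relay-workflow-review"],
  ["agent-relay-workflow", "agent-relay-workflow-review"],
);

console.log(parityDrift, parityClean);
```

A contract test would then assert that both arrays are empty, so the drift case fails CI instead of reaching a release.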

🤖 Generated with Claude Code

Between v0.1.38 and v0.1.51, `package.json`'s `files` array did not
include `"personas"`, so the published tarball shipped without the
directory. Global installs of those versions silently failed at the
writer step because `rickyLocalPersonaDir()` walked up from
`dist/ricky.js` without ever finding the sentinel file and the resolver
fell through to a package fallback that doesn't carry the
`agent-relay-workflow` persona. This was noticed by a user on v0.1.38
trying to author workflows; the symptom was an immediate "Workforce
persona writer did not complete: failed." with no debug artifacts.

This commit makes the install bulletproof against any future drift in
the `files` array (or install layouts that don't match the filesystem
walk) by inlining every `personas/*.json` directly into `dist/ricky.js`
at bundle time via esbuild's JSON loader:

- `bundled-personas.ts` statically imports each persona JSON. The
  imports are typed as `RickyLocalPersonaSpec` so the structural
  contract is enforced at compile time.
- `loadAllRickyLocalPersonas(dir)` now seeds the cache from the bundled
  set first, then layers any `personas/*.json` read from `dir` on top
  so a developer-edited file overrides the bundled default for the
  same intent. This preserves the dev-time override workflow while
  making the runtime path resilient.
- `bundled-personas.test.ts` enforces parity between the personas
  directory and the bundle: adding a new persona JSON without a
  matching import will fail CI rather than silently regress on the
  next release.

Verified bundle: `grep "agent_relay_workflow_default = {" dist/ricky.js`
shows the persona body inlined at byte 0x6e... post-build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 15, 2026


📝 Walkthrough

Three files implement bundled persona loading: a new module exports personas from JSON, a resolver is updated to seed its cache from the bundle before applying filesystem overrides, and tests validate bundle structure and disk alignment.

Changes

Bundled Personas Feature

| Layer / File(s) | Summary |
|-----------------|---------|
| Bundled Personas Definition and Validation (src/product/generation/bundled-personas.ts, src/product/generation/bundled-personas.test.ts) | New module statically imports two persona JSON files and exports BUNDLED_RICKY_LOCAL_PERSONAS. Test suite validates each persona has required fields (id, intent, tiers with best, best-value, minimum), confirms bundled IDs match personas/*.json on disk, and asserts the bundle is non-empty. |
| Resolver Cache Seeding from Bundle (src/product/generation/ricky-local-persona-resolver.ts) | loadAllRickyLocalPersonas() now imports bundled personas, iterates through validated bundle entries to seed the byIntent cache, then reads filesystem personas/*.json files to apply developer overrides (which overwrite matching bundle entries). Updated documentation reflects bundle-first sourcing and override semantics. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

📦 A bundle born from JSON dreams,
Personas packed in modular beams,
The resolver loads them, then takes a bow,
Overrides dance in—the filesystem takes the vow! 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately and concisely describes the main change: inlining Ricky-local personas into the dist bundle to ensure self-contained global installations. |
| Description check | ✅ Passed | The description comprehensively explains the bug, the fix, verification steps, and test plan, all directly related to the changeset's objective of bundling personas for self-contained installations. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/ricky-local-persona-resolver.ts`:
- Around line 123-126: The loop over BUNDLED_RICKY_LOCAL_PERSONAS currently
ignores invalid entries; instead fail fast: inside the for (const spec of
BUNDLED_RICKY_LOCAL_PERSONAS) loop check isRickyLocalPersonaSpec(spec) and if it
returns false throw a descriptive Error (or assert) that includes identifying
info (e.g., the spec or its index) so the code does not silently skip invalid
bundled personas; keep the valid branch using byIntent.set(spec.intent, spec).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 8440f8db-c034-4bc1-8d12-1a43f5b80bca

📥 Commits

Reviewing files that changed from the base of the PR and between b0e3ff6 and 83d9be4.

📒 Files selected for processing (3)
  • src/product/generation/bundled-personas.test.ts
  • src/product/generation/bundled-personas.ts
  • src/product/generation/ricky-local-persona-resolver.ts

Comment on lines +123 to +126
  for (const spec of BUNDLED_RICKY_LOCAL_PERSONAS) {
    if (isRickyLocalPersonaSpec(spec)) {
      byIntent.set(spec.intent, spec);
    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when a bundled persona is invalid.

Line 124 silently skips invalid bundled specs. Since the bundle is the runtime source of truth, this can reintroduce missing-persona failures without a hard signal.

Proposed fix
-  for (const spec of BUNDLED_RICKY_LOCAL_PERSONAS) {
-    if (isRickyLocalPersonaSpec(spec)) {
-      byIntent.set(spec.intent, spec);
-    }
-  }
+  for (const [index, spec] of BUNDLED_RICKY_LOCAL_PERSONAS.entries()) {
+    if (!isRickyLocalPersonaSpec(spec)) {
+      throw new Error(`Invalid bundled Ricky-local persona at index ${index}`);
+    }
+    byIntent.set(spec.intent, spec);
+  }
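
For context, a self-contained sketch of the fail-fast behaviour proposed above. The guard `isPersonaSpecSketch` is an assumption: the real isRickyLocalPersonaSpec is not shown in this review, so the checks below only mirror the fields named in the walkthrough (id, intent, and tiers with best, best-value, minimum). `seedFromBundle` is likewise a hypothetical name.

```typescript
// Assumed shape, based only on the fields named in the review walkthrough.
interface PersonaSpecSketch {
  id: string;
  intent: string;
  tiers: { best: unknown; "best-value": unknown; minimum: unknown };
}

// Hypothetical stand-in for isRickyLocalPersonaSpec.
function isPersonaSpecSketch(value: unknown): value is PersonaSpecSketch {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  if (typeof candidate["id"] !== "string" || typeof candidate["intent"] !== "string") {
    return false;
  }
  const tiers = candidate["tiers"];
  if (typeof tiers !== "object" || tiers === null) return false;
  const tierMap = tiers as Record<string, unknown>;
  return "best" in tierMap && "best-value" in tierMap && "minimum" in tierMap;
}

// Seed the cache from the bundle, throwing on the first invalid entry
// instead of silently skipping it.
function seedFromBundle(bundle: readonly unknown[]): Map<string, PersonaSpecSketch> {
  const byIntent = new Map<string, PersonaSpecSketch>();
  bundle.forEach((spec, index) => {
    if (!isPersonaSpecSketch(spec)) {
      throw new Error(`Invalid bundled Ricky-local persona at index ${index}`);
    }
    byIntent.set(spec.intent, spec);
  });
  return byIntent;
}

const validPersona = {
  id: "agent-relay-workflow",
  intent: "agent-relay-workflow", // placeholder intent for this sketch
  tiers: { best: {}, "best-value": {}, minimum: {} },
};

const seeded = seedFromBundle([validPersona]);

let seedError = "";
try {
  seedFromBundle([validPersona, { id: "missing-intent-and-tiers" }]);
} catch (error) {
  seedError = error instanceof Error ? error.message : String(error);
}

console.log(seeded.size, seedError);
```

The trade-off is that one malformed bundled persona now stops the loader outright, which is the hard signal the review asks for.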

@github-actions


Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T12-36-46-836Z-HEAD-provider
Mode: provider
Git SHA: 1fe6372

Passed: 11 | Needs human: 43 | Reviewable: 43 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope of the change**  
- Extend the **Ricky** repository to surface that it now *leverages the Agent Assistant (AA) platform for deeper orchestration, validation, and self‑healing*.
- Update the public documentation, the `README.md`, the `AGENTS.md` entry for Ricky, and the internal `ricky/config.py` constant that advertises its dependencies.
- Add a deterministic verification gate that checks the AA client version at startup and fails fast if the required minimum version is not met.
- Provide a minimal test suite (2 × unit tests) that validates the new gate and the updated doc strings.

**Branch / PR plan**  

| Branch | Purpose |
|--------|---------|
| `feature/aa‑integration‑v2` | Author the changes listed below. |
| `test/aa‑gate‑validation` | Add unit tests for the new verification gate. |

When ready, open **PR #42** (title: *“Ricky now uses Agent Assistant more deeply – docs, config, and startup gate”*) targeting `main`. The PR will contain the following commits, each with a clear atomic purpose:

1. **docs‑update‑readme** – add AA integration paragraph and badge.  
2. **docs‑update‑agents‑md** – revise Ricky entry in `AGENTS.md`.  
3. **code‑add‑aa‑gate** – introduce `en
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*Prepared 2026‑05‑15 – internal audit snapshot (last 30 days)*  

| Metric | Current State | Target (Ricky Spec) | Gap / Action |
|--------|---------------|---------------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents expose a `turn‑context` contract (67 %). | 100 % of agents that maintain state across turns must declare the contract. | Identify the 6 agents lacking the contract; add `@agent-assistant/turn-context` to their `workflow.yml`. |
| **Deterministic verification gates** | 8 agents have automated gate tests that assert `turn‑context` payload shape; 4 rely on manual QA. | 100 % gate coverage with schema validation (`jsonschema` v2020‑12). | Implement missing gate tests; see “Gate‑Implementation Checklist” below. |
| **Review artifacts** | PRs for 14 agents include `turn‑context` design notes; 4 PRs omitted them. | Every PR that touches stateful logic must attach a **Turn‑Context Impact Matrix**. | Add the matrix template to the PR checklist (see “PR Checklist Addendum”). |
| **Validation loops** | Average of 42 validation loops per agent (target 80‑100). | 80‑100 loops to ensure edge
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation: Move all four capabilities into the Agent Assistant**

| Capability | Why it belongs in the Agent Assistant | Impact on reliability & coordination |
|------------|----------------------------------------|--------------------------------------|
| **Workflow generation** | The Assistant already owns the *authoring contract* (template → concrete workflow). Centralising generation avoids duplicated template logic across CLI tools. | Guarantees a single source‑of‑truth for workflow shape; downstream validators can rely on a deterministic output format. |
| **Workflow validation** | Validation is the deterministic gate that the Assistant enforces before a workflow is persisted or executed. Embedding it here lets the Assistant return structured pass/fail results and detailed evidence. | Enables the “80‑to‑100 validation loops” pattern: the Assistant can automatically re‑run validation after each edit until the gate passes. |
| **Debugging (interactive “what‑if” runs)** | Debugging requires the same runtime context (environment, secrets, sandbox) that the Assistant already provisions for validation. Keeping it inside the Assistant lets it reuse the same sandbox and produc
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 months, 2 pilot squads*  

---

## 1. Slice Objective
Deliver a deterministic, review‑ready version of the **Ricky** AgentWorkforce assistant that can be adopted by CS teams to:

1. **Automate workflow authoring** (create, version, and merge AGENTS.md‑compliant pipelines).  
2. **Enforce reliability gates** (contract validation, 80‑100 % loop coverage, blocker reporting).  
3. **Provide a self‑service “adopt‑Ricky” UI** (branch/PR wizard, verification dashboard).

Success is measured by **≥ 85 % of pilot tickets closed without manual reviewer intervention** and **zero production‑blocking regressions** during the slice.

---

## 2. Scope & Boundaries

| Area | In‑Scope | Out‑of‑Scope |
|------|----------|--------------|
| **Workflow authoring** | AGENTS.md‑compliant YAML pipelines, deterministic contract generation, PR‑template scaffolding. | Non‑YAML legacy scripts. |
| **Verification gates** | Contract‑based unit tests, 80‑100 % loop validation, deterministic “gate‑pass” badge. | Performance benchmarking beyond CI latency. |
| **Review artifacts** | Auto‑generated review check
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack / Web)

Below is a **concrete, contract‑driven design** for adding a “Ricky Support Assistant” that can be invoked from Slack or a web portal.  The design follows the **AgentWorkforce workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops, blocker reporting) and is scoped so that each change lives in its own branch/PR.

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack | Interaction |
|-----------|----------------|------------|-------------|
| **Ricky‑Core** | Existing reliability, coordination, authoring engine (unchanged). | Python 3.11, FastAPI, PostgreSQL | Exposes internal RPC (`/ricky/*`). |
| **Ricky‑API** | Thin façade that translates Slack/Web requests into Ricky‑Core RPC calls. | FastAPI, Pydantic, OpenAPI 3.1 | Auth → Validation → Core RPC → Response. |
| **Slack‑Adapter** | Slack‑App (Bolt) that forwards slash‑commands / message actions to Ricky‑API. | Node 20, Bolt‑JS, OAuth2 | Event → HTTPS → Ricky‑API. |
| **Web‑Portal** | React SPA for agents to chat with Ricky, view logs, and submit “blocker” tickets. | React 18, Vite, TypeScript, TailwindCSS | UI → REST/WS → Ricky‑API
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag?


1. Ricky’s “dry‑run” mode

When --run is omitted, Ricky operates in spec‑only (dry‑run) mode:

| Step | Action | Output |
|------|--------|--------|
| Parse | The CLI parses the --spec string and resolves any embedded placeholders (e.g., ${repo}) against the local context. | A Workflow Specification Object (JSON/YAML) is built. |
| Validate | The spec is run through the Deterministic Verification Gate: • Schema validation (AGENTS schema v2) • Contract compliance (inputs, outputs, required resources) • Linting of step names, IDs, and dependencies. | If any rule fails, Ricky prints a blocker report and exits with a non‑zero status. |
| Render | The validated spec is rendered into the canonical workflow definition (e.g., workflow.yaml). | The file is written to ./ricky/workflows/ and its path is echoed to stdout. |
| Preview | A concise review artifact is displayed: • Workflow name, description, and version • List of stages, their inputs/outputs, and any requires edges <
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding Workflow**  
*(Designed for a brand‑new user on macOS / Linux / Windows PowerShell)*  

---

## 1️⃣ Prerequisites Check (Verification Gate)

| Step | Command | Expected Output | Pass/Fail |
|------|---------|----------------|-----------|
| 1.1  | `python3 --version` | `Python 3.9+` | ✅ |
| 1.2  | `git --version` | `git version X.Y.Z` | ✅ |
| 1.3  | `node -v` (optional, for UI extensions) | `v14+` | ⚠️ *skip if not needed* |
| 1.4  | `pip install --upgrade pip` | `Successfully installed pip‑X.Y.Z` | ✅ |

If any gate fails, abort the onboarding and display the error with a **Blocker Report** (see § 7).

---

## 2️⃣ Install Ricky CLI (Deterministic Step)

```bash
# Create an isolated virtual environment
python3 -m venv ~/.ricky_env
source ~/.ricky_env/bin/activate   # macOS/Linux
# .\ricky_env\Scripts\Activate.ps1  # Windows PowerShell

# Install the CLI from the official PyPI release
pip install ricky-workflow==1.0.0
```

Verification: ricky --version → ricky-workflow 1.0.0.


3️⃣ Initial Configuration (Contract: ricky init)

ricky init \
  --org "Acme Corp" \
  --team "DataOps" \
  --default-branch "main" \
  --repo-url "git@
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Blocker Report – Missing or Empty Specification**

| **Item** | **Status** |
|----------|------------|
| **Spec file** | ❌ Not found / empty |
| **Impact** | Cannot generate a deterministic workflow contract, verification gates, or review artifacts. |
| **Required Action** | Provide a complete, non‑empty specification (YAML/JSON/Markdown) that defines the workflow’s inputs, outputs, steps, and success criteria. |

---

## Immediate Next Steps for the Requester

1. **Locate or Create the Spec**  
   - If the spec exists elsewhere, copy it into the repository at the expected path (e.g., `specs/<workflow‑name>.yaml`).  
   - If you need to start from scratch, use the minimal template below and fill in the domain‑specific details.

2. **Submit the Spec**  
   - Commit the file on a dedicated branch (`feature/<workflow‑name>-spec`).  
   - Open a PR targeting `main` (or the appropriate release branch) with the title `Add spec for <workflow‑name>`.

3. **Notify Ricky**  
   - Once the PR is open, ping `@ricky-agent` (or the designated workflow‑authoring bot) to trigger the next phase of contract generation and validation.

---

## Minimal Spec Template (YAML)

```yaml
# specs/<workflow
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT CONNECTED (no successful health check)
• GitHub : NOT CONNECTED (no successful health check)

✅ Core Services
• Scheduler : RUNNING
• Dispatcher : RUNNING
• Storage : RUNNING

⚙️ Configuration
• Active workflow contracts : 3
• Pending PRs : 0
• Validation loops (last run): 0/80‑100

🔍 Last Verification Gates
• Google OAuth token check : FAILED / NOT VERIFIED
• GitHub API token check : FAILED / NOT VERIFIED

🚨 Blockers
• Google provider not authenticated – cannot fetch external data or trigger Google‑based actions.
• GitHub provider not authenticated – cannot read/write repositories, issue events, or webhook triggers.

🛠️ Recommended Actions

  1. Verify credentials for the Google provider (OAuth client ID/secret, refresh token) and run ricky provider connect google.
  2. Verify credentials for the GitHub provider (personal access token or GitHub App) and run ricky provider connect github.
  3. Re‑run ricky healthcheck aft
    ...[truncated]

**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a feature branch** from `main` using the naming convention `feat/<ticket‑id>-<short‑desc>`. | Ticket ID, short description | New branch `feat/1234‑update‑readme` | `git branch` exists and follows regex `^feat\/\d{4,}-[a-z0-9-]+$` |
| 2️⃣ | **Ricky (Verification)** | **Sync with upstream** – pull latest `main` into the feature branch. | `main` HEAD | Feature branch up‑to‑date | `git merge-base --is-ancestor main feat/...` must be true |
| 3️⃣ | **Developer (Implementation)** | **Make the primitive change** (e.g., edit `README.md`). | Local repo, change spec | Modified files staged | `git diff --cached` shows only allowed paths (`README.md`, `docs/**`) |
| 4️⃣ | **Ricky (Automation)** | **Run lint & unit tests** via GitHub Actions workflow `ci.yml`. | Push to feature branch | CI status badge | All jobs return `success`; any failure blocks PR creation |
| 5️⃣ | **Developer** | **Commit** with conventional commit message `feat: update README with new badge`. | Staged changes | 
...[truncated]
```

**Must**

- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.

**Must Not**

- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
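
The selection requirements above can be sketched as a small ranking function. This is a minimal illustration, not Ricky's implementation: `SkillEntry`, `SkillMatch`, `selectSkills`, the trigger-overlap score, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```typescript
// Hypothetical shapes; the real registry schema is not shown in this report.
interface SkillEntry { id: string; triggers: string[]; }
interface SkillMatch { id: string; confidence: number; evidence: string[]; }
interface SkillSelection { matches: SkillMatch[]; fallback: string | null; }

function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5, // assumed cut-off, for illustration only
): SkillSelection {
  // A missing registry is a graceful fallback, not a generation failure.
  if (!registry || registry.length === 0) {
    return { matches: [], fallback: "registry-missing" };
  }
  const text = spec.toLowerCase();
  const matches = registry
    .map((skill): SkillMatch => {
      // Evidence is the list of triggers that actually appear in the spec.
      const evidence = skill.triggers.filter((t) => text.indexOf(t.toLowerCase()) !== -1);
      const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
      return { id: skill.id, confidence, evidence };
    })
    .filter((match) => match.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  return { matches, fallback: matches.length === 0 ? "no-skill-cleared-threshold" : null };
}
```

Returning the evidence alongside each score keeps the selection auditable in whatever artifact or JSON output records it.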

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.tool-selector-honors-spec-hints</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow where the spec says "use Claude to review and Codex to implement".

**Ricky output**

**Workflow Title:** `review‑and‑implement‑claude‑codex`

**Purpose**  
Automate a two‑stage development cycle where Claude (LLM‑reviewer) validates design/spec changes and Codex (LLM‑coder) generates the implementation. The workflow is deterministic, auditable, and bounded to a single feature branch.

---

## 1. Contract (YAML)

```yaml
name: review-and-implement-claude-codex
description: |
  Takes a feature spec, runs Claude for a textual review, then passes the
  reviewed spec to Codex for code generation. All artefacts are persisted
  and verified before merge.
trigger:
  - pull_request.opened
  - workflow_dispatch
inputs:
  spec_path:
    type: string
    description: Relative path to the markdown/spec file
    required: true
  target_branch:
    type: string
    description: Branch where the generated code will be committed
    default: feature/claude-codex
outputs:
  review_report:
    description: Path to Claude’s review markdown
  implementation_pr:
    description: URL of the PR created by Codex
environment:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  CLAUDE_API_KEY: ${{ secrets.CLAUDE_API_KEY }}
  CODEX_API_KEY: ${{ secrets.CODEX_API_KEY }}

2. Stages & Dete

...[truncated]


**Must**

- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.

**Must Not**

- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
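
The precedence these rules describe (explicit spec hints first, then skill metadata, then a project default, each with a recorded reason) could look like the sketch below. `ToolDecision`, `selectRunner`, and the step names are hypothetical; only `tool-selection.json` comes from the case itself.

```typescript
// Hypothetical types for illustration; the report does not show Ricky's real schema.
type DecisionSource = "spec-hint" | "skill-default" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

function selectRunner(
  step: string,
  specHints: Record<string, string | undefined>,
  skillDefaults: Record<string, string | undefined>,
  projectDefault: string,
): ToolDecision {
  // Explicit spec hints always win over skill defaults.
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  // Skill metadata only fills gaps where the spec is silent.
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-default", reason: `spec is silent on ${step}; skill metadata suggests ${fromSkill}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: `no hint or skill default for ${step}` };
}

// This array is the kind of content an audit file such as tool-selection.json could hold.
const decisions = ["review", "implement", "test"].map((step) =>
  selectRunner(step, { review: "claude", implement: "codex" }, { review: "codex", test: "codex" }, "claude"),
);
```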

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.refine-is-opt-in-and-bounded</code> (generation-quality/manual)</summary>

**User message**

> Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.

**Ricky output**

```text
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is a concrete contract, the verification gate sequence, and a minimal example of how to invoke the flag.

---

## 1. Contract Overview  

| Element | Requirement | Rationale |
|---------|-------------|-----------|
| **Input** | Valid workflow file (`.json` or `.yaml`) that passes the *Schema Validation Gate* (`workflow-schema-v1`). | Guarantees structural integrity before refinement. |
| **Output** | Same topology (steps, dependencies, IDs) with updated `task.description` and `task.acceptanceGate`. | No breaking changes to downstream pipelines. |
| **Idempotence** | Running `--refine` twice on the same file must produce identical output. | Guarantees deterministic behavior. |
| **Traceability** | Each refined field must include a comment block (`# refined‑by: refine‑vX.Y`) in YAML or a `// refined‑
...[truncated]
```

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
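
One way to read the fallback rule is as a post-refinement check that accepts a candidate only if the graph and agent assignments are untouched. The sketch below assumes a hypothetical `Step` shape and an `acceptRefinement` helper; a `null` candidate stands in for a timeout or provider failure.

```typescript
// Hypothetical step shape; description and acceptanceGate are the only editable regions here.
interface Step {
  id: string;
  agent: string;
  dependsOn: string[];
  description: string;
  acceptanceGate: string;
}

interface RefineResult {
  steps: Step[];
  refined: boolean;
  warnings: string[];
}

// candidate is null when the optional refinement pass timed out or the provider failed.
function acceptRefinement(original: Step[], candidate: Step[] | null): RefineResult {
  if (candidate === null) {
    return { steps: original, refined: false, warnings: ["refinement unavailable; deterministic artifact returned unchanged"] };
  }
  // Reject any candidate that changes step ids, agents, or dependencies.
  const graphUnchanged =
    candidate.length === original.length &&
    original.every((step, i) => {
      const next = candidate[i];
      return next.id === step.id && next.agent === step.agent && next.dependsOn.join(",") === step.dependsOn.join(",");
    });
  if (!graphUnchanged) {
    return { steps: original, refined: false, warnings: ["refinement changed the graph or agent assignments; deterministic artifact returned unchanged"] };
  }
  return { steps: candidate, refined: true, warnings: [] };
}
```

Both fallback paths return the original steps together with a warning, so an unchanged artifact is never returned silently.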

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

**Workflow: `ricky --version` Specification & Delivery**  
*Repository:* `github.com/agentworkforce/ricky`  
*Target Branch:* `main` → `feat/version-cli` (feature branch)  

---

## 1. Objective
Define, implement, test, and release the `ricky --version` command that prints the current Ricky CLI version in **semantic‑version** format (e.g., `ricky 2.4.1`). The workflow must be reproducible, gate‑checked, and auditable.

---

## 2. Contract (Inputs / Outputs)

| Item | Description |
|------|-------------|
| **Input** | Source code base (Go module `github.com/agentworkforce/ricky`). |
| **Output** | - New CLI flag `--version` that prints `ricky <semver>` to `stdout` and exits with code 0.<br>- Updated `README.md` with usage example.<br>- Unit test `TestVersionFlag` covering success path and edge cases (e.g., malformed version string).<br>- CI artifacts: compiled binary, test report, and `version.txt` artifact containing the exact version string. |
| **Version Source** | Single source of truth: `internal/version/version.go` exposing `const Version = "X.Y.Z"` generated by `make version`. |
| **Failure Modes** | - Missing or mismatched version constant.<br>- Flag registration conflict.<b
...[truncated]

**Must**

- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.

**Must Not**

- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
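
A behavior-grounded gate compares what the command printed against the expected version instead of grepping the bundle for syntax. The sketch below is illustrative only: `versionGate` is a hypothetical helper, and it assumes the CLI prints the version as a whitespace-delimited token, which the spec referenced in this case may define differently.

```typescript
interface GateResult {
  passed: boolean;
  detail: string;
}

// stdout is what `ricky --version` printed; expectedVersion would come from package.json.
// Assumption: the version appears as its own whitespace-delimited token in the output.
function versionGate(stdout: string, expectedVersion: string): GateResult {
  const printed = stdout.trim();
  const passed = printed.split(/\s+/).indexOf(expectedVersion) !== -1;
  return {
    passed,
    detail: passed ? `output contains ${expectedVersion}` : `expected ${expectedVersion}, got: ${printed}`,
  };
}
```

Source text that merely looks like code (for example `export function workflow(`) fails this gate, which is the point of the case.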

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.pattern-selection-deliberate</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

**Ricky output**

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates the final state of every artifact.

---

## 1. Contract (YAML)

```yaml
name: batch_artifact_validation
description: >
  Validate, auto‑fix, and rerun a collection of independent artifacts.
  Terminates when every artifact is either PASS or BLOCKED.
scope:
  branch: batch-validation
  pr_title: "Batch validation results – {run_id}"
  pr_labels: [batch‑validation, auto‑generated]
inputs:
  artifacts_dir: "./artifacts"          # root folder containing sub‑folders per artifact
  validator: "./tools/validate.py"      # must exit 0 on success, 1 on failure, 2 on non‑recoverable error
  fixer: "./tools/fix.py"               # idempotent script; receives artifact path, returns 0 if fix applied
  max_iterations: 100                   # safety cap for the validation/f
...[truncated]

**Must**

- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.

**Must Not**

- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-boundary-copy</code> (generation-quality/manual)</summary>

**User message**

> Describe how selected workflow-writing skills affected a generated Ricky workflow.

**Ricky output**

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the concrete ways each of the core Ricky writing skills is reflected in a generated workflow. The example workflow is a **“Customer‑Onboarding Ticket Triage”** pipeline that moves a new support request from intake to the appropriate specialist queue.

| **Skill** | **What the Skill Enforces** | **Resulting Artifact in the Workflow** | **Why It Matters** |
|----------|----------------------------|----------------------------------------|--------------------|
| **Deterministic Contract Definition** | Every step declares its **input schema**, **output schema**, and **exit codes** in a `contract.yaml`. | ```yaml\nstep: intake\ninput: { ticket_id: string, payload: object }\noutput: { enriched_ticket: object }\nexit_codes: [SUCCESS, VALIDATION_ERROR]\n``` | Guarantees downstream steps can validate the payload without runtime surprises. |
| **Verification Gates** | After each step a **gate** runs a schema‑validation test and a business‑rule check. | `gate: validate_enrichment` runs `jsonschema` against `enriched_ticket` and asserts `priority ∈ {P1,P2,P3,P4}`. | Sto
...[truncated]

**Must**

- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.

**Must Not**

- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.no-pure-codegen-without-proof</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow from a vague product spec and return it to the user.

**Ricky output**

**Ricky – Workflow Generation Assistant**

To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. With a vague spec the generated workflow would be incomplete and could introduce blockers later in the pipeline.

**Please provide the following details (you can answer as many as you have now; we’ll fill in the gaps together):**

| # | Required Information | Why It Matters |
|---|----------------------|----------------|
| 1 | **High‑level product goal** (e.g., “automate onboarding of new SaaS customers”) | Drives the overall workflow purpose and success criteria. |
| 2 | **Key functional steps** (ordered list of major actions, e.g., “collect user data → verify email → provision account”) | Forms the backbone of the workflow DAG. |
| 3 | **Inputs & outputs for each step** (data schema, file types, API contracts) | Enables deterministic verification gates and contract testing. |
| 4 | **Stakeholder owners** (team or individual responsible for each step) | Sets review‑artifact ownership and escalation paths. |
| 5 | **Success & failure conditions** (e.g., “email verification returns 200 → success; 4xx → retry; 5xx → abort”) | Defines vali
...[truncated]

**Must**

- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.

**Must Not**

- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.classify-before-retry</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

**Ricky output**

**Ricky’s next actions after a step‑timeout + orphaned worker**

| Phase | Action | Why | Deliverable / Artifact |
|-------|--------|-----|------------------------|
| 1️⃣ Detect & Log | • Pull the execution log from the orchestration engine (e.g., Temporal, Airflow). <br>• Record the exact timestamp, step ID, and timeout value that was breached. <br>• Capture the heartbeat / liveness record of the worker that never emitted an artifact. | Provides an immutable audit trail and the data needed for deterministic verification. | `logs/timeout‑<run‑id>.json` and `logs/orphan‑worker‑<run‑id>.json` |
| 2️⃣ Declare a Blocker | • Open a **Blocker Issue** in the workflow repo (e.g., `BLOCKER‑<run‑id>`). <br>• Tag the issue with `type:timeout`, `type:orphan‑worker`, and `severity:high`. <br>• Link the two log artifacts. | Guarantees visibility to all stakeholders and prevents downstream pipelines from proceeding. | GitHub Issue `BLOCKER‑<run‑id>` |
| 3️⃣ Isolate the Faulty Run | • Create a **scoped branch** off `main` named `repair/<run‑id>`. <br>• Add a **run‑manifest** (`manifest.yaml`) that pins the exact input snapshot, code SHA, and configuration used for the failed run. | Guarantees repr
...[truncated]

**Must**

- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.

**Must Not**

- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
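
Classifying before retrying can be sketched as a pure function over the observed evidence. The class names below mirror the list in this case; `FailureEvidence`, its fields, and the mapping from "worker alive without artifacts" to agent-runtime opacity are assumptions made for the example.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence shape; the real run evidence schema is not shown in this report.
interface FailureEvidence {
  timedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingEnv: string[];
  rawLog: string;
}

interface Classification {
  classes: FailureClass[];
  evidence: FailureEvidence; // raw evidence is carried through untouched
}

function classifyFailure(evidence: FailureEvidence): Classification {
  const classes: FailureClass[] = [];
  if (evidence.timedOut) classes.push("timeout");
  // Assumed mapping: a live worker with no artifacts is treated as runtime opacity.
  if (evidence.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (evidence.missingEnv.length > 0) classes.push("environment-blocker");
  // Keep uncertainty explicit instead of guessing a class.
  if (classes.length === 0) classes.push("unknown");
  return { classes, evidence };
}
```

A retry decision would then branch on `classes` rather than rerunning the whole workflow by default.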

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.stale-relay-state</code> (runtime-recovery/manual)</summary>

**User message**

> Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.

**Ricky output**

**Ricky – Stale‑State Detection for Local Workflow Launches**  
*(Implementation contract, verification gates, and remediation flow)*  

---

## 1. Scope & Preconditions
| Item | Description |
|------|-------------|
| **Target directories** | `.agent‑relay/`, `.relay/`, `.trajectories/` located at the repository root (or any configured workspace root). |
| **Stale definition** | A directory is *stale* when **any** of the following is true: <br>1. Contains files older than the last successful workflow run timestamp (recorded in `.ricky/last‑run.json`). <br>2. Contains a lock file (`*.lock`) whose PID does not correspond to a live process. <br>3. Contains a marker file `STALE` placed by a previous failed run. |
| **Invocation point** | This check runs **immediately before** `ricky run <workflow‑name>` resolves the workflow graph. |
| **Assumptions** | - The host OS provides `stat` and `ps` utilities (or Node.js `fs`/`process` equivalents). <br>- The repository follows the standard Ricky layout (`.ricky/`, `workflows/`, etc.). |

---

## 2. Contract – `detectStaleState()` API

```ts
/**
 * Detects stale state in the three runtime directories.
 *
 * @param rootPath   Absolute path to t
...[truncated]

**Must**

- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.

**Must Not**

- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.already-running-conflict</code> (runtime-recovery/manual)</summary>

**User message**

> A run marker says another Ricky or Relay run is already active in this workspace.

**Ricky output**

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker you’re seeing is a lock file that Ricky creates at the start of a run to prevent concurrent executions. When a previous run crashes, is killed, or never cleans up, the lock stays behind and blocks new runs.

Below is the deterministic, step‑by‑step repair workflow you can run locally. Each step is a **verification gate** – if the check fails, stop and report the blocker before proceeding to the next gate.

---

## 1️⃣ Verify the Workspace State

| Gate | Command | Expected Result |
|------|---------|-----------------|
| **1.1** Check for active processes | `ps aux \| grep -i "ricky\|relay"` | No `ricky` or `relay` PID should be listed (aside from the `grep` line). |
| **1.2** Confirm you’re on the correct branch | `git rev-parse --abbrev-ref HEAD` | Should be the branch you intend to work on (e.g., `feature/xyz`). |
| **1.3** Ensure the repo is clean | `git status --porcelain` | Output must be empty (no uncommitted changes). |

*If any gate fails, resolve the blocker (kill stray process, checkout correct branch, commit/stash changes) before moving on.*

---

## 2️⃣ Locate & Inspect the Lock File

R
...[truncated]

**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
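
The expected behavior reduces to a launch decision that surfaces the marker instead of starting a competing run. In this sketch `RunMarker`, `LaunchDecision`, and `decideLaunch` are hypothetical names; how Ricky actually stores the marker is not shown in the report.

```typescript
// Hypothetical marker shape for illustration.
interface RunMarker {
  runId: string;
  statusPath: string;
}

type LaunchDecision =
  | { launch: true }
  | { launch: false; activeRunId: string; statusPath: string; userOptions: string[] };

function decideLaunch(marker: RunMarker | null): LaunchDecision {
  if (marker === null) {
    return { launch: true };
  }
  // Never start a competing run; report the marker and give the user recovery options.
  return {
    launch: false,
    activeRunId: marker.runId,
    statusPath: marker.statusPath,
    userOptions: ["inspect the active run", "wait for it to finish", "explicitly clear the marker"],
  };
}
```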

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.env-loader-injection-runtime-loadable</code> (runtime-recovery/manual)</summary>

**User message**

> A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

**Ricky output**

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix follows the **Ricky workflow contract**:

1. **Inject the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file (or the master‑rendered wrapper) so that `.env.local` is read first, then `.env` as a fallback.  
2. **Add an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Wrap any embedded HEREDOC command strings** (the `node --input-type=module` blocks) with the same guard, because those strings are evaluated in a separate Node process and need the env to be present there as well.  

---

## 1. Loader & Guard implementation (single source of truth)

Create (or update) `ricky-workflow-env.ts` in the repository root (or a shared `utils/` folder) and export the two helpers:

```ts
// ricky-workflow-env.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { exit } from 'node:process';

/**
 * Load .env.local first, th
...[truncated]

**Must**

- Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
- Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
- Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

**Must Not**

- Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
- Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
- Rewrite or escape the embedded HEREDOC text in step commands.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
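
The difference between substring matching and matching a real alias import can be shown with a line-anchored pattern. This is a minimal sketch under stated assumptions: `hasAliasImport` and `ensureAliasImports` are hypothetical helpers, and a line-anchored regular expression is a simpler stand-in for a proper parser, which would be more robust.

```typescript
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches only a whole line of the form: import * as <alias> from '<module>';
// A bare "from 'node:fs'" substring inside a HEREDOC does not satisfy this.
function hasAliasImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeRegExp(alias)}\\s+from\\s+['"]${escapeRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepends whichever alias imports are missing and leaves the rest of the file,
// including embedded HEREDOC text, byte-for-byte unchanged.
function ensureAliasImports(source: string): string {
  const needed: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  const missing = needed
    .filter(([alias, moduleName]) => !hasAliasImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because detection keys on the alias name as well as the module specifier, running the repair twice adds nothing the second time.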

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.auto-fix-bounded-loop</code> (runtime-recovery/manual)</summary>

**User message**

> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

**Ricky output**

**Local Workflow Execution – Auto‑Fix Run**

Below is a concrete, reproducible contract for running the workflow locally with **auto‑fix** enabled, handling the first‑attempt failure, applying the repair, and re‑executing the failed step. All steps are scoped to a single branch (`auto‑fix/run‑<run‑id>`) and a single PR, so you can review the changes in isolation.

---

## 1️⃣ Preconditions (Inputs)

| Name | Type | Description | Source |
|------|------|-------------|--------|
| `workflow.yaml` | file | The workflow definition to execute. | Repo root |
| `run_id` | string | Identifier of the previous (failed) run. | Provided by CI (e.g., `run-20240512-01`) |
| `failed_step` | string | Name of the step that failed in the previous run. | Provided by CI logs (e.g., `build:docker`) |
| `auto_fix=true` | flag | Enables the auto‑fix pipeline. | CLI flag |
| `artifact_path` | path | Directory where the previous run’s artifacts (logs, intermediate files) are stored. | `artifacts/<run_id>/` |

*All inputs must be present before proceeding. If any are missing, abort and raise a **Blocker** with a clear message.*

---

## 2️⃣ Branch & PR Setup (Deterministic Scope)

```bash
git checkout -b aut
...[truncated]

**Must**

- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.

**Must Not**

- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
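
A bounded repair/resume loop with one tracking run id might be shaped like the sketch below. Every name here (`runWithAutoFix`, `RunOutcome`, `Resume`, `AttemptSummary`) is hypothetical, and the `run` and `repairArtifact` callbacks stand in for the real executor and the persona-driven repair, neither of which is shown in this report.

```typescript
interface RunOutcome { ok: boolean; runId: string; failedStep?: string; }
interface Resume { startFromStep: string; previousRunId: string; }
interface AttemptSummary { attempt: number; trackingRunId: string; resume: Resume | null; ok: boolean; }

function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  run: (resume: Resume | null) => RunOutcome,
  repairArtifact: (failedStep: string) => boolean, // repairs the workflow artifact only
): AttemptSummary[] {
  const summaries: AttemptSummary[] = [];
  let resume: Resume | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(resume);
    // Every attempt is summarized under the same tracking run id.
    summaries.push({ attempt, trackingRunId, resume, ok: outcome.ok });
    if (outcome.ok || !outcome.failedStep) break;
    if (!repairArtifact(outcome.failedStep)) break;
    // Resume from the failed step using the previous run id.
    resume = { startFromStep: outcome.failedStep, previousRunId: outcome.runId };
  }
  return summaries;
}
```

The loop bound is the retry budget: once `maxAttempts` is reached the function returns the summaries and does not try again.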

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.no-auto-fix-preserves-single-attempt</code> (runtime-recovery/manual)</summary>

**User message**

> A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

**Ricky output**

**Immediate troubleshooting checklist**

| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Open the run log** – `ricky logs workflows/foo.ts --last` | A full JSON‑line log ending with a `status: "failed"` entry. |
| 2️⃣ | **Locate the failure gate** – search the log for `gate:` entries. The first gate that reports `result: "error"` is the deterministic verification point that stopped the workflow. |
| 3️⃣ | **Read the error payload** – the gate will include a `detail` field (stack trace, contract mismatch, missing artifact, etc.). Copy that snippet. |
| 4️⃣ | **Validate the contract** – run the contract validator manually: <br>`ricky validate contracts/foo.contract.ts --target workflows/foo.ts` <br>It will return `PASS` or a list of mismatched inputs/outputs. |
| 5️⃣ | **Check for missing scoped artifacts** – run: <br>`ricky artifacts list --workflow workflows/foo.ts` <br>Confirm that every artifact listed in the workflow’s `requires:` and `produces:` sections exists in the current branch. |
| 6️⃣ | **Run the failing gate in isolation** – most gates expose a CLI entry point. Example: <br>`ricky gate run lint --file src/foo.ts` <br>or <br>`ricky ga
...[truncated]

**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

**Ricky – Executing a Local TypeScript Workflow Artifact**

Below is the concrete, reproducible contract Ricky follows when a developer asks to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The steps are written as a deterministic “execution gate” that can be copied into a CI job, a local script, or a README snippet.

---

## 1️⃣ Preconditions (Workflow Contract)

| Item | Requirement | Verification |
|------|-------------|--------------|
| **Repo root** | The command must be run from the repository root (contains `package.json`). | `git rev-parse --show-toplevel` must equal `pwd`. |
| **Node version** | Node ≥ 18 (LTS) and npm ≥ 9. | `node -v` → `v18.x` or higher; `npm -v` → `9.x`+. |
| **TypeScript compiler** | `typescript` and `ts-node` are declared in `devDependencies`. | `npm ls typescript ts-node` returns a non‑error tree. |
| **Workflow entry** | The workflow file is a single `.ts` module that exports a default async function `run()` (or a named `run`). | `grep -E "export\s+(default\s+)?async\s+function\s+run"` <path>.ts. |
| **Primary local path** | By convention `./workflows`. This can be overridden by the e
...[truncated]

**Must**

- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on `PATH`.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.

**Must Not**

- Fail solely because `agent-relay` is not on `PATH` when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.escalation-is-not-generic-failure</code> (runtime-recovery/manual)</summary>

**User message**

> Ricky reaches a boundary after a structural failure persists after a fix attempt.

**Ricky output**

**Ricky’s Boundary‑Escalation Workflow for Persistent Structural Failures**

When a structural failure remains after an initial fix attempt, Ricky follows a deterministic, contract‑driven escalation path. The workflow is bounded, auditable, and designed to surface blockers early while preserving branch and PR hygiene.

---

## 1. Preconditions (Contract)

| Item | Requirement |
|------|-------------|
| **Failure Ticket** | Exists in the issue tracker (e.g., `WF‑1234`) with a *Structural‑Failure* label. |
| **Initial Fix Branch** | `fix/structural‑<short‑id>` opened from `main` and merged into a *validation* branch (`validate/<short‑id>`). |
| **Verification Gate** | `structural‑integrity-test` must have run at least once and reported **FAIL**. |
| **Owner** | Assigned engineer (or AI‑agent) with write access to the repo. |

If any of the above contracts are missing, Ricky aborts and raises a **Blocker Report** (see §7).

---

## 2. Deterministic Verification Loop (80‑100 iterations)

1. **Re‑run the Structural Integrity Test Suite**  
   - Command: `./ci/run‑integrity.sh --branch validate/<short‑id>`  
   - Capture deterministic hash of test output (`sha256(test‑log)`).

2. **Compa
...[truncated]

**Must**

- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.

**Must Not**

- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.analytics-from-structured-evidence</code> (runtime-recovery/manual)</summary>

**User message**

> Produce a workflow health digest from many Ricky workflow runs.

**Ricky output**

**Ricky Workflow Health Digest**  
*Aggregated from  ≈ 2 500 + workflow runs (last 30 days)*  

| Metric | Definition | Current Value | Target / SLA | Status |
|--------|------------|---------------|--------------|--------|
| **Run Success Rate** | % of runs that completed **Succeeded** (no failures, no time‑outs) | **96.3 %** | ≥ 95 % | ✅ |
| **Mean Time To Recovery (MTTR)** | Avg. elapsed time from first failure to successful rerun (including manual interventions) | **1 h 12 m** | ≤ 2 h | ✅ |
| **Critical Failure Rate** | % of runs that hit a **Blocker** gate (e.g., missing contract, schema mismatch) | **1.8 %** | ≤ 2 % | ✅ |
| **Verification Gate Pass Rate** | % of deterministic verification steps that passed on first attempt | **92.7 %** | ≥ 90 % | ✅ |
| **Review Cycle Length** | Avg. number of review loops per PR (including author, reviewer, and gate feedback) | **3.4** loops | ≤ 4 loops | ✅ |
| **Branch‑Scope Violations** | # of PRs that touched files outside their declared scope | **7** | 0 | ⚠️ |
| **Deterministic Test Flakiness** | % of verification steps that flaked (different outcome on re‑run) | **0.9 %** | ≤ 1 % | ✅ |
| **Manual Intervention Count** | # of runs requiri
...[truncated]

**Must**

- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.

**Must Not**

- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
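
A digest built from structured evidence can group failures by workflow, step, and class without touching the input. `WorkflowRunEvidence` is the name used in this case, but its fields below, along with `RecurringFailure` and `recurringFailures`, are assumptions made for the sketch.

```typescript
// Field names are assumed for illustration; only the type name appears in the report.
interface WorkflowRunEvidence {
  workflow: string;
  step: string;
  failureClass: "environment" | "workflow-logic" | null;
}

interface RecurringFailure {
  workflow: string;
  step: string;
  failureClass: "environment" | "workflow-logic";
  count: number;
}

function recurringFailures(runs: ReadonlyArray<WorkflowRunEvidence>, minCount = 2): RecurringFailure[] {
  const byKey: Record<string, RecurringFailure> = {};
  for (const run of runs) {
    if (run.failureClass === null) continue;
    // The failure class is part of the key, so environment and workflow-logic
    // failures never share a bucket.
    const key = [run.workflow, run.step, run.failureClass].join("|");
    const existing = byKey[key];
    byKey[key] = {
      workflow: run.workflow,
      step: run.step,
      failureClass: run.failureClass,
      count: existing ? existing.count + 1 : 1,
    };
  }
  return Object.keys(byKey)
    .map((key) => byKey[key])
    .filter((entry) => entry.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Each returned entry names a specific workflow and step, which gives a recommendation something concrete to point at.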

</details>

<details>
<summary><strong>REVIEW</strong> <code>surfaces-ingress.co-equal-surfaces</code> (surfaces-ingress/manual)</summary>

**User message**

> Design a new Ricky Slack surface for workflow debugging.

**Ricky output**

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
A dedicated Slack surface that lets engineers **inspect, step‑through, and mutate** Ricky‑managed workflows in real time. It surfaces deterministic verification gates, validation‑loop status, and blocker reports directly in a thread‑friendly UI, while preserving the existing “Ricky Bot” command namespace.

---

## 2. Public Contract (Slack API)

| Endpoint | Method | Payload | Response | Auth |
|----------|--------|---------|----------|------|
| `/ricky/debug/start` | `POST` | `{ workflow_id: string, branch?: string, run_id?: string }` | `{ run_id, status: "started", start_ts }` | Workspace‑app token |
| `/ricky/debug/step` | `POST` | `{ run_id: string, action: "next" \| "prev" \| "jump", target?: string }` | `{ run_id, current_node, state_snapshot, validation: { passed, failed, pending } }` | Same |
| `/ricky/debug/patch` | `POST` | `{ run_id, node_id, patch: Partial<NodeSpec> }` | `{ run_id, node_id, patch_applied: true }` | Same |
| `/ricky/debug/stop` | `POST` | `{ run_id }` | `{ run_id, status: "stopped", end_ts }` | Same |
| `/ricky/debug/status` | `GET` | `?run_id=` | `{ run_id, workflow_id, branch, current_n
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.
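The signature-verification and dedup requirements above could be sketched as follows. This follows Slack's documented signing scheme (HMAC-SHA256 over `v0:<timestamp>:<raw body>`, compared against the `X-Slack-Signature` header, with a five-minute replay window). The function names and the in-memory dedup set are assumptions for illustration, not Ricky's actual surface code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a Slack request signature. `timestamp` and `signature` come from the
// X-Slack-Request-Timestamp and X-Slack-Signature headers; `rawBody` must be the
// unparsed request body.
function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  signature: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): boolean {
  const ts = Number(timestamp);
  // Reject stale or malformed timestamps to limit replay attacks.
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > 300) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Hypothetical in-memory dedup keyed on Slack's event_id; returns true only the
// first time an id is seen. A real surface would likely need a shared, expiring store.
function createDeduper(): (eventId: string) => boolean {
  const seen = new Set<string>();
  return (eventId) => {
    if (seen.has(eventId)) return false;
    seen.add(eventId);
    return true;
  };
}
```

Both checks belong in the surface layer: they run before the request is normalized, so nothing downstream needs to know about Slack headers or retries.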

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Hand‑off Type:** `web_submit_spec`

| Element | Definition |
|---------|------------|
| **Name** | `web_submit_spec` |
| **Purpose** | Accept a JSON‑encoded **workflow specification** together with an execution **mode** (`preview` \| `run`) from a web client and enqueue the spec for the AgentWorkforce engine. |
| **Input Payload** | ```json { "spec": { …workflow definition… }, "mode": "preview\|run" } ``` |
| **Response** | ```json { "status":"queued", "run_id":"<uuid>", "message":"Spec accepted – processing in <mode> mode." } ``` |
| **Error Codes** | *400* – malformed JSON / missing fields  <br>*422* – spec validation failure (see Validation Gate) <br>*500* – internal enqueue error |
| **Auth** | Must be called with a valid `Bearer <JWT>` containing the `workflow:submit` scope. |
| **Rate‑limit** | 30 calls / minute / client‑id (configurable via `ratelimit.yaml`). |

---

## 1. Contract & Deterministic Verification Gates  

| Gate | Trigger | Check | Outcome |
|------|---------|-------|---------|
| **G‑01 JSON Schema** | Immediately on receipt | Validate payload against `schemas/web_submit_spec.json`. | Pass → next gate; Fail → 400. |
| **G‑02 Auth Scope** | After G‑01 | Veri
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.
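A minimal sketch of what these criteria ask for: the web surface validates the session, then a normalizer translates the handoff into one of the shared request types, and the router receives nothing web-specific. Only the names `LocalInvocationRequest` and `CloudGenerateRequest` come from the criteria above; their fields, the `WebSubmitHandoff` shape, the `"local" | "cloud"` mode values, and every function name are assumptions.

```typescript
// Hypothetical field layouts; the real request types are not shown in this review.
interface LocalInvocationRequest {
  kind: "local";
  spec: string;
}
interface CloudGenerateRequest {
  kind: "cloud";
  spec: string;
}
type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web handoff variant carrying a spec and a mode.
interface WebSubmitHandoff {
  source: "web";
  spec: string;
  mode: "local" | "cloud";
  sessionToken: string;
}

// Surface-layer concern: auth/session validation happens before normalization.
function validateWebSession(
  handoff: WebSubmitHandoff,
  isValidToken: (token: string) => boolean,
): void {
  if (!isValidToken(handoff.sessionToken)) {
    throw new Error("web session rejected");
  }
}

// The only translation boundary: web-specific fields are dropped here.
function normalizeWebHandoff(handoff: WebSubmitHandoff): NormalizedRequest {
  return handoff.mode === "local"
    ? { kind: "local", spec: handoff.spec }
    : { kind: "cloud", spec: handoff.spec };
}

// The web handler never calls an executor directly; it hands the normalized
// request to whatever router the other surfaces already use.
function handleWebSubmit<T>(
  handoff: WebSubmitHandoff,
  isValidToken: (token: string) => boolean,
  route: (request: NormalizedRequest) => T,
): T {
  validateWebSession(handoff, isValidToken);
  return route(normalizeWebHandoff(handoff));
}
```

Because `route` only ever sees a `NormalizedRequest`, local versus Cloud routing semantics stay with the router rather than the web surface.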

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordin


Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
