Classify local broker startup stalls by miyaontherelay · Pull Request #131 · AgentWorkforce/ricky

miyaontherelay · 2026-05-21T20:22:23Z

Summary

Add a RUNTIME_HANDOFF_STALLED local blocker for Agent Relay broker startup/handoff timeout signals.
Route that blocker through external-setup handling instead of auto-repairing or reporting unsupported runtime.
Teach runtime diagnostics to classify broker startup timeouts as runtime handoff stalls.
Honor single-workflow/no-child-workflow specs before master routing.
Honor listed-only validation constraints by using executable acceptance commands instead of generic root typecheck/vitest gates, including git/bash/actionlint/ruby static gates.
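A minimal TypeScript sketch of the routing rule described above, assuming a closed union of blocker codes. Only `RUNTIME_HANDOFF_STALLED` comes from this PR. The other codes, the route names, and `routeBlocker` itself are hypothetical and do not reflect the actual code in `src/local/auto-fix-loop.ts` or `src/local/entrypoint.ts`.

```typescript
// Hypothetical blocker codes: only RUNTIME_HANDOFF_STALLED is named in this PR.
type LocalBlocker =
  | "RUNTIME_HANDOFF_STALLED"
  | "UNSUPPORTED_RUNTIME"
  | "WORKFLOW_VALIDATION_FAILED";

// Hypothetical routes describing how Ricky reacts to a blocker.
type BlockerRoute = "external-setup" | "auto-repair" | "report-unsupported";

function routeBlocker(blocker: LocalBlocker): BlockerRoute {
  switch (blocker) {
    case "RUNTIME_HANDOFF_STALLED":
      // The broker never completed startup/handoff: the environment needs attention,
      // so editing the workflow or declaring the runtime unsupported would mislead.
      return "external-setup";
    case "UNSUPPORTED_RUNTIME":
      return "report-unsupported";
    case "WORKFLOW_VALIDATION_FAILED":
      // Only defects in the generated artifact are candidates for auto-repair.
      return "auto-repair";
    default: {
      // Compile-time exhaustiveness check: a new blocker code must be routed explicitly.
      const unhandled: never = blocker;
      throw new Error(`Unrouted blocker: ${String(unhandled)}`);
    }
  }
}
```

The `never` check in the default branch makes the compiler reject a new blocker code until someone decides how it is routed, which is one way to keep a stall from silently falling into auto-repair.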

Validation

npx vitest run src/product/generation/pipeline.test.ts test/generated-workflow-reliability-contract.test.ts src/local/entrypoint.test.ts src/runtime/diagnostics/failure-diagnosis.test.ts
npm run typecheck
git diff --check

Notes

This addresses the observed cloud Ricky run failure modes: broker startup timeout misclassification, master/child over-generation despite static single-workflow constraints, and generic root validation gates in monorepo/workflow-only specs.
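One way to express the listed-only validation rule, as a hypothetical sketch: when a spec restricts validation to its listed commands, only those commands become gates; otherwise the generic root gates are appended. The `ValidationSpec` shape, `selectValidationGates`, and the default gate list are assumptions for illustration, not Ricky's generator code.

```typescript
// Hypothetical slice of a parsed spec; Ricky's real spec fields are not shown in this PR.
interface ValidationSpec {
  listedOnly: boolean;
  acceptanceCommands: string[];
}

// Illustrative generic root gates of the kind the PR stops adding for listed-only specs.
const DEFAULT_ROOT_GATES = ["npm run typecheck", "npx vitest run"];

function selectValidationGates(spec: ValidationSpec): string[] {
  const listed = spec.acceptanceCommands.map((c) => c.trim()).filter((c) => c.length > 0);
  if (spec.listedOnly) {
    // Honor the constraint literally: only the spec's own commands become gates.
    // An empty result is left for the caller to report rather than padded with defaults.
    return listed;
  }
  // Otherwise keep the listed commands first and append any default not already listed.
  return [...listed, ...DEFAULT_ROOT_GATES.filter((gate) => !listed.includes(gate))];
}
```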

coderabbitai · 2026-05-21T20:22:36Z

Walkthrough

This PR updates git command documentation in the pipeline generation logic. A test assertion for the git-diff-gate command is updated to expect the correct format, and clarifying inline comments are added to two heredoc templates documenting the git commands used in evidence generation.

Changes

Git Evidence Commands Documentation

| Layer / File(s) | Summary |
|---|---|
| Git command documentation and test assertion alignment: `src/product/generation/pipeline.test.ts`, `src/product/generation/template-renderer.ts` | Test assertion for the `git-diff-gate` command updated to match the expected `git ls-files --others --exclude-standard` format. Inline `// Evidence commands: ...` comments added to the `buildTargetGitDiffCommand` and `buildManifestGitDiffCommand` heredocs to clarify the git commands used to compute evidence. |
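A sketch of how an evidence command covering both modified and untracked files might be composed. `buildEvidenceCommand` and `shellQuote` are hypothetical helpers, not the `buildTargetGitDiffCommand` or `buildManifestGitDiffCommand` implementations; the git invocations themselves (`git diff --name-only`, `git ls-files --others --exclude-standard`) are standard. `git diff` alone does not list untracked files, which is why the second command is included.

```typescript
// POSIX shell single-quoting: close the quote, emit an escaped quote, reopen.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Hypothetical helper: lists tracked files changed since HEAD, then untracked files
// that .gitignore does not exclude, optionally limited to the given paths.
function buildEvidenceCommand(paths: string[]): string {
  const pathspec = paths.length > 0 ? ` -- ${paths.map(shellQuote).join(" ")}` : "";
  return [
    `git diff --name-only HEAD${pathspec}`,
    `git ls-files --others --exclude-standard${pathspec}`,
  ].join(" && ");
}
```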

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

A rabbit hops through git commands so clear,
With comments that whisper what tools are here,
Evidence gathered with ls-files bright,
Test assertions aligned just right! 🐰✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The PR title 'Classify local broker startup stalls' is vague and does not clearly convey the main changes described in the pull request objectives. | Consider revising the title to be more specific about the primary change, such as 'Add RUNTIME_HANDOFF_STALLED blocker for broker startup timeouts' or 'Improve broker startup timeout classification and validation handling'. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description provides comprehensive details about the changes, including new blockers, routing updates, diagnostics improvements, and validation constraints, all related to the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

coderabbitai

🧹 Nitpick comments (1)

src/runtime/diagnostics/failure-diagnosis.test.ts (1)

41-46: ⚡ Quick win

Add a timeout-first phrasing regression case.

This test covers broker-first wording, but Line 118 in src/runtime/diagnostics/failure-diagnosis.ts adds a timeout-first branch. Add one explicit test for that order to prevent silent regressions on the new matcher path.
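The regression risk can be illustrated with a hypothetical pair of matchers: an ordered pattern that expects "broker" before the timeout wording misses the timeout-first phrasing, while testing each signal independently accepts both orders. The pattern names and messages below are illustrative; the real matcher logic lives in `src/runtime/diagnostics/failure-diagnosis.ts` and may differ.

```typescript
// Hypothetical ordered matcher: only fires when "broker" precedes the timeout wording.
const brokerFirst = /broker.*\b(timed out|timeout)\b/i;

// Order-independent check: each signal is tested on its own.
function isBrokerStartupStall(message: string): boolean {
  const mentionsBroker = /\bbroker\b/i.test(message);
  const mentionsTimeout = /\b(timed out|timeout)\b/i.test(message);
  return mentionsBroker && mentionsTimeout;
}

const brokerFirstMessage = "Agent Relay broker startup timed out";
const timeoutFirstMessage = "timed out waiting for Agent Relay broker startup acknowledgement";
```

`brokerFirst` accepts the first message but rejects the second, which is exactly the gap a timeout-first test case guards against.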

Suggested test addition

```diff
 describe('failure-diagnosis: blocker differentiation', () => {
+  it('classifies timeout-first broker acknowledgement wording as a handoff stall', () => {
+    expectDiagnosis(
+      { source: 'runtime', message: 'timed out waiting for Agent Relay broker startup acknowledgement' },
+      BlockerClass.RuntimeHandoffStall,
+    );
+  });
+
   it('classifies handoff stall by source', () => {
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/diagnostics/failure-diagnosis.test.ts` around lines 41 - 46, Add
a new unit test in src/runtime/diagnostics/failure-diagnosis.test.ts that
mirrors the existing broker-first case but uses the timeout-first message
ordering so the timeout-first matcher branch in failure-diagnosis.ts is
exercised; specifically call expectDiagnosis with a diagnostic object whose
source is 'runtime' and message equal to the timeout-first phrasing (e.g.,
starting with "local broker startup timeout" then mentioning "waiting for Agent
Relay broker acknowledgement") and assert BlockerClass.RuntimeHandoffStall,
keeping the test name similar to "classifies broker startup timeout as a handoff
stall (timeout-first phrasing)" so both matcher paths are covered.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: a55b1503-4e79-4228-88d4-cc2ced2b07e5

📥 Commits

Reviewing files that changed from the base of the PR and between 9d65615 and 2834519.

📒 Files selected for processing (5)

src/local/auto-fix-loop.ts
src/local/entrypoint.test.ts
src/local/entrypoint.ts
src/runtime/diagnostics/failure-diagnosis.test.ts
src/runtime/diagnostics/failure-diagnosis.ts

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

github-actions · 2026-05-21T20:30:46Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-22T09-31-12-059Z-HEAD-provider
Mode: provider
Git SHA: edddb6f

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

## Updated Ricky Documentation – “Deeper Integration with Agent Assistant”

### 1. Overview
Ricky (AgentWorkforce reliability, coordination, and authoring assistant) now **leverages the Agent Assistant (AA) platform as a first‑class service** for:

| Feature | How AA is used | Benefit |
|---------|----------------|---------|
| **Intent Classification** | Calls `aa.intent.detect()` on every incoming user request. | Near‑real‑time routing to the correct workflow branch. |
| **Contextual Memory** | Persists and retrieves session state via `aa.memory.store()` / `aa.memory.fetch()`. | Stateful multi‑step coordination without custom DB code. |
| **Tool Orchestration** | Executes external tools through `aa.tool.run()` with built‑in retry & timeout policies. | Uniform error handling and observability across all tool calls. |
| **Self‑Healing Verification** | Runs `aa.verify.contract()` after each workflow step. | Deterministic gate that blocks on contract violations. |
| **Feedback Loop** | Sends `aa.feedback.record()` after each user interaction. | Auto‑tuning of prompts and routing heuristics. |

> **Note:** All AA calls are wrapped in the `AAClient` helper (see code section) to keep Ric
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
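The two deterministic checks reported for each case can be sketched as a pure function over a case result. The `CaseResult` shape, the `FAIL` wording, and `deterministicChecks` are assumptions; only the `PASS` line formats are copied from this report. These checks confirm only that the case ran and stayed within its tool-call budget, so the Must / Must Not rubric still needs the human verdict.

```typescript
// Hypothetical shape of one executed eval case.
interface CaseResult {
  executed: boolean;
  toolCalls: number;
}

// Produces lines in the same format as the report's "Deterministic checks" entries.
function deterministicChecks(result: CaseResult, maxToolCalls: number): string[] {
  const status = result.executed ? "PASS status: case executed" : "FAIL status: case did not execute";
  const verdict = result.toolCalls <= maxToolCalls ? "PASS" : "FAIL";
  return [status, `${verdict} maxToolCalls: expected <= ${maxToolCalls} tool calls, got ${result.toolCalls}`];
}
```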

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑22)**  

| Metric | Current State | Target / Desired State | Gap / Action |
|--------|---------------|------------------------|--------------|
| **Workflow contracts defined** | 12 contracts (5 core, 7 extension) | 100 % of active agents should have a contract | 2 core agents still use ad‑hoc context handling – create contracts for them. |
| **Deterministic verification gates** | 78 % of contracts have unit‑test gates; 62 % have integration‑test gates | 100 % gate coverage (unit + integration) | Add integration tests for the “summarize‑session” and “escalate‑on‑timeout” gates. |
| **Review artifacts** | 94 % of PRs include a “Context‑Impact” checklist; 68 % include a “Turn‑Latency” benchmark report | 100 % checklist + benchmark | Enforce benchmark artifact generation in the CI template. |
| **Validation loops (80‑100)** | Average loops per contract: 63 (range 41‑78) | ≥ 80 loops per contract before merge | Schedule a “Loop‑Boost” sprint to add synthetic turn scenarios (edge‑case token limits, multi‑agent hand‑off). |
| **Blocker reporting** | 12 blockers logged in the last quarter; 8 resolved within SLA (48 h) | 1
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
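A hypothetical sketch of the envelope rule in this rubric: the adapter wraps Ricky's request as an opaque payload, records compact provenance, and unwraps it unchanged. `RickyRequest`, `TurnEnvelope`, `toTurnEnvelope`, and `fromTurnEnvelope` are illustrative names whose fields mirror the Must list; they are not the `@agent-assistant/turn-context` API.

```typescript
// Hypothetical request shape; fields mirror the Must list for this case.
interface RickyRequest {
  requestId: string;
  source: { kind: string; [key: string]: unknown };
  structuredSpec: unknown;
  invocationRoot: string;
  mode: string;
  stageMode: string;
  specPath?: string;
  metadata: Record<string, unknown>;
  specText: string;
}

// Hypothetical envelope: the shared layer sees a turn id and provenance, while the
// Ricky request rides along as an opaque payload it never interprets.
interface TurnEnvelope {
  turnId: string;
  provenance: string[];
  ricky: Readonly<RickyRequest>;
}

function toTurnEnvelope(request: RickyRequest): TurnEnvelope {
  return {
    turnId: request.requestId,
    provenance: [`source:${request.source.kind}`, `mode:${request.mode}`, `stage:${request.stageMode}`],
    // Copy metadata so later mutation by the caller cannot change the recorded turn.
    ricky: { ...request, metadata: { ...request.metadata } },
  };
}

function fromTurnEnvelope(envelope: TurnEnvelope): RickyRequest {
  return { ...envelope.ricky, metadata: { ...envelope.ricky.metadata } };
}
```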

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring contract* (input schema → workflow DSL). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for DSL syntax, enabling deterministic verification gates downstream. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service ensures the contract is enforced immediately (schema, type‑checking, static analysis). | Reduces false‑positive PR merges; validation failures become explicit blocker artifacts attached to the PR. |
| **Debugging (runtime simulation & trace)** | Agent Assistant already maintains the execution sandbox used for “dry‑run” checks. Extending it to pr
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 27 – Aug 19 2024)*  

---

## 1. Slice Goal
Deploy the next‑generation reliability & coordination layer (Ricky v2.0) to **30 % of active AgentWorkforce teams** (≈ 45 teams) while establishing a repeatable adoption pipeline that guarantees:

* **Deterministic contract compliance** – every new workflow must pass the *Ricky Contract Suite* (RCS) before merge.  
* **Verification gates** – automated + manual checks at **design → code → staging → production**.  
* **Feedback loops** – ≥ 80 validation cycles per team (issue → fix → re‑test) before “green‑light”.  
* **Blocker transparency** – real‑time blocker dashboard, auto‑escalation to PM‑lead if > 48 h unresolved.

---

## 2. Scope & Boundaries  

| Area | In‑Scope | Out‑of‑Scope |
|------|----------|--------------|
| **Workflow authoring** | New “Ricky‑Ready” templates (incident triage, SLA‑monitor, cross‑team hand‑off). | Legacy workflows that never touch Ricky APIs. |
| **CI/CD pipeline** | Extension of existing `ricky-ci.yml` with RCS steps; separate branch `adopt/v2‑slice`. | Full platform rewrite (e.g., migration to Bazel). |
| **Observab
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Slack / Web Support Design  
*Agent Assistant Package (AAP)‑based, deterministic, review‑ready, and fully testable.*

---

### 1. High‑level Architecture

| Component | Responsibility | AAP Package | Runtime |
|-----------|----------------|------------|---------|
| **Ricky‑Core** | Business rules, workflow orchestration, contract enforcement | `ricky-core` | Node 18 (Lambda) |
| **Slack‑Adapter** | Translate Slack events ↔ Ricky‑Core actions | `ricky-slack-adapter` | Cloud‑Run (container) |
| **Web‑Portal** | UI for agents & managers (React SPA) | `ricky-web-portal` | Vercel / Netlify |
| **Auth‑Gateway** | OAuth2 (Slack) + SSO (Okta) → JWT | `ricky-auth-gateway` | Cloud‑Run |
| **Persistence** | DynamoDB (state), S3 (artifacts) | `ricky-persistence` | Managed AWS |
| **Observability** | Structured logs, metrics, trace IDs | `ricky-obs` | CloudWatch / Datadog |
| **CI/CD** | GitHub Actions, deterministic gates, 80‑100 validation loops | – | – |

All packages are published to the internal **Agent Assistant Registry** (`registry.internal.ricky.ai`) and version‑locked via `package-lock.json`.

---

### 2. Contract‑First API Design

#### 2.1 Core Service Contract (`ri
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the --run flag

| Step | Action performed by Ricky | Result you’ll see |
|------|---------------------------|-------------------|
| 1️⃣ | Parse the command line – Ricky reads `--mode local` and the free‑form `--spec` string. | The tool knows you want a local workflow generation, not a remote or CI‑only run. |
| 2️⃣ | Interpret the spec – The text `"generate a workflow for package checks"` is sent to the Spec‑to‑Workflow engine. | Ricky builds a workflow contract (YAML/JSON) that describes the series of package‑check tasks (e.g., lint, unit‑test, dependency audit, build artifact verification). |
| 3️⃣ | Render the workflow file – Using the standard `workflow/template` repository, Ricky fills in the concrete steps, adds deterministic verification gates (e.g., `lint‑pass`, `test‑coverage≥80%`), and writes the output to `./ricky_workflows/package-checks.yaml` (or the path you configured). | You get a ready‑to‑commit workflow definition, but no tasks are executed. |
| 4️⃣ | Print a summary – Ricky logs a concise report: th
...[truncated]


**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
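A minimal sketch of a generation-only summary that satisfies the Must list for this case. The `GenerationResult` shape and `renderGenerationOnly` are hypothetical, the next run command is passed in rather than invented, and the FNV-1a hash over UTF-16 code units is only a dependency-free stand-in for whatever digest Ricky actually records.

```typescript
// Hypothetical result of a generation-only run; Ricky's real response fields may differ.
interface GenerationResult {
  artifactPath: string;
  workflowId: string;
  specText: string;
  nextRunCommand: string; // supplied by the caller, not invented here
}

// FNV-1a (32-bit) over UTF-16 code units: a dependency-free stand-in for a real spec digest.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function renderGenerationOnly(result: GenerationResult): string {
  return [
    "Generated a workflow. Generation is the default; execution was not requested.",
    `  artifact:    ${result.artifactPath}`,
    `  workflow id: ${result.workflowId}`,
    `  spec digest: fnv1a:${fnv1a(result.specText)}`,
    `Nothing was run. To run it: ${result.nextRunCommand}`,
  ].join("\n");
}
```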

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the **Ricky** AgentWorkforce CLI. Follow the steps in order; each step is a deterministic gate that either succeeds (green) or fails (red) with an explicit error message. If any gate fails, stop, fix the reported blocker, and re‑run the failed step before proceeding.

---

## 1️⃣ Prerequisites (Verification Gate)

| Check | Command | Expected Result |
|-------|---------|-----------------|
| **Node ≥18** | `node -v` | `v18.x` or higher |
| **npm ≥9** | `npm -v` | `9.x` or higher |
| **Git** | `git --version` | `git version 2.x` |
| **Docker** (optional, for container‑based agents) | `docker version --format '{{.Server.Version}}'` | `>=20.10` |

*If any check is red, install the missing tool before continuing.*

---

## 2️⃣ Install the Ricky CLI (Deterministic Installation)

```bash
# Global install (recommended)
npm i -g @ricky/cli
# Verify installation
ricky --version

Success output: ricky-cli v1.2.3 (or later).
Failure: see the npm error log, resolve dependency conflicts, then re‑run.

## 3️⃣ Initialise a New Workspace (Contract: `workspace:init`)

...[truncated]
```


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Response – Handling an Empty or Missing Spec File**

---

### 1. Immediate Blocker Report
- **Status:** **BLOCKED**
- **Reason:** No functional specification (or an empty file) is available to derive a deterministic workflow contract.
- **Impact:** No downstream validation, testing, or CI pipelines can be generated until a concrete spec is supplied.

---

### 2. Automated Notification to the Requestor
```markdown
🚨 **Workflow Generation Blocked**

I’m unable to create a workflow because the specification file is missing or empty.

**Required Action**
1. **Provide a complete spec** (YAML/JSON/Markdown) that describes:
   - **Inputs** (data types, required fields, validation rules)
   - **Processing steps** (tasks, dependencies, resources)
   - **Outputs** (artifacts, formats, delivery locations)
2. If you’re unsure of the exact format, use the **Spec Skeleton** template below and fill in the details.

Once the spec is uploaded, I’ll:
- Generate a deterministic workflow contract.
- Create verification gates and review artifacts.
- Spin up the 80‑to‑100 validation loop.

Thanks for your cooperation! 🙏

### 3. Provide a Minimal Spec Skeleton (for the user to fill)

...[truncated]
```


**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
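A hypothetical sketch of the recovery behavior this case asks for: an empty or missing spec yields a structured generation-stage failure with a recovery command instead of an uncaught exception, and a stack trace is attached only in verbose mode. `loadSpec`, the injected `readFile`, and the result shape are assumptions; the recovery command reuses the `ricky --mode local --spec` form quoted earlier in this report.

```typescript
// Hypothetical result shape: generation-stage failures stay distinct from execution failures.
interface SpecFailure {
  ok: false;
  stage: "generation";
  message: string;
  recovery: string;
  diagnostics?: string; // stack trace, present only in verbose mode
}

type SpecLoad = { ok: true; specText: string } | SpecFailure;

// Reuses the command form quoted in this report; the placeholder text is illustrative.
const RECOVERY_COMMAND = 'ricky --mode local --spec "<describe the workflow you want>"';

function fail(message: string, verbose: boolean, error?: Error): SpecFailure {
  const failure: SpecFailure = { ok: false, stage: "generation", message, recovery: RECOVERY_COMMAND };
  const stack = error !== undefined ? error.stack : undefined;
  if (verbose && stack !== undefined) {
    failure.diagnostics = stack;
  }
  return failure;
}

// readFile is injected to keep the sketch runtime-agnostic; it returns undefined for a missing file.
function loadSpec(
  input: { specText?: string; specPath?: string },
  readFile: (path: string) => string | undefined,
  verbose = false,
): SpecLoad {
  let text = input.specText;
  if (text === undefined && input.specPath !== undefined) {
    text = readFile(input.specPath);
    if (text === undefined) {
      return fail(
        `Spec file not found: ${input.specPath}. No workflow was generated.`,
        verbose,
        new Error(`spec file not found: ${input.specPath}`),
      );
    }
  }
  if (text === undefined || text.trim() === "") {
    return fail("The spec is empty, so no workflow was generated.", verbose);
  }
  return { ok: true, specText: text };
}
```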

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.status-does-not-invent-provider-state</code> (cli-behavior/manual)</summary>

**User message**

> Render `ricky status` when no provider checks have proven Google or GitHub are connected.

**Ricky output**

```text
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A
• Pending Jobs : 0

📦 Configuration
• Active Branches : main
• Loaded Plugins : core, auth, vcs
• Feature Flags : all defaults

⚠️ Alerts
• No successful connectivity verification for Google.
• No successful connectivity verification for GitHub.
• Actions that depend on these providers may fail until verification succeeds.

🛠️ Next Steps (recommended)

Run provider health checks:
ricky check --provider google
ricky check --provider github
Review credentials / tokens in the secret store.
Re‑run the workflow after successful checks.

--- End of Report ---
```

Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
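A minimal sketch of the status rule, assuming provider state is a small map that only explicit check results or Cloud status results may change. The event names, `applyEvent`, and `renderStatus` are hypothetical; the point is that showing guidance text leaves a provider `unknown`.

```typescript
type ProviderState = "unknown" | "connected" | "not-connected";

// Hypothetical events; only the first two carry evidence about a connection.
type ProviderEvent =
  | { kind: "check-result"; provider: string; ok: boolean }
  | { kind: "cloud-status"; provider: string; connected: boolean }
  | { kind: "guidance-shown"; provider: string };

function applyEvent(state: Map<string, ProviderState>, event: ProviderEvent): void {
  switch (event.kind) {
    case "check-result":
      state.set(event.provider, event.ok ? "connected" : "not-connected");
      break;
    case "cloud-status":
      state.set(event.provider, event.connected ? "connected" : "not-connected");
      break;
    case "guidance-shown":
      // Showing setup guidance proves nothing about the connection; state is left as is.
      break;
  }
}

// Providers never touched by a check are reported as unknown, never as connected.
function renderStatus(state: Map<string, ProviderState>, providers: string[]): string[] {
  return providers.map((provider) => `${provider}: ${state.get(provider) ?? "unknown"}`);
}
```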

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a single‑file edit (e.g., typo fix, config tweak) to a repository while guaranteeing traceability, automated verification, and safe merge.  

---  

## 1. Scope & Branching Contract  

| Item | Definition |
|------|------------|
| **Change Type** | Primitive (single‑file, < 200 LOC) |
| **Target Repo** | `owner/repo` (must have `main` protected) |
| **Branch Naming** | `ricky/primitive/<short‑desc>/<YYYYMMDD>` |
| **PR Title** | `fix: <short‑desc>` (follow Conventional Commits) |
| **PR Labels** | `type:primitive`, `status:ready-for-review` |
| **Merge Strategy** | Squash‑merge only, require `main` status checks pass |

*Boundary*: No new dependencies, no CI config changes, no workflow file edits. Anything outside this contract is a blocker and must be escalated to a higher‑level workflow.  

---  

## 2. Skill Set Selected by Ricky  

| Skill | Role in Workflow |
|-------|------------------|
| **GitOps** | Branch creation, commit authoring, PR lifecycle |
| **Static Analysis** | Run `eslint` / `flake8` (depending on language) on the changed file |
| **Unit Test Runner** | Execute affected test suite (`npm test`, `pytest -q
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
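A sketch of registry-backed matching with graceful fallback, under stated assumptions: each hypothetical `SkillEntry` lists trigger phrases, confidence is the fraction of triggers found in the spec, and the matched triggers are kept as evidence. Ricky's real registry format and scoring are not shown in this report.

```typescript
// Hypothetical registry entry; the real skill registry format is not shown in this report.
interface SkillEntry {
  id: string;
  triggers: string[];
}

interface SkillMatch {
  id: string;
  confidence: number; // fraction of the skill's triggers found in the spec
  evidence: string[]; // triggers that actually matched
}

interface SkillSelection {
  matches: SkillMatch[];
  fallbackReason?: string;
}

function matchSkills(spec: string, registry: SkillEntry[] | undefined, threshold = 0.5): SkillSelection {
  if (registry === undefined || registry.length === 0) {
    // A missing registry degrades to "no skills selected" rather than failing generation.
    return { matches: [], fallbackReason: "skill registry unavailable" };
  }
  const text = spec.toLowerCase();
  const matches: SkillMatch[] = registry
    .filter((skill) => skill.triggers.length > 0)
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      return { id: skill.id, confidence: evidence.length / skill.triggers.length, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    // Highest confidence first; ties broken by id so the ranking is deterministic.
    .sort((a, b) => b.confidence - a.confidence || a.id.localeCompare(b.id));
  if (matches.length === 0) {
    return { matches, fallbackReason: `no skill reached confidence ${threshold}` };
  }
  return { matches };
}
```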

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Owner** | `team‑mlops` |
| **Purpose** | Automate a deterministic pipeline where Claude performs a textual/code review of a design spec, then Codex generates the implementation. |
| **Scope** | • Input: `design_spec.md` (Markdown) <br>• Output: `implementation/` (source tree) <br>• Supported languages: Python, JavaScript, TypeScript |
| **Branch/PR Boundary** | All changes are made on a feature branch `feat/review‑implement‑<ticket-id>`. A single PR merges the branch after the final verification gate. |
| **Contract** | <ul><li>**Input contract** – `design_spec.md` must contain a top‑level `## Requirements` section and a `## Acceptance Criteria` list.</li><li>**Claude review contract** – Claude must return a JSON object with keys `issues`, `suggestions`, and `approved` (boolean). `issues` is an array of strings, each ≤ 120 chars.</li><li>**Codex implementation contract** – Codex must emit a zip file `implementation.zip` whose root contains a `README.md` and source files that pass the unit‑test suite generated from the acceptance 
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
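A hypothetical sketch of the precedence in this rubric: an explicit spec hint wins, skill metadata fills gaps, and the project default applies last, with the reason recorded for an audit file such as `tool-selection.json`. `parseRunnerHints`, `selectRunner`, and the naive `<runner> to <step>` parsing are illustrative assumptions, not Ricky's tool selector.

```typescript
// Hypothetical per-step selection record, suitable for a tool-selection audit file.
interface StepSelection {
  step: string;
  runner: string;
  reason: string;
}

// Naive illustration: treats "<runner> to <step>" phrases as hints for known runners only.
function parseRunnerHints(spec: string, knownRunners: string[]): Map<string, string> {
  const hints = new Map<string, string>();
  const pattern = /(\w+) to (\w+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(spec)) !== null) {
    const runner = match[1].toLowerCase();
    if (knownRunners.includes(runner)) {
      hints.set(match[2].toLowerCase(), runner);
    }
  }
  return hints;
}

function selectRunner(
  step: string,
  specHint: string | undefined,
  skillDefault: string | undefined,
  projectDefault: string,
): StepSelection {
  // Precedence: explicit spec hint, then skill metadata, then the project default.
  if (specHint !== undefined) {
    return { step, runner: specHint, reason: `explicit spec hint for "${step}"` };
  }
  if (skillDefault !== undefined) {
    return { step, runner: skillDefault, reason: "skill metadata default; spec is silent" };
  }
  return { step, runner: projectDefault, reason: "project default; spec and skills are silent" };
}
```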

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is the end‑to‑end contract you can copy‑paste into your CI pipeline or local CLI.

---

## 1️⃣ Contract Overview  

| Element | Requirement | Rationale |
|--------|-------------|-----------|
| **Input** | `workflow.{json|yaml}` – a valid workflow spec (must contain `steps[]` with `id`, `task.description`, `task.acceptanceGate`) | Guarantees the refiner has a complete graph to work on. |
| **Output** | Same schema, but every `task.description` is **imperative** (verb‑first, ≤ 120 chars) and every `task.acceptanceGate` is **boolean‑expressible** (e.g., `output.fileExists && output.rowCount > 0`) | Improves readability, testability, and downstream automation. |
| **Determinism** | The refiner must be **pure**: given identical input it always returns identical output. No timestamps, random IDs, or envi
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
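A sketch of a bounded refinement pass with fallback, assuming the `refine` callback enforces its own timeout and token budget and throws when either is exceeded. Only the description and acceptance gate are treated as editable; any change to step ids or agents, a validation failure, or a provider error returns the deterministic artifact with a warning. `Step` and `refineWithFallback` are hypothetical names.

```typescript
// Hypothetical step shape: description and acceptanceGate are the only editable regions.
interface Step {
  id: string;
  agent: string;
  description: string;
  acceptanceGate: string;
}

interface RefineOutcome {
  steps: Step[];
  refined: boolean;
  warnings: string[];
}

// The graph and agent assignments must survive refinement unchanged.
function sameGraph(a: Step[], b: Step[]): boolean {
  return a.length === b.length && a.every((s, i) => s.id === b[i].id && s.agent === b[i].agent);
}

function refineWithFallback(
  original: Step[],
  refine: (steps: Step[]) => Step[], // assumed to throw on timeout, budget overrun, or provider failure
  validate: (steps: Step[]) => boolean,
): RefineOutcome {
  let candidate: Step[];
  try {
    candidate = refine(original.map((s) => ({ ...s })));
  } catch (err) {
    return { steps: original, refined: false, warnings: [`refinement skipped: ${String(err)}`] };
  }
  if (!sameGraph(original, candidate)) {
    return { steps: original, refined: false, warnings: ["refinement changed steps or agents; kept deterministic artifact"] };
  }
  if (!validate(candidate)) {
    return { steps: original, refined: false, warnings: ["refined artifact failed validation; kept deterministic artifact"] };
  }
  return { steps: candidate, refined: true, warnings: [] };
}
```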

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Release**  
*Repository: `github.com/agentworkforce/ricky`*  
*Target Branch: `main` → `release/vX.Y.Z`*  

---  

### 1. Scope & Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Prints the current semantic version (`MAJOR.MINOR.PATCH`) of the `ricky` binary.  
2. Includes optional build metadata (`+git.<sha>`) when the binary is built from a non‑tagged commit.  
3. Returns a non‑zero exit code **only** on internal errors (e.g., failure to read embedded version data).  

The workflow must guarantee that every release tag matches the output of `ricky --version` on the published binary.

---  

### 2. Contract (YAML)

```yaml
name: ricky-version
description: |
  Implements the `ricky --version` flag with deterministic output.
inputs:
  - name: source_dir
    type: path
    description: Root of the ricky source tree.
outputs:
  - name: binary_path
    type: path
    description: Path to the built `ricky` executable.
  - name: version_string
    type: string
    description: Exact string printed by `ricky --version`.
gates:
  - name: compile_success
    type: boolean
    description: 
```
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.
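A behavior-grounded gate asserts on what the built CLI prints, not on the shape of `dist/ricky.js`. A hedged sketch that checks captured `ricky --version` stdout against an expected version; the `MAJOR.MINOR.PATCH[+git.<sha>]` format follows the output quoted above, `expectedVersion` is assumed to come from the spec's package.json resolution order (not reproduced here), and `checkVersionOutput` is a hypothetical name.

```typescript
// Assumed output format, taken from the spec quoted above: MAJOR.MINOR.PATCH with optional +git.<sha>.
const VERSION_OUTPUT = /^(\d+)\.(\d+)\.(\d+)(?:\+git\.([0-9a-f]{7,40}))?$/;

export interface GateResult {
  pass: boolean;
  reason: string;
}

// Compares what the CLI actually printed with the version the gate expects.
export function checkVersionOutput(stdout: string, expectedVersion: string): GateResult {
  const printed = stdout.trim();
  const match = VERSION_OUTPUT.exec(printed);
  if (!match) {
    return { pass: false, reason: `output is not MAJOR.MINOR.PATCH[+git.<sha>]: ${JSON.stringify(printed)}` };
  }
  const core = `${match[1]}.${match[2]}.${match[3]}`;
  if (core !== expectedVersion) {
    return { pass: false, reason: `printed ${core}, expected ${expectedVersion}` };
  }
  return { pass: true, reason: `printed ${printed}` };
}
```

In a generated gate, stdout would come from actually executing the built binary (for example through Node's `child_process.execFileSync`), so a broken entry point fails the gate instead of passing a source-shape grep.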

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large collection of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation pipeline, automatically apply fix‑scripts where possible, and rerun until all artifacts are either **PASS** or **BLOCKED**. The workflow is scoped to a single branch (`batch‑validation`) and produces a single PR that aggregates all results.

---

## 1. Repository Layout (branch `batch‑validation`)

    /artifacts/
    ├─ artifact‑001.json
    ├─ artifact‑002.json
    └─ … (any number)

    /validation/
    ├─ schema.yaml              # JSON/YAML schema for all artifacts
    ├─ validator.py             # deterministic validator (returns PASS/FAIL)
    └─ fixers/
       ├─ fixer_001.py          # id‑specific fixers (optional)
       └─ generic_fixer.py      # fallback fixer

    /.github/
    └─ workflows/
       └─ batch_validation.yml  # CI workflow (this document)


---

## 2. Contract (Inputs / Outputs)

| Item | Type | Description |
|------|------|-------------|
| `artifacts/*` | file set | Independent artifacts to be processed. |
| `validation/schema.yaml` 
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.
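One way to make the pattern choice deliberate is to derive it from a recorded description of the work shape and return the rationale with it, ready for artifact metadata. The `WorkShape` fields and the supervisor/pipeline heuristics below are illustrative assumptions, not Ricky's selection logic; only the rule that validation/fix/rerun loops map to `dag` comes from the criteria above.

```typescript
type Pattern = 'dag' | 'supervisor' | 'pipeline';

// Hypothetical description of the work, recorded alongside the generated artifact.
interface WorkShape {
  independentUnits: number;
  hasValidateFixRerunLoop: boolean;
  stagesAreStrictlySequential: boolean;
  needsDynamicDelegation: boolean;
}

interface PatternDecision {
  pattern: Pattern;
  rationale: string; // intended for artifact metadata
}

export function choosePattern(shape: WorkShape): PatternDecision | null {
  if (shape.hasValidateFixRerunLoop) {
    return {
      pattern: 'dag',
      rationale: `validate/fix/rerun loop over ${shape.independentUnits} independent artifacts; each artifact's validate -> fix -> rerun dependencies become explicit edges`,
    };
  }
  if (shape.needsDynamicDelegation) {
    return { pattern: 'supervisor', rationale: 'work is discovered at runtime, so a coordinator assigns it to workers' };
  }
  if (shape.stagesAreStrictlySequential) {
    return { pattern: 'pipeline', rationale: "each stage consumes the previous stage's output with no branching" };
  }
  // Ambiguous shape: require an explicit, recorded decision instead of silently defaulting to dag.
  return null;
}
```

Returning `null` for an ambiguous shape forces an explicit choice rather than a blind `dag` default.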

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How the Selected Workflow‑Writing Skills Shaped the Generated Ricky Workflow  

| Skill | What the Skill Enforces | Direct Effect on the Generated Workflow |
|-------|------------------------|------------------------------------------|
| **Concrete Contract Definition** | Every step must declare **inputs**, **outputs**, and **exit criteria** in a typed JSON schema. | The workflow begins with a `contract` block that lists required parameters (`sourceRepo`, `targetBranch`, `reviewers`) and the exact shape of the final artifact (`prUrl`, `status`). Down‑stream steps reference these contracts, so the runtime can validate data before a step runs, preventing “missing field” errors. |
| **Deterministic Verification Gates** | After each logical group of actions, a **gate** runs a pure‑function check that must return `true` before the pipeline proceeds. | After the *lint* and *unit‑test* stages, a `gate: lintPass` and `gate: unitPass` are inserted. They call the `verifyLint` and `verifyTests` functions, respectively. If a gate fails, the workflow aborts with a clear `BLOCKER` status, avoiding cascading failures. |
| **Review Artifacts & Traceability** | Every human‑in‑the‑loop step must
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.
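A generation-time wording check can catch the overclaims this case forbids before a description is returned. A minimal sketch; `findSkillOverclaims` and the exact regular expressions are hypothetical, built only from the phrases listed under Must Not, and the check cannot by itself tell whether runtime skill loading has since been implemented.

```typescript
// Phrases this eval case forbids unless runtime skill loading is implemented and tested.
const OVERCLAIMS: RegExp[] = [
  /skill[-\s]embedded/i,
  /skill[-\s]powered\s+at\s+runtime/i,
  /\bembod(?:y|ies|ied)\s+(?:the\s+)?skills?\b/i,
];

// Returns the offending phrases found in a generated description (empty when clean).
export function findSkillOverclaims(text: string): string[] {
  const hits: string[] = [];
  for (const pattern of OVERCLAIMS) {
    const match = pattern.exec(text); // no /g flag, so exec keeps no state between calls
    if (match) hits.push(match[0]);
  }
  return hits;
}
```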

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

---

### 📋 Quick Turn‑around Request  
You asked for a workflow generated from a *vague* product specification. To deliver a reliable, testable workflow we need a minimal, concrete contract that defines:

| Item | Why It’s Needed | Example |
|------|----------------|---------|
| **Goal / Success Metric** | Drives the final verification gate. | “Process 10 k user sign‑ups with < 2 % error.” |
| **Key Actors / Systems** | Determines task owners and integration points. | `frontend`, `auth‑service`, `email‑svc`, `analytics` |
| **Core Steps (high‑level)** | Forms the backbone of the DAG. | 1️⃣ Collect form → 2️⃣ Validate → 3️⃣ Create account → 4️⃣ Send welcome email |
| **Inputs / Outputs per Step** | Enables deterministic unit‑tests and contract checks. | *Validate* receives `payload`; outputs `validatedPayload` or `validationError`. |
| **Non‑functional Constraints** | Sets performance / security gates. | “All steps must complete < 500 ms; data at rest encrypted.” |
| **Failure & Retry Policy** | Guarantees reliability and defines blocker handling. | Retry up to 3× with exponential back‑off; on permanent failure raise a `workflow‑blocked` a
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Immediate containment**  
   - **Abort the stray worker** – send a termination signal to the container/VM that is still alive.  
   - **Mark the timed‑out step as *failed‑timeout*** in the workflow run record (status = `FAILED`, reason = `TIMEOUT`).  

2. **Gather evidence**  
   - Pull the worker’s stdout/stderr, system metrics, and any partial files from its sandbox.  
   - Retrieve the workflow engine logs for the timed‑out step (start‑time, timeout‑deadline, heartbeat timestamps).  
   - Store all artifacts in the run‑specific “debug bundle” bucket (`gs://wf‑debug/<run‑id>/`).

3. **Create a deterministic verification gate**  
   - Add a **post‑mortem gate** to the workflow definition that will run on the next execution:  
     ```yaml
     - name: verify‑no‑orphan‑workers
       type: verification
       condition: |
         all_workers == expected_workers
       on_failure: abort
     ```
   - This gate guarantees that any future stray workers cause an immediate abort before the next step starts.

4. **Open a scoped incident ticket**  
   - Title: *[WF‑<name>] Step X timed out, worker Y orphaned*  
   - Attach the debug bundle, logs, a
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.
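Classify-before-retry can be expressed as a function from a classification to the next action, so no code path reruns the workflow without a class. The class names mirror the five categories listed above; the action names and the mapping are illustrative assumptions, not Ricky's blocker codes.

```typescript
// The five classes named in this case, plus an explicit "unknown".
type FailureClass =
  | 'agent-runtime-opacity'
  | 'timeout'
  | 'environment-blocker'
  | 'workflow-structure-bug'
  | 'validation-strategy-mismatch'
  | 'unknown';

// Hypothetical action names.
type NextAction =
  | 'repair-artifact-then-rerun'
  | 'fix-environment-before-rerun'
  | 'revise-validation-strategy'
  | 'inspect-runtime-then-decide'
  | 'escalate-with-raw-evidence';

interface Classification {
  failureClass: FailureClass;
  evidence: string[]; // raw log excerpts, kept verbatim
  confidence: 'high' | 'low';
}

export function nextAction(c: Classification): NextAction {
  // Uncertain classes preserve evidence and stop instead of guessing at a retry.
  if (c.confidence === 'low' || c.failureClass === 'unknown') return 'escalate-with-raw-evidence';
  switch (c.failureClass) {
    case 'workflow-structure-bug':
      return 'repair-artifact-then-rerun'; // the only class that edits the workflow definition
    case 'environment-blocker':
      return 'fix-environment-before-rerun';
    case 'validation-strategy-mismatch':
      return 'revise-validation-strategy';
    case 'timeout':
    case 'agent-runtime-opacity':
      return 'inspect-runtime-then-decide';
    default:
      return 'escalate-with-raw-evidence';
  }
}
```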

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **before** any `local‑launch` entry point in a developer’s workspace. |
| **Inputs** | <ul><li>`repoRoot: string` – absolute path to the repository root.</li><li>`staleDirs: string[]` – default `[“.agent‑relay”, “.relay”, “.trajectories”]`.</li></ul> |
| **Outputs** | <ul><li>`staleFound: boolean` – true if any of the target directories contain files older than the configured freshness threshold.</li><li>`staleReport: string` – human‑readable markdown report.</li></ul> |
| **Success Criteria** | <ul><li>`staleFound === false` → workflow proceeds to the next stage.</li><li>`staleFound === true` → workflow aborts with a **Blocker** status and emits the `staleReport`.</li></ul> |
| **Freshness Threshold** |  **24 h** by default; configurable via `WORKFLOW_STALE_THRESHOLD_HOURS` env var. |
| **Deterministic Verification Gate** | A **unit‑test** (`detect‑stale‑state.test.ts`) that creates temporary directories with controlled timestamps and asserts both true/false paths. |
| **Review Artifact** | `det
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.
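A read-only probe for the three state directories named in this case might look like the sketch below. Treating any non-empty directory as suspect is an assumption (the real freshness rule is not specified here), and the directory lister is injectable so the logic can be exercised without touching a real workspace. The function never deletes or moves anything; it only reports paths and a quarantine recommendation.

```typescript
import { readdirSync } from 'node:fs';
import { join } from 'node:path';

// Directories named in this eval case.
export const RELAY_STATE_DIRS = ['.agent-relay', '.relay', '.trajectories'] as const;

export type ListDir = (path: string) => string[] | null;

// Default lister: directory entries, or null when the path is missing or not a directory.
export const listDirFs: ListDir = (path) => {
  try {
    return readdirSync(path);
  } catch {
    return null;
  }
};

export interface StaleStateReport {
  blockerClass: 'environment-contamination';
  observedPaths: string[];
  recommendation: string;
}

export function detectStaleRelayState(repoRoot: string, listDir: ListDir = listDirFs): StaleStateReport | null {
  const observedPaths: string[] = [];
  for (const dir of RELAY_STATE_DIRS) {
    const path = join(repoRoot, dir);
    const entries = listDir(path);
    if (entries !== null && entries.length > 0) observedPaths.push(path);
  }
  if (observedPaths.length === 0) return null;
  return {
    blockerClass: 'environment-contamination',
    observedPaths,
    recommendation:
      'Quarantine (move aside) or inspect these paths, or launch in an isolated workspace; nothing was deleted.',
  };
}
```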

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The marker is a lock file that prevents concurrent executions. It can be left behind when a previous run crashes, is killed, or when two developers start a run in the same workspace. Follow the deterministic repair flow below.

---

## 1️⃣ Verify the current state

| Command | Expected output |
|---------|-----------------|
| `ricky status` | Shows **idle** or **busy**. If it reports *busy* but you see no active process, the lock is stale. |
| `ps -ef \| grep ricky` (or `tasklist /FI "IMAGENAME eq ricky.exe"` on Windows) | No running `ricky`/`relay` processes. |

If a process **is** running and you really need to stop it, use `ricky stop` (graceful) or `kill <pid>` (force).

---

## 2️⃣ Locate the lock file

The lock is stored in the workspace root under the hidden `.ricky` directory:

    /
    ├─ .ricky/
    │  ├─ run.marker   ← lock file
    │  └─ run.pid      ← PID of the process that created the lock


---

## 3️⃣ Deterministic clean‑up procedure

| Step | Action | Verification |
|------|--------|---------------|
| **3.1** | **Confirm no live process** – `cat .ricky/run.pid` → PID. Run `ps -p <PI
...[truncated]

Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.
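The conflict handling reduces to a launch decision that refuses to start while a marker exists and surfaces whatever the marker reveals. The `RunMarker` fields and the `decideLaunch` name are hypothetical; the real marker location and format are not described in this case.

```typescript
// Hypothetical marker shape; the real marker location and format are not described here.
export interface RunMarker {
  path: string;
  runId?: string;
  status?: string;
}

export type LaunchDecision =
  | { action: 'launch' }
  | { action: 'blocked'; message: string; options: string[] };

// Never starts a competing run while a marker exists; the user decides how to clear it.
export function decideLaunch(marker: RunMarker | null): LaunchDecision {
  if (marker === null) return { action: 'launch' };
  const details = [`marker: ${marker.path}`];
  if (marker.runId) details.push(`run id: ${marker.runId}`);
  if (marker.status) details.push(`status: ${marker.status}`);
  return {
    action: 'blocked',
    message: `Another Ricky/Relay run appears to be active (${details.join(', ')}); not starting a competing run.`,
    options: ['inspect the active run', 'wait for it to finish', 'explicitly clear the marker, then launch again'],
  };
}
```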

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the exact patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched** – even when they contain HEREDOC‑style `node --input-type=module` snippets that embed literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Helper utilities (add once per repository)

Create (or update) `src/rickyWorkflowEnv.ts`:

```ts
// src/rickyWorkflowEnv.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.
 * This runs synchronously at import time, guaranteeing that any
 * subsequent `process.env` access sees the loaded values.
 *
```
...[truncated]

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.
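The key detail in this case is telling real top-level namespace imports apart from import-looking text inside step HEREDOCs. A hedged sketch that scans only the module's import prologue (leading imports, comments, and blank lines) and prepends the `rickyWorkflowFs` / `rickyWorkflowPath` aliases when they are absent. The helper names are hypothetical, and this is a line-based heuristic rather than a full parser; a production fix could instead walk `sourceFile.statements` from the TypeScript compiler API.

```typescript
const FS_ALIAS = "import * as rickyWorkflowFs from 'node:fs';";
const PATH_ALIAS = "import * as rickyWorkflowPath from 'node:path';";

const NAMESPACE_IMPORT = /^import\s+\*\s+as\s+([A-Za-z_$][\w$]*)\s+from\s+['"]([^'"]+)['"]/;
// An import declaration is complete once it ends in a module specifier.
const IMPORT_END = /(?:\bfrom\s*|^import\s*)['"][^'"]+['"]\s*;?\s*(?:\/\/.*)?$/;

/**
 * Collects `import * as <alias> from '<module>'` declarations from the import
 * prologue. Scanning stops at the first other statement, so import-looking text
 * inside later code (e.g. HEREDOCs in `.step({ command: ... })`) is never counted.
 */
export function prologueNamespaceImports(source: string): Map<string, string> {
  const aliases = new Map<string, string>();
  const lines = source.split(/\r?\n/);
  let inBlockComment = false;
  for (let i = 0; i < lines.length; i++) {
    const line = (lines[i] ?? '').trim();
    if (inBlockComment) {
      if (line.includes('*/')) inBlockComment = false;
      continue;
    }
    if (line === '' || line.startsWith('//') || (i === 0 && line.startsWith('#!'))) continue;
    if (line.startsWith('/*')) {
      inBlockComment = !line.includes('*/');
      continue;
    }
    if (!/^import\b/.test(line)) break; // first non-import statement ends the prologue
    // Join a multi-line import (e.g. `import {\n a,\n} from 'x';`) into one statement.
    let statement = line;
    while (!IMPORT_END.test(statement) && i + 1 < lines.length) {
      i++;
      statement += ' ' + (lines[i] ?? '').trim();
    }
    const match = NAMESPACE_IMPORT.exec(statement);
    if (match && match[1] && match[2]) aliases.set(match[1], match[2]);
  }
  return aliases;
}

/** Prepends the env-loader alias imports only when they are genuinely missing. */
export function ensureEnvLoaderAliases(source: string): string {
  const aliases = prologueNamespaceImports(source);
  const missing: string[] = [];
  // Checking the alias name (not the module) avoids emitting a duplicate binding.
  if (!aliases.has('rickyWorkflowFs')) missing.push(FS_ALIAS);
  if (!aliases.has('rickyWorkflowPath')) missing.push(PATH_ALIAS);
  if (missing.length === 0) return source;
  const header = missing.join('\n');
  // Keep a shebang on line one; everything else, including step HEREDOCs, stays byte-for-byte intact.
  if (source.startsWith('#!')) {
    const nl = source.indexOf('\n');
    if (nl === -1) return `${source}\n${header}\n`;
    return `${source.slice(0, nl + 1)}${header}\n${source.slice(nl + 1)}`;
  }
  return `${header}\n${source}`;
}
```

Because scanning stops at the first non-import statement, an `import * as rickyWorkflowFs from 'node:fs'` line inside a `.step({ command: ... })` HEREDOC is neither counted as an import nor rewritten.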

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. The contract assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| `aw` binary present | `which aw` | Path to `aw` |
| Correct repo root | `git rev-parse --show-toplevel` | Repo root path |
| Working tree clean | `git status --porcelain` | No output |
| Docker (or runtime) available | `docker version` | Client/Server version printed |
| Environment variables | `echo $AW_WORKFLOW_DIR $AW_ARTIFACTS_DIR` | Non‑empty values |

If any check fails, **block** the run and resolve the issue before proceeding.

---

## 2️⃣ Input contract

```json
{
  "workflow_name": "my_local_workflow",          // <‑ name of the .aw.yaml file (no extension)
  "failed_step": "process_data",                 // <‑ exact step identifier that failed
  "previous_run
```
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.
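The retry contract can be sketched as a loop with a hard attempt budget, a single tracking id, and resume-from-failed-step on each retry. All types and the `runWithAutoFix` name are hypothetical stand-ins for Ricky's run and repair hooks; `repairArtifact` is assumed to edit only the workflow artifact and to return `false` when nothing is repairable.

```typescript
// Hypothetical shapes; Ricky's real run/repair APIs are not shown in this case.
export interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

interface Resume {
  step: string;
  previousRunId: string;
}

export interface AttemptSummary extends AttemptResult {
  attempt: number;
  resumedFrom?: Resume;
  repaired: boolean;
}

export interface AutoFixOptions {
  autoFix: boolean;
  maxAttempts: number;
  trackingRunId: string; // one Ricky tracking id across every attempt
  run: (resume?: Resume) => AttemptResult;
  repairArtifact: (failure: AttemptResult) => boolean; // edits only the workflow artifact
}

export function runWithAutoFix(opts: AutoFixOptions): { trackingRunId: string; attempts: AttemptSummary[]; ok: boolean } {
  const budget = opts.autoFix ? Math.max(1, opts.maxAttempts) : 1; // auto-fix off => exactly one attempt
  const attempts: AttemptSummary[] = [];
  let resume: Resume | undefined;
  for (let n = 1; n <= budget; n++) {
    const result = opts.run(resume);
    const summary: AttemptSummary = { ...result, attempt: n, resumedFrom: resume, repaired: false };
    attempts.push(summary);
    if (result.ok || n === budget) break; // success, or the budget is spent
    summary.repaired = opts.repairArtifact(result);
    if (!summary.repaired) break; // nothing resolvable to repair: stop and report
    resume = result.failedStep ? { step: result.failedStep, previousRunId: result.runId } : undefined;
  }
  const last = attempts[attempts.length - 1];
  return { trackingRunId: opts.trackingRunId, attempts, ok: last !== undefined && last.ok };
}
```

With `autoFix: false` the budget collapses to one attempt and `repairArtifact` is never called, which matches the `--no-auto-fix` behavior required by the next case.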

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

| Step | Command / Action | What to Look For |
|------|------------------|-----------------|
| 1️⃣  | `ricky logs workflows/foo.ts --tail 100` | Recent error messages, stack traces, and the exact point where the workflow stopped. |
| 2️⃣  | `ricky run workflows/foo.ts --debug` | Verbose output that shows each **gate** (validation, side‑effect, retry) being entered and exited. |
| 3️⃣  | Open the workflow file | Verify the **contract** at the top of `foo.ts` – inputs, outputs, and required environment variables. |
| 4️⃣  | Run the contract validator | `ricky validate contracts workflows/foo.ts` – ensures the TypeScript types and JSON schema match. |
| 5️⃣  | Check the **deterministic verification gates** | Look for any `gate.verify(...)` calls that returned `false`. The log will include the gate name and the failing payload. |
| 6️⃣  | Inspect the **review artifact** (if any) | `ricky artifacts show foo-review.json` – contains the last successful review snapshot; compare it to the current state. |
| 7️⃣  | Run a single step in isolation | `ricky step run workflows/foo.ts --step <stepName> --no-auto-fix` – isolates the failure to a specific node. |
| 
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Execution Playbook for a Local TypeScript Workflow Artifact**

Below is the concrete, repeat‑able contract Ricky follows when a developer asks to run a TypeScript‑based workflow that lives in the *primary local path* (the repository root or the designated `workflows/` folder). The steps are written as a deterministic workflow that can be copied into a branch, reviewed, and merged without side‑effects.

---

## 1️⃣ Preconditions (Gate A – “Environment Ready”)

| Check | Expected State | Verification Command |
|-------|----------------|----------------------|
| Node runtime | `>=18.0.0` | `node -v` |
| npm / yarn | `>=9.0.0` (npm) or `>=1.22.0` (yarn) | `npm -v` / `yarn -v` |
| TypeScript compiler | `>=5.0.0` | `npx tsc -v` |
| `ts-node` installed globally **or** as a dev‑dependency | `>=10.0.0` | `npx ts-node -v` |
| Repository clean | No uncommitted changes in the primary path | `git status --porcelain` (must be empty) |
| Required secrets / env vars present | All keys listed in `.env.example` are defined | `dotenv -e .env -- printenv \| grep -E 'KEY1\|KEY2'` |

If any check fails, Ricky raises a **Blocker** with a short remediation note and aborts the run.

---

## 2️⃣ Art
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.
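A precheck for the SDK route might verify the Node version and `@agent-relay/sdk` up front and return the exact spawn argv for the evidence record. A hedged sketch, assuming the artifact runs under Node's `--experimental-strip-types` flag (added in Node 22.6.0); `planLocalSpawn` is a hypothetical name, and the resolution check uses `createRequire(...).resolve`, which can report a false negative if a package only exposes an ESM `import` export condition.

```typescript
import { createRequire } from 'node:module';
import { join } from 'node:path';

export interface SpawnPlan {
  ok: boolean;
  blockers: string[];
  command: string[]; // recorded verbatim in execution evidence
}

// --experimental-strip-types first shipped in Node v22.6.0.
function supportsStripTypes(version: string): boolean {
  const [major = 0, minor = 0] = version.split('.').map(Number);
  return major > 22 || (major === 22 && minor >= 6);
}

// Resolves the package the way a CommonJS require from `cwd` would.
function canResolve(specifier: string, cwd: string): boolean {
  try {
    createRequire(join(cwd, 'noop.js')).resolve(specifier);
    return true;
  } catch {
    return false;
  }
}

export function planLocalSpawn(
  artifactPath: string,
  cwd: string = process.cwd(),
  nodeVersion: string = process.versions.node,
): SpawnPlan {
  const blockers: string[] = [];
  if (!supportsStripTypes(nodeVersion)) blockers.push(`Node ${nodeVersion} lacks --experimental-strip-types (needs >= 22.6)`);
  if (!canResolve('@agent-relay/sdk', cwd)) blockers.push(`@agent-relay/sdk is not resolvable from ${cwd}`);
  // Spawn the current Node binary directly; the agent-relay CLI does not need to be on PATH.
  const command = [process.execPath, '--experimental-strip-types', artifactPath];
  return { ok: blockers.length === 0, blockers, command };
}
```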

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Protocol for Persistent Structural Failures**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven workflow that isolates the problem, prevents downstream impact, and creates a clear hand‑off for human owners. The protocol is split into **four phases** – Detection, Containment, Escalation, and Recovery – each with explicit contracts, verification gates, and artifact requirements.

---

## 1. Detection Phase  
| Step | Contract | Verification Gate | Artifact |
|------|----------|-------------------|----------|
| 1.1 Log failure event | `failure_event: {id, timestamp, component, severity, error_code}` | Event must be persisted in the central **Failure Ledger** (SL‑001) within 5 s. | `failure_event.json` |
| 1.2 Run automated health‑check suite | `health_check: {component, status, metrics[]}` | All checks return `PASS` or `FAIL` with a deterministic exit code. | `health_check_report.yaml` |
| 1.3 Compare against *Fix‑Attempt Contract* (FAC‑v2) | `fac_applied: true/false` | If `fac_applied` is true but `health_check` still reports `FAIL`, transition to Containment. | `fac_audit.log` |

---

#
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.
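An escalation can be made structurally distinct from a generic failure by refusing to render one without the fields listed above. The report shape and `renderEscalation` are hypothetical; the point is that an empty attempted-fix list or missing evidence fails loudly instead of degrading to "something went wrong".

```typescript
// Hypothetical escalation record; field names are illustrative.
export interface EscalationReport {
  blockerClass: string;
  attemptedFixes: string[];
  failedValidation: string; // the check that still fails after the fix
  evidencePaths: string[]; // logs and artifacts from every attempt, kept rather than discarded
  recommendation: string;
}

export function renderEscalation(r: EscalationReport): string {
  // Reject incomplete escalations instead of emitting a generic failure message.
  const missing = (Object.keys(r) as (keyof EscalationReport)[]).filter((k) => {
    const value = r[k];
    return Array.isArray(value) ? value.length === 0 : value.trim() === '';
  });
  if (missing.length > 0) {
    throw new Error(`escalation is missing: ${missing.join(', ')}`);
  }
  return [
    `Escalation (not a generic failure): ${r.blockerClass}`,
    `Attempted fixes:\n${r.attemptedFixes.map((f, i) => `  ${i + 1}. ${f}`).join('\n')}`,
    `Still failing: ${r.failedValidation}`,
    `Evidence:\n${r.evidencePaths.map((p) => `  - ${p}`).join('\n')}`,
    `Recommended next step: ${r.recommendation}`,
  ].join('\n');
}
```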

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary (Generated 2026‑05‑22)**  

| Metric | Value | Interpretation |
|--------|-------|----------------|
| **Total Runs (last 30 days)** | 1 842 | ‑ |
| **Successful Runs** | 1 617 (87.8 %) | Healthy |
| **Failed Runs** | 225 (12.2 %) | Review required |
| **Runs with Blockers** | 38 (2.1 %) | Immediate attention |
| **Average Runtime** | 4 m 23 s | Within SLA (≤ 5 m) |
| **95th‑percentile Runtime** | 7 m 12 s | Slightly above SLA – investigate outliers |
| **Deterministic Verification Pass Rate** | 96.4 % | 3.6 % non‑deterministic failures |
| **Contract‑Compliance Checks** | 1 842 / 1 842 (100 %) | All runs emitted contract artifacts |
| **Review Artifacts Produced** | 1 842 (PRs, logs, diff reports) | Complete |

---

### 1. Top 3 Failure Categories (by count)

| Category | Runs | % of Failures | Typical Root Cause |
|----------|------|---------------|--------------------|
| **Schema Mismatch** | 97 | 43 % | Upstream data‑model change not reflected in contract version |
| **Determinism Violation** | 68 | 30 % | Non‑idempotent external API calls; missing mock stubs |
| **Resource Exhaustion** | 42 | 19 % | CI runner memory caps exceeded on larg
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.
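A digest built from normalized evidence keeps failure scopes in separate buckets and computes duration percentiles without mutating its input. `RunEvidence` is a hypothetical subset of `WorkflowRunEvidence`, whose real fields are not shown here; the 95th percentile uses the nearest-rank method.

```typescript
// Hypothetical subset of WorkflowRunEvidence; the real normalized fields are not shown here.
export interface RunEvidence {
  readonly workflowId: string;
  readonly status: 'passed' | 'failed' | 'blocked';
  readonly failureClass?: string;
  readonly failureScope?: 'environment' | 'workflow-logic';
  readonly durationMs: number;
}

export interface Digest {
  runs: number;
  // Failure counts per class, in separate buckets by scope.
  environmentFailures: Record<string, number>;
  workflowLogicFailures: Record<string, number>;
  unscopedFailures: Record<string, number>;
  p95DurationMs: number;
  slowestWorkflow?: string;
}

// Nearest-rank percentile over a sorted copy, so the input array is never mutated.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)] ?? 0;
}

export function buildDigest(evidence: readonly RunEvidence[]): Digest {
  const environmentFailures: Record<string, number> = {};
  const workflowLogicFailures: Record<string, number> = {};
  const unscopedFailures: Record<string, number> = {};
  let slowest: RunEvidence | undefined;
  for (const run of evidence) {
    if (slowest === undefined || run.durationMs > slowest.durationMs) slowest = run;
    if (run.status === 'passed') continue;
    const bucket =
      run.failureScope === 'environment'
        ? environmentFailures
        : run.failureScope === 'workflow-logic'
          ? workflowLogicFailures
          : unscopedFailures;
    const key = run.failureClass ?? 'unclassified';
    bucket[key] = (bucket[key] ?? 0) + 1;
  }
  return {
    runs: evidence.length,
    environmentFailures,
    workflowLogicFailures,
    unscopedFailures,
    p95DurationMs: percentile(evidence.map((r) => r.durationMs), 95),
    slowestWorkflow: slowest?.workflowId,
  };
}
```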

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

The repaired artifact must be a **self‑contained, deterministic workflow definition** that satisfies every static‑analysis and runtime‑precheck rule enforced by the AgentWorkforce CI pipeline. Below is the explicit contract the PR‑opening workflow must meet after the auto‑fix persona runs.

| # | Contract Element | Requirement | How to Verify |
|---|------------------|-------------|----------------|
| 1 | **Module Import** | The workflow must import the **exact** package name `@agent-relay/github-primitive` **once**, using a named import that includes `createGitHubStep`. No stray or duplicate imports are allowed. | `npm ls @agent-relay/github-primitive` → version resolved; AST scan for duplicate `import` statements. |
| 2 | **Step Factory Usage** | `createGitHubStep` must be called **exactly once** to produce a step object that is later added to the workflow’s `steps` array. The call must pass a **typed configuration object** with the required keys: `action: "createPullRequest"`, `repo`, `baseBranch`, `headBranch`, `title`, `body`. All values must be **serializable primitives** (string/boolean) or references to earlier step outputs. | Static type‑che
...[truncated]

Must

Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
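The regression rules above are mechanical enough to check before a repair is applied. A hedged sketch using plain text counts (so a `.step(` inside a HEREDOC would also be counted); `reviewRepair` is a hypothetical name, and the placeholder check simply looks for all three placeholder step names in a repair with three or fewer steps.

```typescript
const GITHUB_MARKERS = ['@agent-relay/github-primitive', 'GitHubStepExecutor'] as const;
const PLACEHOLDER_STEPS = ['prepare-context', 'runtime-precheck', 'final-signoff'];

function count(source: string, pattern: RegExp): number {
  return source.match(pattern)?.length ?? 0;
}

export function reviewRepair(original: string, repaired: string): { accept: boolean; reasons: string[] } {
  const reasons: string[] = [];
  // PR-shipping surface must survive the repair.
  for (const marker of GITHUB_MARKERS) {
    if (original.includes(marker) && !repaired.includes(marker)) reasons.push(`dropped ${marker}`);
  }
  const ghBefore = count(original, /createGitHubStep\(/g);
  const ghAfter = count(repaired, /createGitHubStep\(/g);
  if (ghAfter < ghBefore) reasons.push(`createGitHubStep calls fell from ${ghBefore} to ${ghAfter}`);
  if (original.includes('.run(') && !repaired.includes('.run(')) reasons.push('dropped the .run(...) invocation');
  // Retain at least ceil(N / 2) steps when the original declared four or more.
  const stepsBefore = count(original, /\.step\(/g);
  const stepsAfter = count(repaired, /\.step\(/g);
  const minimum = Math.ceil(stepsBefore / 2);
  if (stepsBefore >= 4 && stepsAfter < minimum) {
    reasons.push(`step count fell from ${stepsBefore} to ${stepsAfter} (minimum ${minimum})`);
  }
  if (stepsAfter <= 3 && PLACEHOLDER_STEPS.every((name) => repaired.includes(name))) {
    reasons.push('repair collapsed to the prepare-context / runtime-precheck / final-signoff placeholder');
  }
  return { accept: reasons.length === 0, reasons };
}
```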

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel. It follows the Ricky repository conventions (AGENTS.md) and the shared authoring rules:

| Aspect | Decision |
|--------|----------|
| **Entry point** | `/ricky debug <workflow‑id>` (Slash command) |
| **Response type** | Ephemeral message with interactive blocks; optional thread for long‑running logs |
| **Authentication** | OAuth 2.0 token scoped to `workflow:read`, `workflow:write`, `slack:chat:write` |
| **State persistence** | `debug_sessions` table (PostgreSQL) – one row per session, TTL = 48 h |
| **Visibility** | Only the invoking user and members of the workflow’s “owner” Slack user group can view the session |
| **Error handling** | Deterministic “blocker” messages (see §5) and a fallback “Report issue” button that opens a pre‑filled GitHub issue template |

---

## 2. Interaction Flow (Contract)

| Step | Slack UI | Backend Action | Success → Next | Failure → Blocker |
|------|----------|----------------|----------------
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific fo

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/pipeline.test.ts`:
- Around lines 1075-1076: Remove the legacy tokenized assertion and keep only the normalized shell command assertion: delete the expect checking for "'ls-files', '--others', '--exclude-standard'" and ensure the test retains expect(gate(artifact, 'git-diff-gate').command).toContain('git ls-files --others --exclude-standard'); so the test validates the normalized command emitted by gate(artifact, 'git-diff-gate').command.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d78644e-474f-42ec-b92b-2850309ce427

📥 Commits

Reviewing files that changed from the base of the PR and between 2834519 and 91fb8f7.

📒 Files selected for processing (2)

src/product/generation/pipeline.test.ts
src/product/generation/template-renderer.ts

✅ Files skipped from review due to trivial changes (1)

src/product/generation/template-renderer.ts

coderabbitai · 2026-05-22T09:33:03Z

    expect(gate(artifact, 'git-diff-gate').command).toContain("'ls-files', '--others', '--exclude-standard'");
+    expect(gate(artifact, 'git-diff-gate').command).toContain('git ls-files --others --exclude-standard');


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Remove the legacy ls-files token assertion to avoid contradictory expectations.

Line 1075 still enforces the old tokenized format, while line 1076 enforces the normalized shell command. Because both assertions must hold, a correct implementation that emits only the normalized form would fail. Keep only the normalized expectation.

Suggested change

-    expect(gate(artifact, 'git-diff-gate').command).toContain("'ls-files', '--others', '--exclude-standard'");
     expect(gate(artifact, 'git-diff-gate').command).toContain('git ls-files --others --exclude-standard');
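The two expected formats plausibly come from serializing the same argument vector in two different ways. This is a sketch under assumptions: the `argv` array and the `toTokenList` and `toShellCommand` helpers are hypothetical and are not taken from `template-renderer.ts`.

```typescript
// Hypothetical argv for the untracked-files evidence command.
const argv: readonly string[] = ["git", "ls-files", "--others", "--exclude-standard"];

// Legacy style (assumed): each argument single-quoted and comma-separated,
// as it might appear inside a spawn(...) call embedded in generated code.
function toTokenList(args: readonly string[]): string {
  return args.map((a) => `'${a}'`).join(", ");
}

// Normalized style (assumed): a plain shell command line. Arguments outside a
// conservative safe character set are single-quoted, and any embedded single
// quote is escaped as '\'' so the shell reads it literally.
function toShellCommand(args: readonly string[]): string {
  return args
    .map((a) => (/^[A-Za-z0-9_\-./:=]+$/.test(a) ? a : `'${a.replace(/'/g, `'\\''`)}'`))
    .join(" ");
}

const legacy = toTokenList(argv);
const normalized = toShellCommand(argv);
console.log(legacy);     // 'git', 'ls-files', '--others', '--exclude-standard'
console.log(normalized); // git ls-files --others --exclude-standard
```

A renderer that emits only the `toShellCommand` form satisfies just one of the two original `toContain` checks, which is the contradiction the suggested change removes.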



coderabbitai Bot reviewed May 21, 2026


devin-ai-integration Bot reviewed May 21, 2026


fix(generation): expose untracked diff evidence text

91fb8f7

kjgbot force-pushed the fix/local-broker-startup-timeout-classification branch from 730ec5c to 91fb8f7 on May 22, 2026 09:30

coderabbitai Bot reviewed May 22, 2026


khaliqgant merged commit 424c5ed into main May 22, 2026
3 checks passed

khaliqgant deleted the fix/local-broker-startup-timeout-classification branch May 22, 2026 10:01


Classify local broker startup stalls #131

khaliqgant merged 1 commit into main from fix/local-broker-startup-timeout-classification

miyaontherelay commented May 21, 2026 (edited)

coderabbitai Bot commented May 21, 2026 (edited)

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning, 1 inconclusive)


coderabbitai Bot left a comment


devin-ai-integration Bot left a comment


github-actions Bot commented May 21, 2026 (edited)

3️⃣ Initialise a New Workspace (Contract: `workspace:init`)

3. Provide a Minimal Spec Skeleton (for the user to fill)


coderabbitai Bot left a comment


coderabbitai Bot May 22, 2026


2 participants



✅ Devin Review (devin-ai-integration Bot): No Issues Found

Ricky Eval Review (github-actions Bot)

Human Review Cases
