fix: avoid optional env preflight blockers by kjgbot · Pull Request #118 · AgentWorkforce/ricky

kjgbot · 2026-05-15T21:10:36Z

Summary

Treat common BYOH/runtime vars like OPENAI_API_KEY and ANTHROPIC_API_KEY as optional during deterministic MISSING_ENV_VAR auto-fix instead of injecting workflow-start assertions for them.
Relax over-broad shell REQUIRED_VARS preflight arrays into OPTIONAL_VARS warnings when they only contain ambient local runtime credentials.
Add regression coverage for the OPENAI_API_KEY failure shape that blocked local Ricky workflow reruns.
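
The split described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in `src/local/auto-fix-loop.ts`: the helper name `partitionEnvVars`, the result shape, and the `DATABASE_URL` example are hypothetical, and only the two API key names come from the PR description.

```typescript
// Hypothetical sketch: only OPENAI_API_KEY and ANTHROPIC_API_KEY are named in
// the PR description; the real set may contain other names.
const AMBIENT_OPTIONAL: ReadonlySet<string> = new Set([
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
]);

interface EnvVarPartition {
  assertions: string[];      // vars that still get a workflow-start assertion
  ambientOptional: string[]; // vars that only produce a warning when unset
}

// Split the detected missing vars into hard requirements and ambient credentials.
function partitionEnvVars(detected: readonly string[]): EnvVarPartition {
  const assertions: string[] = [];
  const ambientOptional: string[] = [];
  for (const name of detected) {
    (AMBIENT_OPTIONAL.has(name) ? ambientOptional : assertions).push(name);
  }
  return { assertions, ambientOptional };
}

// DATABASE_URL is an illustrative, made-up required variable.
console.log(partitionEnvVars(["OPENAI_API_KEY", "DATABASE_URL"]));
```

Under these assumptions, only `DATABASE_URL` would receive an assertion, so an unset `OPENAI_API_KEY` no longer blocks a local rerun.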

Verification

npx vitest run src/local/auto-fix-loop.test.ts
npx tsc --noEmit --module NodeNext --moduleResolution NodeNext --target ES2022 --types node,vitest/globals --skipLibCheck src/local/auto-fix-loop.ts src/local/auto-fix-loop.test.ts
git diff --check -- src/local/auto-fix-loop.ts src/local/auto-fix-loop.test.ts

Known unrelated check

`npm run typecheck -- --pretty false` currently fails on `src/scheduled-agent.ts` because the `@agent-relay/agent` types are not resolved and `ctx`/`event` are implicitly `any`.

coderabbitai · 2026-05-15T21:10:47Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 36bfcd6 and c0f08e0.

📒 Files selected for processing (2)

src/local/auto-fix-loop.test.ts
src/local/auto-fix-loop.ts

📝 Walkthrough

The PR introduces support for treating specific environment variables (BYOH/runtime provider credentials) as optional during the deterministic repair flow. A constant defines which variables are ambient-optional, the repair preflight splits required vars into assertion and optional sets, shell logic is rewritten to relax missing-var handling, and tests verify the behavior.

Changes

Ambient Optional Environment Variables in Deterministic Repair

| Layer / File(s) | Summary |
|-----------------|---------|
| Ambient optional constant and preflight integration<br>`src/local/auto-fix-loop.ts` | Introduces `AMBIENT_OPTIONAL_ENV_VARS` constant identifying environment variables to treat as optional; `repairMissingEnvVarPreflight` splits detected required vars into assertion and ambient-optional sets, injecting fast assertions only for non-ambient vars. |
| Shell preflight rewriting for optional variables<br>`src/local/auto-fix-loop.ts` | Adds `relaxAmbientOptionalEnvPreflight` and helper functions to rewrite the preflight shell script: converts `REQUIRED_VARS` to `OPTIONAL_VARS`, changes missing-var echo label from `MISSING_ENV_VAR` to `OPTIONAL_ENV_VAR_NOT_SET`, and removes `exit 1` behavior for unset optional vars. |
| Test coverage and test helper updates<br>`src/local/auto-fix-loop.test.ts` | Updates `missingEnvEvidence` helper to accept an optional variable name parameter; adds two new tests verifying ambient credentials don't become workflow-start blockers and that over-broad required-variables preflights are rewritten to optional-var flows. |
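
To make the shell rewrite concrete, here is a simplified sketch of the case where every name in `REQUIRED_VARS` is an ambient credential. It is an illustration under assumptions, not the PR's implementation: the function name `relaxAllAmbientPreflight` is hypothetical, the sample preflight script is a guessed shape rather than one taken from the repository, and the input is assumed to contain nothing but the preflight itself.

```typescript
// Simplified sketch of the all-ambient case: the array is renamed, the echo
// label changes, and the hard failure is dropped so the loop only warns.
// Assumption: `script` holds just the preflight and no unrelated shell code.
function relaxAllAmbientPreflight(script: string): string {
  return script
    .split("\n")
    .filter((line) => line.trim() !== "exit 1") // drop the hard failure
    .map((line) =>
      line
        .replace(/REQUIRED_VARS/g, "OPTIONAL_VARS")
        .replace("MISSING_ENV_VAR", "OPTIONAL_ENV_VAR_NOT_SET"),
    )
    .join("\n");
}

// Hypothetical preflight shape, written for this example only.
const before = [
  "REQUIRED_VARS=(OPENAI_API_KEY ANTHROPIC_API_KEY)",
  'for var in "${REQUIRED_VARS[@]}"; do',
  '  if [ -z "${!var:-}" ]; then',
  '    echo "MISSING_ENV_VAR: $var"',
  "    exit 1",
  "  fi",
  "done",
].join("\n");

console.log(relaxAllAmbientPreflight(before));
```

The rewritten script still iterates over the same names, but an unset variable now prints `OPTIONAL_ENV_VAR_NOT_SET: ...` and the loop continues instead of exiting.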

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

AgentWorkforce/ricky#49: Both PRs modify the MISSING_ENV_VAR auto-fix/repair flow in src/local/auto-fix-loop.ts by changing deterministic env-preflight behavior (this PR relaxes specific env vars to optional, while the related PR changes classification/extraction logic).
AgentWorkforce/ricky#47: Both PRs handle MISSING_ENV_VAR "environment prerequisite" blockers in auto-fix-loop.ts: this PR relaxes missing-env failures for ambient optional vars, while the related PR escalates on MISSING_ENV_VAR.
AgentWorkforce/ricky#93: The main PR's updates to preflight/asserter env-var handling are in the same codepath as the related PR's changes to assertRickyWorkflowEnv behavior during --start-from resumes.

🐰 With ambient credentials now floating free,
No more hard fails when they don't agree,
Optional whispers replace stern demands,
Preflight scripts now hold gentle hands. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title 'fix: avoid optional env preflight blockers' accurately summarizes the main change of treating ambient environment variables as optional rather than required blockers in the deterministic auto-fix process. |
| Description check | ✅ Passed | The description is well-structured and directly relates to the changeset, explaining the rationale for treating BYOH/runtime vars as optional, the specific changes made, and providing verification steps. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

cubic-dev-ai

No issues found across 2 files

devin-ai-integration

Devin Review found 1 potential issue.

View 4 additional findings in Devin Review.

devin-ai-integration · 2026-05-15T21:17:22Z

+function relaxAmbientOptionalEnvPreflight(content: string, optionalEnvVars: string[]): string {
+  if (optionalEnvVars.length === 0) return content;
+
+  const optionalSet = new Set(optionalEnvVars);
+  const lines = content.split('\n');
+  let changed = false;
+
+  for (let index = 0; index < lines.length; index += 1) {
+    const line = lines[index] ?? '';
+    const match = /^(\s*)REQUIRED_VARS=\(([^)]*)\)/.exec(line);
+    if (!match) continue;
+
+    const [, indent = '', body = ''] = match;
+    const names = body.split(/\s+/).filter(Boolean);
+    const optionalInArray = names.filter((name) => optionalSet.has(name) || AMBIENT_OPTIONAL_ENV_VARS.has(name));
+    if (optionalInArray.length === 0) continue;
+
+    const optionalInArraySet = new Set(optionalInArray);
+    const requiredInArray = names.filter((name) => !optionalInArraySet.has(name));
+    if (requiredInArray.length === 0) {
+      lines[index] = `${indent}OPTIONAL_VARS=(${optionalInArray.join(' ')})`;
+      relaxFollowingEnvLoop(lines, index + 1);
+    } else {
+      const loopLines = optionalEnvLoopLines(indent, optionalInArray);
+      lines[index] = `${indent}REQUIRED_VARS=(${requiredInArray.join(' ')})`;
+      lines.splice(index + 1, 0, ...loopLines);
+      index += loopLines.length;
+    }
+    changed = true;
+  }
+
+  return changed ? lines.join('\n') : content;
+}


🔴 relaxAmbientOptionalEnvPreflight uses regex on raw TypeScript content instead of AST, violating AGENTS.md "Source-Text Analysis" rule

The new relaxAmbientOptionalEnvPreflight function (line 1081) and its helper relaxFollowingEnvLoop (line 1126) detect and modify shell patterns (REQUIRED_VARS=(...), REQUIRED_VARS[@], MISSING_ENV_VAR:, exit 1, done) by scanning raw TypeScript file content with regex and substring checks. This violates the mandatory AGENTS.md rule at line 179: "Do not use substring matches, line-anchored regex, or comment-stripping heuristics."

The regex /^(\s*)REQUIRED_VARS=\(([^)]*)\)/ at src/local/auto-fix-loop.ts:1090 is tested against every line of the file, so it will match any line that begins with REQUIRED_VARS=(...) (after optional indentation) wherever that line sits, including inside TypeScript comments, .step({ task: ... }) string literals, or multi-line template expressions that describe (but are not) shell command bodies. AGENTS.md specifically warns this is the bug class they keep shipping: "a substring or regex that looked right against the file's 'normal' code matches the same characters inside a string literal, a comment, a fenced code block, a HEREDOC, or a multi-line declaration."

Unbounded modification consequence in relaxFollowingEnvLoop

If the regex falsely matches REQUIRED_VARS=(...) in a non-shell context (e.g. a task description or comment), relaxFollowingEnvLoop sets inLoop = true and scans all remaining lines in the file, replacing MISSING_ENV_VAR: strings and splicing out exit 1 lines until it finds a line that trims to exactly done. If no such done line exists (because the match was not inside a real shell for loop), the function corrupts the rest of the TypeScript artifact.

The correct approach per AGENTS.md would be to use the TypeScript AST (already available via import ts from 'typescript') to isolate .step({ command: ... }) string bodies, then apply the shell-pattern regex only within those proven-inert command regions — exactly as hasRickyWorkflowAliasImport (src/local/auto-fix-loop.ts:1206) already does for import detection.
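
As a rough illustration of the bounding half of that suggestion, the sketch below relaxes the loop only when a terminating `done` is found inside the text it was given, and otherwise returns the input untouched. It assumes `commandBody` is the string of a single `.step({ command: ... })` that an AST pass has already extracted; that extraction step is not shown, the function name `relaxEnvLoopBounded` is hypothetical, and for brevity every name in the array is treated as optional.

```typescript
// Hypothetical bounded variant of the loop relaxation. `commandBody` is assumed
// to be one shell command string already isolated by an AST walk (not shown).
function relaxEnvLoopBounded(commandBody: string): string {
  const lines = commandBody.split("\n");
  const start = lines.findIndex((line) => /^\s*REQUIRED_VARS=\(/.test(line));
  if (start === -1) return commandBody;

  // Find the loop terminator first; if there is none, change nothing.
  let end = -1;
  for (let i = start + 1; i < lines.length; i += 1) {
    if ((lines[i] ?? "").trim() === "done") {
      end = i;
      break;
    }
  }
  if (end === -1) return commandBody;

  const out: string[] = [];
  for (let i = 0; i < lines.length; i += 1) {
    const line = lines[i] ?? "";
    if (i < start || i > end) {
      out.push(line); // outside the preflight loop: left exactly as written
      continue;
    }
    if (line.trim() === "exit 1") continue; // drop the hard failure inside the loop
    out.push(
      line
        .replace(/REQUIRED_VARS/g, "OPTIONAL_VARS")
        .replace("MISSING_ENV_VAR:", "OPTIONAL_ENV_VAR_NOT_SET:"),
    );
  }
  return out.join("\n");
}
```

Because the terminator is located before any line is modified, a false match with no `done` becomes a no-op, and an `exit 1` that appears after the loop is preserved.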

Prompt for agents

The functions relaxAmbientOptionalEnvPreflight (line 1081) and relaxFollowingEnvLoop (line 1126) in src/local/auto-fix-loop.ts scan raw TypeScript file content with regex to find and modify shell REQUIRED_VARS=(...) patterns. This violates the AGENTS.md Source-Text Analysis rule which mandates grammar-aware parsers for text inspection. The fix should:

1. Use the TypeScript AST (ts.createSourceFile, already imported and used elsewhere in this file, e.g. hasRickyWorkflowAliasImport at line 1206 and parseWorkflowStepGraph at line 192) to walk .step() call expressions and locate the command property's string value.
2. Extract the command string body from the AST node.
3. Apply the REQUIRED_VARS regex and loop-relaxation logic ONLY on the extracted shell command text.
4. Replace the command string in the content with the modified version.

This ensures REQUIRED_VARS=(...) appearing in TypeScript comments, task descriptions, or other string literals is not falsely matched and modified. It also bounds relaxFollowingEnvLoop to the shell command body, preventing it from scanning past the command boundary into TypeScript code.

github-actions · 2026-05-15T22:00:13Z

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-15T21-11-13-539Z-HEAD-provider
Mode: provider
Git SHA: f1e7c4c

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the repository in line with the new product spec: *Ricky now leverages the Agent Assistant (AA) for all high‑level orchestration, intent routing, and validation.*  

The package follows **AGENTS.md** conventions, the **workflow standards**, and the **shared authoring rules** (branch naming, PR size, review artifacts, verification gates, and 80‑100 validation loops).

---

## 1. Scope & Branch Layout
| Item | Description |
|------|-------------|
| **Branch** | `feature/aa‑deep‑integration` |
| **PR Title** | `feat(ricky): embed Agent Assistant throughout workflow orchestration` |
| **PR Description** | Summarizes spec change, lists modified files, outlines verification gates, and links to review checklist. |
| **Scope** | - Docs (`README.md`, `docs/architecture.md`, `docs/usage.md`) <br> - Core code (`ricky/orchestrator.py`, `ricky/validation.py`) <br> - Tests (`tests/test_orchestrator_aa.py`, `tests/test_validation_aa.py`) |
| **Boundaries** | No changes to unrelated modules (`ricky/utils/*`, CI config, packaging). |

---

## 2. 
...[truncated]

Must

Ground claims in real package imports and runtime paths.
Distinguish current implementation from target architecture.
Identify which Agent Assistant primitive is actually exercised.

Must Not

Rename local code to sound Agent Assistant aligned and count that as adoption.
Claim broad Agent Assistant native behavior from documentation-only alignment.
Blur target architecture with landed behavior.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑15)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Workflow contracts defined** | 12 of 18 active agents have a `turn‑context` contract in their `workflow.yaml`. | 100 % of agents must declare a contract. | Add contracts to the remaining 6 agents (e.g., `data‑ingest`, `alert‑router`, `nlp‑summarizer`). |
| **Deterministic verification gates** | 8 agents run a pre‑merge gate that validates `turn‑context` schema compliance; 4 rely on manual review. | All PRs must pass an automated `turn‑context` schema test. | Implement the `ricky-verify-turn-context` CI step for the 4 lagging pipelines. |
| **Review artifacts** | 65 % of PRs include a “Turn‑Context Impact” section; 35 % lack it. | 100 % compliance. | Add a PR template enforcement rule (`.github/PULL_REQUEST_TEMPLATE.md`) that makes the section mandatory. |
| **Validation loops** | Average of 42 validation loops per turn‑context change (target 80‑100). | 80‑100 loops to ensure stability before merge. | Increase loop count by enabling the `ricky‑loop‑tester` in CI for all agents; schedule nig
...[truncated]

Must

Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
Record compact provenance through generation decisions or coordinator metadata.
Keep the shared turn context internal to the adapter boundary.

Must Not

Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
Drop Ricky-specific workflow metadata during envelope assembly.
Treat turn context as a product decision engine.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities—workflow generation, validation, debugging, and the staged‑CLI UX with blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns *authoring* contracts (template selection, parameter binding). Adding generation keeps the “create‑first” path in a single, deterministic service. | Guarantees that every generated workflow immediately satisfies the **generation contract** (required inputs, output schema) before it reaches the validation stage. |
| **Workflow Validation** | Validation is the deterministic gate that enforces the **validation contract** (schema, policy, idempotency). Centralizing it prevents divergent validation logic across CLI, UI, or external scripts. | Provides a single source of truth for pass/fail decisions, enabling reliable **80‑to‑100 validation loops** and clear evidence collection. |
| **Workflow Debugging**
...[truncated]

Must

Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
Make extraction follow typed, tested, live product proof.

Must Not

Generalize workflow-specific behavior prematurely.
Adopt moving shared seams merely for architectural purity.
Lose the precise local-first staged workflow UX.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2026*  

| Phase | Dates | Scope | Concrete Contract (Inputs → Outputs) | Verification Gates | Review Artifacts | Success Metrics |
|-------|-------|-------|--------------------------------------|---------------------|------------------|-----------------|
| **1️⃣ Initiation & Baseline** | 1 Jul – 7 Jul | • Stakeholder alignment <br>• Baseline environment snapshot | **Contract A**: <br>*Input*: Current Ricky repo (main), existing CI/CD pipelines, documented SLA. <br>*Output*: Baseline artifact package (tar.gz) + “Current‑State Report”. | **Gate A** – All stakeholders sign‑off on baseline report (Slack #ricky‑adopt). | • Baseline Report (PDF) <br>• Signed sign‑off sheet (Google Doc) | 100 % of required stakeholders (Product, Ops, Security) signed. |
| **2️⃣ Feature Freeze & Branching** | 8 Jul – 14 Jul | • Create a scoped feature branch for the slice (ricky‑v2‑adopt). <br>• Freeze non‑slice changes on main. | **Contract B**: <br>*Input*: Baseline artifact, feature list (see §3). <br>*Output*: Branch `ricky-v2-adopt` with protected PR rules (2‑reviewer + CI). | **Gate B** – Branch protec
...[truncated]

Must

Pick exactly one real shared seam to evaluate or adopt.
Define a live Ricky product path that will prove the adoption.
Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
Skip the comparison/evaluation step for mature Ricky-local seams.
Treat adoption as successful without a live product-path proof.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky‑Assist : Future Slack / Web Support Package  
*Agent Assistant (AA) v2.3 – “Ricky‑Assist”*  

| Component | Purpose | Contract (AA‑Spec) | Verification Gate |
|-----------|---------|--------------------|-------------------|
| **AA‑SlackBot** | Real‑time Slack interaction for workflow queries, status checks, and issue triage. | `SlackBot(v:2.3) → {msg: string, action: enum[QUERY, TRIGGER, ESCALATE]}`<br>All inbound messages must match `SlackMessageSchema` (user‑id, channel‑id, timestamp, text ≤ 4000 chars). | Unit test `msg_schema_test`, integration test against a sandbox workspace, contract‑test `slackbot_contract.yaml`. |
| **AA‑WebPortal** | Self‑service web UI (React + Typescript) for creating, reviewing, and approving workflow PRs. | `WebPortal(v:2.3) → {view: enum[Dashboard, PR, Review, Logs], payload: JSON}`<br>All API calls must conform to `OpenAPI 3.1` spec `ricky‑api.yaml`. | End‑to‑end Cypress suite (≥ 120 tests), OpenAPI validator, accessibility audit (WCAG 2.1 AA). |
| **AA‑Orchestrator** | Central dispatcher that normalises Slack and Web requests into **AA‑Commands**. | `Orchestrator(v:2.3) → {cmd: Command, ctx: Context}`<br>`Command` must be one of `{CreatePR
...[truncated]

Must

Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
Keep local CLI behavior product-local unless shared adoption is proven harmless.
Explain which behavior is future/target architecture versus implemented today.

Must Not

Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
Let future surface abstractions distort the current CLI contract.
Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"
```

without the `--run` flag?

| Step | Description | Outcome |
|------|-------------|---------|
| 1️⃣ | Parse arguments – Ricky reads `--mode local` and the free‑form `--spec` string. | The spec is stored as the desired workflow contract but no execution is triggered. |
| 2️⃣ | Spec‑only mode – In the absence of `--run`, Ricky enters validation‑only mode. | It treats the request as a design‑time operation: generate, lint, and verify the workflow definition, then stop. |
| 3️⃣ | Workflow generation – Ricky invokes the Workflow Generator component, which translates the natural‑language spec into a concrete YAML/JSON workflow contract following the AGENTS / AGENTWORKFORCE schema. | You receive a file (e.g., `package-checks.workflow.yaml`) containing the full pipeline definition (steps, inputs, outputs, verification gates, etc.). |
| 4️⃣ | Static analysis & linting – The generated contract is run through the Workflow Linter and Schema Validator. | Any syntax errors, missing required fields, or policy violations are reported as diagnostics
...[truncated]


Must

Say generation is the default and execution was not requested.
Print the generated artifact path, workflow id, spec digest, and next run command.
Avoid showing execution evidence for a generation-only request.

Must Not

Imply the workflow ran automatically.
Present a generation-only result as execution success.
Hide the opt-in commands for running the artifact.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding workflow that a new user should run the first time they invoke the `ricky` CLI. It follows the **Ricky repository conventions** (AGENTS.md, workflow standards, shared authoring rules) and includes:

* **Deterministic verification gates** – each step exits with a clear status code and logs a JSON‑structured result.
* **Review artifacts** – a generated `ricky‑onboard‑report.json` summarises success/failure of every gate.
* **80‑to‑100 validation loops** – each gate is retried up to 3 times with exponential back‑off; the overall onboarding loop runs until all gates pass or a hard blocker is hit.
* **Honest blocker reporting** – any non‑recoverable error aborts the run and prints a concise “BLOCKER” message with a ticket reference.
* **Scoped branch/PR boundaries** – the onboarding script lives in its own branch (`onboard/first‑run`) and opens a PR automatically once the user completes the flow, so the team can audit the environment snapshot.

---

## 1. Repository Layout (first‑run branch)

ricky/
├─ .github/
│  └─ workflows/
│     └─ onboarding.yml   # CI gate for onboarding PRs

...[truncated]
```


**Must**

- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.

**Must Not**

- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.recovery-guidance-no-stack-traces</code> (cli-behavior/manual)</summary>

**User message**

> A user gives Ricky an empty spec or a missing spec file.

**Ricky output**

```text
**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the expected path (`/specs/<workflow‑name>.yaml`). | If the file does not exist, raise a **Blocker**: `SPEC_NOT_FOUND`. |
| **1.2** | If the file exists but is empty or contains only comments/whitespace, raise a **Blocker**: `SPEC_EMPTY`. | Include the file path and a short diagnostic snippet. |
| **1.3** | Post the blocker to the **#workflow‑issues** channel and tag the workflow owner (`@owner`). | Example: `🚨 BLOCKER: SPEC_EMPTY – /specs/data‑ingest.yaml is empty. @owner, please provide a valid spec.` |

---

### 2. Automated Placeholder Generation (Safety Net)

When a blocker is raised, Ricky automatically creates a **minimal placeholder spec** to keep the repository in a build‑able state.

```yaml
# /specs/<workflow‑name>.yaml  (auto‑generated placeholder)
name: <workflow‑name>
description: "Placeholder – awaiting full specification."
version: 0.0.1
steps: []
inputs: {}
outputs: {}

The placeholder is committed on a dedicated branch: `ricky/placeholder
...[truncated]
```


**Must**

- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.

**Must Not**

- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render `ricky status` when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Configuration
• Active workflow contracts : 0
• Pending PRs / branches : none

⚙️ System Health
• Scheduler : OK
• Queue length : 0
• Last heartbeat :

🚨 Blockers
• Google integration not confirmed
• GitHub integration not confirmed
• Action required: run provider health checks or re‑authenticate credentials.

📝 Next Steps

1. Execute `ricky provider check google` and `ricky provider check github`.
2. Resolve any authentication or network errors reported.
3. Re‑run `ricky status` to confirm connectivity.

--- End of Report ---

Must

Report unknown or not-connected provider state honestly.
Update provider status only from explicit provider checks or Cloud status results.
Give concrete setup guidance for Cloud when relevant.

Must Not

Mark Google or GitHub connected because guidance text was shown.
Invent a provider connection URL or OAuth flow.
Show empty fields with no recovery guidance when config is missing.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**  
*Purpose*: Apply a small, self‑contained change to a repository (e.g., typo fix, comment update, CI config tweak) while guaranteeing traceability, deterministic verification, and safe merge.

| Phase | Step | Owner | Input | Output | Verification Gate |
|-------|------|-------|-------|--------|--------------------|
| **1️⃣ Planning** | 1.1 Create a scoped branch | **Ricky – Branch Manager** | Issue/Task ID, target repo, base branch | `feature/<issue‑id>-<short‑desc>` | Branch name matches regex `^feature\/[0-9]+-[a-z0-9-]+$` |
| | 1.2 Draft change spec | **Ricky – Spec Author** | Issue description, file paths, expected diff | `CHANGE_SPEC.md` (YAML) | Schema validation against `spec-schema.yaml` |
| | 1.3 Peer review of spec | **Ricky – Reviewer** | `CHANGE_SPEC.md` | Approved spec (GitHub PR comment “spec‑approved”) | All required reviewers sign‑off |
| **2️⃣ Development** | 2.1 Checkout branch & apply change | **Ricky – Editor** | Scoped branch, `CHANGE_SPEC.md` | Local commit(s) | Lint & format checks pass (`npm run lint`) |
| | 2.2 Add unit/CI test (if applicable) | **Ricky – Tester** | Code change | New/updated test files | `npm test` 
...[truncated]

Must

Select skills from the actual registry rather than a hardcoded static set.
Record ranked skill matches with confidence and trigger evidence.
Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

Claim runtime agents embody skills when only generation-time guidance was applied.
Fail generation solely because optional skill files are missing.
Hide skill selection evidence from artifacts or JSON output.
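
One way to read this rubric is as a selection function that ranks registry entries, keeps the trigger evidence, and degrades to an empty result rather than failing. The sketch below is purely illustrative: the `SkillEntry` shape, the trigger-overlap scoring, the `0.5` threshold, and the function name `selectSkills` are all assumptions, not Ricky's actual matcher.

```typescript
// Hypothetical shapes; Ricky's real registry format is not shown in this review.
interface SkillEntry { id: string; triggers: string[]; }
interface SkillMatch { id: string; confidence: number; evidence: string[]; }
interface SkillSelection { matches: SkillMatch[]; fallbackReason?: string; }

// Rank registry skills by the share of their triggers found in the spec text.
function selectSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): SkillSelection {
  if (!registry || registry.length === 0) {
    // Missing registry: fall back gracefully instead of failing generation.
    return { matches: [], fallbackReason: "registry missing or empty" };
  }
  const text = spec.toLowerCase();
  const matches: SkillMatch[] = [];
  for (const skill of registry) {
    const evidence = skill.triggers.filter((t) => text.indexOf(t.toLowerCase()) !== -1);
    const confidence = skill.triggers.length === 0 ? 0 : evidence.length / skill.triggers.length;
    if (confidence >= threshold) matches.push({ id: skill.id, confidence, evidence });
  }
  matches.sort((a, b) => b.confidence - a.confidence);
  return matches.length > 0
    ? { matches }
    : { matches, fallbackReason: "no skill cleared the threshold" };
}

// Illustrative registry entries, invented for this example.
const demoRegistry: SkillEntry[] = [
  { id: "github-primitive", triggers: ["github", "primitive"] },
  { id: "slack-surface", triggers: ["slack", "channel"] },
];
console.log(selectSkills("Generate a workflow for a GitHub primitive change", demoRegistry));
```

Keeping `evidence` and `fallbackReason` in the returned value is what would let the selection be written into artifacts or JSON output instead of being hidden.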

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Goal** | Produce production‑ready code from a high‑level feature description, using Claude for design/logic review and Codex for code generation. |
| **Scope** | • Feature description (Markdown) <br>• Design review (Claude) <br>• Implementation (Codex) <br>• Automated tests & static analysis <br>• Pull‑request (PR) creation in the target repo. |
| **Branch Model** | `feature/<ticket‑id>-claude‑codex` created from `develop`. All changes confined to this branch; PR targets `develop`. |
| **Roles** | **Reviewer** – Claude (LLM) <br>**Implementer** – Codex (LLM) <br>**Gatekeeper** – CI job that runs verification loops. |
| **Artifacts** | 1. `design.md` – Claude’s review output (rationale, edge‑case list, API contract). <br>2. `implementation/` – generated source files. <br>3. `tests/` – unit/integration tests (auto‑generated + human‑augmented). <br>4. `review-report.json` – deterministic summary of Claude’s comments and Codex compliance flags. |
| **Deterministic Verification Gates** | 1. **Design‑Gate** – CI parses `design.md` for required sections (overview, data model
...[truncated]

Must

Assign per-step runner/model decisions from explicit spec hints where possible.
Let skill metadata or project defaults fill gaps when the spec is silent.
Write tool-selection.json or equivalent audit metadata.

Must Not

Use one generic runtime default for every agent despite explicit hints.
Let skill defaults override explicit user/spec runner hints.
Omit the reason each tool or model was chosen.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
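The precedence this case asks for (explicit spec hint first, then skill metadata, then a project default, each with a recorded reason) can be sketched as follows. This is a minimal illustration, not Ricky's implementation: `ToolDecision`, `SelectionInputs`, `selectRunner`, and the audit layout are all hypothetical names assumed for the example.

```typescript
// Hypothetical types for illustration; these are not Ricky's actual interfaces.
type DecisionSource = "spec-hint" | "skill-metadata" | "project-default";

interface ToolDecision {
  stepId: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

interface SelectionInputs {
  specHints: Record<string, string>;     // stepId -> runner named explicitly in the spec
  skillDefaults: Record<string, string>; // stepId -> runner suggested by skill metadata
  projectDefault: string;                // used only when both maps are silent
}

function selectRunner(stepId: string, inputs: SelectionInputs): ToolDecision {
  const hinted = inputs.specHints[stepId];
  if (hinted !== undefined) {
    // An explicit hint always wins, even when skill metadata disagrees.
    return { stepId, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for "${stepId}"` };
  }
  const fromSkill = inputs.skillDefaults[stepId];
  if (fromSkill !== undefined) {
    return { stepId, runner: fromSkill, source: "skill-metadata", reason: `spec is silent; skill metadata suggests ${fromSkill}` };
  }
  return { stepId, runner: inputs.projectDefault, source: "project-default", reason: "spec and skill metadata are silent" };
}

// Audit payload in the spirit of tool-selection.json (the field layout is assumed).
const exampleSelection: SelectionInputs = {
  specHints: { review: "claude", implement: "codex" },
  skillDefaults: { review: "codex", verify: "claude" },
  projectDefault: "codex",
};
const toolSelectionAudit = JSON.stringify(
  ["review", "implement", "verify"].map((id) => selectRunner(id, exampleSelection)),
  null,
  2,
);
```

Keeping the reason next to each decision means the audit file can explain why the skill default for `review` was overridden by the spec hint.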

REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

## Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`task.acceptanceGate`** fields for every step. The goal is to make the intent crystal‑clear, the success criteria machine‑verifiable, and the hand‑off points unambiguous for reviewers and CI checks.

Below is the end‑to‑end contract you should follow when invoking `--refine`, plus a concrete example and the verification checklist you must run before committing the refined workflow.

---

### 1. Invocation Contract  

| Element | Requirement | Example |
|---------|-------------|---------|
| **Input file** | Path to a *valid* workflow definition (`.json` or `.yaml`). Must pass `workflow validate` before refinement. | `workflow.yaml` |
| **Flag** | `--refine` (no additional arguments). | `workflow validate --refine workflow.yaml` |
| **Output** | New file with suffix `.refined.{json|yaml}` placed next to the source. | `workflow.refined.yaml` |
| **Exit code** | `0` on success, non‑zero on any validation or refinement error. | — |
| **Determinism** | The same input must always pr
...[truncated]

Must

Keep the deterministic unrefined path as the default.
Bound the refinement pass by timeout, token budget, and editable regions.
Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

Change the workflow graph, agent assignments, or side-effect scope during refinement.
Fail the whole generation if optional refinement times out.
Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
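A rough sketch of the bounded, fall-back behavior this case describes: the refinement pass runs under a timeout, the result is re-validated, and the deterministic artifact is returned with a warning whenever refinement fails, times out, or makes an unsafe edit. The function names and the `isSafe` callback are assumptions for illustration; token budgets and editable regions are not modeled here.

```typescript
interface RefineResult {
  artifact: string;
  refined: boolean;
  warnings: string[];
}

// Reject if the work does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Hypothetical helper: `refine` stands in for the provider call and
// `isSafe` for the post-refinement validation (graph, agents, side effects unchanged).
async function refineWithFallback(
  deterministic: string,
  refine: (artifact: string) => Promise<string>,
  isSafe: (before: string, after: string) => boolean,
  timeoutMs: number,
): Promise<RefineResult> {
  try {
    const candidate = await withTimeout(refine(deterministic), timeoutMs);
    if (!isSafe(deterministic, candidate)) {
      return { artifact: deterministic, refined: false, warnings: ["refinement rejected: unsafe edit"] };
    }
    return { artifact: candidate, refined: true, warnings: [] };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Optional refinement must not fail generation; surface a warning instead.
    return { artifact: deterministic, refined: false, warnings: [`refinement skipped: ${message}`] };
  }
}
```

The warning list is never empty when the unrefined artifact comes back, which matches the requirement not to hide that the deterministic path was used.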

REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

1. Returns the current Ricky version string (e.g., `ricky 0.9.3`).
2. Exposes the version via a Go constant and a `--version` flag in the root command.
3. Is covered by unit tests, integration tests, and CI verification gates.
4. Is delivered in a scoped branch/PR with full review artifacts.

---

## 2. Scope & Boundaries
| Item | Inclusion | Exclusion |
|------|-----------|-----------|
| **Code** | `cmd/root.go`, `internal/version/version.go`, test files, CI config updates | Unrelated commands, documentation beyond `README.md` |
| **Docs** | `README.md` version badge, `CHANGELOG.md` entry | Full user guide overhaul |
| **CI** | New test matrix for version flag, deterministic gate in GitHub Actions | Full pipeline redesign |

---

## 3. Contract (Specification)

| Contract ID | Description | Acceptance Criteria |
|-------------|-------------|----------------------|
| **V001** | Expose version constant | `internal/version.Version` is a `string` constant matching `git tag` (sem
...[truncated]

Must

Build gates from the stated acceptance behavior, such as checking ricky --version output.
Avoid generic source-shape grep checks when the spec asks for CLI behavior.
Keep generated validation meaningful for the current repo shape.

Must Not

Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
Claim the workflow is proven by source syntax alone.
Ignore the package-json version resolution order in the spec.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
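To illustrate the difference between a behavior-grounded gate and a source-shape grep, here is a small sketch of a gate that judges the captured result of running `ricky --version`. It assumes the expected version string has already been resolved by the caller; how the spec orders its package-json lookups is not shown in this document, so that part is left out. `checkVersionOutput` and `GateResult` are hypothetical names.

```typescript
interface GateResult {
  passed: boolean;
  detail: string;
}

// Judge the observed CLI behavior: exit code plus a whitespace-delimited version token.
function checkVersionOutput(stdout: string, exitCode: number, expectedVersion: string): GateResult {
  if (exitCode !== 0) {
    return { passed: false, detail: `exit code ${exitCode}` };
  }
  const printed = stdout.trim();
  const tokens = printed.split(/\s+/);
  const passed = tokens.some((token) => token === expectedVersion);
  return {
    passed,
    detail: passed ? `found ${expectedVersion}` : `expected ${expectedVersion}, got "${printed}"`,
  };
}
```

A bundle that merely contains `export` or `function` text would not satisfy this gate, because the check looks at what the command printed rather than at the shape of the source.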

REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation & Auto‑Fix Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, config bundles, code modules) through a deterministic validation → optional fix → re‑validation cycle until each artifact reaches a **PASS** state or is marked **BLOCKED** after a configurable retry limit.

---

## 1. Scope & Boundaries
| Item | Description |
|------|-------------|
| **Branch** | `feature/batch‑artifact‑validation` (isolated from main) |
| **PR Target** | `main` (requires CI gate & reviewer sign‑off) |
| **Artifacts** | Any files under `artifacts/` (flat or nested). Each artifact is self‑contained; no cross‑artifact dependencies. |
| **Retry Limit** | Default **3** passes per artifact (configurable via `workflow.yaml`) |
| **Owner** | `@team‑data‑ops` (reviewer) |
| **Blocker Policy** | If an artifact exceeds retry limit or fails a non‑recoverable gate, it is marked **BLOCKED** and a ticket is auto‑created. |

---

## 2. Contract (workflow.yaml)

```yaml
name: batch-artifact-validation
on:
  push:
    paths:
      - "artifacts/**"
  workflow_dispatch:
    inputs:
      retry_limit:
        description: "Maximum fix‑retry attempts pe
...[truncated]

Must

Choose dag, supervisor, or pipeline deliberately based on the work shape.
Explain the pattern choice in artifact metadata or a rationale.
Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

Default blindly to dag for every workflow.
Collapse independent artifact work into one vague agent task.
Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a step‑by‑step illustration of the **five core Ricky authoring skills** and the concrete artifacts they produce in a typical workflow (e.g., “On‑Call Incident Triage”).  Each skill injects a deterministic contract, a verification gate, or a review artifact that the runtime engine (Ricky‑Engine) uses to guarantee reliability, traceability, and rapid iteration.

| # | Skill | What the author does | Concrete Output in the workflow | Effect on the generated workflow |
|---|-------|----------------------|--------------------------------|-----------------------------------|
| 1 | **Scope Definition & Boundary Contract** | Writes a **`workflow.yaml`** header that declares *inputs*, *outputs*, *version*, and *branch/PR scope*. | ```yaml\nname: oncall‑triage\nversion: 1.3.0\nscope: branch=triage‑v1.3\ninputs:\n  - incident_id: string\noutputs:\n  - resolution_summary: string\n``` | The engine validates that every PR touching this workflow lives on the declared branch.  Any out‑of‑scope change is blocked early, preventing accidental cross‑team contamination. |
| 2 | **Deterministic Step Contracts** | For ea
...[truncated]

Must

Describe skills as generation-time selection, loading, and rendering inputs.
Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
Say the workflow instructions were informed by selected skills.

Must Not

Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
Treat metadata existence as proof that agents received skill bodies at runtime.
Overstate current tests beyond the generation boundary.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **complete, ready‑to‑commit workflow** that turns a vague product specification into a concrete, test‑driven development pipeline.  
The workflow follows the AgentWorkforce standards for **deterministic contracts, verification gates, review artifacts, and iterative validation loops**.  It is scoped to a single feature branch (`feature/vague‑spec‑implementation`) and includes a pull‑request template that enforces the required 80‑100 validation loops before merge.

---

## 1️⃣ High‑Level Overview

| Stage | Purpose | Primary Artifact |
|-------|---------|------------------|
| **1️⃣ Ingest Spec** | Parse the vague spec into a **Spec‑Model** (JSON) | `spec-model.json` |
| **2️⃣ Derive Requirements** | Generate concrete functional & non‑functional requirements from the model | `requirements.md` |
| **3️⃣ Design Contract** | Produce a **Component Contract** (input/output schema, invariants, performance SLAs) | `contract.yaml` |
| **4️⃣ Scaffold Code** | Auto‑generate skeleton code + test stubs based on the contract | `src/`, `tests/` |
| **5️⃣ Implement & Unit‑Test** | Developer fills implementation; run deterministic unit tests | `pytest` results |
| **6️⃣ Verification Gate**
...[truncated]

Must

Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
Validate with dry-run or targeted structural checks where possible.
Return artifacts, warnings, and follow-up commands honestly.

Must Not

Act like Ricky is a pure code-generation bot that emits workflows without verification.
Stop at "code compiles" as the proof bar.
Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s next actions after a step‑timeout + orphaned worker**

| Phase | Action | Why it matters | Deliverable / Artifact |
|-------|--------|----------------|------------------------|
| 1️⃣ Detect & Log | • Pull the run‑log from the orchestration service (e.g., Temporal, Airflow). <br>• Record the exact timestamp, step ID, and timeout value that was exceeded. <br>• Capture the heartbeat / liveness record of the worker that never emitted an artifact. | Provides an immutable audit trail and the data needed for deterministic verification. | `run‑log‑<run‑id>.json` + `worker‑heartbeat‑<worker‑id>.log` |
| 2️⃣ Declare a **Blocker** | • Open a **Blocker Issue** in the workflow repo (e.g., `BLOCKER‑<run‑id>`). <br>• Tag the issue with `type:timeout`, `type:orphan‑worker`, and `severity:high`. <br>• Link the logs from step 1. | Guarantees visibility for all stakeholders and prevents downstream merges until the problem is resolved. | GitHub Issue `BLOCKER‑<run‑id>` |
| 3️⃣ Isolate the Faulty Branch | • Identify the branch that introduced the step (usually the PR that merged the step’s code). <br>• Create a **fault‑isolation branch** off `main` named `fix/timeout‑<run‑id>`. | Keeps the re
...[truncated]

Must

Classify the failure before deciding whether to retry.
Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
Preserve raw evidence and uncertainty when the class is not known.

Must Not

Blindly rerun the whole workflow without a blocker classification.
Treat every failure as a broken workflow definition.
Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
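The classify-before-retry rule can be sketched as a function that assigns each failed step a class from this case's list and keeps the raw evidence attached. The observation fields and the two mapping rules below are assumptions made for the example (they cover only the timeout and silent-worker signals from the user message); anything else stays `unknown` rather than being guessed.

```typescript
type FailureClass =
  | "agent-runtime-opacity"
  | "timeout"
  | "environment-blocker"
  | "workflow-structure"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical observation record for one failed step.
interface StepObservation {
  stepId: string;
  timedOut: boolean;
  workerAlive: boolean;
  producedArtifacts: boolean;
  rawEvidence: string;
}

interface Classification {
  stepId: string;
  failureClass: FailureClass;
  evidence: string; // preserved verbatim, never summarized away
}

function classifyStep(observation: StepObservation): Classification {
  let failureClass: FailureClass = "unknown";
  if (observation.timedOut) {
    failureClass = "timeout";
  } else if (observation.workerAlive && !observation.producedArtifacts) {
    failureClass = "agent-runtime-opacity";
  }
  return { stepId: observation.stepId, failureClass, evidence: observation.rawEvidence };
}

// Assumed policy: a retry decision is only considered once every failed step has a known class.
function readyForRetryDecision(classifications: Classification[]): boolean {
  return classifications.length > 0 && classifications.every((c) => c.failureClass !== "unknown");
}
```

When any step is still `unknown`, the sketch returns `false` so that the caller reports the uncertainty instead of rerunning the whole workflow.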

REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Detect and Guard Against Stale Relay State Before a Local Run**  

| **Stage** | **Purpose** | **Inputs** | **Outputs** | **Verification Gate** | **Owner** |
|-----------|-------------|------------|-------------|-----------------------|-----------|
| **1️⃣ Scan Workspace** | Locate any stale state directories. | Project root (or user‑specified path). | List of found directories (`.agent‑relay/`, `.relay/`, `.trajectories/`). | ‑ File‑system walk must complete without error. | `ricky‑scanner` |
| **2️⃣ Compute Freshness Hash** | Determine if the contents are newer than the last successful workflow run. | Directory listings + `workflow‑metadata.json` (stores last‑run timestamp & hash). | Boolean `isStale`. | ‑ Hash comparison must be deterministic; mismatched hash ⇒ `isStale = true`. | `ricky‑hasher` |
| **3️⃣ Prompt / Auto‑Clean** | Decide remediation path. | `isStale`. | User decision (`continue`, `clean`, `abort`). | ‑ If `abort` chosen, workflow stops with clear blocker report. | `ricky‑prompt` |
| **4️⃣ Clean (optional)** | Remove stale state safely. | Decision `clean`. | Clean workspace (directories removed). | ‑ Post‑clean `fs.existsSync` must return false for each
...[truncated]

Must

Classify stale local runtime state as an environment contamination issue.
Recommend quarantine or isolated-run guidance before launch.
Record the observed paths and the action taken or recommended.

Must Not

Treat stale runtime state as a workflow logic failure.
Delete or overwrite state without an explicit safe path or user intent.
Continue into execution as if the workspace were clean.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
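A minimal sketch of the pre-launch check this case describes: look for the three state directories, record which ones were observed, and block the launch with a quarantine or isolated-run recommendation instead of deleting anything. How staleness is actually decided is not stated here, so the sketch takes a caller-supplied `isStale` predicate; `inspectRelayState` and `StaleStateReport` are hypothetical names.

```typescript
const RELAY_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  failureClass: "environment-contamination" | null;
  observedPaths: string[];
  recommendation: string;
  launchAllowed: boolean;
}

// Read-only inspection: the function reports paths and never removes them.
function inspectRelayState(workspace: string, isStale: (path: string) => boolean): StaleStateReport {
  const root = workspace.replace(/\/+$/, "");
  const observedPaths = RELAY_STATE_DIRS.map((dir) => `${root}/${dir}`).filter(isStale);
  if (observedPaths.length === 0) {
    return { failureClass: null, observedPaths, recommendation: "none", launchAllowed: true };
  }
  return {
    failureClass: "environment-contamination",
    observedPaths,
    recommendation: "quarantine the listed paths or launch in an isolated workspace",
    launchAllowed: false,
  };
}
```

Passing the predicate in keeps the sketch independent of the file system, so the same function can be exercised with a fake in tests and with a real directory check at launch time.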

REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to resolve “another Ricky or Relay run is already active in this workspace”**

The marker that blocks a new run is a simple lock file that Ricky creates at the start of every workflow execution. If the file is left behind (e.g., a crash, an orphaned process, or a stale CI container), the next invocation will abort with the message you’re seeing.

Below is the deterministic, step‑by‑step repair contract you can follow. Each step has an explicit verification gate so you can be sure the workspace is clean before you re‑run.

---

## 1️⃣ Verify the current lock state

```bash
# From the root of the repository
ricky status

Expected output: No active Ricky/Relay run detected.
If you still see “active run”, proceed to step 2.

2️⃣ Locate the lock file

Ricky stores the lock under the hidden .ricky directory:

ls -l .ricky/run.lock

Expected output: a file exists with a PID and timestamp, e.g.:

-rw------- 1 you you 64 Sep 12 14:23 .ricky/run.lock

If the file does not exist, the error is coming from a stale CI artifact—skip to step 5.

3️⃣ Inspect the lock contents

cat .ricky/run.lock

You should see something like:
...[truncated]


Must

Report the active marker, run id, or status path when available.
Ask the user to inspect, wait for, or explicitly clear the active run.
Avoid launching a competing run that could corrupt evidence.

Must Not

Silently start another run.
Hide the existing run marker from the user.
Treat the conflict as a generic failure with no recovery path.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the exact patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment‑loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is parsed **once** before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, even when they contain HEREDOC‑style `node --input-type=module` blocks that embed literal `import { … } from 'node:fs'` / `from 'node:path'` strings.

---

## 1. Helper utilities (add to `src/rickyEnv.ts`)

```ts
// src/rickyEnv.ts
import { config as loadDotenv } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load .env.local if present, otherwise fall back to .env.
 * Returns the parsed environment object.
 */
export function loadRickyWorkflowEnv(): NodeJS.ProcessEnv {
  const cwd = process.cwd();
  const envFiles = ['.env.local', '.env'].map
...[truncated]

Must

Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
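The alias-detection rule in this case's Must list can be sketched with a line-anchored match on the full `import * as <alias> from '<module>'` statement, so that a bare `from 'node:fs'` substring inside a HEREDOC does not count. This is an approximation under stated assumptions: a line-anchored regular expression stands in for real top-level detection (a parser would be stricter), prepending at the very top of the file is assumed to be acceptable, and `hasAliasImport` and `ensureAliasImports` are hypothetical helper names.

```typescript
function escapeForRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when a whole line is `import * as <alias> from '<module>'`.
function hasAliasImport(source: string, alias: string, moduleName: string): boolean {
  const pattern = new RegExp(
    `^import\\s+\\*\\s+as\\s+${escapeForRegExp(alias)}\\s+from\\s+['"]${escapeForRegExp(moduleName)}['"]\\s*;?\\s*$`,
    "m",
  );
  return pattern.test(source);
}

// Prepend whichever alias imports are missing; the rest of the source is left byte-for-byte intact.
function ensureAliasImports(source: string): string {
  const needed: Array<[string, string]> = [
    ["rickyWorkflowFs", "node:fs"],
    ["rickyWorkflowPath", "node:path"],
  ];
  const missing = needed
    .filter(([alias, moduleName]) => !hasAliasImport(source, alias, moduleName))
    .map(([alias, moduleName]) => `import * as ${alias} from '${moduleName}';`);
  return missing.length === 0 ? source : `${missing.join("\n")}\n${source}`;
}
```

Because the check keys on the alias name as well as the module specifier, a named import such as `import { readFileSync } from 'node:fs'` inside a step command neither satisfies the check nor gets rewritten.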

REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, trigger the auto‑fix path on the first failure, and verify the repaired artifact. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Preconditions (contract)

| Item | Requirement |
|------|--------------|
| **CLI** | `aw` ≥ 2.4.0, on PATH |
| **Repo** | Clean working tree (no uncommitted changes) |
| **Workflow definition** | `workflow.yaml` present at repo root |
| **Failed step** | `FAILED_STEP` (e.g., `build-image`) |
| **Previous run ID** | `PREV_RUN_ID` (e.g., `run‑20240512‑a1b2c3`) |
| **Auto‑fix flag** | `--auto-fix` supported by the workflow runner |
| **Verification tools** | `jq` (for JSON parsing) and `git` |

If any of the above is missing, abort and report the blocker.

---

## 2️⃣ Execution contract

```bash
# 1. Capture inputs (replace placeholders)
FAILED_STEP="build-image"                # <-- step that failed
PREV_RUN_ID="run-20240512-a1b2c3"        # <-- previous successful run ID
WORKFLOW_FILE="workflow.ya
...[truncated]

Must

Use a bounded retry budget and summarize every attempt.
Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
Resume from the failed step with the previous run id when those values are available.

Must Not

Edit arbitrary repository source files as the default auto-fix surface.
Keep retrying after the configured max attempts.
Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
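A sketch of the bounded loop described in this case, assuming hypothetical `run` and `repairArtifact` callbacks in place of the real runner and the repair persona. It keeps one tracking id across attempts, summarizes each attempt, resumes from the failed step with the previous run id when both are known, and stops at `maxAttempts`.

```typescript
interface RunOutcome { ok: boolean; runId: string; failedStep?: string; }
interface ResumePoint { failedStep: string; previousRunId: string; }
interface AttemptSummary {
  attempt: number;
  trackingRunId: string; // the single Ricky tracking id, unchanged across attempts
  runId: string;
  resumedFrom: string | null;
  ok: boolean;
}

interface AutoFixOptions {
  trackingRunId: string;
  maxAttempts: number;
  run: (resume: ResumePoint | null) => Promise<RunOutcome>;
  repairArtifact: (outcome: RunOutcome) => Promise<boolean>; // true when the artifact was repaired
}

async function runWithAutoFix(options: AutoFixOptions): Promise<{ ok: boolean; attempts: AttemptSummary[] }> {
  const attempts: AttemptSummary[] = [];
  let resume: ResumePoint | null = null;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt += 1) {
    const outcome = await options.run(resume);
    attempts.push({
      attempt,
      trackingRunId: options.trackingRunId,
      runId: outcome.runId,
      resumedFrom: resume ? resume.failedStep : null,
      ok: outcome.ok,
    });
    if (outcome.ok) return { ok: true, attempts };
    if (attempt === options.maxAttempts) break; // budget spent: no further repair
    const repaired = await options.repairArtifact(outcome);
    if (!repaired) break;
    resume = outcome.failedStep
      ? { failedStep: outcome.failedStep, previousRunId: outcome.runId }
      : null;
  }
  return { ok: false, attempts };
}
```

With `maxAttempts` set to 1 the loop degenerates to a single run with no repair call, which is one way to model a disabled auto-fix.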

REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Immediate troubleshooting checklist**

| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Collect the run log**<br>`ricky logs workflows/foo.ts --last` | A full JSON‑LDB log file is printed. Save it as `foo.run.log`. |
| 2️⃣ | **Locate the failure node**<br>Search the log for `"status":"failed"` or `"error":` | Identify the exact step (e.g., `ValidateContract`, `RunTask`, `PublishArtifact`). |
| 3️⃣ | **Check deterministic gate output**<br>Open the gate report: `ricky gate-report workflows/foo.ts --run-id <id>` | The gate report lists which verification gates passed/failed and why. |
| 4️⃣ | **Inspect the contract**<br>`cat workflows/foo.contract.json` | Verify that the contract matches the workflow’s inputs/outputs, required scopes, and version constraints. |
| 5️⃣ | **Run the workflow locally in “dry‑run” mode**<br>`ricky run workflows/foo.ts --dry-run` | Confirms that the workflow can be parsed and that all static checks succeed. |
| 6️⃣ | **Re‑run with auto‑fix disabled** (already done) – note that any auto‑fixable gate failures will now be reported as hard errors. | Confirms that the failure is not auto‑fixable. |

---

## 1. Typical failure
...[truncated]

Must

Preserve one-attempt behavior when auto-fix is disabled.
Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
Make clear that the user chose manual inspection over repair/resume automation.

Must Not

Start a repair loop despite --no-auto-fix.
Suppress the diagnosis because no repair was attempted.
Present the failure as a completed repair attempt.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Executing a Local TypeScript Workflow Artifact**

Below is the concrete, repeatable contract Ricky should follow to run a TypeScript‑based workflow that lives in the repository’s *primary local path* (by default `./workflows`). The steps are written as a deterministic “execution plan” that can be copied into a branch‑specific PR and re‑used by any CI runner or developer workstation.

---

## 1️⃣ Preconditions (Verification Gate #1)

| Check | Expected State | How Ricky Verifies |
|------|----------------|--------------------|
| **Node runtime** | `>=18.0.0` installed and on `$PATH` | `node -v` → match regex `^v(1[89]|2)\.` |
| **Package manager** | `npm` (or `pnpm`/`yarn`) available | `npm -v` (or equivalent) |
| **TypeScript compiler** | `tsc` reachable (installed locally) | `npx tsc --version` |
| **Workflow source** | `./workflows/<workflow‑name>.ts` exists and is **not** ignored by `.gitignore` | `test -f ./workflows/<workflow‑name>.ts` |
| **Dependency lock** | `package-lock.json` (or `pnpm-lock.yaml`/`yarn.lock`) is present and up‑to‑date | `git diff --quiet package-lock.*` |
| **Execution sandbox** | No lingering `node_modules` from a different workflow version | `
...[truncated]

Must

Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
Record the actual spawn command in execution evidence.

Must Not

Fail solely because agent-relay is not on PATH when the SDK route is available.
Hide the actual runtime command from evidence.
Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Workflow for Persistent Structural Failures**

When a structural failure in an AgentWorkforce workflow survives an initial remediation attempt, Ricky follows a deterministic, contract‑driven process that isolates the problem, escalates it cleanly, and prevents downstream contamination. The steps below constitute the **“Persistent‑Failure Boundary”** contract and are to be implemented as a reusable sub‑workflow in every repository that adheres to the AGENTS.md conventions.

---

## 1. Contract Preconditions

| Condition | Description |
|-----------|-------------|
| **F‑01** | A structural failure (e.g., broken DAG, missing dependency, schema drift) has been detected by the **Integrity‑Check** gate. |
| **F‑02** | An automated or manual **Fix‑Attempt** (patch, config change, refactor) has been applied and merged to a *feature* branch. |
| **F‑03** | The **Post‑Fix Validation** gate (unit, integration, contract tests) reports **FAIL** for the same failure signature. |
| **F‑04** | No other open PR touches the same failure‑prone component. |

If any precondition is false, the boundary workflow aborts and returns control to the normal CI pipeline.

---

## 2.
...[truncated]

Must

Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
Distinguish escalation from a generic product failure.
Preserve enough context for a human operator to continue.

Must Not

Retry speculative fixes indefinitely.
Collapse to "something went wrong" without the attempted actions.
Discard evidence from failed repair attempts.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated on 2026‑05‑15, covering the last 7 days of execution across all active Ricky pipelines)*  

---

## 1. Executive Summary
| Metric | Value | Trend (7 d) |
|--------|-------|-------------|
| Total Runs | **1 842** | ↗ +8 % |
| Successful Runs | **1 617** | ↗ +10 % |
| Failed Runs | **225** | ↘ ‑12 % |
| Blocked Runs* | **0** | — |
| Avg. Runtime (successful) | **3 m 27 s** | ↔ |
| Avg. Runtime (failed) | **4 m 12 s** | ↔ |
| Deterministic‑Gate Pass Rate | **99.3 %** | ↗ +0.4 % |
| Review‑Artifact Completion | **98.7 %** | ↗ +0.6 % |
| Validation‑Loop Passes (80‑100) | **94 %** | ↗ +1.2 % |

\*A *blocked run* is one that halted at a deterministic verification gate awaiting external input. No blockers were observed this week.

---

## 2. Run‑Level Health Breakdown

| Workflow | #Runs | Success % | Avg. Runtime | Gate Failures | Review Gaps | Validation Loop Pass % |
|----------|------|-----------|--------------|---------------|------------|------------------------|
| **Deploy‑Prod** | 312 | 96.8 % | 4 m 02 s | 3 (0.96 %) | 2 (0.64 %) | 92 % |
| **Sync‑Catalog** | 428 | 98.1 % | 2 m 45 s | 1 (0.23 %) | 0 | 96 % |
...[truncated]

Must

Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

Mutate evidence while analyzing it.
Return generic advice like "improve your workflow" without references.
Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
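As a rough illustration of analysis over normalized evidence, the sketch below groups failed runs by workflow, failure class, and step, and returns the recurring combinations without touching the input. The `RunEvidence` shape is hypothetical; the real `WorkflowRunEvidence` fields are not shown in this document.

```typescript
// Hypothetical evidence shape, assumed for the example.
interface RunEvidence {
  workflow: string;
  status: "passed" | "failed";
  failureClass?: string;
  failedStep?: string;
  durationMs: number;
}

interface RecurringFailure {
  workflow: string;
  failureClass: string;
  failedStep: string;
  count: number;
}

// Read-only pass: builds new bucket objects and never writes to the evidence records.
function recurringFailures(
  evidence: ReadonlyArray<Readonly<RunEvidence>>,
  minCount: number,
): RecurringFailure[] {
  const buckets = new Map<string, RecurringFailure>();
  for (const run of evidence) {
    if (run.status !== "failed") continue;
    const failureClass = run.failureClass ?? "unclassified";
    const failedStep = run.failedStep ?? "unknown-step";
    const key = [run.workflow, failureClass, failedStep].join("|");
    const bucket = buckets.get(key) ?? { workflow: run.workflow, failureClass, failedStep, count: 0 };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  return Array.from(buckets.values())
    .filter((bucket) => bucket.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Keying on the failure class keeps environment failures and workflow-logic failures in separate buckets, and each result names a workflow and step that a recommendation can point at.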

REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Artifact Contract – GitHub‑PR Workflow (using `@agent‑relay/github‑primitive`)**

| # | Contract Element | Requirement | How to Verify |
|---|------------------|-------------|---------------|
| 1 | **Module Imports** | The workflow file must import **exactly** the `createGitHubStep` symbol from `@agent‑relay/github‑primitive` and no other symbols from that package. | ‑ Static lint check (`import { createGitHubStep } from '@agent-relay/github-primitive'`). <br>‑ `npm ls @agent-relay/github-primitive` must resolve to a single version. |
| 2 | **Step Definition** | A single step named **`openPR`** must be created with `createGitHubStep`. The step’s `action` field must be `"pull_request.create"` and must include the required payload keys: `title`, `head`, `base`, `body`. | ‑ Unit test that imports the workflow and asserts `steps.openPR.action === 'pull_request.create'`. <br>‑ Schema validation of the step payload (JSON‑Schema). |
| 3 | **Input Validation** | All inputs used to populate the PR payload (`title`, `head`, `base`, `body`) must be **derived from workflow inputs** and must be validated with a deterministic guard (`if (!title) throw …`). | ‑ Runtime‑precheck script 
...[truncated]

Must

Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

Deterministic checks

PASS status: case executed
PASS maxToolCalls: expected <= 0 tool calls, got 0
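The retention rule in this case is simple arithmetic: when the original artifact declares N >= 4 `.step(...)` calls, the repair must keep at least ceil(N / 2) of them, and every `createGitHubStep(...)` call and the `@agent-relay/github-primitive` import must survive. A minimal sketch of such a check follows. It counts textual occurrences rather than parsing the workflow, which is an assumption of the example, and `checkRepairPreservesWork` is a hypothetical name.

```typescript
function countMatches(source: string, pattern: RegExp): number {
  return (source.match(pattern) ?? []).length;
}

interface RepairCheck {
  accepted: boolean;
  diagnostics: string[];
}

function checkRepairPreservesWork(original: string, repaired: string): RepairCheck {
  const diagnostics: string[] = [];

  // ceil(N / 2) retention applies only when the original declared four or more steps.
  const originalSteps = countMatches(original, /\.step\(/g);
  const repairedSteps = countMatches(repaired, /\.step\(/g);
  if (originalSteps >= 4) {
    const required = Math.ceil(originalSteps / 2);
    if (repairedSteps < required) {
      diagnostics.push(`kept ${repairedSteps} of ${originalSteps} steps; need at least ${required}`);
    }
  }

  // Every createGitHubStep(...) call from the original must still be present.
  const originalPrSteps = countMatches(original, /createGitHubStep\(/g);
  const repairedPrSteps = countMatches(repaired, /createGitHubStep\(/g);
  if (repairedPrSteps < originalPrSteps) {
    diagnostics.push(`createGitHubStep calls dropped from ${originalPrSteps} to ${repairedPrSteps}`);
  }

  const primitive = "@agent-relay/github-primitive";
  if (original.indexOf(primitive) !== -1 && repaired.indexOf(primitive) === -1) {
    diagnostics.push(`${primitive} import was removed`);
  }

  return { accepted: diagnostics.length === 0, diagnostics };
}
```

For an original with five steps the threshold is ceil(5 / 2) = 3, so a repair that collapses to two placeholder steps is rejected and the diagnostics are surfaced instead of the rewritten artifact.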

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow directly from a channel or DM. The surface is built as a **modal‑driven UI** with deterministic contracts and verification gates that keep the workflow state immutable until a successful commit.

---

### 1. High‑Level Interaction Flow  

| Step | Slack UI Element | Action | Result |
|------|------------------|--------|--------|
| 1 | `/ricky debug <workflow‑id>` (slash command) | Launches a **debugger modal** pre‑populated with the selected workflow’s latest run metadata. | Shows run summary, status, and a **“Load Execution Tree”** button. |
| 2 | **Execution Tree** button | Calls `GET /debug/tree?run_id=…` (Ricky API) → returns a **deterministic DAG** of steps with timestamps, inputs, outputs, and error flags. | Renders a **collapsible tree view** inside the modal (blocks with `section` + `context`). |
| 3 | Click on any step node | Opens a **step‑detail sub‑modal** (lazy‑loaded) showing: <br>• Input payload (JSON viewer) <br>• Output payload (if completed) <br>• Logs (tail of step log) <br>• “Re‑run”, “Skip”, “Patc
...[truncated]

Must

Treat Slack as a co-equal product surface, not a wrapper around CLI.
Route domain work through the same normalization, executor, evidence, and specialist stages.
Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
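
One way to read the three requirements above is as an adapter boundary: each surface normalizes its input and renders its output, while the domain pipeline is shared. The sketch below is an assumption-laden illustration; `SurfaceAdapter`, `runPipeline`, `slackAdapter`, and `cliAdapter` are hypothetical names, and the pipeline body is a stand-in for the real normalization, executor, evidence, and specialist stages.

```typescript
// Hypothetical sketch: these types and functions are not Ricky's real API.
type Surface = "slack" | "cli";

interface NormalizedRequest { surface: Surface; intent: string; text: string; }
interface DomainResult { summary: string; evidence: string[]; }

// Shared domain pipeline: every surface goes through this same function.
function runPipeline(req: NormalizedRequest): DomainResult {
  return {
    summary: `handled ${req.intent}: ${req.text}`,
    evidence: [`surface=${req.surface}`],
  };
}

// A surface only knows how to normalize its input and render a result.
interface SurfaceAdapter<In, Out> {
  normalize(input: In): NormalizedRequest;
  render(input: In, result: DomainResult): Out;
}

interface SlackEvent { channel: string; threadTs?: string; text: string; }
interface SlackMessage { channel: string; thread_ts?: string; text: string; }

// Slack-specific formatting and thread handling stay in the surface layer.
const slackAdapter: SurfaceAdapter<SlackEvent, SlackMessage> = {
  normalize: (e) => ({ surface: "slack", intent: "debug", text: e.text.trim() }),
  render: (e, r) => ({ channel: e.channel, thread_ts: e.threadTs, text: `*${r.summary}*` }),
};

const cliAdapter: SurfaceAdapter<string[], string> = {
  normalize: (argv) => ({ surface: "cli", intent: "debug", text: argv.join(" ").trim() }),
  render: (_argv, r) => r.summary,
};

function handle<In, Out>(adapter: SurfaceAdapter<In, Out>, input: In): Out {
  return adapter.render(input, runPipeline(adapter.normalize(input)));
}
```

In this shape neither adapter calls the other, so Slack is not a wrapper around the CLI; both are peers in front of `runPipeline`.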

Must Not

Put workflow gen

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

fix: avoid optional env preflight blockers

c0f08e0

cubic-dev-ai Bot reviewed May 15, 2026


devin-ai-integration Bot reviewed May 15, 2026


kjgbot merged commit f500518 into main May 15, 2026
5 checks passed

kjgbot deleted the fix/optional-env-preflight-autofix branch May 15, 2026 21:17

