fix(generation): validate GitHub primitive step shape #148

Merged
khaliqgant merged 3 commits into main from codex/fix-issue-144 on Jun 1, 2026
Conversation

@khaliqgant

Member

Summary

  • add AST-based validation for createGitHubStep configs so malformed PR-shipping workflows fail generation with INVALID_GITHUB_STEP
  • recognize SDK-root GitHub primitive imports and reject missing name/action, unsupported actions, and stray command fields
  • classify existing malformed GitHub primitive startup failures as INVALID_GITHUB_STEP instead of generic INVALID_ARTIFACT
  • sync Ricky's vendored workflow-writing skill guidance with the required name/action shape
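
As a rough illustration of the shape rules listed above, the sketch below checks a plain config object instead of walking an AST. `StepIssue`, `SUPPORTED_ACTIONS`, the action names inside it, and `validateGitHubStepConfig` are all hypothetical names chosen for this example; the validator and the supported action list in this PR may look different.

```typescript
// Hypothetical sketch: NOT the PR's AST-based validator.
type StepIssue = { code: 'INVALID_GITHUB_STEP'; message: string };

// Assumed action names, for illustration only.
const SUPPORTED_ACTIONS = new Set<string>(['createPR', 'createBranch']);

function validateGitHubStepConfig(config: Record<string, unknown>): StepIssue[] {
  const issues: StepIssue[] = [];
  const fail = (message: string): void => {
    issues.push({ code: 'INVALID_GITHUB_STEP', message });
  };

  // Rule 1: a non-empty name is required.
  const name = config.name;
  if (typeof name !== 'string' || name.trim() === '') {
    fail('missing or empty name');
  }

  // Rule 2: an action is required and must be in the supported set.
  const action = config.action;
  if (typeof action !== 'string' || action.trim() === '') {
    fail('missing action');
  } else if (!SUPPORTED_ACTIONS.has(action)) {
    fail(`unsupported action: ${action}`);
  }

  // Rule 3: a shell-style command field does not belong on a GitHub step.
  if ('command' in config) {
    fail('stray command field');
  }

  return issues;
}
```

Under these assumptions, a config with only `name` and a supported `action` yields no issues, while one that omits `name`, uses an unknown action, and carries `command` yields three.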

Related

Verification

  • npm run typecheck
  • vitest run src/product/generation/workforce-persona-writer.test.ts --testNamePattern "detectSpecIntentMismatch|stripGlobalGithubExecutorForMixedWorkflow"
  • vitest run src/runtime/failure/classifier.test.ts --testNamePattern "debugger-facing classification contract"
  • vitest run src/local/entrypoint.test.ts --testNamePattern "malformed GitHub primitive"
  • vitest run src/product/generation/pipeline.test.ts

@coderabbitai

coderabbitai Bot commented Jun 1, 2026

Contributor

Warning

Review limit reached

@agent-relay-bot[bot], we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 36 minutes and 1 second. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 631af88a-bfee-4fba-80b0-f9aca3d5d036

📥 Commits

Reviewing files that changed from the base of the PR and between 699d90f and 3081a72.

📒 Files selected for processing (11)
  • .agents/skills/writing-agent-relay-workflows/SKILL.md
  • personas/agent-relay-workflow.json
  • src/local/entrypoint.test.ts
  • src/local/entrypoint.ts
  • src/product/generation/pipeline.ts
  • src/product/generation/workforce-persona-repairer.test.ts
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts
  • src/runtime/failure/classifier.test.ts
  • src/runtime/failure/classifier.ts
  • src/runtime/failure/types.ts

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces validation and error classification for malformed GitHub primitive steps (createGitHubStep) in workflows. It adds a new INVALID_GITHUB_STEP blocker code, implements static analysis to flag invalid configurations before startup, and updates relevant documentation, tests, and failure classification logic. The code review feedback suggests two improvements: stripping ANSI escape codes in matchesWorkflowInvalidPattern to ensure robust regex matching against formatted terminal outputs, and optimizing matchesInvalidGithubStepConfig by testing patterns on the entire stripped text directly instead of splitting it by lines.

Comment on lines +859 to +861
function matchesWorkflowInvalidPattern(text: string): boolean {
  return GITHUB_STEP_CONFIG_ERROR_PATTERNS.some((pattern) => pattern.test(text));
}


high

The matchesWorkflowInvalidPattern function does not strip ANSI escape codes before testing the regex patterns. Terminal and log output frequently contains ANSI escape sequences (such as colors and formatting), and a color sequence ends in the word character m. When such a sequence directly precedes an error message, a leading word boundary (\b) in GITHUB_STEP_CONFIG_ERROR_PATTERNS no longer matches, so the pattern misses the message.

To ensure robust matching, strip ANSI escape codes from the input text before running the regex tests.

Suggested change
- function matchesWorkflowInvalidPattern(text: string): boolean {
-   return GITHUB_STEP_CONFIG_ERROR_PATTERNS.some((pattern) => pattern.test(text));
- }
+ function matchesWorkflowInvalidPattern(text: string): boolean {
+   const cleanText = text.replace(/\u001B\[[0-9;]*m/g, '');
+   return GITHUB_STEP_CONFIG_ERROR_PATTERNS.some((pattern) => pattern.test(cleanText));
+ }
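
The effect described in this comment can be reproduced in isolation. The sketch below borrows one pattern from the review and the stripping regex from the suggestion; `stripSgr` and the sample string are hypothetical, introduced only for this demonstration.

```typescript
// One of the patterns quoted in this review.
const pattern = /\bGitHub step requires a non-empty name\b/i;

// Hypothetical helper using the same regex as the suggested change.
// It removes SGR (color/style) sequences only, not every ANSI escape.
function stripSgr(text: string): string {
  return text.replace(/\u001B\[[0-9;]*m/g, '');
}

// A red error line as a terminal might emit it (sample data).
const colored = '\u001B[31mGitHub step requires a non-empty name\u001B[0m';

// The color code ends in "m", a word character, so there is no word
// boundary between "m" and "G" and the leading \b cannot match.
const rawMatch = pattern.test(colored); // false

// After stripping, the message starts the string and \b matches.
const cleanMatch = pattern.test(stripSgr(colored)); // true
```

Because the stripping regex covers only sequences that end in `m`, other escape sequences (cursor movement, for example) would pass through untouched. A shared `stripAnsi` helper like the one already used in src/local/entrypoint.ts may be the safer choice if it handles more cases.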

Comment thread src/local/entrypoint.ts
Comment on lines +2638 to +2652
function matchesInvalidGithubStepConfig(text: string): boolean {
  const patterns = [
    /\bGitHub step requires a non-empty name\b/i,
    /\bGitHub step "[^"]*" requires an action name\b/i,
    /\bGitHub step "[^"]*" uses unsupported action\b/i,
    /\bGitHub step "[^"]*" params must be an object\b/i,
    /\bGitHub repo must be in owner\/repo format\b/i,
    /\bGitHub repo object requires owner and repo\b/i,
  ];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = stripAnsi(rawLine);
    if (patterns.some((pattern) => pattern.test(line))) return true;
  }
  return false;
}


medium

The matchesInvalidGithubStepConfig function currently splits the input text by lines, loops over each line, and strips ANSI codes on each line before testing the patterns. Since none of the patterns use line-start (^) or line-end ($) anchors, splitting the string and looping is unnecessary and inefficient, especially for large log outputs.

We can simply strip ANSI codes from the entire text once and test the patterns on the whole string directly.

function matchesInvalidGithubStepConfig(text: string): boolean {
  const cleanText = stripAnsi(text);
  const patterns = [
    /\bGitHub step requires a non-empty name\b/i,
    /\bGitHub step "[^"]*" requires an action name\b/i,
    /\bGitHub step "[^"]*" uses unsupported action\b/i,
    /\bGitHub step "[^"]*" params must be an object\b/i,
    /\bGitHub repo must be in owner\/repo format\b/i,
    /\bGitHub repo object requires owner and repo\b/i,
  ];
  return patterns.some((pattern) => pattern.test(cleanText));
}
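
One behavioral difference may be worth checking before switching to whole-string matching: `[^"]*` also matches newlines, so a quoted step name that spans two lines matches against the whole text but not line by line. The sketch below compares the two strategies on one pattern from this review. ANSI stripping is left out for brevity, and the helper names and sample strings are hypothetical.

```typescript
// One of the patterns quoted in this review.
const actionPattern = /\bGitHub step "[^"]*" requires an action name\b/i;

// Per-line strategy, as in the original function (ANSI stripping omitted).
function matchesPerLine(text: string): boolean {
  return text.split(/\r?\n/).some((line) => actionPattern.test(line));
}

// Whole-text strategy, as in the suggested rewrite (ANSI stripping omitted).
function matchesWholeText(text: string): boolean {
  return actionPattern.test(text);
}

// Ordinary single-line error: both strategies agree.
const singleLine = 'error: GitHub step "open-pr" requires an action name';

// Quoted name broken across two lines: only the whole-text strategy matches,
// because the negated character class [^"] also consumes the newline.
const wrapped = 'GitHub step "open\npr" requires an action name';

const agreeOnSingleLine = matchesPerLine(singleLine) && matchesWholeText(singleLine);
const perLineOnWrapped = matchesPerLine(wrapped);     // false
const wholeTextOnWrapped = matchesWholeText(wrapped); // true
```

If step names never contain newlines, the two versions should classify the same inputs and the whole-text form is simply shorter.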

@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 10 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/runtime/failure/classifier.ts">

<violation number="1" location="src/runtime/failure/classifier.ts:364">
P2: `classifyFromPlainSummary` gives environment errors higher priority than workflow-invalid, unlike the other classifier paths, which can misclassify mixed messages.</violation>
</file>


detected.push(FailureClass.EnvironmentError);
}

if (matchesWorkflowInvalidPattern(text)) {


P2: classifyFromPlainSummary gives environment errors higher priority than workflow-invalid, unlike the other classifier paths, which can misclassify mixed messages.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/runtime/failure/classifier.ts, line 364:

<comment>`classifyFromPlainSummary` gives environment errors higher priority than workflow-invalid, unlike the other classifier paths, which can misclassify mixed messages.</comment>

<file context>
@@ -333,6 +361,11 @@ function classifyFromPlainSummary(summaryText: PlainValidationSummary): FailureC
     detected.push(FailureClass.EnvironmentError);
   }
 
+  if (matchesWorkflowInvalidPattern(text)) {
+    signals.push(plainSignal(`Plain summary indicates workflow artifact configuration error: ${truncate(text, 120)}`, Confidence.High));
+    detected.push(FailureClass.WorkflowInvalid);
</file context>
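
One way to keep the outcome independent of the order in which signals are pushed is to resolve the primary class from an explicit priority list. This is a minimal sketch and not the classifier's code: the enum string values, `PRIORITY`, and `pickPrimary` are hypothetical, and the ordering shown (workflow-invalid ahead of environment error) is an assumption drawn from this comment's description of the other classifier paths.

```typescript
// Hypothetical sketch; only the two member names come from the PR.
enum FailureClass {
  EnvironmentError = 'environment_error', // string value assumed
  WorkflowInvalid = 'workflow_invalid',   // string value assumed
}

// Assumed priority: a malformed workflow artifact outranks an environment error.
const PRIORITY: readonly FailureClass[] = [
  FailureClass.WorkflowInvalid,
  FailureClass.EnvironmentError,
];

// Returns the highest-priority detected class, regardless of push order.
function pickPrimary(detected: readonly FailureClass[]): FailureClass | undefined {
  return PRIORITY.find((cls) => detected.includes(cls));
}

// A mixed message that triggered both checks, environment first.
const mixed = [FailureClass.EnvironmentError, FailureClass.WorkflowInvalid];
const primary = pickPrimary(mixed); // FailureClass.WorkflowInvalid
```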

Comment thread src/local/entrypoint.ts
@agent-relay-code

Contributor

pr-reviewer applied fixes — committed and pushed 9e850dc to this PR. The notes below describe what changed.

Reviewed and fixed PR #148 locally.

Changes made:

  • Expanded GitHub primitive validation to catch id, command, and separate top-level owner fields in createGitHubStep configs.
  • Added support for import { github } from '@agent-relay/sdk' with github.createGitHubStep(...).
  • Broadened runtime/local failure classification for actual primitive errors like requires an action, invalid JSON params, repo format errors, and unsupported actions.
  • Updated bundled workflow persona prompts so generated workflows use valid createGitHubStep config shape up front.

Verification passed:

  • npm run typecheck
  • npm test
    61 passed, 1 skipped
  • npm run build

@github-actions

github-actions Bot commented Jun 1, 2026

Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-06-01T18-13-43-583Z-HEAD-provider
Mode: provider
Git SHA: f260be7

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0 | Provider infra skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the repository into alignment with the new product spec: *Ricky now leverages the Agent Assistant (AA) for all high‑level reasoning, context stitching, and plan synthesis.*  

The package follows **Ricky repository conventions** (AGENTS.md), the **workflow standards** (contract → implementation → verification → review), and the **shared authoring rules** (atomic PRs, scoped branches, 80‑100 validation loops).

---

## 1. Scope & Branch Layout
| Item | Value |
|------|-------|
| **Branch name** | `feature/aa‑deep‑integration` |
| **PR title** | `feat(ricky): embed Agent Assistant for all reasoning & planning` |
| **Scope** | Docs (`README.md`, `docs/architecture.md`, `docs/usage.md`), core module (`ricky/core.py`), test suite (`tests/test_core_aa.py`). No UI changes. |
| **Owner** | `@ricky-maintainer` |
| **Reviewers** | `@agent-assistant-team`, `@doc-team` |

---

## 2. Contract (Change Specification)

| Contract ID | Description |
|-------------|-------------|
| **C‑AA‑001** | All public methods of `Ricky` that previously perfo
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review (as of 2026‑05‑31)**  

| Metric | Current State | Target / SLA | Gap / Action |
|--------|---------------|--------------|--------------|
| **Repository Coverage** | 68 % of active AgentWorkforce repos import `@agent-assistant/turn-context` in their `package.json`. | ≥ 90 % by Q4 2026 | Identify the 32 % outliers (list attached) and open a **“Add turn‑context”** issue on each. |
| **CI Integration** | 54 % of repos have a dedicated CI step that runs `ricky‑verify‑turn‑context` (custom lint + contract test). | 100 % by Q3 2026 | Create a **template workflow** (`.github/workflows/turn-context.yml`) and push a PR to each repo missing it. |
| **Contract Compliance** | 81 % of turn‑context usages pass the **Deterministic Verification Gate** (schema validation + runtime contract test). | 100 % | For the 19 % failing cases, generate a **Repair PR** that adds missing `turnId` propagation and updates the `TurnContext` interface. |
| **Documentation Adoption** | 42 % of repos reference the **Turn‑Context Design Doc** in their README. | 80 % | Add a **Docs‑Update** PR to each repo lacking the link, and include a checklist item in the repo’
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all four** capabilities — workflow generation, validation, debugging, staged‑CLI UX, and blocker/evidence wording — into the **Agent Assistant** (the “AA” component of the AgentWorkforce stack).

Below is a concise justification aligned with our repository conventions, workflow standards, and product specs.

---

## 1. Alignment with Agent Assistant’s Core Mandate
| Capability | Core AA responsibility | Why it belongs in AA |
|------------|-----------------------|----------------------|
| **Workflow generation** | *Authoring* – AA is the primary source of deterministic workflow contracts. | Generates concrete `workflow.yaml` + `contract.json` artifacts that downstream services (Orchestrator, Verifier) consume. |
| **Workflow validation** | *Reliability* – AA runs deterministic verification gates before a workflow is merged. | Provides the “pre‑merge gate” (`aa-validate`) that checks schema, contract compliance, and test‑case coverage (≥80 % of spec). |
| **Debugging** | *Coordination* – AA surfaces step‑level diagnostics and auto‑generates “debug tickets”. | Centralizes error‑capture (stack traces, env snapshots) and produces reproducible “debug bundles” 
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: July 1 – September 30 2026*  

---

## 1. Slice Objective
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** (≈ 150 agents) to onboard, configure, and run reliability‑focused workflows with zero‑touch CI/CD and deterministic verification.

Success is measured by:
| Metric | Target |
|--------|--------|
| **Onboarded agents** | ≥ 150 (10 teams) |
| **Workflow success rate** | ≥ 99 % (post‑deployment) |
| **Mean Time to Recovery (MTTR)** | ≤ 5 min for any blocked gate |
| **User satisfaction (NPS)** | ≥ +30 |
| **Documentation coverage** | 100 % of new contracts & gates reviewed |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • New **Ricky‑v2.0** workflow contract (YAML) with deterministic gates. <br>• CI pipeline (GitHub Actions) for contract lint, unit‑test, integration‑test, and gate verification. <br>• Automated PR template & reviewer assignment matrix. <br>• Pilot onboarding playbook, self‑service portal (GitHub‑Pages). <br>• Monitoring dashboards (Grafana) + alert routing. | •
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Slack / Web Support – Agent‑Assistant Package Design  

Below is a **complete, production‑ready package spec** that lets Ricky expose its workflow‑reliability, coordination, and authoring capabilities through two front‑ends:

| Front‑end | Package | Core Agent | Public Contract | Verification Gate | Review Artifacts |
|-----------|---------|------------|----------------|-------------------|------------------|
| **Slack** | `ricky-slack‑bot` | `ricky-core` | `SlackCommandContract` (JSON‑RPC over Slack Events API) | 1️⃣ Schema validation 2️⃣ Deterministic unit‑test suite 3️⃣ End‑to‑end (E2E) Slack‑app test harness | OpenAPI spec, CI‑pipeline logs, Slack‑App Manifest |
| **Web**   | `ricky-web‑ui`   | `ricky-core` | `WebRestContract` (REST/JSON) + `WebSocketContract` (real‑time updates) | 1️⃣ OpenAPI schema validation 2️⃣ Contract‑test matrix 3️⃣ Browser‑automation smoke tests | Swagger UI, Cypress reports, Deployment manifest |

---

### 1. Package Layout (Git repo `ricky-agent-assist`)

/ricky-agent-assist

├─ /ricky-core # shared library (workflow engine, validation, logging)
│ ├─ src/
│ └─ tests/

├─ /ricky-slack-bot
│ ├─ src/
│ │ ├─ index
...[truncated]


**Must**

- Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
- Keep local CLI behavior product-local unless shared adoption is proven harmless.
- Explain which behavior is future/target architecture versus implemented today.

**Must Not**

- Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
- Let future surface abstractions distort the current CLI contract.
- Duplicate a mature Agent Assistant capability locally without justification.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.generation-default-not-execution</code> (cli-behavior/manual)</summary>

**User message**

> A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

**Ricky output**

```text
**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Action Result
1️⃣ Parse arguments – Ricky reads --mode local and the free‑form --spec string. The request is interpreted as “create a workflow definition for package checks”.
2️⃣ Select mode – local tells Ricky to operate against the local repository (no remote CI/CD integration). All file I/O, validation, and artifact generation happen on the developer’s machine.
3️⃣ Generate workflow contract – Ricky builds a deterministic workflow contract (YAML/JSON) that includes:
Inputs – e.g., package_path, lint_config.
Steps – install_deps, run_lint, run_tests, security_scan.
Outputs – lint_report, test_report, scan_summary.
Verification gates – unit‑test pass, lint score ≥ 90, no high‑severity vulnerabilities.
The contract is written to ./ricky/workflows/package-checks.yaml (or the path you configured).
4️⃣ Render artifacts – Ricky also creates optional scaffolding:
• A README describing the workf
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the repository conventions (AGENTS.md), the workflow standards, and the shared authoring rules. All steps are deterministic, include verification gates, and produce review artifacts that can be inspected in CI.

---

## 1️⃣  Repository Layout (for reference)

ricky/
├─ .github/
│ └─ workflows/
│ └─ onboarding.yml # CI workflow (see §5)
├─ scripts/
│ └─ onboarding.sh # Main onboarding script (see §2)
├─ docs/
│ └─ onboarding.md # Human‑readable guide (this file)
├─ .ricky/
│ └─ config.yaml # Generated user config
└─ tests/
└─ onboarding.test.js # Validation loops (see §4)


---

## 2️⃣  `scripts/onboarding.sh` – Deterministic CLI Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# -------------------------------------------------
# Ricky First‑Run Onboarding
# Contract:
#   Input  : None (interactive)
#   Output : ~/.ricky/config.yaml
#   Exit   : 0 on success, non‑zero on any failure
# -------------------------------------------------

# 1. Verify 
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
1. **Detect** – When the workflow runner attempts to load `spec.yaml` (or the configured spec file) and receives:
   * a zero‑byte file, **or**
   * a `FileNotFoundError`.
2. **Report a blocker** – Emit a deterministic, machine‑readable error object (JSON) that downstream tools can surface:

```json
{
  "type": "BLOCKER",
  "code": "SPEC_MISSING_OR_EMPTY",
  "message": "Workflow spec is missing or empty. Execution halted until a valid spec is provided.",
  "details": {
    "expected_path": "workflows/my_flow/spec.yaml",
    "detected_state": "missing|empty"
  }
}

No further workflow steps are executed until the blocker is cleared.


2. Automated Remediation Scaffold (Optional)

If the environment permits auto‑generation, Ricky can create a minimal placeholder spec that satisfies the contract validation gate, allowing the team to iterate quickly.

2.1 Placeholder Template

# workflows/my_flow/spec.yaml
name: <YOUR_WORKFLOW_NAME>
description: "TODO: Add a concise description of the workflow purpose."
version: "0.1.0"
steps:
  # Add step defin
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Workflow Engine
• Scheduler : RUNNING
• Queue depth : 0 pending jobs
• Last heartbeat : 2026‑06‑01 12:03:17 UTC

📦 Repository State
• Current branch : main
• Uncommitted changes : 0
• Open PRs : 0

⚙️ System Health
• CPU usage : 12%
• Memory usage: 48%
• Disk space : 73% free

🚨 Alerts
• Provider connectivity not confirmed for Google and GitHub.
→ Run ricky provider check --google and/or ricky provider check --github
to verify credentials and network access.

📝 Next Steps

  1. Authenticate the missing providers (OAuth token, SSH key, etc.).
  2. Re‑run provider checks.
  3. Once verified, ricky status will show them as CONNECTED.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` HEAD, ticket ID (e.g., `GH‑1234`) | New branch `feat/gh‑1234‑<short‑desc>` | `git checkout -b` succeeds; branch name matches regex `^feat/gh-\d+-[a-z0-9-]+$` |
| 2️⃣ | **Developer** | **Implement change** | Scoped branch, design spec | Modified files, unit‑test updates | Local CI lint (`npm run lint`) passes; `git diff --stat` shows only allowed paths |
| 3️⃣ | **Ricky (Reliability)** | **Run pre‑merge verification** | Local commit(s) | CI pipeline result | GitHub Actions **“pre‑merge‑check”** must return `success` (lint, unit tests, type‑check) |
| 4️⃣ | **Developer** | **Open Pull Request** | Branch, PR template | PR #, reviewers auto‑assigned | PR title matches `^<type>(\(scope\))?: .+$`; description contains “Ticket: GH‑1234”. |
| 5️⃣ | **Ricky (Review Coordination)** | **Assign deterministic reviewers** | PR metadata, reviewer matrix (e.g., `@frontend‑lead`, `@qa‑engineer`) | Review assignments | GitHub API confirms reviewers added; no duplic
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Purpose** | Automate a two‑stage pipeline where Claude performs a semantic/code‑review of a design brief and Codex generates the corresponding implementation. |
| **Scope** | • Input: `design_spec.md` (Markdown) <br>• Output: `implementation/` (source files) <br>• Supported languages: Python, JavaScript, TypeScript |
| **Branch/PR Boundary** | All changes are made on a feature branch `feat/review‑implement‑<timestamp>` and merged via a single PR that includes the review artifact and generated code. |
| **Contract** | <ul><li>**Input contract** – `design_spec.md` must contain a top‑level `## Requirements` section and optional `## Constraints`.</li><li>**Claude review contract** – Returns a JSON file `review_report.json` with fields: `status` (`PASS`/`FAIL`), `issues[]` (array of strings), `suggestions[]` (array of strings).</li><li>**Codex implementation contract** – Consumes `review_report.json` (must be `PASS`) and produces a directory `implementation/` with a manifest `manifest.json` listing generated files and their SHA‑256 ha
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.
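The precedence these criteria imply (explicit spec hint first, then skill metadata, then a project default) could be sketched as below. All names and the output shape are assumptions made for illustration; the report does not show the real `tool-selection.json` schema.

```typescript
// Hypothetical decision record; the actual audit schema is not shown in this report.
type DecisionSource = "spec-hint" | "skill-metadata" | "project-default";

interface ToolDecision {
  step: string;
  runner: string;
  source: DecisionSource;
  reason: string;
}

function decideRunner(
  step: string,
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): ToolDecision {
  // Explicit spec hints always win over skill defaults.
  const hinted = specHints[step];
  if (hinted !== undefined) {
    return { step, runner: hinted, source: "spec-hint", reason: `spec names ${hinted} for ${step}` };
  }
  // Skill metadata only fills gaps the spec left open.
  const fromSkill = skillDefaults[step];
  if (fromSkill !== undefined) {
    return { step, runner: fromSkill, source: "skill-metadata", reason: `spec is silent; skill default for ${step}` };
  }
  return { step, runner: projectDefault, source: "project-default", reason: `no hint or skill default for ${step}` };
}

// Serializes one decision per step, each with its reason, for an audit file.
function buildToolSelection(
  steps: string[],
  specHints: Record<string, string>,
  skillDefaults: Record<string, string>,
  projectDefault: string,
): string {
  const decisions = steps.map((s) => decideRunner(s, specHints, skillDefaults, projectDefault));
  return JSON.stringify(decisions, null, 2);
}
```

For the spec in this case, hints of `{ review: "claude", implement: "codex" }` would pin both steps regardless of any skill default.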

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes a **generated workflow JSON** (or YAML) and returns a **tighter, review‑ready version**. It focuses on two things:

| Target | What `--refine` does |
|--------|----------------------|
| **Task descriptions** | • Removes vague verbs (“do”, “handle”). <br>• Adds concrete inputs, outputs, and success criteria. <br>• Limits each description to 1‑2 sentences (≈30‑45 words). |
| **Acceptance gates** | • Converts informal “check” statements into explicit **verification predicates** (e.g., `output.fileExists && output.size > 0`). <br>• Adds a **deterministic verification step** (`verify_<step_id>`) that runs before the next step. <br>• Flags any gate that cannot be expressed as a boolean predicate as a **blocker** for manual review. |

---

## 1. Basic Invocation  

```bash
awf generate --template my_template.yaml --output workflow.json
awf refine --input workflow.json --output workflow_refined.json
```

  • --input – the raw workflow you just generated.
  • --output – the refined artifact you will push to a feature branch.

You can also pipe directly:

...[truncated]


**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
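One way to read the refinement rules above: only free-text fields may change, and anything else falls back to the deterministic artifact with a visible warning. The sketch below assumes a simplified step shape and treats `undefined` as "refinement timed out or the provider failed"; both are illustrative assumptions.

```typescript
// Hypothetical step shape; only `task` and `gate` are treated as editable regions.
interface WorkflowStep {
  id: string;
  agent: string;
  dependsOn: string[];
  task: string;
  gate: string;
}

interface RefineOutcome {
  steps: WorkflowStep[];
  refined: boolean;
  warnings: string[];
}

// The graph is unchanged when ids, agent assignments, and dependencies all match in order.
function sameGraph(a: WorkflowStep[], b: WorkflowStep[]): boolean {
  if (a.length !== b.length) return false;
  return a.every((step, i) => {
    const other = b[i];
    return (
      other !== undefined &&
      step.id === other.id &&
      step.agent === other.agent &&
      step.dependsOn.join(",") === other.dependsOn.join(",")
    );
  });
}

function acceptRefinement(
  deterministic: WorkflowStep[],
  candidate: WorkflowStep[] | undefined,
): RefineOutcome {
  if (candidate === undefined) {
    // Optional refinement failing must not fail generation.
    return {
      steps: deterministic,
      refined: false,
      warnings: ["refinement unavailable (timeout or provider failure); returning deterministic artifact"],
    };
  }
  if (!sameGraph(deterministic, candidate)) {
    return {
      steps: deterministic,
      refined: false,
      warnings: ["refinement changed step ids, agents, or dependencies; returning deterministic artifact"],
    };
  }
  return { steps: candidate, refined: true, warnings: [] };
}
```

A real pass would also re-run the full validator on the accepted candidate; this sketch only covers the graph-preservation check.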

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.3)*  

---

## 1. Objective
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky package version (semantic‑version string).  
* Exits with status 0 on success, non‑zero on unexpected failure.  
* Emits the version to **stdout** only (no extra whitespace).  
* Is fully covered by unit‑ and integration‑tests and passes CI gates before merge.

---

## 2. Scope & Boundaries  

| Scope | Included | Excluded |
|-------|----------|----------|
| **Code** | `src/cli.ts`, `src/version.ts`, test files under `test/` | Any unrelated CLI commands, UI components |
| **Docs** | `docs/cli/version.md` | Full user‑guide (handled elsewhere) |
| **PR** | Single PR `feat: add --version flag` targeting `main` | Multi‑feature PRs, major refactors |

---

## 3. Contract (Interface Definition)

```ts
// src/version.ts
export interface VersionProvider {
  /** Returns the semantic version string, e.g. "1.2.3". */
  getVersion(): string;
}

// src/cli.ts (excerpt)
import { VersionProvider } from "./version";

export function run(argv: string[], vp: Version
```
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact Validation‑Fix‑Rerun Loop”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, config bundles) through deterministic validation, automatic fixing where possible, and controlled re‑execution until all pass or a blocker is reported.

---

## 1. Scope & Boundaries
| Aspect | Definition |
|--------|------------|
| **Artifacts** | Any file/object that can be validated independently (e.g., `*.json`, `*.py`, `*.csv`). |
| **Batch size** | Configurable; default 100 artifacts per PR/branch. |
| **Branch/PR** | Each batch lives in its own feature branch `batch/<timestamp>/<batch‑id>` and opens a PR titled `Batch <batch‑id>: Validation Loop`. |
| **Exit conditions** | • All artifacts **PASS** → merge.<br>• Any artifact **BLOCKED** → abort with blocker report. |
| **Loop limit** | Max 8 validation‑fix‑rerun cycles per artifact (≈ 80‑100 total loops across batch). |

---

## 2. Contract (YAML) – `batch_validation.yml`

```yaml
name: batch-validation-loop
on:
  push:
    branches: [ main ]
  workflow_dispatch:
    inputs:
      batch_id:
        description: "Unique batch identifier"
        required: true
      batch_path:
   
```
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What It Contributes | Direct Impact on the Workflow Artifact |
|-------|--------------------|----------------------------------------|
| **Deterministic Contract Definition** | Writes explicit input‑output schemas (JSON‑Schema, OpenAPI) for every step. | Guarantees that each node can be type‑checked before execution; the generated workflow includes a **`contract`** block that the CI gate validates with `ajv` or `swagger‑validator`. |
| **Verification Gate Insertion** | Adds automated test steps (unit, integration, contract‑test) after each logical group. | The workflow contains **`gate: verify`** stages that run `npm test`, `pytest`, or custom scripts. If any gate fails, the pipeline aborts and reports a **blocker** with a deterministic error code. |
| **Review Artifact Generation** | Emits markdown review packets (design rationale, risk matrix, change‑log) alongside the PR. | The PR template is auto‑populated with a **`## Review Artifacts`** section, linking to generated `design.md` and `risk.md`. Reviewers can see exactly why a step was added or modified. |
| **80‑to‑100 Validation Loops** | En
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

Below is a **minimal, deterministic workflow skeleton** that you can flesh out once the missing product details are clarified.  
It follows the AgentWorkforce conventions (AGENTS.md) and includes:

* **Typed contract definitions** for inputs, outputs, and intermediate artifacts.  
* **Verification gates** that enforce deterministic checks before each transition.  
* **Review & validation loops** (80‑100 % coverage) to catch regressions early.  
* **Blocker‑reporting** hooks for any step that cannot proceed.  
* **Scoped branch/PR boundaries** so each logical change lives in its own PR.

---

## 1️⃣ High‑Level Overview

| Stage | Purpose | Owner | Primary Artifact |
|-------|---------|-------|------------------|
| **A – Ingest Spec** | Capture the vague product spec and normalize it. | Product Analyst | `SpecNormalized.json` |
| **B – Derive Requirements** | Translate normalized spec into concrete functional & non‑functional requirements. | Requirements Engineer | `Requirements.yaml` |
| **C – Design Draft** | Produce a design document (architecture, data model, API contracts). | System Designer | `Design.md` |
| **D – Review & Approve** | Formal review cycle (peer + stakeholder). |
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s Immediate Response Plan**

| Phase | Action | Owner | Deliverable | Success Gate |
|-------|--------|-------|-------------|--------------|
| 1️⃣ **Detect & Log** | • Pull the latest run‑metadata from the orchestration service (e.g., Temporal, Airflow). <br>• Record the exact timestamp of the timeout, the step ID, and the worker‑ID that remained alive. <br>• Capture the worker’s heartbeat logs and any partial stdout/stderr. | Ricky (automation) | `run‑audit‑<run‑id>.json` stored in the run‑audit bucket. | All fields populated; no missing timestamps. |
| 2️⃣ **Isolate the Stale Worker** | • Issue a **graceful termination** request to the worker via the runtime API (e.g., `kill -SIGTERM`). <br>• If the worker does not exit within 30 s, force‑kill (`SIGKILL`). <br>• Tag the worker’s container/pod with label `ricky/blocked‑artifact‑missing`. | Ricky (automation) | `worker‑termination‑<worker‑id>.log` | Worker state = *Terminated* in the cluster API. |
| 3️⃣ **Validate Artifact Absence** | • Run a deterministic verification gate: `artifact‑exists --run <run‑id> --step <step‑id>` <br>• Expect **false**. If any file appears, raise a **blocker** and attach the file to the incident
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow: Detect and Guard Against Stale Relay State Before a Local Run**  

| **Stage** | **Purpose** | **Inputs** | **Outputs** | **Deterministic Gate** |
|-----------|-------------|------------|-------------|------------------------|
| **1️⃣ Pre‑run Scan** | Walk the repository tree and collect any of the three stale directories. | `repoRoot` (absolute path) | `staleDirs: { .agent‑relay, .relay, .trajectories }` (list of found paths) | **Gate‑1** – If `staleDirs` is empty → skip to **5️⃣**. If not empty → continue. |
| **2️⃣ Content Hash Check** | Compute a content hash for each stale directory (recursive SHA‑256 of file names + sizes). | `staleDirs` | `dirHashes: { path → hash }` | **Gate‑2** – If any hash differs from the hash stored in the last successful run manifest (`.agent‑run‑manifest.json`) → mark as **dirty**; otherwise treat as **clean**. |
| **3️⃣ User Prompt / Auto‑Fix Policy** | Decide whether to abort, clean, or continue based on policy. | `dirty` flag, user‑config (`workflow.config.json → stalePolicy: ["abort","clean","ignore"]`) | `action: abort|clean|continue` | **Gate‑3** – If `action === abort` → fail fast with a blocker report. If `clean` → schedule deleti
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.
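As a rough illustration of the precheck these criteria describe, the sketch below classifies stale state as an environment issue and blocks launch without deleting anything. The `isStale` predicate, the result shape, and the `environment-contamination` label are assumptions; how Ricky actually decides that a directory is stale is not shown here.

```typescript
// Directory names come from the user message above; everything else is hypothetical.
const STALE_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface PrecheckResult {
  blocker: "environment-contamination" | undefined;
  observedPaths: string[]; // recorded so the user can see exactly what was found
  recommendation: string;
  mayLaunch: boolean;
}

// `isStale` is injected so this function never touches or deletes state itself.
function precheckStaleState(isStale: (relativePath: string) => boolean): PrecheckResult {
  const observedPaths = STALE_STATE_DIRS.filter((dir) => isStale(dir));
  if (observedPaths.length === 0) {
    return { blocker: undefined, observedPaths, recommendation: "workspace is clean", mayLaunch: true };
  }
  return {
    blocker: "environment-contamination", // an environment issue, not a workflow logic failure
    observedPaths,
    recommendation: "quarantine the listed directories or use an isolated workspace before launching",
    mayLaunch: false,
  };
}
```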

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**How to resolve “Another Ricky or Relay run is already active in this workspace”**

The marker file is a safety lock that prevents two concurrent executions from stepping on each other. When the lock is left behind (e.g., a crash, a killed terminal, or a stray background process), the next run will stop with the message you’re seeing.

Follow the deterministic, low‑risk steps below. Each step is a **verification gate** – you confirm the condition before moving to the next one. If any gate fails, stop and report the blocker.

---

## 1️⃣ Verify the lock state

| Gate | Command | Expected outcome |
|------|---------|------------------|
| **1‑A** | `cat .ricky/run.marker` | Should contain a UUID and a PID (e.g., `run‑id: c3f9… pid: 12345`). |
| **1‑B** | `ps -p <pid-from-marker>` | Should return **no** process (exit code 1). If the PID is still alive, the run is truly active. |

*If 1‑B shows a live process, let that run finish or kill it deliberately (`kill -9 <pid>`). Then re‑run the workflow.*  

*If the PID is dead, proceed to step 2.*

---

## 2️⃣ Remove the stale marker safely

| Gate | Command | Reason |
|------|---------|--------|
| **2‑A** | `git status --porcelain .ricky/ru
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a `MISSING_ENV_VAR` value. Ricky's deterministic auto-fix injects the `.env.local` / `.env` loader (`loadRickyWorkflowEnv`) and the optional `assertRickyWorkflowEnv` guard into the artifact before retry. The artifact may be a master-rendered workflow whose `.step({ command: ... })` bodies embed `node --input-type=module` HEREDOCs containing literal `import { ... } from 'node:fs'` / `from 'node:path'` strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix does three things:

1. **Injects the environment‑loader** – `loadRickyWorkflowEnv()` – at the top of the generated file so that `.env.local` (or fallback `.env`) is read before any step runs.  
2. **Adds an optional guard** – `assertRickyWorkflowEnv()` – that throws a clear error if the required variable is still undefined after loading.  
3. **Leaves the existing step bodies untouched**, including any HEREDOC‑style `node --input-type=module` blocks that embed literal `import { … } from 'node:fs'` or `from 'node:path'` strings.

---

## 1. Helper utilities (single source of truth)

Create (or update) `ricky-workflow-env.js` in the repository root (or a shared `utils/` folder). This file is **import‑free** and can be required from any generated workflow artifact.

```js
// ricky-workflow-env.js
import { config } from 'dotenv';
import { resolve } from 'node:path';
import { existsSync } from 'node:fs';

/**
 * Load environment variables from .env.local (preferred) or .env.

```
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected `loadRickyWorkflowEnv` body references `rickyWorkflowFs.*` and `rickyWorkflowPath.*`, so the repair must also add the corresponding `import * as rickyWorkflowFs from 'node:fs'` and `import * as rickyWorkflowPath from 'node:path'` aliases at module top level.
  • Detect existing alias imports by matching real top-level `import * as <alias> from '<module>'` statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in `.step({ command: ... })` bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the `rickyWorkflowFs` / `rickyWorkflowPath` aliases because `from 'node:fs'` or `from 'node:path'` already appears somewhere in the file as a string literal.
  • Inject `loadRickyWorkflowEnv` (or `assertRickyWorkflowEnv`) without the supporting alias imports, which produces a `ReferenceError: rickyWorkflowPath is not defined` at module load and burns the auto-fix budget on `UNSUPPORTED_RUNTIME` at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.
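The difference between substring matching and top-level import detection can be sketched as follows. This is a line-based heuristic written for illustration, assuming imports sit in a header block before the first statement; the alias names come from the criteria above, while the function names are hypothetical and an AST parser would be the more robust choice.

```typescript
// Alias names and specifiers are taken from the rubric; the helpers are illustrative.
const REQUIRED_ALIASES: Array<{ alias: string; specifier: string }> = [
  { alias: "rickyWorkflowFs", specifier: "node:fs" },
  { alias: "rickyWorkflowPath", specifier: "node:path" },
];

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True only when `import * as <alias> from '<specifier>'` appears in the leading
// import header, i.e. before the first line of ordinary code.
function hasTopLevelAliasImport(source: string, alias: string, specifier: string): boolean {
  const wanted = new RegExp(
    "^import\\s+\\*\\s+as\\s+" + escapeRegExp(alias) +
      "\\s+from\\s+['\"]" + escapeRegExp(specifier) + "['\"]\\s*;?\\s*$",
  );
  let insideMultiLineImport = false;
  for (const line of source.split("\n")) {
    if (insideMultiLineImport) {
      // A multi-line import ends on the line that carries its module specifier.
      if (/from\s+['"][^'"]+['"]/.test(line)) insideMultiLineImport = false;
      continue;
    }
    if (/^\s*$/.test(line) || /^\s*(\/\/|\/\*|\*)/.test(line)) continue; // blank or comment
    if (!/^import\b/.test(line)) break; // first real statement: the header is over
    if (wanted.test(line)) return true;
    if (!/['"]/.test(line)) insideMultiLineImport = true; // e.g. "import {" spanning lines
  }
  return false;
}

// Prepends only the aliases that are genuinely missing; the original body,
// including any HEREDOC text inside step commands, is left byte-for-byte intact.
function addMissingAliasImports(source: string): string {
  const missing = REQUIRED_ALIASES.filter(
    ({ alias, specifier }) => !hasTopLevelAliasImport(source, alias, specifier),
  );
  const header = missing.map(({ alias, specifier }) => `import * as ${alias} from '${specifier}';\n`);
  return header.join("") + source;
}
```

Because the scan stops at the first non-import statement, an `import` line embedded in a step command's HEREDOC never counts, even though a plain substring search for `from 'node:fs'` would find it.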

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, step‑by‑step contract you can copy‑paste into your terminal (or CI script) to re‑run the failed workflow with **auto‑fix** enabled, using the supplied *failed‑step identifier* and *previous run‑id*. The contract assumes you have the standard AgentWorkforce CLI (`awf`) installed and that the workflow source lives in a local Git repo.

---

## 1️⃣ Preconditions (verify before you start)

| Check | Command | Expected outcome |
|-------|---------|------------------|
| **CLI installed** | `awf --version` | `awf version x.y.z` |
| **Repo clean** | `git status --porcelain` | No untracked/modified files (or stash them) |
| **Previous run metadata** | `cat .awf/run‑metadata.json` | JSON containing `run_id` and `failed_step` (you already have these) |
| **Auto‑fix feature enabled** | `awf config get auto_fix` | `true` (if `false`, run `awf config set auto_fix true`) |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Create a scoped branch for the repair

```bash
# 2.1 Create a short‑lived branch named after the failed step and run id
BRANCH="auto-fix/${FAILED_STEP}/${PREV_RUN_ID}"
git checkout -b "$BRANCH"

```
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.
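A bounded repair/resume loop consistent with these criteria might look like the sketch below. It is synchronous for brevity, and the `run` and `repair` callbacks, the request and outcome shapes, and the function name are all assumptions rather than Ricky's actual API.

```typescript
// Hypothetical shapes for illustration only.
interface RunRequest {
  startFromStep: string | undefined;
  previousRunId: string | undefined;
}

interface RunOutcome {
  ok: boolean;
  runId: string;
  failedStep: string | undefined;
}

interface AttemptSummary {
  attempt: number;
  request: RunRequest;
  outcome: RunOutcome;
  repaired: boolean;
}

function runWithAutoFix(
  trackingRunId: string, // one id for the whole repair/resume sequence
  maxAttempts: number,
  run: (request: RunRequest) => RunOutcome,
  repair: (failed: RunOutcome) => boolean, // repairs the workflow artifact, not repo source
): { trackingRunId: string; ok: boolean; attempts: AttemptSummary[] } {
  const attempts: AttemptSummary[] = [];
  let request: RunRequest = { startFromStep: undefined, previousRunId: undefined };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const outcome = run(request);
    if (outcome.ok) {
      attempts.push({ attempt, request, outcome, repaired: false });
      return { trackingRunId, ok: true, attempts };
    }
    // Only repair when there is budget left to use the repaired artifact.
    const repaired = attempt < maxAttempts ? repair(outcome) : false;
    attempts.push({ attempt, request, outcome, repaired });
    if (!repaired) break;
    // Resume from the failed step, carrying the previous run id forward.
    request = { startFromStep: outcome.failedStep, previousRunId: outcome.runId };
  }
  return { trackingRunId, ok: false, attempts };
}
```

With `maxAttempts` set to 1 the loop degenerates to a single attempt with no repair, which is one way a `--no-auto-fix` style option could be modeled.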

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.

Ricky output

**Immediate troubleshooting checklist**

| Step | Action | Expected outcome |
|------|--------|-------------------|
| 1️⃣ | **Collect the run log**<br>`ricky logs workflows/foo.ts --last` | A full JSON‑LND log file is printed. Save it as `foo.run.log`. |
| 2️⃣ | **Locate the failure gate**<br>Search the log for `❌` or `FAIL` entries. Note the *gate name* (e.g., `validate‑schema`, `run‑test‑suite`). | You now know which deterministic verification gate stopped the workflow. |
| 3️⃣ | **Inspect the offending artifact**<br>Open the file/path referenced in the gate’s `details.file` field. | The exact line/section that caused the failure is visible. |
| 4️⃣ | **Run the gate locally**<br>`ricky gate <gate‑name> --dry-run workflows/foo.ts` | The gate reproduces the failure with a deterministic stack trace. |
| 5️⃣ | **Check contract compliance**<br>Open `workflows/foo.contract.ts` (or the contract embedded in the workflow header). Verify that the workflow’s inputs/outputs match the contract’s `InputSchema` and `OutputSchema`. | Any schema mismatch will be highlighted. |
| 6️⃣ | **Run unit tests**<br>`ricky test workflows/foo.test.ts` | All unit tests must pass before proceeding. |
| 7️⃣ | 
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite `--no-auto-fix`.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky’s Standard Procedure for Executing a Local TypeScript Workflow Artifact**

| Phase | Action | Details | Expected Outcome |
|-------|--------|---------|-------------------|
| **1️⃣ Locate & Scope** | Identify the primary local path | `./workflows/` (or the path defined in `workflow.config.json` under `primaryPath`). | The exact directory that contains the `*.workflow.ts` file. |
| **2️⃣ Verify Contract** | Load the workflow contract file | `./workflows/<workflow‑name>.contract.json` must exist and contain the required keys: `name`, `version`, `inputs`, `outputs`, `gateways`. | Guarantees the artifact conforms to the **Ricky Contract Schema v1.2**. |
| **3️⃣ Install Dependencies** | Run deterministic install | ```bash\ncd <primaryPath>\nnpm ci   # uses package‑lock for reproducibility\n``` | All Node modules are installed exactly as locked; no drift. |
| **4️⃣ Compile (if needed)** | Transpile TypeScript to JavaScript | ```bash\nnpx tsc <workflow‑name>.workflow.ts --outDir ./dist\n``` | Generates `./dist/<workflow‑name>.workflow.js`. If the repo uses `ts-node` with `"module": "commonjs"` you may skip this step. |
| **5️⃣ Execute with Verification Gate** | Run the workflow thr
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on `PATH`.
  • Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because `agent-relay` is not on `PATH` when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**

---

### 1. Scope & Preconditions
| Item | Description |
|------|-------------|
| **Trigger** | A structural failure (e.g., broken DAG, cyclic dependency, missing contract) remains after an initial automated or manual fix attempt. |
| **Assumptions** | • The failing component is version‑controlled in `src/workflows/`. <br>• All prior attempts have been logged in the issue tracker with `status: attempted‑fix`. |
| **Boundary** | The failure is **persistent** after **two** distinct remediation cycles (auto‑fix + manual patch) and cannot be resolved within the current sprint’s capacity. |

---

### 2. Deterministic Verification Gates

| Gate | Input | Success Criteria | Output |
|------|-------|------------------|--------|
| **G1 – Re‑run Test Matrix** | Latest commit on `fix‑branch` | 0/ N failures in `ci/structural-tests.yml` | `test‑report.json` |
| **G2 – Static Contract Validation** | `workflow-contract.yaml` | `contract‑validator` returns `PASS` | `contract‑status.txt` |
| **G3 – Dependency Graph Acyclicity** | `graph.dot` | `graph‑checker --acyclic` returns `OK` | `graph‑status.log` |
| **G4 – Human Re
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Summary of Recent Runs**  
*(Generated ⟨date‑time⟩ – All times UTC)*  

| # | Workflow | Run ID | Branch / PR | Start → End | Duration | Status | Gate Pass/Fail | Key Metrics | Blockers / Alerts |
|---|----------|--------|-------------|------------|----------|--------|----------------|-------------|-------------------|
| 1 | **wf‑publish‑article** | `run‑a1b2c3` | `feature/auto‑toc` (PR #452) | 02:13 → 02:27 | **14 min** | ✅ **Success** | **All 5 gates passed** | Articles = 124, Failures = 0 | – |
| 2 | **wf‑sync‑catalog** | `run‑d4e5f6` | `main` | 03:00 → 03:09 | **9 min** | ⚠️ **Partial** | Gate 3 (checksum) **FAIL** | Items = 8 742, Skipped = 12 | Checksum mismatch on SKU #8742 |
| 3 | **wf‑generate‑report** | `run‑g7h8i9` | `release/v2.3` (PR #389) | 04:45 → 05:02 | **17 min** | ✅ **Success** | **All 4 gates passed** | Reports = 57, Size = 3.2 GB | – |
| 4 | **wf‑ingest‑logs** | `run‑j0k1l2` | `main` | 06:20 → 06:20 | **<1 min** | ❌ **Failed** | Gate 2 (schema) **FAIL** | Files = 0 | JSON schema validation error (field `timestamp` missing) |
| 5 | **wf‑deploy‑staging** | `run‑m3n4o5` | `release/v2.3` | 07:15 → 07:38 | **23 min** | ✅ **Success** 
...[truncated]

Must

  • Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.
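A small aggregation sketch for the digest these criteria describe. The evidence shape below is a stand-in, since the real `WorkflowRunEvidence` type is not shown in this report; the failure classes and the "seen twice means recurring" rule are also assumptions.

```typescript
// Hypothetical evidence shape, not the actual WorkflowRunEvidence definition.
type FailureClass = "environment" | "workflow-logic" | "timeout" | "unknown";

interface RunEvidence {
  workflow: string;
  runId: string;
  status: "passed" | "failed";
  failureClass: FailureClass | undefined;
  failedStep: string | undefined;
}

interface WorkflowDigest {
  workflow: string;
  runs: number;
  failures: Record<FailureClass, number>; // classes stay in separate buckets
  recurring: string[]; // findings tied to a specific step and class
}

// Takes a readonly array and only reads from it, so evidence is never mutated.
function buildDigest(evidence: readonly RunEvidence[]): WorkflowDigest[] {
  const digests: WorkflowDigest[] = [];
  const byWorkflow: Record<string, WorkflowDigest> = {};
  const seen: Record<string, number> = {};
  for (const run of evidence) {
    let digest = byWorkflow[run.workflow];
    if (digest === undefined) {
      digest = {
        workflow: run.workflow,
        runs: 0,
        failures: { environment: 0, "workflow-logic": 0, timeout: 0, unknown: 0 },
        recurring: [],
      };
      byWorkflow[run.workflow] = digest;
      digests.push(digest);
    }
    digest.runs += 1;
    if (run.status !== "failed") continue;
    const cls = run.failureClass ?? "unknown";
    digest.failures[cls] += 1;
    const step = run.failedStep ?? "unknown step";
    const key = `${run.workflow}#${step}:${cls}`;
    const count = (seen[key] ?? 0) + 1;
    seen[key] = count;
    // Report a step once, the second time it fails with the same class.
    if (count === 2) digest.recurring.push(`${step} failed repeatedly (${cls})`);
  }
  return digests;
}
```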

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports `@agent-relay/github-primitive` and uses `createGitHubStep` to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

### Contract for the Repaired `github‑primitive` Workflow

The repaired workflow artifact must be a **self‑contained, deterministic, and verifiable** piece of code that satisfies every gate in the **runtime‑precheck** pipeline used by AgentWorkforce. Below is the explicit contract that the auto‑fix persona must deliver.

| **Section** | **Requirement** | **Why it matters** |
|-------------|----------------|--------------------|
| **1. Repository Layout** | - File must live at `workflows/github-pr-create.yaml` (or `.json`/`.ts` depending on the repo’s language convention). <br>- The top‑level `name` field must be `Open PR via GitHub Primitive`. | Guarantees the workflow is discoverable by the CI‑pipeline and matches the naming convention used by downstream tools. |
| **2. Imports** | - Exactly one import statement: <br>```yaml<br>import { createGitHubStep } from "@agent‑relay/github‑primitive"<br>``` <br>- No stray relative imports or duplicate package names. | The precheck validates that the **canonical package** is used; duplicate or ambiguous imports cause the “missing‑module” gate to fail. |
| **3. Step Definition** | - A single step called `open_pr` must be created with `create
...[truncated]

Must

  • Preserve the `@agent-relay/github-primitive` import, the `GitHubStepExecutor` reference, and every `createGitHubStep(...)` invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same `workflow(...)` builder and `.run({ cwd: process.cwd() })` invocation.
  • Retain at least `ceil(N / 2)` of the original workflow's `.step(...)` calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to `prepare-context` / `runtime-precheck: true` / `final-signoff: echo` placeholder; surface the regression diagnostic instead.

Must Not

  • Strip `createGitHubStep`, `GitHubStepExecutor`, or `@agent-relay/github-primitive` because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.
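The retention and preservation rules above can be sketched as a small check. This is a minimal illustration, not the repository's implementation: `checkStepRetention`, `countStepCalls`, and the regex-based counting are hypothetical, and a real guard would more likely walk the AST than match text.

```typescript
// Hypothetical sketch: helper names and regex-based counting are assumptions.
const PRIMITIVE_TOKENS = [
  "@agent-relay/github-primitive",
  "GitHubStepExecutor",
  "createGitHubStep",
];

interface RetentionResult {
  ok: boolean;
  originalSteps: number;
  repairedSteps: number;
  requiredSteps: number;
  missingTokens: string[];
}

// Counts `.step(` call sites textually; comments or strings containing
// `.step(` would be miscounted, which an AST walk would avoid.
function countStepCalls(source: string): number {
  return (source.match(/\.step\s*\(/g) ?? []).length;
}

function checkStepRetention(original: string, repaired: string): RetentionResult {
  const originalSteps = countStepCalls(original);
  const repairedSteps = countStepCalls(repaired);

  // The ceil(N / 2) floor applies only when the original declared four or more steps.
  const requiredSteps = originalSteps >= 4 ? Math.ceil(originalSteps / 2) : 0;

  // Any primitive token present in the original must survive the repair.
  const missingTokens = PRIMITIVE_TOKENS.filter(
    (token) => original.includes(token) && !repaired.includes(token),
  );

  return {
    ok: repairedSteps >= requiredSteps && missingTokens.length === 0,
    originalSteps,
    repairedSteps,
    requiredSteps,
    missingTokens,
  };
}
```

A caller would reject the repair output and surface the regression diagnostic whenever `ok` is false, rather than applying the collapsed workflow.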

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

A dedicated Slack app (Ricky Debug) that lets engineers **inspect, step‑through, and repair** any Ricky‑managed workflow from within Slack. The surface is built as a **modal‑first** experience with typed blocks, deterministic back‑ends, and explicit verification gates so that every action is auditable and repeatable.

---

### 1. High‑Level Architecture

| Component | Responsibility | Tech Stack |
|-----------|----------------|------------|
| **Slack App (Ricky Debug)** | Slash command, interactive modals, message actions | Bolt JS (Node 18), Slack Block Kit |
| **Ricky Core API** | Workflow metadata, state snapshots, execution control | Fastify TS, OpenAPI v3, PostgreSQL |
| **Verification Service** | Deterministic gate checks, policy enforcement | Temporal workers, JSON‑Schema validation |
| **Audit Log** | Immutable record of every debug action | CloudWatch Logs + S3 immutable bucket |
| **CI/CD Gate** | PR‑level contract testing for new Slack surface | GitHub Actions, Pact contract tests, 80‑100 validation loops |

All calls from Slack to Ricky Core are **signed with a per‑workspace JWT** (issued by the Slack app’s OAuth flow) and v
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.

@agent-relay-code


pr-reviewer applied fixes — committed and pushed 3081a72 to this PR. The notes below describe what changed.

Reviewed PR #148 and fixed one real issue: the new guidance required action inputs under params, but fixtures still treated top-level branch as valid. The actual GitHub primitive factory drops unknown top-level fields, so I tightened the AST guard to flag those, plus invalid static repo / params shapes, and aligned the writer task + skill guidance.

Local validation run:

  • npm run typecheck
  • npm test (1403 passed, 1 skipped)
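
As a rough illustration of the shape rules described above, the sketch below validates an already-extracted config object rather than an AST. The function name, the allowed-key set, and the action names are hypothetical; the PR only states that `name` and `action` are required, that action inputs belong under `params`, and that stray top-level fields such as `command` or `branch` are flagged. The expected static `repo` shape is not spelled out here, so the sketch accepts the key without checking it.

```typescript
// Hypothetical sketch: key and action sets are assumptions, not the SDK's real lists.
const ALLOWED_TOP_LEVEL_KEYS = new Set(["name", "action", "repo", "params"]);
const SUPPORTED_ACTIONS = new Set(["createBranch", "createPullRequest"]); // placeholder names

interface StepIssue {
  code: "INVALID_GITHUB_STEP";
  message: string;
}

function validateGitHubStepConfig(config: Record<string, unknown>): StepIssue[] {
  const issues: StepIssue[] = [];
  const flag = (message: string): void => {
    issues.push({ code: "INVALID_GITHUB_STEP", message });
  };

  // `name` and `action` are both required.
  const name = config["name"];
  if (typeof name !== "string" || name.length === 0) {
    flag("missing or non-string `name`");
  }

  const action = config["action"];
  if (typeof action !== "string") {
    flag("missing or non-string `action`");
  } else if (!SUPPORTED_ACTIONS.has(action)) {
    flag(`unsupported action: ${action}`);
  }

  // Unknown top-level fields (for example `command` or `branch`) would be
  // silently dropped by the factory, so they are reported instead.
  for (const key of Object.keys(config)) {
    if (!ALLOWED_TOP_LEVEL_KEYS.has(key)) {
      flag(`unexpected top-level field: ${key}`);
    }
  }

  // Action inputs must live under `params`, which must be a plain object when present.
  const params = config["params"];
  if (params !== undefined && (typeof params !== "object" || params === null || Array.isArray(params))) {
    flag("`params` must be an object");
  }

  return issues;
}
```

For example, a config of `{ action: "createPullRequest", command: "git push", branch: "x" }` would yield three issues under these assumptions: the missing `name` plus the two stray fields.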

@khaliqgant khaliqgant merged commit a40bac4 into main Jun 1, 2026
4 checks passed
@khaliqgant khaliqgant deleted the codex/fix-issue-144 branch June 1, 2026 21:13
Development

Successfully merging this pull request may close these issues.

Malformed createGitHubStep (missing name/action, stray command) breaks every PR-shipping generated workflow
