Skip to content

fix(spec-intake): extract target file paths from backticked prose and ## Target Files blocks #87

Closed
kjgbot wants to merge 2 commits into main from fix/spec-parser-target-files-extraction

@kjgbot kjgbot commented May 9, 2026


Summary

  • The spec parser's PATH_PATTERN regex required a leading whitespace boundary, so paths inside markdown backticks (the natural convention) never matched. A spec like "Update `packages/web/app/api/v1/workflows/run/route.ts` ..." extracted **zero** target files. Lower-precision tokens like `base/head` from prose ("PR number, base/head SHA") DID match and ended up as the only `targetFiles` entry, pushing the renderer into the manifest-driven "single artifact under `workflows/generated/`" branch even when the spec named real source files.
  • Three coordinated parser changes plus surfacing target_files in the --json generation output so it's verifiable from the CLI.
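The boundary problem can be reproduced with a minimal sketch. Both patterns below are hypothetical approximations (the actual `PATH_PATTERN` source is not shown on this page); they share one path body and differ only in which characters may precede it.

```typescript
// Hypothetical approximations of the old and new patterns. The real
// PATH_PATTERN in src/product/spec-intake/parser.ts is not shown here.
// Path body: one or more "segment/" groups followed by a final segment.
const OLD_PATH_PATTERN = /(?:^|\s)((?:[\w.-]+\/)+[\w.-]+)/g;
// Same body, but the boundary may also be a backtick, quote, paren, or "<".
const NEW_PATH_PATTERN = /(?:^|[\s`'"(<])((?:[\w.-]+\/)+[\w.-]+)/g;

function matchAllPaths(pattern: RegExp, text: string): string[] {
  // Fresh copy so a shared global regex never carries lastIndex between calls.
  const re = new RegExp(pattern.source, pattern.flags);
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (m[1]) found.push(m[1]);
  }
  return found;
}

const specLine =
  "- Update `packages/web/app/api/v1/workflows/run/route.ts` to accept a PR number, base/head SHA";

console.log(matchAllPaths(OLD_PATH_PATTERN, specLine));
console.log(matchAllPaths(NEW_PATH_PATTERN, specLine));
```

Under these assumed patterns the old one finds only `base/head` on that line, while the new one also finds the backticked `route.ts` path. `base/head` still matches with the broader boundary, which is why a separate plausibility filter is needed.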

What changed

  1. Extend PATH_PATTERN to accept paths preceded by backticks, single quotes, double quotes, parens, or angle brackets — typical markdown wrappers — without losing the whitespace boundary.
  2. Add looksLikeRealPath to suppress two-segment prose noise with no extension and no recognized leading directory prefix (e.g. base/head, my-org/my-repo). Paths with an extension, 3+ segments, or a known prefix (packages/, src/, tests/, workflows/, .github/, etc.) still pass. Existing workflows/wave2-product regression case remains green.
  3. Add extractTargetFilesBlock to recognize a literal ## Target Files markdown section. When present, its bullet list takes precedence over prose extraction so spec authors can be explicit about scope. Bullets, backticks, and quotes are stripped.
  4. Surface target_files in the generation JSON output (LocalGenerationStageResult.artifact.target_files) so callers and deterministic evals can verify the extracted scope without parsing the rendered workflow body.
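The heuristic in item 2 can be sketched as a pure predicate. This assumes "extension" means a dot inside the last segment and uses only the five prefixes the description names; the real `RECOGNIZED_PATH_PREFIXES` list ends in "etc." and is longer.

```typescript
// Hypothetical subset of RECOGNIZED_PATH_PREFIXES. The PR names packages/,
// src/, tests/, workflows/, and .github/ followed by "etc.".
const RECOGNIZED_PATH_PREFIXES: readonly string[] = [
  "packages/",
  "src/",
  "tests/",
  "workflows/",
  ".github/",
];

function looksLikeRealPath(candidate: string): boolean {
  const segments = candidate.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1] ?? "";
  // A dot after the first character of the last segment counts as an
  // extension, so "route.ts" qualifies but a bare ".env" does not.
  const hasExtension = last.lastIndexOf(".") > 0;
  const hasThreeOrMoreSegments = segments.length >= 3;
  const hasKnownPrefix = RECOGNIZED_PATH_PREFIXES.some((prefix) =>
    candidate.startsWith(prefix),
  );
  return hasExtension || hasThreeOrMoreSegments || hasKnownPrefix;
}

for (const c of ["base/head", "my-org/my-repo", "workflows/wave2-product", "docs/README.md"]) {
  console.log(c, looksLikeRealPath(c));
}
```

With these assumptions `base/head` and `my-org/my-repo` are rejected, while `workflows/wave2-product` passes on its prefix alone, consistent with the regression case named in item 2.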

Repro / before-and-after

End-to-end against cloud@codex/msd-shared-sandbox-review-runtime-spec's docs/runtimes/specs/msd-shared-sandbox-review-runtime.md:

Before: targetFiles: ["base/head"] (a single noise match, so the renderer took the manifest path for this spec)

After:

"target_files": [
  "packages/web/app/api/v1/workflows/run/route.ts",
  "packages/core/src/bootstrap/launcher.ts",
  "packages/core/src/bootstrap/script-generator.ts",
  ".agent-workforce/msd-review/input.json"
]

— exactly the files the spec names for editing.
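A deterministic eval can check this output without parsing the rendered workflow. The sketch below assumes only that the `--json` output contains a `target_files` string array somewhere. The `artifact` nesting in the trimmed sample mirrors `LocalGenerationStageResult.artifact.target_files`, but the surrounding shape is hypothetical, so `findTargetFiles` (a hypothetical helper) searches recursively instead of hard-coding a path.

```typescript
// Walk parsed JSON and return the first string array stored under a
// "target_files" key, wherever it is nested.
function findTargetFiles(value: unknown): string[] | undefined {
  if (Array.isArray(value)) {
    for (const item of value) {
      const found = findTargetFiles(item);
      if (found) return found;
    }
    return undefined;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const direct = record["target_files"];
    if (Array.isArray(direct) && direct.every((f) => typeof f === "string")) {
      return direct as string[];
    }
    for (const key of Object.keys(record)) {
      const found = findTargetFiles(record[key]);
      if (found) return found;
    }
  }
  return undefined;
}

// Order-insensitive comparison: the eval cares about scope, not ordering.
function diffTargetFiles(actual: readonly string[], expected: readonly string[]) {
  return {
    missing: expected.filter((f) => actual.indexOf(f) === -1),
    unexpected: actual.filter((f) => expected.indexOf(f) === -1),
  };
}

// Trimmed sample input, not real CLI output.
const sampleCliJson = `{
  "artifact": {
    "target_files": [
      "packages/web/app/api/v1/workflows/run/route.ts",
      "packages/core/src/bootstrap/launcher.ts"
    ]
  }
}`;

const expectedFiles = [
  "packages/core/src/bootstrap/launcher.ts",
  "packages/web/app/api/v1/workflows/run/route.ts",
];

const actualFiles = findTargetFiles(JSON.parse(sampleCliJson)) ?? [];
const report = diffTargetFiles(actualFiles, expectedFiles);
console.log(report);
```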

Test plan

  • npx vitest run src/product/spec-intake/parser.test.ts — 35 tests pass (29 existing + 6 new covering backticked paths, paren/quote-wrapped paths, prose-noise suppression, structured-block precedence, prose fallback, and 3+ segment paths)
  • npx vitest run — 1034/1034 green across 49 files
  • npm run typecheck — clean
  • npm run evals — 8/8 deterministic pass (5 existing + 3 new: target-files-from-backticked-prose, target-files-from-structured-block, target-files-suppresses-prose-noise); 42 needs-human (LLM cases, unchanged); 0 failed
  • Manual end-to-end against the MSD spec — target_files correctly populated in the JSON output

Out of scope

  • Changing the renderer's behavior when targetFiles is empty (the manifest path still applies for specs that don't name files).
  • Inferring target files from prose that doesn't have backticks/quotes — the regex already handles that case via the original whitespace boundary.
  • Cloud-mode-specific generation changes. The fix lives in the shared spec-intake parser, so it already benefits both local and cloud paths.

… `## Target Files` blocks

The spec parser's PATH_PATTERN regex required a path to be preceded by
whitespace or start-of-line, so paths inside markdown backticks (the
natural convention) never matched. A spec like

  - Update `packages/web/app/api/v1/workflows/run/route.ts` to accept ...

extracted exactly zero target files. Worse, lower-precision tokens like
`base/head` from prose (`PR number, base/head SHA`) DID match and ended
up as the sole `targetFiles` entry. With `targetFiles` effectively
empty, the renderer fell into the manifest-driven "single artifact under
workflows/generated/" branch — even when the spec named multiple
real source files for editing.

Three coordinated changes:

1. Extend PATH_PATTERN to accept paths preceded by backticks, single
   quotes, double quotes, parens, or angle brackets — the typical
   markdown wrappers — without losing the existing whitespace boundary.

2. Add `looksLikeRealPath` to suppress two-segment prose noise that has
   no extension and no recognized leading directory prefix
   (e.g. `base/head`, `my-org/my-repo`). Paths with an extension, three
   or more segments, or a known prefix (`packages/`, `src/`, `tests/`,
   `workflows/`, `.github/`, etc.) still pass. The existing
   `workflows/wave2-product` regression case remains green.

3. Add `extractTargetFilesBlock` to recognize a literal `## Target
   Files` markdown section. When present, its bullet list takes
   precedence over prose extraction so spec authors can be explicit
   about the source-change boundary. Bullets, surrounding backticks,
   and quotes are stripped.
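A minimal sketch of item 3, assuming the section ends at the next markdown heading and that only `-`, `*`, or `+` bullets count. The real `extractTargetFilesBlock` may differ; in particular, returning `null` for a missing heading is an assumption here, chosen so a present-but-empty block stays distinguishable from no block at all.

```typescript
// Hypothetical sketch: returns null when no "## Target Files" heading exists,
// and an array (possibly empty) when it does.
function extractTargetFilesBlock(markdown: string): string[] | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Target Files$/.test(line.trim()));
  if (start === -1) return null;

  const files: string[] = [];
  for (const raw of lines.slice(start + 1)) {
    const line = raw.trim();
    // Any later markdown heading closes the section.
    if (/^#{1,6}\s/.test(line)) break;
    // Accept "-", "*", or "+" bullets; skip blank lines and other prose.
    const bullet = /^[-*+]\s+(.+)$/.exec(line);
    if (!bullet) continue;
    // Strip wrapping backticks and quotes: `src/a.ts`, "src/a.ts", 'src/a.ts'.
    const path = (bullet[1] ?? "").trim().replace(/^[`'"]+|[`'"]+$/g, "").trim();
    if (path.length > 0) files.push(path);
  }
  return files;
}

const spec = [
  "# Review runtime",
  "Accept a PR number, base/head SHA.",
  "",
  "## Target Files",
  "- `packages/core/src/bootstrap/launcher.ts`",
  '* "src/local/entrypoint.ts"',
  "",
  "## Notes",
  "- `not/a-target`",
].join("\n");

console.log(extractTargetFilesBlock(spec));
```

Here the bullet under `## Notes` is ignored because the section closes at that heading, and the `base/head` prose is never consulted.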

Also surface `target_files` in the local generation `--json` output so
callers (and deterministic evals) can verify the extracted scope without
parsing the rendered workflow body.
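The review walkthrough on this PR describes the artifact as including `target_files` only when the list is non-empty. That reduces to a small conditional; `GenerationArtifact`, `workflow_path`, and `buildArtifact` below are hypothetical names for illustration.

```typescript
// Hypothetical, trimmed-down artifact shape: the real
// LocalGenerationStageResult.artifact in src/local/entrypoint.ts has other
// fields that are not shown in this PR page.
interface GenerationArtifact {
  workflow_path: string;
  target_files?: string[];
}

function buildArtifact(
  workflowPath: string,
  targetFiles: readonly string[],
): GenerationArtifact {
  const artifact: GenerationArtifact = { workflow_path: workflowPath };
  // Omit the key entirely when nothing was extracted, so JSON consumers see
  // "no explicit scope" rather than an empty list.
  if (targetFiles.length > 0) {
    artifact.target_files = targetFiles.slice();
  }
  return artifact;
}

console.log(JSON.stringify(buildArtifact("workflows/generated/example.ts", [])));
console.log(
  JSON.stringify(
    buildArtifact("workflows/generated/example.ts", ["src/product/spec-intake/parser.ts"]),
  ),
);
```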

Verification:
- `src/product/spec-intake/parser.test.ts` adds six cases covering
  backticked paths, paren/quote-wrapped paths, prose-noise suppression,
  structured-block precedence, prose fallback, and 3+ segment paths.
- `evals/suites/generation-quality/cases.md` adds three deterministic
  evals (`target-files-from-backticked-prose`, `-from-structured-block`,
  `-suppresses-prose-noise`) that drive the real CLI and assert
  `target_files` in the JSON output.
- Full vitest suite: 1034/1034 green; full eval suite: 8/8 deterministic
  pass, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chatgpt-codex-connector


To use Codex here, create a Codex account and connect to github.

coderabbitai Bot commented May 9, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 996848ad-65e5-491d-b65e-a1a761070463

📥 Commits

Reviewing files that changed from the base of the PR and between e7efeb1 and 2108ef0.

📒 Files selected for processing (2)
  • src/product/spec-intake/parser.test.ts
  • src/product/spec-intake/parser.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/product/spec-intake/parser.test.ts
  • src/product/spec-intake/parser.ts

📝 Walkthrough

Walkthrough

Parser now extracts repository file paths from markdown prose or a prioritized ## Target Files block, filters false positives via extension/slash/prefix heuristics, and includes non-empty target_files in local generation-stage artifacts; tests and CLI regression cases cover the behaviors.

Changes

Target Files Extraction Feature

| Layer | File(s) | Summary |
|-------|---------|---------|
| Parser Core Logic | src/product/spec-intake/parser.ts | PATH_PATTERN regex broadened to match paths after delimiters; RECOGNIZED_PATH_PREFIXES added; looksLikeRealPath helper classifies candidates by extension/slash-count/prefix; extractTargetFiles prioritizes the ## Target Files section and falls back to prose regex scanning with cleanup and dedupe; extractTargetFilesBlock implemented. |
| Parser Unit Tests | src/product/spec-intake/parser.test.ts | New targetFiles extraction test suite validates extraction from backticked/parenthetical/quoted paths, noise suppression, structured block priority, prose fallback, empty-block handling, and preservation of deeply nested extensionless paths. |
| Generation Artifact Contract | src/local/entrypoint.ts | LocalGenerationStageResult.artifact extended with optional target_files?: string[]; createGenerationStage signature updated with a targetFiles parameter; both success and failure paths now pass normalizedSpec.targetFiles; the artifact includes target_files only when it is non-empty. |
| Regression Test Cases | evals/suites/generation-quality/cases.jsonl, evals/suites/generation-quality/cases.md | Three new CLI regression cases: (1) backticked path extraction into target_files; (2) structured ## Target Files block takes precedence over prose; (3) two-segment prose tokens without extensions or recognized prefixes are suppressed. |
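The "cleanup and dedupe" step in the parser row is not spelled out further. A plausible sketch, assuming cleanup means trimming trailing sentence punctuation and closing wrappers, and dedupe keeps the first occurrence so spec order survives; `cleanupAndDedupe` is a hypothetical name.

```typescript
// Hypothetical cleanup-and-dedupe pass for raw prose matches. The exact
// cleanup rules in parser.ts are not described beyond "cleanup and dedupe".
function cleanupAndDedupe(candidates: readonly string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const raw of candidates) {
    // Trim sentence punctuation and closing wrappers a broad pattern can
    // capture, e.g. "src/a.ts." at the end of a sentence or "src/a.ts),".
    const cleaned = raw.trim().replace(/[.,;:!?)>\]`'"]+$/, "");
    if (cleaned.length === 0 || seen.has(cleaned)) continue;
    seen.add(cleaned);
    result.push(cleaned); // first occurrence wins, preserving spec order
  }
  return result;
}

console.log(cleanupAndDedupe(["src/a.ts.", "src/a.ts", "packages/b/c.ts),"]));
```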

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A rabbit reads the spec by light,
Sniffs backticks, sections, day and night,
It keeps the paths that truly show,
Drops the noise and lets truth grow,
Target files gathered—hop, delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main change: extracting target file paths from backticked prose and structured ## Target Files blocks in the spec parser. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, detailing the problem, solution, test results, and verification approach. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/spec-intake/parser.ts`:
- Around line 879-882: The extractTargetFiles function currently falls back to
prose extraction when extractTargetFilesBlock(text) returns an empty array,
which incorrectly ignores a present-but-empty "## Target Files" block; change
the logic so that extractTargetFilesBlock distinguishes "block not present" from
"block present but empty" (e.g., return null or an object with a flag) and make
extractTargetFiles return the empty array when the block is present but empty
instead of running prose extraction; update or add a regression test for
extractTargetFiles that supplies a text with a present-but-empty "## Target
Files" section to assert it returns an empty array.
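One way to address this finding is the "object with a flag" option CodeRabbit mentions, sketched below with hypothetical names (`TargetFilesBlock`, `resolveTargetFiles`). Passing prose extraction as a callback also means the prose scan is skipped entirely whenever a block exists.

```typescript
// Hypothetical three-state model: "absent" and "present but empty" are
// distinct values instead of both being [].
type TargetFilesBlock =
  | { kind: "absent" }
  | { kind: "present"; files: string[] };

function resolveTargetFiles(
  block: TargetFilesBlock,
  extractFromProse: () => string[],
): string[] {
  if (block.kind === "present") {
    // An explicit block is authoritative even when it lists no files.
    return block.files;
  }
  // Only scan prose when the author did not write a block at all.
  return extractFromProse();
}

const proseFallback = () => ["base/head"]; // stand-in for the prose scanner

console.log(resolveTargetFiles({ kind: "absent" }, proseFallback));
console.log(resolveTargetFiles({ kind: "present", files: [] }, proseFallback));
```

The regression test CodeRabbit asks for corresponds to the second call: a present-but-empty block must yield `[]`, not the prose fallback.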

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: ac31f4b5-bf82-4191-b2f9-a88ad2b68dfd

📥 Commits

Reviewing files that changed from the base of the PR and between 98f3c30 and e7efeb1.

📒 Files selected for processing (5)
  • evals/suites/generation-quality/cases.jsonl
  • evals/suites/generation-quality/cases.md
  • src/local/entrypoint.ts
  • src/product/spec-intake/parser.test.ts
  • src/product/spec-intake/parser.ts


@devin-ai-integration devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.


github-actions Bot commented May 9, 2026


Ricky Eval Review

Run: .ricky/evals/runs/2026-05-09T21-03-52-664Z-HEAD-provider
Mode: provider
Git SHA: 11c7c92

Passed: 8 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Ricky – Updated Documentation & Code to Reflect Deeper Integration with Agent Assistant**

Below is a complete, deterministic change package that brings the Ricky repository in line with the new product spec: *“Ricky now leverages the Agent Assistant (AA) for all internal reasoning, plan generation, and verification steps.”*  

The package follows **AGENTS.md** conventions, the **workflow standards**, and the **shared authoring rules** (branch naming, PR template, review gates, validation loops).

---

## 1. Scope & Branching

| Item | Value |
|------|-------|
| **Feature ID** | `RICKY-AA-2024` |
| **Branch** | `feat/ricky/aa‑deep‑integration` |
| **PR Title** | `feat(ricky): embed Agent Assistant throughout workflow` |
| **Owner** | `ricky-team` |
| **Reviewers** | `@agent‑assistant‑lead`, `@doc‑maintainer`, `@ci‑engineer` |
| **Target Milestone** | `v2.3.0` |

---

## 2. High‑Level Contract (YAML)

```yaml
# contracts/ricky/aa_deep_integration.yaml
name: Ricky‑AA‑Deep‑Integration
description: |
  Guarantees that every Ricky workflow step (plan, execute, verify, report)
  delegates reasoning to the Agent Assistant service via the shared SDK.
inputs:
  - name: workflow_context
  
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*Prepared 2026‑05‑09 – 1 page summary (≈ 350 words)*  

---

## 1. Scope of Review
| Dimension | What was examined | Data sources |
|-----------|-------------------|--------------|
| **Repository coverage** | All active `@agent‑assistant` repos (core, extensions, demo) | GitHub API (last 90 days) |
| **CI/CD integration** | Presence of `turn‑context` gate in pipelines | GitHub Actions, Jenkins configs |
| **Runtime usage** | Calls to `turnContext.get()` / `set()` in production code | Static analysis (AST) + runtime telemetry (OpenTelemetry) |
| **Developer awareness** | Survey responses, PR comments, internal docs | 23‑person dev‑team survey, Slack search |
| **Compliance** | Enforcement of the “deterministic verification gate” for every turn | Policy‑as‑code (OPA) audit logs |

---

## 2. Quantitative Findings

| Metric | Target (Ricky spec) | Current | Δ | Status |
|--------|--------------------|---------|---|--------|
| **Repos with `turn‑context` import** | 100 % | 27 / 30 (90 %) | –3 % | ⚠️ |
| **CI gate enabled** | 100 % | 24 / 30 (80 %) | –20 % | ❗ |
| **Turn‑context calls per 1 k LOC** | ≥ 5 | 3.2 | –1.8 | ⚠️ |
| **D
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md & workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|--------------------------------------|
| **Workflow Generation** | Agent Assistant already owns *authoring* primitives (template expansion, contract stitching). Centralizing generation avoids duplicated logic in downstream agents. | Guarantees a single source‑of‑truth for workflow shape; downstream agents can rely on a deterministic contract output. |
| **Workflow Validation** | Validation gates (schema checks, deterministic verification, 80‑100 loop contracts) are part of the *authoring* lifecycle. Embedding them in Agent Assistant ensures every generated workflow passes the same gate before any execution agent consumes it. | Early failure detection, reduces downstream “broken pipeline” noise, and provides a clear **review artifact** (validation report). |
| **Debugging** | Debug hooks (step‑b
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target window: 12 weeks (May 15 – Aug 7 2024)*  

---

## 1. Slice Goal
Deliver a production‑ready, self‑service adoption package for the **Ricky** AgentWorkforce assistant that enables **10 pilot teams** to author, test, and deploy reliable workflows with deterministic verification gates and built‑in review artifacts.

Success criteria (gate‑level):

| Metric | Target |
|--------|--------|
| **Workflow contract coverage** (auto‑generated OpenAPI spec) | ≥ 95 % of public actions |
| **Deterministic verification pass** (unit + integration) | 100 % of new PRs |
| **Review artifact completeness** (design doc + test matrix) | 100 % |
| **Validation loops** (end‑to‑end run‑throughs) | 80–100 per workflow |
| **Blocker reporting latency** | ≤ 4 h from detection to ticket |
| **Pilot adoption** (active users) | ≥ 8 of 10 teams using Ricky weekly |

---

## 2. Scope & Boundaries  

| In‑Scope | Out‑of‑Scope |
|----------|--------------|
| • Pre‑packaged **workflow contract templates** (OpenAPI 3.1) for the 15 most‑used AgentWorkforce actions. <br>• Deterministic **verification gate pipeline** (GitHub Actions) – lint → unit → con
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky × Agent Assistant – Future Slack / Web Support Design  

Below is a **contract‑first workflow package** that can be dropped into the existing Ricky repository (see `AGENTS.md`). It defines a **Slack‑and‑Web‑front‑end** for the “Ricky Support Bot” while re‑using the core **Agent Assistant** libraries (routing, verification, logging, and escalation).  

---  

### 1. High‑Level Architecture  

| Component | Responsibility | Tech Stack | Interaction |
|-----------|----------------|------------|-------------|
| **Ricky‑Slack‑Adapter** | Receives Slack events, normalises to `AgentMessage`, forwards to core router. | Node 18, Bolt SDK, TypeScript | Slack ↔ Adapter ↔ Core |
| **Ricky‑Web‑Portal** | SPA for agents & end‑users, supports chat, ticket view, and live‑debug console. | React 18, Vite, Tailwind, WebSocket | Browser ↔ Adapter ↔ Core |
| **Agent‑Assistant Core** | Routing, policy enforcement, deterministic verification gates, escalation. | Shared NPM package `@ricky/assistant-core` | Adapter ↔ Core |
| **Persistence Layer** | Immutable event store + read‑model for tickets. | PostgreSQL + Prisma, Event‑Sourcing pattern | Core ↔ DB |
| **Observability** | Structured logs, me
...[truncated]

Must

  • Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
  • Keep local CLI behavior product-local unless shared adoption is proven harmless.
  • Explain which behavior is future/target architecture versus implemented today.

Must Not

  • Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
  • Let future surface abstractions distort the current CLI contract.
  • Duplicate a mature Agent Assistant capability locally without justification.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.generation-default-not-execution (cli-behavior/manual)

User message

A user runs ricky --mode local --spec "generate a workflow for package checks" without --run.

Ricky output

**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag

Step Action Result
1️⃣ Parse arguments – Ricky reads --mode local and the --spec string. The CLI knows you want a local‑mode workflow generation, but you haven’t asked it to execute anything.
2️⃣ Load the “generate a workflow for package checks” spec – Ricky looks for a matching workflow contract in the repository (e.g., workflows/package-checks.yaml). If the contract exists, it is loaded; if not, Ricky will emit a blocker telling you the spec is undefined.
3️⃣ Render the workflow – Using the contract, Ricky expands the template into a concrete workflow definition (YAML/JSON) that includes:
Inputs (e.g., package list, lint config)
Stages (install, lint, unit‑test, security scan, publish‑dry‑run)
Verification gates (deterministic checks, unit‑test pass thresholds)
Artifacts (logs, test reports, SBOM)
The workflow is generated and printed to STDOUT (or saved to ./generated/package-checks.yaml if --output is set)
...[truncated]

Must

  • Say generation is the default and execution was not requested.
  • Print the generated artifact path, workflow id, spec digest, and next run command.
  • Avoid showing execution evidence for a generation-only request.

Must Not

  • Imply the workflow ran automatically.
  • Present a generation-only result as execution success.
  • Hide the opt-in commands for running the artifact.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

User message

Render Ricky's first-run CLI onboarding for a new user.

Ricky output

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, copy‑and‑paste ready onboarding workflow that a new user should run the first time they invoke the `ricky` CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).

---

## 1. Overview

| Phase | Goal | Deterministic Gate | Success Criteria |
|-------|------|--------------------|------------------|
| **A** | Environment bootstrap | `ricky env check` | All required tools (Node ≥18, Git ≥2.30, Docker ≥20) are present and version‑locked |
| **B** | Repository init | `ricky repo init` | A clean `ricky/` workspace with `main` branch checked out |
| **C** | Credential provisioning | `ricky auth login` | OAuth token stored in `~/.ricky/token.json` and validated against the Auth service |
| **D** | Core agent scaffolding | `ricky agent create <name>` | New agent directory with `README.md`, `AGENTS.md` entry, and CI config |
| **E** | CI/CD pipeline activation | `ricky ci enable` | GitHub Actions workflow file added, CI passes on first commit |
| **F** | 
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky – Handling an Empty or Missing Specification File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the expected path (`/specs/<workflow>.yaml`). | If `fs.exists?` → *continue*; else → **BLOCKER**: “Spec file not found.” |
| **1.2** | If the file exists, read its contents and check for non‑trivial YAML (at least one top‑level key). | If empty or only comments → **BLOCKER**: “Spec file is empty.” |
| **1.3** | Emit a **deterministic verification gate** (`spec‑presence‑gate`) that other agents can poll. | `gate: spec‑presence‑gate → PASS/FAIL` |

*All blockers are logged to the workflow’s **issue tracker** with the tag `#spec‑missing` and automatically assign the “Spec Owner” role.*

---

### 2. Auto‑Generated Placeholder Spec (Scoped Branch)

| Branch | Purpose |
|--------|---------|
| `spec‑placeholder/<workflow>-<timestamp>` | Holds a minimal, syntactically‑valid YAML skeleton that can be safely merged without affecting downstream pipelines. |

**Skeleton template (`<workflow>.yaml`):**
```yaml
# -------------------------------------------------
# Placeholder spec – 
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)

🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A
• Pending Jobs : 0

📦 Repository State
• Branches : master (default)
• Open PRs : 0
• Staged Changes : none

⚠️ Blockers
• Google OAuth token not validated
• GitHub personal‑access token not validated

🗒️ Recommendations

  1. Run ricky provider check google to validate Google connectivity.
  2. Run ricky provider check github to validate GitHub connectivity.
  3. Once both checks succeed, re‑run ricky status to see green indicators.

--- End of Report ---

Must

  • Report unknown or not-connected provider state honestly.
  • Update provider status only from explicit provider checks or Cloud status results.
  • Give concrete setup guidance for Cloud when relevant.

Must Not

  • Mark Google or GitHub connected because guidance text was shown.
  • Invent a provider connection URL or OAuth flow.
  • Show empty fields with no recovery guidance when config is missing.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)

User message

Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

Ricky output

## GitHub Primitive‑Change Workflow  
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible CI, deterministic verification, and traceable hand‑offs.

| Phase | Owner | Artifact | Contract (inputs → outputs) | Verification Gate |
|-------|-------|----------|----------------------------|--------------------|
| **1️⃣ Plan** | **Product Owner** (or contributor) | `change‑ticket.md` | *Inputs*: issue URL, target branch, file path, diff snippet. <br>*Outputs*: `plan.yaml` (branch name, commit message template, CI label set). | Lint `plan.yaml` against `schemas/plan.schema.json`. |
| **2️⃣ Branch** | **Ricky – Branch‑Creator** | `branch‑create.log` | *Inputs*: `plan.yaml`. <br>*Outputs*: new branch `pr/<short‑id>/<slug>`. | Verify branch exists via `git ls-remote`. |
| **3️⃣ Edit** | **Ricky – Editor** | `edit‑patch.diff` | *Inputs*: `plan.yaml`, local checkout. <br>*Outputs*: staged diff matching the supplied snippet. | `git diff --cached` must equal `edit‑patch.diff`. |
| **4️⃣ Commit** | **Ricky – Committer** | `commit‑metadata.json` | *Inputs*: staged diff, commit template. <br>*Outputs*: signed commit (GPG) on branch. | `
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
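
The registry-backed matching these criteria ask for can be sketched as follows. This is a minimal illustration, not Ricky's matcher. The `SkillEntry` shape, the keyword-trigger scoring, and the 0.5 default threshold are all assumptions. It shows ranked matches carrying confidence and trigger evidence, and a graceful fallback (warnings, not errors) when the registry is missing or nothing clears the threshold.

```typescript
// Hypothetical registry entry; Ricky's real skill registry format is not shown in this report.
interface SkillEntry {
  id: string;
  triggers: string[]; // keywords whose presence in the spec suggests the skill applies
}

interface SkillMatch {
  id: string;
  confidence: number; // assumed scoring: fraction of the skill's triggers found in the spec
  evidence: string[]; // triggers that actually matched, kept for artifacts and JSON output
}

// Rank skills loaded from a registry. A missing or empty registry, or no match above
// the threshold, produces warnings instead of failing generation.
function matchSkills(
  spec: string,
  registry: SkillEntry[] | undefined,
  threshold = 0.5,
): { matches: SkillMatch[]; warnings: string[] } {
  if (!registry || registry.length === 0) {
    return { matches: [], warnings: ["skill registry missing or empty; continuing without skill guidance"] };
  }
  const text = spec.toLowerCase();
  const matches: SkillMatch[] = registry
    .map((skill) => {
      const evidence = skill.triggers.filter((t) => text.includes(t.toLowerCase()));
      const confidence = skill.triggers.length > 0 ? evidence.length / skill.triggers.length : 0;
      return { id: skill.id, confidence, evidence };
    })
    .filter((m) => m.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence);
  const warnings = matches.length > 0 ? [] : [`no skill cleared threshold ${threshold}`];
  return { matches, warnings };
}
```

Returning `evidence` alongside `confidence` is what lets the selection be surfaced in artifacts rather than hidden.
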
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Purpose** | Automate a two‑stage pipeline where Claude performs a textual/code review of a design spec, then Codex generates the implementation based on Claude’s approved feedback. |
| **Scope** | • Input: `spec.md` (markdown) <br>• Output: `implementation/` (generated source files) <br>• Runs on the `main` branch only; feature branches must open a PR to trigger. |
| **Contract** | <ul><li>**Input contract** – `spec.md` must contain a top‑level heading `# Specification` and a `## Acceptance Criteria` list.</li><li>**Claude review contract** – Returns a JSON object `{ “status”: “approved”|“rejected”, “comments”: [string] }` and a cleaned‑up spec `spec.reviewed.md`.</li><li>**Codex implementation contract** – Consumes `spec.reviewed.md` and produces a zip `implementation.zip` that passes the unit‑test suite defined in `tests/`.</li></ul> |
| **Trigger** | `push` to any branch matching `feature/*` **or** manual dispatch from the UI. |
| **Jobs** | 1. **`review`** (Claude) <br>2. **`gate‑review`** (deterministic verification) <br>3. 
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
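
The precedence these criteria describe (explicit spec hint, then skill metadata, then project default) fits in a small resolver. The `StepInput` fields and the runner names used below are hypothetical. Only the `tool-selection.json` file name comes from the criteria. Each decision carries its source and reason so the audit file can explain it.

```typescript
// Where a runner choice came from, so the audit file can explain it.
type SelectionSource = "spec-hint" | "skill-default" | "project-default";

// Hypothetical per-step input; field names are illustrative only.
interface StepInput {
  id: string;
  specHint?: string; // runner named explicitly in the spec, e.g. "claude" for review
  skillDefault?: string; // runner suggested by a selected skill's metadata
}

interface ToolSelection {
  stepId: string;
  runner: string;
  source: SelectionSource;
  reason: string;
}

// Precedence: explicit spec hint > skill metadata > project default.
function selectRunner(step: StepInput, projectDefault: string): ToolSelection {
  if (step.specHint) {
    return { stepId: step.id, runner: step.specHint, source: "spec-hint", reason: `spec explicitly assigns ${step.specHint}` };
  }
  if (step.skillDefault) {
    return { stepId: step.id, runner: step.skillDefault, source: "skill-default", reason: "spec is silent; using skill metadata default" };
  }
  return { stepId: step.id, runner: projectDefault, source: "project-default", reason: "no spec hint or skill default; using project default" };
}

// Serialize every decision so it can be written to tool-selection.json.
function toolSelectionJson(steps: StepInput[], projectDefault: string): string {
  return JSON.stringify(steps.map((s) => selectRunner(s, projectDefault)), null, 2);
}
```

Because the spec hint is checked first, a skill default can never override it.
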
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates  

`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`acceptance.gate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.

Below is a concrete contract, the verification gate you should run after refinement, and a minimal example of the command‑line usage.

---

## 1. Contract for `--refine`

| Input | Output | Guarantees |
|-------|--------|------------|
| **Workflow file** (`.json` or `.yaml`) that validates against the base schema (`workflow.schema.json`). | **Refined workflow** (same file type) with updated `task.description` and `acceptance.gate` fields. | 1. All original step IDs, dependencies, and I/O remain unchanged.<br>2. Each `task.description` is rewritten to follow the **“Do‑X‑by‑Y‑using‑Z”** pattern.<br>3. Each `acceptance.gate` becomes a **single, atomic predicate** that can be evaluated by the runtime engine (e.g., `output.fileExists && output.rowCount > 0`).<br>4. No new steps are introduced; no step is removed.
...[truncated]

Must

  • Keep the deterministic unrefined path as the default.
  • Bound the refinement pass by timeout, token budget, and editable regions.
  • Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

Must Not

  • Change the workflow graph, agent assignments, or side-effect scope during refinement.
  • Fail the whole generation if optional refinement times out.
  • Hide warnings when the deterministic artifact is returned unchanged.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
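
A bounded, opt-in refinement pass can be sketched as a wrapper that races the refiner against a timeout. It then rejects any candidate whose step graph or agent assignments differ from the deterministic artifact. This is an illustrative sketch: the `Workflow` shape and the `refine` callback are hypothetical, and it covers only the timeout and structural checks, not a token budget.

```typescript
// Hypothetical workflow shape; only `description` is treated as an editable region.
interface Workflow {
  steps: { id: string; dependsOn: string[]; agent: string; description: string }[];
}

// Fingerprint of everything refinement must not change: step ids, dependencies, agents.
function structureKey(wf: Workflow): string {
  return JSON.stringify(wf.steps.map((s) => [s.id, [...s.dependsOn].sort(), s.agent]));
}

// Run an optional refiner under a timeout; fall back to the original on failure or unsafe edits.
async function refineBounded(
  original: Workflow,
  refine: (wf: Workflow) => Promise<Workflow>,
  timeoutMs: number,
): Promise<{ workflow: Workflow; refined: boolean; warnings: string[] }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`refinement timed out after ${timeoutMs}ms`)), timeoutMs);
  });
  try {
    const candidate = await Promise.race([refine(original), timeout]);
    if (structureKey(candidate) !== structureKey(original)) {
      return { workflow: original, refined: false, warnings: ["refinement changed graph or agents; kept deterministic artifact"] };
    }
    return { workflow: candidate, refined: true, warnings: [] };
  } catch (err) {
    // Provider failure or timeout: keep the deterministic artifact and say so.
    return { workflow: original, refined: false, warnings: [`refinement failed: ${(err as Error).message}`] };
  } finally {
    clearTimeout(timer);
  }
}
```

Every fallback path returns a warning, so an unchanged artifact is never reported silently.
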
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)

User message

Generate a workflow for the ricky --version spec.

Ricky output

## Workflow: `ricky --version` Specification & Release

**Owner:** Ricky (AgentWorkforce)  
**Target Branch:** `main` → `release/vX.Y.Z` (semantic version bump)  
**Scope:** Add/verify the `--version` CLI flag for the `ricky` binary, update documentation, and publish the new version to the artifact registry.

---

### 1️⃣ Contract (Input → Output)

| **Input** | **Pre‑condition** | **Output** | **Post‑condition** |
|-----------|-------------------|------------|--------------------|
| `ricky` source code (repo at `main`) | Build succeeds, tests pass | Binary `ricky` with `--version` flag returning `ricky vX.Y.Z` | Binary is published, docs show the new version, CI badge updated |
| `CHANGELOG.md` entry for `vX.Y.Z` | Entry follows `## [vX.Y.Z] - YYYY‑MM‑DD` format | Updated `CHANGELOG.md` | Changelog entry is present in the release PR |
| `README.md`/`docs/cli.md` | Existing version placeholder `{{VERSION}}` | Rendered docs showing `vX.Y.Z` | Docs build passes |

**Version contract:**  
- Semantic version must be **strictly greater** than the previous tag.  
- The string returned by `ricky --version` must match the regex `^ricky v\d+\.\d+\.\d+$`.

---

### 2️⃣ Deterministic Verifica
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
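
A behavior-grounded version gate judges what the CLI actually prints, not what the bundle contains. Below is a minimal sketch that compares captured `ricky --version` output against an expected version. It assumes the expected version has already been resolved following the spec's package.json resolution order, which is not shown here. The output format it accepts (a token like `1.2.3` or `v1.2.3`) is also an assumption.

```typescript
interface GateResult {
  pass: boolean;
  detail: string;
}

// Judge the captured stdout of `ricky --version`, not the contents of the built bundle.
function versionGate(stdout: string, expected: string): GateResult {
  const tokens = stdout.trim().split(/\s+/);
  // Assumed format: some whitespace-separated token that looks like a semantic version.
  const reported = tokens.find((t) => /^v?\d+\.\d+\.\d+/.test(t));
  if (reported === undefined) {
    return { pass: false, detail: `no version found in output: ${JSON.stringify(stdout)}` };
  }
  const version = reported.replace(/^v/, "");
  return version === expected
    ? { pass: true, detail: `reported ${version}` }
    : { pass: false, detail: `reported ${version}, expected ${expected}` };
}
```

The stdout would come from actually running the built CLI with `--version`, which is the behavior the spec asks about.
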
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow Title:** Batch‑Artifact Validation & Auto‑Fix Loop  
**Scope:** Any number of independent artifacts (e.g., config files, data schemas, code modules) that can be validated, optionally auto‑fixed, and re‑validated until they pass or hit a blocker limit.  

---  

## 1. Contract & Entry Point  

| Element | Description |
|---------|-------------|
| **Input** | `artifacts/` – a directory (or list) of artifact identifiers. Each artifact must expose a **validation script** (`validate.sh`) and an optional **auto‑fix script** (`fix.sh`). |
| **Output** | `reports/` – per‑artifact JSON report (`<id>.report.json`) containing final status, attempts, and any blocker notes. |
| **Success Criteria** | All artifacts report `status: "PASS"` **or** a deterministic blocker is raised. |
| **Failure Criteria** | Any artifact exceeds the maximum retry count **or** a non‑recoverable blocker is emitted. |
| **Versioned Branch** | `feature/batch‑validation‑loop` (branch created from `main`). |
| **PR Target** | `main` – PR must include workflow file (`.github/workflows/batch‑validation.yml`) and updated `README.md` with usage example. |

---  

## 2. Workflow Diagram (YAML‑style)

```yaml
name:
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
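
Choosing a pattern deliberately means deriving it from the work shape and keeping the reason. The sketch below shows that shape. The `WorkShape` fields and the decision rules are illustrative heuristics, not Ricky's selector. Only the pattern names and the rule that validation/fix/rerun loops use `dag` come from the criteria.

```typescript
type Pattern = "dag" | "supervisor" | "pipeline";

// Hypothetical summary of the work; fields and rules below are illustrative heuristics.
interface WorkShape {
  independentUnits: number; // artifacts that can be processed separately
  hasFixRerunLoop: boolean; // validate -> fix -> rerun cycles with step dependencies
  openEnded: boolean; // subtasks not known until a coordinator inspects the work
}

// Return the pattern together with a rationale suitable for artifact metadata.
function choosePattern(shape: WorkShape): { pattern: Pattern; rationale: string } {
  if (shape.hasFixRerunLoop) {
    return { pattern: "dag", rationale: "validation/fix/rerun loop needs explicit step dependencies" };
  }
  if (shape.openEnded) {
    return { pattern: "supervisor", rationale: "subtasks are discovered at runtime and need a coordinator" };
  }
  if (shape.independentUnits > 1) {
    return { pattern: "dag", rationale: `${shape.independentUnits} independent units can fan out in parallel` };
  }
  return { pattern: "pipeline", rationale: "single linear sequence of stages" };
}
```
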
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky authoring skills has on the structure, reliability, and reviewability of a generated workflow. The map is illustrated with a concrete example: a **“Customer‑Onboarding Email Campaign”** workflow that moves a new lead from **Lead Capture → Qualification → Welcome Series → Sales Handoff**.

| **Skill** | **What the skill enforces** | **Resulting artifact in the workflow** | **Why it matters for reliability** |
|----------|----------------------------|----------------------------------------|------------------------------------|
| **Deterministic Contract Definition** | Every step declares **input schema**, **output schema**, and **pre‑conditions** in a JSON‑Schema block. | ```yaml\nsteps:\n  - id: qualify\n    input: ${lead}\n    schema:\n      type: object\n      required: [email, source]\n    output: qualifiedLead\n``` | Guarantees that downstream steps receive exactly what they expect, eliminating “missing‑field” runtime errors. |
| **Verification Gates (Gate‑Ops)** | Inserts **`gate`** nodes that run automated tests
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

Below is a concise “request‑for‑clarification” template you can fill in. Once you provide the missing details, I’ll generate a complete, contract‑driven workflow that includes:

* **Scope & Objectives** – clear success criteria.  
* **Inputs / Outputs** – typed artifacts with versioning.  
* **Stages & Tasks** – deterministic steps, each with a verification gate.  
* **Review & Approval** – required reviewers, PR targets, and blocker handling.  
* **Validation Loop** – 80‑to‑100 automated checks per iteration.  
* **Branch / PR Boundaries** – isolated feature branches, CI‑gated merges.  

---

### 1. Product Spec Clarification

| Item | What I Need from You |
|------|----------------------|
| **High‑level goal** | One‑sentence description of the product feature or service you’re building. |
| **Primary actors** | Users, systems, or services that will interact with the workflow. |
| **Key inputs** | Files, API payloads, config files, etc. (include schema or example). |
| **Desired outputs** | Artifacts, reports, API responses, deployment artifacts, etc. |
| **Success metrics** | E.g., latency < 200 ms, error‑rate < 0.1 %, test coverage ≥ 90
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s immediate response to a failed workflow**

1. **Capture the failure state**  
   - Pull the latest run‑metadata from the orchestration engine (e.g., Airflow, Temporal, or the internal workflow runner).  
   - Record:
     * The step that timed‑out (`step‑X`) – include the timeout threshold, actual duration, and any partial logs.  
     * The worker that stayed alive (`worker‑Y`) – capture its heartbeat timestamps, resource usage, and the fact that it emitted **no artifacts**.  
   - Persist this snapshot in the workflow‑run audit log and tag it with a **“BLOCKER”** label.

2. **Raise a deterministic verification gate**  
   - Insert a **“Post‑Failure Validation Gate”** into the workflow contract:
     ```yaml
     gates:
       - name: post_failure_artifact_check
         type: deterministic
         condition: |
           run.artifacts.exists == true && run.errors.isEmpty == false
         on_fail: block
     ```
   - This gate guarantees that any subsequent retry will not proceed until the missing artifacts are either produced or the failure is explicitly acknowledged.

3. **Create a scoped repair branch**  
   - Branch name: `repair/run-{run_id}-timeout-{timestamp}`. 
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
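
Classifying before retrying can be sketched as a pure function over collected evidence. It maps each signal to a failure class, keeps the raw evidence, and reports `unknown` rather than guessing. The `FailureEvidence` fields are hypothetical. The policy of allowing a retry only when the sole class is a timeout is an illustrative assumption, not a documented Ricky rule.

```typescript
type FailureClass =
  | "timeout"
  | "agent-runtime-opacity"
  | "environment-blocker"
  | "workflow-structure-bug"
  | "validation-strategy-mismatch"
  | "unknown";

// Hypothetical evidence collected from a failed run.
interface FailureEvidence {
  stepTimedOut: boolean;
  workerAliveWithoutArtifacts: boolean;
  missingDependency?: string; // e.g. a binary or credential that was not found
  schemaError?: string; // the workflow definition failed to load or validate
  gateFailedButOutputPresent: boolean;
}

function classifyFailure(e: FailureEvidence): { classes: FailureClass[]; retryAllowed: boolean; evidence: FailureEvidence } {
  const classes: FailureClass[] = [];
  if (e.missingDependency) classes.push("environment-blocker");
  if (e.schemaError) classes.push("workflow-structure-bug");
  if (e.stepTimedOut) classes.push("timeout");
  if (e.workerAliveWithoutArtifacts) classes.push("agent-runtime-opacity");
  if (e.gateFailedButOutputPresent) classes.push("validation-strategy-mismatch");
  // Preserve uncertainty instead of forcing a class.
  if (classes.length === 0) classes.push("unknown");
  // Illustrative policy: only a lone timeout is considered safe to retry as-is.
  const retryAllowed = classes.length === 1 && classes[0] === "timeout";
  return { classes, retryAllowed, evidence: e };
}
```

For the scenario in this case (one timeout plus a silent live worker), the sketch yields two classes and no automatic retry.
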
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Guard for Local AgentWorkforce Runs**  

| **Element** | **Definition** |
|-------------|----------------|
| **Scope** | Local execution of any `.agent/`‑based workflow on a developer workstation. |
| **Inputs** | 1. Working directory root (absolute path).<br>2. Optional `--allow‑stale` flag (bool). |
| **Outputs** | - `guard‑result.json` (machine‑readable pass/fail + diagnostics).<br>- Human‑readable console report. |
| **Success Criteria** | All of the following are **absent**: <br>• Un‑committed changes in `.agent‑relay/`.<br>• Orphaned files in `.relay/` that are not referenced by the current manifest.<br>• Trajectory snapshots in `.trajectories/` older than the latest workflow version. |
| **Failure Mode** | If any stale artifact is detected, the guard **fails** and aborts the launch, returning a deterministic error code `0xE101`. |
| **Deterministic Verification Gates** | 1. **File‑Hash Gate** – Compute SHA‑256 of every file under the three directories and compare against the hash list stored in `.agent‑manifest.json`.<br>2. **Timestamp Gate** – Ensure the newest file timestamp in each directory ≤ `workflow.version.timestamp` (from manifest).
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
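
A read-only pre-launch check for the three state directories named in this case might look like the sketch below. It classifies leftovers as environment contamination, records the observed paths, and recommends quarantine without deleting anything. Treating any existing entry as potentially stale is a simplifying assumption. A real check might also compare timestamps or run markers.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Directories named in this case; any existing entry counts as potential stale state.
const RUNTIME_STATE_DIRS = [".agent-relay", ".relay", ".trajectories"];

interface StaleStateReport {
  classification: "environment-contamination" | "clean";
  observedPaths: string[];
  recommendation: string;
}

// Read-only: reports and recommends, but never deletes or moves state.
function detectStaleState(workspace: string): StaleStateReport {
  const observedPaths = RUNTIME_STATE_DIRS.map((dir) => join(workspace, dir)).filter((p) => existsSync(p));
  if (observedPaths.length === 0) {
    return { classification: "clean", observedPaths, recommendation: "workspace clean; launch may proceed" };
  }
  return {
    classification: "environment-contamination",
    observedPaths,
    recommendation: "quarantine these paths or launch in an isolated workspace; no state was modified",
  };
}
```
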
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky or Relay run is already active in this workspace”**

The marker means a lock file (or a lingering process) is still present from a previous execution. Follow the deterministic repair contract below. Each step is atomic, verifiable, and safe to repeat.

---

## 1️⃣ Verify the current state

| Command | Expected output |
|---------|-----------------|
| `ricky status` | Should show **No active runs**. If it reports a run, note the **Run‑ID**. |
| `ls -a .ricky/locks` | Should list **only** `.` and `..`. Any other file is the lock. |
| `ps -ef \| grep ricky` | No `ricky` or `relay` processes should be running (except the grep itself). |

If any of the above shows a leftover run, proceed to the corresponding removal step.

---

## 2️⃣ Clean up a stale lock file

1. **Locate the lock**  
   ```bash
   LOCK_FILE=$(find .ricky/locks -type f -name "*.lock" | head -n1)
   ```
2. **Confirm it is stale (no matching PID)**  
   ```bash
   if [ -n "$LOCK_FILE" ]; then
       PID=$(cat "$LOCK_FILE")
       if ! kill -0 "$PID" 2>/dev/null; then
           echo "Stale lock for PID $PID"
       else
           echo "Process $PID still alive – aborting cleanup"
           exit
   ```

...[truncated]


**Must**

- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.

**Must Not**

- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
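
A conflict check that refuses to launch and surfaces the existing marker could be sketched like this. The marker path and its JSON shape (`{ "runId": ... }`) are hypothetical, since this report does not show Ricky's real marker format. An unreadable marker still blocks the launch.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical marker contents; the real marker format is not shown in this report.
interface RunMarker {
  runId: string;
}

type LaunchDecision =
  | { action: "launch" }
  | { action: "blocked"; markerPath: string; runId?: string; guidance: string };

function checkActiveRun(markerPath: string): LaunchDecision {
  if (!existsSync(markerPath)) return { action: "launch" };
  let runId: string | undefined;
  try {
    runId = (JSON.parse(readFileSync(markerPath, "utf8")) as RunMarker).runId;
  } catch {
    runId = undefined; // an unreadable marker still blocks the launch
  }
  return {
    action: "blocked",
    markerPath,
    runId,
    guidance: `Another run${runId ? ` (${runId})` : ""} is active per ${markerPath}. Inspect it, wait for it, or clear it explicitly before relaunching.`,
  };
}
```
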

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.auto-fix-bounded-loop</code> (runtime-recovery/manual)</summary>

**User message**

> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

**Ricky output**

```text
**Local Workflow Execution – Auto‑Fix Run**

Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, detect the failure, invoke the auto‑fix routine, and re‑run the repaired artifact. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.

---

## 1️⃣ Preconditions (contract)

| Item | Requirement | Verification |
|------|-------------|--------------|
| `aw` CLI | `>=2.5.0` installed and on `$PATH` | `aw --version` |
| Workflow repo | Clean working tree (no uncommitted changes) | `git status --porcelain` must be empty |
| Failed step | Exact step identifier (e.g., `build:compile`) | Provided by user |
| Previous run ID | UUID or numeric ID (e.g., `run-42`) | Provided by user |
| Auto‑fix flag | Enabled in `aw config` (`auto_fix=true`) | `aw config get auto_fix` → `true` |
| Docker / runtime | Available if workflow uses containers | `docker ps` returns without error |

If any check fails, abort and report the blocker before proceeding.

---

## 2️⃣ Execution Contract

```bash
#!/usr/bin/env bash
set -euo pipefail

# ---- INPUTS (replace placeh
...[truncated]
```

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
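
The bounded repair/resume loop these criteria describe can be sketched under one tracking id. The `run` and `repairArtifact` callbacks are hypothetical stand-ins for the local runner and the workflow-artifact repair. The sketch shows the budget cap, a per-attempt summary, and resuming from the failed step with the previous run id.

```typescript
interface ResumeOptions {
  startFromStep?: string;
  previousRunId?: string;
}

interface AttemptResult {
  ok: boolean;
  runId: string;
  failedStep?: string;
}

interface AttemptRecord extends AttemptResult {
  attempt: number;
  resumedFrom?: string;
}

// Bounded repair/resume loop; every attempt is recorded under the same tracking id.
function runWithAutoFix(
  trackingRunId: string,
  maxAttempts: number,
  run: (opts: ResumeOptions) => AttemptResult,
  repairArtifact: (failedStep: string) => boolean,
): { trackingRunId: string; ok: boolean; attempts: AttemptRecord[] } {
  const attempts: AttemptRecord[] = [];
  let resume: ResumeOptions = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = run(resume);
    attempts.push({ ...result, attempt, resumedFrom: resume.startFromStep });
    if (result.ok) return { trackingRunId, ok: true, attempts };
    // Stop when the budget is spent, the failed step is unknown, or the artifact cannot be repaired.
    if (attempt === maxAttempts || !result.failedStep || !repairArtifact(result.failedStep)) break;
    resume = { startFromStep: result.failedStep, previousRunId: result.runId };
  }
  return { trackingRunId, ok: false, attempts };
}
```
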
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist**

1. **Collect the failure report**  
   ```bash
   ricky run workflows/foo.ts --no-auto-fix --debug > foo.debug.log 2>&1
   ```
   The --debug flag prints the full execution trace, validation‑gate results, and any exception stack traces to foo.debug.log.

2. **Locate the primary error block**  
   Open the log and look for the first line that starts with ✖︎ (failed gate) or ⚠︎ (warning). Note the gate name, expected contract, and actual value.

3. **Validate the workflow contract**  
   Every workflow must satisfy the Typed‑Contract Specification (TCS) defined in specs/contracts.md. Run the contract validator directly:
   ```bash
   ricky contract validate workflows/foo.ts
   ```
   - If the validator returns 0, the contract is sound; the failure is downstream.
   - If it returns non‑zero, the log will list the exact contract mismatches (e.g., missing required output, type‑incompatible parameter, or forbidden side‑effect).

4. **Run the failing gate in isolation**  
   Most gates are modular and can be invoked alone:
   ```bash
   ricky gate run <gate-id> --workflow workflows/foo.ts
   ```
   Replace `
   ...[truncated]


**Must**

- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.

**Must Not**

- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>

**User message**

> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

**Ricky output**

```text
**Ricky’s “Run a Local TypeScript Workflow” Playbook**  
*(primary local path = the root of the checked‑out workflow repository)*  

| Phase | Action | Command / Artifact | Rationale |
|-------|--------|--------------------|-----------|
| **1️⃣ Prepare the environment** | • Verify Node ≥ 18 is on the PATH.<br>• Load the repo‑wide `.env.local` (if present) so that any secret‑injection contracts are satisfied. | ```bash\n# sanity check\nnode -v   # >=18\nnpm -v\nsource .env.local  # optional\n``` | Guarantees deterministic runtime and that required env vars are present before the workflow starts. |
| **2️⃣ Resolve dependencies** | • Install exact versions declared in `package-lock.json` (or `pnpm-lock.yaml` / `yarn.lock`).<br>• Run the *dependency‑integrity* gate to catch mismatches. | ```bash\nnpm ci   # or pnpm install --frozen-lockfile\n# integrity gate (Ricky‑provided script)\n./scripts/verify‑deps.sh\n``` | `npm ci` guarantees a reproducible node_modules tree; the verification script is a required deterministic gate per the workflow standards. |
| **3️⃣ Compile / transpile** | • Use the project’s build script (usually `npm run build`).<br>• The build must emit a **single entry f
...[truncated]
```

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
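
A spawn plan that prefers Node's type-stripping route and records its own command could be sketched as follows. It is an assumption that Ricky builds its command this way. The `SpawnPlan` shape is hypothetical, and `--experimental-strip-types` first shipped in Node 22.6.0; its status differs across later Node releases, so check the release notes for your version. The SDK precheck resolves `@agent-relay/sdk` from the project root instead of looking for an `agent-relay` binary on PATH.

```typescript
import { createRequire } from "node:module";
import { resolve } from "node:path";

// Hypothetical plan shape; `evidenceCommand` is what would be written to execution evidence.
interface SpawnPlan {
  command: string;
  args: string[];
  evidenceCommand: string;
  prechecks: { nodeVersion: string; nodeSupportsStripTypes: boolean; sdkResolvable: boolean };
}

function planLocalSpawn(workflowPath: string, projectRoot: string): SpawnPlan {
  const [major, minor] = process.versions.node.split(".").map(Number);
  // --experimental-strip-types was introduced in Node 22.6.0.
  const nodeSupportsStripTypes = major > 22 || (major === 22 && minor >= 6);
  // Resolve the SDK relative to the project, not via a binary on PATH.
  const requireFromProject = createRequire(resolve(projectRoot, "package.json"));
  let sdkResolvable = true;
  try {
    requireFromProject.resolve("@agent-relay/sdk");
  } catch {
    sdkResolvable = false;
  }
  const command = process.execPath;
  const args = ["--experimental-strip-types", workflowPath];
  return {
    command,
    args,
    evidenceCommand: [command, ...args].join(" "),
    prechecks: { nodeVersion: process.versions.node, nodeSupportsStripTypes, sdkResolvable },
  };
}
```

Keeping `evidenceCommand` separate from any user-facing reproduction command avoids conflating the two.
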
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**  

---

### 1. Scope & Trigger
| Element | Detail |
|---------|--------|
| **Workflow ID** | `WF‑SF‑ESC‑001` |
| **Owner** | AgentWorkforce Reliability Team |
| **Trigger** | A structural failure (`SF‑ERR‑*`) remains after the **first‑fix attempt** (see `WF‑SF‑FIX‑001`). |
| **Boundary Condition** | Failure persists **after** the **deterministic verification gate** (see § 3) and **no further automated remediation** is possible. |

---

### 2. Preconditions
1. **Initial Fix Attempt** has been executed and logged in the **Fix‑Log** (`fix_log.yaml`) with status `applied`.
2. **Verification Gate** (see § 3) has been run and returned **FAIL**.
3. All **required artifacts** are present in the branch:
   - `src/` – source code snapshot at failure point.  
   - `tests/` – failing test suite (`pytest -q`).  
   - `docs/` – updated design diagram (if applicable).  
   - `metadata.yaml` – workflow metadata (owner, SLA, tags).  

If any artifact is missing, **blocker** the workflow and raise **Ticket #RICK‑B001**.

---

### 3. Deterministic Verification Gate (Gate A)
| Step | Command | Expected Outcome |
|------|------
...[truncated]

Must

  • Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
  • Distinguish escalation from a generic product failure.
  • Preserve enough context for a human operator to continue.

Must Not

  • Retry speculative fixes indefinitely.
  • Collapse to "something went wrong" without the attempted actions.
  • Discard evidence from failed repair attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
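
An escalation that stays distinct from a generic failure can be sketched as a typed report. The report refuses to exist without recorded fix attempts and renders them for a human operator. The `Escalation` and `FixAttempt` shapes are hypothetical.

```typescript
// Hypothetical record of one repair attempt and its validation outcome.
interface FixAttempt {
  description: string;
  validation: "pass" | "fail";
}

interface Escalation {
  kind: "escalation"; // distinct from a generic "failure" result
  blockerClass: string;
  attempts: FixAttempt[];
  recommendation: string;
}

function escalate(blockerClass: string, attempts: FixAttempt[], recommendation: string): Escalation {
  if (attempts.length === 0) {
    throw new Error("escalation requires at least one recorded fix attempt");
  }
  return { kind: "escalation", blockerClass, attempts: [...attempts], recommendation };
}

// Render enough context for an operator to continue from where Ricky stopped.
function renderEscalation(e: Escalation): string {
  return [
    `Escalated: ${e.blockerClass}`,
    ...e.attempts.map((a, i) => `Attempt ${i + 1}: ${a.description} -> validation ${a.validation}`),
    `Recommendation: ${e.recommendation}`,
  ].join("\n");
}
```
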
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)

User message

Produce a workflow health digest from many Ricky workflow runs.

Ricky output

**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated 2026‑05‑09)*  

---

## 1️⃣ Executive Summary
| Metric | Value (last 30 days) | Trend |
|--------|----------------------|-------|
| Total workflow executions | **1 842** | ↗ +12 % |
| Successful completions | **1 617** | ↗ +9 % |
| Failed executions | **225** | ↘ ‑4 % |
| Avg. run time (successful) | **3 m 27 s** | → steady |
| Avg. time to first failure detection | **1 m 12 s** | ↘ ‑15 % |
| Deterministic verification gate pass rate | **96.3 %** | ↗ +1.2 % |
| Blocker‑triggered runs | **18** | ↘ ‑22 % |

**Overall health:** **Green** – > 95 % gate pass, failure rate under 12 %, and blocker count decreasing.

---

## 2️⃣ Detailed Run‑Level Findings  

| Run ID | Branch | PR # | Status | Gate Pass | Blocker? | Duration | Key Notes |
|--------|--------|------|--------|-----------|----------|----------|-----------|
| `wf‑20260508‑a1b2c3` | `feature/auto‑review` | 8421 | ✅ Success | ✅ | ❌ | 3 m 02 s | All 80‑100 validation loops passed. |
| `wf‑20260507‑d4e5f6` | `main` | 8399 | ❌ Failure | ✅ | ✅ (schema‑mismatch) | 1 m 45 s | Blocker: output schema drift detected; PR halted. |
| `wf‑20260506‑g7h8i9` | `bugfix/t
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
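
One digest ingredient, finding recurring failure classes in normalized evidence, can be sketched without touching raw logs. The sketch keeps environment and workflow-logic failures in separate buckets and never mutates its input. The `WorkflowRunEvidence` fields below are a hypothetical subset. The real type is not shown in this report.

```typescript
// Hypothetical subset of normalized run evidence.
interface WorkflowRunEvidence {
  workflow: string;
  status: "passed" | "failed";
  failureClass?: string;
  failureDomain?: "environment" | "workflow-logic";
}

interface RecurringFailure {
  domain: "environment" | "workflow-logic";
  failureClass: string;
  count: number;
  workflows: string[];
}

// Group failures by (domain, class) so environment and logic failures never share a bucket.
function recurringFailures(runs: readonly WorkflowRunEvidence[], minCount = 2): RecurringFailure[] {
  const groups = new Map<string, RecurringFailure>();
  for (const run of runs) {
    if (run.status !== "failed" || !run.failureClass || !run.failureDomain) continue;
    const key = `${run.failureDomain}:${run.failureClass}`;
    const group = groups.get(key) ?? { domain: run.failureDomain, failureClass: run.failureClass, count: 0, workflows: [] };
    group.count += 1;
    if (!group.workflows.includes(run.workflow)) group.workflows.push(run.workflow);
    groups.set(key, group);
  }
  return Array.from(groups.values())
    .filter((g) => g.count >= minCount)
    .sort((a, b) => b.count - a.count);
}
```

Each result names the affected workflows, which gives recommendations something concrete to reference.
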
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

**Ricky Slack Surface – “Workflow Debugger”**

---

## 1. Overview
The **Workflow Debugger** Slack surface gives engineers, product owners, and SREs a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).

---

## 2. Surface Architecture

| Layer | Component | Responsibility |
|-------|-----------|----------------|
| **Slack App** | `ricky-debugger` (Bot User) | Receives slash‑commands & interactive components, formats responses, forwards payloads to the backend. |
| **API Gateway** | `debugger.api.ricky.io` (REST + WebSocket) | Auth‑checks, rate‑limits, translates Slack payloads to internal RPC calls. |
| **Debug Service** | `debugger-service` (Go micro‑service) | Core workflow inspection, state replay, step‑through, mutation sandbox. |
| **Workflow Engine** | Existing `ricky-engine` | Provides read‑only snapshot API (`/v1/workflows/:id/snapshot`) and mutable “sandbox” API (`/v1/debug/:runId
...[truncated]

Must

  • Treat Slack as a co-equal product surface, not a wrapper around CLI.
  • Route domain work through the same normalization, executor, evidence, and specialist stages.
  • Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.

Must Not

  • Put workflow generation or diagnosis domain logic directly in the Slack handler.
  • Degrade Slack to a developer shortcut with weaker routing than CLI.
  • Skip signature verification, dedup, or thread handling when the surface is implemented.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)

User message

Add a new web handoff type that submits a workflow spec and mode.

Ricky output

**New Web‑Handoff Type – `workflowSubmit`**

| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to submit a complete AgentWorkforce workflow specification together with an execution *mode* (e.g., `dryRun`, `run`, `preview`). The backend validates, stores, and optionally triggers the workflow. |
| **Location in `handoffTypes.yaml`** | Add under the top‑level `web:` section. |
| **Schema** | See **Contract** below. |
| **Supported Modes** | `dryRun` – validate only, no side‑effects.<br>`preview` – validate and generate a preview report.<br>`run` – full execution after validation. |
| **Response** | JSON payload containing `status`, `validationReport`, optional `runId`, and any `error` details. |
| **Security** | Requires the caller to present a valid `Authorization: Bearer <token>` with `workflow:submit` scope. |

---

## 1. Contract (JSON‑Schema)

```yaml
# handoffTypes.yaml (excerpt)
web:
  workflowSubmit:
    description: |
      Submit a full workflow spec and execution mode from a web client.
    request:
      contentType: application/json
      schema:
        $id: https://agentwork
...[truncated]

Must

  • Add a handoff variant that normalizes into LocalInvocationRequest or CloudGenerateRequest.
  • Keep auth/session validation in the web surface before normalization.
  • Ensure downstream code does not need to know the request came from web.

Must Not

  • Short-circuit directly from the web handler to executors.
  • Create a parallel domain model for web requests.
  • Make the web surface the owner of local or Cloud routing semantics.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
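
A web handoff variant that normalizes into one of the two request types named above might look like this. The field names on `LocalInvocationRequest`, `CloudGenerateRequest`, and the `WebWorkflowSubmit` payload are hypothetical, since the real shapes are not shown here. Authentication is assumed to have already happened in the web surface. No web-only fields survive normalization.

```typescript
// Hypothetical shapes; the real request types' fields are not shown in this report.
interface LocalInvocationRequest { kind: "local"; spec: string; mode: "generate" | "run" }
interface CloudGenerateRequest { kind: "cloud-generate"; spec: string }
type NormalizedRequest = LocalInvocationRequest | CloudGenerateRequest;

// Hypothetical web payload, received only after the web surface validated the session.
interface WebWorkflowSubmit {
  type: "web.workflow-submit";
  spec: string;
  mode: "local-generate" | "local-run" | "cloud-generate";
  sessionUserId: string;
}

// Translation only: no routing or execution happens here.
function normalizeWebHandoff(h: WebWorkflowSubmit): NormalizedRequest {
  switch (h.mode) {
    case "local-generate":
      return { kind: "local", spec: h.spec, mode: "generate" };
    case "local-run":
      return { kind: "local", spec: h.spec, mode: "run" };
    case "cloud-generate":
      return { kind: "cloud-generate", spec: h.spec };
    default: {
      const unreachable: never = h.mode;
      throw new Error(`unsupported mode ${String(unreachable)}`);
    }
  }
}
```
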
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)

User message

A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.

Ricky output

**Decision Matrix for the Claude‑to‑Ricky Hand‑off**

| Claude Input | What Ricky Should Do |
|--------------|----------------------|
| **Complete, version‑controlled workflow spec** (YAML/JSON) **+ clear intent** (e.g., “run this pipeline”) | **Generate** – create the workflow artefacts (branch, PR, CI config) and return the deterministic contract (inputs, outputs, verification gates). |
| **Spec with syntax errors, missing required fields, or failing schema validation** | **Debug** – run the schema validator, list the exact failures, and propose concrete fixes. Return a “debug report” that the user can apply. |
| **Spec that previously existed but the latest run failed** (error logs, failed verification gate) | **Restart** – trigger a fresh run on a clean branch, attach the prior run’s logs for comparison, and report success/failure after the deterministic verification gates. |
| **Spec that runs but produces unexpected business‑logic results** (e.g., wrong data transformation, performance regressions) | **Analyze** – execute the workflow in a sandbox, capture artefacts, run the 80‑100 validation loops, and produce a diagnostic report highlighting where the observed behaviour div
...[truncated]

Must

  • Normalize the tool call through the MCP/Claude handoff path.
  • Treat conversation context as advisory metadata.
  • Make the spec itself sufficient for routing whenever possible.

Must Not

  • Require the user to rewrite the spec manually as a workflow.
  • Put domain routing logic inside the MCP tool handler itself.
  • Ignore provided mode, workflow path, or prior decisions metadata.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
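A minimal sketch of the split these criteria describe: the MCP tool handler only normalizes its arguments, and routing happens in a separate domain function where an explicit mode wins and otherwise the spec alone decides. Conversation context and prior decisions are carried through as metadata but never drive the route. All type and function names are hypothetical, and the spec heuristics in `intentFromSpec` are placeholders chosen purely for illustration.

```typescript
// Intents named in the user message.
type Intent = "generate" | "debug" | "restart" | "analyze" | "coordinate";

interface HandoffMetadata {
  mode?: Intent;
  workflowPath?: string;
  priorDecisions: string[];
  conversationContext?: string; // advisory only; never read by routing
}

interface NormalizedHandoff {
  spec: string;
  metadata: HandoffMetadata;
}

// Hypothetical MCP tool arguments.
interface McpToolArgs {
  spec: string;
  mode?: Intent;
  workflowPath?: string;
  priorDecisions?: string[];
  context?: string;
}

// The MCP tool handler only normalizes; it contains no routing logic and
// preserves mode, workflow path, and prior decisions.
function normalizeMcpHandoff(args: McpToolArgs): NormalizedHandoff {
  return {
    spec: args.spec,
    metadata: {
      mode: args.mode,
      workflowPath: args.workflowPath,
      priorDecisions: args.priorDecisions ?? [],
      conversationContext: args.context,
    },
  };
}

// Placeholder spec-only heuristics, purely illustrative.
function intentFromSpec(spec: string, workflowPath?: string): Intent {
  if (/\bfailed\b/i.test(spec)) return workflowPath ? "restart" : "debug";
  if (/\bunexpected\b/i.test(spec)) return "analyze";
  return "generate";
}

// Domain-layer routing: an explicit mode wins, otherwise the spec decides.
function routeHandoff(h: NormalizedHandoff): Intent {
  return h.metadata.mode ?? intentFromSpec(h.spec, h.metadata.workflowPath);
}
```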
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)

User message

Change the Cloud Ricky generate API response to include a new field and change one existing error code name.

Ricky output

**Scope**  
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.  
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_FAILED`** in the error‑response schema and all references.  

All changes are limited to the **cloud‑ricky** service repository (`services/cloud-ricky/`). No other services are affected.

---

## 1. Contract Update (API Spec)

| File | Change | Reason |
|------|--------|--------|
| `openapi.yaml` (under `services/cloud-ricky/api/`) | • Add `requestId` property to `GenerateResponse` schema (type: `string`, format: `uuid`).  <br>• Rename `INVALID_PROMPT` → `PROMPT_VALIDATION_FAILED` in `ErrorCode` enum. | Guarantees downstream clients can rely on a traceable request identifier and aligns error naming with product spec. |
| `README.md` (API section) | Update example responses and error table to reflect new field and error name. | Documentation stays in sync with the contract. |

**Version bump** – Increment the OpenAPI `info.version` from `v1.2.0` to `v1.3.0`.

---

## 2. Implementation Changes

| Path | Modification |
|------|--------------|
| `src/handlers/generate_handler.py` | • Generat
...[truncated]

Must

  • Allow additive response fields within `/api/v1/ricky/...`.
  • Reject renaming or changing the meaning of an existing error code inside the same version.
  • Require a new API version for breaking request/response changes.

Must Not

  • Remove or rename existing response fields in v1.
  • Add a new required request field within v1.
  • Treat version tran

Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
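The cloud-api-versioning criteria above (additive response fields are fine within v1; renaming an error code or adding a required request field needs a new version) lend themselves to a mechanical check. A minimal sketch, assuming each version's contract can be reduced to three lists of names. `ApiContract` and `sameVersionViolations` are hypothetical, and field names such as `workflow` in the test data are illustrative.

```typescript
// Hypothetical, simplified snapshot of one version's public contract.
interface ApiContract {
  responseFields: string[];
  requiredRequestFields: string[];
  errorCodes: string[];
}

// Lists every reason `next` cannot ship under the same API version as `prev`.
// An empty result means the change is additive only. A rename appears as a
// removal of the old name, so it is reported as well.
function sameVersionViolations(prev: ApiContract, next: ApiContract): string[] {
  const missing = (from: string[], to: string[]) =>
    from.filter((x) => to.indexOf(x) === -1);
  const reasons: string[] = [];
  for (const f of missing(prev.responseFields, next.responseFields)) {
    reasons.push(`response field removed or renamed: ${f}`);
  }
  for (const c of missing(prev.errorCodes, next.errorCodes)) {
    reasons.push(`error code removed or renamed: ${c}`);
  }
  for (const r of missing(next.requiredRequestFields, prev.requiredRequestFields)) {
    reasons.push(`new required request field: ${r}`);
  }
  return reasons;
}
```

Applied to the change in this case's user message, adding `requestId` passes and renaming `INVALID_PROMPT` to `PROMPT_VALIDATION_FAILED` is reported. A name comparison cannot catch an error code whose meaning changes while its name stays the same, so that case still needs human review.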

kjgbot commented May 10, 2026

Contributor Author

Superseded by #88 — moved from regex to mdast AST extraction. The AST approach addresses the same root cause more directly (no regex boundary maintenance, fenced-code blocks excluded by construction) and continues to pass all of #87's tests and evals plus eight additional AST-specific cases. Closing this in favor of #88.
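#88's code is not shown in this thread, so what follows is only a minimal sketch of the AST idea. It assumes an mdast tree already produced by a markdown parser (for example `mdast-util-from-markdown`). The node shape is typed loosely and the tree is built by hand in the test so the example has no dependencies. The function names and the `PATH_LIKE` filter are hypothetical. Only `inlineCode` nodes are read, so fenced `code` nodes and bare prose tokens such as base/head are never candidates, and a `## Target Files` list takes precedence, as in #87.

```typescript
// Loose subset of mdast node shapes (root, paragraph, text, inlineCode,
// code, heading, list, listItem); assumed to come from a markdown parser.
interface MdNode {
  type: string;
  value?: string;
  depth?: number;
  children?: MdNode[];
}

// Hypothetical filter: at least one slash and no whitespace.
const PATH_LIKE = /^[\w.-]+(\/[\w.-]+)+$/;

// Concatenated text of a node; inlineCode values never include backticks.
function textOf(node: MdNode): string {
  if (node.type === "text" || node.type === "inlineCode") {
    return node.value ?? "";
  }
  return (node.children ?? []).map(textOf).join("");
}

// Only inlineCode nodes are candidates. A fenced `code` node has a value
// but no children and is never matched, so it is excluded by construction.
function collectInlineCodePaths(node: MdNode, out: string[]): void {
  if (node.type === "inlineCode" && node.value !== undefined && PATH_LIKE.test(node.value)) {
    out.push(node.value);
  }
  for (const child of node.children ?? []) {
    collectInlineCodePaths(child, out);
  }
}

// Bullet texts of the list right after a top-level "## Target Files"
// heading (quotes stripped), or null when the heading is absent.
function targetFilesBlock(root: MdNode): string[] | null {
  const kids = root.children ?? [];
  for (let i = 0; i < kids.length; i++) {
    const node = kids[i];
    if (node.type === "heading" && node.depth === 2 && textOf(node).trim() === "Target Files") {
      const list = kids[i + 1];
      if (list === undefined || list.type !== "list") return [];
      return (list.children ?? [])
        .map((item) => textOf(item).trim().replace(/^['"]|['"]$/g, ""))
        .filter((s) => s.length > 0);
    }
  }
  return null;
}

function extractTargetFiles(root: MdNode): string[] {
  const explicit = targetFilesBlock(root);
  if (explicit !== null) return explicit;
  const found: string[] = [];
  collectInlineCodePaths(root, found);
  // Deduplicate while keeping first-seen order.
  return found.filter((p, i) => found.indexOf(p) === i);
}
```

Because the traversal never looks inside a fenced block and never tokenizes plain text, there is no boundary regex to keep in sync with markdown wrapping conventions.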

kjgbot closed this May 10, 2026
kjgbot added a commit that referenced this pull request May 10, 2026
…es-ast

fix(spec-intake): extract target files via mdast AST (supersedes #87)