fix(spec-intake): extract target file paths from backticked prose and `## Target Files` blocks #87

kjgbot wants to merge 2 commits.
… `## Target Files` blocks

The spec parser's PATH_PATTERN regex required a path to be preceded by whitespace or start-of-line, so paths inside markdown backticks (the natural convention) never matched. A spec like

    - Update `packages/web/app/api/v1/workflows/run/route.ts` to accept ...

extracted exactly zero target files. Worse, lower-precision tokens like `base/head` from prose (`PR number, base/head SHA`) DID match and ended up as the sole `targetFiles` entry.

With `targetFiles` effectively empty, the renderer fell into the manifest-driven "single artifact under workflows/generated/" branch, even when the spec named multiple real source files for editing.

Three coordinated changes:

1. Extend PATH_PATTERN to accept paths preceded by backticks, single quotes, double quotes, parens, or angle brackets (the typical markdown wrappers) without losing the existing whitespace boundary.
2. Add `looksLikeRealPath` to suppress two-segment prose noise that has no extension and no recognized leading directory prefix (e.g. `base/head`, `my-org/my-repo`). Paths with an extension, three or more segments, or a known prefix (`packages/`, `src/`, `tests/`, `workflows/`, `.github/`, etc.) still pass. The existing `workflows/wave2-product` regression case remains green.
3. Add `extractTargetFilesBlock` to recognize a literal `## Target Files` markdown section. When present, its bullet list takes precedence over prose extraction so spec authors can be explicit about the source-change boundary. Bullets, surrounding backticks, and quotes are stripped.

Also surface `target_files` in the local generation `--json` output so callers (and deterministic evals) can verify the extracted scope without parsing the rendered workflow body.

Verification:

- `src/product/spec-intake/parser.test.ts` adds six cases covering backticked paths, paren/quote-wrapped paths, prose-noise suppression, structured-block precedence, prose fallback, and 3+ segment paths.
- `evals/suites/generation-quality/cases.md` adds three deterministic evals (`target-files-from-backticked-prose`, `-from-structured-block`, `-suppresses-prose-noise`) that drive the real CLI and assert `target_files` in the JSON output.
- Full vitest suite: 1034/1034 green; full eval suite: 8/8 deterministic pass, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
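To make changes 1 and 2 concrete, here is a minimal sketch of a wrapper-aware boundary plus a noise filter. It is not the PR's actual `PATH_PATTERN` or `looksLikeRealPath`: the regex, the `KNOWN_PREFIXES` list (the description ends its list with "etc."), and the helper `extractPathsFromProse` are assumptions for illustration only.

```typescript
// Illustrative sketch only; the real PATH_PATTERN in parser.ts may differ.
// Assumed prefix list, taken from the examples named in the description.
const KNOWN_PREFIXES = ["packages/", "src/", "tests/", "workflows/", ".github/"];

// Accept a candidate when it has 3+ segments, a known leading directory,
// or a file extension on its last segment. Everything else is prose noise.
function looksLikeRealPath(candidate: string): boolean {
  const segments = candidate.split("/").filter(Boolean);
  if (segments.length >= 3) return true;
  if (KNOWN_PREFIXES.some((prefix) => candidate.startsWith(prefix))) return true;
  const last = segments[segments.length - 1] ?? "";
  return /\.[A-Za-z0-9]+$/.test(last);
}

// Hypothetical helper: find slash-separated tokens preceded by start-of-text,
// whitespace, or a typical markdown wrapper (backtick, quote, paren, angle bracket).
function extractPathsFromProse(text: string): string[] {
  const pattern = /(?:^|[\s`'"(<])([A-Za-z0-9_.@-]+(?:\/[A-Za-z0-9_.@\[\]-]+)+)/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // Drop sentence punctuation that the character class may have swallowed.
    const candidate = match[1].replace(/[.,;:]+$/, "");
    if (looksLikeRealPath(candidate)) found.add(candidate);
  }
  return Array.from(found);
}
```

With this sketch, the backticked path from the example spec is kept while `base/head` is dropped, because it has two segments, no extension, and no known prefix.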
📝 Walkthrough

Parser now extracts repository file paths from markdown prose or a prioritized `## Target Files` block.

Changes: Target Files Extraction Feature

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/product/spec-intake/parser.ts`:
- Around line 879-882: The extractTargetFiles function currently falls back to
prose extraction when extractTargetFilesBlock(text) returns an empty array,
which incorrectly ignores a present-but-empty "## Target Files" block; change
the logic so that extractTargetFilesBlock distinguishes "block not present" from
"block present but empty" (e.g., return null or an object with a flag) and make
extractTargetFiles return the empty array when the block is present but empty
instead of running prose extraction; update or add a regression test for
extractTargetFiles that supplies a text with a present-but-empty "## Target
Files" section to assert it returns an empty array.
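The distinction this comment asks for can be sketched as follows. This is a hypothetical illustration, not the code in `parser.ts`: the real signatures are not shown in this PR page, and the `extractFromProse` callback parameter is an assumption introduced so the sketch stays self-contained. The contract is `null` for "block not present" and `[]` for "block present but empty".

```typescript
// Hypothetical sketch of the suggested contract; actual parser.ts code may differ.
// Returns null when there is no "## Target Files" heading, and an array
// (possibly empty) when the heading exists.
function extractTargetFilesBlock(text: string): string[] | null {
  const lines = text.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Target Files\s*$/i.test(line));
  if (start === -1) return null; // block not present

  const files: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line)) break; // the next heading ends the block
    const bullet = /^\s*[-*+]\s+(.*)$/.exec(line);
    if (!bullet) continue;
    // Strip surrounding backticks and quotes from the bullet text.
    const cleaned = bullet[1].trim().replace(/^[`'"]+|[`'"]+$/g, "");
    if (cleaned) files.push(cleaned);
  }
  return files; // may be [] when the block is present but empty
}

// Assumed wiring: prose extraction is passed in as a callback for illustration.
function extractTargetFiles(
  text: string,
  extractFromProse: (input: string) => string[],
): string[] {
  const block = extractTargetFilesBlock(text);
  // An explicit block wins even when empty; only a missing block falls back.
  return block !== null ? block : extractFromProse(text);
}
```

A regression test along these lines would pass a spec whose `## Target Files` section has no bullets and assert that the result is an empty array rather than whatever prose extraction would return.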
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: ac31f4b5-bf82-4191-b2f9-a88ad2b68dfd
📒 Files selected for processing (5)
- evals/suites/generation-quality/cases.jsonl
- evals/suites/generation-quality/cases.md
- src/local/entrypoint.ts
- src/product/spec-intake/parser.test.ts
- src/product/spec-intake/parser.ts
Ricky Eval Review

Run: Passed: 8 | Needs human: 42 | Reviewable: 42 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their REVIEW
| Step | Action | Result |
|---|---|---|
| 1️⃣ | Parse arguments – Ricky reads --mode local and the --spec string. | The CLI knows you want a local‑mode workflow generation, but you haven’t asked it to execute anything. |
| 2️⃣ | Load the “generate a workflow for package checks” spec – Ricky looks for a matching workflow contract in the repository (e.g., workflows/package-checks.yaml). | If the contract exists, it is loaded; if not, Ricky will emit a blocker telling you the spec is undefined. |
| 3️⃣ | Render the workflow – Using the contract, Ricky expands the template into a concrete workflow definition (YAML/JSON) that includes: • Inputs (e.g., package list, lint config) • Stages (install, lint, unit‑test, security scan, publish‑dry‑run) • Verification gates (deterministic checks, unit‑test pass thresholds) • Artifacts (logs, test reports, SBOM) | The workflow is generated and printed to STDOUT (or saved to ./generated/package-checks.yaml if --output is set) |
| ...[truncated] |
**Must**
- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.
**Must Not**
- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>
**User message**
> Render Ricky's first-run CLI onboarding for a new user.
**Ricky output**
```text
**Ricky – First‑Run CLI Onboarding**
Below is the complete, copy‑and‑paste ready onboarding workflow that a new user should run the first time they invoke the `ricky` CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and the **shared authoring rules** (scoped branches/PRs, blocker reporting).
---
## 1. Overview
| Phase | Goal | Deterministic Gate | Success Criteria |
|-------|------|--------------------|------------------|
| **A** | Environment bootstrap | `ricky env check` | All required tools (Node ≥18, Git ≥2.30, Docker ≥20) are present and version‑locked |
| **B** | Repository init | `ricky repo init` | A clean `ricky/` workspace with `main` branch checked out |
| **C** | Credential provisioning | `ricky auth login` | OAuth token stored in `~/.ricky/token.json` and validated against the Auth service |
| **D** | Core agent scaffolding | `ricky agent create <name>` | New agent directory with `README.md`, `AGENTS.md` entry, and CI config |
| **E** | CI/CD pipeline activation | `ricky ci enable` | GitHub Actions workflow file added, CI passes on first commit |
| **F** |
...[truncated]
Must
- Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
- End every branch with a concrete next step.
- Advertise only commands that are currently implemented.
Must Not
- Sound like a launch page or documentation dump.
- Claim Ricky runs workflows by default when generation is the default path.
- Require web or Slack onboarding before CLI use.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)
User message
A user gives Ricky an empty spec or a missing spec file.
Ricky output
**Ricky – Handling an Empty or Missing Specification File**
---
### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the expected path (`/specs/<workflow>.yaml`). | If `fs.exists?` → *continue*; else → **BLOCKER**: “Spec file not found.” |
| **1.2** | If the file exists, read its contents and check for non‑trivial YAML (at least one top‑level key). | If empty or only comments → **BLOCKER**: “Spec file is empty.” |
| **1.3** | Emit a **deterministic verification gate** (`spec‑presence‑gate`) that other agents can poll. | `gate: spec‑presence‑gate → PASS/FAIL` |
*All blockers are logged to the workflow’s **issue tracker** with the tag `#spec‑missing` and automatically assign the “Spec Owner” role.*
---
### 2. Auto‑Generated Placeholder Spec (Scoped Branch)
| Branch | Purpose |
|--------|---------|
| `spec‑placeholder/<workflow>-<timestamp>` | Holds a minimal, syntactically‑valid YAML skeleton that can be safely merged without affecting downstream pipelines. |
**Skeleton template (`<workflow>.yaml`):**
```yaml
# -------------------------------------------------
# Placeholder spec –
...[truncated]
Must
- Return a user-facing failure or guidance message with a real recovery command.
- Distinguish generation failure from execution failure.
- Show stack traces only when verbose diagnostic mode is requested.
Must Not
- Crash with an uncaught exception in normal mode.
- Suggest commands that do not exist.
- Pretend a missing spec was accepted.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)
User message
Render `ricky status` when no provider checks have proven Google or GitHub are connected.
Ricky output
**`ricky status` output (no provider checks have proven Google or GitHub are connected)**
=== AgentWorkforce Status Report ===
🟡 Provider Connectivity
• Google : NOT VERIFIED (no successful check)
• GitHub : NOT VERIFIED (no successful check)
🔧 Last Run
• Workflow Engine : idle
• Last Execution : N/A
• Pending Jobs : 0
📦 Repository State
• Branches : master (default)
• Open PRs : 0
• Staged Changes : none
• Google OAuth token not validated
• GitHub personal‑access token not validated
🗒️ Recommendations
- Run `ricky provider check google` to validate Google connectivity.
- Run `ricky provider check github` to validate GitHub connectivity.
- Once both checks succeed, re‑run `ricky status` to see green indicators.
--- End of Report ---
Must
- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.
Must Not
- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-matcher-registry-backed (generation-quality/manual)
User message
Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.
Ricky output
## GitHub Primitive‑Change Workflow
**Purpose:** Apply a single‑file edit (e.g., README typo fix) to a public repository while guaranteeing reproducible CI, deterministic verification, and traceable hand‑offs.
| Phase | Owner | Artifact | Contract (inputs → outputs) | Verification Gate |
|-------|-------|----------|----------------------------|--------------------|
| **1️⃣ Plan** | **Product Owner** (or contributor) | `change‑ticket.md` | *Inputs*: issue URL, target branch, file path, diff snippet. <br>*Outputs*: `plan.yaml` (branch name, commit message template, CI label set). | Lint `plan.yaml` against `schemas/plan.schema.json`. |
| **2️⃣ Branch** | **Ricky – Branch‑Creator** | `branch‑create.log` | *Inputs*: `plan.yaml`. <br>*Outputs*: new branch `pr/<short‑id>/<slug>`. | Verify branch exists via `git ls-remote`. |
| **3️⃣ Edit** | **Ricky – Editor** | `edit‑patch.diff` | *Inputs*: `plan.yaml`, local checkout. <br>*Outputs*: staged diff matching the supplied snippet. | `git diff --cached` must equal `edit‑patch.diff`. |
| **4️⃣ Commit** | **Ricky – Committer** | `commit‑metadata.json` | *Inputs*: staged diff, commit template. <br>*Outputs*: signed commit (GPG) on branch. | `
...[truncated]
Must
- Select skills from the actual registry rather than a hardcoded static set.
- Record ranked skill matches with confidence and trigger evidence.
- Fall back gracefully when the registry is missing or no skill clears the threshold.
Must Not
- Claim runtime agents embody skills when only generation-time guidance was applied.
- Fail generation solely because optional skill files are missing.
- Hide skill selection evidence from artifacts or JSON output.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)
User message
Generate a workflow where the spec says "use Claude to review and Codex to implement".
Ricky output
**Workflow: Claude‑Review → Codex‑Implementation**
| **Section** | **Content** |
|-------------|-------------|
| **Name** | `review‑then‑implement` |
| **Purpose** | Automate a two‑stage pipeline where Claude performs a textual/code review of a design spec, then Codex generates the implementation based on Claude’s approved feedback. |
| **Scope** | • Input: `spec.md` (markdown) <br>• Output: `implementation/` (generated source files) <br>• Runs on the `main` branch only; feature branches must open a PR to trigger. |
| **Contract** | <ul><li>**Input contract** – `spec.md` must contain a top‑level heading `# Specification` and a `## Acceptance Criteria` list.</li><li>**Claude review contract** – Returns a JSON object `{ “status”: “approved”|“rejected”, “comments”: [string] }` and a cleaned‑up spec `spec.reviewed.md`.</li><li>**Codex implementation contract** – Consumes `spec.reviewed.md` and produces a zip `implementation.zip` that passes the unit‑test suite defined in `tests/`.</li></ul> |
| **Trigger** | `push` to any branch matching `feature/*` **or** manual dispatch from the UI. |
| **Jobs** | 1. **`review`** (Claude) <br>2. **`gate‑review`** (deterministic verification) <br>3.
...[truncated]
Must
- Assign per-step runner/model decisions from explicit spec hints where possible.
- Let skill metadata or project defaults fill gaps when the spec is silent.
- Write `tool-selection.json` or equivalent audit metadata.
Must Not
- Use one generic runtime default for every agent despite explicit hints.
- Let skill defaults override explicit user/spec runner hints.
- Omit the reason each tool or model was chosen.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)
User message
Use `--refine` to sharpen a generated workflow's step task descriptions and acceptance gates.
Ricky output
### Using `--refine` to Sharpen Step Descriptions & Acceptance Gates
`--refine` is a deterministic post‑processing flag that takes an existing workflow JSON/YAML and rewrites the **`task.description`** and **`acceptance.gate`** fields for every step. The goal is to make each step **action‑oriented, measurable, and verifiable** while keeping the original logical flow intact.
Below is a concrete contract, the verification gate you should run after refinement, and a minimal example of the command‑line usage.
---
## 1. Contract for `--refine`
| Input | Output | Guarantees |
|-------|--------|------------|
| **Workflow file** (`.json` or `.yaml`) that validates against the base schema (`workflow.schema.json`). | **Refined workflow** (same file type) with updated `task.description` and `acceptance.gate` fields. | 1. All original step IDs, dependencies, and I/O remain unchanged.<br>2. Each `task.description` is rewritten to follow the **“Do‑X‑by‑Y‑using‑Z”** pattern.<br>3. Each `acceptance.gate` becomes a **single, atomic predicate** that can be evaluated by the runtime engine (e.g., `output.fileExists && output.rowCount > 0`).<br>4. No new steps are introduced; no step is removed.
...[truncated]
Must
- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.
Must Not
- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.behavior-grounded-gates (generation-quality/manual)
User message
Generate a workflow for the `ricky --version` spec.
Ricky output
## Workflow: `ricky --version` Specification & Release
**Owner:** Ricky (AgentWorkforce)
**Target Branch:** `main` → `release/vX.Y.Z` (semantic version bump)
**Scope:** Add/verify the `--version` CLI flag for the `ricky` binary, update documentation, and publish the new version to the artifact registry.
---
### 1️⃣ Contract (Input → Output)
| **Input** | **Pre‑condition** | **Output** | **Post‑condition** |
|-----------|-------------------|------------|--------------------|
| `ricky` source code (repo at `main`) | Build succeeds, tests pass | Binary `ricky` with `--version` flag returning `ricky vX.Y.Z` | Binary is published, docs show the new version, CI badge updated |
| `CHANGELOG.md` entry for `vX.Y.Z` | Entry follows `## [vX.Y.Z] - YYYY‑MM‑DD` format | Updated `CHANGELOG.md` | Changelog entry is present in the release PR |
| `README.md`/`docs/cli.md` | Existing version placeholder `{{VERSION}}` | Rendered docs showing `vX.Y.Z` | Docs build passes |
**Version contract:**
- Semantic version must be **strictly greater** than the previous tag.
- The string returned by `ricky --version` must match the regex `^ricky v\d+\.\d+\.\d+$`.
---
### 2️⃣ Deterministic Verifica
...[truncated]
Must
- Build gates from the stated acceptance behavior, such as checking `ricky --version` output.
- Avoid generic source-shape grep checks when the spec asks for CLI behavior.
- Keep generated validation meaningful for the current repo shape.
Must Not
- Treat `grep -Eq 'export|function|class|workflow(' dist/ricky.js` as proof of version behavior.
- Claim the workflow is proven by source syntax alone.
- Ignore the package-json version resolution order in the spec.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)
User message
Generate a workflow for many independent artifacts with a validation/fix/rerun loop.
Ricky output
**Workflow Title:** Batch‑Artifact Validation & Auto‑Fix Loop
**Scope:** Any number of independent artifacts (e.g., config files, data schemas, code modules) that can be validated, optionally auto‑fixed, and re‑validated until they pass or hit a blocker limit.
---
## 1. Contract & Entry Point
| Element | Description |
|---------|-------------|
| **Input** | `artifacts/` – a directory (or list) of artifact identifiers. Each artifact must expose a **validation script** (`validate.sh`) and an optional **auto‑fix script** (`fix.sh`). |
| **Output** | `reports/` – per‑artifact JSON report (`<id>.report.json`) containing final status, attempts, and any blocker notes. |
| **Success Criteria** | All artifacts report `status: "PASS"` **or** a deterministic blocker is raised. |
| **Failure Criteria** | Any artifact exceeds the maximum retry count **or** a non‑recoverable blocker is emitted. |
| **Versioned Branch** | `feature/batch‑validation‑loop` (branch created from `main`). |
| **PR Target** | `main` – PR must include workflow file (`.github/workflows/batch‑validation.yml`) and updated `README.md` with usage example. |
---
## 2. Workflow Diagram (YAML‑style)
```yaml
name:
...[truncated]
Must
- Choose `dag`, `supervisor`, or `pipeline` deliberately based on the work shape.
- Explain the pattern choice in artifact metadata or a rationale.
- Use `dag` for validation/fix/rerun loops when dependencies matter.
Must Not
- Default blindly to `dag` for every workflow.
- Collapse independent artifact work into one vague agent task.
- Omit verification gates because the chosen pattern seems obvious.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)
User message
Describe how selected workflow-writing skills affected a generated Ricky workflow.
Ricky output
### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow
Below is a concise “skill‑impact map” that shows the direct effect each of the core Ricky authoring skills has on the structure, reliability, and reviewability of a generated workflow. The map is illustrated with a concrete example: a **“Customer‑Onboarding Email Campaign”** workflow that moves a new lead from **Lead Capture → Qualification → Welcome Series → Sales Handoff**.
| **Skill** | **What the skill enforces** | **Resulting artifact in the workflow** | **Why it matters for reliability** |
|----------|----------------------------|----------------------------------------|------------------------------------|
| **Deterministic Contract Definition** | Every step declares **input schema**, **output schema**, and **pre‑conditions** in a JSON‑Schema block. | ```yaml\nsteps:\n - id: qualify\n input: ${lead}\n schema:\n type: object\n required: [email, source]\n output: qualifiedLead\n``` | Guarantees that downstream steps receive exactly what they expect, eliminating “missing‑field” runtime errors. |
| **Verification Gates (Gate‑Ops)** | Inserts **`gate`** nodes that run automated tests
...[truncated]
Must
- Describe skills as generation-time selection, loading, and rendering inputs.
- Point to metadata such as `loaded-skills.txt` and `skill-application-boundary.json`.
- Say the workflow instructions were informed by selected skills.
Must Not
- Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
- Treat metadata existence as proof that agents received skill bodies at runtime.
- Overstate current tests beyond the generation boundary.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)
User message
Generate a workflow from a vague product spec and return it to the user.
Ricky output
**Ricky – Workflow Generation Assistant**
Below is a concise “request‑for‑clarification” template you can fill in. Once you provide the missing details, I’ll generate a complete, contract‑driven workflow that includes:
* **Scope & Objectives** – clear success criteria.
* **Inputs / Outputs** – typed artifacts with versioning.
* **Stages & Tasks** – deterministic steps, each with a verification gate.
* **Review & Approval** – required reviewers, PR targets, and blocker handling.
* **Validation Loop** – 80‑to‑100 automated checks per iteration.
* **Branch / PR Boundaries** – isolated feature branches, CI‑gated merges.
---
### 1. Product Spec Clarification
| Item | What I Need from You |
|------|----------------------|
| **High‑level goal** | One‑sentence description of the product feature or service you’re building. |
| **Primary actors** | Users, systems, or services that will interact with the workflow. |
| **Key inputs** | Files, API payloads, config files, etc. (include schema or example). |
| **Desired outputs** | Artifacts, reports, API responses, deployment artifacts, etc. |
| **Success metrics** | E.g., latency < 200 ms, error‑rate < 0.1 %, test coverage ≥ 90
...[truncated]
Must
- Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
- Validate with dry-run or targeted structural checks where possible.
- Return artifacts, warnings, and follow-up commands honestly.
Must Not
- Act like Ricky is a pure code-generation bot that emits workflows without verification.
- Stop at "code compiles" as the proof bar.
- Skip skill-aware workflow authoring guidance for serious workflows.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)
User message
A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.
Ricky output
**Ricky’s immediate response to a failed workflow**
1. **Capture the failure state**
- Pull the latest run‑metadata from the orchestration engine (e.g., Airflow, Temporal, or the internal workflow runner).
- Record:
* The step that timed‑out (`step‑X`) – include the timeout threshold, actual duration, and any partial logs.
* The worker that stayed alive (`worker‑Y`) – capture its heartbeat timestamps, resource usage, and the fact that it emitted **no artifacts**.
- Persist this snapshot in the workflow‑run audit log and tag it with a **“BLOCKER”** label.
2. **Raise a deterministic verification gate**
- Insert a **“Post‑Failure Validation Gate”** into the workflow contract:
```yaml
gates:
- name: post_failure_artifact_check
type: deterministic
condition: |
run.artifacts.exists == true && run.errors.isEmpty == false
on_fail: block
```
- This gate guarantees that any subsequent retry will not proceed until the missing artifacts are either produced or the failure is explicitly acknowledged.
3. **Create a scoped repair branch**
- Branch name: `repair/run-{run_id}-timeout-{timestamp}`.
...[truncated]
Must
- Classify the failure before deciding whether to retry.
- Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
- Preserve raw evidence and uncertainty when the class is not known.
Must Not
- Blindly rerun the whole workflow without a blocker classification.
- Treat every failure as a broken workflow definition.
- Claim Ricky fixed the workflow before rerun evidence proves it.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)
User message
Ricky detects stale `.agent-relay/`, `.relay/`, and `.trajectories/` state before launching a local workflow.
Ricky output
**Workflow Contract – Stale‑State Guard for Local AgentWorkforce Runs**
| **Element** | **Definition** |
|-------------|----------------|
| **Scope** | Local execution of any `.agent/`‑based workflow on a developer workstation. |
| **Inputs** | 1. Working directory root (absolute path).<br>2. Optional `--allow‑stale` flag (bool). |
| **Outputs** | - `guard‑result.json` (machine‑readable pass/fail + diagnostics).<br>- Human‑readable console report. |
| **Success Criteria** | All of the following are **absent**: <br>• Un‑committed changes in `.agent‑relay/`.<br>• Orphaned files in `.relay/` that are not referenced by the current manifest.<br>• Trajectory snapshots in `.trajectories/` older than the latest workflow version. |
| **Failure Mode** | If any stale artifact is detected, the guard **fails** and aborts the launch, returning a deterministic error code `0xE101`. |
| **Deterministic Verification Gates** | 1. **File‑Hash Gate** – Compute SHA‑256 of every file under the three directories and compare against the hash list stored in `.agent‑manifest.json`.<br>2. **Timestamp Gate** – Ensure the newest file timestamp in each directory ≤ `workflow.version.timestamp` (from manifest).
...[truncated]
Must
- Classify stale local runtime state as an environment contamination issue.
- Recommend quarantine or isolated-run guidance before launch.
- Record the observed paths and the action taken or recommended.
Must Not
- Treat stale runtime state as a workflow logic failure.
- Delete or overwrite state without an explicit safe path or user intent.
- Continue into execution as if the workspace were clean.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)
User message
A run marker says another Ricky or Relay run is already active in this workspace.
Ricky output
**Fixing “Another Ricky or Relay run is already active in this workspace”**
The marker means a lock file (or a lingering process) is still present from a previous execution. Follow the deterministic repair contract below. Each step is atomic, verifiable, and safe to repeat.
---
## 1️⃣ Verify the current state
| Command | Expected output |
|---------|-----------------|
| `ricky status` | Should show **No active runs**. If it reports a run, note the **Run‑ID**. |
| `ls -a .ricky/locks` | Should list **only** `.` and `..`. Any other file is the lock. |
| `ps -ef \| grep ricky` | No `ricky` or `relay` processes should be running (except the grep itself). |
If any of the above shows a leftover run, proceed to the corresponding removal step.
---
## 2️⃣ Clean up a stale lock file
1. **Locate the lock**
```bash
LOCK_FILE=$(find .ricky/locks -type f -name "*.lock" | head -n1)
```
- Confirm it is stale (no matching PID)
```bash
if [ -n "$LOCK_FILE" ]; then
  PID=$(cat "$LOCK_FILE")
  if ! kill -0 "$PID" 2>/dev/null; then
    echo "Stale lock for PID $PID"
  else
    echo "Process $PID still alive – aborting cleanup"
    exit
...[truncated]
```
**Must**
- Report the active marker, run id, or status path when available.
- Ask the user to inspect, wait for, or explicitly clear the active run.
- Avoid launching a competing run that could corrupt evidence.
**Must Not**
- Silently start another run.
- Hide the existing run marker from the user.
- Treat the conflict as a generic failure with no recovery path.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.auto-fix-bounded-loop</code> (runtime-recovery/manual)</summary>
**User message**
> Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.
**Ricky output**
```text
**Local Workflow Execution – Auto‑Fix Run**
Below is a deterministic, contract‑driven procedure you can copy‑paste into your terminal (or CI step) to run the workflow, detect the failure, invoke the auto‑fix routine, and re‑run the repaired artifact. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run ID are known.
---
## 1️⃣ Preconditions (contract)
| Item | Requirement | Verification |
|------|-------------|--------------|
| `aw` CLI | `>=2.5.0` installed and on `$PATH` | `aw --version` |
| Workflow repo | Clean working tree (no uncommitted changes) | `git status --porcelain` must be empty |
| Failed step | Exact step identifier (e.g., `build:compile`) | Provided by user |
| Previous run ID | UUID or numeric ID (e.g., `run-42`) | Provided by user |
| Auto‑fix flag | Enabled in `aw config` (`auto_fix=true`) | `aw config get auto_fix` → `true` |
| Docker / runtime | Available if workflow uses containers | `docker ps` returns without error |
If any check fails, abort and report the blocker before proceeding.
---
## 2️⃣ Execution Contract
```bash
#!/usr/bin/env bash
set -euo pipefail
# ---- INPUTS (replace placeh
...[truncated]
Must
- Use a bounded retry budget and summarize every attempt.
- Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
- Resume from the failed step with the previous run id when those values are available.
Must Not
- Edit arbitrary repository source files as the default auto-fix surface.
- Keep retrying after the configured max attempts.
- Lose the single Ricky tracking run id across repair/resume attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)
User message
A user runs `ricky run workflows/foo.ts --no-auto-fix` and the workflow fails.
Ricky output
**Quick‑Start Debug Checklist**
1. **Collect the failure report**
```bash
ricky run workflows/foo.ts --no-auto-fix --debug > foo.debug.log 2>&1
```
The `--debug` flag prints the full execution trace, validation‑gate results, and any exception stack traces to `foo.debug.log`.
2. **Locate the primary error block**
Open the log and look for the first line that starts with `✖︎` (failed gate) or `⚠︎` (warning). Note the gate name, expected contract, and actual value.
3. **Validate the workflow contract**
Every workflow must satisfy the Typed‑Contract Specification (TCS) defined in `specs/contracts.md`. Run the contract validator directly: `ricky contract validate workflows/foo.ts`
- If the validator returns 0, the contract is sound; the failure is downstream.
- If it returns non‑zero, the log will list the exact contract mismatches (e.g., missing required output, type‑incompatible parameter, or forbidden side‑effect).
4. **Run the failing gate in isolation**
Most gates are modular and can be invoked alone: `ricky gate run <gate-id> --workflow workflows/foo.ts`
Replace `
...[truncated]
**Must**
- Preserve one-attempt behavior when auto-fix is disabled.
- Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
- Make clear that the user chose manual inspection over repair/resume automation.
**Must Not**
- Start a repair loop despite `--no-auto-fix`.
- Suppress the diagnosis because no repair was attempted.
- Present the failure as a completed repair attempt.
**Deterministic checks**
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
</details>
<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.in-process-local-runner</code> (runtime-recovery/manual)</summary>
**User message**
> Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.
**Ricky output**
```text
**Ricky’s “Run a Local TypeScript Workflow” Playbook**
*(primary local path = the root of the checked‑out workflow repository)*
| Phase | Action | Command / Artifact | Rationale |
|-------|--------|--------------------|-----------|
| **1️⃣ Prepare the environment** | • Verify Node ≥ 18 is on the PATH.<br>• Load the repo‑wide `.env.local` (if present) so that any secret‑injection contracts are satisfied. | ```bash\n# sanity check\nnode -v # >=18\nnpm -v\nsource .env.local # optional\n``` | Guarantees deterministic runtime and that required env vars are present before the workflow starts. |
| **2️⃣ Resolve dependencies** | • Install exact versions declared in `package-lock.json` (or `pnpm-lock.yaml` / `yarn.lock`).<br>• Run the *dependency‑integrity* gate to catch mismatches. | ```bash\nnpm ci # or pnpm install --frozen-lockfile\n# integrity gate (Ricky‑provided script)\n./scripts/verify‑deps.sh\n``` | `npm ci` guarantees a reproducible node_modules tree; the verification script is a required deterministic gate per the workflow standards. |
| **3️⃣ Compile / transpile** | • Use the project’s build script (usually `npm run build`).<br>• The build must emit a **single entry f
...[truncated]
Must
- Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the `agent-relay` binary on PATH.
- Precheck that Node and `@agent-relay/sdk` are resolvable for the workflow.
- Record the actual spawn command in execution evidence.
Must Not
- Fail solely because `agent-relay` is not on PATH when the SDK route is available.
- Hide the actual runtime command from evidence.
- Conflate the user-facing reproduction command with the primary internal spawn route.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)
User message
Ricky reaches a boundary after a structural failure persists after a fix attempt.
Ricky output
**Ricky – Boundary‑Escalation Workflow for Persistent Structural Failures**
---
### 1. Scope & Trigger
| Element | Detail |
|---------|--------|
| **Workflow ID** | `WF‑SF‑ESC‑001` |
| **Owner** | AgentWorkforce Reliability Team |
| **Trigger** | A structural failure (`SF‑ERR‑*`) remains after the **first‑fix attempt** (see `WF‑SF‑FIX‑001`). |
| **Boundary Condition** | Failure persists **after** the **deterministic verification gate** (see § 3) and **no further automated remediation** is possible. |
---
### 2. Preconditions
1. **Initial Fix Attempt** has been executed and logged in the **Fix‑Log** (`fix_log.yaml`) with status `applied`.
2. **Verification Gate** (see § 3) has been run and returned **FAIL**.
3. All **required artifacts** are present in the branch:
- `src/` – source code snapshot at failure point.
- `tests/` – failing test suite (`pytest -q`).
- `docs/` – updated design diagram (if applicable).
- `metadata.yaml` – workflow metadata (owner, SLA, tags).
If any artifact is missing, **blocker** the workflow and raise **Ticket #RICK‑B001**.
---
### 3. Deterministic Verification Gate (Gate A)
| Step | Command | Expected Outcome |
|------|------
...[truncated]
Must
- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.
Must Not
- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.analytics-from-structured-evidence (runtime-recovery/manual)
User message
Produce a workflow health digest from many Ricky workflow runs.
Ricky output
**Ricky Workflow Health Digest – Compiled from Recent Runs**
*(Generated 2026‑05‑09)*
---
## 1️⃣ Executive Summary
| Metric | Value (last 30 days) | Trend |
|--------|----------------------|-------|
| Total workflow executions | **1 842** | ↗ +12 % |
| Successful completions | **1 617** | ↗ +9 % |
| Failed executions | **225** | ↘ ‑4 % |
| Avg. run time (successful) | **3 m 27 s** | → steady |
| Avg. time to first failure detection | **1 m 12 s** | ↘ ‑15 % |
| Deterministic verification gate pass rate | **96.3 %** | ↗ +1.2 % |
| Blocker‑triggered runs | **18** | ↘ ‑22 % |
**Overall health:** **Green** – > 95 % gate pass, failure rate under 12 %, and blocker count decreasing.
---
## 2️⃣ Detailed Run‑Level Findings
| Run ID | Branch | PR # | Status | Gate Pass | Blocker? | Duration | Key Notes |
|--------|--------|------|--------|-----------|----------|----------|-----------|
| `wf‑20260508‑a1b2c3` | `feature/auto‑review` | 8421 | ✅ Success | ✅ | ❌ | 3 m 02 s | All 80‑100 validation loops passed. |
| `wf‑20260507‑d4e5f6` | `main` | 8399 | ❌ Failure | ✅ | ✅ (schema‑mismatch) | 1 m 45 s | Blocker: output schema drift detected; PR halted. |
| `wf‑20260506‑g7h8i9` | `bugfix/t
...[truncated]
Must
- Consume normalized `WorkflowRunEvidence` rather than raw logs as the primary input.
- Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
- Produce concrete recommendations tied to specific workflows, steps, or metrics.
Must Not
- Mutate evidence while analyzing it.
- Return generic advice like "improve your workflow" without references.
- Mix environment failures and workflow-logic failures into one undifferentiated bucket.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)
User message
Design a new Ricky Slack surface for workflow debugging.
Ricky output
**Ricky Slack Surface – “Workflow Debugger”**
---
## 1. Overview
The **Workflow Debugger** Slack surface gives engineers, product owners, and SREs a deterministic, low‑latency UI for inspecting, stepping through, and fixing Ricky‑managed workflows directly from a Slack channel or DM. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** (contract‑first, verification gates, review artifacts, 80‑100 validation loops, blocker reporting, scoped PR boundaries).
---
## 2. Surface Architecture
| Layer | Component | Responsibility |
|-------|-----------|----------------|
| **Slack App** | `ricky-debugger` (Bot User) | Receives slash‑commands & interactive components, formats responses, forwards payloads to the backend. |
| **API Gateway** | `debugger.api.ricky.io` (REST + WebSocket) | Auth‑checks, rate‑limits, translates Slack payloads to internal RPC calls. |
| **Debug Service** | `debugger-service` (Go micro‑service) | Core workflow inspection, state replay, step‑through, mutation sandbox. |
| **Workflow Engine** | Existing `ricky-engine` | Provides read‑only snapshot API (`/v1/workflows/:id/snapshot`) and mutable “sandbox” API (`/v1/debug/:runId
...[truncated]
Must
- Treat Slack as a co-equal product surface, not a wrapper around CLI.
- Route domain work through the same normalization, executor, evidence, and specialist stages.
- Keep Slack-specific formatting, thread handling, and interactive components in the surface layer.
Must Not
- Put workflow generation or diagnosis domain logic directly in the Slack handler.
- Degrade Slack to a developer shortcut with weaker routing than CLI.
- Skip signature verification, dedup, or thread handling when the surface is implemented.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.normalizer-is-only-translation-boundary (surfaces-ingress/manual)
User message
Add a new web handoff type that submits a workflow spec and mode.
Ricky output
**New Web‑Handoff Type – `workflowSubmit`**
| Element | Definition |
|---------|------------|
| **Name** | `workflowSubmit` |
| **Purpose** | Allows a web client (e.g., UI, external service) to submit a complete AgentWorkforce workflow specification together with an execution *mode* (e.g., `dryRun`, `run`, `preview`). The backend validates, stores, and optionally triggers the workflow. |
| **Location in `handoffTypes.yaml`** | Add under the top‑level `web:` section. |
| **Schema** | See **Contract** below. |
| **Supported Modes** | `dryRun` – validate only, no side‑effects.<br>`preview` – validate and generate a preview report.<br>`run` – full execution after validation. |
| **Response** | JSON payload containing `status`, `validationReport`, optional `runId`, and any `error` details. |
| **Security** | Requires the caller to present a valid `Authorization: Bearer <token>` with `workflow:submit` scope. |
---
## 1. Contract (JSON‑Schema)
```yaml
# handoffTypes.yaml (excerpt)
web:
workflowSubmit:
description: |
Submit a full workflow spec and execution mode from a web client.
request:
contentType: application/json
schema:
$id: https://agentwork
...[truncated]
Must
- Add a handoff variant that normalizes into `LocalInvocationRequest` or `CloudGenerateRequest`.
- Keep auth/session validation in the web surface before normalization.
- Ensure downstream code does not need to know the request came from web.
Must Not
- Short-circuit directly from the web handler to executors.
- Create a parallel domain model for web requests.
- Make the web surface the owner of local or Cloud routing semantics.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.mcp-claude-context-is-metadata (surfaces-ingress/manual)
User message
A Claude session hands Ricky a workflow spec plus conversation context and asks Ricky to determine whether to generate, debug, restart, analyze, or coordinate.
Ricky output
**Decision Matrix for the Claude‑to‑Ricky Hand‑off**
| Claude Input | What Ricky Should Do |
|--------------|----------------------|
| **Complete, version‑controlled workflow spec** (YAML/JSON) **+ clear intent** (e.g., “run this pipeline”) | **Generate** – create the workflow artefacts (branch, PR, CI config) and return the deterministic contract (inputs, outputs, verification gates). |
| **Spec with syntax errors, missing required fields, or failing schema validation** | **Debug** – run the schema validator, list the exact failures, and propose concrete fixes. Return a “debug report” that the user can apply. |
| **Spec that previously existed but the latest run failed** (error logs, failed verification gate) | **Restart** – trigger a fresh run on a clean branch, attach the prior run’s logs for comparison, and report success/failure after the deterministic verification gates. |
| **Spec that runs but produces unexpected business‑logic results** (e.g., wrong data transformation, performance regressions) | **Analyze** – execute the workflow in a sandbox, capture artefacts, run the 80‑100 validation loops, and produce a diagnostic report highlighting where the observed behaviour div
...[truncated]
Must
- Normalize the tool call through the MCP/Claude handoff path.
- Treat conversation context as advisory metadata.
- Make the spec itself sufficient for routing whenever possible.
Must Not
- Require the user to rewrite the spec manually as a workflow.
- Put domain routing logic inside the MCP tool handler itself.
- Ignore provided mode, workflow path, or prior decisions metadata.
Deterministic checks
- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.cloud-api-versioning (surfaces-ingress/manual)
User message
Change the Cloud Ricky generate API response to include a new field and change one existing error code name.
Ricky output
**Scope**
- Add a new field **`requestId`** (string, UUID) to the successful JSON payload returned by **`POST /v1/generate`**.
- Rename error code **`INVALID_PROMPT`** to **`PROMPT_VALIDATION_FAILED`** in the error‑response schema and all references.
All changes are limited to the **cloud‑ricky** service repository ( `services/cloud-ricky/` ). No other services are affected.
---
## 1. Contract Update (API Spec)
| File | Change | Reason |
|------|--------|--------|
| `openapi.yaml` (under `services/cloud-ricky/api/`) | • Add `requestId` property to `GenerateResponse` schema (type: `string`, format: `uuid`). <br>• Rename `INVALID_PROMPT` → `PROMPT_VALIDATION_FAILED` in `ErrorCode` enum. | Guarantees downstream clients can rely on a traceable request identifier and aligns error naming with product spec. |
| `README.md` (API section) | Update example responses and error table to reflect new field and error name. | Documentation stays in sync with the contract. |
**Version bump** – Increment the OpenAPI `info.version` from `v1.2.0` to `v1.3.0`.
---
## 2. Implementation Changes
| Path | Modification |
|------|--------------|
| `src/handlers/generate_handler.py` | • Generat
...[truncated]
Must
- Allow additive response fields within `/api/v1/ricky/...`.
- Reject renaming or changing the meaning of an existing error code inside the same version.
- Require a new API version for breaking request/response changes.
Must Not
- Remove or rename existing response fields in v1.
- Add a new required request field within v1.
- Treat version tran
Comment truncated to stay within GitHub limits. Download the ricky-eval-run artifact for the full human-review.md.
Superseded by #88, which moved from regex to mdast AST extraction. The AST approach addresses the same root cause more directly (no regex boundary maintenance, fenced-code blocks excluded by construction) and continues to pass all of #87's tests and evals plus eight additional AST-specific cases. Closing this in favor of #88.
…es-ast fix(spec-intake): extract target files via mdast AST (supersedes #87)
Summary
- `PATH_PATTERN` regex required a leading whitespace boundary, so paths inside markdown backticks (the natural convention) never matched. A spec like ``Update `packages/web/app/api/v1/workflows/run/route.ts` ...`` extracted **zero** target files. Lower-precision tokens like `base/head` from prose (`PR number, base/head SHA`) DID match and ended up as the only `targetFiles` entry, pushing the renderer into the manifest-driven "single artifact under `workflows/generated/`" branch even when the spec named real source files.
- Surfaces `target_files` in the `--json` generation output so it's verifiable from the CLI.
What changed
- Extend `PATH_PATTERN` to accept paths preceded by backticks, single quotes, double quotes, parens, or angle brackets (the typical markdown wrappers) without losing the whitespace boundary.
- Add `looksLikeRealPath` to suppress two-segment prose noise with no extension and no recognized leading directory prefix (e.g. `base/head`, `my-org/my-repo`). Paths with an extension, 3+ segments, or a known prefix (`packages/`, `src/`, `tests/`, `workflows/`, `.github/`, etc.) still pass. The existing `workflows/wave2-product` regression case remains green.
- Add `extractTargetFilesBlock` to recognize a literal `## Target Files` markdown section. When present, its bullet list takes precedence over prose extraction so spec authors can be explicit about scope. Bullets, backticks, and quotes are stripped.
- Surface `target_files` in the generation JSON output (`LocalGenerationStageResult.artifact.target_files`) so callers and deterministic evals can verify the extracted scope without parsing the rendered workflow body.
Repro / before-and-after
End-to-end against `cloud@codex/msd-shared-sandbox-review-runtime-spec`'s `docs/runtimes/specs/msd-shared-sandbox-review-runtime.md`:
Before:
`targetFiles: ["base/head"]` (single noise match; spec drove the manifest path)
After:
[...] exactly the files the spec names for editing.
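The wrapper-aware boundary and the noise filter described under "What changed" can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the regex body, the `KNOWN_PREFIXES` list, the trailing-dot cleanup, and the helper name `extractProsePaths` are all assumptions chosen to match the behavior the description states.

```typescript
// Hypothetical sketch: names mirror the PR description, bodies are assumptions.
// Boundary: start of text, whitespace, or a common markdown wrapper (` ' " ( <).
const PATH_PATTERN = /(?:^|[\s`'"(<])((?:[\w.-]+\/)+[\w.-]+)/g;

// Assumed prefix list, taken from the examples named in the description.
const KNOWN_PREFIXES = ["packages/", "src/", "tests/", "workflows/", ".github/"];

function looksLikeRealPath(candidate: string): boolean {
  const segments = candidate.split("/").filter((s) => s.length > 0);
  const last = segments[segments.length - 1] ?? "";
  if (/\.[A-Za-z0-9]+$/.test(last)) return true; // has a file extension
  if (segments.length >= 3) return true; // deep enough to be a real path
  return KNOWN_PREFIXES.some((prefix) => candidate.startsWith(prefix));
}

function extractProsePaths(text: string): string[] {
  const found = new Set<string>();
  const pattern = new RegExp(PATH_PATTERN.source, "g"); // fresh lastIndex per call
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const candidate = match[1].replace(/\.+$/, ""); // drop sentence-ending dots
    if (looksLikeRealPath(candidate)) found.add(candidate);
  }
  return Array.from(found);
}

const spec =
  "- Update `packages/web/app/api/v1/workflows/run/route.ts` to accept PR number, base/head SHA";
console.log(extractProsePaths(spec));
// → [ 'packages/web/app/api/v1/workflows/run/route.ts' ]
```

Under these assumptions the backticked path is captured and `base/head` is dropped, which matches the before/after behavior reported above.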
Test plan
- `npx vitest run src/product/spec-intake/parser.test.ts`: 35 tests pass (29 existing + 6 new covering backticked paths, paren/quote-wrapped paths, prose-noise suppression, structured-block precedence, prose fallback, and 3+ segment paths)
- `npx vitest run`: 1034/1034 green across 49 files
- `npm run typecheck`: clean
- `npm run evals`: 8/8 deterministic pass (5 existing + 3 new: `target-files-from-backticked-prose`, `target-files-from-structured-block`, `target-files-suppresses-prose-noise`); 42 needs-human (LLM cases, unchanged); 0 failed
- [...] `target_files` correctly populated in the JSON output
Out of scope
- [...] `targetFiles` is empty (the manifest path still applies for specs that don't name files).
- [...] `spec-intake` parser, so it benefits both local and cloud paths).
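For reference, the structured-block precedence listed under "What changed" could look roughly like the sketch below. It is an assumption-laden, line-based illustration only: the section-ending rule (stop at the next heading), the accepted bullet markers, and the `resolveTargetFiles` wrapper are hypothetical, and the superseding change uses an mdast AST rather than line scanning.

```typescript
// Hypothetical sketch of `## Target Files` handling; not the PR's actual code.
function extractTargetFilesBlock(markdown: string): string[] | null {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Target Files\s*$/.test(line));
  if (start === -1) return null; // no structured block in this spec

  const files: string[] = [];
  for (const line of lines.slice(start + 1)) {
    if (/^#{1,6}\s/.test(line)) break; // assumed: the next heading ends the section
    const bullet = /^\s*[-*+]\s+(.*)$/.exec(line);
    if (!bullet) continue; // skip blank lines and non-bullet prose
    // Strip surrounding backticks and quotes from the bullet text.
    const cleaned = bullet[1].trim().replace(/^[`'"]+|[`'"]+$/g, "");
    if (cleaned.length > 0) files.push(cleaned);
  }
  return files;
}

// The block wins when it yields entries; otherwise fall back to prose extraction.
function resolveTargetFiles(
  markdown: string,
  proseFallback: (md: string) => string[],
): string[] {
  const block = extractTargetFilesBlock(markdown);
  return block !== null && block.length > 0 ? block : proseFallback(markdown);
}

const example = [
  "# Spec",
  "",
  "## Target Files",
  "- `src/a.ts`",
  '- "src/b.ts"',
  "",
  "## Notes",
  "- see base/head",
].join("\n");
console.log(resolveTargetFiles(example, () => []));
// → [ 'src/a.ts', 'src/b.ts' ]
```

Passing the prose extractor in as a callback keeps the precedence rule separate from whichever prose strategy (regex or AST) sits behind it.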