Skip to content

Fix generated workflow worktree and file gate validation#127

Merged
khaliqgant merged 4 commits into
mainfrom
fix/generated-workflow-worktree-gates
May 18, 2026
Merged

Fix generated workflow worktree and file gate validation#127
khaliqgant merged 4 commits into
mainfrom
fix/generated-workflow-worktree-gates

Conversation

@kjgbot

@kjgbot kjgbot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • keep spec-intent mismatches in the pre-write validation path so the writer can repair invalid artifacts instead of falling straight back
  • extend AST-based workflow analysis to inspect real .step(...) command literals for declared Worktree/Target branch handling
  • reject generated workflows that use test -f on declared worktree directories, directory-looking paths, or glob paths
  • update the agent-relay workflow persona prompts with the worktree setup and file-gate rules

Context

This addresses the next Ricky failures seen while shipping the cloud MCP spawn specs:

  • mixed agent + GitHub primitive workflows must not use GitHubStepExecutor as the global .run executor (existing sanitizer is preserved)
  • specs declaring Worktree: /private/tmp/... need a deterministic git worktree add setup step before implementation/test/git commands
  • deterministic gates must not use test -f for worktree roots, API/package directories, or globs

Tests

  • npm test -- src/product/generation/workforce-persona-writer.test.ts src/product/generation/pipeline.test.ts
  • npm run typecheck (fails on pre-existing src/scheduled-agent.ts errors: missing @agent-relay/agent types and implicit any ctx/event)

@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: b445664a-05f4-40fa-bd51-aa354f2d2487

📥 Commits

Reviewing files that changed from the base of the PR and between a447758 and 6b85bc5.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

📝 Walkthrough

Walkthrough

This PR enforces Worktree setup and bans unsafe test -f gates: it updates persona prompts, adds AST-based spec-intent mismatch detection for worktree/branch usage and invalid test -f checks, integrates detection into the writer and pipeline validation, and expands tests accordingly.

Changes

Worktree Contract Enforcement

Layer / File(s) Summary
Prompt policy updates for Worktree and test-f rules
personas/agent-relay-workflow.json
System prompts for best, best-value, and minimum tiers are updated to instruct Worktree creation via git worktree add and ban test -f existence checks on directories, worktrees, and glob paths.
Pipeline validation integration for spec-intent checks
src/product/generation/pipeline.ts
Adds validateWorkforcePersonaArtifact and routes all Workforce persona validation (initial write, prewrite repairs, review fix) through it; refactors validation issue accumulation to merge errors/warnings and recompute validity based on blocking issues.
Spec-intent mismatch detection engine for Worktree contracts
src/product/generation/workforce-persona-writer.ts
detectSpecIntentMismatch now accepts SpecIntentLike, derives intent text, extracts stepCommands from AST, and validates declared worktree/branch presence, runtime git worktree add setup ordering, and invalid test -f gates (globs and directory-looking paths).
Writer output diagnostics and policy context
src/product/generation/workforce-persona-writer.ts
Writer runs a first-emit spec-intent diagnosis (debug dump on mismatches), appends per-mismatch warnings to metadata, and embeds deterministic Worktree and tightened test -f guidance in the injected writer policy.
Test coverage for Worktree mismatch detection
src/product/generation/workforce-persona-writer.test.ts
Tests added/expanded to cover step-count floor regression and many Worktree contract scenarios: missing setup, setup only in comments/heredocs, use-before-setup, unrelated git commands, proper setup acceptance (with test -d), and test -f gating violations.

Sequence Diagram

sequenceDiagram
  participant Client
  participant Writer
  participant Pipeline
  participant detectSpecIntentMismatch
  participant Analyzer
  Client->>Writer: request workflow generation (persona)
  Writer->>Pipeline: emit artifact
  Pipeline->>detectSpecIntentMismatch: artifact.content, spec
  detectSpecIntentMismatch->>Analyzer: analyze AST, extract stepCommands
  Analyzer-->>detectSpecIntentMismatch: facts
  detectSpecIntentMismatch-->>Pipeline: mismatch messages
  Pipeline-->>Writer: combined validation result + warnings
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • AgentWorkforce/cloud#732: Addresses the same Worktree handling and test -f gate semantics constraints for workflow generation.

Possibly related PRs

Poem

🐰 I hop through specs with nimble feet,

planting worktrees, tidy and neat.
No test -f where directories hide,
git worktree add sets the path wide.
Mismatches caught — the workflow's complete!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.26% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: fixing validation for generated workflows regarding worktree setup and file gate handling, which is the core focus across all modified files.
Description check ✅ Passed The description is directly related to the changeset, providing context about spec-intent mismatches, AST analysis improvements, worktree validation, and test-f rejection rules that align with the actual code changes.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/generated-workflow-worktree-gates

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 1475-1488: The current checks inspect raw workflowContent and a
flattened commandText which can be fooled by comments/heredocs or commands
appearing after implementation; instead drive worktree/branch validation from
executable steps order: parse and iterate facts.stepCommands (the ordered array
of step objects) and verify a git worktree add step (use
hasGitWorktreeAddCommand on the actual step.command values or better, implement
an AST/tokenized check per step to ignore comments/heredocs) appears before any
implementation/test step that uses the declaredWorktree.path or
declaredWorktree.branch; update the mismatches logic in the block around
declaredWorktree and the analogous checks at the other locations (lines
referenced 1878-1880) to use this ordered-step, AST-safe approach rather than
substring matching on workflowContent/commandText.
- Around line 1882-1898: The function findInvalidFileExistenceGateCommands
currently only rejects globs, exact worktree roots, or paths ending with '/',
allowing directory-like paths such as "packages/web/app/api/v1/slack" to slip
through; update the checks to also treat paths that look like directories even
without a trailing slash by adding a directory-detection condition (e.g., check
the last path segment via path.basename and reject if it contains no
extension/dot), using the same unquotedPath/normalizedPath values; modify the
branch ordering in findInvalidFileExistenceGateCommands (alongside
extractTestFilePaths, unquoteShellToken, stripTrailingSlashes,
containsShellGlob) to push an invalid message for directory-looking tokens and
include the step.stepName and testedPath in the message.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a12ddf9-7c35-4336-92ce-f1ce00e0e88d

📥 Commits

Reviewing files that changed from the base of the PR and between abf8903 and 1786524.

📒 Files selected for processing (4)
  • personas/agent-relay-workflow.json
  • src/product/generation/pipeline.ts
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

Comment thread src/product/generation/workforce-persona-writer.ts Outdated
Comment thread src/product/generation/workforce-persona-writer.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files

You’re at about 96% of the monthly reviewed-line limit. You may want to disable incremental reviews to conserve quota. Reviews will continue until that limit is exceeded. If you need help avoiding interruptions, please contact contact@cubic.dev.

Reply with feedback, questions, or to request a fix.

Fix all with cubic | Re-trigger cubic

Comment thread src/product/generation/workforce-persona-writer.ts Outdated

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 6 additional findings in Devin Review.

Open in Devin Review

Comment on lines 1470 to 1471
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Step-count floor threshold inflated by combined spec text instead of description-only length

The detectSpecIntentMismatch function's step-count floor check (description.length > 4_000 at line 1464) was calibrated for spec.description alone (the docstring at line 1410 explicitly says "spec.description length > 4 KB"). After this PR, the local variable description is set to specIntentText(spec) (line 1441), which concatenates spec.description, spec.targetContext, spec.desiredAction.summary, and all constraints, acceptanceGates, and evidenceRequirements texts. In production, a NormalizedWorkflowSpec could have a 3KB description but 1.5KB of structured constraints/gates/evidence, pushing the combined text over 4KB and triggering the step-count floor. This incorrectly flags workflows with <4 steps as "master-executor stubs" for specs whose description is well under the intended 4KB threshold. The error message also reports the combined text size ("spec is X KB"), further misrepresenting the actual description size.

(Refers to lines 1464-1471)

Prompt for agents
The step-count floor check at line 1464 of workforce-persona-writer.ts uses `description.length > 4_000` where `description` is now the combined spec text from `specIntentText(spec)` (which aggregates description + targetContext + constraints + gates + evidence). The threshold was calibrated for spec.description alone (per the docstring at lines 1409-1411).

The fix should either:
1. Use `spec.description?.length ?? 0` specifically for the step-count floor check (keeping the combined text for other checks like PR-shipping and worktree), or
2. Raise the threshold proportionally (e.g., 6-8KB) to account for the larger combined text, or
3. Update the docstring at line 1410 to reflect the new semantics if the broader threshold is intentional.

Option 1 is the safest: store `spec.description` length separately and use it for the step-count floor, while keeping `specIntentText(spec)` for the intent-matching regex checks where the broader view is beneficial.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Ricky Eval Review

Run: .ricky/evals/runs/2026-05-18T19-23-57-589Z-HEAD-provider
Mode: provider
Git SHA: 168b5b2

Passed: 11 | Needs human: 44 | Reviewable: 44 | Missing output: 0 | Failed: 0 | Skipped: 0

Human Review Cases

These cases passed deterministic checks and include captured Ricky output for a human verdict against their Must / Must Not rubric.

REVIEW agent-assistant-boundary.real-reuse-not-rhetorical (agent-assistant-boundary/manual)

User message

Update Ricky docs and code to say it uses Agent Assistant more deeply.

Ricky output

**Scope of Change**  
- **Docs**: Update the README, `AGENTS.md`, and the inline module doc‑strings to state that *Ricky* now delegates core coordination tasks to the **Agent Assistant** library (e.g., `agent_assistant.core.Coordinator`).  
- **Code**: Replace the legacy internal coordination stub (`ricky/coordination.py`) with calls to `AgentAssistantCoordinator` and expose a thin wrapper (`RickyCoordinator`) that satisfies the existing public API.  
- **Tests**: Add unit‑tests that verify the wrapper forwards calls correctly and that the new deterministic verification gate (`verify_agent_assistant_ready`) runs during initialization.  

---

## 1. Workflow Contract (RFC‑0012)

| Item | Description |
|------|-------------|
| **Title** | Deep integration of Agent Assistant into Ricky |
| **Owner** | `@ricky-maintainer` |
| **Stakeholders** | Docs team, CI/CD pipeline, downstream agents that import `ricky.Coordinator` |
| **Start Date** | 2026‑05‑18 |
| **Target Merge** | 2026‑05‑25 (after 2 review cycles) |
| **Deliverables** | - Updated markdown files (`README.md`, `AGENTS.md`, `docs/architecture.md`) <br> - New module `ricky/agent_assistant_wrapper.py` <br> - Updated `setup.cfg` e
...[truncated]

Must

  • Ground claims in real package imports and runtime paths.
  • Distinguish current implementation from target architecture.
  • Identify which Agent Assistant primitive is actually exercised.

Must Not

  • Rename local code to sound Agent Assistant aligned and count that as adoption.
  • Claim broad Agent Assistant native behavior from documentation-only alignment.
  • Blur target architecture with landed behavior.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.turn-context-preserves-ricky-envelope (agent-assistant-boundary/manual)

User message

Evaluate the current Ricky @agent-assistant/turn-context adoption.

Ricky output

**Ricky @agent‑assistant/turn‑context Adoption Review**  
*Prepared 2026‑05‑18 – 1 page summary (≈ 350 words)*  

---

## 1. Scope of Review
| Dimension | What was examined | Data sources |
|-----------|-------------------|--------------|
| **Repository coverage** | All active AgentWorkforce repos that declare `@agent-assistant/turn-context` in `package.json` or `requirements.txt`. | GitHub API (last 90 days), internal dependency graph. |
| **Runtime usage** | Calls to `turnContext.getTurnState()`, `turnContext.sendActivity()`, and custom middleware that reads/writes turn‑state. | Telemetry from the **Ricky‑Telemetry** service (event logs, error rates). |
| **Compliance** | Presence of required contract checks (deterministic verification gates) and review artifacts (design doc, test plan). | CI/CD pipelines, PR templates, CODEOWNERS. |
| **Team health** | Survey of developers on clarity of the contract and blocker frequency. | Quarterly internal survey (N=27). |

---

## 2. Quantitative Findings

| Metric | Target (per spec) | Current | Δ | Interpretation |
|--------|-------------------|---------|---|----------------|
| **Repo adoption** | 100 % of new repos (post‑2024‑Q3) | 84 % (
...[truncated]

Must

  • Preserve request id, source metadata, structured spec, invocation root, mode, stage mode, spec path, metadata, and spec text.
  • Record compact provenance through generation decisions or coordinator metadata.
  • Keep the shared turn context internal to the adapter boundary.

Must Not

  • Move LocalResponse, blocker taxonomy, recovery wording, or execution semantics into the shared turn-context package.
  • Drop Ricky-specific workflow metadata during envelope assembly.
  • Treat turn context as a product decision engine.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.product-core-stays-ricky-owned (agent-assistant-boundary/manual)

User message

Decide whether workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording should move into Agent Assistant.

Ricky output

**Recommendation:** Move **all** of the listed capabilities—workflow generation, validation, debugging, staged CLI UX, and blocker/evidence wording—into the **Agent Assistant** component.

---

## Rationale (aligned with AGENTS.md and workflow standards)

| Capability | Why it belongs in Agent Assistant | Impact on reliability & coordination |
|------------|-----------------------------------|---------------------------------------|
| **Workflow Generation** | Agent Assistant already owns the *authoring* contract (input spec → workflow definition). Centralizing generation avoids duplicated template logic across agents. | Guarantees a single source‑of‑truth for workflow shape; deterministic contracts can be version‑controlled and unit‑tested. |
| **Workflow Validation** | Validation is the deterministic gate that follows generation. Embedding it in the same service enables immediate feedback loops (80‑100 validation cycles) without cross‑service latency. | Early detection of schema violations, missing inputs, or circular dependencies; reduces PR churn. |
| **Debugging (runtime introspection, step‑trace, state replay)** | Debug utilities need access to the exact workflow artefacts pr
...[truncated]

Must

  • Keep product-defining workflow generation, validation, debugging, local UX, and evidence wording Ricky-owned until proof says otherwise.
  • Reuse shared runtime primitives where they reduce duplication without weakening Ricky.
  • Make extraction follow typed, tested, live product proof.

Must Not

  • Generalize workflow-specific behavior prematurely.
  • Adopt moving shared seams merely for architectural purity.
  • Lose the precise local-first staged workflow UX.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.one-slice-at-a-time (agent-assistant-boundary/manual)

User message

Plan the next Agent Assistant adoption slice for Ricky.

Ricky output

**Agent Assistant Adoption Slice – “Ricky v2.0 Roll‑out”**  
*Target: Core Customer‑Success (CS) teams, 3 pilot squads (≈45 users)*  

---

## 1. Slice Objectives
| Goal | Success Metric | Acceptance Threshold |
|------|----------------|----------------------|
| **Reliability** – Deploy the new workflow engine with < 1 % crash‑rate per 1 k executions. | Crash‑rate (Sentry) | ≤ 0.01 % |
| **Coordination** – Reduce hand‑off latency between “Ticket‑Triage” and “Resolution‑Playbook” by 30 %. | Avg. hand‑off time (ms) | ≤ 700 ms |
| **Authoring** – Enable CS power‑users to create a “FAQ‑Update” workflow without dev assistance. | # of self‑served workflows created | ≥ 3 (within 2 weeks) |
| **Feedback Loop** – Capture ≥ 80 % of pilot‑user sentiment in the “Adoption Survey”. | Survey response rate | ≥ 80 % |

---

## 2. Scope & Boundaries
| Boundary | Description |
|----------|-------------|
| **Branch** | `feature/ricky-adoption-slice-v2` (fork from `main`) |
| **PR** | `pr-1234: Adopt slice – add contracts, tests, docs` |
| **Modules Affected** | `workflow_engine/`, `coordination/`, `authoring_ui/`, `metrics/` |
| **Excluded** | Legacy “Email‑Ingestion” pipeline (will be migrated in a l
...[truncated]

Must

  • Pick exactly one real shared seam to evaluate or adopt.
  • Define a live Ricky product path that will prove the adoption.
  • Include regression checks that product messaging, blocker output, and evidence remain truthful.

Must Not

  • Bundle sessions, memory, policy, proactive behavior, and execution extraction into one vague migration.
  • Skip the comparison/evaluation step for mature Ricky-local seams.
  • Treat adoption as successful without a live product-path proof.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW agent-assistant-boundary.future-surfaces-use-shared-runtime (agent-assistant-boundary/manual)

User message

Design future Slack or web support for Ricky using Agent Assistant packages.

Ricky output

## Ricky – Future Support Layer (Slack / Web)

### 1. High‑level Goal
Provide a **self‑service, deterministic support channel** for Ricky users (agents, reviewers, and ops) that lives in Slack and a lightweight web UI. The layer must:

| Requirement | Rationale |
|-------------|-----------|
| **Contract‑first API** | Guarantees stable request/response shapes for downstream agents. |
| **Deterministic verification gates** | Prevents “flaky” assistance and ensures compliance with the **AgentWorkforce reliability SLA (99.9 % success, ≤ 200 ms latency)**. |
| **Review artifacts & audit trail** | Every support interaction is versioned, signed, and stored for compliance and continuous‑improvement loops. |
| **80‑100 validation loops** | Automated regression suite that runs on every PR to catch regressions early. |
| **Blocker reporting** | Immediate escalation to the “Ricky‑Ops” channel when a gate fails. |
| **Scoped branch/PR boundaries** | Each feature (Slack, Web, or shared core) lives in its own repo branch with a dedicated PR template. |

---

## 2. Architecture Overview

+-------------------+ +-------------------+ +-------------------+
| Slack Bot (Bolt) |
...[truncated]


**Must**

- Prefer shared surfaces, webhook-runtime, sessions, and routing primitives for future non-CLI interaction where mature.
- Keep local CLI behavior product-local unless shared adoption is proven harmless.
- Explain which behavior is future/target architecture versus implemented today.

**Must Not**

- Preemptively add memory, policy, or proactive packages without a real Ricky product requirement.
- Let future surface abstractions distort the current CLI contract.
- Duplicate a mature Agent Assistant capability locally without justification.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.generation-default-not-execution</code> (cli-behavior/manual)</summary>

**User message**

> A user runs `ricky --mode local --spec "generate a workflow for package checks"` without `--run`.

**Ricky output**

```text
**What happens when you run**

```bash
ricky --mode local --spec "generate a workflow for package checks"

without the --run flag?

Step Description Outcome
1️⃣ Parse arguments – Ricky reads --mode local and the free‑form --spec string. The spec is stored as the desired workflow contract but no execution is triggered.
2️⃣ Spec‑only mode – In the absence of --run, Ricky enters validation‑only mode. It treats the request as a design‑time operation: generate the workflow definition, run static checks, and report the contract.
3️⃣ Workflow generation – Ricky invokes the Workflow Generator component with the supplied spec. A YAML/JSON workflow file (e.g., package-checks.yml) is produced in the temporary workspace.
4️⃣ Deterministic verification gates – The generated workflow passes through the built‑in verification pipeline:
Schema validation (AGENTS schema v2)
Dependency graph sanity (no cycles, all steps reachable)
Contract compliance (inputs/outputs match the spec)
If any gate fails, Ricky prints a concise error block and exits with a no
...[truncated]

**Must**

- Say generation is the default and execution was not requested.
- Print the generated artifact path, workflow id, spec digest, and next run command.
- Avoid showing execution evidence for a generation-only request.

**Must Not**

- Imply the workflow ran automatically.
- Present a generation-only result as execution success.
- Hide the opt-in commands for running the artifact.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>cli.first-run-copy-is-compact-and-truthful</code> (cli-behavior/manual)</summary>

**User message**

> Render Ricky's first-run CLI onboarding for a new user.

**Ricky output**

```text
**Ricky – First‑Run CLI Onboarding**

Below is the complete, ready‑to‑run onboarding workflow for a brand‑new user of the Ricky CLI. It follows the **Ricky repository conventions** (AGENTS.md), the **workflow standards** (deterministic gates, review artifacts, 80‑100 validation loops), and respects **branch/PR boundaries**.

---

## 1️⃣ Repository Layout (for reference)

ricky/
├─ .github/
│ └─ workflows/
│ └─ onboarding.yml # CI gate for first‑run onboarding
├─ docs/
│ └─ onboarding.md # Human‑readable guide (this file)
├─ src/
│ └─ cli/
│ └─ main.ts # Entry point
├─ tests/
│ └─ onboarding.test.ts # 100‑loop validation suite
├─ .ricky/
│ └─ config.yaml # User‑specific config (generated)
├─ package.json
└─ README.md


---

## 2️⃣ Onboarding Contract (YAML)

```yaml
# .ricky/onboarding_contract.yaml
version: 1.0
name: Ricky First‑Run CLI Onboarding
description: |
  Bootstraps a new user’s environment, validates toolchain,
  generates a scoped config, and registers the user with the
  central coordination service.
steps:
  - id: check_prereqs
    name: Verify System Prerequisites
    comm
...[truncated]

Must

  • Show compact Ricky branding and clear Local / BYOH, Cloud, Both, and Just explore choices.
  • End every branch with a concrete next step.
  • Advertise only commands that are currently implemented.

Must Not

  • Sound like a launch page or documentation dump.
  • Claim Ricky runs workflows by default when generation is the default path.
  • Require web or Slack onboarding before CLI use.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.recovery-guidance-no-stack-traces (cli-behavior/manual)

User message

A user gives Ricky an empty spec or a missing spec file.

Ricky output

**Ricky’s Standard Response to an Empty or Missing Spec File**

---

### 1. Immediate Detection & Blocker Report
| Step | Action | Output |
|------|--------|--------|
| **1.1** | Verify the presence of the spec file at the path declared in the workflow manifest. | If `fs.existsSync(path)` → *continue*; else → **BLOCKER**: “Spec file `{{path}}` not found.” |
| **1.2** | If the file exists, read its contents and check for non‑whitespace characters. | If `content.trim().length === 0` → **BLOCKER**: “Spec file `{{path}}` is empty.” |
| **1.3** | Emit a **GitHub Issue** (or Jira ticket) with the blocker details, tagging the spec owner and the workflow lead. | Issue title: *“[BLOCKER] Missing/Empty Spec for {{workflow_name}}”* |

> **Rule:** No downstream steps (code generation, CI, or deployment) may run while a blocker is open.

---

### 2. Automated Placeholder Generation (Optional “Scaffold” Mode)

If the spec owner authorizes a scaffold, Ricky can create a minimal, syntactically‑valid placeholder that satisfies the contract checks and allows the pipeline to continue to the *validation‑loop* stage.

| File | Minimal Content |
|------|-----------------|
| `{{spec_path}}` | ```yaml\n# 
...[truncated]

Must

  • Return a user-facing failure or guidance message with a real recovery command.
  • Distinguish generation failure from execution failure.
  • Show stack traces only when verbose diagnostic mode is requested.

Must Not

  • Crash with an uncaught exception in normal mode.
  • Suggest commands that do not exist.
  • Pretend a missing spec was accepted.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW cli.status-does-not-invent-provider-state (cli-behavior/manual)

User message

Render ricky status when no provider checks have proven Google or GitHub are connected.

Ricky output

**`ricky status` output (no provider checks have proven Google or GitHub are connected)**  

=== AgentWorkforce Status Report ===

🟢 Core Services
• Scheduler : RUNNING
• Workflow Engine : RUNNING
• Artifact Store : RUNNING
• Logging & Metrics : RUNNING

🔴 Provider Connections
• Google (OAuth) : NOT CONNECTED ← no successful health‑check
• GitHub (API Token) : NOT CONNECTED ← no successful health‑check

⚙️ Configuration
• Active Workflows : 12
• Pending Deploys : 0
• Last Sync : 2026‑05‑18 09:12 UTC

🟡 Alerts & Recommendations
• Provider “Google” is unreachable or credentials are missing.
• Provider “GitHub” is unreachable or credentials are missing.
• Run ricky provider connect google or ricky provider connect github to re‑authenticate.
• Verify network access and token scopes, then re‑run ricky healthcheck.

🟣 Next Steps

  1. Confirm OAuth client/secret for Google and PAT for GitHub are present in the secret store.
  2. Execute ricky provider test google and ricky provider test github to validate connectivity.
  3. Once both return “CONNECTED”, re‑run ricky status to see green indic
    ...[truncated]

**Must**

- Report unknown or not-connected provider state honestly.
- Update provider status only from explicit provider checks or Cloud status results.
- Give concrete setup guidance for Cloud when relevant.

**Must Not**

- Mark Google or GitHub connected because guidance text was shown.
- Invent a provider connection URL or OAuth flow.
- Show empty fields with no recovery guidance when config is missing.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.skill-matcher-registry-backed</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for a GitHub primitive change and explain which skills Ricky selected.

**Ricky output**

```text
**Workflow: “GitHub Primitive Change”**

| Step | Owner | Action | Input | Output | Verification Gate |
|------|-------|--------|-------|--------|--------------------|
| 1️⃣ | **Ricky (Coordination)** | **Create a scoped branch** | `main` (latest stable) | `feature/<short‑desc>` branch | `git checkout -b` succeeds; branch name matches regex `^feature\/[a-z0-9-]+$` |
| 2️⃣ | **Ricky (Branch‑Policy)** | **Sync with upstream** | `feature/<short‑desc>` | Updated branch with latest `main` | `git merge --ff-only origin/main` passes; no conflicts |
| 3️⃣ | **Developer** | **Implement primitive change** (e.g., rename a variable, add a comment, bump a version) | Code editor, test harness | Modified files | Local lint & format (`npm run lint` / `black -q .`) returns 0 |
| 4️⃣ | **Ricky (Deterministic Verification)** | **Run unit‑test suite** | Modified repo | Test results | All tests pass (`pytest -q` or `npm test` returns 0) |
| 5️⃣ | **Ricky (Pre‑PR Review Bot)** | **Static analysis & security scan** | Code diff | SARIF report | No new high‑severity findings; CI status = `passed` |
| 6️⃣ | **Developer** | **Open Pull Request** | `feature/<short‑desc>` → `main` | PR #\<n\> | PR title follow
...[truncated]

Must

  • Select skills from the actual registry rather than a hardcoded static set.
  • Record ranked skill matches with confidence and trigger evidence.
  • Fall back gracefully when the registry is missing or no skill clears the threshold.

Must Not

  • Claim runtime agents embody skills when only generation-time guidance was applied.
  • Fail generation solely because optional skill files are missing.
  • Hide skill selection evidence from artifacts or JSON output.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.tool-selector-honors-spec-hints (generation-quality/manual)

User message

Generate a workflow where the spec says "use Claude to review and Codex to implement".

Ricky output

**Workflow: Claude‑Review → Codex‑Implementation**

| Phase | Owner | Action | Input | Output | Success Gate |
|-------|-------|--------|-------|--------|--------------|
| **1. Requirement Ingestion** | *Product Owner* | Submit a **Feature Ticket** (Jira/Asana) with clear acceptance criteria. | Ticket ID, user story, non‑functional constraints. | Ticket status = *Ready for Review*. |
| **2. Claude Review** | *Claude (LLM‑reviewer)* | Run `claude-review` job against the ticket description. | • Structured Review Report (JSON) <br>• Identified ambiguities, edge‑cases, security concerns. | Report contains `reviewStatus: "approved"` **or** `reviewStatus: "needsClarification"` with explicit comments. |
| **3. Clarification Loop** *(if needed)* | *Product Owner* | Address every `needsClarification` comment, update ticket. | Updated ticket. | Ticket status = *Ready for Review* → re‑run Phase 2. |
| **4. Codex Implementation** | *Codex (LLM‑coder)* | Execute `codex-implement` using the **approved** Review Report as prompt. | Review Report, ticket metadata. | Generated code package (repo branch) + `implementationManifest.json`. |
| **5. Deterministic Verification Gate** | *CI System* | Run `
...[truncated]

Must

  • Assign per-step runner/model decisions from explicit spec hints where possible.
  • Let skill metadata or project defaults fill gaps when the spec is silent.
  • Write tool-selection.json or equivalent audit metadata.

Must Not

  • Use one generic runtime default for every agent despite explicit hints.
  • Let skill defaults override explicit user/spec runner hints.
  • Omit the reason each tool or model was chosen.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.refine-is-opt-in-and-bounded (generation-quality/manual)

User message

Use --refine to sharpen a generated workflow's step task descriptions and acceptance gates.

Ricky output

Below is a compact “how‑to” for using the **`--refine`** option when you want to tighten a workflow’s step‑level task descriptions and acceptance gates. The pattern follows the **Ricky** repository conventions (AGENTS.md, workflow standards, shared authoring rules, and product specs).

---

## 1️⃣ When to invoke `--refine`

| Situation | Why you need refinement |
|-----------|------------------------|
| **Vague task wording** – e.g. “process data” | Makes it hard for downstream agents to know exact inputs/outputs. |
| **Missing deterministic gate** – e.g. “check success” without a concrete predicate | Prevents automated verification and can cause silent failures. |
| **Over‑broad acceptance criteria** – e.g. “quality is good” | Leads to flaky runs and noisy metrics. |
| **Branch/PR scope drift** – step changes spill into unrelated modules | Breaks the scoped‑branch rule and makes review noisy. |

If any of the above apply, run the workflow generator with `--refine`.

---

## 2️⃣ Command syntax

```bash
awf generate \
  --template path/to/template.yaml \
  --output path/to/workflow.yaml \
  --refine \
  --target-step <STEP_ID>   # optional, defaults to all steps
  • --refine te
    ...[truncated]

**Must**

- Keep the deterministic unrefined path as the default.
- Bound the refinement pass by timeout, token budget, and editable regions.
- Re-run validation after refinement and fall back to the deterministic artifact on unsafe edits or provider failure.

**Must Not**

- Change the workflow graph, agent assignments, or side-effect scope during refinement.
- Fail the whole generation if optional refinement times out.
- Hide warnings when the deterministic artifact is returned unchanged.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>generation-quality.behavior-grounded-gates</code> (generation-quality/manual)</summary>

**User message**

> Generate a workflow for the `ricky --version` spec.

**Ricky output**

```text
**Workflow: `ricky --version` Specification & Delivery**  
*(Ricky AgentWorkforce – v0.9.0)*  

---  

### 1. Goal  
Produce a deterministic, test‑driven implementation of the `ricky --version` CLI flag that:

* Returns the current Ricky package version (semantic‑version string).  
* Exits with status 0 on success, non‑zero on unexpected failure.  
* Emits the version to **stdout** only (no extra whitespace).  
* Is covered by unit‑ and integration‑tests, CI verification gates, and a documented release artifact.  

---  

### 2. Scope & Boundaries  

| Item | Inclusion | Exclusion |
|------|-----------|-----------|
| **Code** | `src/cli.ts` (or `src/main.ts`), version constant, flag parser, error handling. | Any unrelated sub‑commands, UI components, or network calls. |
| **Docs** | `README.md` usage snippet, `docs/cli.md` entry, changelog entry. | Full user guide rewrite. |
| **Tests** | Unit test for flag parsing, integration test invoking the built binary. | Load‑testing, performance benchmarks. |
| **Release** | Bump version in `package.json`, generate `ricky-vX.Y.Z.tar.gz`. | Publishing to npm registry (handled by separate release pipeline). |

---  

### 3. Contract (Determin
...[truncated]

Must

  • Build gates from the stated acceptance behavior, such as checking ricky --version output.
  • Avoid generic source-shape grep checks when the spec asks for CLI behavior.
  • Keep generated validation meaningful for the current repo shape.

Must Not

  • Treat grep -Eq 'export|function|class|workflow(' dist/ricky.js as proof of version behavior.
  • Claim the workflow is proven by source syntax alone.
  • Ignore the package-json version resolution order in the spec.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.pattern-selection-deliberate (generation-quality/manual)

User message

Generate a workflow for many independent artifacts with a validation/fix/rerun loop.

Ricky output

**Workflow: “Batch‑Artifact‑Validate‑Fix‑Rerun”**  
*Purpose*: Process a large set of independent artifacts (e.g., data files, code modules, model checkpoints) through a deterministic validation → optional fix → rerun loop until each artifact reaches a *pass* state or is marked a *hard blocker*.

---

## 1. Scope & Branching

| Scope | Branch | PR Target |
|-------|--------|-----------|
| **Artifact ingestion & metadata** | `feat/artifact‑ingest` | `main` |
| **Validation logic** | `feat/validation‑engine` | `main` |
| **Auto‑fix scripts** | `feat/auto‑fix‑suite` | `main` |
| **Orchestration (this workflow)** | `wf/batch‑validate‑fix‑rerun` | `main` |

All changes must be isolated to the above branches; PRs must include unit‑test coverage ≥ 90 % for new code.

---

## 2. Contract (YAML)

```yaml
name: batch-validate-fix-rerun
description: >
  Executes a deterministic validation/fix/rerun loop over a configurable
  collection of independent artifacts. Stops when every artifact is either
  PASS or BLOCKED.

inputs:
  artifacts:
    type: list[string]          # URIs or IDs of artifacts to process
    description: List of artifact identifiers
  max_iterations:
    type: integer
    de
...[truncated]

Must

  • Choose dag, supervisor, or pipeline deliberately based on the work shape.
  • Explain the pattern choice in artifact metadata or a rationale.
  • Use dag for validation/fix/rerun loops when dependencies matter.

Must Not

  • Default blindly to dag for every workflow.
  • Collapse independent artifact work into one vague agent task.
  • Omit verification gates because the chosen pattern seems obvious.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.skill-boundary-copy (generation-quality/manual)

User message

Describe how selected workflow-writing skills affected a generated Ricky workflow.

Ricky output

### How Selected Workflow‑Writing Skills Shape a Generated Ricky Workflow  

| Skill | What the Skill Does | Direct Effect on the Generated Workflow | Example of the Resulting Artifact |
|-------|--------------------|----------------------------------------|-----------------------------------|
| **Deterministic Contract Definition** | Forces the author to write explicit input‑output schemas (JSON‑Schema, OpenAPI) for every step. | • Every node has a machine‑readable contract that the engine validates before execution.<br>• Guarantees that downstream steps receive exactly the shape they expect, eliminating “missing‑field” runtime errors. | `contract: { type: "object", required: ["userId"], properties: { userId: { type: "string" } } }` |
| **Verification Gate Insertion** | Inserts pre‑ and post‑condition checks (e.g., `assert`, `validate`) as separate steps. | • Early detection of invalid data (pre‑gate) and confirmation that side‑effects succeeded (post‑gate).<br>• Reduces silent failures and makes logs deterministic. | `step: validateUserExists → assert(user != null)` |
| **Review Artifact Generation** | Auto‑creates a `REVIEW.md` that lists all public interfaces, risk points, and 
...[truncated]

Must

  • Describe skills as generation-time selection, loading, and rendering inputs.
  • Point to metadata such as loaded-skills.txt and skill-application-boundary.json.
  • Say the workflow instructions were informed by selected skills.

Must Not

  • Say runtime agents are skill-embedded, skill-powered at runtime, or embody skills unless runtime skill loading is implemented and tested.
  • Treat metadata existence as proof that agents received skill bodies at runtime.
  • Overstate current tests beyond the generation boundary.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW generation-quality.no-pure-codegen-without-proof (generation-quality/manual)

User message

Generate a workflow from a vague product spec and return it to the user.

Ricky output

**Ricky – Workflow Generation Assistant**

To turn a product specification into a reliable, review‑ready workflow we need a concrete set of inputs. The current description is too vague to produce a deterministic contract, verification gates, or review artifacts.

**Please provide the following details:**

| Category | What we need from you |
|----------|-----------------------|
| **Goal & Scope** | One‑sentence description of the product’s primary function and the boundaries of the workflow (e.g., “process inbound customer support tickets from email to resolution”). |
| **Actors & Roles** | List of human or system actors (e.g., Customer, Support Agent, Ticketing System) and any role‑specific permissions. |
| **Key Steps / Stages** | Ordered list of high‑level steps (e.g., “Ingest email → Classify → Assign → Resolve → Close”). |
| **Inputs & Outputs** | For each step, the required input artifacts (files, API payloads, DB rows) and the expected output artifacts. |
| **Triggers & Events** | What initiates the workflow (cron, webhook, UI action) and any intermediate events that cause transitions. |
| **Success & Failure Criteria** | Deterministic conditions for step success, retry poli
...[truncated]

Must

  • Produce a Relay-native TypeScript workflow with explicit verification, review, and signoff.
  • Validate with dry-run or targeted structural checks where possible.
  • Return artifacts, warnings, and follow-up commands honestly.

Must Not

  • Act like Ricky is a pure code-generation bot that emits workflows without verification.
  • Stop at "code compiles" as the proof bar.
  • Skip skill-aware workflow authoring guidance for serious workflows.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.classify-before-retry (runtime-recovery/manual)

User message

A workflow failed after one step timed out and another worker stayed alive without producing artifacts. Explain what Ricky should do next.

Ricky output

**Ricky’s post‑failure protocol**

1. **Capture the failure state**  
   - Log the timed‑out step ID, start/end timestamps, and the exact timeout threshold that was breached.  
   - Record the worker‑ID that remained alive, its heartbeat timestamps, and the fact that it emitted **no artifacts**.

2. **Mark the step as *failed‑timeout***  
   - Update the workflow execution record (`execution.status = "failed"`).  
   - Set `step.status = "timeout"` and attach the diagnostic payload (logs, metrics, heartbeat dump).

3. **Isolate the stray worker**  
   - Issue a **graceful shutdown** command to the worker (via the orchestration API).  
   - If the worker does not acknowledge within a short grace period (e.g., 30 s), force‑terminate it and flag it as a *zombie* in the resource inventory.

4. **Generate a deterministic verification gate**  
   - Create a **“Post‑Failure Validation Gate”** that runs before any retry or downstream branch:  
     - Verify that **no partial artifacts** exist in the output store.  
     - Confirm that the worker pool is clean (no lingering processes for the failed run).  
     - Ensure the timeout metric is reproducible (i.e., the same step consistently ex
...[truncated]

Must

  • Classify the failure before deciding whether to retry.
  • Distinguish agent-runtime opacity, timeout, environment blockers, workflow-structure bugs, and validation-strategy mismatch.
  • Preserve raw evidence and uncertainty when the class is not known.

Must Not

  • Blindly rerun the whole workflow without a blocker classification.
  • Treat every failure as a broken workflow definition.
  • Claim Ricky fixed the workflow before rerun evidence proves it.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.stale-relay-state (runtime-recovery/manual)

User message

Ricky detects stale .agent-relay/, .relay/, and .trajectories/ state before launching a local workflow.

Ricky output

**Workflow Contract – Stale‑State Detection for Local Launches**  

| **Element** | **Definition** |
|-------------|----------------|
| **Name** | `detect‑stale‑state` |
| **Scope** | Runs **before** any local workflow (`aw run …`) is started on a developer machine. |
| **Inputs** | - Working directory (root of the repo).<br>- Optional flag `--allow‑stale` (default = `false`). |
| **Outputs** | - `stale‑found: boolean` – true if any stale directory is present.<br>- `stale‑paths: string[]` – list of stale directories detected.<br>- `report‑path: string` – path to a generated markdown report (`.agent‑relay/stale‑report.md`). |
| **Exit Codes** | `0` – No stale state (or `--allow‑stale` used).<br>`1` – Stale state detected and launch blocked.<br>`2` – Unexpected error (e.g., permission issue). |
| **Deterministic Verification Gate** | The step must be **pure**: given the same filesystem snapshot it always returns the same `stale‑found`/`stale‑paths`. No network calls, no randomization. |
| **Review Artifacts** | - **Unit tests** covering every combination of present/absent directories (see “Test Matrix”).<br>- **Integration test** that runs `aw run` on a repo with stale state and asse
...[truncated]

Must

  • Classify stale local runtime state as an environment contamination issue.
  • Recommend quarantine or isolated-run guidance before launch.
  • Record the observed paths and the action taken or recommended.

Must Not

  • Treat stale runtime state as a workflow logic failure.
  • Delete or overwrite state without an explicit safe path or user intent.
  • Continue into execution as if the workspace were clean.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.already-running-conflict (runtime-recovery/manual)

User message

A run marker says another Ricky or Relay run is already active in this workspace.

Ricky output

**Fixing “Another Ricky/Relay run is already active in this workspace”**

The message means a *run‑marker* (lock file) is still present from a previous execution.  
Ricky (and Relay) use this marker to prevent overlapping runs, so you must clear it before starting a new workflow.

---

## 1️⃣ Verify the current state

```bash
# Show the lock status
ricky status
# or, for Relay
relay status

If the output contains “run marker present” or “active run detected”, proceed to the next steps.


2️⃣ Identify the lock file

By convention the marker lives in the workspace root:

Tool Lock‑file path
Ricky .ricky/run.lock
Relay .relay/run.lock (or .ricky/run.lock if Relay is invoked via Ricky)

You can confirm its existence:

ls -l .ricky/run.lock   # Ricky
ls -l .relay/run.lock   # Relay (if applicable)

The file typically contains a JSON payload with the PID, start‑time, and workflow ID.


3️⃣ Ensure no real process is still running

Before deleting the marker, make sure the previous run really finished.

# Get the PID stored in the lock file
PID=$(jq 
...[truncated]

Must

  • Report the active marker, run id, or status path when available.
  • Ask the user to inspect, wait for, or explicitly clear the active run.
  • Avoid launching a competing run that could corrupt evidence.

Must Not

  • Silently start another run.
  • Hide the existing run marker from the user.
  • Treat the conflict as a generic failure with no recovery path.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.env-loader-injection-runtime-loadable (runtime-recovery/manual)

User message

A workflow artifact references a MISSING_ENV_VAR value. Ricky's deterministic auto-fix injects the .env.local / .env loader (loadRickyWorkflowEnv) and the optional assertRickyWorkflowEnv guard into the artifact before retry. The artifact may be a master-rendered workflow whose .step({ command: ... }) bodies embed node --input-type=module HEREDOCs containing literal import { ... } from 'node:fs' / from 'node:path' strings.

Ricky output

**Deterministic auto‑fix for missing environment variables**

Below is the concrete, reproducible patch you can apply to any workflow artifact that currently fails because it references a `MISSING_ENV_VAR`. The fix injects the **Ricky environment loader** (`loadRickyWorkflowEnv`) and the **optional guard** (`assertRickyWorkflowEnv`) **once**, at the top of the generated workflow file, and then re‑runs the workflow. The patch works for both plain‑JS steps and for steps that embed a `node --input-type=module` HEREDOC with inline `import` statements (e.g., `import { readFile } from 'node:fs'`).

---

## 1. Patch contract

| Item | Description |
|------|-------------|
| **Target** | Any `.ricky.ts` / `.ricky.js` workflow artifact (master‑rendered or hand‑written) that contains a reference to `process.env.MISSING_ENV_VAR`. |
| **Pre‑condition** | The repository contains a `loadRickyWorkflowEnv` helper (usually in `scripts/rickyEnv.ts` or `src/rickyEnv.ts`). If not, the helper will be created automatically (see step 2). |
| **Post‑condition** | The workflow file begins with a deterministic import of the loader, an optional guard, and a call to load the env. All subsequent step definition
...[truncated]

Must

  • Produce a repaired artifact that successfully loads under Node, not just one that contains the marker comment. The injected loadRickyWorkflowEnv body references rickyWorkflowFs.* and rickyWorkflowPath.*, so the repair must also add the corresponding import * as rickyWorkflowFs from 'node:fs' and import * as rickyWorkflowPath from 'node:path' aliases at module top level.
  • Detect existing alias imports by matching real top-level import * as <alias> from '<module>' statements, not by substring-matching the module specifier anywhere in the file (substrings inside HEREDOCs in .step({ command: ... }) bodies do not count as imports).
  • Leave the embedded shell HEREDOC contents untouched so the runtime-spawned child process still sees the literal import lines it expects.

Must Not

  • Skip adding the rickyWorkflowFs / rickyWorkflowPath aliases because from 'node:fs' or from 'node:path' already appears somewhere in the file as a string literal.
  • Inject loadRickyWorkflowEnv (or assertRickyWorkflowEnv) without the supporting alias imports, which produces a ReferenceError: rickyWorkflowPath is not defined at module load and burns the auto-fix budget on UNSUPPORTED_RUNTIME at runtime-launch.
  • Rewrite or escape the embedded HEREDOC text in step commands.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.auto-fix-bounded-loop (runtime-recovery/manual)

User message

Run a local workflow with auto-fix enabled. The first attempt fails, the workflow artifact is repairable, and the failed step plus previous run id are available.

Ricky output

**Local Workflow Execution with Auto‑Fix – Step‑by‑Step Contract**

Below is a deterministic, self‑contained contract you can copy‑paste into a terminal (bash / PowerShell) to run the workflow, trigger the auto‑fix path, and verify the repaired artifact. It assumes you have the **AgentWorkforce CLI** (`aw`) installed and that the failed step name and the previous run‑ID are known.

---

## 1️⃣ Prerequisites (verify once)

| Check | Command | Expected outcome |
|-------|---------|-------------------|
| CLI installed | `aw --version` | `aw version x.y.z` |
| Workflow repo cloned | `git status` (inside repo) | clean working tree |
| Docker / local runtime available | `docker ps` | daemon running |
| Auto‑fix feature enabled in config | `cat .aw/config.yaml \| grep auto_fix` | `auto_fix: true` |

If any check fails, resolve it before proceeding.

---

## 2️⃣ Input Parameters (replace placeholders)

```bash
FAILED_STEP="<<failed_step_name>>"          # e.g. "build-image"
PREV_RUN_ID="<<previous_run_uuid>>"         # e.g. "run-2024-05-12-01"
WORKFLOW_FILE="workflow.yaml"               # relative to repo root
ARTIFACT_DIR="artifacts/"                   # where the workflow writes its outp
...[truncated]

Must

  • Use a bounded retry budget and summarize every attempt.
  • Ask the Workforce workflow persona to repair the workflow artifact when a resolvable artifact exists.
  • Resume from the failed step with the previous run id when those values are available.

Must Not

  • Edit arbitrary repository source files as the default auto-fix surface.
  • Keep retrying after the configured max attempts.
  • Lose the single Ricky tracking run id across repair/resume attempts.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.no-auto-fix-preserves-single-attempt (runtime-recovery/manual)

User message

A user runs ricky run workflows/foo.ts --no-auto-fix and the workflow fails.

Ricky output

**Quick‑Start Debug Checklist for a Failed `ricky run workflows/foo.ts --no-auto-fix`**

| Step | Action | Expected Outcome | How to Verify |
|------|--------|------------------|---------------|
| 1️⃣ | **Collect the run log** | A full, timestamped JSON log of the execution | `cat ricky-run.log` (or the path printed at the end of the run) |
| 2️⃣ | **Locate the failure node** | Identify the exact workflow step (`nodeId`) that threw the error | Search the log for `"status":"failed"`; note `nodeId`, `errorMessage`, `stackTrace` |
| 3️⃣ | **Run in debug mode** | Re‑execute the workflow with verbose tracing | `ricky run workflows/foo.ts --debug --no-auto-fix` |
| 4️⃣ | **Validate the contract** | Ensure the workflow’s input‑output contract matches the declared `WorkflowContract` | `ricky contract validate workflows/foo.ts` – should return `PASS` |
| 5️⃣ | **Check deterministic verification gates** | Confirm that all `verify()` functions in the workflow return `true` before the failing node | Insert a temporary `console.log('Gate check', verify());` or run `ricky gate test workflows/foo.ts` |
| 6️⃣ | **Run unit tests for the failing node** | Isolate the node’s logic in a Jest/Mocha test
...[truncated]

Must

  • Preserve one-attempt behavior when auto-fix is disabled.
  • Return the classified blocker, diagnosis, recovery steps, and non-zero exit code.
  • Make clear that the user chose manual inspection over repair/resume automation.

Must Not

  • Start a repair loop despite --no-auto-fix.
  • Suppress the diagnosis because no repair was attempted.
  • Present the failure as a completed repair attempt.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.in-process-local-runner (runtime-recovery/manual)

User message

Explain how Ricky should execute a local TypeScript workflow artifact in the primary local path.

Ricky output

**Ricky – Local Execution of a TypeScript Workflow Artifact**  

Below is the step‑by‑step contract Ricky should follow to run a TypeScript‑based workflow that lives in the *primary local path* of the repository (by convention `./workflows`). The contract is written in the same style as the other workflow‑execution contracts in the `AGENTS.md` repository and includes deterministic verification gates, required review artifacts, and a clear blocker‑reporting policy.

---

## 1️⃣ Contract Overview  

| Item | Description |
|------|-------------|
| **Artifact** | `./workflows/<workflow‑name>.ts` (any valid TypeScript workflow file) |
| **Runtime** | Node ≥ 18, `ts-node` (or compiled output) |
| **Dependencies** | All `package.json` deps in the repo root; workflow‑specific dev deps may be declared in `./workflows/<workflow‑name>.deps.json` |
| **Entry Point** | Exported async function `run(context: WorkflowContext): Promise<void>` |
| **Output** | Console logs + optional JSON result written to `./workflows/.out/<workflow‑name>.json` |
| **Verification Gates** | 1. Type‑check passes 2. Lint passes 3. Unit‑test suite passes 4. Execution returns exit‑code 0 |
| **Review Artifacts** | - Upd
...[truncated]

Must

  • Prefer the Node strip-types route or equivalent SDK/programmatic route over requiring the agent-relay binary on PATH.
  • Precheck that Node and @agent-relay/sdk are resolvable for the workflow.
  • Record the actual spawn command in execution evidence.

Must Not

  • Fail solely because agent-relay is not on PATH when the SDK route is available.
  • Hide the actual runtime command from evidence.
  • Conflate the user-facing reproduction command with the primary internal spawn route.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.escalation-is-not-generic-failure (runtime-recovery/manual)

User message

Ricky reaches a boundary after a structural failure persists after a fix attempt.

Ricky output

**Ricky’s Boundary‑Handling Workflow for Persistent Structural Failures**

When a structural failure remains after an initial remediation attempt, Ricky follows a deterministic, contract‑driven process that isolates the problem, escalates it as a blocker, and creates a scoped remediation branch. The steps below are the exact contract that any `workflow.yaml` or `repair.py` implementation must obey.

---

## 1. Failure Detection & Initial Fix Attempt  

| Contract | Detail |
|----------|--------|
| **Input** | `failure_id`, `artifact_path`, `initial_fix_commit` |
| **Gate** | `verify_failure_present(artifact_path)` – must return **true** before proceeding. |
| **Action** | Apply the **initial fix** (e.g., patch, config change) and commit as `initial_fix_commit`. |
| **Verification** | Run `run_structural_checks(artifact_path)`; must return **PASS** to exit early. |

*If the verification fails, the workflow proceeds to the **Boundary Handling** section.*

---

## 2. Boundary Trigger  

```yaml
# workflow.yaml snippet
- name: structural-failure-persist
  when: failure_still_present == true
  steps:
    - call: ricky.boundary.handle_persistent_failure

The when clause guarantees
...[truncated]


**Must**

- Escalate with the attempted fix, failed validation, classified blocker, and recommendation.
- Distinguish escalation from a generic product failure.
- Preserve enough context for a human operator to continue.

**Must Not**

- Retry speculative fixes indefinitely.
- Collapse to "something went wrong" without the attempted actions.
- Discard evidence from failed repair attempts.

**Deterministic checks**

- PASS `status`: case executed
- PASS `maxToolCalls`: expected <= 0 tool calls, got 0

</details>

<details>
<summary><strong>REVIEW</strong> <code>runtime-recovery.analytics-from-structured-evidence</code> (runtime-recovery/manual)</summary>

**User message**

> Produce a workflow health digest from many Ricky workflow runs.

**Ricky output**

```text
**Ricky Workflow Health Digest – Compiled from Recent Runs**  
*(Generated ⟨YYYY‑MM‑DD⟩ – All timestamps are UTC)*  

---

## 1️⃣ Executive Summary
| Metric | Value | Status |
|--------|-------|--------|
| Total runs (last 24 h) | **312** | ✅ |
| Successful runs | **274** | ✅ |
| Failed runs | **38** | ⚠️ |
| Runs with **hard blockers** (non‑recoverable) | **7** | ❌ |
| Avg. latency (submission → completion) | **2.8 s** | ✅ |
| Avg. latency (failed runs) | **3.4 s** | ✅ |
| Regression alerts (≥ 5 % drop vs prior window) | **0** | ✅ |

Overall health: **Good** – > 85 % success, latency within SLA, no systemic regressions. The 7 hard‑blocker runs require immediate attention (see § 4).

---

## 2️⃣ Run‑Level Health Indicators  

| Run ID | Workflow | Start → End | Outcome | Gate(s) Passed | Blocker(s) |
|--------|----------|-------------|---------|----------------|------------|
| `wf‑20240518‑00123` | **Deploy‑Service** | 00:12 → 00:15 | ✅ Success | **Contract‑V1**, **Smoke‑V2** | – |
| `wf‑20240518‑00456` | **Sync‑Catalog** | 01:03 → 01:07 | ❌ Failure | **Contract‑V1** (passed) | **Deterministic‑Gate‑MissingArtifact** |
| `wf‑20240518‑00987` | **Publish‑Report** | 02:45 → 02:48 | ✅ S
...[truncated]

Must

  • Consume normalized WorkflowRunEvidence rather than raw logs as the primary input.
  • Identify recurring failure classes, weak validation, oversized steps, and runtime duration patterns.
  • Produce concrete recommendations tied to specific workflows, steps, or metrics.

Must Not

  • Mutate evidence while analyzing it.
  • Return generic advice like "improve your workflow" without references.
  • Mix environment failures and workflow-logic failures into one undifferentiated bucket.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW runtime-recovery.preserve-pr-shipping-during-repair (runtime-recovery/manual)

User message

A persona-authored workflow that imports @agent-relay/github-primitive and uses createGitHubStep to open a PR fails its runtime-precheck. The auto-fix repair persona is invoked. Describe the contract the repaired artifact must satisfy.

Ricky output

**Repaired Workflow Contract**

| **Section** | **Requirement** | **Rationale / Verification** |
|-------------|----------------|------------------------------|
| **1. Imports** | The workflow **must** import the exact package name `@agent‑relay/github-primitive` and **only** the symbols it needs (e.g., `createGitHubStep`). | The runtime‑precheck validates that every import resolves to a published module and that no unused imports remain (tree‑shaking safety). |
| **2. Exported Symbol** | The file must export a **single** default async function named `run` (or the name declared in the workflow manifest). | The orchestrator loads the workflow via `module.exports = run`. The precheck confirms the exported entry point matches the manifest. |
| **3. Step Construction** | Inside `run`, the workflow **must** call `createGitHubStep` **exactly once** to produce a step object that: <br>• Has a `name` field of type `string` (non‑empty). <br>• Has an `action` field set to `"pull_request.create"`. <br>• Contains a `payload` object with the required keys `base`, `head`, `title`, and `body`. <br>• Returns a **Promise** that resolves to a `GitHubStepResult` (or throws a typed error). | The preche
...[truncated]

Must

  • Preserve the @agent-relay/github-primitive import, the GitHubStepExecutor reference, and every createGitHubStep(...) invocation that the original artifact declared.
  • Keep the failing workflow runnable from the same path with the same workflow(...) builder and .run({ cwd: process.cwd() }) invocation.
  • Retain at least ceil(N / 2) of the original workflow's .step(...) calls when the original declared four or more steps; a repair that collapses to a 2-3 step placeholder is a regression, not a fix.
  • Reject (do not apply) any repair output whose step list reduces to prepare-context / runtime-precheck: true / final-signoff: echo placeholder; surface the regression diagnostic instead.

Must Not

  • Strip createGitHubStep, GitHubStepExecutor, or @agent-relay/github-primitive because the runtime-precheck failure mentioned PR-shipping or git side effects. The repair contract's "no commit / no push" constraint applies to the REPAIR AGENT's runtime behavior, not to the workflow's step declarations.
  • Emit a "minimal repair-safe master" or "simplified Ricky master" scaffold that passes the builder validator while doing none of the original work.
  • Treat "the workflow now builds and runs" as success when the work it was supposed to ship is gone.

Deterministic checks

  • PASS status: case executed
  • PASS maxToolCalls: expected <= 0 tool calls, got 0
REVIEW surfaces-ingress.co-equal-surfaces (surfaces-ingress/manual)

User message

Design a new Ricky Slack surface for workflow debugging.

Ricky output

## Ricky Slack Surface – “Workflow Debugger”

### 1. Overview
The **Ricky Slack Debugger** is a dedicated Slack app surface that lets engineers and product owners inspect, step‑through, and repair Ricky‑managed workflows in real time. It follows the **Ricky repository conventions** (AGENTS.md) and the **workflow standards** for deterministic verification, review artifacts, and bounded validation loops.

| Component | Purpose | Primary Slack UI Element |
|-----------|---------|--------------------------|
| **Debug Session Launcher** | Starts a scoped debugging session for a specific workflow version. | `/ricky debug <workflow‑id> [--branch <branch>]` |
| **Live Execution Trace** | Streams step‑by‑step events, state diffs, and gate outcomes. | Threaded message with expandable blocks. |
| **Gate‑Control Panel** | Allows pausing, forcing pass/fail, or injecting mock data at any verification gate. | Interactive buttons & select menus. |
| **Patch‑Suggestion Builder** | Generates a minimal PR (branch, commit, PR template) that fixes the observed failure. | Modal form pre‑filled with diff and PR metadata. |
| **Blocker Reporter** | Auto‑creates a Jira ticket (or GitHub iss

---

_Comment truncated to stay within GitHub limits. Download the `ricky-eval-run` artifact for the full `human-review.md`._

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 1901-1922: The current loop marks hasWorktreeAdd true for any git
worktree add based on hasGitWorktreeAddCommand/substring matches, allowing
unrelated adds to satisfy setup; change the logic to require that the same
executable step both is a parsed "git worktree add" and includes the
declaredWorktree.path and declaredWorktree.branch exactly (use a new or existing
addsDeclaredWorktree(step, declaredWorktree) that compares parsed args by token
equality rather than includes()). Use addsDeclaredWorktree to set hasWorktreeAdd
and to guard clearing firstImplementationUseBeforeSetup (replace uses of
stepHasWorktreeAdd / commandMentionsExecutableToken in that decision), while
still using commandMentionsExecutableToken only for detection of mentions when
appropriate; update the loop around hasWorktreeAdd,
firstImplementationUseBeforeSetup, and the condition that detects
implementation-before-setup to rely on addsDeclaredWorktree and exact token
equality.
- Around line 1877-2122: The current validators use brittle regex/heuristics in
extractDeclaredWorktree, extractTestFilePaths, stripShellLineComment,
extractHeredocDelimiters, commandMentionsExecutableToken, shellExecutableText
and shellWords; replace them with parser-based analysis: parse Markdown with
mdast-util-from-markdown (walk AST to locate code fences/inline code and YAML
frontmatter) to reliably extract declared worktree and branch, and replace shell
tokenization/comment stripping with a real shell AST/tokenizer (e.g., a bash/sh
parser) to extract -f tests, heredocs, and executable words; update
analyzeDeclaredWorktreeUsage and findInvalidFileExistenceGateCommands to consume
the AST-derived tokens/paths instead of regex matches so all checks use the
parser outputs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 1768de31-2605-4eda-a65f-243977ea3af7

📥 Commits

Reviewing files that changed from the base of the PR and between 1786524 and 5edbf46.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts

Comment thread src/product/generation/workforce-persona-writer.ts
Comment thread src/product/generation/workforce-persona-writer.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/product/generation/workforce-persona-writer.ts (1)

1879-2214: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

This validator still relies on regex/comment-stripping shell heuristics.

The new worktree and test -f enforcement is still built on labelField, stripShellLineComment, extractHeredocDelimiters, shellCommandSegments, and related substring/regex logic. That keeps a load-bearing policy check on brittle parsing rules and is still out of line with the repo rule for Ricky source inspection.

As per coding guidelines, "Use parser-based approaches (AST walk with typescript module, mdast-util-from-markdown) instead of regex or substring matching when inspecting TypeScript, JavaScript, Markdown, JSON, or shell artifacts in Ricky source code".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/product/generation/workforce-persona-writer.ts` around lines 1879 - 2214,
The validator currently relies on brittle regex/substring heuristics (e.g.,
stripShellLineComment, extractHeredocDelimiters, shellCommandSegments,
shellWords, shellExecutableLines, commandMentionsExecutableToken,
extractTestFilePaths, isDirectoryLookingPath and
labelField/stripMarkdownListMarker) to parse shell and markdown, which violates
the parser-first guideline; replace these heuristics with proper AST-based
parsing: use mdast-util-from-markdown + an mdast walk to extract labeled fields
(replace labelField/markdownTextLines/markdownNodeText logic) and use a robust
shell AST parser (e.g., bash-parser or shell-parse) to produce command ASTs and
extract worktree add args, test -f targets, heredocs and tokens (replacing
shellExecutableText/shellExecutableLines/stripShellLineComment/extractHeredocDelimiters/shellCommandSegments/shellWords/gitWorktreeAddArgv/extractTestFilePaths/commandMentionsExecutableToken),
then update addsDeclaredWorktree, findInvalidFileExistenceGateCommands, and
analyzeDeclaredWorktreeUsage to consume the parsed ASTs instead of string
matching so the checks are resilient and parser-driven.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 1914-1919: The current pre-worktree check gates using
isImplementationOrTestStep (the step label) which lets non-git commands sneak
through; instead, change the condition to inspect the parsed command for git
implementation/test intent (e.g., replace isImplementationOrTestStep with a
predicate that checks the parsed command/tool and command-type like
isGitImplementationOrTestCommand or parsedCommand.tool === 'git' &&
commandIsImplementationOrTest(parsedCommand)), keeping the existing flags
hasWorktreeAdd, stepAddsDeclaredWorktree, stepMentionsPath, and
stepMentionsBranch; apply the same change to the other identical block
referenced (around the second occurrence) so both use parsed command behavior
rather than the human-authored step name.
- Around line 1976-1977: The current check at the branch using
unquotedPath.endsWith('/') || isDirectoryLookingPath(normalizedPath) treats any
multi-segment path whose basename has no '.' as a directory and misflags
extensionless files (e.g., Dockerfile, Makefile). Update the heuristic so that
either isDirectoryLookingPath(normalizedPath) is tightened or the condition is
altered to exclude common extensionless filenames: compute basename (from
normalizedPath or unquotedPath) and only treat it as "directory-looking" when it
ends with '/' or the basename has no '.' AND is not in a small allowlist of
known extensionless filenames (e.g.,
"Dockerfile","Makefile","README","Procfile"). Keep the invalid.push usage (the
step.stepName message) but only trigger it after applying this refined check.

---

Duplicate comments:
In `@src/product/generation/workforce-persona-writer.ts`:
- Around line 1879-2214: The validator currently relies on brittle
regex/substring heuristics (e.g., stripShellLineComment,
extractHeredocDelimiters, shellCommandSegments, shellWords,
shellExecutableLines, commandMentionsExecutableToken, extractTestFilePaths,
isDirectoryLookingPath and labelField/stripMarkdownListMarker) to parse shell
and markdown, which violates the parser-first guideline; replace these
heuristics with proper AST-based parsing: use mdast-util-from-markdown + an
mdast walk to extract labeled fields (replace
labelField/markdownTextLines/markdownNodeText logic) and use a robust shell AST
parser (e.g., bash-parser or shell-parse) to produce command ASTs and extract
worktree add args, test -f targets, heredocs and tokens (replacing
shellExecutableText/shellExecutableLines/stripShellLineComment/extractHeredocDelimiters/shellCommandSegments/shellWords/gitWorktreeAddArgv/extractTestFilePaths/commandMentionsExecutableToken),
then update addsDeclaredWorktree, findInvalidFileExistenceGateCommands, and
analyzeDeclaredWorktreeUsage to consume the parsed ASTs instead of string
matching so the checks are resilient and parser-driven.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dcb554e1-7a4f-43e9-94e7-a236a3664a37

📥 Commits

Reviewing files that changed from the base of the PR and between 5edbf46 and a447758.

📒 Files selected for processing (2)
  • src/product/generation/workforce-persona-writer.test.ts
  • src/product/generation/workforce-persona-writer.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/product/generation/workforce-persona-writer.test.ts

Comment thread src/product/generation/workforce-persona-writer.ts
Comment thread src/product/generation/workforce-persona-writer.ts
@khaliqgant khaliqgant merged commit ff833db into main May 18, 2026
4 checks passed
@khaliqgant khaliqgant deleted the fix/generated-workflow-worktree-gates branch May 18, 2026 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants